A Portfolio of my Work with Analytics, Finance and Technology: January 2016

Sunday, 24 January 2016

Use of Bar Graphs in R Programming to Investigate Ethnicities in Toronto (Part 2)

The purpose of this post is to learn to make bar graphs in R. As practice, I used public data on the demographics of Toronto's many neighborhoods. Using R, I identified the four neighborhoods with the greatest percent of the population speaking each of Chinese, Tamil and Tagalog. The results suggest that visible minorities tend to live close to each other, just outside the downtown core of a major city.

Languages in Toronto by Neighborhood

Cross-referencing with Google Earth shows that the most prominent neighborhoods for Chinese speakers were in the northeast end of Toronto. For Tamil speakers, prominent neighborhoods were in the east end. For Tagalog speakers, prominent neighborhoods were in the north end. The public data came from a 2011 survey conducted by Wellbeing Toronto, a program with the City of Toronto. I attached a link to all Excel and coding files.

For the coding, I had to learn to create bar graphs using the barplot function, create columns in a dataset and order values by size. My coding is as follows:

#Language in Toronto

#By Matthew Mano

lang<-read.csv("torontoLanguage.csv", header=T)

Import the file

lang$PCh<-(lang$Ch/lang$Tot)*100

lang$PTl<-(lang$Tam/lang$Tot)*100

lang$PTg<-(lang$Tag/lang$Tot)*100

Create three additional columns with the percent population

lang.newch<-lang[order(-lang$PCh),c(1,6)]

lang.newtl<-lang[order(-lang$PTl),c(1,7)]

lang.newtg<-lang[order(-lang$PTg),c(1,8)]

Order the percent population by decreasing size
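
To make the column-creation and ordering steps concrete, here is a minimal sketch on a made-up data frame. The neighborhood names and counts are invented for illustration only, and the column names (Ne, Ch, Tot) are assumed to match those in torontoLanguage.csv.

```r
# Toy data: names and counts are made up, not taken from the Wellbeing Toronto file
toy <- data.frame(
  Ne  = c("A", "B", "C", "D", "E"),
  Ch  = c(50, 400, 120, 10, 300),
  Tot = c(1000, 1000, 600, 500, 2000)
)

# Add a percent column, in the same way as lang$PCh
toy$PCh <- (toy$Ch / toy$Tot) * 100

# order() returns row indices; negating the values sorts largest first
toy.sorted <- toy[order(-toy$PCh), c("Ne", "PCh")]

# Keep the four neighborhoods with the highest percent
top4 <- head(toy.sorted, 4)
print(top4)
```

Selecting the columns by name, rather than by position as in c(1,6), is a small design choice: it should keep working if the column order in the CSV ever changes.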

par(mfrow=c(3,1))

Display 3 charts in 1 window

barplot(height=lang.newch$PCh[1:4],names.arg=lang.newch$Ne[1:4],ylab="Percent out of Total Population (%)",border=NA)

Create a barplot of the first four neighborhoods for Chinese speakers. Did the same for the other two languages.

title(main="Toronto Neighborhoods with the Greatest Percent of Chinese Speakers")

barplot(height=lang.newtl$PTl[1:4],names.arg=lang.newtl$Ne[1:4],ylab="Percent out of Total Population (%)",border=NA)

title(main="Toronto Neighborhoods with the Greatest Percent of Tamil Speakers")

barplot(height=lang.newtg$PTg[1:4],names.arg=lang.newtg$Ne[1:4],ylab="Percent out of Total Population (%)",border=NA)

title(main="Toronto Neighborhoods with the Greatest Percent of Tagalog Speakers")
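
Since the three barplot calls differ only in the column and the language name, they could be wrapped in a small helper. The sketch below is one possible way to do that, not the code I originally ran. The function plot_top and the toy data are hypothetical, and the column names (Ne, PCh, PTl, PTg) are assumed to match the ones created earlier.

```r
# Toy data with made-up percentages; not the real Toronto figures
toy <- data.frame(
  Ne  = c("A", "B", "C", "D", "E"),
  PCh = c(5, 40, 20, 2, 15),
  PTl = c(1, 3, 30, 12, 8),
  PTg = c(9, 2, 4, 25, 11)
)

# Hypothetical helper: bar chart of the n rows with the largest value in pct_col
plot_top <- function(df, pct_col, label, n = 4) {
  top <- head(df[order(-df[[pct_col]]), ], n)
  barplot(height = top[[pct_col]],
          names.arg = as.character(top$Ne),
          ylab = "Percent out of Total Population (%)",
          border = NA,
          main = paste("Toronto Neighborhoods with the Greatest Percent of",
                       label, "Speakers"))
  invisible(top)  # return the plotted rows without printing them
}

par(mfrow = c(3, 1))  # three charts stacked in one window
cols <- c(Chinese = "PCh", Tamil = "PTl", Tagalog = "PTg")
results <- lapply(names(cols), function(l) plot_top(toy, cols[[l]], l))
```

Each element of results holds the rows that were plotted, which makes it easy to check the ranking without reading it off the chart.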

Link for downloads: http://bit.ly/20xMtvZ

Saturday, 23 January 2016

Use of Bar Graphs in R Programming to Investigate Ethnicities in Toronto

The goal for my next project is to look at the major ethnicities in Toronto by neighborhood. I'll measure ethnicity by looking at languages spoken at home. Ignoring English, some of the major languages spoken in Toronto are Cantonese, Tagalog and Tamil. The purpose is to identify the neighborhoods with the greatest percent of foreign-language speakers out of total population. The dataset I will be using comes from Wellbeing Toronto, a program run by the City of Toronto. The CSV and Excel files are attached.

Download Link for CSV files: http://bit.ly/20xMtvZ

Sunday, 17 January 2016

Basic Line Graph Coding in R

Using public information available from the Toronto Police, I analyzed vehicle collisions in Toronto from 1998-2012. I created a basic line graph in R to visualize the data.

I graphed the percent of fatal collisions because I believe that fatal collisions are a stronger indicator of danger than the total number of collisions. Collisions are unavoidable. But if drivers follow the law and, more importantly, car engineers develop safer cars, there should be no fatalities. The graph shows a decrease in both the number of collisions and the percent of fatal collisions. The fall in fatal collisions could be due to advances in safety engineering, greater penalties for reckless driving and more public awareness of safe driving.

Here is my coding:

#Vehicle Collisions in Toronto
#By Matthew Mano (matthewm3109@gmail.com)
collisions=read.csv("TorontoCollisions.csv", header=T) #upload csv file
year=collisions$Year #Identify specific columns
total=collisions$C
fatal=collisions$F
percent=(fatal/total)*100 #Find percent of Fatal Collisions to Total Number of Collisions
plot(year,total, type="l", xlab="Year", ylab="Number of Collisions") #Use type="l" to create a line plot
par(new=TRUE) #keep working on the original graph
plot(year,percent,type="l",xaxt="n",yaxt="n",xlab="",ylab="",col="red")
mtext("Percent that were Fatal Collisions (%)",side=4,line=3)
axis(4) #Use right hand axis to display percent
legend("topright",col=c("black","red"),lty=1,legend=c("Number of Collisions","Percent that were Fatal Collisions")) #Add legend
title(main="Vehicle Collisions in Toronto") #Add title

The trickiest part was generating a graph with two y-axes. I plotted the percent normally, then added two special lines of code:

mtext("Percent that were Fatal Collisions (%)",side=4,line=3)

Added the axis name to the margins.

axis(4) #Use right hand axis to display percent

Used the 4th 'side' of the graph (the right-hand side) to draw the second axis with its tick marks and labels.
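
For anyone who wants to try the two-axis technique without the CSV, here is a self-contained sketch using synthetic numbers (not the Toronto Police data). One thing it adds as an assumption on my part: the default right margin in base R is about 2 lines, so a label placed at line=3 may be cut off unless the margin is widened with par(mar=...) first.

```r
# Synthetic numbers for illustration only; not Toronto Police data
year    <- 2001:2005
total   <- c(500, 480, 470, 450, 430)
fatal   <- c(10, 9, 8, 7, 6)
percent <- (fatal / total) * 100

# Widen the right margin so the label at line = 3 has room
old.par <- par(mar = c(5, 4, 4, 5) + 0.1)

# First series on the left-hand axis
plot(year, total, type = "l", xlab = "Year", ylab = "Number of Collisions")

# Draw the second series on top of the first, suppressing its own axes and labels
par(new = TRUE)
plot(year, percent, type = "l", col = "red",
     xaxt = "n", yaxt = "n", xlab = "", ylab = "")

axis(4)  # tick marks and labels on the right-hand side
mtext("Percent that were Fatal Collisions (%)", side = 4, line = 3)

par(old.par)  # restore the previous margins
```

Because the two series share the same years, the x-axis from the first plot still lines up with the second one.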

I added a link to download my full code. Please let me know if I can improve this or add anything else, or if you have any questions.

Link to download code & download CSV and Excel file: http://bit.ly/1oBVFnj

Saturday, 16 January 2016

Basic Line Graphs

Hey everyone,

The goal today is to familiarize myself with basic graphing functions in R. As a proud Torontonian, I'll be using public data pulled from the Toronto Police Service. I'll be analyzing the number of traffic collisions from 1998-2012.

Link to public documents: http://www.torontopolice.on.ca/publications/#reports
Link for csv/excel download: https://drive.google.com/folderview?id=0B4ylNmE1KdhQR05mcm5Cc0MzOXM&usp=sharing

I'll be uploading my code soon. If you have any questions or any ideas for improvement, shoot me an email at matthewm3109@gmail.com.

Thanks,

Matthew Mano