Tuesday, 20 September 2016

Using R to Model the Spread of Diseases

Using data from a class assignment, I attempted to use R to model the spread of diseases in a fixed population. I was given sample data assuming an unlimited population and asked to predict the number of infected for every round within a group of 90 people. My simulation showed that within 20 interactions, 99% of the population will be infected (Figure 1).
Figure 1, Spread of Test Disease within Fixed Population
I extracted data from Excel (Table 1) into R for better flexibility.

Table 1, Information on Infected 
From the given data, the first step would be to find the transmission parameter. The transmission parameter describes how quick a disease would spread. The greater the parameter, the faster it spreads. The parameter can be found via the below equation: 

n(t+1)=1+bn(t) 
Where b is the transmission parameter, n(t) is the present infected population value and n(t+1) is the infected population in the next round, or interaction. 

Using R, I iterated between different values of b until I found the parameter that gave the least squares regression, or the minimum squared residual, with data in Table 1. In order to understand least square regression better, I stated it explicitly instead of simply using R's library. 

I plotted least squares with respect the the transmission parameter to illustrate this (Figure 2).
Figure 2, Least Square Regression for Different Transmission Parameters
The minimum squared residual occurs at 0.366. I applied the parameter into the Logistic Growth equation that took into consideration the maximum infected population (N). 

n(t+1)= (-b/N * n(t) + 1 + b)*n(t)

Plotting the number of possibly infected, at this stage, is easy. I learned that diseases spreads fastest during the middle stages. As it reaches maximum population, the growth rate slows. The next stage would be to consider susceptibility and recovery rate.  

The coding is given below:

q1_data = read.csv("a1q1.csv", header = T) #read data 
length = length(q1_data$Cumulative.Cases) #take length of data 
y=q1_data$Cumulative.Cases #Base vector of the cumulative cases 
z=y #create a copy fo the cumulative case vector 
z=c(z,NA) #add NA value to the copy vector 
rss=0 #initialize variable to hold residual sum square value 

m = seq(0, 2,0.001) #range of B values tested 
rss_list = vector() #create vector to hold the rss from different b values
for (b in m)
{
  for(a in 1:6)
  {
    z[a+1] = z[a]*(1+b)
    rss = (z[a]-y[a])^2 + rss
  }
  rss_list=c(rss_list,rss)
  rss=0 #reset rss value 
}
plot(m,rss_list, type="l", xlab="Transmission Parameter", ylab="Squared Residual with Observed Data",ylim=c(0,10000),xlim=c(0,0.8))

m[which.min(rss_list)] #find b value that corresponds to the minimum 
title("Graphical Representation of Least Square Regression")