The data come from Chung-Hua University, China. The input variables were cement, slag, fly ash, water, superplasticizer (SP), coarse aggregate, and fine aggregate, each measured in kg/m3 of concrete. The output variable is the 28-day compressive strength, measured in MPa. The results show that water is the strongest influencer of compressive strength and slag is the weakest. Superplasticizer had little to no impact and was removed from the model entirely. The fitted model gives the compressive strength by the equation below:
Compressive strength = 0.04970*(Cement) - 0.04519*(Slag) + 0.03859*(Fly ash) - 0.27055*(Water) - 0.06986*(Coarse Aggregate) - 0.05358*(Fine Aggregate)
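To make the coefficients concrete, here is a minimal sketch in R of how the equation above responds to a change in one ingredient. The mix quantities and the helper `slope_sum` are hypothetical and chosen only for illustration; they are not rows from the dataset. Note that the equation as printed appears to list only the slope terms, while `lm()` fits an intercept by default, so an absolute strength prediction would also need the intercept reported by `summary(concreter2)`. The sketch therefore compares two mixes, where the intercept cancels out.

```r
# Slope coefficients copied from the equation above (MPa per kg/m3).
slopes <- c(Cement = 0.04970, Slag = -0.04519, Fly.ash = 0.03859,
            Water = -0.27055, CA = -0.06986, FA = -0.05358)

# Hypothetical mix in kg/m3; illustrative values, not a row from the dataset.
mix <- c(Cement = 300, Slag = 100, Fly.ash = 100,
         Water = 200, CA = 900, FA = 700)

# Sum of the slope terms only; the model intercept is NOT included.
slope_sum <- function(mix, slopes) sum(slopes[names(mix)] * mix)

# Same mix with 10 kg/m3 more water, everything else unchanged.
wetter <- mix
wetter["Water"] <- mix["Water"] + 10

# The intercept cancels in a difference, so this is the predicted change in MPa.
delta <- slope_sum(wetter, slopes) - slope_sum(mix, slopes)
print(delta)
```

Under these assumptions, adding 10 kg/m3 of water lowers the predicted 28-day strength by about 2.7 MPa with all other ingredients held fixed.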
[Figure: Normalized Histogram of Residuals]
The coefficient of determination indicates a strong fit (R2 = 0.8962), and the p-values are low for each retained variable. The normalized histogram shows that the residuals are approximately normally distributed, which supports the linear model and suggests there is no obvious systematic error.
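A histogram is only a visual check, so it can help to back it up with a formal test and two diagnostic plots. The following is a sketch of such checks, not part of the original analysis. It runs on simulated stand-in data (the variables `x1`, `x2`, and `y` are made up) so that it can be executed without `slump.csv`; the same calls would apply to the residuals of `concreter2`.

```r
set.seed(1)

# Simulated stand-in data (NOT the concrete dataset): linear signal plus Gaussian noise.
n <- 100
sim <- data.frame(x1 = runif(n, 0, 10), x2 = runif(n, 0, 5))
sim$y <- 3 + 2 * sim$x1 - 1.5 * sim$x2 + rnorm(n, sd = 1)

fit <- lm(y ~ x1 + x2, data = sim)
r <- residuals(fit)

# Shapiro-Wilk test: a small p-value is evidence against normally distributed residuals.
sw <- shapiro.test(r)
print(sw$p.value)

# Q-Q plot: points close to the reference line indicate approximately normal residuals.
qqnorm(r)
qqline(r, col = "red")

# Residuals vs fitted values: curvature or a funnel shape would hint at systematic error.
plot(fitted(fit), r, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```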
The problem was approached by creating a multivariable linear regression of all the input variables:
[Figure: Initial Regression]
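The initial model has a high R2. Some of the p-values, however, do not show strong evidence against the null hypothesis, notably those for slag, fine aggregate, and SP. The step() function addresses this by stepwise selection on the Akaike information criterion (AIC): it drops a variable only when removing it lowers the AIC. Here it removes SP and keeps the other variables.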
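The behaviour of step() can be sketched on simulated data. This is an illustration under assumptions, not the original analysis: the data frame `sim` and its columns `a`, `b`, and `noise` are invented, with `noise` unrelated to the response by construction. When no scope is given, step() searches backward from the full model, which is also what `step(concreter)` does.

```r
set.seed(42)

# Simulated stand-in data (NOT the concrete dataset).
n <- 120
sim <- data.frame(a = rnorm(n), b = rnorm(n), noise = rnorm(n))
# The response depends on a and b only; `noise` is unrelated by construction.
sim$y <- 1 + 2 * sim$a - 3 * sim$b + rnorm(n)

full <- lm(y ~ a + b + noise, data = sim)

# Backward elimination by AIC; trace = 0 suppresses the step-by-step log.
reduced <- step(full, direction = "backward", trace = 0)

# Terms kept by the search, and the AIC before and after.
print(formula(reduced))
print(c(full = AIC(full), reduced = AIC(reduced)))
```

Because the search only accepts a move that lowers the AIC, the reduced model's AIC is never higher than the full model's. A variable with a weak p-value can still be kept if dropping it would raise the AIC.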
[Figure: Final Regression]
The coefficients are listed in the Estimate column of the summary output. The full code is as follows:
#Multivariable regression of Concrete Compression Test
#By Matthew Mano (matthewm3109@gmail.com)
#import data
concrete<-read.csv("slump.csv")
#remove incomplete tests
concretec<-concrete[complete.cases(concrete),]
#generate linear model
concreter<-lm(CS~ Cement+Slag+Fly.ash+Water+SP+CA+FA, data=concretec)
#get information of initial model
summary(concreter)
#remove unnecessary variables (stepwise selection by AIC)
concreter2<-step(concreter)
#get information of secondary model
summary(concreter2)
#extract residuals of secondary model
r<-residuals(concreter2)
#graphing residuals in histogram
hist(r, prob=TRUE, main="Normalized Histogram of Residuals", xlab="Residual (MPa)")
#adding reference normal curve
curve(dnorm(x, mean=mean(r), sd=sd(r)), add=TRUE, col="red")
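The summary() calls above print the coefficient table to the console. If the numbers are needed programmatically, they can be pulled out of the summary object. The sketch below again uses simulated stand-in data (`sim`, `x1`, `x2`, and `y` are made up) so that it runs without `slump.csv`; with the real data, `fit` would be replaced by `concreter2`.

```r
set.seed(7)

# Simulated stand-in data (NOT the concrete dataset).
sim <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
sim$y <- 5 + 1.2 * sim$x1 - 0.8 * sim$x2 + rnorm(50)
fit <- lm(y ~ x1 + x2, data = sim)

# Coefficient matrix: one row per term, with columns
# Estimate, Std. Error, t value, and Pr(>|t|).
tab <- coef(summary(fit))
print(tab)

# Pull out individual columns by name.
estimates <- tab[, "Estimate"]
p_values  <- tab[, "Pr(>|t|)"]

# R-squared as reported by summary().
r2 <- summary(fit)$r.squared
print(r2)
```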
Link to code & csv: http://bit.ly/1QRzyjr
Link to original data: https://archive.ics.uci.edu/ml/datasets/Concrete+Slump+Test