Linear regression in R with code: Analysis and interpretation

data(longley)
str(longley)
'data.frame':   16 obs. of  7 variables:
 $ GNP.deflator: num  83 88.5 88.2 89.5 96.2 ...
 $ GNP         : num  234 259 258 285 329 ...
 $ Unemployed  : num  236 232 368 335 210 ...
 $ Armed.Forces: num  159 146 162 165 310 ...
 $ Population  : num  108 109 110 111 112 ...
 $ Year        : int  1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 ...
 $ Employed    : num  60.3 61.1 60.2 61.2 63.2 ...
View(longley)
set.seed(130)

pd <- sample(2,nrow(longley),replace=TRUE,prob = c(0.7,0.3))
train <- longley[pd==1,] # rows where pd is 1; the empty index after the comma keeps all columns
View(train)
test <- longley[pd==2,]
View(test)
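
Because sample() assigns each row to group 1 or 2 independently with probabilities 0.7 and 0.3, the split is only approximately 70/30. The following is a minimal sketch of how the split could be checked; it repeats the steps above so that it runs on its own, and the check itself is an addition, not part of the original session.

```r
# Recreate the random split used above
data(longley)
set.seed(130)
pd <- sample(2, nrow(longley), replace = TRUE, prob = c(0.7, 0.3))

# Count how many rows fell into each group (1 = train, 2 = test)
table(pd)

train <- longley[pd == 1, ]
test  <- longley[pd == 2, ]

# Every row should land in exactly one of the two sets
nrow(train) + nrow(test) == nrow(longley)
```

With only 16 observations, the test set ends up very small, so any test error computed from it should be read with caution.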

Linear regression

Employ_model <- lm(Employed ~ GNP, data = train)
summary(Employ_model)

Call:
lm(formula = Employed ~ GNP, data = train)

Residuals:
     Min       1Q   Median       3Q      Max
-0.66805 -0.47824  0.07029  0.39669  1.04992

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 51.678285   0.678136   76.21 2.47e-16 ***
GNP          0.034873   0.001648   21.16 2.91e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5707 on 11 degrees of freedom
Multiple R-squared: 0.976, Adjusted R-squared: 0.9739
F-statistic: 448 on 1 and 11 DF, p-value: 2.908e-10
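
To read the summary: the fitted line is Employed = 51.678 + 0.0349 × GNP, so each additional unit of GNP is associated with about 0.035 more units of Employed, and a GNP of 400 (an illustrative value) gives a fitted value of about 65.63. Each t value is the estimate divided by its standard error (0.034873 / 0.001648 ≈ 21.16), and the R-squared of 0.976 means the line explains about 97.6% of the variance of Employed in the training rows. The sketch below shows how these quantities could be pulled out of the model object; it refits the model so that it runs on its own, and the variable names coefs and t_by_hand are illustrative.

```r
# Refit the simple model on the same training split
data(longley)
set.seed(130)
pd <- sample(2, nrow(longley), replace = TRUE, prob = c(0.7, 0.3))
train <- longley[pd == 1, ]
Employ_model <- lm(Employed ~ GNP, data = train)

# Coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(Employ_model))

# The t value is the estimate divided by its standard error
t_by_hand <- coefs[, "Estimate"] / coefs[, "Std. Error"]
all.equal(t_by_hand, coefs[, "t value"])

# Share of the variance in Employed explained by the model
summary(Employ_model)$r.squared

# 95% confidence intervals for the intercept and the slope
confint(Employ_model, level = 0.95)
```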

Linear regression with multiple variables

Employ_model1 <- lm(Employed ~ GNP + Armed.Forces, data = train)
summary(Employ_model1)

The summary for Employ_model1 follows the same layout as the one above, with one additional coefficient row for Armed.Forces and 10 residual degrees of freedom (13 training rows minus 3 estimated coefficients).
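
One way to judge whether adding Armed.Forces improves on the GNP-only model is to compare the two nested fits. The sketch below uses anova() and the adjusted R-squared for this; it is a suggested check, not something run in the original session, and it refits both models so that it runs on its own. The name cmp is illustrative.

```r
# Refit both models on the same training split
data(longley)
set.seed(130)
pd <- sample(2, nrow(longley), replace = TRUE, prob = c(0.7, 0.3))
train <- longley[pd == 1, ]
Employ_model  <- lm(Employed ~ GNP, data = train)
Employ_model1 <- lm(Employed ~ GNP + Armed.Forces, data = train)

# F-test for the nested models: does Armed.Forces reduce the residual
# sum of squares by more than chance alone would explain?
cmp <- anova(Employ_model, Employ_model1)
print(cmp)

# Adjusted R-squared penalises the extra predictor, so it is the fairer
# of the two R-squared values for comparing these models
c(simple   = summary(Employ_model)$adj.r.squared,
  multiple = summary(Employ_model1)$adj.r.squared)
```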

Prediction

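With interval = "prediction", predict() returns three values for each row of the test set: the best-fit value (fit) and the lower and upper limits of the prediction interval (lwr and upr), at the default 95% level. If we do not give the third argument, i.e. predict(model, newdata = dataset), the output is the best-fit values only.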

predict(Employ_model1, newdata = test, interval = "prediction")
          fit      lwr      upr
1947 59.74355 58.21843 61.26867
1952 64.01026 62.45998 65.56055
1956 66.33905 64.97929 67.69881
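
To see how close these predictions are to the observed values, the fitted values can be compared with test$Employed. The following is a minimal sketch, assuming the same split and model as above (refitted here so that it runs on its own); the root mean squared error and the comparison with interval = "confidence" are additions for illustration, and the names pred, rmse, pred_int and conf_int are hypothetical.

```r
# Recreate the split and the two-predictor model
data(longley)
set.seed(130)
pd <- sample(2, nrow(longley), replace = TRUE, prob = c(0.7, 0.3))
train <- longley[pd == 1, ]
test  <- longley[pd == 2, ]
Employ_model1 <- lm(Employed ~ GNP + Armed.Forces, data = train)

# Without the interval argument, predict() returns the fitted values only
pred <- predict(Employ_model1, newdata = test)

# Observed and predicted values side by side
data.frame(actual = test$Employed, predicted = pred)

# Root mean squared error on the held-out rows
rmse <- sqrt(mean((test$Employed - pred)^2))
rmse

# Prediction intervals (for a new observation) are wider than
# confidence intervals (for the mean response) at the same level
pred_int <- predict(Employ_model1, newdata = test, interval = "prediction")
conf_int <- predict(Employ_model1, newdata = test, interval = "confidence")
(pred_int[, "upr"] - pred_int[, "lwr"]) > (conf_int[, "upr"] - conf_int[, "lwr"])
```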

 
