Linear regression is one of the fundamental models for data mining. The
model describes a linear relationship between a number of numerical
attributes $x = (x_1, x_2, \ldots, x_n)$ and a predicted variable $y$ in the form of
$$y = a^T x + b,$$
where $a \in \mathbb{R}^n$ and $b \in \mathbb{R}$ are parameters to be determined by training. The
training process takes a set of K training examples
$$(X, Y) = \{(x^1, y^1), (x^2, y^2), \ldots, (x^K, y^K)\},$$
where each $x^i \in \mathbb{R}^n$ is a vector of attributes. The parameters a and b are
determined by minimizing the mean squared error (MSE):
$$\mathrm{MSE} = \sum_{i=1}^{K} \left[ y^i - (a^T x^i + b) \right]^2.$$
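To make the objective concrete, the following is a minimal Python sketch that only evaluates the squared-error sum above for given parameters; it does not fit anything. The function name `squared_error` and the four toy examples are assumptions made for illustration and are not the assignment's data.

```python
def squared_error(a, b, X, Y):
    """Sum over all examples of [y^i - (a . x^i + b)]^2."""
    total = 0.0
    for x_i, y_i in zip(X, Y):
        # Inner product a . x^i plus the intercept b.
        prediction = sum(a_j * x_ij for a_j, x_ij in zip(a, x_i)) + b
        total += (y_i - prediction) ** 2
    return total


# Hypothetical toy data (NOT the assignment's data), generated from
# y = 2*x1 - x2 + 1 so that the best parameters are known in advance.
X = [(1.0, 0.0), (0.0, 1.0), (2.0, 1.0), (3.0, 2.0)]
Y = [3.0, 0.0, 4.0, 5.0]

print(squared_error((2.0, -1.0), 1.0, X, Y))  # parameters that generated the data
print(squared_error((1.0, 0.0), 0.0, X, Y))   # an arbitrary, worse guess
```

The objective is a quadratic function of a and b with no constraints, which is worth keeping in mind when selecting a solver.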
Build a linear regression model for the following program effort data. Each
training sample consists of an index of social setting, an index of family
planning effort, and the percentage change in the crude birth rate (CBR)
between 1965 and 1975, for 20 countries in Latin America. Here, we want to
predict change ($y$) using setting ($x_1$) and effort ($x_2$). Therefore, we have
$n = 2$ and $K = 20$.
Write an AMPL model for the optimization problem, and submit it to NEOS
to obtain the optimal parameters a and b in the linear regression model. You
need to choose a suitable solver in NEOS. You cannot use any other existing
software for linear regression. Submit the following:
1) The AMPL model file (and data file, if any)
2) A printout of the solution from your NEOS solver.
3) A table listing the model error $y^i - (a^T x^i + b)$ for all 20 countries.
4) Discuss the insights you gained from this analysis, such as: How does
each attribute influence the change? Which attribute seems to have a stronger
correlation with the change? Does the linear regression model seem accurate
to you?
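Once the solver has returned a and b, the error table in item 3 can be tabulated with a few lines of Python. The sketch below is illustrative only: the helper name `residuals`, the country labels, the data rows, and the parameter values are all hypothetical placeholders, to be replaced by the actual data and the NEOS solution. It performs no regression itself.

```python
def residuals(a, b, X, Y):
    """Return the model error y^i - (a . x^i + b) for each example."""
    return [
        y_i - (sum(a_j * x_ij for a_j, x_ij in zip(a, x_i)) + b)
        for x_i, y_i in zip(X, Y)
    ]


# Hypothetical placeholder rows and parameters, for illustration only.
labels = ["Country A", "Country B", "Country C"]
X = [(50.0, 10.0), (70.0, 5.0), (90.0, 20.0)]  # (setting, effort)
Y = [12.0, 21.0, 45.0]                         # change
a, b = (0.5, 1.0), -20.0                       # stand-ins for the solver output

errors = residuals(a, b, X, Y)
print(f"{'Country':<12}{'y':>8}{'predicted':>12}{'error':>10}")
for label, y_i, e_i in zip(labels, Y, errors):
    # The prediction is recovered as y minus the error.
    print(f"{label:<12}{y_i:>8.2f}{y_i - e_i:>12.2f}{e_i:>10.2f}")
```

Errors that are large relative to the spread of y, or that share a sign across similar countries, are one informal indication that a linear model fits poorly.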

