In this assignment, we will perform several linear regression analyses on the
Boston dataset. Load the dataset as follows (requires installing the scikit-learn
library) and read the description:
    from sklearn import datasets
    boston = datasets.load_boston()
    print(boston.DESCR)

(Note: load_boston was removed in scikit-learn 1.2; if you are on a newer release, install an older version or load the data manually.)

The dataset contains 506 observations of 13 features, with the target value being the median value of homes. Load the observations and targets into separate numpy arrays:

    data = boston.data
    target = boston.target

We will use the first 450 observations as training data and the remaining 56 as testing data:

    X_train = data[:450]
    y_train = target[:450]
    X_test = data[450:]
    y_test = target[450:]
Note: Please use torch to complete the problems in this assignment.
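Since the problems are to be solved with torch, a first step is moving the numpy arrays into float32 tensors. A minimal sketch, using random placeholder arrays in place of boston.data and boston.target (load_boston is unavailable in recent scikit-learn releases):

```python
import numpy as np
import torch

# Placeholder arrays standing in for boston.data / boston.target
# (506 observations, 13 features).
rng = np.random.default_rng(0)
data = rng.normal(size=(506, 13))
target = rng.normal(size=506)

# torch defaults to float32; convert once up front.
X_train = torch.tensor(data[:450], dtype=torch.float32)
y_train = torch.tensor(target[:450], dtype=torch.float32)
X_test = torch.tensor(data[450:], dtype=torch.float32)
y_test = torch.tensor(target[450:], dtype=torch.float32)

print(X_train.shape, X_test.shape)  # torch.Size([450, 13]) torch.Size([56, 13])
```

The same conversion applies unchanged once the real Boston arrays are loaded.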
(10 points) Explore some of the relationships between the features of the data.
Which features appear to have the strongest relationship with the target? Which
features have the weakest relationship? Use a few plots to describe the data and
support your answers.
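One way to screen for strong and weak feature–target relationships before plotting is to rank features by their Pearson correlation with the target. A sketch on synthetic stand-in data (column 0 is constructed to be strongly related to the target, so it should rank first):

```python
import numpy as np

# Synthetic stand-in for the Boston data: 13 features, where column 0
# strongly drives the target and most columns are pure noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(506, 13))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=506)

# Pearson correlation of each feature with the target; features with the
# largest |r| are the strongest candidates to highlight in scatter plots.
corrs = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
order = np.argsort(-np.abs(corrs))
print("strongest feature:", order[0], "weakest feature:", order[-1])
```

On the real dataset the same ranking suggests which feature/target scatter plots to include in the write-up.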
(20 points) Perform a multivariate linear regression on the Boston dataset without regularization. Report the coefficients of your trained model. Report the following testing error metrics: RMSE, MAPE, MAE, MBE, and R². Use plots to show how your model performs.
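A minimal torch sketch of the unregularized fit and the five metrics, on synthetic data with the same shapes as the Boston split (450 train / 56 test, 13 features); the bias is learned by appending a column of ones:

```python
import torch

torch.manual_seed(0)

# Synthetic stand-in with known true weights so the fit can be sanity-checked.
w_true = torch.randn(13)
X_train = torch.randn(450, 13)
y_train = X_train @ w_true + 2.0 + 0.01 * torch.randn(450)
X_test = torch.randn(56, 13)
y_test = X_test @ w_true + 2.0 + 0.01 * torch.randn(56)

# Append a ones column so the bias is learned as an extra coefficient,
# then solve the least-squares problem directly.
A = torch.cat([X_train, torch.ones(450, 1)], dim=1)
coef = torch.linalg.lstsq(A, y_train.unsqueeze(1)).solution.squeeze(1)

A_test = torch.cat([X_test, torch.ones(56, 1)], dim=1)
err = A_test @ coef - y_test

rmse = torch.sqrt(torch.mean(err ** 2))
mae = torch.mean(torch.abs(err))
mbe = torch.mean(err)                        # mean bias error (signed)
mape = torch.mean(torch.abs(err / y_test))   # unstable if y_test has values near 0
r2 = 1 - torch.sum(err ** 2) / torch.sum((y_test - y_test.mean()) ** 2)
print(f"RMSE={rmse:.4f} MAE={mae:.4f} MBE={mbe:+.4f} R2={r2:.4f}")
```

A predicted-vs-actual scatter plot of `A_test @ coef` against `y_test` is a natural accompanying figure.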
(20 points) Perform a linear regression on the Boston dataset with l2-norm
regularization (i.e., ridge regression). Report the results as before.
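Ridge regression has a closed form, w = (XᵀX + λI)⁻¹Xᵀy, which is easy to express in torch. A sketch (λ is a hyperparameter to tune; the intercept, normally excluded from the penalty, is omitted here for brevity):

```python
import torch

torch.manual_seed(0)
X = torch.randn(450, 13)
w_true = torch.randn(13)
y = X @ w_true + 0.1 * torch.randn(450)

lam = 1.0  # regularization strength (illustrative value)

# Ridge closed form: solve (X^T X + lam * I) w = X^T y.
I = torch.eye(X.shape[1])
w_ridge = torch.linalg.solve(X.T @ X + lam * I, X.T @ y)

# Compare with the unregularized solution: ridge shrinks the coefficients.
w_ols = torch.linalg.lstsq(X, y.unsqueeze(1)).solution.squeeze(1)
print(torch.linalg.norm(w_ridge).item(), torch.linalg.norm(w_ols).item())
```

The same testing metrics as in Problem 2 apply to the ridge predictions.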
(20 points) Perform a linear regression on the Boston dataset with l1-norm regularization (i.e., lasso). Report the results as before. Compare the performances of the three models from Problems 2, 3, and 4 and comment on the results.
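Unlike ridge, lasso has no closed form, but torch's autograd handles the non-smooth |w| term via a subgradient, so plain (sub)gradient descent works as a sketch. On synthetic data with a sparse ground truth, the l1 penalty should drive the irrelevant coefficients toward zero:

```python
import torch

torch.manual_seed(0)
X = torch.randn(450, 13)
# Sparse ground truth: only the first 3 features matter.
w_true = torch.zeros(13)
w_true[:3] = torch.tensor([3.0, -2.0, 1.5])
y = X @ w_true + 0.1 * torch.randn(450)

lam = 0.1  # regularization strength (illustrative value)
w = torch.zeros(13, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.01)

# Minimize MSE + lam * ||w||_1 by full-batch (sub)gradient descent.
for _ in range(2000):
    opt.zero_grad()
    loss = torch.mean((X @ w - y) ** 2) + lam * torch.sum(torch.abs(w))
    loss.backward()
    opt.step()

print("near-zero coefficients:", int((w.detach().abs() < 0.2).sum()))
```

Comparing the fitted `w` against the ridge and OLS coefficients illustrates the qualitative difference: lasso produces (near-)sparse solutions, ridge only shrinks.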
(30 points) Perform a non-regularized linear regression on the Boston dataset using 5-fold cross validation. Report the results as before. Does the non-regularized model perform better with or without cross-validation? Does this agree with your expectations? Note: If the size of the dataset is not evenly divisible by the number of folds k, you may need to either (1) choose a different value for k or (2) exclude some observations from the dataset in order to use np.split() as we discussed in class.
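A sketch of the 5-fold loop using np.split for the index blocks. Since 506 is not divisible by 5, this example follows option (2) and drops one observation so n = 505 splits evenly (the data here is synthetic, standing in for the Boston arrays):

```python
import numpy as np
import torch

torch.manual_seed(0)
n, k = 505, 5  # 506 is not divisible by 5; one observation dropped
X = torch.randn(n, 13)
w_true = torch.randn(13)
y = X @ w_true + 0.1 * torch.randn(n)

# k equal index blocks of size n // k; np.split requires an even division.
folds = np.split(np.arange(n), k)

rmses = []
for i in range(k):
    test_idx = torch.tensor(folds[i])
    train_idx = torch.tensor(np.concatenate([folds[j] for j in range(k) if j != i]))
    # Unregularized least squares on the k-1 training folds (bias via ones column).
    A = torch.cat([X[train_idx], torch.ones(len(train_idx), 1)], dim=1)
    coef = torch.linalg.lstsq(A, y[train_idx].unsqueeze(1)).solution.squeeze(1)
    # Evaluate on the held-out fold.
    A_te = torch.cat([X[test_idx], torch.ones(len(test_idx), 1)], dim=1)
    err = A_te @ coef - y[test_idx]
    rmses.append(torch.sqrt(torch.mean(err ** 2)).item())

print("per-fold RMSE:", [round(r, 3) for r in rmses])
print("mean RMSE:", round(sum(rmses) / k, 3))
```

Averaging the per-fold metrics gives the cross-validated estimate to compare against the single train/test split of Problem 2.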