2021-09-27 21:40


In this assignment, we will perform several linear regression analyses on the
Boston dataset. Load the dataset as follows (requires installing the scikit-learn
library) and read the description:

from sklearn import datasets
# load_boston was removed in scikit-learn 1.2, so this call needs an older version
boston = datasets.load_boston()
print(boston.DESCR)
The dataset contains 506 observations of 13 features, with the target value being
the median value of homes. Load the observations and targets into separate
numpy arrays:
data = boston.data
target = boston.target
We will use the first 450 observations as training data and the remaining 56 as
testing data:
X_train = data[:450]
y_train = target[:450]
X_test = data[450:]
y_test = target[450:]
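Since the assignment asks for torch, the numpy arrays will need converting to tensors at some point. The following is a minimal sketch of that step, not part of the original assignment. It uses a random stand-in array with the same shape as the Boston data (506 x 13) so that it runs without the dataset, and the helper name `to_tensors` is hypothetical.

```python
import numpy as np
import torch

# Synthetic stand-in with the same shape as the Boston data (506 x 13).
# Assumption: swap in boston.data / boston.target when the dataset is available.
rng = np.random.default_rng(0)
data = rng.normal(size=(506, 13))
target = rng.normal(size=506)


def to_tensors(data, target, n_train=450):
    """Convert numpy arrays to float32 tensors and split by position."""
    X = torch.tensor(data, dtype=torch.float32)
    y = torch.tensor(target, dtype=torch.float32)
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]


X_train, y_train, X_test, y_test = to_tensors(data, target)
print(X_train.shape, X_test.shape)
```

The split is positional, matching the assignment's "first 450 / remaining 56" rule, so no shuffling is applied here.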

Note: Please use torch to complete the problems in this assignment.

Problem 1
(10 points) Explore some of the relationships between the features of the data.
Which features appear to have the strongest relationship with the target? Which
features have the weakest relationship? Use a few plots to describe the data and
these relationships.
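One possible starting point for Problem 1, alongside the plots, is the Pearson correlation between each feature and the target. The sketch below is only an illustration: the function name is hypothetical, the data is synthetic, and correlation captures linear relationships only, so a low value does not rule out a nonlinear dependence.

```python
import torch


def feature_target_correlation(X, y):
    """Pearson correlation of every column of X with y (returns one value per feature)."""
    Xc = X - X.mean(dim=0)          # center each feature
    yc = y - y.mean()               # center the target
    cov = (Xc * yc.unsqueeze(1)).sum(dim=0)
    denom = Xc.norm(dim=0) * yc.norm()
    return cov / denom


# Synthetic demonstration: only the first feature drives the target.
torch.manual_seed(0)
X_demo = torch.randn(200, 3)
y_demo = 3.0 * X_demo[:, 0] + 0.1 * torch.randn(200)
corr = feature_target_correlation(X_demo, y_demo)
print(corr)
```

Sorting the features by the absolute value of this correlation gives a rough ranking that scatter plots of each feature against the target can then confirm or contradict.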

Problem 2
(20 points) Perform a multivariate linear regression on the Boston dataset without regularization. Report the coefficients of your trained model. Report the
following testing error metrics: RMSE, MAPE, MAE, MBE, and R². Use plots to
show how your model performs.
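A minimal sketch of how Problem 2 could be approached in torch: a least-squares fit with an intercept via `torch.linalg.lstsq`, plus the five requested metrics. The function names are hypothetical, and two conventions are assumptions rather than something the assignment specifies: MBE is taken as mean(prediction - truth), and MAPE is reported in percent.

```python
import torch


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (weights, bias)."""
    ones = torch.ones(X.shape[0], 1, dtype=X.dtype)
    A = torch.cat([X, ones], dim=1)                      # append a bias column
    theta = torch.linalg.lstsq(A, y.unsqueeze(1)).solution.squeeze(1)
    return theta[:-1], theta[-1]


def regression_metrics(y_true, y_pred):
    """RMSE, MAE, MAPE (percent), MBE (pred - true), and R^2 as Python floats."""
    err = y_pred - y_true
    ss_res = (err ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return {
        "RMSE": torch.sqrt((err ** 2).mean()).item(),
        "MAE": err.abs().mean().item(),
        "MAPE": (100 * (err / y_true).abs().mean()).item(),
        "MBE": err.mean().item(),
        "R2": (1 - ss_res / ss_tot).item(),
    }


# Synthetic, noise-free demonstration (not the Boston data).
torch.manual_seed(0)
X_demo = torch.randn(100, 3, dtype=torch.float64)
w_true = torch.tensor([1.0, -2.0, 0.5], dtype=torch.float64)
y_demo = X_demo @ w_true + 4.0
w_hat, b_hat = fit_ols(X_demo, y_demo)
print(w_hat, b_hat, regression_metrics(y_demo, X_demo @ w_hat + b_hat))
```

For the assignment itself, the fit would use `X_train`, `y_train` and the metrics would be computed on `X_test`, `y_test`. Note that MAPE is undefined when a true value is exactly zero.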

Problem 3
(20 points) Perform a linear regression on the Boston dataset with l2-norm
regularization (i.e., ridge regression). Report the results as before.
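For ridge regression, one option is the closed-form solution w = (XᵀX + λI)⁻¹Xᵀy on centered data, which leaves the intercept unpenalized. The sketch below assumes that formulation; the function name and the regularization strength `lam` are hypothetical, and the assignment does not say which value of λ to use.

```python
import torch


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression; the intercept is not penalized."""
    x_mean, y_mean = X.mean(dim=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean                      # center so the bias drops out
    eye = torch.eye(X.shape[1], dtype=X.dtype)
    rhs = (Xc.T @ yc).unsqueeze(1)
    w = torch.linalg.solve(Xc.T @ Xc + lam * eye, rhs).squeeze(1)
    b = y_mean - x_mean @ w                              # recover the intercept
    return w, b


# Synthetic, noise-free demonstration (not the Boston data).
torch.manual_seed(0)
X_demo = torch.randn(100, 3, dtype=torch.float64)
w_true = torch.tensor([1.0, -2.0, 0.5], dtype=torch.float64)
y_demo = X_demo @ w_true + 4.0
w_zero, b_zero = fit_ridge(X_demo, y_demo, lam=0.0)      # reduces to ordinary least squares
w_big, b_big = fit_ridge(X_demo, y_demo, lam=1000.0)     # strongly shrunk coefficients
print(w_zero, w_big)
```

Because the penalty depends on the scale of each feature, standardizing the features before fitting is a common choice when the columns have very different units, as they do in the Boston data.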

Problem 4
(20 points) Perform a linear regression on the Boston dataset with l1-norm regularization (i.e., lasso). Report the results as before. Compare the performances
of the three models from Problems 2, 3, and 4 and comment on the results.
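The lasso has no closed-form solution, so an iterative method is needed. The sketch below uses proximal gradient descent (ISTA) with soft-thresholding, which is one of several valid choices and not necessarily the method intended by the course. It minimizes (1/2n)‖Xw − y‖² + λ‖w‖₁ on centered data; the function names, `lam`, and `n_iter` are hypothetical.

```python
import torch


def soft_threshold(w, t):
    """Proximal operator of t * ||w||_1: shrink toward zero, clipping at zero."""
    return torch.sign(w) * torch.clamp(w.abs() - t, min=0.0)


def fit_lasso_ista(X, y, lam=0.1, n_iter=1000):
    """Lasso via ISTA on centered data; the intercept is not penalized."""
    n = X.shape[0]
    x_mean, y_mean = X.mean(dim=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Step size 1/L, where L is the largest eigenvalue of Xc^T Xc / n.
    step = 1.0 / torch.linalg.eigvalsh(Xc.T @ Xc / n).max()
    w = torch.zeros(X.shape[1], dtype=X.dtype)
    for _ in range(n_iter):
        grad = Xc.T @ (Xc @ w - yc) / n                  # gradient of the squared-error term
        w = soft_threshold(w - step * grad, step * lam)  # gradient step, then shrink
    b = y_mean - x_mean @ w
    return w, b


# Synthetic demonstration: the last two features are irrelevant.
torch.manual_seed(0)
X_demo = torch.randn(200, 4, dtype=torch.float64)
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + 1.0
w_lasso, b_lasso = fit_lasso_ista(X_demo, y_demo, lam=0.1)
print(w_lasso, b_lasso)
```

Unlike ridge, the l1 penalty can set coefficients exactly to zero, which is worth looking for when comparing the three models' coefficients.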

Problem 5

(30 points) Perform a non-regularized linear regression on the Boston dataset
using 5-fold cross validation. Report the results as before. Does the non-regularized model perform better with or without cross-validation? Does this
agree with your expectations?
Note: If the size of the dataset is not evenly divisible by the number of folds
k, you may need to either (1) choose a different value for k or (2) exclude some
observations from the dataset in order to use np.split() as we discussed in
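As a rough illustration of option (2), the sketch below drops the trailing observations so that the remaining count divides evenly by k, then uses `np.split()` to form contiguous folds. With 506 observations and k = 5, one observation is dropped, leaving 5 folds of 101. Which observations to exclude is an assumption here, the function names are hypothetical, and the demonstration data is synthetic.

```python
import numpy as np
import torch


def kfold_indices(n, k=5):
    """Split indices 0..n-1 into k equal contiguous folds, dropping the remainder."""
    usable = n - (n % k)                                 # e.g. 506 -> 505 for k = 5
    return np.split(np.arange(usable), k)


def cross_validate_ols(X, y, k=5):
    """Return the test RMSE of an unregularized least-squares fit on each fold."""
    folds = kfold_indices(X.shape[0], k)
    rmses = []
    for i, test_fold in enumerate(folds):
        train_fold = np.concatenate([f for j, f in enumerate(folds) if j != i])
        tr = torch.as_tensor(train_fold, dtype=torch.long)
        te = torch.as_tensor(test_fold, dtype=torch.long)
        A_tr = torch.cat([X[tr], torch.ones(len(tr), 1, dtype=X.dtype)], dim=1)
        A_te = torch.cat([X[te], torch.ones(len(te), 1, dtype=X.dtype)], dim=1)
        theta = torch.linalg.lstsq(A_tr, y[tr].unsqueeze(1)).solution
        pred = (A_te @ theta).squeeze(1)
        rmses.append(torch.sqrt(((pred - y[te]) ** 2).mean()).item())
    return rmses


# Synthetic, noise-free demonstration with the Boston row count (not the real data).
torch.manual_seed(0)
X_demo = torch.randn(506, 3, dtype=torch.float64)
y_demo = X_demo @ torch.tensor([1.0, -2.0, 0.5], dtype=torch.float64) + 4.0
fold_rmses = cross_validate_ols(X_demo, y_demo, k=5)
print(fold_rmses)
```

Averaging the per-fold metrics gives the cross-validated estimate. Contiguous folds inherit any ordering in the data, so shuffling the indices first may be worth considering if the rows are not in random order.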


1 answer

  • 小木木木木木mu 2021-09-28 11:12

