ML: LassoR & RidgeR: (9→1) regression prediction on the sklearn datasets diabetes dataset using the LassoR and RidgeR algorithms (with alpha tuning)
(9→1) regression prediction on the sklearn datasets diabetes dataset using LassoR and RidgeR (alpha tuning)
Design approach
Output results
.. _diabetes_dataset:
Diabetes dataset
----------------
Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n =
442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.
**Data Set Characteristics:**
:Number of Instances: 442
:Number of Attributes: First 10 columns are numeric predictive values
:Target: Column 11 is a quantitative measure of disease progression one year after baseline
:Attribute Information:
- age age in years
- sex
- bmi body mass index
- bp average blood pressure
- s1 tc, total serum cholesterol
- s2 ldl, low-density lipoproteins
- s3 hdl, high-density lipoproteins
- s4 tch, total cholesterol / HDL
- s5 ltg, possibly log of serum triglycerides level
- s6 glu, blood sugar level
Note: Each of these 10 feature variables has been mean centered and scaled by the standard deviation times the square root of `n_samples` (i.e. the sum of squares of each column totals 1).
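The parenthetical in the note (every column's sum of squares totals 1) pins down the scale factor: after centering, dividing by the standard deviation times √n_samples gives exactly that. A small NumPy sketch on a hypothetical random matrix (a stand-in, not the actual diabetes data) checks the arithmetic:

```python
import numpy as np

# Hypothetical raw feature matrix standing in for the unscaled diabetes features
rng = np.random.RandomState(0)
raw = rng.normal(loc=50.0, scale=12.0, size=(442, 10))

n_samples = raw.shape[0]
centered = raw - raw.mean(axis=0)
# np.std uses ddof=0 by default (population standard deviation),
# so sum(centered**2) == n_samples * std**2 for every column
scaled = centered / (raw.std(axis=0) * np.sqrt(n_samples))

col_sum_sq = (scaled ** 2).sum(axis=0)
print(np.round(col_sum_sq, 6))
```

Dividing by the standard deviation times n_samples instead would leave each column with a sum of squares of 1/n_samples rather than 1.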
Source URL:
https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html
For more information see:
Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004) "Least Angle Regression," Annals of Statistics (with discussion), 407-499.
(https://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf)
age sex bmi bp ... s4 s5 s6 target
0 0.038076 0.050680 0.061696 0.021872 ... -0.002592 0.019908 -0.017646 151.0
1 -0.001882 -0.044642 -0.051474 -0.026328 ... -0.039493 -0.068330 -0.092204 75.0
2 0.085299 0.050680 0.044451 -0.005671 ... -0.002592 0.002864 -0.025930 141.0
3 -0.089063 -0.044642 -0.011595 -0.036656 ... 0.034309 0.022692 -0.009362 206.0
4 0.005383 -0.044642 -0.036385 0.021872 ... -0.002592 -0.031991 -0.046641 135.0
[5 rows x 11 columns]
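The script that printed this table is not shown. A minimal sketch that yields a frame of the same shape, assuming scikit-learn 0.23 or later (for `as_frame=True`) with pandas installed:

```python
from sklearn.datasets import load_diabetes

# as_frame=True returns a Bunch whose .frame holds the 10 features plus 'target'
diabetes = load_diabetes(as_frame=True)
df = diabetes.frame

print(df.head())   # first five rows, as in the output above
print(df.shape)
```

Whether pandas abbreviates the middle columns with `...` depends on its display width settings.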
alphas: 50 [1.00000000e-03 1.16779862e-03 1.36375363e-03 1.59258961e-03
1.85982395e-03 2.17189985e-03 2.53634166e-03 2.96193630e-03
3.45894513e-03 4.03935136e-03 4.71714896e-03 5.50868007e-03
6.43302900e-03 7.51248241e-03 8.77306662e-03 1.02451751e-02
1.19643014e-02 1.39718947e-02 1.63163594e-02 1.90542221e-02
2.22514943e-02 2.59852645e-02 3.03455561e-02 3.54374986e-02
4.13838621e-02 4.83280172e-02 5.64373920e-02 6.59075087e-02
7.69666979e-02 8.98816039e-02 1.04963613e-01 1.22576363e-01
1.43144508e-01 1.67163960e-01 1.95213842e-01 2.27970456e-01
2.66223585e-01 3.10895536e-01 3.63063379e-01 4.23984915e-01
4.95129000e-01 5.78210965e-01 6.75233969e-01 7.88537299e-01
9.20852773e-01 1.07537060e+00 1.25581631e+00 1.46654056e+00
1.71262404e+00 2.00000000e+00]
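The printed `alphas` look like a 50-point logarithmic grid from 1e-3 to 2: consecutive entries share a constant ratio. The source does not show how the grid was built; `np.logspace` with `log10(2)` as the upper exponent reproduces the printed values, so that is assumed here:

```python
import numpy as np

# 50 log-spaced values from 10**-3 up to 10**log10(2) == 2
alphas = np.logspace(-3, np.log10(2), 50)
print('alphas:', len(alphas), alphas)

# A log-spaced grid has a constant ratio between neighbours
ratios = alphas[1:] / alphas[:-1]
print(ratios[0])  # about 1.1678, i.e. 1.16779862e-03 / 1.0e-03
```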
{'alpha': 0.07696669794067007}
0.472 (+/-0.177) for {'alpha': 0.0010000000000000002}
0.472 (+/-0.177) for {'alpha': 0.0011677986237376523}
0.472 (+/-0.177) for {'alpha': 0.0013637536256035543}
0.472 (+/-0.177) for {'alpha': 0.0015925896070970653}
0.472 (+/-0.177) for {'alpha': 0.0018598239513468405}
0.472 (+/-0.176) for {'alpha': 0.002171899850777162}
0.472 (+/-0.176) for {'alpha': 0.0025363416566335814}
0.472 (+/-0.176) for {'alpha': 0.002961936295945173}
0.472 (+/-0.176) for {'alpha': 0.0034589451300033745}
0.472 (+/-0.176) for {'alpha': 0.004039351362401994}
0.472 (+/-0.176) for {'alpha': 0.004717148961805858}
0.472 (+/-0.176) for {'alpha': 0.0055086800655623795}
0.472 (+/-0.176) for {'alpha': 0.00643302899917478}
0.472 (+/-0.176) for {'alpha': 0.007512482411700719}
0.471 (+/-0.175) for {'alpha': 0.008773066621237415}
0.471 (+/-0.174) for {'alpha': 0.010245175126239786}
0.471 (+/-0.174) for {'alpha': 0.011964301412374057}
0.472 (+/-0.173) for {'alpha': 0.013971894723352857}
0.472 (+/-0.173) for {'alpha': 0.016316359428938842}
0.473 (+/-0.172) for {'alpha': 0.01905422208552364}
0.473 (+/-0.171) for {'alpha': 0.02225149432786609}
0.474 (+/-0.170) for {'alpha': 0.025985264452188187}
0.474 (+/-0.169) for {'alpha': 0.03034555606472431}
0.475 (+/-0.167) for {'alpha': 0.03543749860893881}
0.476 (+/-0.166) for {'alpha': 0.04138386210422369}
0.477 (+/-0.164) for {'alpha': 0.04832801721026122}
0.477 (+/-0.162) for {'alpha': 0.05643739198611263}
0.478 (+/-0.160) for {'alpha': 0.06590750868872472}
0.478 (+/-0.157) for {'alpha': 0.07696669794067007}
0.477 (+/-0.154) for {'alpha': 0.08988160392874607}
0.476 (+/-0.151) for {'alpha': 0.10496361336732249}
0.475 (+/-0.148) for {'alpha': 0.12257636323289021}
0.472 (+/-0.145) for {'alpha': 0.1431445082861357}
0.469 (+/-0.143) for {'alpha': 0.1671639597721522}
0.466 (+/-0.139) for {'alpha': 0.19521384216045554}
0.462 (+/-0.134) for {'alpha': 0.22797045620951942}
0.455 (+/-0.129) for {'alpha': 0.2662235850143214}
0.447 (+/-0.126) for {'alpha': 0.31089553618622834}
0.441 (+/-0.122) for {'alpha': 0.3630633792844568}
0.434 (+/-0.117) for {'alpha': 0.42398491465793015}
0.425 (+/-0.112) for {'alpha': 0.49512899982305664}
0.412 (+/-0.107) for {'alpha': 0.5782109645659657}
0.395 (+/-0.105) for {'alpha': 0.675233968650155}
0.374 (+/-0.103) for {'alpha': 0.7885372992905638}
0.348 (+/-0.102) for {'alpha': 0.9208527728773261}
0.314 (+/-0.095) for {'alpha': 1.075370600831142}
0.272 (+/-0.089) for {'alpha': 1.2558163076585396}
0.212 (+/-0.084) for {'alpha': 1.466540555750942}
0.129 (+/-0.087) for {'alpha': 1.7126240426614014}
0.018 (+/-0.098) for {'alpha': 2.0}
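Each line reports a mean cross-validated score with a ± spread for one alpha, and the best setting ({'alpha': 0.0769...}, mean 0.478) is printed first. A grid search of the following kind produces that layout. It is a sketch, not the author's code: the estimator (Lasso), `cv=5`, the full 442-sample data and the "± two standard deviations" convention (used in scikit-learn's own grid-search examples) are all assumptions, so the numbers need not match the log:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

X, y = load_diabetes(return_X_y=True)
alphas = np.logspace(-3, np.log10(2), 50)

# Assumed setup: Lasso scored with the default regressor score (R^2), 5-fold CV;
# max_iter is raised so the smallest alphas converge
search = GridSearchCV(Lasso(max_iter=10000), {'alpha': alphas}, cv=5)
search.fit(X, y)
print(search.best_params_)

means = search.cv_results_['mean_test_score']
stds = search.cv_results_['std_test_score']
for mean, std, params in zip(means, stds, search.cv_results_['params']):
    # mean score (+/- two standard deviations across folds) for each alpha
    print("%0.3f (+/-%0.03f) for %r" % (mean, std * 2, params))
```

The scores fall off sharply for large alpha because the L1 penalty drives more and more coefficients to exactly zero.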
m_log_alphas: 100 [-0.77418297 -0.70440767 -0.63463236 -0.56485706 -0.49508175 -0.42530644
-0.35553114 -0.28575583 -0.21598053 -0.14620522 -0.07642991 -0.00665461
0.0631207 0.132896 0.20267131 0.27244662 0.34222192 0.41199723
0.48177253 0.55154784 0.62132314 0.69109845 0.76087376 0.83064906
0.90042437 0.97019967 1.03997498 1.10975029 1.17952559 1.2493009
1.3190762 1.38885151 1.45862681 1.52840212 1.59817743 1.66795273
1.73772804 1.80750334 1.87727865 1.94705396 2.01682926 2.08660457
2.15637987 2.22615518 2.29593048 2.36570579 2.4354811 2.5052564
2.57503171 2.64480701 2.71458232 2.78435763 2.85413293 2.92390824
2.99368354 3.06345885 3.13323415 3.20300946 3.27278477 3.34256007
3.41233538 3.48211068 3.55188599 3.6216613 3.6914366 3.76121191
3.83098721 3.90076252 3.97053783 4.04031313 4.11008844 4.17986374
4.24963905 4.31941435 4.38918966 4.45896497 4.52874027 4.59851558
4.66829088 4.73806619 4.8078415 4.8776168 4.94739211 5.01716741
5.08694272 5.15671802 5.22649333 5.29626864 5.36604394 5.43581925
5.50559455 5.57536986 5.64514517 5.71492047 5.78469578 5.85447108
5.92424639 5.99402169 6.063797 6.13357231]
Alpha selected by cross-validation: 0.06176875494949271
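The 100 `m_log_alphas` values are evenly spaced and span ln(1000) ≈ 6.9078, which matches the negative natural log of `LassoCV`'s default alpha path (100 alphas, `eps=1e-3`). A sketch assuming default `LassoCV` settings on the full data with `cv=5` (the original call is not shown):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV

X, y = load_diabetes(return_X_y=True)

# Default path: 100 alphas from alpha_max down to alpha_max * eps (eps=1e-3)
lasso_cv = LassoCV(cv=5).fit(X, y)

# Negative natural log of the (descending) alpha path, so the result ascends
m_log_alphas = -np.log(lasso_cv.alphas_)
print('m_log_alphas:', len(m_log_alphas), m_log_alphas)
print('Alpha selected by cross-validation:', lasso_cv.alpha_)
```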
(100, 10) (100,)
alphas: 50 [1.00000000e-04 1.22398508e-04 1.49813947e-04 1.83370035e-04
2.24442186e-04 2.74713886e-04 3.36245696e-04 4.11559714e-04
5.03742947e-04 6.16573850e-04 7.54677190e-04 9.23713617e-04
1.13061168e-03 1.38385182e-03 1.69381398e-03 2.07320303e-03
2.53756957e-03 3.10594728e-03 3.80163312e-03 4.65314220e-03
5.69537661e-03 6.97105597e-03 8.53246847e-03 1.04436141e-02
1.27828277e-02 1.56459904e-02 1.91504587e-02 2.34398757e-02
2.86900580e-02 3.51162028e-02 4.29817081e-02 5.26089693e-02
6.43925932e-02 7.88155731e-02 9.64690852e-02 1.18076721e-01
1.44524144e-01 1.76895395e-01 2.16517323e-01 2.65013972e-01
3.24373147e-01 3.97027891e-01 4.85956213e-01 5.94803152e-01
7.28030181e-01 8.91098077e-01 1.09069075e+00 1.33498920e+00
1.63400685e+00 2.00000000e+00]
m_log_alphas: 50 [ 9.21034037 9.00822838 8.80611639 8.6040044 8.40189241 8.19978042
7.99766843 7.79555644 7.59344445 7.39133245 7.18922046 6.98710847
6.78499648 6.58288449 6.3807725 6.17866051 5.97654852 5.77443653
5.57232454 5.37021255 5.16810055 4.96598856 4.76387657 4.56176458
4.35965259 4.1575406 3.95542861 3.75331662 3.55120463 3.34909264
3.14698065 2.94486866 2.74275666 2.54064467 2.33853268 2.13642069
1.9343087 1.73219671 1.53008472 1.32797273 1.12586074 0.92374875
0.72163676 0.51952476 0.31741277 0.11530078 -0.08681121 -0.2889232
-0.49103519 -0.69314718]
Alpha selected by cross-validation: 0.0046531422008170295
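The second run uses 100 samples and a 50-point grid from 1e-4 to 2, and its `m_log_alphas` equal `-np.log(alphas)` (9.21034037 = -ln(1e-4), -0.69314718 = -ln 2). The log does not say which estimator picked 0.00465. Given the article's title, this sketch assumes `RidgeCV` with its default efficient leave-one-out scoring and the first 100 rows, so it will not necessarily reproduce that value:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV

X, y = load_diabetes(return_X_y=True)
# Assumption: the (100, 10) (100,) shapes come from keeping the first 100 rows
X, y = X[:100], y[:100]
print(X.shape, y.shape)

alphas = np.logspace(-4, np.log10(2), 50)
m_log_alphas = -np.log(alphas)
print('m_log_alphas:', len(m_log_alphas), m_log_alphas)

# Assumed estimator: RidgeCV with cv=None (efficient leave-one-out)
ridge_cv = RidgeCV(alphas=alphas).fit(X, y)
print('Alpha selected by cross-validation:', ridge_cv.alpha_)
```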
Core code
class Ridge (found at: sklearn.linear_model._ridge)
class Ridge(MultiOutputMixin, RegressorMixin, _BaseRidge):
    """Linear least squares with l2 regularization.

    Minimizes the objective function::

    ||y - Xw||^2_2 + alpha * ||w||^2_2

    This model solves a regression model where the loss function is
    the linear least squares function and regularization is given by
    the l2-norm. Also known as Ridge Regression or Tikhonov regularization.
    This estimator has built-in support for multi-variate regression
    (i.e., when y is a 2d-array of shape (n_samples, n_targets)).

    Read more in the :ref:`User Guide <ridge_regression>`.

    Parameters
    ----------
    alpha : {float, ndarray of shape (n_targets,)}, default=1.0
        Regularization strength; must be a positive float. Regularization
        improves the conditioning of the problem and reduces the variance of
        the estimates. Larger values specify stronger regularization.
        Alpha corresponds to ``1 / (2C)`` in other linear models such as
        :class:`~sklearn.linear_model.LogisticRegression` or
        :class:`sklearn.svm.LinearSVC`. If an array is passed, penalties are
        assumed to be specific to the targets. Hence they must correspond in
        number.

    fit_intercept : bool, default=True
        Whether to fit the intercept for this model. If set
        to false, no intercept will be used in calculations
        (i.e. ``X`` and ``y`` are expected to be centered).

    normalize : bool, default=False
        This parameter is ignored when ``fit_intercept`` is set to False.
        If True, the regressors X will be normalized before regression by
        subtracting the mean and dividing by the l2-norm.
        If you wish to standardize, please use
        :class:`sklearn.preprocessing.StandardScaler` before calling ``fit``
        on an estimator with ``normalize=False``.

    copy_X : bool, default=True
        If True, X will be copied; else, it may be overwritten.

    max_iter : int, default=None
        Maximum number of iterations for conjugate gradient solver.
        For 'sparse_cg' and 'lsqr' solvers, the default value is determined
        by scipy.sparse.linalg. For 'sag' solver, the default value is 1000.

    tol : float, default=1e-3
        Precision of the solution.

    solver : {'auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga'}, default='auto'
        Solver to use in the computational routines:

        - 'auto' chooses the solver automatically based on the type of data.

        - 'svd' uses a Singular Value Decomposition of X to compute the Ridge
          coefficients. More stable for singular matrices than 'cholesky'.

        - 'cholesky' uses the standard scipy.linalg.solve function to
          obtain a closed-form solution.

        - 'sparse_cg' uses the conjugate gradient solver as found in
          scipy.sparse.linalg.cg. As an iterative algorithm, this solver is
          more appropriate than 'cholesky' for large-scale data
          (possibility to set `tol` and `max_iter`).

        - 'lsqr' uses the dedicated regularized least-squares routine
          scipy.sparse.linalg.lsqr. It is the fastest and uses an iterative
          procedure.

        - 'sag' uses a Stochastic Average Gradient descent, and 'saga' uses
          its improved, unbiased version named SAGA. Both methods also use an
          iterative procedure, and are often faster than other solvers when
          both n_samples and n_features are large. Note that 'sag' and
          'saga' fast convergence is only guaranteed on features with
          approximately the same scale. You can preprocess the data with a
          scaler from sklearn.preprocessing.

        All last five solvers support both dense and sparse data. However, only
        'sag' and 'sparse_cg' supports sparse input when `fit_intercept` is
        True.

        .. versionadded:: 0.17
           Stochastic Average Gradient descent solver.
        .. versionadded:: 0.19
           SAGA solver.

    random_state : int, RandomState instance, default=None
        Used when ``solver`` == 'sag' or 'saga' to shuffle the data.
        See :term:`Glossary <random_state>` for details.

        .. versionadded:: 0.17
           `random_state` to support Stochastic Average Gradient.

    Attributes
    ----------
    coef_ : ndarray of shape (n_features,) or (n_targets, n_features)
        Weight vector(s).

    intercept_ : float or ndarray of shape (n_targets,)
        Independent term in decision function. Set to 0.0 if
        ``fit_intercept = False``.

    n_iter_ : None or ndarray of shape (n_targets,)
        Actual number of iterations for each target. Available only for
        sag and lsqr solvers. Other solvers will return None.

        .. versionadded:: 0.17

    See also
    --------
    RidgeClassifier : Ridge classifier
    RidgeCV : Ridge regression with built-in cross validation
    :class:`sklearn.kernel_ridge.KernelRidge` : Kernel ridge regression
        combines ridge regression with the kernel trick

    Examples
    --------
    >>> from sklearn.linear_model import Ridge
    >>> import numpy as np
    >>> n_samples, n_features = 10, 5
    >>> rng = np.random.RandomState(0)
    >>> y = rng.randn(n_samples)
    >>> X = rng.randn(n_samples, n_features)
    >>> clf = Ridge(alpha=1.0)
    >>> clf.fit(X, y)
    Ridge()
    """
    @_deprecate_positional_args
    def __init__(self, alpha=1.0, *, fit_intercept=True, normalize=False,
                 copy_X=True, max_iter=None, tol=1e-3, solver="auto",
                 random_state=None):
        super().__init__(alpha=alpha, fit_intercept=fit_intercept,
                         normalize=normalize, copy_X=copy_X, max_iter=max_iter, tol=tol,
                         solver=solver, random_state=random_state)

    def fit(self, X, y, sample_weight=None):
        """Fit Ridge regression model.

        Parameters
        ----------
        X : {ndarray, sparse matrix} of shape (n_samples, n_features)
            Training data

        y : ndarray of shape (n_samples,) or (n_samples, n_targets)
            Target values

        sample_weight : float or ndarray of shape (n_samples,), default=None
            Individual weights for each sample. If given a float, every sample
            will have the same weight.

        Returns
        -------
        self : returns an instance of self.
        """
        return super().fit(X, y, sample_weight=sample_weight)
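The objective in the docstring, ||y - Xw||^2_2 + alpha * ||w||^2_2, has a closed-form minimiser: setting its gradient -2Xᵀ(y - Xw) + 2αw to zero gives (XᵀX + αI)w = Xᵀy. A minimal NumPy sketch of that no-intercept case follows; the helper `ridge_closed_form` is hypothetical and not part of scikit-learn (with `fit_intercept=True`, `Ridge` centers X and y before solving):

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Hypothetical helper: minimise ||y - Xw||^2_2 + alpha * ||w||^2_2, no intercept.

    Solves the normal equations (X^T X + alpha I) w = X^T y.
    """
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Same toy data as the docstring example: y is drawn first, then X
rng = np.random.RandomState(0)
y = rng.randn(10)
X = rng.randn(10, 5)

w = ridge_closed_form(X, y, alpha=1.0)
# Gradient of the objective at w; it should vanish at the minimiser
grad = -2 * X.T @ (y - X @ w) + 2 * 1.0 * w
print(w)
print(np.abs(grad).max())
```

On this data, `Ridge(alpha=1.0, fit_intercept=False).fit(X, y).coef_` should agree with `w` up to numerical tolerance. Increasing alpha shrinks the norm of the solution, which is the variance-reduction effect the docstring describes.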