sklearn: A detailed guide to MinMaxScaler in sklearn.preprocessing (introduction and usage)
Introduction to MinMaxScaler

Explanation of the MinMaxScaler function

MinMaxScaler transforms features by scaling each feature to a given range. The class docstring (excerpt) reads:

    Transforms features by scaling each feature to a given range.

    This estimator scales and translates each feature individually such
    that it is in the given range on the training set, i.e. between
    zero and one.

    The transformation is given by::

        X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
        X_scaled = X_std * (max - min) + min

    where min, max = feature_range.

    This transformation is often used as an alternative to zero mean,
    unit variance scaling.

    Read more in the :ref:`User Guide <preprocessing_scaler>`.

    Parameters
    ----------
    feature_range : tuple (min, max), default=(0, 1)
        Desired range of transformed data.

    copy : boolean, optional, default True
        Set to False to perform inplace row normalization and avoid a
        copy (if the input is already a numpy array).

    Attributes
    ----------
    min_ : ndarray, shape (n_features,)
        Per feature adjustment for minimum.

    scale_ : ndarray, shape (n_features,)
        Per feature relative scaling of the data.

        .. versionadded:: 0.17
           *scale_* attribute.

    data_min_ : ndarray, shape (n_features,)
        Per feature minimum seen in the data

        .. versionadded:: 0.17
           *data_min_*

    data_max_ : ndarray, shape (n_features,)
        Per feature maximum seen in the data

        .. versionadded:: 0.17
           *data_max_*

    data_range_ : ndarray, shape (n_features,)
        Per feature range ``(data_max_ - data_min_)`` seen in the data

        .. versionadded:: 0.17
           *data_range_*

MinMaxScaler source code

class MinMaxScaler Found at: sklearn.preprocessing.data

    class MinMaxScaler(BaseEstimator, TransformerMixin):

        def __init__(self, feature_range=(0, 1), copy=True):
            self.feature_range = feature_range
            self.copy = copy

        def _reset(self):
            """Reset internal data-dependent state of the scaler, if necessary.

            __init__ parameters are not touched.
            """
            # Checking one attribute is enough, because they are all set
            # together in partial_fit
            if hasattr(self, 'scale_'):
                del self.scale_
                del self.min_
                del self.n_samples_seen_
                del self.data_min_
                del self.data_max_
                del self.data_range_

        def fit(self, X, y=None):
            """Compute the minimum and maximum to be used for later scaling.

            Parameters
            ----------
            X : array-like, shape [n_samples, n_features]
                The data used to compute the per-feature minimum and maximum
                used for later scaling along the features axis.
            """
            # Reset internal state before fitting
            self._reset()
            return self.partial_fit(X, y)

        def partial_fit(self, X, y=None):
            """Online computation of min and max on X for later scaling.

            All of X is processed as a single batch. This is intended for
            cases when `fit` is not feasible due to very large number of
            `n_samples` or because X is read from a continuous stream.

            Parameters
            ----------
            X : array-like, shape [n_samples, n_features]
                The data used to compute the per-feature minimum and maximum
                used for later scaling along the features axis.

            y : Passthrough for ``Pipeline`` compatibility.
            """
            feature_range = self.feature_range
            if feature_range[0] >= feature_range[1]:
                raise ValueError(
                    "Minimum of desired feature range must be smaller"
                    " than maximum. Got %s." % str(feature_range))

            if sparse.issparse(X):
                raise TypeError("MinMaxScaler does no support sparse input. "
                                "You may consider to use MaxAbsScaler instead.")

            X = check_array(X, copy=self.copy, warn_on_dtype=True,
                            estimator=self, dtype=FLOAT_DTYPES)

            data_min = np.min(X, axis=0)
            data_max = np.max(X, axis=0)

            # First pass
            if not hasattr(self, 'n_samples_seen_'):
                self.n_samples_seen_ = X.shape[0]
            # Next steps: merge with the extrema seen in earlier batches
            else:
                data_min = np.minimum(self.data_min_, data_min)
                data_max = np.maximum(self.data_max_, data_max)
                self.n_samples_seen_ += X.shape[0]

            data_range = data_max - data_min
            self.scale_ = ((feature_range[1] - feature_range[0]) /
                           _handle_zeros_in_scale(data_range))
            self.min_ = feature_range[0] - data_min * self.scale_
            self.data_min_ = data_min
            self.data_max_ = data_max
            self.data_range_ = data_range
            return self

        def transform(self, X):
            """Scaling features of X according to feature_range.

            Parameters
            ----------
            X : array-like, shape [n_samples, n_features]
                Input data that will be transformed.
            """
            check_is_fitted(self, 'scale_')

            X = check_array(X, copy=self.copy, dtype=FLOAT_DTYPES)

            X *= self.scale_
            X += self.min_
            return X

        def inverse_transform(self, X):
            """Undo the scaling of X according to feature_range.

            Parameters
            ----------
            X : array-like, shape [n_samples, n_features]
                Input data that will be transformed. It cannot be sparse.
            """
            check_is_fitted(self, 'scale_')

            X = check_array(X, copy=self.copy, dtype=FLOAT_DTYPES)

            X -= self.min_
            X /= self.scale_
            return X

Using MinMaxScaler

1. Basic example

    >>> from sklearn.preprocessing import MinMaxScaler
    >>>
    >>> data = [[-1, 2], [-0.5, 6], [0, 10], [1, 18]]
    >>> scaler = MinMaxScaler()
    >>> print(scaler.fit(data))
    MinMaxScaler(copy=True, feature_range=(0, 1))
    >>> print(scaler.data_max_)
    [  1.  18.]
    >>> print(scaler.transform(data))
    [[ 0.    0.  ]
     [ 0.25  0.25]
     [ 0.5   0.5 ]
     [ 1.    1.  ]]
    >>> print(scaler.transform([[2, 2]]))
    [[ 1.5  0. ]]