ML with LightGBM: Explaining Data Features on the Titanic Dataset with LightGBM and SHAP (Quantifying Each Feature's Contribution Score to the Model)

2024-05-20 07:55:00

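Related articles
ML with LightGBM: Explaining Data Features on the Titanic Dataset with LightGBM and SHAP (Quantifying Each Feature's Contribution Score to the Model)
ML with LightGBM: Explaining Data Features on the Titanic Dataset with LightGBM and SHAP (Quantifying Each Feature's Contribution Score to the Model): Implementation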

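Explaining Data Features on the Titanic Dataset with LightGBM and SHAP (Quantifying Each Feature's Contribution Score to the Model)

Design approach

To be updated...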
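
As background for the "contribution score" in the title: the score SHAP assigns to a feature is a Shapley value, that is, the feature's average marginal effect on the prediction over all subsets of the other features. The sketch below is not the author's code and uses neither LightGBM nor the Titanic data. It computes exact Shapley values by brute force for a hypothetical three-feature model; `toy_model`, `background`, and `sample` are made up for illustration, and replacing "missing" features with a fixed background value is one simple assumption among several possible ones.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    players = list(range(n_features))
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal effect of adding feature i to coalition s
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical toy model (an assumption for illustration, not a trained LightGBM model)
def toy_model(x):
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]


background = [0.0, 0.0, 0.0]  # made-up reference point for "missing" features
sample = [1.0, 1.0, 2.0]      # made-up sample to explain


def value_fn(known):
    # Features in `known` take the sample's value; the rest take the background value
    mixed = [sample[j] if j in known else background[j] for j in range(3)]
    return toy_model(mixed)


phi = shapley_values(value_fn, 3)
base_value = value_fn(frozenset())

# Additivity: base value plus all contributions equals the model output
print(phi, base_value + sum(phi), toy_model(sample))
```

The additivity check in the last line is the property that makes these scores usable as a per-feature decomposition of a single prediction. The brute-force loop is exponential in the number of features, so it is only practical for toy cases like this one.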

Output

Core code

# flake8: noqa

import warnings
import sys

__version__ = '0.37.0'

# check python version
if (sys.version_info < (3, 0)):
    warnings.warn("As of version 0.29.0 shap only supports Python 3 (not 2)!")

from ._explanation import Explanation, Cohorts

# explainers
from .explainers._explainer import Explainer
from .explainers._kernel import Kernel as KernelExplainer
from .explainers._sampling import Sampling as SamplingExplainer
from .explainers._tree import Tree as TreeExplainer
from .explainers._deep import Deep as DeepExplainer
from .explainers._gradient import Gradient as GradientExplainer
from .explainers._linear import Linear as LinearExplainer
from .explainers._partition import Partition as PartitionExplainer
from .explainers._permutation import Permutation as PermutationExplainer
from .explainers._additive import Additive as AdditiveExplainer
from .explainers import other

# plotting (only loaded if matplotlib is present)
def unsupported(*args, **kwargs):
    warnings.warn("matplotlib is not installed so plotting is not available! Run `pip install matplotlib` to fix this.")

try:
    import matplotlib
    have_matplotlib = True
except ImportError:
    have_matplotlib = False
if have_matplotlib:
    from .plots._beeswarm import summary_legacy as summary_plot
    from .plots._decision import decision as decision_plot, multioutput_decision as multioutput_decision_plot
    from .plots._scatter import dependence_legacy as dependence_plot
    from .plots._force import force as force_plot, initjs, save_html, getjs
    from .plots._image import image as image_plot
    from .plots._monitoring import monitoring as monitoring_plot
    from .plots._embedding import embedding as embedding_plot
    from .plots._partial_dependence import partial_dependence as partial_dependence_plot
    from .plots._bar import bar_legacy as bar_plot
    from .plots._waterfall import waterfall as waterfall_plot
    from .plots._group_difference import group_difference as group_difference_plot
    from .plots._text import text as text_plot
else:
    summary_plot = unsupported
    decision_plot = unsupported
    multioutput_decision_plot = unsupported
    dependence_plot = unsupported
    force_plot = unsupported
    initjs = unsupported
    save_html = unsupported
    image_plot = unsupported
    monitoring_plot = unsupported
    embedding_plot = unsupported
    partial_dependence_plot = unsupported
    bar_plot = unsupported
    waterfall_plot = unsupported
    text_plot = unsupported

# other stuff :)
from . import datasets
from . import utils
from . import links

#from . import benchmark

from .utils._legacy import kmeans
from .utils import sample, approximate_interactions

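# Truncated excerpt of summary_legacy (imported above from .plots._beeswarm as summary_plot).
# `labels` and `colors` are module-level names in the original source file and are not shown here.
# TODO: Add support for hclustering based explanations where we sort the leaf order by magnitude and then show the dendrogram to the left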
def summary_legacy(shap_values, features=None, feature_names=None, max_display=None, plot_type=None,
                 color=None, axis_color="#333333", title=None, alpha=1, show=True, sort=True,
                 color_bar=True, plot_size="auto", layered_violin_max_num_bins=20, class_names=None,
                 class_inds=None,
                 color_bar_label=labels["FEATURE_VALUE"],
                 cmap=colors.red_blue,
                 # deprecated
                 auto_size_plot=None,
                 use_log_scale=False):
    """Create a SHAP beeswarm plot, colored by feature values when they are provided.

    Parameters
    ----------
    shap_values : numpy.array
        For single output explanations this is a matrix of SHAP values (# samples x # features).
        For multi-output explanations this is a list of such matrices of SHAP values.

    features : numpy.array or pandas.DataFrame or list
        Matrix of feature values (# samples x # features) or a feature_names list as shorthand

    feature_names : list
        Names of the features (length # features)

    max_display : int
        How many top features to include in the plot (default is 20, or 7 for interaction plots)

    plot_type : "dot" (default for single output), "bar" (default for multi-output), "violin",
        or "compact_dot".
        What type of summary plot to produce. Note that "compact_dot" is only used for
        SHAP interaction values.

    plot_size : "auto" (default), float, (float, float), or None
        What size to make the plot. By default the size is auto-scaled based on the number of
        features that are being displayed. Passing a single float will cause each row to be that
        many inches high. Passing a pair of floats will scale the plot by that
        number of inches. If None is passed then the size of the current figure will be left
        unchanged.
    """

    # support passing an explanation object
    if str(type(shap_values)).endswith("Explanation'>"):
        shap_exp = shap_values
        base_value = shap_exp.base_value
        shap_values = shap_exp.values
        if features is None:
            features = shap_exp.data
        if feature_names is None:
            feature_names = shap_exp.feature_names
        # if out_names is None: # TODO: waiting for slicer support of this
        #     out_names = shap_exp.output_names

    # deprecation warnings
    if auto_size_plot is not None:
        warnings.warn("auto_size_plot=False is deprecated and is now ignored! Use plot_size=None instead.")

    multi_class = False
    if isinstance(shap_values, list):
        multi_class = True
        if plot_type is None:
            plot_type = "bar" # default for multi-output explanations
        assert plot_type == "bar", "Only plot_type = 'bar' is supported for multi-output explanations!"
    else:
        if plot_type is None:
            plot_type = "dot" # default for single output explanations
        assert len(shap_values.shape) != 1, "Summary plots need a matrix of shap_values, not a vector."

    # default color:
    if color is None:
        if plot_type == 'layered_violin':
            color = "coolwarm"
        elif multi_class:
            color = lambda i: colors.red_blue_circle(i/len(shap_values))
        else:
            color = colors.blue_rgb
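
The excerpt above stops before the plotting logic. A common way to turn a matrix of SHAP values (# samples x # features) into one global contribution score per feature is the mean absolute SHAP value, which can then be used to rank features. The sketch below illustrates that idea in plain Python under stated assumptions: `mean_abs_shap` is a hypothetical helper, not part of shap, and the numbers and feature names are made up for illustration rather than taken from the article's results.

```python
def mean_abs_shap(shap_matrix, feature_names):
    """Rank features by mean absolute SHAP value (largest first)."""
    n_samples = len(shap_matrix)
    scores = {
        name: sum(abs(row[j]) for row in shap_matrix) / n_samples
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up SHAP values for three samples and three features (illustration only)
shap_matrix = [
    [0.5, -0.25, 0.0],
    [-1.0, 0.25, 0.25],
    [0.75, -0.5, -0.25],
]
feature_names = ["Sex", "Pclass", "Age"]  # illustrative column names

ranking = mean_abs_shap(shap_matrix, feature_names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Taking the absolute value before averaging matters: a feature that pushes some predictions up and others down would otherwise cancel out and look unimportant.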

