Stata: Understanding Covariate Selection in Propensity Score Matching (PSM) / 四六文摘

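Preface:

Propensity score matching rests on the assumption of selection on observables, so the list of candidate covariates can become long. How should the covariates be chosen? And what should be done when a covariate in the model turns out to be insignificant? These questions make up the covariate selection problem in propensity score matching (PSM).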

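psestimate: Estimate the propensity score proposed by Imbens and Rubin (2015)

The psestimate command estimates the propensity score proposed by Imbens and Rubin (2015). Specifically, it implements the algorithm outlined by Imbens (2015), which estimates the propensity score for a binary dependent variable indicating treatment status. The main purpose of the program is to select which linear and second-order terms of the covariates enter the estimating function for the propensity score.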
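The selection step can be written compactly. This is a sketch of a likelihood-ratio-based stepwise search as commonly described for this procedure, not a transcription of the psestimate source code. The symbols $S$ (covariates already in the model), $x_k$ (a candidate term), $L(\cdot)$ (the maximized logit log-likelihood), and $C$ (the threshold set by clinear() or cquadratic()) are notation introduced here for illustration.

```latex
\mathrm{LR}_k = 2\left[\,L\big(S \cup \{x_k\}\big) - L(S)\,\right],
\qquad
k^{*} = \arg\max_{k}\ \mathrm{LR}_k,
\qquad
\text{add } x_{k^{*}} \text{ to } S \ \text{ if } \ \mathrm{LR}_{k^{*}} > C .
```

The step is repeated until no remaining candidate exceeds the threshold, first over the linear terms and then over the second-order terms built from the selected linear terms.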

Syntax

psestimate depvar [indepvars] [if] [in] [, options]

Options

depvar is the treatment variable.

indepvars are the basic covariates X that are always included in the model.

totry(indepvars) specifies the list of covariates to try; the default is all.

notry(varlist) specifies the list of covariates to exclude; the default is none.

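nolin prevents the algorithm from testing first-order (linear) terms, so that only second-order terms are selected.

noquad prevents the algorithm from testing second-order (quadratic) terms, so that only first-order terms are selected.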

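clinear(real) sets the threshold value for the likelihood ratio test used to add first-order covariates; the default is 1.

cquadratic(real) sets the threshold value used to add second-order (quadratic) terms. The decision is based on a likelihood ratio test statistic for the null hypothesis that the coefficient on the additional second-order term is zero. See [R] lrtest for more information. This option is irrelevant if noquad is specified. The default is 2.71.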

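iterate(#) sets the maximum number of iterations in each logit estimation. Stata's default is stated as 1600, but the value actually in force is the one set by set maxiter, which differs across Stata versions. See [R] logit and [R] maximize for more information.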

genpscore(newvar) generates a new variable, newvar, holding the estimated propensity score.

genlor(newvar) generates a new variable, newvar, holding the estimated log odds ratio.
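For reference, the quantity stored by genlor() is related to the one stored by genpscore() through the usual logit link. The notation $\hat{e}(x)$ for the estimated propensity score and $\widehat{\mathrm{lor}}(x)$ for the estimated log odds ratio is introduced here for illustration and does not come from the help file.

```latex
\widehat{\mathrm{lor}}(x) = \ln\!\left(\frac{\hat{e}(x)}{1-\hat{e}(x)}\right),
\qquad
\hat{e}(x) = \frac{\exp\{\widehat{\mathrm{lor}}(x)\}}{1+\exp\{\widehat{\mathrm{lor}}(x)\}} .
```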

Worked example

1. Download and install the external command and the data

help psestimate

This opens the help page for the command.

Follow the links on that page to download the ado-file and the example data.
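If the package is not installed yet, it can usually be fetched from the command line instead. The sketch below assumes that psestimate is hosted on SSC and that the example dataset ships as an ancillary file of the package. If either assumption does not hold, use the links on the help page as described above.

```stata
* Install psestimate from SSC (assumption: the package is hosted there).
* The "all" option also downloads any ancillary files, such as example
* data, into the current working directory.
ssc install psestimate, all replace

* Confirm that Stata can now find the command.
which psestimate
```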

2. Load the data

use nswre74

Inspect the data.
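One way to look at the data is with Stata's built-in describe, summarize, and tabulate commands. This is a minimal sketch that assumes nswre74.dta sits in the current working directory and contains the treatment variable treat used throughout this post.

```stata
* Assumes nswre74.dta is in the current working directory.
use nswre74, clear

* List variable names, storage types, and labels.
describe

* Summary statistics for every variable.
summarize

* Number of treated and control observations.
tabulate treat
```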

3. Select a PS model for the treatment variable

psestimate treat

4. Select a PS model from a restricted list of covariates, with a lowered quadratic threshold

psestimate treat, totry(age-nodeg re*) cquad(.8)

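According to this result, the first-order covariates to use in the propensity score model are nodeg, re78, hisp, and ed, and the second-order terms are c.ed#c.nodeg and c.ed#c.hisp.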
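The selected specification can also be fitted by hand with logit, which makes it easy to inspect the coefficients and to predict the score. This is a sketch under the assumption that the terms listed above are exactly the ones selected on your copy of the data. The variable names ps_manual and lor_manual are hypothetical names chosen for illustration.

```stata
* Assumes nswre74.dta is in the current working directory.
use nswre74, clear

* Logit with the first- and second-order terms reported above.
logit treat nodeg re78 hisp ed c.ed#c.nodeg c.ed#c.hisp

* Predicted probability of treatment (the propensity score).
predict double ps_manual, pr

* Linear prediction (the log odds ratio).
predict double lor_manual, xb

summarize ps_manual lor_manual
```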

5. Estimate the propensity score with no quadratic terms

psestimate treat, genpscore(ps) noquad
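Once the score has been generated, a quick look at its distribution by treatment status is a common next step. This is an illustrative sketch, not part of psestimate. It assumes that step 5 has just been run, so that the variable ps exists in memory.

```stata
* Assumes step 5 has just been run, so the variable ps exists.
* Summary statistics of the estimated score by treatment group.
bysort treat: summarize ps

* Overlay the two score densities to eyeball common support.
* The /// line continuation works in a do-file, not when typed
* interactively in the Command window.
twoway (kdensity ps if treat == 1) (kdensity ps if treat == 0), ///
    legend(order(1 "Treated" 2 "Control")) ///
    xtitle("Estimated propensity score")
```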

6. Estimate the log odds ratio with explicit selection of linear terms

psestimate treat age-nodeg, nolin genlor(logodds)
