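Stata 17: Multinomial logit models for panel data
Introduction
The multinomial logit (MNL) model is a popular way to model categorical choices whose outcomes have no natural ordering, such as occupation, political party, or restaurant choice.
In longitudinal/panel data, we observe a sequence of outcomes over time. Say we observe each person's restaurant choice every week. Do you think the weekly restaurant choices are independent? Probably not. Someone who likes Italian food is likely to choose Italian restaurants repeatedly. These choices are driven by underlying personal preferences and characteristics, some of which are unobserved.
Stata's new xtmlogit command fits random-effects and conditional fixed-effects MNL models for categorical outcomes observed over time.
To fit a random-effects multinomial logit model, we can type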
xtset subject
xtmlogit restaurant age
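and obtain standard multinomial logit coefficients, estimated with random effects specific to each outcome level that account for time-invariant subject-specific characteristics.
With the command above, the random effects are assumed to be normally distributed and independent across outcome levels (restaurant choices), but several variance-covariance structures are supported, including a fully unrestricted covariance: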
xtmlogit restaurant age, covariance(unstructured)
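If you suspect that the subject-specific effects may be correlated with age, you can use the conditional fixed-effects estimator to account for this:
xtmlogit restaurant age, fe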
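For orientation, the random-effects model described above can be written out as follows. This is a sketch in our own notation (the symbols $M$, $b$, $\beta_m$, and $u_{im}$ are assumptions introduced here, not taken from the article): subject $i$ at time $t$ chooses among $M$ outcomes, outcome $b$ is the base outcome, and each non-base outcome has its own coefficient vector and its own subject-level random effect.

```latex
\Pr(y_{it} = m \mid x_{it}, u_i)
  = \frac{\exp(x_{it}\beta_m + u_{im})}
         {\sum_{k=1}^{M} \exp(x_{it}\beta_k + u_{ik})},
\qquad \beta_b = 0,\quad u_{ib} = 0 .
```

Under this normalization, the ratio $\Pr(y_{it}=m)/\Pr(y_{it}=b)$ equals $\exp(x_{it}\beta_m + u_{im})$, so exponentiating a coefficient gives the factor by which that ratio changes for a one-unit increase in the covariate, holding the random effects fixed.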
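1. Let's see it work
We have a (fictional) unbalanced panel dataset on 800 women who were between 18 and 40 years old at the time of the first interview. We want to estimate the effect of having children under 18 in the household on women's employment status. Specifically, we want to know whether women are more likely to be out of the labor force when there are children in the household and, if so, by how much.
The survey is repeated every two years, and respondents are asked about their main employment status in the year before the interview. The response categories are employed (full-time, part-time, or self-employed), unemployed (looking for work), and out of the labor force. We load the data with: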
use https://www.stata-press.com/data/r17/estatus
(Fictional employment status data)
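Data description:
The dependent variable estatus records the woman's employment status and has three categories: 1. out of the labor force; 2. unemployed (but looking for work); 3. employed.
The explanatory variables are:
hhchild: whether there are children under 18 in the household at the time of the interview
hhincome: annual household income (in $1,000s)
hhsigno: whether a significant other lives in the household
bwinner: whether the respondent is the sole (or primary) breadwinner in the household
age: age in years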
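The variable definitions above can be checked directly against the dataset. The following is a minimal sketch that uses only standard Stata commands and the data URL given in the text; nothing in it is specific to xtmlogit.

```stata
* Load the example data (URL as given in the text)
use https://www.stata-press.com/data/r17/estatus, clear

* Storage types, value labels, and variable labels
describe

* Numeric codes behind the value labels (e.g., the coding of estatus)
label list

* Quick look at the continuous covariates
summarize age hhincome
```

Looking at `label list` first is a cheap way to confirm which numeric code corresponds to which employment status before reading any model output.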
2. Inspecting the data structure
Here is an excerpt from the dataset showing the employment history of three individuals:
use https://www.stata-press.com/data/r17/estatus
list id year estatus hhchild age in 22/41, sepby(id) noobs
The output is:
. list id year estatus hhchild age in 22/41, sepby(id) noobs

  +------------------------------------------------+
  | id   year              estatus   hhchild   age |
  |------------------------------------------------|
  |  5   2002             Employed       Yes    38 |
  |  5   2004             Employed        No    40 |
  |  5   2006             Employed        No    42 |
  |  5   2008             Employed        No    44 |
  |  5   2010   Out of labor force        No    46 |
  |  5   2012   Out of labor force        No    48 |
  |  5   2014           Unemployed        No    50 |
  |------------------------------------------------|
  |  6   2002           Unemployed       Yes    31 |
  |  6   2004             Employed       Yes    33 |
  |  6   2006   Out of labor force       Yes    35 |
  |  6   2008           Unemployed       Yes    37 |
  |  6   2010   Out of labor force       Yes    39 |
  |  6   2012           Unemployed        No    41 |
  |------------------------------------------------|
  |  7   2002   Out of labor force       Yes    33 |
  |  7   2004             Employed       Yes    35 |
  |  7   2006             Employed       Yes    37 |
  |  7   2008   Out of labor force       Yes    39 |
  |  7   2010             Employed        No    41 |
  |  7   2012             Employed        No    43 |
  |  7   2014             Employed        No    45 |
  +------------------------------------------------+
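The first person in the excerpt above (id==5) was observed from 2002 to 2014.
The variable estatus records her employment history over those years. In this case, she reported being employed from 2002 to 2008, out of the labor force in 2010 and 2012, and unemployed at the 2014 interview.
The variable hhchild records whether at least one child under 18 lived in the same household as the respondent at the time of the interview. For the first person in the excerpt, there were one or more children in the household in 2002 but none from 2004 to 2014. The variable age records the woman's age at each interview. In this case, the woman was observed between the ages of 38 and 50.
To examine the distribution of employment status across the whole sample, we can use the tabulate command:
tabulate estatus
The output is: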
. tabulate estatus

 Employment status |      Freq.     Percent        Cum.
-------------------+-----------------------------------
Out of labor force |      1,682       35.33       35.33
        Unemployed |        703       14.77       50.09
          Employed |      2,376       49.91      100.00
-------------------+-----------------------------------
             Total |      4,761      100.00
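We can see that in 35% of the observations the interviewed women reported being out of the labor force, in 15% they were unemployed, and in 50% they were employed.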
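Before modeling, it can help to see how the outcome varies with the covariate of interest and how much of its variation is within women rather than between them. The sketch below uses the standard `tabulate` and `xttab` commands; it is an exploratory aside, not part of the original walkthrough.

```stata
use https://www.stata-press.com/data/r17/estatus, clear
xtset id

* Distribution of employment status within each hhchild group
* (column percentages)
tabulate estatus hhchild, column

* Decompose the outcome frequencies into overall, between-woman,
* and within-woman components
xttab estatus
```

These raw tabulations ignore the other covariates and the panel structure, so they only give a first impression of what the model will later estimate.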
3. Declaring the panel
Before fitting the model, we need to use xtset to specify the panel identifier variable id.
The command is:
xtset id
The output is:
. xtset id
Panel variable: id (unbalanced)
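The "(unbalanced)" note means that women contribute different numbers of interviews. One way to see this is to count the observations per woman; the following is a small sketch in which `n_interviews` and `one_per_id` are helper variable names made up for this illustration.

```stata
use https://www.stata-press.com/data/r17/estatus, clear
xtset id

* Number of interviews contributed by each woman
bysort id: generate n_interviews = _N

* Flag exactly one row per woman so that each woman is counted once
egen one_per_id = tag(id)

* Distribution of panel lengths across women
tabulate n_interviews if one_per_id
```

The range shown by this table should agree with the "Obs per group" minimum and maximum that xtmlogit reports in its output header.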
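4. Panel multinomial logit model
We can now fit our model with xtmlogit. We also include some control variables: age, annual household income (hhincome), and whether a significant other lives in the household (hhsigno). The dataset also records whether the respondent is the sole or primary breadwinner in her household (bwinner); that variable is not included in the model below.
We use the variable estatus as the dependent variable, and hhchild is the independent variable of interest. Because hhchild and hhsigno are binary variables, we specify them as factor variables with the i. prefix.
We start with a random-effects model (the default) and use the rrr option to obtain exponentiated coefficients, which can be interpreted as relative-risk ratios.
xtmlogit estatus i.hhchild age hhincome i.hhsigno, rrr
The output is: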
. xtmlogit estatus i.hhchild age hhincome i.hhsigno, rrr

Fitting comparison model ...

Refining starting values:

Grid node 0:   log likelihood = -4504.5591
Grid node 1:   log likelihood = -4538.6352

Fitting full model:

Iteration 0:   log likelihood = -4504.5591
Iteration 1:   log likelihood =  -4495.871
Iteration 2:   log likelihood = -4490.5098
Iteration 3:   log likelihood = -4490.4197
Iteration 4:   log likelihood = -4490.4196

Random-effects multinomial logistic regression       Number of obs    =  4,761
Group variable: id                                   Number of groups =    800

Random effects u_i ~ Gaussian                        Obs per group:
                                                                  min =      5
                                                                  avg =    6.0
                                                                  max =      7

Integration method: mvaghermite                      Integration pts. =      7

                                                     Wald chi2(8)     = 199.25
Log likelihood = -4490.4196                          Prob > chi2      = 0.0000

------------------------------------------------------------------------------------
           estatus |        RRR   Std. err.      z    P>|z|     [95% conf. interval]
-------------------+----------------------------------------------------------------
Out_of_labor_force |
           hhchild |
              Yes  |   1.579937   .1513905     4.77   0.000     1.309414    1.906349
               age |   .9947946   .0065832    -0.79   0.430      .981975    1.007781
          hhincome |   .9954927   .0018251    -2.46   0.014     .9919221    .9990762
                   |
           hhsigno |
              Yes  |   1.642859   .1550291     5.26   0.000     1.365452    1.976625
             _cons |   .4949307   .1392991    -2.50   0.012     .2850836     .859244
-------------------+----------------------------------------------------------------
Unemployed         |
           hhchild |
              Yes  |   .9607243   .1148148    -0.34   0.737     .7601038    1.214296
               age |   1.004257    .008211     0.52   0.603     .9882918     1.02048
          hhincome |   .9696874   .0025722   -11.60   0.000      .964659    .9747421
                   |
           hhsigno |
              Yes  |   1.099323   .1310654     0.79   0.427     .8702452    1.388701
             _cons |   .8078165    .280628    -0.61   0.539     .4088963    1.595924
-------------------+----------------------------------------------------------------
Employed           |  (base outcome)
-------------------+----------------------------------------------------------------
            var(u1)|   .8573133   .1083915                      .6691459    1.098394
            var(u2)|   .7378532   .1388652                      .5102376    1.067008
------------------------------------------------------------------------------------
Note: Estimates are transformed only in the first 3 equations to relative-risk ratios.
Note: _cons estimates baseline relative risk (conditional on zero random effects).
LR test vs. multinomial logit: chi2(2) = 227.68           Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.
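The first two sections of the output show the estimated relative-risk ratios for our predictors, each relative to the base category (Employed).
The last section gives the estimated variances of the random effects. By default, the random effects are uncorrelated, but their covariance structure can be changed with the covariance() option.
For example, covariance(unstructured) estimates the correlation between the random effects, while covariance(shared) lets all categories share one common random effect.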
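One way to judge whether the default independent structure is adequate is to fit the unstructured alternative and compare the two fits. The sketch below uses the standard `estimates store` and `lrtest` commands; the stored names `re_indep` and `re_unstr` are arbitrary labels chosen for this illustration, and the comparison is offered as a suggestion rather than as part of the original analysis.

```stata
use https://www.stata-press.com/data/r17/estatus, clear
xtset id

* Default: independent random effects across outcome levels
quietly xtmlogit estatus i.hhchild age hhincome i.hhsigno
estimates store re_indep

* Alternative: unrestricted covariance between the random effects
quietly xtmlogit estatus i.hhchild age hhincome i.hhsigno, covariance(unstructured)
estimates store re_unstr

* The independent model is nested in the unstructured one
lrtest re_indep re_unstr
```

With two non-base outcomes, the unstructured model adds a single covariance parameter, so the test has one degree of freedom.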
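Adjusting for age, household income, and living with a significant other, the relative risk of being out of the labor force rather than employed is about 1.6 times as high (relative-risk ratio 1.58) for women with children under 18 in the household as for women without.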
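The relative-risk ratio is simply the exponentiated coefficient, which can be verified by hand. The following sketch assumes that the equation name matches the label shown in the output table (`Out_of_labor_force`); if it does not, `matrix list e(b)` shows the actual names.

```stata
use https://www.stata-press.com/data/r17/estatus, clear
xtset id
quietly xtmlogit estatus i.hhchild age hhincome i.hhsigno

* Replay the results on the coefficient (log relative-risk) scale
xtmlogit

* Exponentiate the hhchild coefficient in the out-of-labor-force equation
* (equation name assumed from the output table)
display exp(_b[Out_of_labor_force:1.hhchild])
```

The displayed value should correspond to the RRR reported for hhchild in the first section of the table above.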
To understand these effects in terms of probabilities, we can use the margins command.
margins hhchild
The output is:
. margins hhchild

Predictive margins                                       Number of obs = 4,761
Model VCE: OIM

1._predict: Pr(estatus==Out_of_labor_force), predict(pr outcome(1))
2._predict: Pr(estatus==Unemployed), predict(pr outcome(2))
3._predict: Pr(estatus==Employed), predict(pr outcome(3))

----------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-----------------+----------------------------------------------------------------
_predict#hhchild |
           1#No  |   .3025675   .0131546    23.00   0.000      .276785      .32835
          1#Yes  |   .3912476   .0120405    32.49   0.000     .3676486    .4148466
           2#No  |   .1628713   .0101131    16.11   0.000     .1430501    .1826925
          2#Yes  |   .1398537   .0079462    17.60   0.000     .1242794    .1554279
           3#No  |   .5345612   .0136994    39.02   0.000     .5077108    .5614116
          3#Yes  |   .4688987   .0116594    40.22   0.000     .4460468    .4917507
----------------------------------------------------------------------------------
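For a woman with no children under 18 in the household, the expected probability of being out of the labor force (labeled 1#No) is 0.30, the expected probability of being unemployed (2#No) is 0.16, and the expected probability of being employed (3#No) is 0.53. Having children in the household raises the expected probability of being out of the labor force by about 9 percentage points (from 0.30 to 0.39).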
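The 9-percentage-point figure is the difference between the 1#Yes and 1#No margins, 0.3912476 - 0.3025675 = 0.0886801. As a sketch, the same discrete change, together with a standard error, can be requested directly with the `dydx()` option of `margins`:

```stata
use https://www.stata-press.com/data/r17/estatus, clear
xtset id
quietly xtmlogit estatus i.hhchild age hhincome i.hhsigno

* Discrete change in each predicted probability when hhchild
* switches from No to Yes (hhchild is a factor variable)
margins, dydx(hhchild)

* The same difference for outcome 1, computed by hand from the table above
display .3912476 - .3025675
```

Because hhchild enters the model as a factor variable, `dydx()` reports a difference in predicted probabilities rather than a derivative.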
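We can see how these probabilities vary with household income by running another margins command and visualizing the results with marginsplot.
quietly margins hhchild, at(hhincome=(20(20)100))
marginsplot, by(_predict, label("Out of labor force" "Unemployed" "Employed")) ///
    byopts(rows(1) title("Marginal probabilities of employment status")) ///
    legend(order(4 "Children under 18 in household" 3 "No children under 18 in household"))
Variables that uniquely identify margins: hhincome hhchild
The result is a graph with one panel per employment status, plotting the predicted probabilities against household income for women with and without children in the household.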
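In the model we just fit, we used random effects to account for individual characteristics that are not observed in our dataset. Random-effects models require the random effects to be uncorrelated with the predictors, and the random-effects MNL model is no exception.
A widely used alternative is the fixed-effects estimator. To fit our model with conditional fixed effects, we simply add the fe option.
xtmlogit estatus i.hhchild age hhincome i.hhsigno, fe rrr
The output is: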
. xtmlogit estatus i.hhchild age hhincome i.hhsigno, fe rrr
note: 80 groups (451 obs) omitted because of no variation in the outcome variable over time.
Computing initial values ...
Setting up 26,168 permutations:
....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%
Fitting full model:
Iteration 0: log likelihood = -2154.4175
Iteration 1: log likelihood = -2154.2058
Iteration 2: log likelihood = -2154.2057
Fixed-effects multinomial logistic regression Number of obs = 4,310
Group variable: id Number of groups = 720
Obs per group:
min = 5
avg = 6.0
max = 7
LR chi2(8) = 67.42
Log likelihood = -2154.2057 Prob > chi2 = 0.0000
------------------------------------------------------------------------------------
estatus | RRR Std. err. z P>|z| [95% conf. interval]
-------------------+----------------------------------------------------------------
Out_of_labor_force |
hhchild |
Yes | 1.784236 .2237128 4.62 0.000 1.395488 2.28128
age | .9977834 .0146507 -0.15 0.880 .9694778 1.026915
hhincome | .9895225 .0086923 -1.20 0.231 .9726318 1.006707
|
hhsigno |
Yes | 1.658753 .1654425 5.07 0.000 1.364217 2.016878
-------------------+----------------------------------------------------------------
Unemployed |
hhchild |
Yes | 1.181866 .1933766 1.02 0.307 .8576197 1.628702
age | 1.004991 .0194887 0.26 0.797 .967511 1.043924
hhincome | .9717411 .0116616 -2.39 0.017 .9491514 .9948684
|
hhsigno |
Yes | 1.11936 .1454154 0.87 0.385 .8677426 1.443939
-------------------+----------------------------------------------------------------
Employed | (base outcome)
------------------------------------------------------------------------------------
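The note at the top of the fixed-effects output says that groups whose outcome never changes are dropped, because they carry no information for the conditional estimator. The sketch below counts such women directly; `es_min`, `es_max`, and `one_per_id` are helper variable names made up for this illustration.

```stata
use https://www.stata-press.com/data/r17/estatus, clear

* Smallest and largest outcome code observed for each woman
bysort id: egen es_min = min(estatus)
bysort id: egen es_max = max(estatus)

* Flag exactly one row per woman
egen one_per_id = tag(id)

* Women whose employment status never changes
count if one_per_id & es_min == es_max

* Observations contributed by those women
count if es_min == es_max
```

The two counts should line up with the number of groups and observations that the note reports as omitted.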
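The results are similar to those from the random-effects model and can be interpreted in the same way. For example, the relative-risk ratio for hhchild in the out-of-labor-force equation is 1.78, compared with 1.58 under random effects.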
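To compare the two sets of estimates side by side, both models can be stored and tabulated together. This is a sketch using the standard `estimates store` and `estimates table` commands; the stored names `re` and `fe` are arbitrary labels chosen here.

```stata
use https://www.stata-press.com/data/r17/estatus, clear
xtset id

* Random-effects model
quietly xtmlogit estatus i.hhchild age hhincome i.hhsigno
estimates store re

* Conditional fixed-effects model
quietly xtmlogit estatus i.hhchild age hhincome i.hhsigno, fe
estimates store fe

* Exponentiated coefficients (relative-risk ratios) in adjacent columns
estimates table re fe, eform b(%9.4f)
```

The fixed-effects column has no constants or variance parameters, since the subject-specific effects are conditioned out rather than estimated.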
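The dataset also contains bwinner, an indicator for being the sole or primary breadwinner, which can be added as a further control:
xtmlogit estatus i.hhchild age hhincome i.hhsigno i.bwinner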
Code summary:
. use estatus.dta, clear
(Fictional employment status data)

. desc

Contains data from estatus.dta
 Observations:         4,761                  Fictional employment status data
    Variables:             8                  14 May 2021 22:12
--------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
--------------------------------------------------------------------------------
id              int     %9.0g                 Respondent ID
year            int     %9.0g                 Year of survey
estatus         byte    %18.0g     alt        Employment status
hhchild         byte    %9.0g      noyes      Children <18 years old in household
hhincome        int     %10.0g                Annual household income (in $1,000s)
hhsigno         byte    %10.0g     noyes      Significant other living in household
bwinner         byte    %10.0g     noyes      Primary/sole breadwinner in household
age             byte    %10.0g                Age (in years)
--------------------------------------------------------------------------------
Sorted by: id  year
list id year estatus hhchild age in 22/41, sepby(id) noobs
xtset id
tabulate estatus
xtmlogit estatus i.hhchild age hhincome i.hhsigno, rrr
margins hhchild
quietly margins hhchild, at(hhincome=(20(20)100))
xtmlogit estatus i.hhchild age hhincome i.hhsigno, fe rrr