# Comparing Logit and Probit Coefficients between Models and Across Groups

Richard Williams (with assistance from Cheng Wang)
Notre Dame Sociology
https://www3.nd.edu/~rwilliam
August 2012, Annual Meetings of the American Sociological Association

## Introduction

We are used to estimating models where an observed, continuous dependent variable, Y, is regressed on one or more independent variables, i.e.

$$Y = \alpha + X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$

Since the residuals are uncorrelated with the Xs, it follows that

$$V(Y) = V(X\beta) + V(\varepsilon) = \text{Explained Variance} + \text{Residual Variance}$$

As you add explanatory variables to a model, the variance of the observed variable Y stays the same in OLS regression. As the explained variance goes up, the residual variance goes down by a corresponding amount.

But suppose the observed Y is not continuous; instead, it is a collapsed version of an underlying unobserved variable, Y*. Examples:

- Do you approve or disapprove of the President's health care plan? 1 = Approve, 2 = Disapprove
- Income, coded in categories like \$0 = 1, \$1-\$10,000 = 2, \$10,001-\$30,000 = 3, \$30,001-\$60,000 = 4, \$60,001 or higher = 5

For such variables, also known as limited dependent variables, we know the interval that the underlying Y* falls in, but not its exact value.

Binary and ordinal regression techniques allow us to estimate the effects of the Xs on the underlying Y*. They can also be used to see how the Xs affect the probability of being in a given category of the observed Y.

The latent variable model in binary logistic regression can be written as

$$y^* = \alpha + X\beta + \varepsilon_{y^*}, \qquad \varepsilon_{y^*} \sim \text{Standard Logistic}$$

If y* >= 0, y = 1. If y* < 0, y = 0.

In logistic regression, the errors are assumed to have a standard logistic distribution. A standard logistic distribution has a mean of 0 and a variance of $\pi^2/3$, or about 3.29. Since the residuals are uncorrelated with the Xs, it follows that

$$V(y^*) = V(X\beta) + V(\varepsilon_{y^*}) = V(X\beta) + \pi^2/3 \approx V(X\beta) + 3.29$$

Notice an important difference between OLS and logistic regression. In OLS regression with an observed variable Y, V(Y) is fixed and the explained and unexplained variances change as variables are added to the model. But in logistic regression with an unobserved variable y*, the residual variance $V(\varepsilon_{y^*})$ is fixed, so the explained variance and the total variance change as you add variables to the model.

This difference has important implications for comparisons of coefficients between nested models and across groups.

## Comparing Logit and Probit Coefficients across Models

    . use http://www.nd.edu/~rwilliam/xsoc73994/statafiles/standardized.dta
    . logit ybinary x1, nolog

    Logit estimates                                   Number of obs   =        500
                                                      LR chi2(1)      =     161.77
                                                      Prob > chi2     =     0.0000
    Log likelihood = -265.54468                       Pseudo R2       =     0.2335

    ------------------------------------------------------------------------------
         ybinary |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              x1 |   .7388678    .072961    10.13   0.000     .5958668    .8818687
           _cons |  -.0529777    .105911    -0.50   0.617    -.2605593     .154604
    ------------------------------------------------------------------------------

    . logit ybinary x2, nolog

    Logit estimates                                   Number of obs   =        500
                                                      LR chi2(1)      =     160.35
                                                      Prob > chi2     =     0.0000
    Log likelihood = -266.25298                       Pseudo R2       =     0.2314

    ------------------------------------------------------------------------------
         ybinary |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              x2 |   .4886751   .0482208    10.13   0.000     .3941641    .5831861
           _cons |  -.0723833   .1058261    -0.68   0.494    -.2797986     .135032
    ------------------------------------------------------------------------------

    . logit ybinary x1 x2, nolog

    Logit estimates                                   Number of obs   =        500
                                                      LR chi2(2)      =     443.39
                                                      Prob > chi2     =     0.0000
    Log likelihood = -124.73508                       Pseudo R2       =     0.6399

    ------------------------------------------------------------------------------
         ybinary |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              x1 |    1.78923   .1823005     9.81   0.000     1.431927    2.146532
              x2 |   1.173144   .1207712     9.71   0.000     .9364369    1.409851
           _cons |  -.2144856   .1626906    -1.32   0.187    -.5333532    .1043821
    ------------------------------------------------------------------------------

Usually, when we add variables to a model (at least in OLS regression), the effects of variables added earlier go down. However, in this case, we see that the coefficients for x1 and x2 increase (seemingly) dramatically when both variables are in the model: in the separate bivariate regressions the effects of x1 and x2 are .7388678 and .4886751, but in the multivariate regression the effects are 1.78923 and 1.173144, more than twice as large as before.

This leads to two questions:

1. If we saw something similar in an OLS regression, what would we suspect was going on?

In other words, in an OLS regression, what can cause coefficients to get bigger rather than smaller as more variables are added?

2. In a logistic regression, why might such an interpretation be totally wrong?

The means and correlations help answer both questions:

    . corr, means
    (obs=500)

        Variable |         Mean    Std. Dev.          Min          Max
    -------------+----------------------------------------------------
               y |     5.51e-07     3.000001    -8.508021     7.981196
         ybinary |         .488     .5003566            0            1
              x1 |    -2.19e-08            2     -6.32646     6.401608
              x2 |     3.57e-08            3    -10.56658     9.646875

                 |        y  ybinary       x1       x2
    -------------+------------------------------------
               y |   1.0000
         ybinary |   0.7923   1.0000
              x1 |   0.6667   0.5248   1.0000
              x2 |   0.6667   0.5225   0.0000   1.0000

x1 and x2 are uncorrelated! So suppressor effects cannot account for the changes in coefficients.
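The point about uncorrelated predictors can be checked with a short worked sketch. It uses only the standard deviations and correlations printed above, and it assumes that `y` is the continuous variable underlying `ybinary` (the variable names and the 0.79 correlation suggest this, but the slides do not say so). The symbols $b_j$, $r$, and $s$ are shorthand introduced here for OLS slopes, correlations, and standard deviations.

$$
b_j = r_{y x_j}\,\frac{s_y}{s_{x_j}}: \qquad b_1 \approx 0.6667 \times \frac{3}{2} \approx 1.00, \qquad b_2 \approx 0.6667 \times \frac{3}{3} \approx 0.67
$$

$$
V(y) \approx 9 \;=\; \underbrace{b_1^2\,V(x_1)}_{\approx\, 1.00 \times 4 \,=\, 4} \;+\; \underbrace{b_2^2\,V(x_2)}_{\approx\, 0.444 \times 9 \,=\, 4} \;+\; \underbrace{V(\varepsilon)}_{\approx\, 1}
$$

Because $r_{x_1 x_2} = 0$, the slopes in an OLS regression of y on both x1 and x2 would equal the slopes from the two separate bivariate regressions. Adding x2 to a model that already contains x1 would cut the residual variance from about 5 to about 1, while the total stays at about 9 and $b_1$ stays at about 1. In OLS, then, nothing about these data would make the coefficients grow.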

Long & Freese's listcoef command can add some insight:

    . quietly logit ybinary x1
    . listcoef, std

    logit (N=500): Unstandardized and Standardized Estimates

      Observed SD: .50035659
      Latent SD:   2.3395663

      Odds of: 1 vs 0

    ------------------------------------------------------------------------------
         ybinary |      b         z     P>|z|    bStdX    bStdY   bStdXY      SDofX
    -------------+----------------------------------------------------------------
              x1 |   0.73887   10.127    0.000   1.4777   0.3158   0.6316    2.0000
    ------------------------------------------------------------------------------

    . quietly logit ybinary x2
    . listcoef, std

    logit (N=500): Unstandardized and Standardized Estimates

      Observed SD: .50035659
      Latent SD:   2.3321875

      Odds of: 1 vs 0

    ------------------------------------------------------------------------------
         ybinary |      b         z     P>|z|    bStdX    bStdY   bStdXY      SDofX
    -------------+----------------------------------------------------------------
              x2 |   0.48868   10.134    0.000   1.4660   0.2095   0.6286    3.0000
    ------------------------------------------------------------------------------

    . quietly logit ybinary x1 x2
    . listcoef, std

    logit (N=500): Unstandardized and Standardized Estimates

      Observed SD: .50035659
      Latent SD:   5.3368197

      Odds of: 1 vs 0

    ------------------------------------------------------------------------------
         ybinary |      b         z     P>|z|    bStdX    bStdY   bStdXY      SDofX
    -------------+----------------------------------------------------------------
              x1 |   1.78923    9.815    0.000   3.5785   0.3353   0.6705    2.0000
              x2 |   1.17314    9.714    0.000   3.5194   0.2198   0.6595    3.0000
    ------------------------------------------------------------------------------

Note how the standard deviation of y* fluctuates from one logistic regression to the next; it is about 2.33 to 2.34 in the bivariate logistic regressions and 5.34 in the multivariate logistic regression.

It is because the variance of y* changes that the coefficients change so much when you go from one model to the next. In effect, the scaling of Y* is different in each model. By way of analogy, if in one OLS regression income was measured in dollars, and in another it was measured in thousands of dollars, the coefficients would be very different.

Why does the variance of y* go up? Because it has to. The residual variance is fixed at 3.29, so improvements in model fit result in increases in explained variance, which in turn result in increases in total variance. Hence, comparisons of coefficients across nested models can be misleading because the dependent variable is scaled differently in each model.

How serious is the problem in practice? Hard to say.

- We easily found dozens of recent papers that present sequences of nested models. Their numbers are at least a little off, but without re-analyzing the data you can't tell whether their conclusions are seriously distorted as a result.
- Several attempts of our own using real world data have failed to raise major concerns with the comparisons.
- We asked several authors for copies of their data, but most were unwilling or unable to do so.
- One author, Ervin (Maliq) Matthew, did graciously provide us with the data used for his paper "Effort Optimism in the Classroom: Attitudes of Black and White Students on Education, Social Structure, and Causes of Life Opportunities" (Sociology of Education 2011 84:225-245).
- The paper contains potentially problematic statements such as "The effect of race on the dependent variable is even stronger once GPA, SES, and sex are controlled for (Model 2), indicating that when blacks and whites have equal GPAs and family SES, blacks are more likely to agree with this statement."
- In practice, however, we found that any potential errors were modest, with estimates being only slightly affected by solutions we discuss later. His Table 7, for example, is reanalyzed below.

Nonetheless, researchers should realize that

- Increases in the magnitudes of coefficients across models need not reflect suppressor effects.
- Declines in coefficients across models will actually be understated, i.e. you will be understating how much other variables account for the estimated direct effects of the variables in the early models.
- Distortions are potentially more severe when added variables greatly increase the pseudo R^2 statistics, as the variance of Y* will then increase more.

What are possible solutions?

- Just don't present the coefficients for each model in the first place. Researchers often present chi-square contrasts to show how they picked their final model and then only present the coefficients for it.
- Use y-standardization. With y-standardization, instead of fixing the residual variance, you fix the variance of y* at 1. This does not work perfectly, but it does greatly reduce rescaling of coefficients between models. Listcoef gives the y-standardized coefficients in the column labeled bStdY, and they hardly change at all between the bivariate and multivariate models (.3158 and .2095 in the bivariate models, .3353 and .2198 in the multivariate model).
- The Karlson/Holm/Breen (KHB) method (papers are available or forthcoming in both Sociological Methodology and Stata Journal) shows promise.
    - According to KHB, their method separates changes in coefficients due to rescaling from true changes in coefficients that result from adding more variables to the model (and does a better job of doing so than y-standardization and other alternatives).
    - They further claim that with their method the total effect of a variable can be decomposed into direct and indirect effects.
    - We would add that, when authors estimate sequences of models, it is often because they want to see how the effects of variables like race decline (or increase) after other variables are controlled for. The KHB method provides a parsimonious and more accurate way of depicting such changes.

Two KHB examples are presented below.
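Before the KHB examples, a worked check of the rescaling argument may help. It is only a sketch: it uses the coefficients and standard deviations reported in the listcoef output above, and it relies on x1 and x2 being uncorrelated (as the correlation table indicates), so that the variance of the linear predictor is simply the sum of the $b_j^2 V(x_j)$ terms. The symbols $b_j$ and $SD(y^*)$ are shorthand introduced here.

$$
SD(y^*) = \sqrt{\textstyle\sum_j b_j^2\,V(x_j) + \pi^2/3}, \qquad b_j^{StdY} = \frac{b_j}{SD(y^*)}
$$

$$
\begin{aligned}
\text{x1 only:} \quad & SD(y^*) = \sqrt{0.73887^2 \times 4 + 3.29} \approx \sqrt{5.47} \approx 2.34, & b_{x1}^{StdY} &\approx \frac{0.73887}{2.34} \approx 0.316 \\
\text{x1 and x2:} \quad & SD(y^*) = \sqrt{1.78923^2 \times 4 + 1.17314^2 \times 9 + 3.29} \approx \sqrt{28.48} \approx 5.34, & b_{x1}^{StdY} &\approx \frac{1.78923}{5.34} \approx 0.335
\end{aligned}
$$

These values agree with the Latent SD and bStdY figures that listcoef reports. The raw coefficient for x1 grows by a factor of about 2.4 between the two models, while the latent standard deviation grows by a factor of about 2.3, which is why the y-standardized coefficient barely moves.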

## KHB example 1

    . webuse nhanes2f, clear
    . khb logit diabetes black || weight

    Decomposition using the KHB-Method

    Model-Type:  logit                                 Number of obs     =   10335
    Variables of Interest: black                       Pseudo R2         =    0.02
    Z-variable(s): weight
    ------------------------------------------------------------------------------
        diabetes |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    black        |
         Reduced |   .6038012   .1236714     4.88   0.000     .3614098    .8461926
            Full |   .5387425   .1241889     4.34   0.000     .2953368    .7821483
            Diff |   .0650587   .0132239     4.92   0.000     .0391403    .0909771
    ------------------------------------------------------------------------------

Possible interpretation of results:

- In the line labeled Reduced, only black is in the model. .6038 is the total effect of black.
- However, blacks may have higher rates of diabetes both because of a direct effect of race on diabetes, and because of an indirect effect: blacks tend to be heavier than whites, and heavier people have higher rates of diabetes.
- Hence, the line labeled Full gives the direct effect of race (.5387) while the line labeled Diff gives the indirect effect (.065).

## KHB example 2

Matthew (2011; see Table 7, p. 240) examines the determinants of how likely a student is to feel they will have a job he or she enjoys (0 = 50 percent or lower; 1 = better than 50 percent).

In the first model, race (0 = white, 1 = black) is the only independent variable. The estimated effect of race is -.510. In the final model controls are added for GPA, SES, and others. The effect of race declines in magnitude to -.471, an apparent drop of .039.

The khb method shows that the decline is actually about twice as large (.089). Again this is at least partly because the variance of y* becomes greater as more variables are added, causing coefficients to increase.

    . khb logit jobenjoy race || gpa ses sex educjob educimportant luckimportant sbprevent

    Decomposition using the KHB-Method

    Model-Type:  logit                                 Number of obs     =    6731
    Variables of Interest: race                        Pseudo R2         =    0.08
    Z-variable(s): gpa ses sex educjob educimportant luckimportant sbprevent
    ------------------------------------------------------------------------------
        jobenjoy |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    race         |
         Reduced |  -.5727334     .10607    -5.40   0.000    -.7806269   -.3648399
            Full |  -.4833004   .1095584    -4.41   0.000    -.6980309     -.26857
            Diff |   -.089433   .0349898    -2.56   0.011    -.1580117   -.0208542
    ------------------------------------------------------------------------------

## Comparing Logit and Probit Coefficients across groups

We often want to compare the effects of variables across groups, e.g. we want to see if the effect of education is the same for men as it is for women.

Both OLS and logistic regression assume that error variances are the same for both groups. When that assumption is violated in OLS, the consequences are often minor: standard errors and significance tests are a bit off but coefficients remain unbiased. But when a binary or ordinal regression model incorrectly assumes that error variances are the same for all cases, the standard errors are wrong and (unlike OLS regression) the parameter estimates are wrong too.

As Hoetker (2004, p. 17) notes, "in the presence of even fairly small differences in residual variation, naive comparisons of coefficients [across groups] can indicate differences where none exist, hide differences that do exist, and even show differences in the opposite direction of what actually exists."

Explanation. Suppose that y* were observed, but our estimation procedure continued to standardize the variable by fixing its residual variance at 3.29. How would differences in residual variability across groups affect the estimated coefficients? In the examples, the coefficients for the residuals reflect the differences in residual variability across groups. Any residual that does not have a coefficient attached to it is assumed to already have a variance of 3.29. Standardizing a group's equation divides every term in it by the coefficient on that group's residual.

Case 1: True coefficients are equal, residual variances differ

| | Group 0 | Group 1 |
|---|---|---|
| True coefficients | $y_i^* = x_{i1} + x_{i2} + x_{i3} + \varepsilon_i$ | $y_i^* = x_{i1} + x_{i2} + x_{i3} + 2\varepsilon_i$ |
| Standardized coefficients | $y_i^* = x_{i1} + x_{i2} + x_{i3} + \varepsilon_i$ | $y_i^* = .5x_{i1} + .5x_{i2} + .5x_{i3} + \varepsilon_i$ |

In Case 1, the true coefficients all equal 1 in both groups. But, because the residual standard deviation is twice as large for group 1 as it is for group 0, the standardized βs are only half as large for group 1 as for group 0. Naive comparisons of coefficients can indicate differences where none exist.

Case 2: True coefficients differ, residual variances differ

| | Group 0 | Group 1 |
|---|---|---|
| True coefficients | $y_i^* = x_{i1} + x_{i2} + x_{i3} + \varepsilon_i$ | $y_i^* = 2x_{i1} + 2x_{i2} + 2x_{i3} + 2\varepsilon_i$ |
| Standardized coefficients | $y_i^* = x_{i1} + x_{i2} + x_{i3} + \varepsilon_i$ | $y_i^* = x_{i1} + x_{i2} + x_{i3} + \varepsilon_i$ |

In Case 2, the true coefficients are twice as large in group 1 as in group 0. But, because the residual variances also differ, the standardized βs for the two groups are the same. Differences in residual variances obscure the differences in the underlying effects. Naive comparisons of coefficients can hide differences that do exist.

Case 3: True coefficients differ, residual variances differ even more

| | Group 0 | Group 1 |
|---|---|---|
| True coefficients | $y_i^* = x_{i1} + x_{i2} + x_{i3} + \varepsilon_i$ | $y_i^* = 2x_{i1} + 2x_{i2} + 2x_{i3} + 3\varepsilon_i$ |
| Standardized coefficients | $y_i^* = x_{i1} + x_{i2} + x_{i3} + \varepsilon_i$ | $y_i^* = \tfrac{2}{3}x_{i1} + \tfrac{2}{3}x_{i2} + \tfrac{2}{3}x_{i3} + \varepsilon_i$ |

In Case 3, the true coefficients are again twice as large in group 1 as in group 0. But, because of the large differences in residual variances, the standardized βs are smaller for group 1 than for group 0. Differences in residual variances make it look like the Xs have smaller effects on group 1 when really the effects are larger. Naive comparisons of coefficients can even show differences in the opposite direction of what actually exists.

## Example: Allison's (1999) model for group comparisons

Allison (Sociological Methods and Research, 1999) analyzes a data set of 301 male and 177 female biochemists.

Allison uses logistic regressions to predict the probability of promotion to associate professor.

Table 1: Results of Logit Regressions Predicting Promotion to Associate Professor for Male and Female Biochemists (Adapted from Allison 1999, p. 188)

| Variable | Men: Coefficient | Men: SE | Women: Coefficient | Women: SE | Ratio of Coefficients | Chi-Square for Difference |
|---|---|---|---|---|---|---|
| Intercept | -7.6802*** | .6814 | -5.8420*** | .8659 | .76 | 2.78 |
| Duration | 1.9089*** | .2141 | 1.4078*** | .2573 | .74 | 2.24 |
| Duration squared | -0.1432*** | .0186 | -0.0956*** | .0219 | .67 | 2.74 |
| Undergraduate selectivity | 0.2158*** | .0614 | 0.0551 | .0717 | .25 | 2.90 |
| Number of articles | 0.0737*** | .0116 | 0.0340** | .0126 | .46 | 5.37* |
| Job prestige | -0.4312*** | .1088 | -0.3708* | .1560 | .86 | 0.10 |
| Log likelihood | -526.54 | | -306.19 | | | |
| Error variance | 3.29 | | 3.29 | | | |

\*p < .05, \*\*p < .01, \*\*\*p < .001

As his Table 1 shows, the effect of number of articles on promotion is about twice as great for males (.0737) as it is for females (.0340). If accurate, this difference suggests that men get a greater payoff from their published work than do females, a conclusion that many would find troubling (Allison 1999:186).

BUT, Allison warns, women may have more heterogeneous career patterns, and unmeasured variables affecting chances for promotion may be more important for women than for men. Put another way, the error variance for women may be greater than the error variance for men. This corresponds to the Case 1 we presented earlier.

## Allison's solution for the problem

Ergo, in his Table 2, Allison adds a parameter to the model he calls delta. Delta adjusts for differences in residual variation across groups.

Table 2: Logit Regressions Predicting Promotion to Associate Professor for Male and Female Biochemists, Disturbance Variances Unconstrained (Adapted from Allison 1999, p. 195)

| Variable | All Coefficients Equal: Coefficient | All Coefficients Equal: SE | Articles Coefficient Unconstrained: Coefficient | Articles Coefficient Unconstrained: SE |
|---|---|---|---|---|
| Intercept | -7.4913*** | .6845 | -7.3655*** | .6818 |
| Female | -0.93918** | .3624 | -0.37819 | .4833 |
| Duration | 1.9097*** | .2147 | 1.8384*** | .2143 |
| Duration squared | -0.13970*** | .0173 | -0.13429*** | .01749 |
| Undergraduate selectivity | 0.18195** | .0615 | 0.16997*** | .04959 |
| Number of articles | 0.06354*** | .0117 | 0.07199*** | .01079 |
| Job prestige | -0.4460*** | .1098 | -0.42046*** | .09007 |
| δ (delta) | -0.26084* | .1116 | -0.16262 | .1505 |
| Articles x Female | | | -0.03064 | .0173 |
| Log likelihood | -836.28 | | -835.13 | |

\*p < .05, \*\*p < .01, \*\*\*p < .001

The delta-hat coefficient value -.26 in Allison's Table 2 (first model) tells us that the standard deviation of the disturbance variance for men is 26 percent lower than the standard deviation for women. This implies women have more variable career patterns than do men, which causes their coefficients to be lowered relative to men when differences in variability are not taken into account, as in the original logistic regressions.

Allison's final model shows that the interaction term for Articles x Female is NOT statistically significant. Allison concludes: The apparent difference in the coefficients for article counts in Table 1 does not necessarily reflect a real difference in causal effects. It can be readily explained by differences in the degree of residual variation.

## Problems with Allison's Approach

Williams (2009) noted various problems with Allison's approach.

Allison says you should first test whether residual variances differ across groups. To do this, you contrast two models: in both cases, the coefficients are constrained to be the same for both groups, but in one model the residual variances are also constrained to be the same, whereas in the other model the residual variances can differ. Allison says that if the test statistic is significant, you then allow the residual variances to differ.

The problem is that, if the residual variances are actually the same across groups but the effects of the Xs differ, the test can also be statistically significant! Put another way, Allison's test has difficulty distinguishing between cross-group differences in residual variability and differences in coefficients.

Hence, his suggested procedure can make it much more likely that you will conclude residual variances differ when they really don't, erroneously allowing residual variances to differ.

Also, Allison's approach only allows for a single categorical variable in the variance equation. The sources of heteroskedasticity can be more complex than that; more variables may be involved, and some of these may be continuous. Keele & Park (2006) show that a misspecified variance equation, e.g. one in which relevant variables are omitted, can be worse than having no variance equation at all.

Finally, Allison's method only works with a dichotomous dependent variable. Models with binary dvs that allow for heteroskedasticity can be difficult to estimate. Ordinal dependent variables contain more information about Y*.

Williams (2009, 2010) therefore proposed a more powerful alternative.

## A Broader Solution: Heterogeneous Choice Models

Heterogeneous choice/location-scale models explicitly specify the determinants of heteroskedasticity in an attempt to correct for it. These models are also useful when the variability of underlying attitudes is itself of substantive interest.

## The Heterogeneous Choice (aka Location-Scale) Model

- Can be used for binary or ordinal models
- Two equations, choice & variance
- Binary case:

$$\Pr(y_i = 1) = g\left(\frac{x_i\beta}{\exp(z_i\gamma)}\right) = g\left(\frac{x_i\beta}{\exp(\ln(\sigma_i))}\right) = g\left(\frac{x_i\beta}{\sigma_i}\right)$$

Allison's model with delta is actually a special case of a heterogeneous choice model, where the dependent variable is a dichotomy and the variance equation includes a single dichotomous variable that also appears in the choice equation.

Allison's results can easily be replicated with the user-written routine oglm (Williams, 2009, 2010).

    . * oglm replication of Allison's Table 2, Model 2 with interaction added:
    . use "http://www.indiana.edu/~jslsoc/stata/spex_data/tenure01.dta", clear
    (Gender differences in receipt of tenure (Scott Long 06Jul2006))
    . keep if pdasample
    (148 observations deleted)
    . oglm tenure female year yearsq select articles prestige f_articles, het(female)

    Heteroskedastic Ordered Logistic Regression       Number of obs   =       2797
                                                      LR chi2(8)      =     415.39
                                                      Prob > chi2     =     0.0000
    Log likelihood = -835.13347                       Pseudo R2       =     0.1992

    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    tenure       |
          female |  -.3780597   .4500207    -0.84   0.401    -1.260084    .5039646
            year |   1.838257   .2029491     9.06   0.000     1.440484     2.23603
          yearsq |  -.1342828    .017024    -7.89   0.000    -.1676492   -.1009165
          select |   .1699659   .0516643     3.29   0.001     .0687057    .2712261
        articles |   .0719821   .0114106     6.31   0.000     .0496178    .0943464
        prestige |  -.4204742   .0961206    -4.37   0.000    -.6088671   -.2320813
      f_articles |  -.0304836   .0187427    -1.63   0.104    -.0672185    .0062514
    -------------+----------------------------------------------------------------
    lnsigma      |
          female |   .1774193   .1627087     1.09   0.276     -.141484    .4963226
    -------------+----------------------------------------------------------------
           /cut1 |   7.365285   .6547121    11.25   0.000     6.082073    8.648497
    ------------------------------------------------------------------------------

    . display "Allison's delta = " (1 - exp(.1774193)) / exp(.1774193)
    Allison's delta = -.16257142

Because oglm estimates ln(σ) for women relative to men, the display command recovers Allison's delta as 1/σ - 1, which matches the -0.16262 reported in his Table 2 (second model).

As Williams (2009) notes, there are important advantages to turning to the broader class of heterogeneous choice models that can be estimated by oglm:

- Dependent variables can be ordinal rather than binary. This is important, because ordinal vars have more information and hence lead to better estimation.
- The variance equation need not be limited to a single binary grouping variable, which (hopefully) reduces the likelihood that the variance equation will be mis-specified.

Williams (2010) also notes that, even if the researcher does not want to present a heterogeneous choice model, estimating one can be useful from a diagnostic standpoint. Often, the appearance of heteroskedasticity is actually caused by other problems in model specification, e.g. variables are omitted, variables should be transformed (e.g. logged), or squared terms should be added.

Williams (2010) shows that the heteroskedasticity issues in Allison's models go away if articles^2 is added to the model.

    . use "http://www.indiana.edu/~jslsoc/stata/spex_data/tenure01.dta", clear
    (Gender differences in receipt of tenure (Scott Long 06Jul2006))
    . keep if pdasample
    (148 observations deleted)
    . * hetero effect becomes insignificant when articles^2 is added to model
    . oglm tenure i.female year c.year#c.year select articles prestige c.articles#c.articles, het(i.female)

    Heteroskedastic Ordered Logistic Regression       Number of obs   =       2797
                                                      LR chi2(8)      =     440.07
                                                      Prob > chi2     =     0.0000
    Log likelihood = -822.79102                       Pseudo R2       =     0.2110

    ---------------------------------------------------------------------------------------
                   tenure |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ----------------------+----------------------------------------------------------------
    tenure                |
                 1.female |  -.5612041   .2926285    -1.92   0.055    -1.134746    .0123373
                     year |   1.739173   .1933258     9.00   0.000     1.360261    2.118084
                          |
            c.year#c.year |  -.1265911   .0162677    -7.78   0.000    -.1584751    -.094707
                          |
                   select |   .1710519   .0504271     3.39   0.001     .0722167    .2698872
                 articles |   .1533931   .0244003     6.29   0.000     .1055694    .2012168
                 prestige |   -.454951   .0936162    -4.86   0.000    -.6384354   -.2714666
                          |
    c.articles#c.articles |  -.0026412   .0007213    -3.66   0.000     -.004055   -.0012274
    ----------------------+----------------------------------------------------------------
    lnsigma               |
                 1.female |    .141633   .1377843     1.03   0.304    -.1284193    .4116853
    ----------------------+----------------------------------------------------------------
                    /cut1 |    7.40805    .648316    11.43   0.000     6.137374    8.678726
    ---------------------------------------------------------------------------------------

    . * You don't need the female*articles interaction terms either
    . oglm tenure i.female year c.year#c.year select articles prestige c.articles#c.articles i.female#(c.articles c.articles#c.articles)

    Ordered Logistic Regression                       Number of obs   =       2797
                                                      LR chi2(9)      =     439.05
                                                      Prob > chi2     =     0.0000
    Log likelihood = -823.3041                        Pseudo R2       =     0.2105

    ----------------------------------------------------------------------------------------------
                          tenure |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -----------------------------+----------------------------------------------------------------
                        1.female |  -.2588495   .3094906    -0.84   0.403    -.8654399    .3477409
                            year |   1.649057   .1651954     9.98   0.000      1.32528    1.972834
                                 |
                   c.year#c.year |    -.11984   .0142338    -8.42   0.000    -.1477378   -.0919423
                                 |
                          select |   .1583254   .0466906     3.39   0.001     .0668136    .2498372
                        articles |   .1492724   .0295934     5.04   0.000     .0912703    .2072745
                        prestige |  -.4386555   .0898018    -4.88   0.000    -.6146637   -.2626472
                                 |
           c.articles#c.articles |  -.0025455   .0009236    -2.76   0.006    -.0043558   -.0007352
                                 |
               female#c.articles |
                              1  |   -.007599   .0450029    -0.17   0.866    -.0958031    .0806051
                                 |
    female#c.articles#c.articles |
                              1  |   .0001025   .0013542     0.08   0.940    -.0025517    .0027568
    -----------------------------+----------------------------------------------------------------
                           /cut1 |   7.091958   .5479358    12.94   0.000     6.018024    8.165893
    ----------------------------------------------------------------------------------------------

## Problems with heterogeneous choice models

- Models can be difficult to estimate, although this is generally less problematic with ordinal variables.
- While you have more flexibility when specifying the variance equation, a misspecified equation can still be worse than no equation at all.
- But the most critical problem of all may be that radically different interpretations of the results are possible.

## Problem: Radically different interpretations are possible

- An issue to be aware of with heterogeneous choice models is that radically different interpretations of the results are possible.
- Hauser and Andrew (2006), for example, proposed a seemingly different model for assessing differences in the effects of variables across groups (where in their case, the groups were different educational transitions).
- They called it the logistic response model with proportionality constraints (LRPC):
    - Instead of having to estimate a different set of coefficients for each group/transition, you estimate a single set of coefficients, along with one $\lambda_j$ proportionality factor for each group/transition ($\lambda_1$ is constrained to equal 1).
    - The proportionality constraints would hold if, say, the coefficients for the 2nd group were all 2/3 as large as the corresponding coefficients for the first group, the coefficients for the 3rd group were all half as large as for the first group, etc.
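Written out, the LRPC described above can be sketched as follows. The notation is introduced here for illustration ($\alpha_j$ for the group-specific intercept, $\beta_k$ for the shared coefficients, $\lambda_j$ for the proportionality factor); it follows the structure of the `lrpc02` program shown later in these slides rather than quoting Hauser and Andrew's own equation.

$$
\operatorname{logit}\,\Pr(y_{ij} = 1) = \alpha_j + \lambda_j \sum_k \beta_k x_{ijk}, \qquad \lambda_1 = 1
$$

$$
\text{effect of } x_k \text{ in group } j = \lambda_j \beta_k: \qquad \lambda_1 = 1,\ \lambda_2 = \tfrac{2}{3},\ \lambda_3 = \tfrac{1}{2} \;\Rightarrow\; \beta_k,\ \tfrac{2}{3}\beta_k,\ \tfrac{1}{2}\beta_k
$$

With a hypothetical $\beta_k = 0.9$, the implied group-specific effects would be 0.9, 0.6, and 0.45. Every coefficient in a group is scaled by the same factor, which is exactly the pattern that unequal residual variances would also produce.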

Models compared Hauser & Andrew note, however, that one cannot distinguish empirically between the hypothesis of uniform proportionality of effects across transitions and the hypothesis that group differences between parameters of binary regressions are artifacts of heterogeneity between groups in residual variation. (p. 8) Williams (2010) showed that, even though the rationales behind the models are totally different, the heterogeneous choice models estimated by oglm produce identical fits to the LRPC models estimated by Hauser and Andrew; simple algebra converts one models parameters into the others Williams further showed that Hauser & Andrews software produced the exact same coefficients that Allisons software did when used with Allisons data . . . . . . * Hauser & Andrew's original LRPC program * Code has been made more efficient and readable,

* but results are the same. Note that it * actually estimates and reports * lambda - 1 rather than lamba. program define lrpc02 1. tempvar theta 2. version 8 3. args lnf intercepts lambdaminus1 betas 4. gen double `theta' = `intercepts' + `betas' + (`lambdaminus1' * `betas') 5. quietly replace `lnf' = ln(exp(`theta')/(1+exp(`theta'))) if \$ML_y1==1 6. quietly replace `lnf' = ln(1/(1+exp(`theta'))) if \$ML_y1==0 7. end . . . > > > . * Hauser & Andrews original LRPC parameterization used with Allison's data

. * Results are identical to Allison's Table 2, Model 1
. ml model lf lrpc02 ///
>         (intercepts: tenure = male female, nocons) ///
>         (lambdaminus1: female, nocons) ///
>         (betas: year yearsq select articles prestige, nocons), max nolog
. ml display

                                                  Number of obs   =       2797
                                                  Wald chi2(2)    =     180.60
Log likelihood = -836.28235                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
      tenure |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
intercepts   |
        male |  -7.490506   .6596634   -11.36   0.000    -8.783422   -6.197589
      female |  -6.230958   .6205863   -10.04   0.000    -7.447285   -5.014631
-------------+----------------------------------------------------------------
lambdaminus1 |
      female |  -.2608325   .1080502    -2.41   0.016    -.4726069   -.0490581
-------------+----------------------------------------------------------------
betas        |
        year |   1.909544   .1996937     9.56   0.000     1.518151    2.300936
      yearsq |  -.1396868   .0169425    -8.24   0.000    -.1728935   -.1064801
      select |   .1819201   .0526572     3.45   0.001     .0787139    .2851264
    articles |   .0635345    .010219     6.22   0.000     .0435055    .0835635
    prestige |  -.4462074    .096904    -4.60   0.000    -.6361357   -.2562791
------------------------------------------------------------------------------

But the theoretical concerns that motivated their models and programs lead to radically different interpretations of the results.

According to Allison's theory (and the theory behind the heterogeneous choice model), apparent differences in effects between men and women are an artifact of differences in residual variability. Once these differences are taken into account, there is no significant difference in the effect of articles across groups, implying there is no gender inequality in the tenure process.

Someone looking at these exact same numbers from the viewpoint of the LRPC, however, would conclude that the effect of articles (and every other variable, for that matter) is 26 percent smaller for women than it is for men. Those who believed that the LRPC was the theoretically correct model would likely conclude that there is substantial gender inequality in the tenure promotion process.

For any given problem, strong substantive arguments might be made for one perspective or the other.
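
The algebra linking the two readings can be sketched as follows. This is a minimal sketch rather than a derivation taken from the cited papers: the symbols \alpha_g, \lambda_g, a_g and \sigma_g are notation assumed here for illustration, men are treated as the reference group, and the numbers are rounded from the ml display output above.

```latex
% LRPC parameterization, as coded in lrpc02
% (the program estimates \lambda_g - 1, which is 0 for men):
\[
\operatorname{logit}\Pr(y = 1 \mid x, g) = \alpha_g + \lambda_g \, x\beta ,
\qquad \lambda_{\text{male}} = 1 .
\]
% Heterogeneous choice parameterization
% (\sigma_g is the residual scale for group g; assumed notation):
\[
\operatorname{logit}\Pr(y = 1 \mid x, g) = \frac{a_g + x\beta}{\sigma_g} ,
\qquad \sigma_{\text{male}} = 1 .
\]
% Matching the two expressions term by term:
\[
\sigma_g = \frac{1}{\lambda_g} , \qquad a_g = \frac{\alpha_g}{\lambda_g} .
\]
% Using the estimate for women:
\[
\hat\lambda_{\text{female}} = 1 + (-0.2608) = 0.7392 ,
\qquad
\hat\sigma_{\text{female}} = \frac{1}{0.7392} \approx 1.353 .
\]
```

Read one way, every slope for women is about 74 percent of the corresponding slope for men (26 percent smaller). Read the other way, the slopes are equal and the residual standard deviation for women is about 35 percent larger. The fitted probabilities are the same under both readings.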

Researchers using any of these models should realize, however, that there is often if not always a radically

Long's solution

Long (2009) looks at these same sorts of problems, but proposes a different analytical approach. He says:

"An alternative approach [to Allison] uses predicted probabilities. Since predicted probabilities are unaffected by residual variation, tests of the equality of predicted probabilities across groups can be used for group comparisons without assuming the equality of the regression coefficients of some variables ... Testing the equality of predicted probabilities requires multiple tests since group differences in predictions vary with the levels of the variables in the model."

A simple example of Long's technique

. use "http://www.indiana.edu/~jslsoc/stata/spex_data/tenure01.dta", clear
(Gender differences in receipt of tenure (Scott Long 06Jul2006))

. keep if year <= 10
(148 observations deleted)

. * Basic model - articles only
. logit tenure articles i.male i.male#c.articles, nolog

Logistic regression                               Number of obs   =       2797
                                                  LR chi2(3)      =     121.58
                                                  Prob > chi2     =     0.0000
Log likelihood = -982.04029                       Pseudo R2       =     0.0583

---------------------------------------------------------------------------------
         tenure |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
       articles |   .0471351   .0104974     4.49   0.000     .0265605    .0677097
         1.male |  -.2198428   .1853876    -1.19   0.236    -.5831959    .1435102
                |
male#c.articles |
              1 |   .0552514   .0148436     3.72   0.000     .0261585    .0843444
                |
          _cons |  -2.501162    .140056   -17.86   0.000    -2.775667   -2.226657
---------------------------------------------------------------------------------
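
Predicted probabilities follow directly from the coefficients above. As a rough worked check (values rounded; \hat p_F and \hat p_M are notation assumed here for the predicted probability of tenure for women and for men), consider faculty with 0 and with 40 articles:

```latex
\[
\hat p = \frac{1}{1 + e^{-x\hat\beta}}
\]
% Women (male = 0): x\hat\beta = -2.5012 + 0.0471 * articles
% Men   (male = 1): x\hat\beta = (-2.5012 - 0.2198) + (0.0471 + 0.0553) * articles
\[
\text{articles} = 0: \quad
\hat p_F = \frac{1}{1 + e^{2.5012}} \approx 0.076 , \qquad
\hat p_M = \frac{1}{1 + e^{2.7210}} \approx 0.062 , \qquad
\hat p_M - \hat p_F \approx -0.014
\]
\[
\text{articles} = 40: \quad
\hat p_F = \frac{1}{1 + e^{0.6158}} \approx 0.351 , \qquad
\hat p_M = \frac{1}{1 + e^{-1.3745}} \approx 0.798 , \qquad
\hat p_M - \hat p_F \approx 0.447
\]
```

The gap is negligible at zero articles and large at 40 articles, which is why a single test on one coefficient cannot summarize the group difference and the comparison has to be made at several values of articles.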

. margins, dydx(male) at(articles=(0(1)50)) vsquish

Conditional marginal effects                      Number of obs   =       2797
Model VCE    : OIM

Expression   : Pr(tenure), predict()
dy/dx w.r.t. : 1.male
1._at        : articles        =           0
2._at        : articles        =           1
[output deleted]
51._at       : articles        =          50

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.male       |
         _at |
          1  |  -.0140315   .0120717    -1.16   0.245    -.0376916    .0096285
          2  |  -.0111948   .0120559    -0.93   0.353    -.0348239    .0124343
[output deleted]
         50  |   .4562794   .1118036     4.08   0.000     .2371485    .6754104
         51  |   .4527383   .1139979     3.97   0.000     .2293066      .67617
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

. marginsplot

[Figure: Conditional Marginal Effects of 1.male with 95% CIs. Vertical axis: Effects on Pr(Tenure), 0 to .8. Horizontal axis: Total number of articles, 0 to 50.]

This simple example shows that the predicted probabilities of tenure for men and women differ little for small numbers of articles; indeed, the differences are not even statistically significant for 8 articles or fewer.

The differences become greater as the number of articles increases. For example, a woman with 40 articles is predicted to be about 45 percentage points less likely to get tenure than a man with 40 articles.

The analyses can be further extended by adding more variables to the model, and/or by doing various subgroup analyses, e.g. comparing women at high-prestige universities with men at high-prestige universities.

As Long says, this can lead to more complex conclusions on how groups differ in the effect of a variable.

If you are lucky, the differences in predicted probabilities may disappear altogether, e.g. variables added to the model may be able to account for the initially observed group differences. But if they don't ...

Critique of Long

The predictive margins produced by Long's approach might be seen as a sort of high-tech descriptives. They illustrate the predicted differences between groups after controlling for other variables.

Description can be very useful. In this case we see that the predicted probabilities of tenure differ dramatically by gender and the number of articles published. Once such differences in predicted probabilities are discovered, policy makers may decide that some sort of corrective action should be taken.

At the same time, Long's approach may be frustrating because it doesn't try to explain why the differences exist. Are the differences due to the fact that men are rewarded more for the articles they publish? Or are they due to the fact that residual variability differs by gender? Perhaps women's careers are disrupted more by family or other matters.

From a policy standpoint, we would like to know what is causing these observed differences in predicted probabilities.

If it is because women are rewarded less for each article they write, we may want to examine whether women's work is being evaluated fairly.

If it is because of differences in residual variability, we may want to further examine why that is. For example, if family obligations create more career hurdles for women than they do for men, how can we make the workplace more family-friendly?

But if we do not know what is causing the differences, we aren't even sure where to start if we want to eliminate them.

Long defends his approach by arguing:

For many things, like his report on women in science for the NAS, predictions were of much more interest than was the slope of articles or unobserved heterogeneity.

Using other information, e.g. on the history of women in science, may resolve issues far more effectively than the types of assumptions that are needed to be able to disentangle differences in coefficients and unobserved heterogeneity.

There are times when predictive margins provide more insights than simple answers to yes/no hypotheses. For example, there can be cases where, overall, the lines for men and women are the same (can't reject that they are equal), yet they differ significantly when testing equality at a particular case. Both are valid, but overreliance on one omnibus test is not a good thing in general.

Further, as we have seen, when we try to explain group differences, the coefficients can be interpreted in radically different ways. Two researchers could look at the exact same set of results, and one could conclude that coefficients differ across groups while another could say that it is residual variability that differs.

Given such ambiguity, some might argue that you should settle for description and not strive for explanation (at least not with the current data).

Others might argue that you should go with the model that you think makes most theoretical sense, while acknowledging that alternative interpretations of the results are possible.

At this point, it is probably fair to say that the descriptions of the problem may be better, or at least more clear-cut, than the various proposed solutions.

Selected References

Allison, Paul. 1999. Comparing Logit and Probit Coefficients Across Groups. Sociological Methods and Research 28(2): 186-208.

Hauser, Robert M. and Megan Andrew. 2006. Another Look at the Stratification of Educational Transitions: The Logistic Response Model with Partial Proportionality Constraints. Sociological Methodology 36(1): 1-26.

Hoetker, Glenn. 2004. Confounded Coefficients: Extending Recent Advances in the Accurate Comparison of Logit and Probit Coefficients Across Groups. Working Paper, October 22, 2004. Retrieved September 27, 2011 (http://papers.ssrn.com/sol3/papers.cfm?abstract_id=609104).

Karlson, Kristian B., Anders Holm and Richard Breen. 2011. Comparing Regression Coefficients between Same-Sample Nested Models using Logit and Probit: A New Method. Forthcoming in Sociological Methodology.

Keele, Luke and David K. Park. 2006. Difficult Choices: An Evaluation of Heterogeneous Choice Models. Working Paper, March 3, 2006. Retrieved March 21, 2006 (https://www3.nd.edu/~rwilliam/oglm/ljk-021706.pdf).

Kohler, Ulrich, Kristian B. Karlson and Anders Holm. 2011. Comparing Coefficients of Nested Nonlinear Probability Models. Forthcoming in The Stata Journal.

Long, J. Scott. 2009. Group Comparisons in Logit and Probit Using Predicted Probabilities. Working Paper, June 25, 2009. Retrieved September 27, 2011 (http://www.indiana.edu/~jslsoc/files_research/groupdif/groupwithprobabilities/groups-with-prob-2009-06-25.pdf).

Long, J. Scott and Jeremy Freese. 2006. Regression Models for Categorical Dependent Variables Using Stata, 2nd Edition. College Station, Texas: Stata Press.

Williams, Richard. 2009. Using Heterogeneous Choice Models to Compare Logit and Probit Coefficients across Groups. Sociological Methods & Research 37(4): 531-559. A pre-publication version is available at https://www3.nd.edu/~rwilliam/oglm/RW_Hetero_Choice.pdf.

Williams, Richard. 2010. Fitting Heterogeneous Choice Models with oglm. The Stata Journal 10(4): 540-567. Available at http://

For more information, see: https://www3.nd.edu/~rwilliam