Advanced Statistical Analysis
Advanced tools from epidemiologists and demographers: Poisson regressions, age-period-cohort models, etc. (Dec 14)
Instructor: Prof. Louis Chauvel

This session: Advanced tools from epidemiologists and demographers
Defining the fields of epidemiology, biostatistics and demography:
The study (description and search for causes) of diseases in populations
A set of specific tools, including count, aging and cohort models
A set of references
As usual: CHAPTERS 7 (glm) & 11 (Some epidemiology) in the STATA ADVANCED MANUAL: http://www.louischauvel.org/stata_manuel_advanced.pdf

Plus more recent references.

Main references
Find them online at http://www.a-z.lu/

Other references
Find them online at http://www.a-z.lu/

See also
Find this online at: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0158538

This session
Reminders on the glm (generalized linear model)
Examples of Poisson models
The age-period-cohort (APC) model in demography & epidemiology
New developments on the APC model

Reminders on the glm (generalized linear model)
CHAPTER 7 (glm) in the STATA ADVANCED MANUAL: http://www.louischauvel.org/stata_manuel_advanced.pdf
Ordinary least squares (OLS), logit, Poisson, etc. models share the same general expression. Only the distribution (family in Stata) and the link function change, depending on the nature of the outcome variable (OV):
OV = continuous (OLS)
OV = binary (logit)
OV = count (Poisson)

Typical cases
See do file in the first part of: http://www.louischauvel.org/glm.do
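The claim that a dedicated command and glm fit the same model can be checked directly. The sketch below is only an illustration: it uses Stata's shipped auto dataset (an assumption of this example, not the course data), and the scalar names b_ols and b_logit are hypothetical. The relative differences displayed should be essentially zero.

```stata
* Illustration on Stata's shipped auto dataset (not the course data)
sysuse auto, clear

* OLS is glm with Gaussian family and identity link
regress price mpg weight
scalar b_ols = _b[mpg]                     // hypothetical scalar name
glm price mpg weight, family(gaussian) link(identity)
display "OLS vs glm, relative difference:   " reldif(b_ols, _b[mpg])

* Logit is glm with binomial family and logit link
logit foreign mpg
scalar b_logit = _b[mpg]                   // hypothetical scalar name
glm foreign mpg, family(binomial) link(logit)
display "logit vs glm, relative difference: " reldif(b_logit, _b[mpg])
```

Only family() and link() differ between the two glm calls; the linear predictor is specified in the same way.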

Ordinary least squares (OLS)
    reg day i.gender i.ethnic i.class
    glm day i.gender i.ethnic i.class, f(gauss) l(id)
Logit model
    logit absent i.gender i.ethnic i.class
    glm absent i.gender i.ethnic i.class, f(bin) l(logit)
Poisson model
    poisson days i.gender i.ethnic i.class
    glm days i.gender i.ethnic i.class, f(poisson) l(log)
See the options of glm:
    help glm

Poisson models

Examples of Poisson models on mortality
Why Poisson? https://fr.wikipedia.org/wiki/Sim%C3%A9on_Denis_Poisson

When the outcome is a count variable (the number of times an event occurs during a time period), a suitable model is the Poisson regression.
Count variables: days of absence, number of life events, deaths, etc.
In the case of mortality: counts of deaths and exposure to risk (population at risk). See https://www.mortality.org/
See do file in the second part of: http://www.louischauvel.org/glm.do

Examples of Poisson models on mortality
[Figure: Poisson coefficients (= log of death rates) by age group in 2010-2014; vertical scale from 0 to 6]
The log of the death rate increases linearly with age: the death rate rises by about 10% with each year of age, so it doubles roughly every 7 years.
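The link between "10% each year" and "doubles every 7th year" follows from the log link: if the rate is multiplied by 1.10 per year of age, the doubling time is ln(2)/ln(1.10), about 7.3 years. The sketch below checks this on simulated data, not on the mortality.org data: the variables age, pop, rate and deaths, the baseline rate of 0.002 and the constant exposure of 100,000 are all assumptions made for the illustration.

```stata
* Simulated mortality table (hypothetical values, one row per year of age 40-89)
clear
set seed 12345
set obs 50
gen age = 39 + _n
gen pop = 100000                          // exposure: population at risk (assumed)
gen double rate = 0.002 * 1.10^(age - 40) // true rate grows by 10% per year of age
gen deaths = rpoisson(pop * rate)         // simulated death counts

* Poisson regression of counts, with exposure and log link
glm deaths age, family(poisson) link(log) exposure(pop)

* Yearly growth factor of the death rate and implied doubling time
display "growth factor per year of age: " exp(_b[age])
display "doubling time in years:        " ln(2) / _b[age]
```

With counts this large, the estimated growth factor should come out close to 1.10 and the doubling time close to 7.3 years.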

    keep if age>=40 & 90>age
    glm dm i.age if ye==2010, f(poisson) l(log) exp(rm)
    glm dm age if ye==2010, f(poisson) l(log) exp(rm)
Exercise 1: for women?
Exercise 2: across years?

Introduction to APC (Age-Period-Cohort) models

See pp 230 sqq of

Consider the effects of age, of period, and of cohort.
Collinearity: A = P - C
Non-linear effects: age thresholds, period fractures, cohort scars

Methodology I: the base A = P - C
The Lexis Diagram (1872)
[Figure: Lexis diagram. Horizontal axis: Period, 1890 to 2030. Vertical axis: Age, 0 to 80. Isochron: observation in 1968. Life line: cohort born in 1948. Diagonals marked C 1918 and C 1978. Age at year of observation: 20.]
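The identity A = P - C means that age, period and cohort cannot all enter a linear model as free linear terms. A minimal sketch on simulated data (the variables period, age, cohort and y and all coefficients are hypothetical) shows how Stata reacts: one of the three regressors is omitted because of collinearity.

```stata
* Simulated repeated cross-sections (hypothetical data)
clear
set seed 2024
set obs 1000
gen period = 1965 + 5 * floor(11 * runiform())   // survey years 1965, 1970, ..., 2015
gen age    = 25 + 5 * floor(8 * runiform())      // ages 25, 30, ..., 60
gen cohort = period - age                        // exact linear combination
gen y = 0.02 * age + 0.01 * (period - 1965) + rnormal()

* The three linear terms are not separately identified:
* Stata drops one of them ("omitted because of collinearity")
regress y age period cohort
```

Which variable is dropped is a software convention, not a substantive result; this is why APC models need an explicit identifying constraint.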

BUT! How to distinguish durable scarring effects from fads?
Hysteresis = stability of scars versus resilience = resorption of scars

Statistical background: Age-Period-Cohort models
Separate the effects of age, period of measurement and cohort.
Problematic collinearity: cohort (date of birth) = period (date of measurement) - age
(Ryder 1965, Mason et al. 1973, Mason / Fienberg 1985, Mason / Smith 1985, Yang Yang et al. 2006 2008, Smith 2008, Pampel 2012)

Our method A: APCD
APCD (detrended): are some cohorts above or below a linear trend of long-run economic growth? Basically, the APCD is a bump detector.

\[ y^{apc} = \alpha_a + \pi_p + \gamma_c + \alpha_0 \, \mathrm{rescale}(a) + \gamma_0 \, \mathrm{rescale}(c) + \beta_0 + \sum_j \beta_j x_j + \varepsilon_i \qquad \text{(APCD)} \]

subject to

\[ p = c + a, \qquad \sum_a \alpha_a = \sum_p \pi_p = \sum_c \gamma_c = 0, \]
\[ \mathrm{Slope}_a(\alpha_a) = \mathrm{Slope}_p(\pi_p) = \mathrm{Slope}_c(\gamma_c) = 0, \qquad \min(c) < c < \max(c) \]

STATA: ssc install apcd => ado file available
Please see more on www.louischauvel.org/apcdex.htm

apcd syntax is based on glm:
    ssc install apcd
    apcd depvar controlvars [if] [weight], age(var) period(var) [glm options]
All glm options are available, including:

    familyname    Description
    ----------------------------------
    gaussian      Gaussian (normal)
    igaussian     inverse Gaussian
    poisson       Poisson
    etc.

    linkname      Description
    ----------------------------------
    identity      identity
    log           log
    logit         logit
    probit        probit
    etc.

http://www.louischauvel.org/vet.do
A STATA example on Veterans (CPS extracts ipums) 1965-2015 N=322,243

    use "http://www.louischauvel.org/apcgoex.dta", clear
    * race      / 1=caucasian AA=2
    * a5        / age
    * y5        / year
    * labincome / medianized labor personal income
    * pweight   / sampling weight
    * vet       / 1=veteran 0=no veteran status
    * ED        / level of education 6=drop out 7=ged 8=community coll ... 11=Ba 12=Ma+
    * female    / male=0 female=1
    * lnlab     / ln of labincome
    keep if fem==0 & a5<65
    gen ba=ED==11 | ED==12
    ssc install apcd
    ssc install apcgo
    tab a5 y5 [w=pwei] , s(vet) nofr nost noobs w
    * are there non-linear variations of veterans by cohort? (% points)
    apcd vet [w=pwei], age(a5) period(y5)
    drop *apc*

    * are there non-linear variations of veterans by cohort? (logit coeff)
    apcd vet [w=pwei], age(a5) period(y5)
    drop *apc*
    stop
    * what is the share of veterans in a cohort? (% points)
    apctlag vet [w=pwei], age(a5) period(y5)
    * what is the share of veterans in a cohort? (logit coeff)
    apctlag vet [w=pwei], age(a5) period(y5) f(bin) l(logit)
    * what is the share of BA owners in a cohort? (% points)
    apctlag ba [w=pwei], age(a5) period(y5)
    * what is the share of BA owners in a cohort? (logit coeff)
    apctlag ba [w=pwei], age(a5) period(y5) f(bin) l(logit)
    * how has the veteran premium changed?
    apcgo lnlab [w=pwei], gap(vet) age(a5) period(y5)
    * what is the role of education in the change of the veteran premium?
    xi: apcgo lnlab i.ED if fem==0 [w=pwei], gap(vet) age(a5) period(y5)
    * with bootstrap confidence intervals (time consuming! rep(10) is minimalist but you can change it)
    apcgo lnlab [w=pwei], gap(vet) age(a5) period(y5) rep(10)

Ex: U.S. veterans in % of the male population (CPS ipums) 1965-2015

    Age \ Period  1965  1970  1975  1980  1985  1990  1995  2000  2005  2010  2015
    25             45%   38%   42%   19%   12%     *     *    8%    5%    5%    5%
    30             61%   47%   41%   43%   20%   12%   11%   10%    8%    6%    6%
    35             70%   63%   47%   40%   41%   18%   12%   11%   11%    9%    7%
    40             78%   71%   65%   46%   41%   40%   20%   13%   12%   11%   10%
    45             63%   78%   71%   64%   48%   41%   42%   20%   14%   13%   12%
    50             36%   66%   77%   70%   65%   47%   42%   42%   20%   14%   14%
    55             22%   35%   63%   77%   70%   63%   48%   42%   41%   20%   14%
    60             14%   22%   35%   66%   75%   69%   62%   43%   42%   40%   19%

    * Age 25, 1990 and 1995: the two values are 9% and 10%; which value belongs to which year is not recoverable from the slide.

Cohorts along the diagonals: Cohort 1905 = ?, Cohort 1925 = WWII, Cohort 1945 = Vietnam War, Cohort 1965 = ?
SEE THE STORY IN: Alair MacLean and Meredith Kleykamp. 2016. Income Inequality and the Veteran Experience. Annals of the American Academy of Political and Social Science 663:99-116.

Ex: Veterans as % of the male population (CPS ipums), APCD model 1965-2015
[Figure: APCD cohort coefficients. Annotations: Cohort 1905 = ?, Cohort 1925 = WWII, Cohort 1945 = Vietnam War, Cohort 1965 = ?]

Our method B: the larger APC family (with STATA ssc install)
APCD (detrended): are some cohorts above or below a linear trend of long-run economic growth? Basically, the APCD is a bump detector.
    ssc install apcd
APCTLAG (trended by cohort once the average lagged age effect is fitted): which cohorts increased or declined? The program is part of:
    ssc install apcgo
APCGO (gap / Oaxaca): once other covariates are controlled for, did the gap between groups 0 and 1 change?
    ssc install apcgo
APCH (hysteresis): is the cohort bump detected by apcd durable or not over time?
Refinements to come (faster bootstraps, better controls, simplification, etc.)

APCT-lag (trended with lag)
See paper online: https://orbilu.uni.lu/bitstream/10993/35746/1/LIS%20WP%20gender%20gap%20final%20May%202018.pdf
APC-Detrended as an identifiable solution of age, period and cohort non-linear effects (Chauvel, 2013; Chauvel and Schröder, 2014; Chauvel et al., 2016)

\[ u^{apc} = \alpha_a + \pi_p + \gamma_c + \alpha_0 \, \mathrm{rescale}(a) + \gamma_0 \, \mathrm{rescale}(c) + \beta_0 \qquad \text{(APCD)} \]

where \alpha_a, \pi_p, \gamma_c are sum zero and trend zero; \alpha_0 and \gamma_0 absorb the age and cohort trends; \beta_0 is the constant.
\alpha_0 \, \mathrm{rescale}(a) + \gamma_0 \, \mathrm{rescale}(c) is a two-dimensional linear (= hyperplane) trend.
\alpha_a, \pi_p, \gamma_c are 3 vectors of age, period and cohort fluctuations.
To solve the identification problem (a = p - c), a meaningful constraint is needed: trend in \alpha_a = the average of the longitudinal shift observed in u^{apc}.

The APC-lag solution (see paper online)
[Lexis table: cohort index c = p - a + A for ages a = 1 to 10 (rows) and periods p = 1 to 7 (columns), so that c runs from 1 to 16]

\[ \alpha = \frac{\sum \left[ u^{(a+1,\,p+1,\,c)} - u^{apc} \right]}{(A-1)(P-1)} \]

is the average longitudinal age effect along cohorts (= the average difference between u^{(a+1, p+1, c)} and its cohort lag u^{apc} across the table).

\[ u^{apc} = \alpha_a + \pi_p + \gamma_c \qquad \text{(APCL)} \]

where \sum(\alpha_a) = 0 and \sum(\pi_p) = 0; Trend(\pi_p) = 0; Trend(\alpha_a) = \alpha.

Trend operator for age coefficients:

\[ \mathrm{Trend}(\alpha_a) = \frac{12 \left[ \sum_i \alpha_i \, (2i - A - 1) \right]}{(A-1) \, A \, (A+1)} \]

APC-lag delivers a unique estimate of the vector \gamma_c, a cohort-indexed measure of gaps.
The average of \gamma_c is the general intensity of the gap.
The trend of \gamma_c measures increases/decreases of the gap in the window of observation.
The values of \gamma_c show possible non-linearities.
The \gamma_c can be compared between countries.

Ex: Veterans as % of the male population (CPS ipums), APCTLAG model 1965-2015
[Figure: APCTLAG cohort coefficients. Horizontal axis: birth cohort, 1900 to 1990. Vertical axis: 0 to 0.9. Annotations: Cohort 1905 = ?, Cohort 1925 = WWII, Cohort 1945 = Vietnam War, Cohort 1965 = ?]

Ex: BA owners in % of the male population (CPS ipums) 1965-2015, APCTLAG model
[Figure annotations: Cohort 1925 = GI Bill of Rights. Cohort 1948: "Going to College to Avoid the Draft: The Unintended Legacy of the Vietnam War." (with Thomas Lemieux), American Economic Review 91, May 2001. Recent cohorts: skyrocketing tuition and fees.]

APC-GO (Gap/Oaxaca) model
Now on Stata: ssc install apcgo

APC-GO is an APC model that provides a cohort analysis of gaps in outcomes between 2 groups after controlling for relevant explanatory variables, e.g. (gender) gaps in income net of education effects, or (racial) gaps in education net of State/county effects.
Ingredients:
1. Computation of the Oaxaca decomposition into unexplained/explained gaps by A x P cell
2. Estimation of APC-lag gaps with a focus on cohort
3. Bootstrapping to obtain confidence intervals

Structure of data (see paper online)
Lexis table / diagram: cross-sectional surveys including one outcome y and controls x
Condition: large sample with data for each cell (APC) of the Lexis table
Age is indexed by a from 1 to A, period by p from 1 to P, cohort by c = p - a + A from 1 to C.

Cohort index c in each cell of the Lexis table (here A = 10, P = 7, C = 16):

    age \ period    1    2    3    4    5    6    7
    10              1    2    3    4    5    6    7
    9               2    3    4    5    6    7    8
    8               3    4    5    6    7    8    9
    7               4    5    6    7    8    9   10
    6               5    6    7    8    9   10   11
    5               6    7    8    9   10   11   12
    4               7    8    9   10   11   12   13
    3               8    9   10   11   12   13   14
    2               9   10   11   12   13   14   15
    1              10   11   12   13   14   15   16

Part II: APC-lag of the u^{apc} (see paper online)
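The average longitudinal shift used by APC-lag, the sum of u(a+1, p+1, c) - u(a, p, c) divided by (A-1)(P-1), can be computed by following each cohort along its diagonal of the Lexis table. The sketch below is not the apcgo implementation: it builds a hypothetical 10 x 7 table in which u = 0.5 a + 0.1 p by construction, so every one-step shift equals 0.6.

```stata
* Hypothetical Lexis table in long form: A = 10 ages, P = 7 periods
clear
set obs 70                                 // A * P cells
gen a = mod(_n - 1, 10) + 1                // age index 1..10
gen p = floor((_n - 1) / 10) + 1           // period index 1..7
gen c = p - a + 10                         // cohort index 1..16
gen double u = 0.5 * a + 0.1 * p           // assumed cell values

* Within a cohort sorted by age, the next row is the cell (a+1, p+1)
bysort c (a): gen double d = u[_n+1] - u   // missing for the last cell of each cohort

* Mean of d = sum of shifts / [(A-1)(P-1)]
summarize d
display "number of pairs: " r(N) "   average shift: " r(mean)
```

The number of pairs is 54 = (10-1)(7-1), which matches the denominator of the formula, and the average shift is 0.6 up to rounding.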

Summary
APC-GO combines the different steps:
1. Oaxaca decomposition on the cells of the initial Lexis table generates an aggregated Oaxaca Lexis table of measures of gaps unexplained by controls
2. APC-lag of the Oaxaca Lexis table delivers notably the cohort (c) coefficients
3. Bootstrapping to obtain confidence intervals
See the Stata ado file: ssc install apcgo

Implementation on different examples:
Veterans and the veteran premium: http://www.louischauvel.org/vet.do
Suicide rates in a comparative perspective: http://www.louischauvel.org/suicplosone.do
Obesity epidemic: http://www.louischauvel.org/apcobese.do and the ppt http://www.louischauvel.org/apc_obese.pptx

Ex: Veterans wage premium (diff of log), APCGO model (GO = Gap Oaxaca) 1965-2015
[Figure: veterans' wage premium by birth cohort. Horizontal axis: cohort, 1900 to 1990. Vertical axis: -0.15 to 0.4. Annotations: WWII veterans premium > 30%; Cohort 1955: premium < 0.]
SEE THE STORY IN: Alair MacLean and Meredith Kleykamp. 2016. Income Inequality and the Veteran Experience. Annals of the American Academy of Political and Social Science 663:99-116.