Working Papers
Errors in Reporting and Imputation of Government Benefits and Their Implications (with Pablo Celhay and Bruce D. Meyer)
Recent studies document extensive errors in household surveys that bias important estimates. We link administrative cash welfare and SNAP records to three major U.S. household surveys to answer important questions these studies raise. First, we show that survey data misrepresent patterns of participation in multiple programs and thereby distort total transfers to the poorest households and how households navigate the complex welfare system. On the positive side, survey data capture the reach of the safety net better than receipt of individual programs. Second, we examine error due to item non-response and imputation, as well as whether imputation improves estimates. Item non-respondents have higher receipt rates than the population, even conditional on many covariates. The assumptions for consistent estimates in multivariate models fail both when excluding item non-respondents and when using the imputed values. In models of program receipt, estimates from the linked data favor excluding item non-respondents rather than using their imputed values. We show that such analyses can help researchers make more informed decisions on the use of imputed values. Along the way, we extend prior work by documenting for other programs and geographies a pattern of pronounced misreporting that varies with key demographics. Our estimates allow researchers to gauge or correct bias in models of program participation, because they predict when these biases are (or are not) large. Most notably, the survey error we document causes the differences in receipt rates between Blacks or Hispanics and those of whites to be 1.4 to 6 times larger than survey estimates.
Stigma in Welfare Programs (with Pablo Celhay and Bruce D. Meyer)
Stigma of welfare participation is important for policy and survey design, because it deters program take-up and increases misreporting. Stigma is also relevant to the literature on social image concerns, yet empirical evidence is scant because stigma is difficult to empirically identify. We use a novel approach to studying stigma by examining the relationship between program participation in a recipient's local network and underreporting program participation in surveys. We find a robust negative relationship and rule out explanations other than stigma. Stigma decreases when more peers engage in the stigmatized behavior and when such actions are less observable.
A Nonparametric Two-Sample Test of Conditional Independence
Assumptions that a continuous and a discrete variable are independent conditional on covariates are ubiquitous in, among others, program evaluation, discrete choice models, missing data problems and studies of teacher or peer effects. Instead of testing conditional independence, current studies at best compare means, which only tests correlation. I propose an assumption-free, non-parametric Kolmogorov test that is simple to implement and has power against all alternatives at distance 1/√N that differ at any point in the joint support of the distributions of the covariates. Other non- and semi-parametric tests can easily be based on the restriction I test. The test can be used for hypotheses that depend on estimated parameters such as a location shift by (conditional) treatment effects or a regression adjustment. Inference can be conducted by simulation.
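To make the idea of comparing whole distributions rather than means concrete, here is a minimal sketch in Python. It is not the test proposed in the paper: it assumes a binary variable `d`, a continuous outcome `y`, and covariates that have already been coarsened into discrete cells `x`, and it uses a within-cell permutation scheme as one simple way to do inference by simulation. All function names are hypothetical.

```python
import numpy as np

def ks_stat(a, b):
    """Two-sample Kolmogorov-Smirnov distance sup_t |F_a(t) - F_b(t)|."""
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

def cell_stat(y, d, x):
    """Largest scaled KS distance between the d=1 and d=0 groups across cells of x."""
    stats = []
    for cell in np.unique(x):
        y1 = y[(x == cell) & (d == 1)]
        y0 = y[(x == cell) & (d == 0)]
        if len(y1) == 0 or len(y0) == 0:
            continue  # no overlap in this cell, so it carries no information
        scale = np.sqrt(len(y1) * len(y0) / (len(y1) + len(y0)))
        stats.append(scale * ks_stat(y1, y0))
    return max(stats) if stats else 0.0

def permutation_pvalue(y, d, x, n_perm=500, seed=0):
    """Simulate the null by shuffling d within each cell of x (assumed scheme)."""
    rng = np.random.default_rng(seed)
    observed = cell_stat(y, d, x)
    cells = [np.flatnonzero(x == cell) for cell in np.unique(x)]
    exceed = 0
    for _ in range(n_perm):
        d_perm = d.copy()
        for idx in cells:
            d_perm[idx] = rng.permutation(d[idx])
        exceed += cell_stat(y, d_perm, x) >= observed
    return (1 + exceed) / (1 + n_perm)
```

Under conditional independence, the labels `d` are exchangeable within each cell, which is what justifies the shuffling step. With continuous covariates, this coarsening approach is only an approximation.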
National Experimental Wellbeing Statistics (with Adam Bee, Joshua Mitchell, Jonathan Rothbaum, Carl Sanders, Lawrence Schmidt, and Matthew Unrath)
This is the U.S. Census Bureau's first release of the National Experimental Wellbeing Statistics (NEWS) project. The NEWS project aims to produce the best possible estimates of income and poverty given all available survey and administrative data. We link survey, decennial census, administrative, and commercial data to address measurement error in income and poverty statistics. We estimate improved (pre-tax money) income and poverty statistics for 2018 by addressing several possible sources of bias documented in prior research. We address biases from (1) unit nonresponse through improved weights, (2) missing income information in both survey and administrative data through improved imputation, and (3) misreporting by combining or replacing survey responses with administrative information. Reducing survey error substantially affects key measures of wellbeing: We estimate median household income is 6.3 percent higher than in the survey estimate, and poverty is 1.1 percentage points lower. These changes are driven by subpopulations for which survey error is particularly relevant. For householders aged 65 and over, median household income is 27.3 percent higher than in the survey estimate and for people aged 65 and over, poverty is 3.3 percentage points lower than the survey estimate. We do not find a significant impact on median household income for householders under 65 or on child poverty. Finally, we discuss plans for future releases: addressing other potential sources of bias, releasing additional years of statistics, extending the income concepts measured, and including smaller geographies such as state and county.
Work in Progress
Please send me an email (nikolasmittag@posteo.de) for further information or a current draft.
How Does Potential Unemployment Insurance Benefit Duration Affect Re-employment Timing and Wages? (with R. Felder and H. Frings)
How potential benefit duration (PBD) of unemployment insurance (UI) affects wages and matching is crucial to evaluate PBD extensions as a policy tool and to understand the causal relation between unemployment, job search and wages. Recent studies provide evidence from quasi-experiments, but disagree even on the sign of (local average) wage effects. We re-visit the regression discontinuity design of Schmieder et al. (2016), but use more detailed data and a wage decomposition to re-analyze the effects of PBD in a framework that allows for unrestricted heterogeneity of both duration and wage effects. Our (preliminary) results indicate treatment effect heterogeneity that casts doubt on simple mechanisms and complicates learning from (local) average effects. Specifically, we first show that duration effects are heterogeneous, which confirms that treatment changes dynamic selection. Our results so far suggest that PBD almost exclusively prolongs unemployment spells ending close to exhaustion points. We examine pre-determined wage components to (partly) separate dynamic selection from dynamic treatment effects. Dynamic selection is potentially non-monotonic and may create spurious treatment effects in our sample. We find that PBD affects wages only through the firm fixed effect, which adds to evidence on the importance of firms. It suggests that wage losses are likely due to firm attributes (such as bargaining power) rather than individual attributes (such as productivity). Purging (parts of) dynamic selection from re-employment wages shows that the steep wage decline is mainly driven by the firm fixed effect, but also by time-varying unobservables. The marked exhaustion effects have at most a small impact on the effect of PBD on wages. Wage effects appear to accumulate through employment at lower-paying firms at short unemployment durations. 
Our current results are work in progress, but demonstrate that heterogeneity in both duration and wage effects alters the interpretation of common analyses and thereby potentially reconciles diverging findings. We develop tools to make progress on important questions in the presence of essential heterogeneity.
The Dynamics of Single Parenthood: New Evidence from Linked Survey and Administrative Data (with D. Wu)
Race, Ethnicity and Measurement Error (with B.D. Meyer and D. Wu)
Publications
Estimating Effects of School Quality using Multiple Proxies. With Pedro Bernal and Javaeria A. Qureshi. Labour Economics (2016).
Misclassification in Binary Choice Models. With Bruce D. Meyer. Journal of Econometrics (2017). Programs
Two Simple Methods to Improve Official Statistics for Small Subpopulations. Survey Research Methods (2018). Programs, Add. Tables
Using Linked Survey and Administrative Data to Better Measure Income: Implications for Poverty, Program Effectiveness and Holes in the Safety Net. With Bruce D. Meyer. AEJ: Applied Economics (2019). Web Appendix Tables
Correcting for Misreporting of Government Benefits. AEJ: Economic Policy (2019). Parameters, Web Appendix
Misreporting of Government Transfers: How Important are Survey Design and Geography? With Bruce D. Meyer. Southern Economic Journal (2019).
Creating Improved Survey Data Products Using Linked Administrative-Survey Data. With Michael Davern and Bruce D. Meyer. Journal of Survey Statistics and Methodology (2019).
Voter Response to Hispanic Sounding Names: Evidence from Down Ballot Statewide Elections. With Suzanne K. Barth and Kyung H. Park. Quarterly Journal of Political Science (2019). Appendix, Working Paper Version
A Simple Method to Estimate Large Fixed Effects Models Applied to Wage Determinants. Labour Economics (2019). Longer IZA Working Paper, Programs, Appendix
Combining Administrative and Survey Data to Improve Income Measurement. With Bruce D. Meyer in Administrative Records for Survey Methodology, ed. A.Y. Chun, M. Larson, J. Reiter and G. Durrant, Wiley: NY. (2021). IZA Working Paper
An Empirical Total Survey Error Decomposition Using Data Combination. With Bruce D. Meyer. Journal of Econometrics (2021).
Errors in Survey Reporting and Imputation and their Effects on Estimates of Food Stamp Program Participation. With Bruce D. Meyer and Robert M. Goerge. Journal of Human Resources (2022).
What Leads to Measurement Error? Evidence from Reports of Program Participation in Three Surveys. With Pablo Celhay and Bruce D. Meyer. Journal of Econometrics (Forthcoming).
Not in Progress
Distributional Impact Analysis Toolkit (with Guadalupe Bedoya, Luca Bitarello and Jonathan Davis)
Program evaluations often focus on average treatment effects. However, average treatment effects miss important aspects of policy evaluation, such as the impact on inequality and whether treatment harms some individuals. A growing literature develops methods to evaluate such issues by examining the distributional impacts of programs and policies. This toolkit reviews methods to do so, focusing on their application to randomized control trials. The paper emphasizes two strands of the literature: estimation of impacts on outcome distributions and estimation of the distribution of treatment impacts. The article then discusses extensions to conditional treatment effect heterogeneity, that is, to analyses of how treatment impacts vary with observed characteristics. The paper offers advice on inference, testing, and power calculations, which are important when implementing distributional analyses in practice. Finally, the paper illustrates select methods using data from two randomized evaluations.
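As a rough illustration of the first strand (impacts on outcome distributions), the sketch below computes quantile treatment effects in a randomized trial as differences between treated and control outcome quantiles, with a simple bootstrap interval. It is a minimal example under assumed inputs, not code from the toolkit. The function names are hypothetical, and resampling within arms is one choice among several.

```python
import numpy as np

def quantile_effects(y, treated, probs):
    """Difference between treated and control outcome quantiles at each prob."""
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    return np.quantile(y[treated], probs) - np.quantile(y[~treated], probs)

def bootstrap_ci(y, treated, probs, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling within each experimental arm."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    y1, y0 = y[treated], y[~treated]
    draws = np.empty((n_boot, len(probs)))
    for b in range(n_boot):
        s1 = rng.choice(y1, size=len(y1), replace=True)
        s0 = rng.choice(y0, size=len(y0), replace=True)
        draws[b] = np.quantile(s1, probs) - np.quantile(s0, probs)
    alpha = 1.0 - level
    # Row 0 holds lower bounds, row 1 holds upper bounds, one column per prob.
    return np.quantile(draws, [alpha / 2, 1 - alpha / 2], axis=0)
```

These differences describe how the outcome distribution shifts. They equal the distribution of individual treatment impacts only under additional assumptions such as rank preservation, which is why the two strands are treated separately.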
Imputations: Benefits, Risks and a Method for Missing Data
Missing data is a frequent problem in economics, either because some variables are missing from a data set or values are missing for some observations. Researchers usually either omit the affected variables and observations or impute them. While the consequences of the former are well understood, the imputation and missing data literature has focused on the conditions under which missing data methods lead to unbiased estimates. These conditions often do not hold, but there is little evidence on the circumstances under which missing data methods improve estimates if the conditions for unbiased estimates are violated. I first examine these conditions by discussing the circumstances under which missing data methods can be beneficial and common sources of bias. I then discuss advantages and problems of common missing data methods. Two important problems are that most methods work well for some models, but poorly for others and that researchers often do not have enough information to use imputed observations in common data sets appropriately. To address these problems, I develop a method based on the conditional density that works well for a wide range of models and allows producers of the data to incorporate private information and expertise, but still allows users of the data to adjust the imputations to their application and use them appropriately. Applications to some common problems show that the conditional density method works well in practice.