Working Papers
Errors in Reporting and Imputation of Government Benefits and Their Implications (with Pablo Celhay and Bruce D. Meyer)
Recent studies document extensive errors in household surveys that bias important estimates. We link administrative cash welfare and SNAP records to three major U.S. household surveys to answer important questions these studies raise. First, we show that survey data misrepresent patterns of participation in multiple programs and thereby distort total transfers to the poorest households and how households navigate the complex welfare system. On the positive side, survey data capture the reach of the safety net better than receipt of individual programs. Second, we examine error due to item non-response and imputation, as well as whether imputation improves estimates. Item non-respondents have higher receipt rates than the population, even conditional on many covariates. The assumptions for consistent estimates in multivariate models fail both when excluding item non-respondents and when using the imputed values. In models of program receipt, estimates from the linked data favor excluding item non-respondents rather than using their imputed values. We show that such analyses can help researchers make more informed decisions on the use of imputed values. Along the way, we extend prior work by documenting for other programs and geographies a pattern of pronounced misreporting that varies with key demographics. Our estimates allow researchers to gauge or correct bias in models of program participation, because they predict when these biases are (or are not) large. Most notably, the survey error we document causes the differences in receipt rates between Blacks or Hispanics and whites to be 1.4 to 6 times larger than survey estimates suggest.
How Does Potential Unemployment Insurance Benefit Duration Affect Re-employment Timing and Wages? (with Rahel Felder and Hanna Frings)
Recent papers identify the effects of unemployment insurance and potential benefit duration (PBD) on unemployment duration and reemployment wages using quasi-experiments. To make known problems of heterogeneity in quasi-experiments tractable, they often use models of job search, but we argue that letting the data speak without restrictions remains surprisingly informative. We focus on two broad questions: How informative are the local average effects that quasi-experiments identify, and what can we learn about causes and mechanisms from quasi-experiments in the presence of heterogeneous treatment effects? We first lay out a framework for treatment effect heterogeneity with two interdependent outcomes, such as duration and wages, and then re-examine the effects of longer PBD in Schmieder et al. (2016). Local average effects become more informative when complemented with other parameters identified by (quasi-)randomization: Duration effects of PBD almost exclusively prolong a few long spells, which helps to explain differences between studies. Dynamic selection into reemployment timing is non-monotonic, but does not change with PBD at short durations, so dynamic treatment effects are identified there. For wage effects of PBD, we find neither evidence of positive effects nor meaningful heterogeneity. Even though key structural parameters are not identified, because LATE confounds average effects with the covariance of first- and second-stage effects, the data remain informative about causes and mechanisms. A wage decomposition shows that wage loss operates through the firm fixed effect, which speaks against individual-based causes such as skill depreciation or bargaining. Using dynamic treatment effects and mediation analyses, we find that PBD affects wages even for workers who do not change their unemployment duration, i.e., directly. The negative direct effect we find casts doubt on key assumptions of common models of job search.
OLS with Heterogeneous Coefficients
Regressors often have heterogeneous effects in the social sciences, which are usually modeled as unit-specific slopes. OLS is frequently applied to these correlated coefficient models. I first show that without restrictions on the relation between slopes and regressors, OLS estimates can take any value, including negative values when all individual slopes are positive. I derive a simple formula for the bias in the OLS estimates, which depends on the covariance of the slopes with the squared regressor. While instrumental variable methods still allow estimation of (local) average effects under the additional assumptions that the instrument is independent of the coefficients in the first-stage and reduced-form equations, the results here imply complicated biases when these assumptions fail.
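A minimal simulation (mine, not the paper's; all names and parameter values are illustrative) shows the headline result: every unit-specific slope below is positive, yet because small slopes coincide with large values of the regressor, OLS with an intercept returns a negative coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Regressor takes two values; units with the large regressor value
# happen to have small (but still positive) slopes.
x = rng.choice([1.0, 3.0], size=n)
b = np.where(x == 1.0, 5.0, 0.1)           # every individual slope is positive
y = 1.0 + b * x + rng.normal(0.0, 0.1, n)

# OLS of y on x with an intercept
X = np.column_stack([np.ones(n), x])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(f"average individual slope: {b.mean():.2f}")  # about 2.55 (positive)
print(f"OLS slope estimate:       {beta_ols:.2f}")  # about -2.35 (negative)
```

The sign reversal is driven entirely by the covariance between the slopes and the regressor, consistent with the bias formula the abstract describes.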
A Nonparametric Two-Sample Test of Conditional Independence
Assumptions that a continuous and a discrete variable are independent conditional on covariates are ubiquitous in program evaluation, discrete choice models, missing data problems and studies of teacher or peer effects, among other areas. Instead of testing conditional independence, current studies at best compare means, which only tests correlation. I propose an assumption-free, non-parametric Kolmogorov test that is simple to implement and has power against all alternatives at distance 1/√N that differ at any point in the joint support of the distributions of the covariates. Other non- and semi-parametric tests can easily be based on the restriction I test. The test can be used for hypotheses that depend on estimated parameters such as a location shift by (conditional) treatment effects or a regression adjustment. Inference can be conducted by simulation.
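The paper's test handles continuous covariates and hypotheses with estimated parameters; as a hedged sketch of the underlying idea only, the stylized version below restricts attention to a discrete covariate, computes a Kolmogorov-Smirnov-type statistic within covariate cells, and conducts inference by permuting the discrete variable within cells (function names and structure are mine, not the paper's estimator).

```python
import numpy as np

def cond_ks_stat(y, d, x):
    """Largest two-sample KS distance between the distributions of y
    for d=0 and d=1, taken over covariate cells. Assumes x is discrete."""
    stat = 0.0
    for cell in np.unique(x):
        m = x == cell
        y0 = np.sort(y[m & (d == 0)])
        y1 = np.sort(y[m & (d == 1)])
        if len(y0) == 0 or len(y1) == 0:
            continue
        grid = np.concatenate([y0, y1])
        F0 = np.searchsorted(y0, grid, side="right") / len(y0)
        F1 = np.searchsorted(y1, grid, side="right") / len(y1)
        stat = max(stat, np.abs(F0 - F1).max())
    return stat

def permutation_pvalue(y, d, x, n_perm=999, seed=0):
    """Permute d within covariate cells; exchangeability within cells
    holds under H0: y independent of d conditional on x."""
    rng = np.random.default_rng(seed)
    observed = cond_ks_stat(y, d, x)
    exceed = 0
    for _ in range(n_perm):
        d_perm = d.copy()
        for cell in np.unique(x):
            m = x == cell
            d_perm[m] = rng.permutation(d[m])
        exceed += cond_ks_stat(y, d_perm, x) >= observed
    return (1 + exceed) / (1 + n_perm)
```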
National Experimental Wellbeing Statistics (with Adam Bee, Joshua Mitchell, Jonathan Rothbaum, Carl Sanders, Lawrence Schmidt, and Matthew Unrath)
This is the U.S. Census Bureau's first release of the National Experimental Wellbeing Statistics (NEWS) project. The NEWS project aims to produce the best possible estimates of income and poverty given all available survey and administrative data. We link survey, decennial census, administrative, and commercial data to address measurement error in income and poverty statistics. We estimate improved (pre-tax money) income and poverty statistics for 2018 by addressing several possible sources of bias documented in prior research. We address biases from (1) unit nonresponse through improved weights, (2) missing income information in both survey and administrative data through improved imputation, and (3) misreporting by combining or replacing survey responses with administrative information. Reducing survey error substantially affects key measures of wellbeing: We estimate median household income is 6.3 percent higher than in the survey estimate, and poverty is 1.1 percentage points lower. These changes are driven by subpopulations for which survey error is particularly relevant. For householders aged 65 and over, median household income is 27.3 percent higher than in the survey estimate and for people aged 65 and over, poverty is 3.3 percentage points lower than the survey estimate. We do not find a significant impact on median household income for householders under 65 or on child poverty. Finally, we discuss plans for future releases: addressing other potential sources of bias, releasing additional years of statistics, extending the income concepts measured, and including smaller geographies such as state and county.
Work in Progress
Please send me an email (nikolasmittag@posteo.de) for further information or a current draft.
The Dynamics of Single Parenthood: New Evidence from Linked Survey and Administrative Data (with D. Wu)
Imputing Government Transfers (with B.D. Meyer and D. Wu)
Publications
Estimating Effects of School Quality using Multiple Proxies. With Pedro Bernal and Javaeria A. Qureshi. Labour Economics (2016).
Misclassification in Binary Choice Models. With Bruce D. Meyer. Journal of Econometrics (2017).
Two Simple Methods to Improve Official Statistics for Small Subpopulations. Survey Research Methods (2018).
Using Linked Survey and Administrative Data to Better Measure Income: Implications for Poverty, Program Effectiveness and Holes in the Safety Net. With Bruce D. Meyer. AEJ: Applied Economics (2019).
Correcting for Misreporting of Government Benefits. AEJ: Economic Policy (2019).
Misreporting of Government Transfers: How Important are Survey Design and Geography? With Bruce D. Meyer. Southern Economic Journal (2019).
Creating Improved Survey Data Products Using Linked Administrative-Survey Data. With Michael Davern and Bruce D. Meyer. Journal of Survey Statistics and Methodology (2019).
Voter Response to Hispanic Sounding Names: Evidence from Down Ballot Statewide Elections. With Suzanne K. Barth and Kyung H. Park. Quarterly Journal of Political Science (2019).
A Simple Method to Estimate Large Fixed Effects Models Applied to Wage Determinants. Labour Economics (2019).
Combining Administrative and Survey Data to Improve Income Measurement. With Bruce D. Meyer. In Administrative Records for Survey Methodology, ed. A.Y. Chun, M. Larson, J. Reiter and G. Durrant. Wiley: NY (2021).
An Empirical Total Survey Error Decomposition Using Data Combination. With Bruce D. Meyer. Journal of Econometrics (2021).
Errors in Survey Reporting and Imputation and their Effects on Estimates of Food Stamp Program Participation. With Bruce D. Meyer and Robert M. Goerge. Journal of Human Resources (2022).
What Leads to Measurement Error? Evidence from Reports of Program Participation in Three Surveys. With Pablo Celhay and Bruce D. Meyer. Journal of Econometrics (2024).
Race, Ethnicity and Measurement Error. With Bruce D. Meyer and Derek Wu. In Race, Ethnicity, and Economic Statistics for the 21st Century, ed. R. Akee, L.F. Katz and M. Loewenstein. University of Chicago Press: Chicago, IL (forthcoming).
Stigma in Welfare Programs. With Pablo Celhay and Bruce D. Meyer. Review of Economics and Statistics (conditionally accepted).
Not in Progress
Distributional Impact Analysis Toolkit (with Guadalupe Bedoya, Luca Bitarello and Jonathan Davis)
Program evaluations often focus on average treatment effects. However, average treatment effects miss important aspects of policy evaluation, such as the impact on inequality and whether treatment harms some individuals. A growing literature develops methods to evaluate such issues by examining the distributional impacts of programs and policies. This toolkit reviews methods to do so, focusing on their application to randomized controlled trials. It emphasizes two strands of the literature: estimation of impacts on outcome distributions and estimation of the distribution of treatment impacts. It then discusses extensions to conditional treatment effect heterogeneity, that is, to analyses of how treatment impacts vary with observed characteristics. The toolkit offers advice on inference, testing, and power calculations, which are important when implementing distributional analyses in practice. Finally, it illustrates select methods using data from two randomized evaluations.
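As a hedged illustration of one method in the first strand (impacts on outcome distributions), the sketch below computes differences in marginal quantiles across the arms of an RCT with bootstrap intervals; it is a generic textbook version, not the toolkit's code, and the function name and defaults are mine.

```python
import numpy as np

def qte_rct(y_treat, y_ctrl, taus=(0.1, 0.25, 0.5, 0.75, 0.9),
            n_boot=999, seed=0):
    """Quantile 'treatment effects' as differences in marginal quantiles
    between treatment and control arms, with bootstrap 95% intervals.
    Note: these equal quantiles of the impact distribution only under
    additional assumptions such as rank invariance."""
    rng = np.random.default_rng(seed)
    taus = np.asarray(taus)
    qte = np.quantile(y_treat, taus) - np.quantile(y_ctrl, taus)
    boot = np.empty((n_boot, len(taus)))
    for i in range(n_boot):
        t = rng.choice(y_treat, size=len(y_treat), replace=True)
        c = rng.choice(y_ctrl, size=len(y_ctrl), replace=True)
        boot[i] = np.quantile(t, taus) - np.quantile(c, taus)
    lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)
    return qte, lo, hi
```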
Imputations: Benefits, Risks and a Method for Missing Data
Missing data is a frequent problem in economics, either because some variables are missing from a data set or because values are missing for some observations. Researchers usually either omit the affected variables and observations or impute them. While the consequences of the former are well understood, the imputation and missing data literature has focused on the conditions under which imputation methods lead to unbiased estimates. These conditions often do not hold, but there is little evidence on the circumstances under which missing data methods improve estimates when the conditions for unbiased estimates are violated. I first examine these conditions by discussing the circumstances under which missing data methods can be beneficial and common sources of bias. I then discuss advantages and problems of common missing data methods. Two important problems are that most methods work well for some models but poorly for others, and that researchers often do not have enough information to use imputed observations in common data sets appropriately. To address these problems, I develop a method based on the conditional density that works well for a wide range of models and allows producers of the data to incorporate private information and expertise, while still allowing users of the data to adjust the imputations to their application and use them appropriately. Applications to some common problems show that the conditional density method works well in practice.
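As a hedged sketch of the general idea only, the illustration below assumes a normal linear model for the conditional density, far simpler than the paper's method: it draws imputations from the estimated conditional distribution rather than imputing the conditional mean, preserving the conditional variability that mean imputation destroys. Names and modeling choices are mine, not the paper's.

```python
import numpy as np

def draw_conditional_imputations(x_obs, y_obs, x_mis, n_draws=5, seed=0):
    """Fit a normal linear model for y | x on complete cases, then draw
    missing y values from the estimated conditional density N(x'b, s^2)
    instead of plugging in the conditional mean x'b."""
    rng = np.random.default_rng(seed)
    Xo = np.column_stack([np.ones(len(x_obs)), x_obs])
    beta, *_ = np.linalg.lstsq(Xo, y_obs, rcond=None)
    resid = y_obs - Xo @ beta
    sigma = resid.std(ddof=Xo.shape[1])
    Xm = np.column_stack([np.ones(len(x_mis)), x_mis])
    mu = Xm @ beta
    # one column per multiple-imputation draw
    return mu[:, None] + sigma * rng.normal(size=(len(x_mis), n_draws))
```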