Handling Missing Values in Information Systems Research: A Review of Methods and Assumptions

In today’s big data environment, missing values continue to be a problem that harms data quality. The bias caused by missing values raises the greatest concern because it cannot be eliminated simply by increasing the sample size. Although the statistics literature has developed approaches to handling missing values and formulated the assumptions under which these approaches yield valid statistical inferences, these prescriptions have yet to be broadly accepted in many social science disciplines, including the Information Systems (IS) discipline. By reviewing recently published empirical research in IS, we find that missing values are indeed an important and pervasive problem. We believe that a review of missing value theory is necessary and timely for the IS community, both to understand the nature of missing values and to promote more rigorous research practice in settings where missing values are often unavoidable. In addition, the missing not at random (MNAR) mechanism poses challenges for parameter estimation. We contribute to research practice by proposing a Monte Carlo likelihood approach and demonstrating its superior performance in correcting bias in parameter estimation. We conclude by suggesting that research validity can be enhanced through the reasoned adoption of missing value handling methods and structured missing value reporting practices.
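
As an illustration of why bias from missing values does not shrink with sample size, the following minimal simulation sketch may help. It is not the Monte Carlo likelihood approach referred to above; it only computes a naive complete-case mean under an assumed MNAR mechanism. The data-generating model (a standard normal variable with true mean 0), the logistic missingness rule, and the function name `complete_case_mean` are all hypothetical choices made for this example.

```python
import numpy as np


def complete_case_mean(n, seed=0):
    """Naive mean of the observed values under an assumed MNAR mechanism."""
    rng = np.random.default_rng(seed)
    # Hypothetical variable of interest: standard normal, true mean = 0.
    y = rng.normal(loc=0.0, scale=1.0, size=n)
    # Assumed MNAR rule: the probability of being observed depends on y
    # itself, so larger values are more likely to be missing.
    p_observed = 1.0 / (1.0 + np.exp(y))
    observed = rng.random(n) < p_observed
    # Complete-case analysis: drop the missing values and average the rest.
    return y[observed].mean()


# The estimate settles on a value away from the true mean as n grows.
results = {n: complete_case_mean(n) for n in (1_000, 100_000, 1_000_000)}
for n, estimate in results.items():
    print(f"n = {n:>9,d}   complete-case mean = {estimate:+.3f}   (true mean = 0)")
```

Under these assumptions, a larger sample makes the complete-case estimate more precise but not more accurate: it stabilizes around a value below the true mean rather than converging to it.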