Mind the gap: Accounting for measurement error and misclassification in variables generated via data mining

Authors

Mochen Yang, Gediminas Adomavicius, Gordon Burtch, Yuqing Ren

Publication date

2018/3

Journal

Information Systems Research

Pages

4-24

Publisher

INFORMS

Description

The application of predictive data mining techniques in information systems research has grown in recent years, likely because of their effectiveness and scalability in extracting information from large amounts of data. A number of scholars have sought to combine data mining with traditional econometric analyses. Typically, data mining methods are first used to generate new variables (e.g., text sentiment), which are added into subsequent econometric models as independent regressors. However, because prediction is almost always imperfect, variables generated from the first-stage data mining models inevitably contain measurement error or misclassification. These errors, if ignored, can introduce systematic biases into the second-stage econometric estimations and threaten the validity of statistical inference. In this commentary, we examine the nature of this bias, both analytically and empirically, and show that it …
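
The bias described above can be illustrated with a small simulation. This is a minimal sketch, not the authors' analysis. It assumes a simple linear second-stage model with a single regressor, nondifferential errors (errors unrelated to the outcome), and hypothetical parameter values: a true coefficient beta = 2, a 10% label-flip rate for the binary "predicted" variable, and additive noise with the same variance as the continuous regressor. The function names `ols_slope` and `simulate` are illustrative only.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=100_000, beta=2.0, flip=0.1, noise_sd=1.0, seed=0):
    """Return OLS slopes estimated with true and error-laden regressors.

    All parameter values are hypothetical illustrations, not taken from the paper.
    """
    rng = random.Random(seed)

    # Binary regressor (e.g., a true sentiment label) and a misclassified
    # stand-in for a first-stage prediction: each label flips with prob. `flip`.
    x_bin = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    x_bin_hat = [1 - x if rng.random() < flip else x for x in x_bin]
    y_bin = [1.0 + beta * x + rng.gauss(0, noise_sd) for x in x_bin]

    # Continuous regressor observed with independent additive noise
    # (classical measurement error) whose variance equals var(x) = 1.
    x_con = [rng.gauss(0, 1) for _ in range(n)]
    x_con_hat = [x + rng.gauss(0, 1) for x in x_con]
    y_con = [1.0 + beta * x + rng.gauss(0, noise_sd) for x in x_con]

    return {
        "binary_true": ols_slope(x_bin, y_bin),
        "binary_predicted": ols_slope(x_bin_hat, y_bin),
        "continuous_true": ols_slope(x_con, y_con),
        "continuous_predicted": ols_slope(x_con_hat, y_con),
    }


if __name__ == "__main__":
    for name, slope in simulate().items():
        print(f"{name:>22}: {slope:.3f}")
```

Under these assumptions the slopes on the true regressors stay near 2, while the slopes on the error-laden regressors shrink toward zero: roughly beta × (1 − 2 × flip) = 1.6 for the balanced binary case, and roughly beta × var(x) / (var(x) + var(noise)) = 1.0 for the continuous case. Attenuation toward zero is the textbook outcome only for these simple error structures; errors that are correlated with the outcome or with other regressors can bias estimates in either direction.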

Total citations

Cited by 93

Citations per year: 2018: 6; 2019: 8; 2020: 15; 2021: 14; 2022: 19; 2023: 20; 2024: 11