Title: Transfer Learning in Dynamic Data Environments
Abstract:
Despite advances in analytics techniques and access to abundant data, making accurate predictions remains challenging in changing data environments, because analysts lack sufficient information to adjust prediction models in a timely manner. Data analysts face two trade-offs: 1) whether to use the large but potentially less relevant historical source data, and 2) whether to adjust the prediction model immediately when a change in the data is suspected, or later, when more relevant post-change source data become available. These trade-offs are related to the fundamental bias-variance and exploration-exploitation trade-offs. To help data analysts deal with these trade-offs, we develop theoretical insights by adopting a sample selection perspective to represent changes in data patterns. Based on the sample selection model, we propose a transfer learning framework that leverages the large historical source data. The framework allows us to study the two fundamental trade-offs theoretically. Our analysis shows that using historical source data offers relatively larger benefits when the prediction model has higher complexity and when the extent of change is not too large. However, the benefit also varies with the scarcity of post-change source data, and thus with the timing of adjusting the prediction model. The analysis further shows that the decisions on the two trade-offs interact with each other. These findings are corroborated by Monte Carlo simulations in a data change detection and model adjustment context. Overall, this study provides theoretical insights and practical guidelines for applying predictive analytics in changing data environments.
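The bias-variance side of the first trade-off (pool the historical data or not) can be illustrated with a toy Monte Carlo sketch. This is not the sample-selection transfer learning framework described above. It assumes a hypothetical Gaussian mean shift of size `delta` between historical and post-change data, and it compares two simple estimators of the post-change mean. The function `simulate_mse` and its parameters are illustrative names only.

```python
import random
import statistics


def simulate_mse(delta, n_hist, n_post, sigma=1.0, n_reps=500, seed=0):
    """Monte Carlo MSE of two estimators of the post-change mean.

    Hypothetical toy setup (NOT the paper's sample-selection model):
    historical data ~ N(0, sigma^2), post-change data ~ N(delta, sigma^2).
    The estimation target is the post-change mean, delta.
    Returns (mse_post_only, mse_pooled).
    """
    rng = random.Random(seed)
    se_post, se_pool = [], []
    for _ in range(n_reps):
        hist = [rng.gauss(0.0, sigma) for _ in range(n_hist)]
        post = [rng.gauss(delta, sigma) for _ in range(n_post)]
        # Post-change data only: unbiased, but variance sigma^2 / n_post.
        est_post = statistics.fmean(post)
        # Pooled data: variance sigma^2 / (n_hist + n_post), but biased
        # toward the pre-change mean by about delta * n_hist / (n_hist + n_post).
        est_pool = statistics.fmean(hist + post)
        se_post.append((est_post - delta) ** 2)
        se_pool.append((est_pool - delta) ** 2)
    return statistics.fmean(se_post), statistics.fmean(se_pool)


if __name__ == "__main__":
    # Vary the extent of change with scarce post-change data.
    for delta in (0.05, 0.5, 2.0):
        mse_post, mse_pool = simulate_mse(delta, n_hist=500, n_post=10)
        print(f"delta={delta}: post-only={mse_post:.4f}, pooled={mse_pool:.4f}")
    # Vary the amount of post-change data (i.e. the timing of adjustment).
    for n_post in (5, 200):
        mse_post, mse_pool = simulate_mse(0.2, n_hist=500, n_post=n_post)
        print(f"n_post={n_post}: post-only={mse_post:.4f}, pooled={mse_pool:.4f}")
```

Under these assumptions, pooling should have lower MSE when the shift is small relative to the post-change sampling noise, and lose its advantage as the shift grows or as more post-change data accumulate. This mirrors the qualitative dependence on the extent of change and on data scarcity; the toy model does not capture the model-complexity dimension.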