In this study we develop a model for forecasting early box office receipts that, in addition to traditionally used regressors, uses several inputs that have not been used before but proved to be very useful predictors according to our variable importance analysis. The new predictors account for the power of actors and directors, as well as for the intensity of competition at the time of a movie's release. Instead of the Motion Picture Association of America (MPAA) ratings commonly used in movie success prediction, we formalized the textual information about the reasons for giving a movie its MPAA rating using word frequency and principal component analyses. The expert system is based on the random forest algorithm, which outperformed stepwise regression and a multilayer perceptron neural network. A regression tree-based diagnostic approach allowed us to detect heterogeneity of model accuracy across data segments and to assess the applicability of the model to different movie types.
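To make the idea of formalizing MPAA rating reasons concrete, the following is a minimal sketch, not the authors' pipeline. It assumes the reasons are available as short strings, builds a word-frequency (document-term) matrix, and projects it onto its leading principal components via an SVD of the centered matrix. The example reason strings, the `STOP_WORDS` set, and the function names `word_frequency_matrix` and `principal_component_scores` are all hypothetical illustrations.

```python
import re
from collections import Counter

import numpy as np

# Hypothetical stop-word list: function words and rating labels carry no "reason" content.
STOP_WORDS = {"a", "an", "and", "for", "of", "the", "some", "rated", "r", "pg", "nc"}


def word_frequency_matrix(texts, min_count=1):
    """Document-term count matrix built from MPAA rating-reason strings."""
    tokenized = [
        [w for w in re.findall(r"[a-z]+", t.lower()) if w not in STOP_WORDS]
        for t in texts
    ]
    totals = Counter(w for tokens in tokenized for w in tokens)
    vocab = sorted(w for w, c in totals.items() if c >= min_count)
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(texts), len(vocab)))
    for row, tokens in enumerate(tokenized):
        for w in tokens:
            if w in index:
                X[row, index[w]] += 1
    return X, vocab


def principal_component_scores(X, n_components=2):
    """Scores of each movie on the leading principal components of X."""
    Xc = X - X.mean(axis=0)  # center each word-frequency column
    # Rows of Vt from the SVD of the centered matrix are the principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


# Hypothetical rating-reason texts for four movies.
reasons = [
    "Rated R for strong violence and language",
    "Rated PG-13 for sequences of action violence",
    "Rated PG for mild language and thematic elements",
    "Rated R for sexual content, language and drug use",
]
X, vocab = word_frequency_matrix(reasons)
scores = principal_component_scores(X, n_components=2)
print(scores.shape)  # (4, 2)
```

The component scores, rather than the raw rating category, would then enter the forecasting model as numeric regressors; in practice one would fit the vocabulary and principal axes on training movies only.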
Forecasting demand and understanding sales drivers are among the most important tasks in retail analytics. However, sales modeling has traditionally relied on linear models and/or models with a small number of predictors. Real-world demand is determined by complex substitution and complementation patterns among a large number of interrelated SKUs, by nonlinear effects of prices, promotions, and seasonality, and by many other factors, their lagged values, and their interactions; a realistic model has to be able to account for all of these. We propose a conceptual model for sales modeling based on standard POS data available to any retailer and, following it, generate almost 500 potentially useful predictors of a focal SKU's sales. In our comparison of three classes of models, Gradient Boosting Machines outperformed Random Forests and Elastic Nets. Using interpretable machine learning methods, we derived actionable insights into the importance of the various groups of predictors from the conceptual model and demonstrated how helpful it can be for marketing managers to decompose predictions into the effects of individual regressors using an approximation of Shapley values for feature attribution.
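One common way to approximate Shapley values for a single prediction is permutation sampling: features are switched from a reference instance to the instance being explained in random orders, and each feature is credited with the average change in the prediction when it is switched. The sketch below illustrates that technique under stated assumptions; it is not necessarily the approximation used in the study. It assumes a model-agnostic `predict` function, a single fixed baseline instance (SHAP-style tools usually average over a background dataset instead), and a hypothetical `toy_sales_model` with a price-promotion interaction standing in for a fitted Gradient Boosting Machine.

```python
import random


def shapley_permutation_estimate(predict, x, baseline, n_permutations=200, seed=0):
    """Monte Carlo (permutation-sampling) approximation of Shapley values.

    Features not yet "switched on" take their value from a single baseline
    instance; this is a simplifying assumption.
    """
    rng = random.Random(seed)
    d = len(x)
    phi = [0.0] * d
    for _ in range(n_permutations):
        order = list(range(d))
        rng.shuffle(order)
        z = list(baseline)
        prev = predict(z)
        for j in order:
            z[j] = x[j]  # add feature j to the coalition
            cur = predict(z)
            phi[j] += cur - prev  # marginal contribution of j in this ordering
            prev = cur
    return [p / n_permutations for p in phi]


# Hypothetical sales model with a price-promotion interaction.
def toy_sales_model(z):
    price, promo, season = z
    return 100 - 8 * price + 15 * promo - 3 * price * promo + 5 * season


x = [4.0, 1.0, 0.5]  # focal week: price, promo flag, seasonality index
baseline = [5.0, 0.0, 0.0]  # hypothetical reference week
phi = shapley_permutation_estimate(toy_sales_model, x, baseline)
# Efficiency: attributions sum (up to rounding) to f(x) - f(baseline).
print(sum(phi), toy_sales_model(x) - toy_sales_model(baseline))
```

Because every sampled ordering telescopes from the baseline prediction to the focal prediction, the attributions always add up to the difference between the two, which is what lets a manager read a forecast as a sum of per-regressor effects; the split of the interaction effect between price and promotion is what the sampling estimates.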