Are box office revenues equally unpredictable for all movies? Evidence from a Random forest-based model
In this study we develop a model for early forecasting of box office receipts that, in addition to traditionally used regressors, uses several inputs that have never been used before but proved to be very useful predictors according to our variable importance analysis. The new predictors account for the power of actors and directors, as well as for the intensity of competition at the time of movie release. Instead of the Motion Picture Association of America (MPAA) ratings commonly used in movie success prediction, textual information about the reasons for giving a movie its MPAA rating was formalized using word frequency and principal component analyses. The expert system is based on the Random forest algorithm, which outperformed a stepwise regression and a multilayer perceptron neural network. A regression tree-based diagnostic approach allowed us to detect heterogeneity in model accuracy across segments of the data and to assess the applicability of the model to different movie types.
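The idea behind the regression tree-based diagnostic can be illustrated with a minimal sketch. It is not the procedure used in the study: it searches for a single split (a depth-one regression tree) on the absolute forecast errors of some already fitted model, and the function name `best_split`, the two feature columns, and all numbers are hypothetical values chosen for illustration only.

```python
from statistics import mean


def best_split(rows, abs_errors):
    """Return the single (feature, threshold) split that minimises the
    within-segment sum of squared deviations of the absolute errors."""

    def sse(values):
        # Sum of squared deviations from the segment mean.
        if not values:
            return 0.0
        m = mean(values)
        return sum((v - m) ** 2 for v in values)

    best = None
    for j in range(len(rows[0])):
        distinct = sorted({r[j] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(distinct, distinct[1:]):
            t = (lo + hi) / 2
            left = [e for r, e in zip(rows, abs_errors) if r[j] <= t]
            right = [e for r, e in zip(rows, abs_errors) if r[j] > t]
            score = sse(left) + sse(right)
            if best is None or score < best["score"]:
                best = {
                    "feature": j,
                    "threshold": t,
                    "score": score,
                    "left_mean": mean(left),
                    "right_mean": mean(right),
                }
    return best


# Hypothetical data: columns are (log budget, sequel flag).
# abs_errors stands in for the absolute forecast errors of a fitted model.
rows = [(1.0, 0), (1.5, 1), (2.0, 0), (4.0, 1), (4.5, 0), (5.0, 1)]
abs_errors = [0.9, 1.1, 1.0, 0.2, 0.3, 0.1]

split = best_split(rows, abs_errors)
print(split["feature"], split["threshold"])
```

In this toy data the split falls on the first column, separating a segment with large mean error from one with small mean error, which is the kind of heterogeneity in accuracy that the diagnostic is meant to reveal. In practice a library regression tree, for example scikit-learn's `DecisionTreeRegressor`, could replace the hand-written split search and grow deeper trees.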