Forecasting railway ticket demand with search query open data
Ilyas Varshavskiy, Elizaveta Stavinova, Petr Chunaev
Abstract
This study addresses railway demand forecasting using open data from a passenger railway company and from search engines. A time series of web search queries serves as the predictor, and the time series of train ticket demand is the target variable. The predictor is lagged by the offset at which its correlation with the demand series is strongest. Forecasts are produced with LSTM, MV-LSTM, ARIMA, SARIMA, ARIMAX, and SARIMAX models. The ARIMA-based models are evaluated in 39-day and 1-day forecasting experiments. The SARIMAX model showed slightly better results in the 1-day experiments, whereas the MV-LSTM model improved the metrics significantly thanks to the predictor. Across the experiments, web search queries prove useful as a predictor of passenger demand for rail tickets: relative to models trained without search queries, the best model improved by up to 1.43 percentage points in MAPE and by up to 76 in RMSE, the latter measured in tickets sold.
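The lag selection described in the abstract can be illustrated with a minimal sketch. It assumes that "best correlation" means the highest Pearson correlation between the search series shifted back by `lag` days and the demand series. The function names `pearson` and `best_lag`, the `max_lag` search bound, and the synthetic data are hypothetical and are not taken from the paper or its code.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def best_lag(predictor, target, max_lag):
    """Return (lag, r) where predictor[t - lag] correlates most strongly with target[t].

    Assumption: lags from 0 to max_lag days are tried, and "best" means the
    largest Pearson coefficient, not the largest absolute value.
    """
    if len(predictor) != len(target):
        raise ValueError("series must have the same length")
    best = (0, -math.inf)
    for lag in range(max_lag + 1):
        shifted = predictor[: len(predictor) - lag]  # predictor values at t - lag
        aligned = target[lag:]                       # target values at t
        r = pearson(shifted, aligned)
        if r > best[1]:
            best = (lag, r)
    return best


# Synthetic illustration only: demand follows searches with a 3-day delay.
random.seed(0)
searches = [random.gauss(0.0, 1.0) for _ in range(200)]
demand = [random.gauss(0.0, 0.1) for _ in range(3)] + [
    s + random.gauss(0.0, 0.1) for s in searches[:-3]
]

lag, r = best_lag(searches, demand, max_lag=14)
print(f"selected lag: {lag} days, correlation: {r:.3f}")
```

The selected lag would then be applied to the search series before it is passed as the exogenous input to a model such as ARIMAX, SARIMAX, or MV-LSTM.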