Abstract
Transformer-based models have become very popular in time series forecasting, largely owing to their success at capturing contextual information in natural language processing tasks. However, their application to time series has yielded mixed results. Reported performance improvements of Transformers in this domain are usually measured with metrics such as MSE or MAE and amount to only marginal percentage gains. We include statistical tests to examine the significance of such performance differences. Moreover, comparisons rarely account for the uncertainty that training such complex models entails. We investigate this by examining the robustness, in terms of training stability, of Transformer-based models compared with simpler linear long-term time series forecasting (LTSF-Linear) models. Instead of focusing on adversarial or anomaly robustness, we measure robustness as the standard deviation of results under perturbations common to data-driven algorithms, such as parameter initialization and data splits. By analyzing model performance, robustness, and statistical significance across various benchmark datasets, we find that Transformer-based models generally perform worse in our experiments, as other works have also pointed out, and exhibit significantly higher variability of results, indicating a lack of robustness. This study highlights the challenges of using Transformer architectures for time series forecasting and underscores the importance of considering model robustness and stability, as well as the statistical significance of results, when choosing time series forecasting algorithms.