Abstract
In option pricing, parametric models originating from the Black-Scholes-Merton framework have proven extremely persistent. However, machine learning models have recently entered the field with success, arguably owing to their flexible, non-parametric nature. A recently proposed LSTM-MLP deep learning architecture combines time series data with cross-sectional pricing information and avoids explicit volatility estimates. This LSTM-MLP model outperforms relevant benchmarks along several dimensions. In this research, we investigated whether a transformer-based alternative captures the inter-temporal characteristics of the data better than the LSTM-MLP model. We found that, although the transformer performs better during the extreme market conditions of COVID-19, the LSTM-MLP architecture is superior overall.