- This study introduces LLM-VaR and LLM-ES, novel risk estimation metrics that utilize general-purpose large language models (LLMs) for the forecasting tasks of Value at Risk (VaR) and Expected Shortfall (ES) in a zero-shot setting. Building on the input encoding mechanism of the LLMTime framework, we extend its application by defining new financial risk measures and performing an empirical evaluation of three generations of GPT models, GPT-3.5, GPT-4 and GPT-4o, versus advanced benchmark models such as GARCH with Student innovations and EWMA with Dynamic Conditional Score (DCS).
Financial time series are encoded as numerical strings, allowing for model-free inference without requiring retraining. Results show that LLMs perform well when short rolling windows are used, particularly in volatile markets like cryptocurrencies. GPT-3.5 frequently outperforms or matches the performance of newer models, raising questions about model complexity, alignment, and biases. In contrast, performanceThis study introduces LLM-VaR and LLM-ES, novel risk estimation metrics that utilize general-purpose large language models (LLMs) for the forecasting tasks of Value at Risk (VaR) and Expected Shortfall (ES) in a zero-shot setting. Building on the input encoding mechanism of the LLMTime framework, we extend its application by defining new financial risk measures and performing an empirical evaluation of three generations of GPT models, GPT-3.5, GPT-4 and GPT-4o, versus advanced benchmark models such as GARCH with Student innovations and EWMA with Dynamic Conditional Score (DCS).
Financial time series are encoded as numerical strings, allowing for model-free inference without requiring retraining. Results show that LLMs perform well when short rolling windows are used, particularly in volatile markets like cryptocurrencies. GPT-3.5 frequently outperforms or matches the performance of newer models, raising questions about model complexity, alignment, and biases. In contrast, performance deteriorates with longer windows, where the econometric models prove more reliable. Our findings demonstrate the potential of general-purpose LLMs as adaptive tools for short-horizon financial risk assessment and contribute a first-of-its-kind benchmark for LLM-based VaR/ES estimation.…

