In the beginning was the word: LLM-VaR and LLM-ES

Pele, Daniel Traian; Bolovăneanu, Vlad; Lin, Min-Bin; Ren, Rui; Ginavar, Andrei Theodor; Spilak, Bruno; Andrei, Alexandru-Victor; Toma, Filip-Mihai; Lessmann, Stefan; Härdle, Wolfgang Karl

doi:10.1016/j.eswa.2025.128676

This study introduces LLM-VaR and LLM-ES, novel risk metrics that use general-purpose large language models (LLMs) to forecast Value at Risk (VaR) and Expected Shortfall (ES) in a zero-shot setting. Building on the input encoding mechanism of the LLMTime framework, we extend its application by defining new financial risk measures and empirically evaluating three generations of GPT models (GPT-3.5, GPT-4 and GPT-4o) against advanced benchmark models such as GARCH with Student-t innovations and EWMA with Dynamic Conditional Score (DCS). Financial time series are encoded as numerical strings, allowing model-free inference without retraining. Results show that LLMs perform well when short rolling windows are used, particularly in volatile markets such as cryptocurrencies. GPT-3.5 frequently outperforms or matches newer models, raising questions about model complexity, alignment, and biases. In contrast, performance deteriorates with longer windows, where the econometric models prove more reliable. Our findings demonstrate the potential of general-purpose LLMs as adaptive tools for short-horizon financial risk assessment and contribute a first-of-its-kind benchmark for LLM-based VaR/ES estimation.
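The abstract states that financial time series are encoded as numerical strings, building on the LLMTime input encoding. The following is a simplified, assumed version of such a scheme, not the authors' code: the series is rescaled so that the 95th percentile of absolute values maps to 1, rounded to a fixed number of digits with the decimal point dropped, digits are separated by spaces (intended so that each digit is tokenized individually) and time steps by commas. The names `encode_series` and `decode_series`, the scaling rule and the default parameters are illustrative assumptions; the GPT call that would continue the string is not shown.

```python
import numpy as np

def encode_series(values, precision=2, alpha=0.95):
    """Encode a 1-D series as a spaced digit string (simplified, LLMTime-style sketch)."""
    x = np.asarray(values, dtype=float)
    # Assumed scaling rule: map the alpha-quantile of |x| to 1 (fall back to 1.0 if zero).
    scale = float(np.quantile(np.abs(x), alpha)) or 1.0
    tokens = []
    for v in (x / scale).tolist():
        sign = "-" if v < 0 else ""
        # Fixed precision with the decimal point dropped: 1.08 -> "108".
        digits = str(int(round(abs(v) * 10**precision)))
        # Space-separate the digits so each one is its own token.
        tokens.append(sign + " ".join(digits))
    # Time steps are separated by " , ".
    return " , ".join(tokens), scale

def decode_series(text, scale, precision=2):
    """Invert encode_series, e.g. on a model-generated continuation."""
    values = []
    for tok in text.split(","):
        tok = tok.replace(" ", "")
        if tok in ("", "-"):
            continue  # skip empty or sign-only fragments
        values.append(int(tok) / 10**precision * scale)
    return np.array(values)

returns = [0.012, -0.034, 0.005, 0.021, -0.008]
prompt, scale = encode_series(returns)
print(prompt)
print(decode_series(prompt, scale))
```

Decoding recovers each value up to a rounding error of at most half a unit in the last kept digit, multiplied by `scale`; a higher `precision` trades longer prompts for a finer grid.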
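Once sampled continuations are decoded back into next-period returns, VaR and ES can be read off the empirical predictive distribution. This sketch assumes that convention (losses reported as positive numbers, VaR as the negated alpha-quantile of the draws, ES as the negated mean of draws at or below it) and applies it over a rolling window. `sampler` is a hypothetical stand-in for the LLM: any function that maps a window of past returns to an array of simulated next-period returns. All names and the default alpha are assumptions, not taken from the paper.

```python
import numpy as np

def var_es_from_samples(samples, alpha=0.01):
    """Empirical VaR and ES, reported as positive losses, from simulated returns."""
    r = np.asarray(samples, dtype=float)
    q = np.quantile(r, alpha)   # alpha-quantile of the predictive return distribution
    tail = r[r <= q]            # draws at or beyond the VaR threshold (never empty)
    return -q, -tail.mean()

def rolling_var_es(returns, window, sampler, alpha=0.01, n_samples=500):
    """Slide a fixed-length window over `returns`; at each step draw simulated
    next-period returns from `sampler` (hypothetical stand-in for decoded LLM
    continuations) and convert them into a (VaR, ES) pair."""
    returns = np.asarray(returns, dtype=float)
    out = []
    for t in range(window, len(returns)):
        draws = sampler(returns[t - window:t], n_samples)
        out.append(var_es_from_samples(draws, alpha))
    return np.array(out)  # shape (len(returns) - window, 2): columns VaR, ES

def hit_rate(realized, var_forecasts):
    """Share of periods whose realized return falls below -VaR (a VaR violation)."""
    realized = np.asarray(realized, dtype=float)
    var_forecasts = np.asarray(var_forecasts, dtype=float)
    return float(np.mean(realized < -var_forecasts))
```

For a quick check without an LLM, a bootstrap sampler such as `lambda w, n: rng.choice(w, size=n)` (plain historical simulation) can be passed as `sampler`, and `hit_rate(returns[window:], result[:, 0])` compared with alpha. Varying `window` mirrors the short-versus-long window comparison described above.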

Metadata
Author:	Daniel Traian Pele, Vlad Bolovăneanu, Min-Bin Lin, Rui Ren, Andrei Theodor Ginavar, Bruno Spilak, Alexandru-Victor Andrei, Filip-Mihai Toma, Stefan Lessmann, Wolfgang Karl Härdle
URN:	urn:nbn:de:bvb:384-opus4-1230176
Frontdoor URL	https://opus.bibliothek.uni-augsburg.de/opus4/123017
ISSN:	0957-4174
Parent Title (English):	Expert Systems with Applications
Publisher:	Elsevier BV
Place of publication:	Amsterdam
Type:	Article
Language:	English
Year of first Publication:	2026
Publishing Institution:	Universität Augsburg
Release Date:	2025/06/25
Volume:	295
First Page:	128676
DOI:	https://doi.org/10.1016/j.eswa.2025.128676
Institutes:	Faculty of Business and Economics
	Faculty of Business and Economics / Institute of Statistics and Mathematical Economic Theory
	Faculty of Business and Economics / Institute of Statistics and Mathematical Economic Theory / Chair of Statistics
Dewey Decimal Classification:	3 Social sciences / 33 Economics / 330 Economics
Licence:	CC BY 4.0: Creative Commons Attribution

Open Access