Is the answer in the text? Challenging ChatGPT with evidence retrieval from instructive text
- Generative language models have recently shown remarkable success in generating answers to questions in a given textual context. However, these answers may suffer from hallucination, wrongly cite evidence, and spread misleading information. In this work, we address this problem by employing ChatGPT, a state-of-the-art generative model, as a machine-reading system. We ask it to retrieve answers to lexically varied and open-ended questions from trustworthy instructive texts. We introduce WHERE (WikiHow Evidence REtrieval), a new high-quality evaluation benchmark of a set of WikiHow articles exhaustively annotated with evidence sentences to questions that comes with a special challenge: All questions are about the article’s topic, but not all can be answered using the provided context. We interestingly find that when using a regular question-answering prompt, ChatGPT neglects to detect the unanswerable cases. When provided with a few examples, it learns to better judge whether a text provides answer evidence or not. Alongside this important finding, our dataset defines a new benchmark for evidence retrieval in question answering, which we argue is one of the necessary next steps for making large language models more trustworthy.


| Author: | Sophie Henning, Talita Anthonio, Wei Zhou, Heike Adel, Mohsen Mesgar, Annemarie Friedrich |
|---|---|
| URN: | urn:nbn:de:bvb:384-opus4-1266985 |
| Frontdoor URL | https://opus.bibliothek.uni-augsburg.de/opus4/126698 |
| URL: | https://aclanthology.org/2023.findings-emnlp.949/ |
| ISBN: | 979-8-89176-061-5 |
| Parent Title (English): | Findings of the Association for Computational Linguistics: EMNLP 2023, 6–10 December 2023, Singapore |
| Publisher: | Association for Computational Linguistics (ACL) |
| Place of publication: | Stroudsburg, PA |
| Editor: | Houda Bouamor, Juan Pino, Kalika Bali |
| Type: | Conference Proceeding |
| Language: | English |
| Date of Publication (online): | 2025/12/02 |
| Year of first Publication: | 2023 |
| Publishing Institution: | Universität Augsburg |
| Release Date: | 2025/12/03 |
| First Page: | 14229 |
| Last Page: | 14241 |
| Institutes: | Fakultät für Angewandte Informatik |
| Fakultät für Angewandte Informatik / Institut für Informatik | |
| Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Computerlinguistik | |
| Dewey Decimal Classification: | 0 Computer science, information & general works / 00 Computer science, knowledge & systems / 004 Data processing; computer science |
| Licence: | CC-BY 4.0: Creative Commons: Attribution |



