Correct but Incomplete: Why Chain-of-Thought Cannot Currently Support Auditable Reasoning

This abstract has open access
Abstract Summary
Large Language Models (LLMs) are increasingly promoted for knowledge-intensive reasoning tasks. Effective oversight requires faithful reasoning traces that show how answers are actually produced. Chain-of-Thought (CoT) prompting is positioned as a technique that promotes both accuracy and transparency by providing reasoning traces showing how solutions are reached. Recent studies have shown that CoT traces, while plausible, can be unfaithful to how the answer was derived. We argue, however, that CoT has a second, more subtle issue that requires further investigation: even logically correct CoT explanations can conceal key facts used to produce the answer, thereby misleading the reader. In this paper we illustrate this behavior in six LLMs answering questions across three question answering (QA) datasets of different types (arithmetic, factual QA, and multiple-choice reasoning). In particular, we show that injecting a key fact into the prompt increased QA accuracy by 11% to 36% (as expected), yet the models omitted this fact from otherwise sound CoT explanations in up to 56% of cases. This provides further evidence that researchers and developers should be wary of relying on CoT explanations, as even those that appear logically correct may be misleading.
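The abstract's core measurement can be sketched in a few lines: inject a key fact into the prompt, then test whether the model's CoT explanation actually mentions that fact. This is a minimal, hypothetical sketch; `build_prompt` and `fact_omitted` are illustrative names, and the paper's actual matching procedure (which may be more robust than substring matching) is not specified here.

```python
def build_prompt(question: str, key_fact: str) -> str:
    """Prepend the key fact to the question (the 'fact injection' condition)."""
    return f"Fact: {key_fact}\nQuestion: {question}\nLet's think step by step."

def fact_omitted(cot_trace: str, key_fact: str) -> bool:
    """Crude surface check: True if the injected fact never appears in the CoT trace."""
    return key_fact.lower() not in cot_trace.lower()

# Toy example: the model answers correctly, yet its explanation never
# restates the injected fact, so the trace conceals how the answer arose.
trace = "Paris is the capital, so the answer is Paris."
print(fact_omitted(trace, "The capital of France is Paris"))  # True
```

A real evaluation would aggregate `fact_omitted` over a dataset of (question, key fact) pairs, comparing accuracy with and without injection; exact-substring matching is only a lower bound on detection, since a trace could paraphrase the fact.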
Abstract ID: NKDR118

PhD Student, School of Computing Science, University of Glasgow
