Talmud-IR: A Talmud-Inspired Interface for Discussing RAG Response Quality

This abstract has open access
Abstract Summary
Retrieval-augmented generation (RAG) systems promise factually grounded answers, yet evaluating their quality remains difficult. Automated metrics and LLM-as-judge approaches offer scalability but risk circularity, benchmark leakage, and loss of diversity. Human assessors, meanwhile, often struggle to notice subtle omissions or hallucinations when responses appear linguistically fluent and confident. We present Talmud-IR, a novel user interface inspired by the dialogic structure of the Talmud. It visualizes RAG outputs as a central text surrounded by layers of evidence, commentary, and meta-assessment, enabling sustained human–LLM discussion about system quality and failure priorities. The prototype supports comparative RAG evaluation, collaborative exploration of "unknown unknowns," and pedagogical use for teaching critical reading of AI-generated content.
Abstract ID: NKDR147
Submission Type: Demos
NASK National Research Institute
Uni-Kassel
PhD Student
,
Friedrich-Schiller-Universität Jena
Associate Professor
,
University Of New Hampshire
TU Dresden
RMIT University
