Academic Publication

Detecting hallucinations in large language models using semantic entropy

Citations: 571
Published Date: June 20, 2024

Research Abstract & Technology Focus

Abstract: Large language model (LLM) systems, such as ChatGPT [1] or Gemini [2], can show impressive reasoning and question-answering capabilities but often ‘hallucinate’ false outputs and unsubstantiated answers [3,4]. Answering unreliably or without the necessary information prevents adoption in diverse fields, with problems including fabrication of legal precedents [5] or untrue facts in news articles [6] and even posing a risk to human life in medical domains such as radiology [7]. Encouraging truthfulness through supervision or reinforcement has been only partially successful [8]. Researchers need a general method for detecting hallucinations in LLMs that works even with new and unseen questions to which humans might not know the answer. Here we develop new methods grounded in statistics, proposing entropy-based uncertainty estimators for LLMs to detect a subset of hallucinations, namely confabulations, which are arbitrary and incorrect generations. Our method addresses the fact that one idea can be expressed in many ways by computing uncertainty at the level of meaning rather than specific sequences of words. Our method works across datasets and tasks without a priori knowledge of the task, requires no task-specific data and robustly generalizes to new tasks not seen before. By detecting when a prompt is likely to produce a confabulation, our method helps users understand when they must take extra care with LLMs and opens up new possibilities for using LLMs that are otherwise prevented by their unreliability.
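The idea of computing uncertainty over meanings rather than word sequences can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say how answers are judged equivalent or how probabilities are estimated, so the sketch assumes several answers have already been sampled for one prompt, estimates each meaning cluster's probability by its share of the samples, and uses a hypothetical toy check (`toy_same_meaning`) where a real system would need a proper semantic-equivalence test. All function names here are illustrative.

```python
import math
from typing import Callable, List, Sequence


def cluster_by_meaning(
    answers: Sequence[str],
    same_meaning: Callable[[str, str], bool],
) -> List[List[str]]:
    """Greedily group answers; each answer joins the first cluster whose
    representative (first member) it is judged equivalent to."""
    clusters: List[List[str]] = []
    for answer in answers:
        for cluster in clusters:
            if same_meaning(cluster[0], answer):
                cluster.append(answer)
                break
        else:
            # No existing cluster matched, so this answer starts a new one.
            clusters.append([answer])
    return clusters


def semantic_entropy(
    answers: Sequence[str],
    same_meaning: Callable[[str, str], bool],
) -> float:
    """Entropy (in nats) over meaning clusters, with each cluster's
    probability estimated as its fraction of the samples (an assumption)."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    clusters = cluster_by_meaning(answers, same_meaning)
    total = len(answers)
    entropy = 0.0
    for cluster in clusters:
        p = len(cluster) / total
        entropy -= p * math.log(p)
    return entropy


def toy_same_meaning(a: str, b: str) -> bool:
    """Hypothetical stand-in for a semantic-equivalence check: two answers
    'mean the same' if their last word matches, ignoring case and
    trailing punctuation."""
    def key(text: str) -> str:
        words = text.lower().strip(" .!?").split()
        return words[-1] if words else ""
    return key(a) == key(b)


if __name__ == "__main__":
    consistent = ["Paris", "It is Paris.", "The capital is Paris", "paris"]
    scattered = ["Paris", "Lyon", "It is Marseille.", "Nice"]
    print(semantic_entropy(consistent, toy_same_meaning))  # one cluster: low entropy
    print(semantic_entropy(scattered, toy_same_meaning))   # four clusters: high entropy
```

Differently worded answers that share a meaning fall into one cluster and give an entropy of zero, while answers that disagree in meaning spread across clusters and raise the entropy, which is the signal the abstract associates with a likely confabulation.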

AI Semantic Synergy Context

Connecting this academic literature to real-world market discussions and products.

crossref.org › academic paper
80%
Detecting hallucinations in large language models using semantic entropy

Abstract: Large language model (LLM) systems, such as ChatGPT [1] or Gemini [2], can show impressive reasoning and question-answering capabilities but often ‘hallucinate’ false outputs and unsubstantiated a...

crossref.org › academic paper
0%

A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP), fueling a paradigm shift in information acquisition. Nevertheless, LLMs are...

crossref.org › academic paper
0%

A survey on multimodal large language models

Abstract: Recently, the multimodal large language model (MLLM), represented by GPT-4V, has become a rising research hotspot, which uses powerful large language models (LLMs) as a brai...

stackexchange.com › answer
0%

Recommended GenerationConfig for Medical Domain LLMs: Strategies to Minimize Hallucination and Ensure Factuality

For medical domain LLMs where factuality is critical, here are the key GenerationConfig parameters I'd recommend based on practical experience. Temperature: 0.1-0.3 (not 0). Setting temperature to e...

stackexchange.com › answer
0%

Recommended GenerationConfig for Medical Domain LLMs: Strategies to Minimize Hallucination and Ensure Factuality

Factual accuracy and consistency are far more critical than linguistic creativity = never use an LLM.

Frequently Asked Questions (FAQ)

Curated market intelligence mapped to this research.

What is the core focus of the research titled 'Detecting hallucinations in large language models using semantic entropy'?

This literature focuses on: Large language model (LLM) systems, such as ChatGPT [1] or Gemini [2], can show impressive reasoning and question-answering capabilities but often ‘hallucinate’ false outputs and unsubstantiated answers [3,4]. Answering unreliably or without the nec...

Are there open-source GitHub repositories related to Detecting hallucinations in large language models using semantic entropy?

Yes, open-source projects like FreedomIntelligence/OpenClaw-Medical-Skills (The largest open-source medical AI skills library for OpenClaw🦞.) are actively building upon these concepts.

What other academic literature is closely related to 'Detecting hallucinations in large language models using semantic entropy'?

Highly correlated activity was mapped. An entry titled 'Detecting hallucinations in large language models using semantic entropy' discusses this: Large language model (LLM) systems, such as ChatGPT [1] or Gemini [2], can show impressive reasoning and question-answering capabilities but often...

How is the concept of 'Detecting hallucinations in large language models using semantic entropy' being discussed by engineers on StackExchange?

Highly correlated activity was mapped. An entry titled 'Recommended GenerationConfig for Medical Domain LLMs: Strategies to Minimize Hallucination and Ensure Factuality' discusses this: For medical domain LLMs where factuality is critical, here are the key GenerationConfig parameters I'd recommend based on practical experience: Tem...

