Abstract
Background: The proliferation of tobacco-related misinformation poses significant public health risks, requiring scalable solutions for credibility assessment. Traditional manual fact-checking approaches are resource-intensive and cannot match the pace of misinformation spread.
Objective: To develop and validate a proof-of-concept multi-agent AI pipeline for automated credibility assessment of tobacco misinformation claims, evaluating its performance against expert human reviewers.
Methods: We constructed a three-agent pipeline using OpenAI GPT-4.1 and the CrewAI framework. The Serper API provided real-time evidence retrieval. The Content Analyzer classifies claims into four types: health impact, scientific assertion, policy, or statistical. The Scientific Fact Verifier queries authoritative sources (WHO, CDC, PubMed Central, Cochrane). The Health Evidence Assessor applies weighted scoring across five dimensions to assign 0–100 credibility scores on a five-level scale.
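The final scoring step described above can be sketched as a weighted aggregation. The dimension names, weights, level labels, and cut-offs below are hypothetical placeholders; the abstract specifies only that five weighted dimensions yield a 0–100 score mapped onto a five-level scale (two of the labels, "Highly Unlikely" and its counterpart, are inferred from the Limitations section).

```python
# Illustrative sketch of the Health Evidence Assessor's weighted scoring,
# under assumed dimension names, weights, and level cut-offs.

WEIGHTS = {  # hypothetical dimensions; must sum to 1.0
    "source_authority": 0.30,
    "evidence_consistency": 0.25,
    "scientific_consensus": 0.20,
    "claim_specificity": 0.15,
    "recency": 0.10,
}

LEVELS = [  # (minimum score, label) -- hypothetical cut-offs
    (80, "Highly Likely"),
    (60, "Likely"),
    (40, "Uncertain"),
    (20, "Unlikely"),
    (0, "Highly Unlikely"),
]

def credibility_score(dimension_scores: dict) -> float:
    """Combine per-dimension scores (each 0-100) into one 0-100 score."""
    assert set(dimension_scores) == set(WEIGHTS), "need all five dimensions"
    return sum(WEIGHTS[d] * s for d, s in dimension_scores.items())

def credibility_level(score: float) -> str:
    """Map a 0-100 score onto the five-level scale."""
    for cutoff, label in LEVELS:
        if score >= cutoff:
            return label
    return LEVELS[-1][1]

score = credibility_score({
    "source_authority": 90, "evidence_consistency": 85,
    "scientific_consensus": 80, "claim_specificity": 70, "recency": 60,
})
print(score, credibility_level(score))  # 80.75 Highly Likely
```

A weighted sum keeps the score interpretable: each dimension's contribution can be reported alongside the final level, which supports the transparency goal the Conclusion highlights.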
Results: The framework achieved a mean absolute error (MAE) of 6.25 points against expert scores, a weighted Cohen’s κ of 0.68 (95% CI: 0.52–0.84) indicating substantial agreement, 70% exact category agreement, and 95% adjacent-level agreement, and it processed each claim in under 7 s, over 1,000× faster than manual review.
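The two headline agreement metrics can be computed as follows. This is a generic sketch, not the study's analysis code: the paired ratings are toy data, and linear disagreement weights are assumed for the weighted κ (the abstract does not state whether linear or quadratic weights were used).

```python
# Sketch of the evaluation metrics: MAE on 0-100 scores and linear-weighted
# Cohen's kappa on five-level categories (0..4). Toy data, not study data.

def mae(system_scores, expert_scores):
    """Mean absolute error between paired 0-100 credibility scores."""
    pairs = list(zip(system_scores, expert_scores))
    return sum(abs(s - e) for s, e in pairs) / len(pairs)

def weighted_kappa(ratings_a, ratings_b, k):
    """Cohen's kappa with linear disagreement weights |i-j|/(k-1)
    for two raters over k ordered categories labelled 0..k-1."""
    n = len(ratings_a)
    obs = [[0] * k for _ in range(k)]  # observed contingency counts
    for a, b in zip(ratings_a, ratings_b):
        obs[a][b] += 1
    pa = [sum(obs[i]) / n for i in range(k)]                      # rater A marginals
    pb = [sum(obs[i][j] for i in range(k)) / n for j in range(k)] # rater B marginals
    # Observed vs chance-expected weighted disagreement.
    d_obs = sum(abs(i - j) / (k - 1) * obs[i][j] / n
                for i in range(k) for j in range(k))
    d_exp = sum(abs(i - j) / (k - 1) * pa[i] * pb[j]
                for i in range(k) for j in range(k))
    return 1 - d_obs / d_exp

# Toy example: identical ratings give kappa = 1.0.
print(weighted_kappa([0, 1, 2, 3, 4, 2], [0, 1, 2, 3, 4, 2], k=5))  # 1.0
```

Reporting adjacent-level agreement alongside exact agreement, as the Results do, is standard for ordinal scales: a one-level discrepancy on a five-level scale is far less consequential than a multi-level one, and the weighted κ encodes exactly that gradation.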
Limitations: We validated our approach using 20 diverse tobacco claims through intensive expert review (2–4 h per claim). The system exhibited a conservative bias (+3.25 points, p = 0.03) and did not classify any claims as “Highly Unlikely” despite expert assignment of two claims to this category. This proof-of-concept demonstrates technical feasibility and substantial inter-rater agreement while identifying areas for calibration in future large-scale implementations.
Conclusion: Our proof-of-concept agentic AI pipeline demonstrates substantial agreement with expert assessments of tobacco-related claims while providing dramatic speed improvements. By combining zero-shot LLM reasoning, retrieval-grounded evidence verification, and a transparent five-level scoring schema, the system offers a practical tool for real-time misinformation monitoring in public health. These validation results establish technical feasibility for automated tobacco misinformation assessment and support further development and larger-scale testing before operational deployment.
| Original language | English |
|---|---|
| Article number | 1659861 |
| Journal | Frontiers in Artificial Intelligence |
| Volume | 8 |
| Early online date | 19 Dec 2025 |
| DOIs | |
| Publication status | Published - 19 Dec 2025 |
Data Availability Statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found at: https://doi.org/10.7910/DVN/ODKMNH, Harvard Dataverse.
Funding
The author(s) declare that financial support was received for the research and/or publication of this article. All authors are funded by Bloomberg Philanthropies as part of the Bloomberg Initiative to Reduce Tobacco Use (www.bloomberg.org). The funders had no role in the study design, data collection and analysis, decision to publish or preparation of the manuscript.
Keywords
- automated fact-checking
- credibility assessment
- expert validation
- large language models
- multi-agent AI pipeline
- public health informatics
- retrieval-augmented generation
- tobacco misinformation
ASJC Scopus subject areas
- Artificial Intelligence