
Schema-Guided Scene-Graph Reasoning Based on Multi-Agent Large Language Model System

Yiye Chen, Harpreet S. Sawhney, Nicholas Gydé, Yanan Jian, Jack Saunders, Patricio Vela, Benjamin E. Lundell

Research output: Contribution to journal › Conference article › peer-review

Abstract

Scene graphs have emerged as a structured and serializable environment representation for grounded spatial reasoning with Large Language Models (LLMs). In this work, we propose SG2, an iterative Schema-Guided Scene-Graph reasoning framework based on multi-agent LLMs. The agents are grouped into two modules: (1) a Reasoner module for abstract task planning and for generating graph information queries, and (2) a Retriever module that extracts the corresponding graph information by writing code in response to those queries. The two modules collaborate iteratively, enabling sequential reasoning and adaptive attention to graph information. The scene graph schema, supplied in the prompt to both modules, serves not only to streamline the reasoning and retrieval processes but also to guide the cooperation between the two modules. This eliminates the need to prompt LLMs with the full graph data, reducing the chance of hallucination due to irrelevant information. Through experiments in multiple simulation environments, we show that our framework surpasses existing LLM-based approaches and a baseline single-agent, tool-based Reason-while-Retrieve strategy in numerical Q&A and planning tasks.
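The iterative Reasoner/Retriever loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the schema layout, the toy graph, the query format, and the function names `reasoner`, `retriever`, and `solve` are all assumptions made for illustration, and both LLM modules are replaced by deterministic stand-in functions. The sketch only shows the control flow, in which the Reasoner sees the task, the schema, and previously retrieved facts, and never the full graph.

```python
# Hypothetical schema: describes the structure of the graph, not its contents.
SCHEMA = {
    "node": {"id": "str", "type": "room | object", "attrs": "dict"},
    "edge": {"src": "str", "dst": "str", "relation": "contains"},
}

# Toy scene graph (illustrative data only).
GRAPH = {
    "nodes": [
        {"id": "kitchen", "type": "room", "attrs": {}},
        {"id": "office", "type": "room", "attrs": {}},
        {"id": "mug_1", "type": "object", "attrs": {"category": "mug"}},
        {"id": "mug_2", "type": "object", "attrs": {"category": "mug"}},
        {"id": "laptop_1", "type": "object", "attrs": {"category": "laptop"}},
    ],
    "edges": [
        {"src": "kitchen", "dst": "mug_1", "relation": "contains"},
        {"src": "kitchen", "dst": "mug_2", "relation": "contains"},
        {"src": "office", "dst": "laptop_1", "relation": "contains"},
    ],
}


def retriever(query, schema, graph):
    """Stand-in for the Retriever module.

    In the described framework an LLM would write retrieval code from the
    query and the schema; here a fixed function plays that role.
    """
    if query["op"] == "objects_in_room":
        contained = {
            e["dst"]
            for e in graph["edges"]
            if e["src"] == query["room"] and e["relation"] == "contains"
        }
        return sorted(
            n["id"]
            for n in graph["nodes"]
            if n["id"] in contained
            and n["attrs"].get("category") == query["category"]
        )
    raise ValueError(f"unsupported query: {query['op']}")


def reasoner(task, schema, history):
    """Stand-in for the Reasoner module.

    It receives the task, the schema, and the facts retrieved so far,
    but never the full graph. It returns either a query or a final answer.
    """
    if not history:
        return {
            "kind": "query",
            "query": {
                "op": "objects_in_room",
                "room": task["room"],
                "category": task["category"],
            },
        }
    _, result = history[-1]
    return {"kind": "answer", "value": len(result)}


def solve(task, schema, graph, max_steps=5):
    """Alternate between reasoning and retrieval until an answer is produced."""
    history = []
    for _ in range(max_steps):
        step = reasoner(task, schema, history)
        if step["kind"] == "answer":
            return step["value"]
        facts = retriever(step["query"], schema, graph)
        history.append((step["query"], facts))
    raise RuntimeError("no answer within the step budget")


answer = solve({"room": "kitchen", "category": "mug"}, SCHEMA, GRAPH)
print(answer)
```

In this toy setup the only graph data that reaches the Reasoner is the short list of matching object ids returned by the Retriever, which mirrors the abstract's point about avoiding prompts that contain the full graph.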

Original language: English
Pages (from-to): 30332-30340
Number of pages: 9
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 40
Issue number: 36
Early online date: 14 Mar 2026
Publication status: Published - 14 Mar 2026
Event: 40th AAAI Conference on Artificial Intelligence, AAAI 2026 - Singapore, Singapore
Duration: 20 Jan 2026 → 27 Jan 2026

ASJC Scopus subject areas

  • Artificial Intelligence
