
Scaling Multimodal Agentic AI in Medical Education: Multisite Cross-Sectional Study of Simulation Effectiveness in Primary Care

Chris Jacobs, Hans Johnson, Kirsty Brownlie, Richard Joiner, Trevor Thompson

Research output: Contribution to journal › Article › peer-review

Abstract

Background: Conversational artificial intelligence (AI) systems offer potential solutions to traditional constraints in medical consultation skills training, including high costs, scheduling difficulties, and inconsistent standardization. Limited evidence exists on medical professionals’ perceptions of AI-generated patient interactions across multiple fidelity dimensions and on the educational value of conversational AI for consultation skills training.

Objective: This study aimed to evaluate perceptions of conversational AI patient simulations in primary care consultation training, examining functional fidelity, conversational realism, educational value, and implementation readiness.

Methods: A cross-sectional evaluation study of medical students and general practitioners at a UK medical school yielded 47 grouped and individual responses. Participants completed standardized clinical scenarios using the SimFlow conversational AI system, followed by a multidomain questionnaire evaluating AI realism, medical content, educational value, feedback, and usability. Data were analyzed using the Wilcoxon signed rank test, Spearman correlation, and Firth logistic regression to assess domain performance and participant characteristics.

Results: Medical content received the highest ratings (median 4.5, IQR 4.0-5.0), with 97.8% (45/46) rating clinical plausibility highly. Educational value was rated positively (median 4.0, IQR 3.0-4.0), although AI realism received moderate scores (median 3.0, IQR 2.0-4.0). Participants with prior AI experience gave significantly higher ratings for AI realism than those without prior experience (mean 3.81, SD 0.63 vs 3.07, SD 0.72; P=.03). Concordance analysis demonstrated moderate-to-strong agreement between individual- and group-level domain rankings (mean Spearman ρ=0.685), supporting consistency between collaborative and individual survey evaluations. Qualitative analysis revealed 4 themes: clinical authenticity, interactional limitations, educational potential, and implementation considerations.

Conclusions: Conversational AI demonstrates strong capabilities in functional fidelity (clinical accuracy) despite limitations in conversational fidelity (realism). The technology shows promise as a supplementary tool for clinical skills training rather than for higher-stakes assessment, with future development needed in dialogue naturalness and feedback capabilities.

Original language: English
Article number: e88905
Journal: JMIR Formative Research
Volume: 10
DOIs
Publication status: Published - 23 Mar 2026

Data Availability Statement

Anonymized data are available in Multimedia Appendix 1.

Acknowledgements

The authors would like to thank Dr Jon Turvey, CEO of SimFlow.ai, for providing access to the platform during this study. The authors declare the use of generative artificial intelligence (GenAI) in the research and writing process. According to the Generative Artificial Intelligence Delegation Taxonomy (2025), the following task was delegated to GenAI tools under full human supervision: image generation for Figure 1. The GenAI tool used was GPT-4. Responsibility for the final manuscript lies entirely with the authors. GenAI tools are not listed as authors and do not bear responsibility for the final outcomes.

Funding

There were no sources of funding for this study other than the free use of the web-based platform for the purposes of the study.

Keywords

  • AI
  • medical
  • artificial intelligence
  • clinical
  • conversational
  • education
  • realism

ASJC Scopus subject areas

  • Medicine (miscellaneous)
  • Health Informatics
