Script-to-Storyboard: A New Contextual Retrieval Dataset and Benchmark

Xi Tian, Yongliang Yang, Qi Wu

Research output: Contribution to journal › Article › peer-review

Abstract

Storyboards comprising key illustrations and images help filmmakers to outline ideas, key moments, and story events when filming movies. Inspired by this, we introduce the first contextual benchmark dataset Script-to-Storyboard (Sc2St) composed of storyboards to explicitly express story structures in the movie domain, and propose the contextual retrieval task to facilitate movie story understanding. The Sc2St dataset contains fine-grained and diverse texts, annotated semantic keyframes, and coherent storylines in storyboards, unlike existing movie datasets. The contextual retrieval task takes as input a multi-sentence movie script summary with keyframe history and aims to retrieve a future keyframe described by a corresponding sentence to form the storyboard. Compared to classic text-based visual retrieval tasks, this requires capturing the context from the description (script) and keyframe history. We benchmark existing text-based visual retrieval methods on the new dataset and propose a recurrent-based framework with three variants for effective context encoding. Comprehensive experiments demonstrate that our methods compare favourably to existing methods; ablation studies validate the effectiveness of the proposed context encoding approaches.
Original language: English
Pages (from-to): 103-122
Number of pages: 20
Journal: Computational Visual Media
Volume: 11
Issue number: 1
Publication status: Published - 25 Feb 2025

Funding

This research was supported by RCUK grant CAMERA (EP/M023281/1, EP/T022523/1), the Centre for Augmented Reasoning (CAR) at the Australian Institute for Machine Learning, and a gift from Adobe.

Funders                                      Funder number
Australian Institute for Machine Learning
Centre for Augmented Reasoning
RCUK                                         EP/T022523/1, EP/M023281/1

    Keywords

    • benchmark
    • dataset
    • movie
    • text-based image retrieval

    ASJC Scopus subject areas

    • Computer Vision and Pattern Recognition
    • Computer Graphics and Computer-Aided Design
    • Artificial Intelligence
