Locality guided cross-modal feature aggregation and pixel-level fusion for multispectral pedestrian detection

Yanpeng Cao, Xing Luo, Jiangxin Yang, Yanlong Cao, Michael Ying Yang

Research output: Contribution to journal › Article › peer-review

31 Citations (SciVal)

Abstract

Multispectral pedestrian detection has received much attention in recent years due to its superiority in detecting targets under adverse lighting/weather conditions. In this paper, we aim to generate highly discriminative multi-modal features by aggregating the human-related clues based on all available samples presented in multispectral images. To this end, we present a novel multispectral pedestrian detector performing locality guided cross-modal feature aggregation and pixel-level detection fusion. Given a number of single bounding boxes covering pedestrians in both modalities, we deploy two segmentation sub-branches to predict the existence of pedestrians on visible and thermal channels. By referring to the important locality information in the reference modality, we perform locality guided cross-modal feature aggregation to learn highly discriminative human-related features in the complementary modality by exploring the clues of all available pedestrians. Moreover, we utilize the obtained spatial locality maps to provide prediction confidence scores in visible and thermal channels and conduct pixel-wise adaptive fusion of detection results in complementary modalities. Extensive experiments demonstrate the effectiveness of our proposed method, outperforming the current state-of-the-art detectors on both KAIST and CVC-14 multispectral pedestrian detection datasets.
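The pixel-wise adaptive fusion step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the fusion formula, so the confidence-normalised weighted average below is an assumption made for illustration and not necessarily the authors' method. The function name `fuse_pixelwise`, its parameters, and the `eps` smoothing term are all hypothetical.

```python
def fuse_pixelwise(score_vis, score_thr, conf_vis, conf_thr, eps=1e-6):
    """Blend two per-pixel detection score maps (illustrative assumption).

    score_vis, score_thr: 2-D lists of detection scores from the visible
        and thermal branches.
    conf_vis, conf_thr: 2-D lists of per-pixel confidences, standing in
        for the spatial locality maps described in the abstract.
    All four maps must have the same shape.
    """
    fused = []
    for sv_row, st_row, cv_row, ct_row in zip(score_vis, score_thr, conf_vis, conf_thr):
        row = []
        for s_v, s_t, c_v, c_t in zip(sv_row, st_row, cv_row, ct_row):
            # Normalise the two confidences into a weight for the visible branch.
            # eps keeps the weight at 0.5 when both confidences are zero.
            w_v = (c_v + eps) / (c_v + c_t + 2.0 * eps)
            row.append(w_v * s_v + (1.0 - w_v) * s_t)
        fused.append(row)
    return fused


# Toy 1x3 maps: visible is trusted at pixel 0, thermal at pixel 1,
# and both are trusted equally at pixel 2.
fused = fuse_pixelwise(
    score_vis=[[0.9, 0.2, 0.4]],
    score_thr=[[0.1, 0.8, 0.6]],
    conf_vis=[[1.0, 0.0, 0.5]],
    conf_thr=[[0.0, 1.0, 0.5]],
)
```

In this toy input the fused score follows the visible branch where only the visible confidence is high, follows the thermal branch where only the thermal confidence is high, and falls back to the plain average where the two confidences are equal.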

Original language: English
Pages (from-to): 1-11
Number of pages: 11
Journal: Information Fusion
Volume: 88
Early online date: 13 Jul 2022
Publication status: Published - 1 Dec 2022

Funding

This research was supported by the National Natural Science Foundation of China (No. 52075485). The authors would also like to thank the anonymous reviewers for their valuable suggestions.

Funder: National Natural Science Foundation of China
Funder number: 52075485

    Keywords

    • Deep neural networks
    • Feature aggregation
    • Multispectral fusion
    • Pedestrian detection
    • Pixel-wise guidance

    ASJC Scopus subject areas

    • Software
    • Signal Processing
    • Information Systems
    • Hardware and Architecture
