Scalable Hierarchical Self-Attention with Learnable Hierarchy for Long-Range Interactions

Thuan Trang, Nhat Khang Ngo, Hugo Sonnery, Thieu N. Vo, Siamak Ravanbakhsh, Truong Son Hy

Research output: Contribution to journal › Article › peer-review


Abstract

Self-attention models have made great strides toward accurately modeling a wide array of data modalities, including, more recently, graph-structured data. This paper demonstrates that adaptive hierarchical attention can go a long way toward successfully applying transformers to graphs. Our proposed model Sequoia provides a powerful inductive bias towards long-range interaction modeling, leading to better generalization. We propose an end-to-end mechanism for a data-dependent construction of a hierarchy, which in turn guides the self-attention mechanism. Using an adaptive hierarchy provides a natural pathway toward sparse attention by constraining node-to-node interactions to the immediate family of each node in the hierarchy (e.g., parent, children, and siblings). This in turn dramatically reduces the computational complexity of a self-attention layer from quadratic to log-linear in terms of the input size, while maintaining or sometimes even surpassing the standard transformer’s ability to model long-range dependencies across the entire input. Experimentally, we report state-of-the-art performance on long-range graph benchmarks while remaining computationally efficient. Moving beyond graphs, we also demonstrate competitive performance on long-range sequence modeling, point-cloud classification, and segmentation when using a fixed hierarchy. Our source code is publicly available at https://github.com/HySonLab/HierAttention.
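The family-restricted sparse attention described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: here the hierarchy is a fixed complete binary tree stored in heap order (the paper learns the hierarchy end-to-end), and the function names `family_mask` and `sparse_attention` are illustrative.

```python
import numpy as np

def family_mask(num_nodes):
    # Boolean attention mask over a complete binary tree in heap order:
    # node i may attend only to itself, its parent, its children, and its
    # sibling, so each row has at most 5 nonzeros (O(n) pairs overall,
    # versus n^2 for dense attention).
    mask = np.zeros((num_nodes, num_nodes), dtype=bool)
    for i in range(num_nodes):
        mask[i, i] = True
        if i > 0:
            mask[i, (i - 1) // 2] = True            # parent
            sibling = i + 1 if i % 2 == 1 else i - 1
            if 0 < sibling < num_nodes:
                mask[i, sibling] = True             # sibling
        for child in (2 * i + 1, 2 * i + 2):
            if child < num_nodes:
                mask[i, child] = True               # children
    return mask

def sparse_attention(q, k, v, mask):
    # Standard scaled dot-product attention, with disallowed pairs
    # masked to -inf before the softmax. The diagonal of the mask is
    # always True, so every row has at least one finite score.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because the hierarchy has logarithmic depth, stacking such layers lets information travel between any two leaves through their common ancestors, which is how sparse family-only attention can still capture long-range dependencies.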

Original language: English
Article number: 1976
Number of pages: 26
Journal: Transactions on Machine Learning Research
Volume: 2024
Publication status: Published - 17 Sept 2024

Bibliographical note

Publisher Copyright:
© 2024, Transactions on Machine Learning Research. All rights reserved.

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

