HVT: A Comprehensive Vision Framework for Learning in Non-Euclidean Space

2024-09-25

Jacob Fein-Ashley, Ethan Feng, Minh Pham


Code

github.com/hyperbolicvit/hyperbolicvit
Official, mentioned in paper, PyTorch, ★ 16

Abstract

Data representation in non-Euclidean spaces has proven effective for capturing hierarchical and complex relationships in real-world datasets. Hyperbolic spaces, in particular, provide efficient embeddings for hierarchical structures. This paper introduces the Hyperbolic Vision Transformer (HVT), a novel extension of the Vision Transformer (ViT) that integrates hyperbolic geometry. While traditional ViTs operate in Euclidean space, our method enhances the self-attention mechanism by leveraging hyperbolic distance and Möbius transformations. This enables more effective modeling of hierarchical and relational dependencies in image data. We present rigorous mathematical formulations showing how hyperbolic geometry can be incorporated into attention layers, feed-forward networks, and optimization. We report improved image classification performance on the ImageNet dataset.
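The abstract names hyperbolic distance and Möbius transformations as the ingredients of HVT's attention but does not define them. The following is a minimal pure-Python sketch, assuming the Poincaré ball model with curvature -c (the abstract does not state which model of hyperbolic space HVT uses). It implements the standard Möbius addition and geodesic distance on that ball and uses the negative distance as an attention score. The function names, the temperature `tau`, and the choice to average values in the tangent space at the origin are illustrative assumptions, not the paper's formulation; the official PyTorch code at github.com/hyperbolicvit/hyperbolicvit is the reference.

```python
import math
from typing import List

Vec = List[float]


def dot(x: Vec, y: Vec) -> float:
    return sum(a * b for a, b in zip(x, y))


def norm(x: Vec) -> float:
    return math.sqrt(dot(x, x))


def mobius_add(x: Vec, y: Vec, c: float = 1.0) -> Vec:
    """Möbius addition x (+)_c y on the Poincaré ball of curvature -c."""
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    coef_x = 1 + 2 * c * xy + c * y2
    coef_y = 1 - c * x2
    den = 1 + 2 * c * xy + c * c * x2 * y2
    return [(coef_x * a + coef_y * b) / den for a, b in zip(x, y)]


def poincare_dist(x: Vec, y: Vec, c: float = 1.0) -> float:
    """Geodesic distance d_c(x, y) = (2 / sqrt(c)) * artanh(sqrt(c) * ||(-x) (+)_c y||)."""
    sqrt_c = math.sqrt(c)
    diff = mobius_add([-a for a in x], y, c)
    # Clamp just below 1 so artanh stays finite for points near the boundary.
    arg = min(sqrt_c * norm(diff), 1 - 1e-7)
    return 2 / sqrt_c * math.atanh(arg)


def expmap0(v: Vec, c: float = 1.0) -> Vec:
    """Map a tangent vector at the origin onto the ball."""
    n = norm(v)
    if n == 0:
        return list(v)
    sqrt_c = math.sqrt(c)
    return [math.tanh(sqrt_c * n) / (sqrt_c * n) * a for a in v]


def logmap0(y: Vec, c: float = 1.0) -> Vec:
    """Inverse of expmap0: pull a point on the ball back to the tangent space at the origin."""
    n = norm(y)
    if n == 0:
        return list(y)
    sqrt_c = math.sqrt(c)
    return [math.atanh(min(sqrt_c * n, 1 - 1e-7)) / (sqrt_c * n) * a for a in y]


def softmax(scores: Vec) -> Vec:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hyperbolic_attention(queries: List[Vec], keys: List[Vec], values: List[Vec],
                         c: float = 1.0, tau: float = 1.0) -> List[Vec]:
    """Hypothetical attention layer: scores are negative hyperbolic distances."""
    tangent_values = [logmap0(v, c) for v in values]
    dim = len(values[0])
    outputs = []
    for q in queries:
        weights = softmax([-poincare_dist(q, k, c) / tau for k in keys])
        # Weighted mean in the tangent space at the origin, then map back to the ball.
        mean = [sum(w * t[d] for w, t in zip(weights, tangent_values)) for d in range(dim)]
        outputs.append(expmap0(mean, c))
    return outputs


if __name__ == "__main__":
    q = [[0.1, 0.2]]
    k = [[0.1, 0.25], [-0.6, 0.3]]
    v = [[0.3, 0.0], [0.0, -0.4]]
    print(hyperbolic_attention(q, k, v))
```

Scoring by negative distance means keys close to the query in the ball receive larger weights, so the softmax plays the role that the scaled dot product plays in a Euclidean ViT. Averaging in the tangent space keeps the output inside the ball; other aggregation schemes (for example, a Möbius-weighted midpoint) are equally plausible and may be closer to what HVT actually does.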

Tasks

Image Classification

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
ImageNet	HVT Huge	Top-1 Accuracy (%)	87.4	—	Unverified
ImageNet	HVT Large	Top-1 Accuracy (%)	85	—	Unverified
ImageNet	HVT Base	Top-1 Accuracy (%)	80.1	—	Unverified
