CLIP-SLA: Parameter-Efficient CLIP Adaptation for Continuous Sign Language Recognition
Sarah Alyami, Hamzah Luqman
Code
- github.com/snalyami/CLIP-SLA (official)
Abstract
Continuous sign language recognition (CSLR) focuses on interpreting and transcribing sequences of sign language gestures in videos. In this work, we propose CLIP sign language adaptation (CLIP-SLA), a novel CSLR framework that adapts the powerful pre-trained visual encoder of the CLIP model to sign language tasks through parameter-efficient fine-tuning (PEFT). We introduce two variants, SLA-Adapter and SLA-LoRA, which integrate PEFT modules into the CLIP visual encoder, enabling fine-tuning with a minimal number of trainable parameters. The effectiveness of the proposed framework is validated on four datasets: Phoenix2014, Phoenix2014-T, CSL-Daily, and Isharah-500, where both CLIP-SLA variants outperform several SOTA models with fewer trainable parameters. Extensive ablation studies emphasize the effectiveness and flexibility of the proposed methods with different vision-language models for CSLR. These findings showcase the potential of adapting large-scale pre-trained models for scalable and efficient CSLR, paving the way for future advancements in sign language understanding.
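To make the PEFT idea concrete, the sketch below shows one common way a LoRA module can be injected into a frozen linear projection, such as an attention projection inside a CLIP-style vision transformer. This is an illustrative sketch only, not the authors' released implementation; the layer sizes, rank, and scaling are assumed values chosen for demonstration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer augmented with a trainable low-rank update: W x + (B A) x."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pre-trained weights
        # Low-rank factors: A is small-random, B starts at zero so the
        # adapted layer initially matches the pre-trained one exactly.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus scaled low-rank correction
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

# Toy stand-in for one 768-dim projection inside a CLIP visual encoder block
proj = nn.Linear(768, 768)
adapted = LoRALinear(proj, rank=4)

trainable = sum(p.numel() for p in adapted.parameters() if p.requires_grad)
total = sum(p.numel() for p in adapted.parameters())
print(trainable, total)  # the LoRA factors are a small fraction of the layer
```

With rank 4, only the two low-rank factors (2 × 4 × 768 = 6,144 parameters) are trained, versus roughly 590K in the frozen layer, which is the kind of parameter saving the paper's "minimal trainable parameters" claim refers to.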
Benchmark Results
| Dataset | Model | Metric (lower is better) | Claimed | Verified | Status |
|---|---|---|---|---|---|
| CSL-Daily | SLA-LoRA | Word Error Rate (WER) | 25.8 | — | Unverified |
| RWTH-PHOENIX-Weather 2014 | SLA-Adapter | Word Error Rate (WER) | 18.8 | — | Unverified |
| RWTH-PHOENIX-Weather 2014 T | SLA-LoRA | Word Error Rate (WER) | 19.4 | — | Unverified |