SOTAVerified

Zero-shot Generalization

Papers

Showing 326-350 of 572 papers

Title | Status | Hype
Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning | - | 0
A Dual Curriculum Learning Framework for Multi-UAV Pursuit-Evasion in Diverse Environments | - | 0
Towards the Unification of Generative and Discriminative Visual Foundation Model: A Survey | - | 0
General Object Foundation Model for Images and Videos at Scale | Code | 3
MmAP : Multi-modal Alignment Prompt for Cross-domain Multi-task Learning | - | 0
How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation | Code | 1
Adaptive Human Trajectory Prediction via Latent Corridors | - | 0
Multi-View Unsupervised Image Generation with Cross Attention Guidance | - | 0
MuRF: Multi-Baseline Radiance Fields | Code | 1
Large Language Models are Good Prompt Learners for Low-Shot Image Classification | Code | 1
Boosting Segment Anything Model Towards Open-Vocabulary Learning | Code | 1
MASP: Scalable GNN-based Planning for Multi-Agent Navigation | - | 0
I-PHYRE: Interactive Physical Reasoning | - | 0
Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation | Code | 4
Categorical Traffic Transformer: Interpretable and Diverse Behavior Prediction with Tokenized Latent | - | 0
Large Model Based Referring Camouflaged Object Detection | - | 0
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers | - | 0
C-SAW: Self-Supervised Prompt Learning for Image Generalization in Remote Sensing | - | 0
VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning | Code | 1
A Safer Vision-based Autonomous Planning System for Quadrotor UAVs with Dynamic Obstacle Trajectory Prediction and Its Application with LLMs | - | 0
Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in Dense Encoders | Code | 1
Neural-Logic Human-Object Interaction Detection | Code | 1
Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts | Code | 0
Towards Generalizable SER: Soft Labeling and Data Augmentation for Modeling Temporal Emotion Shifts in Large-Scale Multilingual Speech | Code | 0
Adaptive recurrent vision performs zero-shot computation scaling to unseen difficulty levels | - | 0
Page 14 of 23

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GR-MG | Avg. sequence length | 4.04 | - | Unverified
2 | MoDE | Avg. sequence length | 4.01 | - | Unverified
3 | RoboUniView | Avg. sequence length | 3.65 | - | Unverified
4 | 3D Diffuser Actor | Avg. sequence length | 3.27 | - | Unverified
5 | GR-1 | Avg. sequence length | 3.06 | - | Unverified