SOTAVerified

Zero-shot Generalization

Papers

Showing 51-100 of 572 papers

Title | Status | Hype
Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning | Code | 2
2nd Place Winning Solution for the CVPR2023 Visual Anomaly and Novelty Detection Challenge: Multimodal Prompting for Data-centric Anomaly Detection | Code | 2
Kick Back & Relax++: Scaling Beyond Ground-Truth Depth with SlowTV & CribsTV | Code | 2
Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model | Code | 2
SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation | Code | 2
GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models | Code | 2
RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model | Code | 2
Segment Any Anomaly without Training via Hybrid Prompt Regularization | Code | 2
RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation | Code | 2
Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning | Code | 2
RecGPT: A Foundation Model for Sequential Recommendation | Code | 2
Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation | Code | 2
Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning | Code | 2
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions | Code | 2
PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage | Code | 2
On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning? | Code | 2
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance | Code | 2
OpenCity: Open Spatio-Temporal Foundation Models for Traffic Prediction | Code | 2
Q-Insight: Understanding Image Quality via Visual Reinforcement Learning | Code | 2
Delineate Anything: Resolution-Agnostic Field Boundary Delineation on Satellite Imagery | Code | 2
Multitask Prompted Training Enables Zero-Shot Task Generalization | Code | 2
Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation | Code | 2
NeRF-Supervised Deep Stereo | Code | 2
DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment | Code | 2
Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement | Code | 2
Detecting Everything in the Open World: Towards Universal Object Detection | Code | 2
IndicVoices-R: Unlocking a Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS | Code | 2
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient | Code | 2
Matryoshka Diffusion Models | Code | 2
Learning to Route Among Specialized Experts for Zero-Shot Generalization | Code | 2
Autoregressive Image Generation with Randomized Parallel Decoding | Code | 2
Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression | Code | 2
Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents | Code | 2
Crosslingual Generalization through Multitask Finetuning | Code | 2
Efficient Alignment of Unconditioned Action Prior for Language-conditioned Pick and Place in Clutter | Code | 2
EcomGPT: Instruction-tuning Large Language Models with Chain-of-Task Tasks for E-commerce | Code | 2
Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model | Code | 2
vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation | Code | 2
Grounding Language to Entities and Dynamics for Generalization in Reinforcement Learning | Code | 1
CLIP-Forge: Towards Zero-Shot Text-to-Shape Generation | Code | 1
LR0.FM: Low-Res Benchmark and Improving Robustness for Zero-Shot Classification in Foundation Models | Code | 1
CLIP-Embed-KD: Computationally Efficient Knowledge Distillation Using Embeddings as Teachers | Code | 1
How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation | Code | 1
M^3GPT: An Advanced Multimodal, Multitask Framework for Motion Comprehension and Generation | Code | 1
Gradient Ascent Post-training Enhances Language Model Generalization | Code | 1
GOMAA-Geo: GOal Modality Agnostic Active Geo-localization | Code | 1
MAgNet: Mesh Agnostic Neural PDE Solver | Code | 1
Generalization to New Actions in Reinforcement Learning | Code | 1
Digital Twin-Enhanced Wireless Indoor Navigation: Achieving Efficient Environment Sensing with Zero-Shot Reinforcement Learning | Code | 1
Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GR-MG | Avg. sequence length | 4.04 | - | Unverified
2 | MoDE | Avg. sequence length | 4.01 | - | Unverified
3 | RoboUniView | Avg. sequence length | 3.65 | - | Unverified
4 | 3D Diffuser Actor | Avg. sequence length | 3.27 | - | Unverified
5 | GR-1 | Avg. sequence length | 3.06 | - | Unverified