SOTAVerified

Zero-shot Generalization

Papers

Showing 1-50 of 572 papers

Title | Status | Hype
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction | Code | 9
Visually Descriptive Language Model for Vector Graphics Reasoning | Code | 9
Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis | Code | 7
FoundationStereo: Zero-Shot Stereo Matching | Code | 7
Large Concept Models: Language Modeling in a Sentence Representation Space | Code | 7
Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation | Code | 7
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs | Code | 5
Segment Anything for Videos: A Systematic Survey | Code | 5
ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth | Code | 5
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation | Code | 5
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement | Code | 4
Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image | Code | 4
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction | Code | 4
Zero-1-to-3: Zero-shot One Image to 3D Object | Code | 4
Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models | Code | 4
Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation | Code | 4
Parameter-Efficient Prompt Tuning Makes Generalized and Calibrated Neural Text Retrievers | Code | 4
MonSter: Marry Monodepth to Stereo Unleashes Power | Code | 4
Expanding Language-Image Pretrained Models for General Video Recognition | Code | 3
DEFOM-Stereo: Depth Foundation Model Based Stereo Matching | Code | 3
Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting | Code | 3
Detect Anything 3D in the Wild | Code | 3
ZIM: Zero-Shot Image Matting for Anything | Code | 3
3D Diffuser Actor: Policy Diffusion with 3D Scene Representations | Code | 3
Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail | Code | 3
What Language Model to Train if You Have One Million GPU Hours? | Code | 3
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up | Code | 3
RobustSAM: Segment Anything Robustly on Degraded Images | Code | 3
Separate Anything You Describe | Code | 3
MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts | Code | 3
General Object Foundation Model for Images and Videos at Scale | Code | 3
PE3R: Perception-Efficient 3D Reconstruction | Code | 3
Objaverse-XL: A Universe of 10M+ 3D Objects | Code | 3
Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera | Code | 3
IEPile: Unearthing Large-Scale Schema-Based Information Extraction Corpus | Code | 3
SMART: Scalable Multi-agent Real-time Motion Generation via Next-token Prediction | Code | 3
NeRF-Supervised Deep Stereo | Code | 2
Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization | Code | 2
Multitask Prompted Training Enables Zero-Shot Task Generalization | Code | 2
Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model | Code | 2
Crosslingual Generalization through Multitask Finetuning | Code | 2
Delineate Anything: Resolution-Agnostic Field Boundary Delineation on Satellite Imagery | Code | 2
Efficient Alignment of Unconditioned Action Prior for Language-conditioned Pick and Place in Clutter | Code | 2
Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning | Code | 2
Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement | Code | 2
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance | Code | 2
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency | Code | 2
BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing | Code | 2
BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities | Code | 2
Learning to Route Among Specialized Experts for Zero-Shot Generalization | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GR-MG | Avg. sequence length | 4.04 | - | Unverified
2 | MoDE | Avg. sequence length | 4.01 | - | Unverified
3 | RoboUniView | Avg. sequence length | 3.65 | - | Unverified
4 | 3D Diffuser Actor | Avg. sequence length | 3.27 | - | Unverified
5 | GR-1 | Avg. sequence length | 3.06 | - | Unverified