SOTAVerified

Zero-shot Generalization

Papers

Showing 251–300 of 572 papers

Title | Status | Hype
Amortized Active Causal Induction with Deep Reinforcement Learning | | 0
M^3GPT: An Advanced Multimodal, Multitask Framework for Motion Comprehension and Generation | Code | 1
SMART: Scalable Multi-agent Real-time Motion Generation via Next-token Prediction | Code | 3
Gradient Projection For Continual Parameter-Efficient Tuning | | 0
Prompt Learning for Generalized Vehicle Routing | Code | 0
Revisiting the Robust Generalization of Adversarial Prompt Tuning | | 0
A Minimalist Prompt for Zero-Shot Policy Learning | | 0
Enhancing Vision-Language Models Generalization via Diversity-Driven Novel Feature Synthesis | | 0
On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning? | Code | 2
MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts | Code | 3
Instruction Matters: A Simple yet Effective Task Selection for Optimized Instruction Tuning of Specific Tasks | Code | 0
The Third Monocular Depth Estimation Challenge | | 0
CompilerDream: Learning a Compiler World Model for General Code Optimization | Code | 1
Enabling Natural Zero-Shot Prompting on Encoder Models via Statement-Tuning | | 0
Inferring Behavior-Specific Context Improves Zero-Shot Generalization in Reinforcement Learning | Code | 0
PromptSync: Bridging Domain Gaps in Vision-Language Models through Class-Aware Prototype Alignment and Discrimination | | 0
GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis | Code | 2
Visually Descriptive Language Model for Vector Graphics Reasoning | Code | 9
CLIP-Embed-KD: Computationally Efficient Knowledge Distillation Using Embeddings as Teachers | Code | 1
DiffCJK: Conditional Diffusion Model for High-Quality and Wide-coverage CJK Character Generation | | 0
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance | Code | 2
Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning | Code | 2
Decision Transformer as a Foundation Model for Partially Observable Continuous Control | | 0
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction | Code | 9
Where to Move Next: Zero-shot Generalization of LLMs for Next POI Recommendation | Code | 1
F^2Depth: Self-supervised Indoor Monocular Depth Estimation via Optical Flow Consistency and Feature Map Synthesis | | 0
Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation | Code | 7
Federated reinforcement learning for robot motion planning with zero-shot generalization | | 0
Quantifying uncertainty in lung cancer segmentation with foundation models applied to mixed-domain datasets | | 0
Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models | Code | 1
Data-Efficient Contrastive Language-Image Pretraining: Prioritizing Data Quality over Quantity | Code | 1
Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model | Code | 2
Dreaming of Many Worlds: Learning Contextual World Models Aids Zero-Shot Generalization | Code | 1
Temporal-spatial Adaptation of Promptable SAM Enhance Accuracy and Generalizability of cine CMR Segmentation | | 0
FastSAM3D: An Efficient Segment Anything Model for 3D Volumetric Medical Images | Code | 1
Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models | | 0
SAM-Lightening: A Lightweight Segment Anything Model with Dilated Flash Attention to Achieve 30 times Acceleration | | 0
Augmenting Efficient Real-time Surgical Instrument Segmentation in Video with Point Tracking and Segment Anything | Code | 1
FluoroSAM: A Language-aligned Foundation Model for X-ray Image Segmentation | Code | 1
RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model | Code | 2
In-context Prompt Learning for Test-time Vision Recognition with Frozen Vision-language Model | | 0
SAM-PD: How Far Can SAM Take Us in Tracking and Segmenting Anything in Videos by Prompt Denoising | Code | 0
Zero-shot Generalizable Incremental Learning for Vision-Language Object Detection | Code | 1
Kick Back & Relax++: Scaling Beyond Ground-Truth Depth with SlowTV & CribsTV | Code | 2
Segment anything model for head and neck tumor segmentation with CT, PET and MRI multi-modality images | Code | 0
Multimodal Instruction Tuning with Conditional Mixture of LoRA | Code | 1
Multi-Task Learning for Routing Problem with Cross-Problem Zero-Shot Generalization | Code | 1
IEPile: Unearthing Large-Scale Schema-Based Information Extraction Corpus | Code | 3
ARL2: Aligning Retrievers for Black-box Large Language Models via Self-guided Adaptive Relevance Labeling | Code | 0
Zero-shot generalization across architectures for visual classification | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GR-MG | Avg. sequence length | 4.04 | | Unverified
2 | MoDE | Avg. sequence length | 4.01 | | Unverified
3 | RoboUniView | Avg. sequence length | 3.65 | | Unverified
4 | 3D Diffuser Actor | Avg. sequence length | 3.27 | | Unverified
5 | GR-1 | Avg. sequence length | 3.06 | | Unverified