SOTAVerified

Diversity

Diversity in data sampling matters across many use cases, including search and recommendation systems. A diverse sample captures a wide range of variations and perspectives, which tends to make models more robust, less biased, and more comprehensive. In search, for instance, diversity reduces redundancy, so users see a broader set of relevant results rather than repeated near-duplicates.
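As an illustration of the search case, one widely used way to trade relevance against redundancy is maximal marginal relevance (MMR). The description above does not name a specific method, so the sketch below is an assumption chosen for illustration. The function names (`cosine`, `mmr_select`), the `trade_off` parameter, and the toy vectors are all hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query, candidates, k, trade_off=0.5):
    """Greedily pick up to k candidate indices using maximal marginal relevance.

    trade_off = 1.0 ranks purely by relevance to the query;
    lower values penalize similarity to items already selected.
    """
    relevance = [cosine(query, c) for c in candidates]
    selected = []
    remaining = list(range(len(candidates)))

    while remaining and len(selected) < k:
        def score(i):
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return trade_off * relevance[i] - (1 - trade_off) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)

    return selected


# Toy data (hypothetical): item 1 is a near-duplicate of item 0.
query = [1.0, 0.0]
items = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]

print(mmr_select(query, items, k=2, trade_off=1.0))  # [0, 1]: relevance only
print(mmr_select(query, items, k=2, trade_off=0.3))  # [0, 2]: near-duplicate skipped
```

With `trade_off=1.0` the two most similar items are returned together. Lowering it causes the second slot to go to the less redundant item, which is the behavior the description attributes to diverse search results.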

Papers

Showing 1–25 of 9051 papers

| Title | Status | Hype |
| --- | --- | --- |
| MinerU: An Open-Source Solution for Precise Document Content Extraction | Code | 16 |
| olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models | Code | 11 |
| Depth Anything V2 | Code | 9 |
| Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation | Code | 9 |
| AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation | Code | 9 |
| Is Diversity All You Need for Scalable Robotic Manipulation? | Code | 7 |
| Flow-GRPO: Training Flow Matching Models via Online RL | Code | 7 |
| FoundationStereo: Zero-Shot Stereo Matching | Code | 7 |
| Adaptive In-conversation Team Building for Language Model Agents | Code | 7 |
| PromptWizard: Task-Aware Prompt Optimization Framework | Code | 7 |
| Better Synthetic Data by Retrieving and Transforming Existing Datasets | Code | 7 |
| SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models | Code | 7 |
| From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations | Code | 7 |
| LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset | Code | 7 |
| MaskSketch: Unpaired Structure-guided Masked Image Generation | Code | 7 |
| Improving Sample Quality of Diffusion Models Using Self-Attention Guidance | Code | 7 |
| Automatic Chain of Thought Prompting in Large Language Models | Code | 6 |
| AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning | Code | 5 |
| BLAST: Balanced Sampling Time Series Corpus for Universal Forecasting Models | Code | 5 |
| OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations | Code | 5 |
| GAPartManip: A Large-scale Part-centric Dataset for Material-Agnostic Articulated Object Manipulation | Code | 5 |
| R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models | Code | 5 |
| DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos | Code | 5 |
| VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark | Code | 5 |
| Fake News Detection: It's All in the Data! | Code | 5 |

No leaderboard results yet.