Diversity

Diversity in data sampling matters in many applications, including search and recommendation systems. A diverse sample covers a wide range of variations and perspectives, which tends to produce models that are more robust, less biased, and more comprehensive in coverage. In search, for instance, diversity reduces redundancy: users see a broader set of relevant results instead of several near-duplicates of the same one.
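The page does not prescribe a method, but one well-known way to diversify search results is Maximal Marginal Relevance (MMR) re-ranking. It greedily picks the next result that is relevant to the query yet dissimilar to the results already chosen. The minimal sketch below assumes documents and the query are already embedded as plain float vectors and uses cosine similarity for both relevance and redundancy; the function names `mmr` and `cosine` and the trade-off parameter `lam` are illustrative, not taken from any of the listed papers.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr(query: Sequence[float], docs: Sequence[Sequence[float]],
        k: int, lam: float = 0.5) -> List[int]:
    """Return indices of up to k docs chosen by Maximal Marginal Relevance.

    lam = 1.0 ranks purely by relevance; smaller values penalize
    documents that are similar to ones already selected.
    """
    relevance = [cosine(query, d) for d in docs]
    selected: List[int] = []
    candidates = list(range(len(docs)))

    while candidates and len(selected) < k:
        def score(i: int) -> float:
            # Redundancy = similarity to the closest already-selected doc.
            redundancy = max((cosine(docs[i], docs[j]) for j in selected),
                             default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    return selected
```

With toy 2-D embeddings where documents 0 and 1 are near-duplicates, `mmr([1.0, 1.0], [[1.0, 0.8], [1.0, 0.75], [0.6, 1.0]], k=2)` skips the near-duplicate and returns `[0, 2]`, while `lam=1.0` falls back to plain relevance ranking and returns `[0, 1]`. Tuning `lam` is the main design choice: it sets how much relevance you are willing to trade for coverage.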

Papers


Showing 1–25 of 9051 papers

Title	Date	Tasks	Status	Hype
MinerU: An Open-Source Solution for Precise Document Content Extraction	Sep 27, 2024	Diversity, Optical Character Recognition (OCR)	Code available	16
olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models	Feb 25, 2025	Diversity, Language Modeling	Code available	11
Depth Anything V2	Jun 13, 2024	Depth Estimation, Diversity	Code available	9
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation	Mar 26, 2024	Diversity, Face Reenactment	Code available	9
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation	Jun 13, 2024	Diversity, Image Animation	Code available	9
Is Diversity All You Need for Scalable Robotic Manipulation?	Jul 8, 2025	All, Diversity	Code available	7
FoundationStereo: Zero-Shot Stereo Matching	Jan 17, 2025	Depth Estimation, Diversity	Code available	7
Flow-GRPO: Training Flow Matching Models via Online RL	May 8, 2025	Denoising, Diversity	Code available	7
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset	Sep 21, 2023	Chatbot, Diversity	Code available	7
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models	Feb 8, 2024	Benchmarking, Diversity	Code available	7
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations	Jan 3, 2024	Diversity, Quantization	Code available	7
Improving Sample Quality of Diffusion Models Using Self-Attention Guidance	Oct 3, 2022	Denoising, Diversity	Code available	7
PromptWizard: Task-Aware Prompt Optimization Framework	May 28, 2024	Computational Efficiency, Diversity	Code available	7
Adaptive In-conversation Team Building for Language Model Agents	May 29, 2024	Diversity, Language Modeling	Code available	7
MaskSketch: Unpaired Structure-guided Masked Image Generation	Feb 10, 2023	Conditional Image Generation, Diversity	Code available	7
Better Synthetic Data by Retrieving and Transforming Existing Datasets	Apr 22, 2024	Dataset Generation, Diversity	Code available	7
Automatic Chain of Thought Prompting in Large Language Models	Oct 7, 2022	Diversity, Question Answering	Code available	6
MING-MOE: Enhancing Medical Multi-Task Learning in Large Language Models with Sparse Mixture of Low-Rank Adapter Experts	Apr 13, 2024	Diversity, Language Modeling	Code available	5
GAPartManip: A Large-scale Part-centric Dataset for Material-Agnostic Articulated Object Manipulation	Nov 27, 2024	Depth Estimation, Diversity	Code available	5
MedCare: Advancing Medical LLMs through Decoupling Clinical Alignment and Knowledge Aggregation	Jun 25, 2024	Diversity, Natural Language Understanding	Code available	5
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning	Jun 2, 2025	AI Agent, Diversity	Code available	5
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations	Dec 10, 2024	Attribute, Benchmarking	Code available	5
Fake News Detection: It's All in the Data!	Jul 2, 2024	All, Diversity	Code available	5
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving	Apr 25, 2024	Diversity	Code available	5
BLAST: Balanced Sampling Time Series Corpus for Universal Forecasting Models	May 23, 2025	Diversity, Time Series	Code available	5


