2k

Papers

Showing 101–125 of 288 papers

Title | Status | Hype
Test-Time Training Done Right | | 0
PIIvot: A Lightweight NLP Anonymization Framework for Question-Anchored Tutoring Dialogues | | 0
Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning | | 0
UIShift: Enhancing VLM-based GUI Agents through Self-supervised Reinforcement Learning | | 0
ViMRHP: A Vietnamese Benchmark Dataset for Multimodal Review Helpfulness Prediction via Human-AI Collaborative Annotation | Code | 0
Calibrating Translation Decoding with Quality Estimation on LLMs | Code | 0
aiXamine: Simplified LLM Safety and Security | | 0
Turbo2K: Towards Ultra-Efficient and High-Quality 2K Video Synthesis | | 0
Rethinking the Generation of High-Quality CoT Data from the Perspective of LLM-Adaptive Question Difficulty Grading | | 0
On Linear Representations and Pretraining Data Frequency in Language Models | | 0
Seedream 3.0 Technical Report | | 0
ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration | | 0
DiTFastAttnV2: Head-wise Attention Compression for Multi-Modality Diffusion Transformers | | 0
Nonparametric MLE for Gaussian Location Mixtures: Certified Computation and Generic Behavior | | 0
REPA: Russian Error Types Annotation for Evaluating Text Generation and Judgment Capabilities | | 0
Evaluating the Suitability of Different Intraoral Scan Resolutions for Deep Learning-Based Tooth Segmentation | | 0
Stackelberg Game Preference Optimization for Data-Efficient Alignment of Language Models | | 0
Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks | | 0
Exact Recovery of Sparse Binary Vectors from Generalized Linear Measurements | | 0
Facilitating Long Context Understanding via Supervised Chain-of-Thought Reasoning | | 0
Improved Regret in Stochastic Decision-Theoretic Online Learning under Differential Privacy | | 0
Domaino1s: Guiding LLM Reasoning for Explainable Answers in High-Stakes Domains | | 0
TimeLogic: A Temporal Logic Benchmark for Video QA | | 0
LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation | | 0
Toward Corpus Size Requirements for Training and Evaluating Depression Risk Models Using Spoken Language | | 0

No leaderboard results yet.