SOTAVerified

16k

Papers

Showing 1-50 of 146 papers

Title | Status | Hype
UniCode^2: Cascaded Large-scale Codebooks for Unified Multimodal Understanding and Generation | | 0
MSTAR: Box-free Multi-query Scene Text Retrieval with Attention Recycling | Code | 0
How Far Are We from Optimal Reasoning Efficiency? | Code | 0
FlashDMoE: Fast Distributed MoE in a Single Kernel | Code | 3
FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian | Code | 0
UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents | Code | 2
SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences | Code | 0
MonarchAttention: Zero-Shot Conversion to Fast, Hardware-Aware Structured Attention | Code | 1
Training Long-Context LLMs Efficiently via Chunk-wise Optimization | Code | 2
PSC: Extending Context Window of Large Language Models via Phase Shift Calibration | Code | 0
Achieving Scalable Robot Autonomy via neurosymbolic planning using lightweight local LLM | Code | 0
FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning | | 0
KL3M Tokenizers: A Family of Domain-Specific and Character-Level Tokenizers for Legal, Financial, and Preprocessing Applications | Code | 0
NSF-SciFy: Mining the NSF Awards Database for Scientific Claims | | 0
X-LRM: X-ray Large Reconstruction Model for Extremely Sparse-View Computed Tomography Recovery in One Second | Code | 0
Evaluating the Suitability of Different Intraoral Scan Resolutions for Deep Learning-Based Tooth Segmentation | | 0
EpMAN: Episodic Memory AttentioN for Generalizing to Longer Contexts | | 0
CLOVER: A Test Case Generation Benchmark with Coverage, Long-Context, and Verification | | 0
Fairness through Difference Awareness: Measuring Desired Group Discrimination in LLMs | Code | 1
M+: Extending MemoryLLM with Scalable Long-Term Memory | Code | 3
Parallel Sequence Modeling via Generalized Spatial Propagation Network | | 0
Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key | Code | 2
Depression and Anxiety Prediction Using Deep Language Models and Transfer Learning | | 0
SparseAccelerate: Efficient Long-Context Inference for Mid-Range GPUs | | 0
MVReward: Better Aligning and Evaluating Multi-View Diffusion Models with Human Preferences | | 0
CNNSum: Exploring Long-Context Summarization with Large Language Models in Chinese Novels | Code | 0
Bimanual Dexterity for Complex Tasks | | 0
Piecing It All Together: Verifying Multi-Hop Multimodal Claims | | 0
Model Editing for LLMs4Code: How Far are We? | Code | 0
Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation | Code | 0
Denial-of-Service Poisoning Attacks against Large Language Models | Code | 1
Neural Fourier Modelling: A Highly Compact Approach to Time-Series Analysis | Code | 1
Correlation-Aware Select and Merge Attention for Efficient Fine-Tuning and Context Length Extension | | 0
SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation | Code | 3
Extending Context Window of Large Language Models from a Distributional Perspective | Code | 0
LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs | Code | 1
LinFusion: 1 GPU, 1 Minute, 16K Image | Code | 3
1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data | Code | 3
Global Structure-from-Motion Revisited | Code | 7
SpaceJAM: a Lightweight and Regularization-free Method for Fast Joint Alignment of Images | Code | 1
Using Large Language Models for Generating Smart Contracts for Health Insurance from Textual Policies | | 0
Learning to (Learn at Test Time): RNNs with Expressive Hidden States | Code | 5
LongIns: A Challenging Long-context Instruction-based Exam for LLMs | | 0
Inferring Pluggable Types with Machine Learning | | 0
LLaSA: A Multimodal LLM for Human Activity Analysis Through Wearable and Smartphone Sensors | Code | 1
GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models | | 0
Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding | Code | 0
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | Code | 9
An Empirical Study of Mamba-based Language Models | Code | 0
Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset | Code | 3
Page 1 of 3

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1Suprime21'"1Unverified