SOTAVerified

16k

Papers

Showing 1-50 of 146 papers

Title | Status | Hype
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | Code | 9
Global Structure-from-Motion Revisited | Code | 7
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | Code | 6
Code Llama: Open Foundation Models for Code | Code | 6
Learning to (Learn at Test Time): RNNs with Expressive Hidden States | Code | 5
Long-form factuality in large language models | Code | 4
Training-Free Long-Context Scaling of Large Language Models | Code | 3
FlashDMoE: Fast Distributed MoE in a Single Kernel | Code | 3
SnapKV: LLM Knows What You are Looking for Before Generation | Code | 3
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding | Code | 3
1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data | Code | 3
SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation | Code | 3
Investigating Efficiently Extending Transformers for Long Input Summarization | Code | 3
Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset | Code | 3
M+: Extending MemoryLLM with Scalable Long-Term Memory | Code | 3
LinFusion: 1 GPU, 1 Minute, 16K Image | Code | 3
LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K | Code | 2
Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key | Code | 2
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding | Code | 2
UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents | Code | 2
Training Long-Context LLMs Efficiently via Chunk-wise Optimization | Code | 2
Giraffe: Adventures in Expanding Context Lengths in LLMs | Code | 2
LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs | Code | 1
COUGH: A Challenge Dataset and Models for COVID-19 FAQ Retrieval | Code | 1
SpaceJAM: a Lightweight and Regularization-free Method for Fast Joint Alignment of Images | Code | 1
Complex Temporal Question Answering on Knowledge Graphs | Code | 1
Faster Causal Attention Over Large Sequences Through Sparse Flash Attention | Code | 1
In-Context Learning with Many Demonstration Examples | Code | 1
Scaling Laws of RoPE-based Extrapolation | Code | 1
DeepDarts: Modeling Keypoints as Objects for Automatic Scorekeeping in Darts using a Single Camera | Code | 1
SMYRF: Efficient Attention using Asymmetric Clustering | Code | 1
The Expressive Leaky Memory Neuron: an Efficient and Expressive Phenomenological Neuron Model Can Solve Long-Horizon Tasks | Code | 1
MonarchAttention: Zero-Shot Conversion to Fast, Hardware-Aware Structured Attention | Code | 1
Home Electricity Data Generator (HEDGE): An open-access tool for the generation of electric vehicle, residential demand, and PV generation profiles | Code | 1
MorphoCluster: Efficient Annotation of Plankton images by Clustering | Code | 1
MapReader: A Computer Vision Pipeline for the Semantic Exploration of Maps at Scale | Code | 1
Classifying the classifier: dissecting the weight space of neural networks | Code | 1
Neural Fourier Modelling: A Highly Compact Approach to Time-Series Analysis | Code | 1
Fighting the COVID-19 Infodemic: Modeling the Perspective of Journalists, Fact-Checkers, Social Media Platforms, Policy Makers, and the Society | Code | 1
CIRCLe: Color Invariant Representation Learning for Unbiased Classification of Skin Lesions | Code | 1
An In-Depth Exploration of Person Re-Identification and Gait Recognition in Cloth-Changing Conditions | Code | 1
Detecting and Preventing Hallucinations in Large Vision Language Models | Code | 1
Denial-of-Service Poisoning Attacks against Large Language Models | Code | 1
Factored Verification: Detecting and Reducing Hallucination in Summaries of Academic Papers | Code | 1
Fairness through Difference Awareness: Measuring Desired Group Discrimination in LLMs | Code | 1
Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction | Code | 1
BNLP: Natural language processing toolkit for Bengali language | Code | 1
Analyzing the Effectiveness of Large Language Models on Text-to-SQL Synthesis | Code | 1
Hydragen: High-Throughput LLM Inference with Shared Prefixes | Code | 1
LLaSA: A Multimodal LLM for Human Activity Analysis Through Wearable and Smartphone Sensors | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1Suprime21'"1Unverified