SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 9,901–9,950 of 661,570 papers

Title | Status | Hype
Aligning Modalities in Vision Large Language Models via Preference Fine-tuning | Code | 2
MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization | Code | 2
Continual Learning on Graphs: Challenges, Solutions, and Opportunities | Code | 2
Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark | Code | 2
Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning | Code | 2
MultiCorrupt: A Multi-Modal Robustness Dataset and Benchmark of LiDAR-Camera Fusion for 3D Object Detection | Code | 2
3D Point Cloud Compression with Recurrent Neural Network and Image Compression Methods | Code | 2
Combinatorial Client-Master Multiagent Deep Reinforcement Learning for Task Offloading in Mobile Edge Computing | Code | 2
Neighborhood-Enhanced Supervised Contrastive Learning for Collaborative Filtering | Code | 2
Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents | Code | 2
CoLLaVO: Crayon Large Language and Vision mOdel | Code | 2
Optimizing tiny colorless feedback delay networks | Code | 2
EEG2Rep: Enhancing Self-supervised EEG Representation Through Informative Masked Inputs | Code | 2
PEDANTS: Cheap but Effective and Interpretable Answer Equivalence | Code | 2
Beyond Generalization: A Survey of Out-Of-Distribution Adaptation on Graphs | Code | 2
Centroid-Based Efficient Minimum Bayes Risk Decoding | Code | 2
An end-to-end attention-based approach for learning on graphs | Code | 2
RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model | Code | 2
Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs | Code | 2
When is Tree Search Useful for LLM Planning? It Depends on the Discriminator | Code | 2
Large Language Models as Zero-shot Dialogue State Tracker through Function Calling | Code | 2
Do Llamas Work in English? On the Latent Language of Multilingual Transformers | Code | 2
Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE) | Code | 2
ASGEA: Exploiting Logic Rules from Align-Subgraphs for Entity Alignment | Code | 2
TimeSeriesBench: An Industrial-Grade Benchmark for Time Series Anomaly Detection Models | Code | 2
Distillation Enhanced Generative Retrieval | Code | 2
Incremental Sequence Labeling: A Tale of Two Shifts | Code | 2
Linear Transformers with Learnable Kernel Functions are Better In-Context Models | Code | 2
OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models | Code | 2
Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation | Code | 2
Recovering the Pre-Fine-Tuning Weights of Generative Models | Code | 2
Chain-of-Thought Reasoning Without Prompting | Code | 2
X-maps: Direct Depth Lookup for Event-based Structured Light Systems | Code | 2
AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator | Code | 2
PAL: Proxy-Guided Black-Box Attack on Large Language Models | Code | 2
SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention | Code | 2
ChemReasoner: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback | Code | 2
BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains | Code | 2
Detecting CSV File Dialects by Table Uniformity Measurement and Data Type Inference | Code | 2
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization | Code | 2
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference | Code | 2
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent | Code | 2
A StrongREJECT for Empty Jailbreaks | Code | 2
MultiMedEval: A Benchmark and a Toolkit for Evaluating Medical Vision-Language Models | Code | 2
PC-NeRF: Parent-Child Neural Radiance Fields Using Sparse LiDAR Frames in Autonomous Driving Environments | Code | 2
LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset | Code | 2
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference | Code | 2
Universal Machine Learning Kohn-Sham Hamiltonian for Materials | Code | 2
Personalized Large Language Models | Code | 2
Less is More: Fewer Interpretable Region via Submodular Subset Selection | Code | 2
Page 199 of 13,232