SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 20,101–20,150 of 474,278 papers

Title | Status | Hype
AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios | Code | 1
Context-Based Visual-Language Place Recognition | Code | 1
Unified Cross-Modal Image Synthesis with Hierarchical Mixture of Product-of-Experts | Code | 1
Offline Reinforcement Learning with OOD State Correction and OOD Action Suppression | Code | 1
FastPCI: Motion-Structure Guided Fast Point Cloud Frame Interpolation | Code | 1
Enhancing Battery Storage Energy Arbitrage with Deep Reinforcement Learning and Time-Series Forecasting | Code | 1
Flow Generator Matching | Code | 1
Beyond Point Annotation: A Weakly Supervised Network Guided by Multi-Level Labels Generated from Four-Point Annotation for Thyroid Nodule Segmentation in Ultrasound Image | Code | 1
Applying sparse autoencoders to unlearn knowledge in language models | Code | 1
Monge-Ampere Regularization for Learning Arbitrary Shapes from Point Clouds | Code | 1
C^2: Scalable Auto-Feedback for LLM-based Chart Generation | Code | 1
Classifier Clustering and Feature Alignment for Federated Learning under Distributed Concept Drift | Code | 1
Wavetable Synthesis Using CVAE for Timbre Control Based on Semantic Label | Code | 1
KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing | Code | 1
Demystifying Large Language Models for Medicine: A Primer | Code | 1
Assessing the Creativity of LLMs in Proposing Novel Solutions to Mathematical Problems | Code | 1
Prototypical Hash Encoding for On-the-Fly Fine-Grained Category Discovery | Code | 1
DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations | Code | 1
Scale Propagation Network for Generalizable Depth Completion | Code | 1
Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction | Code | 1
Delving into the Reversal Curse: How Far Can Large Language Models Generalize? | Code | 1
Thermal Chameleon: Task-Adaptive Tone-mapping for Radiometric Thermal-Infrared images | Code | 1
You Only Look Around: Learning Illumination Invariant Feature for Low-light Object Detection | Code | 1
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark | Code | 1
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback | Code | 1
TEAM: Topological Evolution-aware Framework for Traffic Forecasting--Extended Version | Code | 1
ODDN: Addressing Unpaired Data Challenges in Open-World Deepfake Detection on Online Social Networks | Code | 1
Large Language Models for Financial Aid in Financial Time-series Forecasting | Code | 1
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks | Code | 1
How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs | Code | 1
Optimizing Edge Offloading Decisions for Object Detection | Code | 1
BIFRÖST: 3D-Aware Image compositing with Language Instructions | Code | 1
Infogent: An Agent-Based Framework for Web Information Aggregation | Code | 1
End-to-end Training for Recommendation with Language-based User Profiles | Code | 1
VECTOR: Velocity-Enhanced GRU Neural Network for Real-Time 3D UAV Trajectory Prediction | Code | 1
LOGO -- Long cOntext aliGnment via efficient preference Optimization | Code | 1
WAFFLE: Finetuning Multi-Modal Model for Automated Front-End Development | Code | 1
From Imitation to Introspection: Probing Self-Consciousness in Language Models | Code | 1
FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling | Code | 1
STTATTS: Unified Speech-To-Text And Text-To-Speech Model | Code | 1
Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design | Code | 1
GCoder: Improving Large Language Model for Generalized Graph Problem Solving | Code | 1
Large Language Models Reflect the Ideology of their Creators | Code | 1
Graphusion: A RAG Framework for Knowledge Graph Construction with a Global Perspective | Code | 1
Physics-informed Neural Networks for Functional Differential Equations: Cylindrical Approximation and Its Convergence Guarantees | Code | 1
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models | Code | 1
Federated Transformer: Multi-Party Vertical Federated Learning on Practical Fuzzily Linked Data | Code | 1
Cross-model Control: Improving Multiple Large Language Models in One-time Training | Code | 1
DisenGCD: A Meta Multigraph-assisted Disentangled Graph Learning Framework for Cognitive Diagnosis | Code | 1
Att2CPC: Attention-Guided Lossy Attribute Compression of Point Clouds | Code | 1
Page 403 of 9,486