SOTAVerified

Benchmarking

Papers

Showing 4351-4400 of 5548 papers

Title | Status | Hype
Ransomware Detection Using Machine Learning in the Linux Kernel | | 0
RayFronts: Open-Set Semantic Ray Frontiers for Online Scene Understanding and Exploration | | 0
RBoard: A Unified Platform for Reproducible and Reusable Recommender System Benchmarks | | 0
RCC-GAN: Regularized Compound Conditional GAN for Large-Scale Tabular Data Synthesis | | 0
A tale of two toolkits, report the first: benchmarking time series classification algorithms for correctness and efficiency | | 0
A Comparative study of Hyper-Parameter Optimization Tools | | 0
RDBench: ML Benchmark for Relational Databases | | 0
Uniform Discretized Integrated Gradients: An effective attribution based method for explaining large language models | | 0
RD-Suite: A Benchmark for Ranking Distillation | | 0
Reactor Mk.1 performances: MMLU, HumanEval and BBH test results | | 0
A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models | | 0
RealCause: Realistic Causal Inference Benchmarking | | 0
A Systematic Evaluation of Domain Adaptation Algorithms On Time Series Data | | 0
A Systematic Analysis of Hybrid Linear Attention | | 0
Realistic Evaluation of Test-Time Adaptation Algorithms: Unsupervised Hyperparameter Selection | | 0
Realistic Hair Simulation Using Image Blending | | 0
Realistic Video Summarization through VISIOCITY: A New Benchmark and Evaluation Framework | | 0
Unifying Few- and Zero-Shot Egocentric Action Recognition | | 0
Real Time Egocentric Object Segmentation: THU-READ Labeling and Benchmarking Results | | 0
Real-time Kinematic Ground Truth for the Oxford RobotCar Dataset | | 0
Self-Aligning Depth-regularized Radiance Fields for Asynchronous RGB-D Sequences | | 0
Real-time Webcam Heart-Rate and Variability Estimation with Clean Ground Truth for Evaluation | | 0
One-Shot Real-to-Sim via End-to-End Differentiable Simulation and Rendering | | 0
Real-World Blur Dataset for Learning and Benchmarking Deblurring Algorithms | | 0
Real-World fNIRS-Based Brain-Computer Interfaces: Benchmarking Deep Learning and Classical Models in Interactive Gaming | | 0
Rearrangement: A Challenge for Embodied AI | | 0
Reasoning as a Resource: Optimizing Fast and Slow Thinking in Code Generation Models | | 0
Re-assessing ImageNet: How aligned is its single-label assumption with its multi-label nature? | | 0
A Comparative Analysis on Ethical Benchmarking in Large Language Models | | 0
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers | | 0
A Survey on Vision Autoregressive Model | | 0
A Survey on Temporal Sentence Grounding in Videos | | 0
A Survey on Semi-Supervised Learning for Delayed Partially Labelled Data Streams | | 0
RECipe: Does a Multi-Modal Recipe Knowledge Graph Fit a Multi-Purpose Recommendation System? | | 0
Recommendations for Baselines and Benchmarking Approximate Gaussian Processes | | 0
Reconstructing antibody repertoires from error-prone immunosequencing datasets | | 0
A Survey on Preserving Fairness Guarantees in Changing Environments | | 0
A Survey on Model Compression for Large Language Models | | 0
Uni-Render: A Unified Accelerator for Real-Time Rendering Across Diverse Neural Renderers | | 0
A Survey on Masked Facial Detection Methods and Datasets for Fighting Against COVID-19 | | 0
Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research | | 0
A Survey on LLM-based News Recommender Systems | | 0
Unitail: Detecting, Reading, and Matching in Retail Scene | | 0
A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking | | 0
A Survey of Spanish Clinical Language Models | | 0
Refer to Anything with Vision-Language Prompts | | 0
Retrieval Models Aren't Tool-Savvy: Benchmarking Tool Retrieval for Large Language Models | | 0
Unleashing OpenTitan's Potential: a Silicon-Ready Embedded Secure Element for Root of Trust and Cryptographic Offloading | | 0
A Survey of Small Language Models | | 0
Regularization of ML models for Earth systems by using longer model timesteps | | 0
Page 88 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified