SOTAVerified

Benchmarking

Papers

Showing 2501–2550 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| A Comprehensive Study on Robustness of Image Classification Models: Benchmarking and Rethinking | | 0 |
| Graph-based Deep-Tree Recursive Neural Network (DTRNN) for Text Classification | | 0 |
| GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra | | 0 |
| Benchmarking Safe Deep Reinforcement Learning in Aquatic Navigation | | 0 |
| Benchmarking Rotary Position Embeddings for Automatic Speech Recognition | | 0 |
| 7th AI Driving Olympics: 1st Place Report for Panoptic Tracking | | 0 |
| A Theory of Dynamic Benchmarks | | 0 |
| Variational Laplace for Bayesian neural networks | | 0 |
| ATG: Benchmarking Automated Theorem Generation for Generative Language Models | | 0 |
| Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games | | 0 |
| A Comprehensive Study on Dataset Distillation: Performance, Privacy, Robustness and Fairness | | 0 |
| GPTs and Language Barrier: A Cross-Lingual Legal QA Examination | | 0 |
| Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities | | 0 |
| Benchmarking Robustness of Deep Reinforcement Learning approaches to Online Portfolio Management | | 0 |
| Benchmarking Robustness of Deep Learning Classifiers Using Two-Factor Perturbation | | 0 |
| A tale of two toolkits, report the first: benchmarking time series classification algorithms for correctness and efficiency | | 0 |
| Benchmarking Robustness of Contrastive Learning Models for Medical Image-Report Retrieval | | 0 |
| Benchmarking Robustness of AI-Enabled Multi-sensor Fusion Systems: Challenges and Opportunities | | 0 |
| A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models | | 0 |
| Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models | | 0 |
| AI vs. Human Judgment of Content Moderation: LLM-as-a-Judge and Ethics-Based Response Refusals | | 0 |
| GreenPCO: An Unsupervised Lightweight Point Cloud Odometry Method | | 0 |
| Granular Change Accuracy: A More Accurate Performance Metric for Dialogue State Tracking | | 0 |
| Benchmarking Robustness in Neural Radiance Fields | | 0 |
| A Systematic Evaluation of Domain Adaptation Algorithms On Time Series Data | | 0 |
| Benchmarking Robustness and Generalization in Multi-Agent Systems: A Case Study on Neural MMO | | 0 |
| Benchmarking Robot Manipulation with the Rubik's Cube | | 0 |
| A Comprehensive Multi-Illuminant Dataset for Benchmarking of the Intrinsic Image Algorithms | | 0 |
| Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-Awareness | | 0 |
| A Systematic Analysis of Hybrid Linear Attention | | 0 |
| Benchmarking Retrieval-Augmented Generation for Chemistry | | 0 |
| Self-Aligning Depth-regularized Radiance Fields for Asynchronous RGB-D Sequences | | 0 |
| Airport Capacity and Performance in Europe -- A study of transport economics, service quality and sustainability | | 0 |
| Benchmarking Resource Usage for Efficient Distributed Deep Learning | | 0 |
| Goal-Driven Sequential Data Abstraction | | 0 |
| A Survey on Vision Autoregressive Model | | 0 |
| A Survey on Temporal Sentence Grounding in Videos | | 0 |
| Benchmarking Reinforcement Learning Methods for Dexterous Robotic Manipulation with a Three-Fingered Gripper | | 0 |
| 4Seasons: Benchmarking Visual SLAM and Long-Term Localization for Autonomous Driving in Challenging Conditions | | 0 |
| Domain Adaptation with Joint Learning for Generic, Optical Car Part Recognition and Detection Systems (Go-CaRD) | | 0 |
| GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models | | 0 |
| Graph Alignment for Benchmarking Graph Neural Networks and Learning Positional Encodings | | 0 |
| Greening AI-enabled Systems with Software Engineering: A Research Agenda for Environmentally Sustainable AI Practices | | 0 |
| Helsinki Deblur Challenge 2021: description of photographic data | | 0 |
| A Survey on Semi-Supervised Learning for Delayed Partially Labelled Data Streams | | 0 |
| A Survey on Preserving Fairness Guarantees in Changing Environments | | 0 |
| Benchmarking Reasoning Robustness in Large Language Models | | 0 |
| Benchmarking real-time monitoring strategies for ethanol production from lignocellulosic biomass | | 0 |
| Global Wheat Head Dataset 2021: more diversity to improve the benchmarking of wheat head localization methods | | 0 |
| Feasibility of BERT Embeddings For Domain-Specific Knowledge Mining | | 0 |
Page 51 of 111

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |