SOTAVerified

Benchmarking

Papers

Showing 2251-2275 of 5548 papers

Title | Status | Hype
Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework | Code | 0
HATE-ITA: New Baselines for Hate Speech Detection in Italian | Code | 0
HERMES: Holographic Equivariant neuRal network model for Mutational Effect and Stability prediction | Code | 0
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program | Code | 0
A Seq2Seq approach to Symbolic Regression | Code | 0
Harnessing Orthogonality to Train Low-Rank Neural Networks | Code | 0
High-Quality, ROS Compatible Video Encoding and Decoding for High-Definition Datasets | Code | 0
Benchmarking Multilabel Topic Classification in the Kyrgyz Language | Code | 0
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning | Code | 0
A Continuous Optimisation Benchmark Suite from Neural Network Regression | Code | 0
Hard-Label Cryptanalytic Extraction of Neural Network Models | Code | 0
Benchmarking multi-component signal processing methods in the time-frequency plane | Code | 0
HammerBench: Fine-Grained Function-Calling Evaluation in Real Mobile Device Scenarios | Code | 0
Dynamic Neighborhood Construction for Structured Large Discrete Action Spaces | Code | 0
Hardware Aware Neural Network Architectures using FbNet | Code | 0
Aggregated Attributions for Explanatory Analysis of 3D Segmentation Models | Code | 0
gym-gazebo2, a toolkit for reinforcement learning using ROS 2 and Gazebo | Code | 0
Benchmarking MOEAs for solving continuous multi-objective RL problems | Code | 0
Benchmarking Model-Based Reinforcement Learning | Code | 0
Guidelines and Benchmarks for Deployment of Deep Learning Models on Smartphones as Real-Time Apps | Code | 0
Benchmarking Misuse Mitigation Against Covert Adversaries | Code | 0
Benchmarking missing-values approaches for predictive models on health databases | Code | 0
Harmonization Benchmarking Tool for Neuroimaging Datasets | Code | 0
Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the GeNTE Corpus | Code | 0
Identifying the Smallest Adversarial Load Perturbations that Render DC-OPF Infeasible | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified