SOTAVerified

Benchmarking

Papers

Showing 1151-1200 of 5548 papers

Title | Status | Hype
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift | Code | 1
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions | Code | 1
A Comparative Attention Framework for Better Few-Shot Object Detection on Aerial Images | Code | 1
Benchmarking Large Language Models on Controllable Generation under Diversified Instructions | Code | 1
Benchmarking structure-based three-dimensional molecular generative models using GenBench3D: ligand conformation quality matters | Code | 1
Beyond neural scaling laws: beating power law scaling via data pruning | Code | 1
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning | Code | 1
A Closer Look at Mortality Risk Prediction from Electrocardiograms | Code | 1
HINT3: Raising the bar for Intent Detection in the Wild | Code | 1
A global analysis of metrics used for measuring performance in natural language processing | Code | 1
A Scale-Invariant Sorting Criterion to Find a Causal Order in Additive Noise Models | Code | 1
BiBench: Benchmarking and Analyzing Network Binarization | Code | 1
A Global Benchmark of Algorithms for Segmenting Late Gadolinium-Enhanced Cardiac Magnetic Resonance Imaging | Code | 1
Benchmarking Multidomain English-Indonesian Machine Translation | Code | 1
Automatic Detection of Generated Text is Easiest when Humans are Fooled | Code | 1
RGB-D Indiscernible Object Counting in Underwater Scenes | Code | 1
Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data | Code | 1
GRecX: An Efficient and Unified Benchmark for GNN-based Recommendation | Code | 1
Benchmarking Large Language Models for News Summarization | Code | 1
Graphs, Constraints, and Search for the Abstraction and Reasoning Corpus | Code | 1
GraphWorld: Fake Graphs Bring Real Insights for GNNs | Code | 1
Binary Code Summarization: Benchmarking ChatGPT/GPT-4 and Other Large Language Models | Code | 1
GraphGallery: A Platform for Fast Benchmarking and Easy Development of Graph Neural Networks Based Intelligent Software | Code | 1
Biomedical Data-to-Text Generation via Fine-Tuning Transformers | Code | 1
A GPU-accelerated Large-scale Simulator for Transportation System Optimization Benchmarking | Code | 1
Graph Neural Network-Based Anomaly Detection for River Network Systems | Code | 1
Benchmarking Skeleton-based Motion Encoder Models for Clinical Applications: Estimating Parkinson's Disease Severity in Walking Sequences | Code | 1
BLADE: Benchmarking Language Model Agents for Data-Driven Science | Code | 1
Benchmarking Simulation-Based Inference | Code | 1
Benchmarking Visual Localization for Autonomous Navigation | Code | 1
A skeletonization algorithm for gradient-based optimization | Code | 1
Benchmarking Multi-Scene Fire and Smoke Detection | Code | 1
AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses | Code | 1
Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions | Code | 1
Boosting Healthcare LLMs Through Retrieved Context | Code | 1
Boosting Neural Image Compression for Machines Using Latent Space Masking | Code | 1
GraphArena: Benchmarking Large Language Models on Graph Computational Problems | Code | 1
Graph Robustness Benchmark: Benchmarking the Adversarial Robustness of Graph Machine Learning | Code | 1
BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text | Code | 1
Grounding Descriptions in Images informs Zero-Shot Visual Recognition | Code | 1
AI Accelerator Survey and Trends | Code | 1
ISLES 2022: A multi-center magnetic resonance imaging stroke lesion segmentation dataset | Code | 1
Benchmarking Neural Network Generalization for Grammar Induction | Code | 1
Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations | Code | 1
Benchmarking Segmentation Models with Mask-Preserved Attribute Editing | Code | 1
Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond | Code | 1
Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT | Code | 1
GNNX-BENCH: Unravelling the Utility of Perturbation-based GNN Explainers through In-depth Benchmarking | Code | 1
GoMatching++: Parameter- and Data-Efficient Arbitrary-Shaped Video Text Spotting and Benchmarking | Code | 1
GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models | Code | 1
Page 24 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified