SOTAVerified

Benchmarking

Papers

Showing 5101-5125 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning | Code | 0 |
| Tougher Text, Smarter Models: Raising the Bar for Adversarial Defence Benchmarks | Code | 0 |
| Benchmarking LLM-based Relevance Judgment Methods | Code | 0 |
| Toward 3D Object Reconstruction from Stereo Images | Code | 0 |
| DLAMA: A Framework for Curating Culturally Diverse Facts for Probing the Knowledge of Pretrained Language Models | Code | 0 |
| Skelite: Compact Neural Networks for Efficient Iterative Skeletonization | Code | 0 |
| Divergent Creativity in Humans and Large Language Models | Code | 0 |
| A Kernel-Based Approach for Accurate Steady-State Detection in Performance Time Series | Code | 0 |
| A Closer Look at Temporal Sentence Grounding in Videos: Dataset and Metric | Code | 0 |
| Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems | Code | 0 |
| User-Guided Deep Anime Line Art Colorization with Conditional Adversarial Networks | Code | 0 |
| Towards a Benchmark for Large Language Models for Business Process Management Tasks | Code | 0 |
| Weighting-Based Treatment Effect Estimation via Distribution Learning | Code | 0 |
| Slot Filling for Extracting Reskilling and Upskilling Options from the Web | Code | 0 |
| On Pitfalls of RemOve-And-Retrain: Data Processing Inequality Perspective | Code | 0 |
| Distributional Depth-Based Estimation of Object Articulation Models | Code | 0 |
| Benchmarking Linguistic Diversity of Large Language Models | Code | 0 |
| On Recurrent Neural Networks for Sequence-based Processing in Communications | Code | 0 |
| Benchmarking Learning Efficiency in Deep Reservoir Computing | Code | 0 |
| Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation | Code | 0 |
| Towards a Comprehensive Benchmark for Pathological Lymph Node Metastasis in Breast Cancer Sections | Code | 0 |
| Benchmarking Large Language Model Uncertainty for Prompt Optimization | Code | 0 |
| Diversity Over Size: On the Effect of Sample and Topic Sizes for Topic-Dependent Argument Mining Datasets | Code | 0 |
| On the Evaluation Consistency of Attribution-based Explanations | Code | 0 |
| On the Evaluation of Conditional GANs | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |