SOTAVerified

Benchmarking

Papers

Showing 1126-1150 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Benchmarking LLMs' Swarm intelligence | Code | 1 |
| Combinatorial Optimization with Policy Adaptation using Latent Space Search | Code | 1 |
| Data Generating Process to Evaluate Causal Discovery Techniques for Time Series Data | Code | 1 |
| Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data | Code | 1 |
| Benchmarking Low-Shot Robustness to Natural Distribution Shifts | Code | 1 |
| Are we really making much progress? Revisiting, benchmarking, and refining heterogeneous graph neural networks | Code | 1 |
| From Claims to Evidence: A Unified Framework and Critical Analysis of CNN vs. Transformer vs. Mamba in Medical Image Segmentation | Code | 1 |
| Are We There Yet? Evaluating State-of-the-Art Neural Network based Geoparsers Using EUPEG as a Benchmarking Platform | Code | 1 |
| Deep Learning-Based Synchronization for Uplink NB-IoT | Code | 1 |
| AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents | Code | 1 |
| DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 | Code | 1 |
| Benchmarking Large Language Models for News Summarization | Code | 1 |
| Benchmarking machine learning models on multi-centre eICU critical care dataset | Code | 1 |
| 3D Common Corruptions and Data Augmentation | Code | 1 |
| Demystifying Learning Rate Policies for High Accuracy Training of Deep Neural Networks | Code | 1 |
| Depth-Driven Geometric Prompt Learning for Laparoscopic Liver Landmark Detection | Code | 1 |
| Benchmarking Multi-Scene Fire and Smoke Detection | Code | 1 |
| CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework | Code | 1 |
| Benchmarking Meaning Representations in Neural Semantic Parsing | Code | 1 |
| ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning | Code | 1 |
| Benchmarking Meta-embeddings: What Works and What Does Not | Code | 1 |
| AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios | Code | 1 |
| Benchmarking Micro-action Recognition: Dataset, Methods, and Applications | Code | 1 |
| DFGC 2022: The Second DeepFake Game Competition | Code | 1 |
| CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |