SOTAVerified

Benchmarking

Papers

Showing 3001-3050 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Are SNNs Truly Energy-efficient? - A Hardware Perspective | | 0 |
| AGIBench: A Multi-granularity, Multimodal, Human-referenced, Auto-scoring Benchmark for Large Language Models | | 0 |
| A skeletonization algorithm for gradient-based optimization | Code | 1 |
| A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking | | 0 |
| Transfer Learning between Motor Imagery Datasets using Deep Learning -- Validation of Framework and Comparison of Datasets | Code | 0 |
| Benchmarking Large Language Models in Retrieval-Augmented Generation | Code | 2 |
| Hybrid data driven/thermal simulation model for comfort assessment | | 0 |
| Benchmarking Autoregressive Conditional Diffusion Models for Turbulent Flow Simulation | Code | 1 |
| Orientation-Independent Chinese Text Recognition in Scene Images | Code | 2 |
| FOR-instance: a UAV laser scanning benchmark dataset for semantic and instance segmentation of individual trees | | 0 |
| Holistic Dynamic Frequency Transformer for Image Fusion and Exposure Correction | | 0 |
| NeMig -- A Bilingual News Collection and Knowledge Graph about Migration | Code | 0 |
| FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning | | 0 |
| Can humans help BERT gain "confidence"? | | 0 |
| Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering | Code | 1 |
| Benchmarking Robustness and Generalization in Multi-Agent Systems: A Case Study on Neural MMO | | 0 |
| Benchmarking Multilabel Topic Classification in the Kyrgyz Language | Code | 0 |
| Benchmarking the Generation of Fact Checking Explanations | Code | 1 |
| Towards quantitative precision for ECG analysis: Leveraging state space models, self-supervision and patient metadata | Code | 1 |
| Matbench Discovery -- A framework to evaluate machine learning crystal stability predictions | Code | 3 |
| Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads | | 0 |
| MLLM-DataEngine: An Iterative Refinement Approach for MLLM | Code | 1 |
| Benchmarking Data Efficiency and Computational Efficiency of Temporal Action Localization Models | | 0 |
| Beyond Document Page Classification: Design, Datasets, and Challenges | Code | 0 |
| Topical-Chat: Towards Knowledge-Grounded Open-Domain Conversations | Code | 2 |
| Benchmarking Causal Study to Interpret Large Language Models for Source Code | | 0 |
| Finding the Perfect Fit: Applying Regression Models to ClimateBench v1.0 | Code | 0 |
| LLMRec: Benchmarking Large Language Models on Recommendation Task | Code | 1 |
| Efficient Benchmarking of Language Models | | 0 |
| Expecting The Unexpected: Towards Broad Out-Of-Distribution Detection | Code | 0 |
| Benchmarking Domain Adaptation for Chemical Processes on the Tennessee Eastman Process | Code | 0 |
| Beyond MD17: the reactive xxMD dataset | Code | 0 |
| Measuring the Effect of Causal Disentanglement on the Adversarial Robustness of Neural Network Models | | 0 |
| UGSL: A Unified Framework for Benchmarking Graph Structure Learning | | 0 |
| VI-Net: Boosting Category-level 6D Object Pose Estimation via Learning Decoupled Rotations on the Spherical Representations | Code | 1 |
| Neurological Prognostication of Post-Cardiac-Arrest Coma Patients Using EEG Data: A Dynamic Survival Analysis Framework with Competing Risks | Code | 0 |
| Benchmarking Neural Network Generalization for Grammar Induction | Code | 1 |
| Benchmarking Adversarial Robustness of Compressed Deep Learning Models | | 0 |
| IoT Data Trust Evaluation via Machine Learning | Code | 0 |
| Deep Neural Operator Driven Real Time Inference for Nuclear Systems to Enable Digital Twin Solutions | | 0 |
| A Survey on Model Compression for Large Language Models | | 0 |
| Benchmarking Scalable Epistemic Uncertainty Quantification in Organ Segmentation | Code | 0 |
| Benchmarking Generated Poses: How Rational is Structure-based Drug Design with Generative Models? | Code | 1 |
| BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents | Code | 2 |
| Does AI for science need another ImageNet Or totally different benchmarks? A case study of machine learning force fields | | 0 |
| DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity | Code | 1 |
| A Comparative Visual Analytics Framework for Evaluating Evolutionary Processes in Multi-objective Optimization | Code | 1 |
| Spintronics for image recognition: performance benchmarking via ultrafast data-driven simulations | | 0 |
| Benchmarking Algorithmic Bias in Face Recognition: An Experimental Approach Using Synthetic Faces and Human Evaluation | | 0 |
| Enhancing Architecture Frameworks by Including Modern Stakeholders and their Views/Viewpoints | | 0 |
Page 61 of 111

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |