SOTAVerified

Benchmarking

Papers

Showing 2501–2550 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| SAWEC: Sensing-Assisted Wireless Edge Computing | Code | 0 |
| AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator | Code | 2 |
| From Variability to Stability: Advancing RecSys Benchmarking Practices | Code | 0 |
| Multi-Fidelity Methods for Optimization: A Survey |  | 0 |
| The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse | Code | 0 |
| Evaluation of simulation methods for tumor subclonal reconstruction |  | 0 |
| Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking | Code | 1 |
| MultiMedEval: A Benchmark and a Toolkit for Evaluating Medical Vision-Language Models | Code | 2 |
| Design and Realization of a Benchmarking Testbed for Evaluating Autonomous Platooning Algorithms |  | 0 |
| Benchmarking multi-component signal processing methods in the time-frequency plane | Code | 0 |
| BdSLW60: A Word-Level Bangla Sign Language Dataset | Code | 0 |
| LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents | Code | 2 |
| Privacy-Preserving Language Model Inference with Instance Obfuscation |  | 0 |
| EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages |  | 0 |
| Customizable Perturbation Synthesis for Robust SLAM Benchmarking | Code | 2 |
| Impact of spatial transformations on landscape features of CEC2022 basic benchmark problems |  | 0 |
| Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT |  | 0 |
| AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension | Code | 2 |
| Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study | Code | 0 |
| Explainable Global Wildfire Prediction Models using Graph Neural Networks | Code | 1 |
| ProtIR: Iterative Refinement between Retrievers and Predictors for Protein Function Annotation |  | 0 |
| Estimating the Effect of Crosstalk Error on Circuit Fidelity Using Noisy Intermediate-Scale Quantum Devices |  | 0 |
| LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education |  | 0 |
| Improving 2D-3D Dense Correspondences with Diffusion Models for 6D Object Pose Estimation |  | 0 |
| Retrieve, Merge, Predict: Augmenting Tables with Data Lakes | Code | 1 |
| A Functional Analysis Approach to Symbolic Regression |  | 0 |
| Transparent and Scrutable Recommendations Using Natural Language User Profiles | Code | 0 |
| Efficient Expression Neutrality Estimation with Application to Face Recognition Utility Prediction |  | 0 |
| Benchmarking Large Language Models on Communicative Medical Coaching: a Novel System and Dataset | Code | 0 |
| SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models | Code | 7 |
| Improved off-policy training of diffusion samplers | Code | 1 |
| BRI3L: A Brightness Illusion Image Dataset for Identification and Localization of Regions of Illusory Perception | Code | 0 |
| InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior | Code | 2 |
| Towards Biologically Plausible and Private Gene Expression Data Generation | Code | 0 |
| LtU-ILI: An All-in-One Framework for Implicit Inference in Astrophysics and Cosmology | Code | 2 |
| LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K | Code | 2 |
| Quantitative Metrics for Benchmarking Medical Image Harmonization |  | 0 |
| Are Machines Better at Complex Reasoning? Unveiling Human-Machine Inference Gaps in Entailment Verification |  | 0 |
| AttackNet: Enhancing Biometric Security via Tailored Convolutional Neural Network Architectures for Liveness Detection | Code | 0 |
| Architecture Analysis and Benchmarking of 3D U-shaped Deep Learning Models for Thoracic Anatomical Segmentation | Code | 0 |
| PowerGraph: A power grid benchmark dataset for graph neural networks |  | 0 |
| JOBSKAPE: A Framework for Generating Synthetic Job Postings to Enhance Skill Matching | Code | 1 |
| Vi(E)va LLM! A Conceptual Stack for Evaluating and Interpreting Generative AI-based Visualizations | Code | 0 |
| EffiBench: Benchmarking the Efficiency of Automatically Generated Code | Code | 2 |
| Probing Critical Learning Dynamics of PLMs for Hate Speech Detection | Code | 0 |
| GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning | Code | 1 |
| Can LLMs perform structured graph reasoning? | Code | 0 |
| Variational Quantum Circuits Enhanced Generative Adversarial Network |  | 0 |
| Benchmarking Spiking Neural Network Learning Methods with Varying Locality |  | 0 |
| MRAnnotator: multi-Anatomy and many-Sequence MRI segmentation of 44 structures |  | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified |