SOTAVerified

Benchmarking

Papers

Showing 2526-2550 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| A Functional Analysis Approach to Symbolic Regression | | 0 |
| Transparent and Scrutable Recommendations Using Natural Language User Profiles | Code | 0 |
| Efficient Expression Neutrality Estimation with Application to Face Recognition Utility Prediction | | 0 |
| Benchmarking Large Language Models on Communicative Medical Coaching: a Novel System and Dataset | Code | 0 |
| SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models | Code | 7 |
| Improved off-policy training of diffusion samplers | Code | 1 |
| BRI3L: A Brightness Illusion Image Dataset for Identification and Localization of Regions of Illusory Perception | Code | 0 |
| InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior | Code | 2 |
| Towards Biologically Plausible and Private Gene Expression Data Generation | Code | 0 |
| LtU-ILI: An All-in-One Framework for Implicit Inference in Astrophysics and Cosmology | Code | 2 |
| LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K | Code | 2 |
| Quantitative Metrics for Benchmarking Medical Image Harmonization | | 0 |
| Are Machines Better at Complex Reasoning? Unveiling Human-Machine Inference Gaps in Entailment Verification | | 0 |
| AttackNet: Enhancing Biometric Security via Tailored Convolutional Neural Network Architectures for Liveness Detection | Code | 0 |
| Architecture Analysis and Benchmarking of 3D U-shaped Deep Learning Models for Thoracic Anatomical Segmentation | Code | 0 |
| PowerGraph: A power grid benchmark dataset for graph neural networks | | 0 |
| JOBSKAPE: A Framework for Generating Synthetic Job Postings to Enhance Skill Matching | Code | 1 |
| Vi(E)va LLM! A Conceptual Stack for Evaluating and Interpreting Generative AI-based Visualizations | Code | 0 |
| EffiBench: Benchmarking the Efficiency of Automatically Generated Code | Code | 2 |
| Probing Critical Learning Dynamics of PLMs for Hate Speech Detection | Code | 0 |
| GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning | Code | 1 |
| Can LLMs perform structured graph reasoning? | Code | 0 |
| Variational Quantum Circuits Enhanced Generative Adversarial Network | | 0 |
| Benchmarking Spiking Neural Network Learning Methods with Varying Locality | | 0 |
| MRAnnotator: multi-Anatomy and many-Sequence MRI segmentation of 44 structures | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |