Benchmarking

Papers

Showing 2526–2550 of 5548 papers

Title	Date	Tasks	Status	Hype
A Functional Analysis Approach to Symbolic Regression	Feb 9, 2024	Benchmarking, regression	Unverified	0
Transparent and Scrutable Recommendations Using Natural Language User Profiles	Feb 8, 2024	Benchmarking, Descriptive	Code Available	0
Efficient Expression Neutrality Estimation with Application to Face Recognition Utility Prediction	Feb 8, 2024	Benchmarking, Face Image Quality	Unverified	0
Benchmarking Large Language Models on Communicative Medical Coaching: a Novel System and Dataset	Feb 8, 2024	Benchmarking	Code Available	0
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models	Feb 8, 2024	Benchmarking, Diversity	Code Available	7
Improved off-policy training of diffusion samplers	Feb 7, 2024	Benchmarking	Code Available	1
BRI3L: A Brightness Illusion Image Dataset for Identification and Localization of Regions of Illusory Perception	Feb 7, 2024	Benchmarking	Code Available	0
InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior	Feb 7, 2024	Benchmarking, Decoder	Code Available	2
Towards Biologically Plausible and Private Gene Expression Data Generation	Feb 7, 2024	Benchmarking	Code Available	0
LtU-ILI: An All-in-One Framework for Implicit Inference in Astrophysics and Cosmology	Feb 6, 2024	All, Benchmarking	Code Available	2
LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K	Feb 6, 2024	16k, Benchmarking	Code Available	2
Quantitative Metrics for Benchmarking Medical Image Harmonization	Feb 6, 2024	Anatomy, Benchmarking	Unverified	0
Are Machines Better at Complex Reasoning? Unveiling Human-Machine Inference Gaps in Entailment Verification	Feb 6, 2024	Benchmarking, Multiple-choice	Unverified	0
AttackNet: Enhancing Biometric Security via Tailored Convolutional Neural Network Architectures for Liveness Detection	Feb 6, 2024	Benchmarking	Code Available	0
Architecture Analysis and Benchmarking of 3D U-shaped Deep Learning Models for Thoracic Anatomical Segmentation	Feb 5, 2024	Benchmarking, Image Segmentation	Code Available	0
PowerGraph: A power grid benchmark dataset for graph neural networks	Feb 5, 2024	Articles, Benchmarking	Unverified	0
JOBSKAPE: A Framework for Generating Synthetic Job Postings to Enhance Skill Matching	Feb 5, 2024	Benchmarking, Sentence	Code Available	1
Vi(E)va LLM! A Conceptual Stack for Evaluating and Interpreting Generative AI-based Visualizations	Feb 3, 2024	Benchmarking	Code Available	0
EffiBench: Benchmarking the Efficiency of Automatically Generated Code	Feb 3, 2024	Benchmarking, Code Completion	Code Available	2
Probing Critical Learning Dynamics of PLMs for Hate Speech Detection	Feb 3, 2024	Benchmarking, Hate Speech Detection	Code Available	0
GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning	Feb 3, 2024	Benchmarking, DeepFake Detection	Code Available	1
Can LLMs perform structured graph reasoning?	Feb 2, 2024	Benchmarking, Navigate	Code Available	0
Variational Quantum Circuits Enhanced Generative Adversarial Network	Feb 2, 2024	Benchmarking, Generative Adversarial Network	Unverified	0
Benchmarking Spiking Neural Network Learning Methods with Varying Locality	Feb 1, 2024	Benchmarking	Unverified	0
MRAnnotator: multi-Anatomy and many-Sequence MRI segmentation of 44 structures	Feb 1, 2024	Anatomy, Benchmarking	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified