Benchmarking

Papers

Showing 2551–2575 of 5548 papers

Title	Date	Tasks	Status
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level	Nov 15, 2024	Benchmarking, counterfactual	Unverified
BEARD: Benchmarking the Adversarial Robustness for Dataset Distillation	Nov 14, 2024	Adversarial Attack, Adversarial Robustness	Code Available
WelQrate: Defining the Gold Standard in Small Molecule Drug Discovery Benchmarking	Nov 14, 2024	Benchmarking, Drug Discovery	Unverified
A survey of probabilistic generative frameworks for molecular simulations	Nov 14, 2024	Benchmarking, Denoising	Code Available
Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset	Nov 13, 2024	Anomaly Detection, Benchmarking	Code Available
HyperFace: Generating Synthetic Face Recognition Datasets by Exploring Face Embedding Hypersphere	Nov 13, 2024	Benchmarking, Dataset Generation	Unverified
A Survey on Vision Autoregressive Model	Nov 13, 2024	3D Generation, Benchmarking	Unverified
Evaluating the Generation of Spatial Relations in Text and Image Generative Models	Nov 12, 2024	Benchmarking, Image Generation	Unverified
Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation	Nov 11, 2024	16k, Benchmarking	Code Available
BuckTales : A multi-UAV dataset for multi-object tracking and re-identification of wild antelopes	Nov 11, 2024	Benchmarking, Multi-Object Tracking	Unverified
Benchmarking LLMs' Judgments with No Gold Standard	Nov 11, 2024	Benchmarking, Machine Translation	Code Available
MolMiner: Towards Controllable, 3D-Aware, Fragment-Based Molecular Design	Nov 10, 2024	3D geometry, Benchmarking	Unverified
Low Dynamic Range for RIS-aided Bistatic Integrated Sensing and Communication	Nov 9, 2024	Benchmarking, Integrated sensing and communication	Unverified
Benchmarking Distributional Alignment of Large Language Models	Nov 8, 2024	Benchmarking	Code Available
Benchmarking 3D multi-coil NC-PDNet MRI reconstruction	Nov 8, 2024	3D Reconstruction, Benchmarking	Unverified
Open-set object detection: towards unified problem formulation and benchmarking	Nov 8, 2024	Autonomous Driving, Benchmarking	Unverified
FactLens: Benchmarking Fine-Grained Fact Verification	Nov 8, 2024	Benchmarking, Fact Verification	Unverified
A Retrospective on the Robot Air Hockey Challenge: Benchmarking Robust, Reliable, and Safe Learning Techniques for Real-world Robotics	Nov 8, 2024	Benchmarking	Unverified
Deep Learning Models for UAV-Assisted Bridge Inspection: A YOLO Benchmark Analysis	Nov 7, 2024	Benchmarking, Model Selection	Unverified
Perspective on recent developments and challenges in regulatory and systems genomics	Nov 7, 2024	Benchmarking	Unverified
Learn to Solve Vehicle Routing Problems ASAP: A Neural Optimization Approach for Time-Constrained Vehicle Routing Problems with Finite Vehicle Fleet	Nov 7, 2024	Benchmarking, Combinatorial Optimization	Unverified
ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding	Nov 7, 2024	Benchmarking, Multiple-choice	Unverified
HandCraft: Anatomically Correct Restoration of Malformed Hands in Diffusion Generated Images	Nov 7, 2024	Anatomy, Benchmarking	Unverified
Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries	Nov 7, 2024	Benchmarking	Unverified
Benchmarking Large Language Models with Integer Sequence Generation Tasks	Nov 7, 2024	Benchmarking, Computational Efficiency	Unverified


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified