Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2551–2600 of 5548 papers

Title	Date	Tasks	Status
The Oxford Spires Dataset: Benchmarking Large-Scale LiDAR-Visual Localisation, Reconstruction and Radiance Field Methods	Nov 15, 2024	3D ReconstructionBenchmarking	—Unverified
WelQrate: Defining the Gold Standard in Small Molecule Drug Discovery Benchmarking	Nov 14, 2024	BenchmarkingDrug Discovery	—Unverified
BEARD: Benchmarking the Adversarial Robustness for Dataset Distillation	Nov 14, 2024	Adversarial AttackAdversarial Robustness	CodeCode Available
A survey of probabilistic generative frameworks for molecular simulations	Nov 14, 2024	BenchmarkingDenoising	CodeCode Available
Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset	Nov 13, 2024	Anomaly DetectionBenchmarking	CodeCode Available
HyperFace: Generating Synthetic Face Recognition Datasets by Exploring Face Embedding Hypersphere	Nov 13, 2024	BenchmarkingDataset Generation	—Unverified
A Survey on Vision Autoregressive Model	Nov 13, 2024	3D GenerationBenchmarking	—Unverified
Evaluating the Generation of Spatial Relations in Text and Image Generative Models	Nov 12, 2024	BenchmarkingImage Generation	—Unverified
BuckTales : A multi-UAV dataset for multi-object tracking and re-identification of wild antelopes	Nov 11, 2024	BenchmarkingMulti-Object Tracking	—Unverified
Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation	Nov 11, 2024	16kBenchmarking	CodeCode Available
Benchmarking LLMs' Judgments with No Gold Standard	Nov 11, 2024	BenchmarkingMachine Translation	CodeCode Available
MolMiner: Towards Controllable, 3D-Aware, Fragment-Based Molecular Design	Nov 10, 2024	3D geometryBenchmarking	—Unverified
Low Dynamic Range for RIS-aided Bistatic Integrated Sensing and Communication	Nov 9, 2024	BenchmarkingIntegrated sensing and communication	—Unverified
Benchmarking Distributional Alignment of Large Language Models	Nov 8, 2024	Benchmarking	CodeCode Available
Benchmarking 3D multi-coil NC-PDNet MRI reconstruction	Nov 8, 2024	3D ReconstructionBenchmarking	—Unverified
FactLens: Benchmarking Fine-Grained Fact Verification	Nov 8, 2024	BenchmarkingFact Verification	—Unverified
A Retrospective on the Robot Air Hockey Challenge: Benchmarking Robust, Reliable, and Safe Learning Techniques for Real-world Robotics	Nov 8, 2024	Benchmarking	—Unverified
Open-set object detection: towards unified problem formulation and benchmarking	Nov 8, 2024	Autonomous DrivingBenchmarking	—Unverified
Deep Learning Models for UAV-Assisted Bridge Inspection: A YOLO Benchmark Analysis	Nov 7, 2024	BenchmarkingModel Selection	—Unverified
ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding	Nov 7, 2024	BenchmarkingMultiple-choice	—Unverified
Perspective on recent developments and challenges in regulatory and systems genomics	Nov 7, 2024	Benchmarking	—Unverified
HandCraft: Anatomically Correct Restoration of Malformed Hands in Diffusion Generated Images	Nov 7, 2024	AnatomyBenchmarking	—Unverified
Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries	Nov 7, 2024	Benchmarking	—Unverified
Learn to Solve Vehicle Routing Problems ASAP: A Neural Optimization Approach for Time-Constrained Vehicle Routing Problems with Finite Vehicle Fleet	Nov 7, 2024	BenchmarkingCombinatorial Optimization	—Unverified
Benchmarking Large Language Models with Integer Sequence Generation Tasks	Nov 7, 2024	BenchmarkingComputational Efficiency	—Unverified
Performance-Guided LLM Knowledge Distillation for Efficient Text Classification at Scale	Nov 7, 2024	Active LearningBenchmarking	—Unverified
Generating Synthetic Electronic Health Record (EHR) Data: A Review with Benchmarking	Nov 6, 2024	Benchmarking	—Unverified
Beemo: Benchmark of Expert-edited Machine-generated Outputs	Nov 6, 2024	Benchmarking	CodeCode Available
SPINEX_ Symbolic Regression: Similarity-based Symbolic Regression with Explainable Neighbors Exploration	Nov 5, 2024	Benchmarkingregression	—Unverified
On the Loss of Context-awareness in General Instruction Fine-tuning	Nov 5, 2024	BenchmarkingInstruction Following	CodeCode Available
TDDBench: A Benchmark for Training data detection	Nov 5, 2024	BenchmarkingComputational Efficiency	—Unverified
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level	Nov 5, 2024	Bayesian OptimisationBenchmarking	—Unverified
Imagining and building wise machines: The centrality of AI metacognition	Nov 4, 2024	BenchmarkingNavigate	—Unverified
Benchmarking XAI Explanations with Human-Aligned Evaluations	Nov 4, 2024	Benchmarking	—Unverified
SinaTools: Open Source Toolkit for Arabic Natural Language Processing	Nov 3, 2024	BenchmarkingLemmatization	—Unverified
Varco Arena: A Tournament Approach to Reference-Free Benchmarking Large Language Models	Nov 2, 2024	Benchmarking	—Unverified
FEET: A Framework for Evaluating Embedding Techniques	Nov 2, 2024	BenchmarkingRepresentation Learning	CodeCode Available
Artificial Intelligence for Microbiology and Microbiome Research	Nov 2, 2024	BenchmarkingDeep Learning	—Unverified
Modern, Efficient, and Differentiable Transport Equation Models using JAX: Applications to Population Balance Equations	Nov 1, 2024	BenchmarkingComputational Efficiency	—Unverified
Benchmarking Bias in Large Language Models during Role-Playing	Nov 1, 2024	BenchmarkingFairness	—Unverified
Cityscape-Adverse: Benchmarking Robustness of Semantic Segmentation with Realistic Scene Modifications via Diffusion-Based Image Editing	Nov 1, 2024	BenchmarkingSemantic Segmentation	CodeCode Available
Improving Few-Shot Cross-Domain Named Entity Recognition by Instruction Tuning a Word-Embedding based Retrieval Augmented Large Language Model	Nov 1, 2024	BenchmarkingCross-Domain Named Entity Recognition	—Unverified
A Review of Reinforcement Learning in Financial Applications	Nov 1, 2024	BenchmarkingDecision Making	—Unverified
IdeaBench: Benchmarking Large Language Models for Research Idea Generation	Oct 31, 2024	Benchmarkingscientific discovery	CodeCode Available
Benchmark Data Repositories for Better Benchmarking	Oct 31, 2024	Benchmarking	—Unverified
NCAdapt: Dynamic adaptation with domain-specific Neural Cellular Automata for continual hippocampus segmentation	Oct 30, 2024	BenchmarkingContinual Learning	CodeCode Available
VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning	Oct 30, 2024	BenchmarkingHallucination	—Unverified
Evaluating Cultural and Social Awareness of LLM Web Agents	Oct 30, 2024	BenchmarkingNavigate	—Unverified
Low-Density 3D Point Cloud Classification	Oct 30, 2024	3D Point Cloud ClassificationAutonomous Driving	—Unverified
DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes	Oct 30, 2024	Benchmarking	—Unverified

Show:10 25 50

← PrevPage 52 of 111Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified