Benchmarking

Papers


Showing 1926–1950 of 5548 papers

Title	Date	Tasks	Status
Assessing the Utility of Audio Foundation Models for Heart and Respiratory Sound Analysis	Apr 25, 2025	Benchmarking	Unverified
QuantBench: Benchmarking AI Methods for Quantitative Investment	Apr 24, 2025	Benchmarking, Continual Learning	Unverified
Token Sequence Compression for Efficient Multimodal Computing	Apr 24, 2025	Benchmarking	Unverified
Design and benchmarking of a two degree of freedom tendon driver unit for cable-driven wearable technologies	Apr 24, 2025	Benchmarking	Unverified
From Past to Present: A Survey of Malicious URL Detection Techniques, Datasets and Code Repositories	Apr 23, 2025	Benchmarking	Code Available
MAYA: Addressing Inconsistencies in Generative Password Guessing through a Unified Benchmark	Apr 23, 2025	Benchmarking	Code Available
Enhancing TCR-Peptide Interaction Prediction with Pretrained Language Models and Molecular Representations	Apr 22, 2025	Benchmarking, Few-Shot Learning	Unverified
Towards responsible AI for education: Hybrid human-AI to confront the Elephant in the room	Apr 22, 2025	Benchmarking, Fairness	Unverified
CLIRudit: Cross-Lingual Information Retrieval of Scientific Documents	Apr 22, 2025	Benchmarking, Cross-Lingual Information Retrieval	Unverified
Fluorescence Reference Target Quantitative Analysis Library	Apr 22, 2025	Benchmarking	Code Available
A Large-scale Class-level Benchmark Dataset for Code Generation with LLMs	Apr 22, 2025	Benchmarking, Class-level Code Generation	Unverified
Benchmarking machine learning models for predicting aerofoil performance	Apr 22, 2025	Benchmarking	Unverified
Benchmarking LLM for Code Smells Detection: OpenAI GPT-4.0 vs DeepSeek-V3	Apr 22, 2025	Benchmarking, Language Modeling	Unverified
Establishing Reliability Metrics for Reward Models in Large Language Models	Apr 21, 2025	Benchmarking	Unverified
Audio-Visual Class-Incremental Learning for Fish Feeding intensity Assessment in Aquaculture	Apr 21, 2025	Benchmarking, class-incremental learning	Unverified
Speaker Fuzzy Fingerprints: Benchmarking Text-Based Identification in Multiparty Dialogues	Apr 21, 2025	Benchmarking, Speaker Identification	Unverified
Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation	Apr 21, 2025	Benchmarking	Code Available
IXGS-Intraoperative 3D Reconstruction from Sparse, Arbitrarily Posed Real X-rays	Apr 20, 2025	3D Reconstruction, Anatomy	Unverified
A Framework for Benchmarking and Aligning Task-Planning Safety in LLM-Based Embodied Agents	Apr 20, 2025	Benchmarking, Task Planning	Unverified
Any Image Restoration via Efficient Spatial-Frequency Degradation Adaptation	Apr 19, 2025	Benchmarking, Image Restoration	Unverified
CodeCrash: Stress Testing LLM Reasoning under Structural and Semantic Perturbations	Apr 19, 2025	Benchmarking	Unverified
AI Idea Bench 2025: AI Research Idea Generation Benchmark	Apr 19, 2025	Benchmarking, scientific discovery	Unverified
LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers	Apr 19, 2025	Benchmarking, Diagnostic	Unverified
Unreal Robotics Lab: A High-Fidelity Robotics Simulator with Advanced Physics and Rendering	Apr 19, 2025	Benchmarking, Dataset Generation	Unverified
OpenDeception: Benchmarking and Investigating AI Deceptive Behaviors via Open-ended Interaction Simulation	Apr 18, 2025	Benchmarking	Unverified



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified