Benchmarking

Papers

Showing 2501–2550 of 5548 papers

Title	Date	Tasks	Status	Hype
SAWEC: Sensing-Assisted Wireless Edge Computing	Feb 15, 2024	Benchmarking, Edge-computing	Code Available	0
AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator	Feb 15, 2024	Benchmarking, Diagnostic	Code Available	2
From Variability to Stability: Advancing RecSys Benchmarking Practices	Feb 15, 2024	Benchmarking, Collaborative Filtering	Code Available	0
Multi-Fidelity Methods for Optimization: A Survey	Feb 15, 2024	Benchmarking, Computational Efficiency	Unverified	0
The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse	Feb 15, 2024	Benchmarking, Model Editing	Code Available	0
Evaluation of simulation methods for tumor subclonal reconstruction	Feb 14, 2024	Benchmarking	Unverified	0
Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking	Feb 14, 2024	Benchmarking, Language Modelling	Code Available	1
MultiMedEval: A Benchmark and a Toolkit for Evaluating Medical Vision-Language Models	Feb 14, 2024	Benchmarking, Diversity	Code Available	2
Design and Realization of a Benchmarking Testbed for Evaluating Autonomous Platooning Algorithms	Feb 14, 2024	Autonomous Driving, Benchmarking	Unverified	0
Benchmarking multi-component signal processing methods in the time-frequency plane	Feb 13, 2024	Benchmarking, Denoising	Code Available	0
BdSLW60: A Word-Level Bangla Sign Language Dataset	Feb 13, 2024	Benchmarking, Gesture Recognition	Code Available	0
LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents	Feb 13, 2024	Benchmarking, Model Selection	Code Available	2
Privacy-Preserving Language Model Inference with Instance Obfuscation	Feb 13, 2024	Benchmarking, Language Modeling	Unverified	0
EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages	Feb 12, 2024	Automated Theorem Proving, Benchmarking	Unverified	0
Customizable Perturbation Synthesis for Robust SLAM Benchmarking	Feb 12, 2024	Benchmarking, Simultaneous Localization and Mapping	Code Available	2
Impact of spatial transformations on landscape features of CEC2022 basic benchmark problems	Feb 12, 2024	Benchmarking	Unverified	0
Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT	Feb 12, 2024	Benchmarking, Chunking	Unverified	0
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension	Feb 12, 2024	2k, Automatic Speech Recognition	Code Available	2
Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study	Feb 11, 2024	Anomaly Detection, Benchmarking	Code Available	0
Explainable Global Wildfire Prediction Models using Graph Neural Networks	Feb 11, 2024	Benchmarking, Community Detection	Code Available	1
ProtIR: Iterative Refinement between Retrievers and Predictors for Protein Function Annotation	Feb 10, 2024	Benchmarking, Language Modeling	Unverified	0
Estimating the Effect of Crosstalk Error on Circuit Fidelity Using Noisy Intermediate-Scale Quantum Devices	Feb 10, 2024	Benchmarking	Unverified	0
LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education	Feb 9, 2024	Benchmarking, Chatbot	Unverified	0
Improving 2D-3D Dense Correspondences with Diffusion Models for 6D Object Pose Estimation	Feb 9, 2024	6D Pose Estimation using RGB, Benchmarking	Unverified	0
Retrieve, Merge, Predict: Augmenting Tables with Data Lakes	Feb 9, 2024	AutoML, Benchmarking	Code Available	1
A Functional Analysis Approach to Symbolic Regression	Feb 9, 2024	Benchmarking, regression	Unverified	0
Transparent and Scrutable Recommendations Using Natural Language User Profiles	Feb 8, 2024	Benchmarking, Descriptive	Code Available	0
Efficient Expression Neutrality Estimation with Application to Face Recognition Utility Prediction	Feb 8, 2024	Benchmarking, Face Image Quality	Unverified	0
Benchmarking Large Language Models on Communicative Medical Coaching: a Novel System and Dataset	Feb 8, 2024	Benchmarking	Code Available	0
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models	Feb 8, 2024	Benchmarking, Diversity	Code Available	7
Improved off-policy training of diffusion samplers	Feb 7, 2024	Benchmarking	Code Available	1
BRI3L: A Brightness Illusion Image Dataset for Identification and Localization of Regions of Illusory Perception	Feb 7, 2024	Benchmarking	Code Available	0
InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior	Feb 7, 2024	Benchmarking, Decoder	Code Available	2
Towards Biologically Plausible and Private Gene Expression Data Generation	Feb 7, 2024	Benchmarking	Code Available	0
LtU-ILI: An All-in-One Framework for Implicit Inference in Astrophysics and Cosmology	Feb 6, 2024	All, Benchmarking	Code Available	2
LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K	Feb 6, 2024	16k, Benchmarking	Code Available	2
Quantitative Metrics for Benchmarking Medical Image Harmonization	Feb 6, 2024	Anatomy, Benchmarking	Unverified	0
Are Machines Better at Complex Reasoning? Unveiling Human-Machine Inference Gaps in Entailment Verification	Feb 6, 2024	Benchmarking, Multiple-choice	Unverified	0
AttackNet: Enhancing Biometric Security via Tailored Convolutional Neural Network Architectures for Liveness Detection	Feb 6, 2024	Benchmarking	Code Available	0
Architecture Analysis and Benchmarking of 3D U-shaped Deep Learning Models for Thoracic Anatomical Segmentation	Feb 5, 2024	Benchmarking, Image Segmentation	Code Available	0
PowerGraph: A power grid benchmark dataset for graph neural networks	Feb 5, 2024	Articles, Benchmarking	Unverified	0
JOBSKAPE: A Framework for Generating Synthetic Job Postings to Enhance Skill Matching	Feb 5, 2024	Benchmarking, Sentence	Code Available	1
Vi(E)va LLM! A Conceptual Stack for Evaluating and Interpreting Generative AI-based Visualizations	Feb 3, 2024	Benchmarking	Code Available	0
EffiBench: Benchmarking the Efficiency of Automatically Generated Code	Feb 3, 2024	Benchmarking, Code Completion	Code Available	2
Probing Critical Learning Dynamics of PLMs for Hate Speech Detection	Feb 3, 2024	Benchmarking, Hate Speech Detection	Code Available	0
GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning	Feb 3, 2024	Benchmarking, DeepFake Detection	Code Available	1
Can LLMs perform structured graph reasoning?	Feb 2, 2024	Benchmarking, Navigate	Code Available	0
Variational Quantum Circuits Enhanced Generative Adversarial Network	Feb 2, 2024	Benchmarking, Generative Adversarial Network	Unverified	0
Benchmarking Spiking Neural Network Learning Methods with Varying Locality	Feb 1, 2024	Benchmarking	Unverified	0
MRAnnotator: multi-Anatomy and many-Sequence MRI segmentation of 44 structures	Feb 1, 2024	Anatomy, Benchmarking	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified