Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2751–2800 of 5548 papers

Title	Date	Tasks	Status
AlphaZip: Neural Network-Enhanced Lossless Text Compression	Sep 23, 2024	BenchmarkingData Compression	CodeCode Available
Towards Ground-truth-free Evaluation of Any Segmentation in Medical Images	Sep 23, 2024	BenchmarkingSegmentation	CodeCode Available
Building a continuous benchmarking ecosystem in bioinformatics	Sep 23, 2024	Benchmarking	—Unverified
Benchmarking Edge AI Platforms for High-Performance ML Inference	Sep 23, 2024	BenchmarkingCPU	—Unverified
Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking	Sep 23, 2024	BenchmarkingDiversity	CodeCode Available
The Ability of Large Language Models to Evaluate Constraint-satisfaction in Agent Responses to Open-ended Requests	Sep 22, 2024	Benchmarking	—Unverified
Sketch 'n Solve: An Efficient Python Package for Large-Scale Least Squares Using Randomized Numerical Linear Algebra	Sep 22, 2024	Benchmarking	—Unverified
Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance	Sep 22, 2024	AutoMLBenchmarking	CodeCode Available
Margin-bounded Confidence Scores for Out-of-Distribution Detection	Sep 22, 2024	Autonomous DrivingBenchmarking	CodeCode Available
@Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology	Sep 21, 2024	BenchmarkingDepth Estimation	—Unverified
Present and Future Generalization of Synthetic Image Detectors	Sep 21, 2024	BenchmarkingDiversity	CodeCode Available
Can LLMs replace Neil deGrasse Tyson? Evaluating the Reliability of LLMs as Science Communicators	Sep 21, 2024	Benchmarking	CodeCode Available
An Evolutionary Algorithm For the Vehicle Routing Problem with Drones with Interceptions	Sep 21, 2024	BenchmarkingScheduling	—Unverified
CONGRA: Benchmarking Automatic Conflict Resolution	Sep 21, 2024	Benchmarking	CodeCode Available
Efficient and Effective Model Extraction	Sep 21, 2024	Benchmarkingmodel	CodeCode Available
Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection	Sep 20, 2024	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified
Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time	Sep 20, 2024	BenchmarkingWorld Knowledge	—Unverified
STOP! Benchmarking Large Language Models with Sensitivity Testing on Offensive Progressions	Sep 20, 2024	BenchmarkingSensitivity	CodeCode Available
CI-Bench: Benchmarking Contextual Integrity of AI Assistants on Synthetic Data	Sep 20, 2024	BenchmarkingLanguage Modeling	—Unverified
Robust Salient Object Detection on Compressed Images Using Convolutional Neural Networks	Sep 20, 2024	Benchmarkingobject-detection	—Unverified
Arena 4.0: A Comprehensive ROS2 Development and Benchmarking Platform for Human-centric Navigation Using Generative-Model-based Environment Generation	Sep 19, 2024	BenchmarkingSocial Navigation	—Unverified
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines	Sep 19, 2024	Benchmarking	—Unverified
Efficient Performance Tracking: Leveraging Large Language Models for Automated Construction of Scientific Leaderboards	Sep 19, 2024	Benchmarking	CodeCode Available
ASR Benchmarking: Need for a More Representative Conversational Dataset	Sep 18, 2024	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	CodeCode Available
Efficacy of Synthetic Data as a Benchmark	Sep 18, 2024	BenchmarkingFew-Shot Learning	—Unverified
Hard-Label Cryptanalytic Extraction of Neural Network Models	Sep 18, 2024	Benchmarking	CodeCode Available
PARAPHRASUS : A Comprehensive Benchmark for Evaluating Paraphrase Detection Models	Sep 18, 2024	BenchmarkingModel Selection	CodeCode Available
Improve Machine Learning carbon footprint using Parquet dataset format and Mixed Precision training for regression models -- Part II	Sep 17, 2024	BenchmarkingDescriptive	CodeCode Available
WER We Stand: Benchmarking Urdu ASR Models	Sep 17, 2024	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified
The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection	Sep 17, 2024	BenchmarkingEvent Detection	CodeCode Available
THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models	Sep 17, 2024	BenchmarkingBinary Classification	CodeCode Available
SAGED: A Holistic Bias-Benchmarking Pipeline for Language Models with Customisable Fairness Calibration	Sep 17, 2024	Benchmarkingcounterfactual	CodeCode Available
Quantum Kernel Learning for Small Dataset Modeling in Semiconductor Fabrication: Application to Ohmic Contact	Sep 17, 2024	BenchmarkingQuantum Machine Learning	—Unverified
Benchmarking VLMs' Reasoning About Persuasive Atypical Images	Sep 16, 2024	BenchmarkingObject Recognition	—Unverified
Benchmarking Large Language Model Uncertainty for Prompt Optimization	Sep 16, 2024	BenchmarkingDiversity	CodeCode Available
Benchmarking LLMs in Political Content Text-Annotation: Proof-of-Concept with Toxicity and Incivility Data	Sep 15, 2024	Benchmarkingtext annotation	—Unverified
LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study	Sep 13, 2024	BenchmarkingGrapheme-to-Phoneme Conversion	—Unverified
Text-To-Speech Synthesis In The Wild	Sep 13, 2024	BenchmarkingSpeaker Recognition	—Unverified
Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering	Sep 13, 2024	BenchmarkingBinary Classification	—Unverified
The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal	Sep 12, 2024	BenchmarkingLanguage Modeling	—Unverified
The JPEG Pleno Learning-based Point Cloud Coding Standard: Serving Man and Machine	Sep 12, 2024	Autonomous DrivingBenchmarking	—Unverified
Linear energy storage and flexibility model with ramp rate, ramping, deadline and capacity constraints	Sep 12, 2024	Benchmarking	CodeCode Available
Online vs Offline: A Comparative Study of First-Party and Third-Party Evaluations of Social Chatbots	Sep 12, 2024	BenchmarkingChatbot	—Unverified
Enhancing Q&A Text Retrieval with Ranking Models: Benchmarking, fine-tuning and deploying Rerankers for RAG	Sep 12, 2024	BenchmarkingQuestion Answering	—Unverified
Efficient Sparse Coding with the Adaptive Locally Competitive Algorithm for Speech Classification	Sep 12, 2024	BenchmarkingClassification	—Unverified
Introducing CausalBench: A Flexible Benchmark Framework for Causal Analysis and Machine Learning	Sep 12, 2024	BenchmarkingFairness	—Unverified
Improve Machine Learning carbon footprint using Nvidia GPU and Mixed Precision training for classification models -- Part I	Sep 12, 2024	BenchmarkingCPU	CodeCode Available
Benchmarking 2D Egocentric Hand Pose Datasets	Sep 11, 2024	Activity RecognitionBenchmarking	—Unverified
Understanding Foundation Models: Are We Back in 1924?	Sep 11, 2024	Benchmarking	—Unverified
Unsupervised Novelty Detection Methods Benchmarking with Wavelet Decomposition	Sep 11, 2024	BenchmarkingNovelty Detection	CodeCode Available

Show:10 25 50

← PrevPage 56 of 111Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified