| AlphaZip: Neural Network-Enhanced Lossless Text Compression | Sep 23, 2024 | Benchmarking, Data Compression | Code Available | 0 |
| Towards Ground-truth-free Evaluation of Any Segmentation in Medical Images | Sep 23, 2024 | Benchmarking, Segmentation | Code Available | 0 |
| Building a continuous benchmarking ecosystem in bioinformatics | Sep 23, 2024 | Benchmarking | Unverified | 0 |
| Benchmarking Edge AI Platforms for High-Performance ML Inference | Sep 23, 2024 | Benchmarking, CPU | Unverified | 0 |
| Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking | Sep 23, 2024 | Benchmarking, Diversity | Code Available | 0 |
| The Ability of Large Language Models to Evaluate Constraint-satisfaction in Agent Responses to Open-ended Requests | Sep 22, 2024 | Benchmarking | Unverified | 0 |
| Sketch 'n Solve: An Efficient Python Package for Large-Scale Least Squares Using Randomized Numerical Linear Algebra | Sep 22, 2024 | Benchmarking | Unverified | 0 |
| Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance | Sep 22, 2024 | AutoML, Benchmarking | Code Available | 0 |
| Margin-bounded Confidence Scores for Out-of-Distribution Detection | Sep 22, 2024 | Autonomous Driving, Benchmarking | Code Available | 0 |
| @Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology | Sep 21, 2024 | Benchmarking, Depth Estimation | Unverified | 0 |
| Present and Future Generalization of Synthetic Image Detectors | Sep 21, 2024 | Benchmarking, Diversity | Code Available | 0 |
| Can LLMs replace Neil deGrasse Tyson? Evaluating the Reliability of LLMs as Science Communicators | Sep 21, 2024 | Benchmarking | Code Available | 0 |
| An Evolutionary Algorithm For the Vehicle Routing Problem with Drones with Interceptions | Sep 21, 2024 | Benchmarking, Scheduling | Unverified | 0 |
| CONGRA: Benchmarking Automatic Conflict Resolution | Sep 21, 2024 | Benchmarking | Code Available | 0 |
| Efficient and Effective Model Extraction | Sep 21, 2024 | Benchmarking, model | Code Available | 0 |
| Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection | Sep 20, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time | Sep 20, 2024 | Benchmarking, World Knowledge | Unverified | 0 |
| STOP! Benchmarking Large Language Models with Sensitivity Testing on Offensive Progressions | Sep 20, 2024 | Benchmarking, Sensitivity | Code Available | 0 |
| CI-Bench: Benchmarking Contextual Integrity of AI Assistants on Synthetic Data | Sep 20, 2024 | Benchmarking, Language Modeling | Unverified | 0 |
| Robust Salient Object Detection on Compressed Images Using Convolutional Neural Networks | Sep 20, 2024 | Benchmarking, object-detection | Unverified | 0 |
| Arena 4.0: A Comprehensive ROS2 Development and Benchmarking Platform for Human-centric Navigation Using Generative-Model-based Environment Generation | Sep 19, 2024 | Benchmarking, Social Navigation | Unverified | 0 |
| MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines | Sep 19, 2024 | Benchmarking | Unverified | 0 |
| Efficient Performance Tracking: Leveraging Large Language Models for Automated Construction of Scientific Leaderboards | Sep 19, 2024 | Benchmarking | Code Available | 0 |
| ASR Benchmarking: Need for a More Representative Conversational Dataset | Sep 18, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Efficacy of Synthetic Data as a Benchmark | Sep 18, 2024 | Benchmarking, Few-Shot Learning | Unverified | 0 |
| Hard-Label Cryptanalytic Extraction of Neural Network Models | Sep 18, 2024 | Benchmarking | Code Available | 0 |
| PARAPHRASUS: A Comprehensive Benchmark for Evaluating Paraphrase Detection Models | Sep 18, 2024 | Benchmarking, Model Selection | Code Available | 0 |
| Improve Machine Learning carbon footprint using Parquet dataset format and Mixed Precision training for regression models -- Part II | Sep 17, 2024 | Benchmarking, Descriptive | Code Available | 0 |
| WER We Stand: Benchmarking Urdu ASR Models | Sep 17, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection | Sep 17, 2024 | Benchmarking, Event Detection | Code Available | 0 |
| THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models | Sep 17, 2024 | Benchmarking, Binary Classification | Code Available | 0 |
| SAGED: A Holistic Bias-Benchmarking Pipeline for Language Models with Customisable Fairness Calibration | Sep 17, 2024 | Benchmarking, counterfactual | Code Available | 0 |
| Quantum Kernel Learning for Small Dataset Modeling in Semiconductor Fabrication: Application to Ohmic Contact | Sep 17, 2024 | Benchmarking, Quantum Machine Learning | Unverified | 0 |
| Benchmarking VLMs' Reasoning About Persuasive Atypical Images | Sep 16, 2024 | Benchmarking, Object Recognition | Unverified | 0 |
| Benchmarking Large Language Model Uncertainty for Prompt Optimization | Sep 16, 2024 | Benchmarking, Diversity | Code Available | 0 |
| Benchmarking LLMs in Political Content Text-Annotation: Proof-of-Concept with Toxicity and Incivility Data | Sep 15, 2024 | Benchmarking, text annotation | Unverified | 0 |
| LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study | Sep 13, 2024 | Benchmarking, Grapheme-to-Phoneme Conversion | Unverified | 0 |
| Text-To-Speech Synthesis In The Wild | Sep 13, 2024 | Benchmarking, Speaker Recognition | Unverified | 0 |
| Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering | Sep 13, 2024 | Benchmarking, Binary Classification | Unverified | 0 |
| The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal | Sep 12, 2024 | Benchmarking, Language Modeling | Unverified | 0 |
| The JPEG Pleno Learning-based Point Cloud Coding Standard: Serving Man and Machine | Sep 12, 2024 | Autonomous Driving, Benchmarking | Unverified | 0 |
| Linear energy storage and flexibility model with ramp rate, ramping, deadline and capacity constraints | Sep 12, 2024 | Benchmarking | Code Available | 0 |
| Online vs Offline: A Comparative Study of First-Party and Third-Party Evaluations of Social Chatbots | Sep 12, 2024 | Benchmarking, Chatbot | Unverified | 0 |
| Enhancing Q&A Text Retrieval with Ranking Models: Benchmarking, fine-tuning and deploying Rerankers for RAG | Sep 12, 2024 | Benchmarking, Question Answering | Unverified | 0 |
| Efficient Sparse Coding with the Adaptive Locally Competitive Algorithm for Speech Classification | Sep 12, 2024 | Benchmarking, Classification | Unverified | 0 |
| Introducing CausalBench: A Flexible Benchmark Framework for Causal Analysis and Machine Learning | Sep 12, 2024 | Benchmarking, Fairness | Unverified | 0 |
| Improve Machine Learning carbon footprint using Nvidia GPU and Mixed Precision training for classification models -- Part I | Sep 12, 2024 | Benchmarking, CPU | Code Available | 0 |
| Benchmarking 2D Egocentric Hand Pose Datasets | Sep 11, 2024 | Activity Recognition, Benchmarking | Unverified | 0 |
| Understanding Foundation Models: Are We Back in 1924? | Sep 11, 2024 | Benchmarking | Unverified | 0 |
| Unsupervised Novelty Detection Methods Benchmarking with Wavelet Decomposition | Sep 11, 2024 | Benchmarking, Novelty Detection | Code Available | 0 |