| Environment-aware UAV Communications: CKM Construction and Predictive Beamforming | Apr 18, 2024 | Benchmarking | —Unverified | 0 |
| Neural Network Approach for Non-Markovian Dissipative Dynamics of Many-Body Open Quantum Systems | Apr 17, 2024 | Benchmarking, Quantization | —Unverified | 0 |
| Mapping Violence: Developing an Extensive Framework to Build a Bangla Sectarian Expression Dataset from Social Media Interactions | Apr 17, 2024 | Benchmarking | —Unverified | 0 |
| Benchmarking changepoint detection algorithms on cardiac time series | Apr 16, 2024 | Benchmarking, Change Point Detection | —Unverified | 0 |
| Iterated Invariant Extended Kalman Filter (IterIEKF) | Apr 16, 2024 | Benchmarking | —Unverified | 0 |
| White Men Lead, Black Women Help? Benchmarking and Mitigating Language Agency Social Biases in LLMs | Apr 16, 2024 | Benchmarking, Language Modelling | —Unverified | 0 |
| Data Collection of Real-Life Knowledge Work in Context: The RLKWiC Dataset | Apr 16, 2024 | Benchmarking, Management | —Unverified | 0 |
| Neuromorphic Vision-based Motion Segmentation with Graph Transformer Neural Network | Apr 16, 2024 | Benchmarking, Motion Segmentation | —Unverified | 0 |
| MMInA: Benchmarking Multihop Multimodal Internet Agents | Apr 15, 2024 | Benchmarking | —Unverified | 0 |
| A Universal Protocol to Benchmark Camera Calibration for Sports | Apr 15, 2024 | Benchmarking, Camera Calibration | —Unverified | 0 |
| AMPCliff: quantitative definition and benchmarking of activity cliffs in antimicrobial peptides | Apr 15, 2024 | Benchmarking, Protein Language Model | Code Available | 0 |
| LLM Evaluators Recognize and Favor Their Own Generations | Apr 15, 2024 | Benchmarking | —Unverified | 0 |
| Feature selection in linear SVMs via a hard cardinality constraint: a scalable SDP decomposition approach | Apr 15, 2024 | Benchmarking, feature selection | —Unverified | 0 |
| A Recipe for CAC: Mosaic-based Generalized Loss for Improved Class-Agnostic Counting | Apr 15, 2024 | Benchmarking | Code Available | 0 |
| A Large-Scale Evaluation of Speech Foundation Models | Apr 15, 2024 | Benchmarking | —Unverified | 0 |
| From Bytes to Borsch: Fine-Tuning Gemma and Mistral for the Ukrainian Language Representation | Apr 14, 2024 | Benchmarking, Diversity | Code Available | 0 |
| Practical Guidelines for Cell Segmentation Models Under Optical Aberrations in Microscopy | Apr 12, 2024 | Benchmarking, Cell Segmentation | —Unverified | 0 |
| Exploring the Decentraland Economy: Multifaceted Parcel Attributes, Key Insights, and Benchmarking | Apr 11, 2024 | Attribute, Benchmarking | —Unverified | 0 |
| GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models | Apr 10, 2024 | Benchmarking, Denoising | —Unverified | 0 |
| Certifying almost all quantum states with few single-qubit measurements | Apr 10, 2024 | All, Benchmarking | —Unverified | 0 |
| DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs | Apr 10, 2024 | Benchmarking, knowledge editing | Code Available | 0 |
| WebCode2M: A Real-World Dataset for Code Generation from Webpage Designs | Apr 9, 2024 | Benchmarking, Code Generation | —Unverified | 0 |
| From Protoscience to Epistemic Monoculture: How Benchmarking Set the Stage for the Deep Learning Revolution | Apr 9, 2024 | Benchmarking | —Unverified | 0 |
| Accel-NASBench: Sustainable Benchmarking for Accelerator-Aware NAS | Apr 9, 2024 | Benchmarking, Neural Architecture Search | Code Available | 0 |
| MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering | Apr 8, 2024 | Benchmarking, Medical Question Answering | —Unverified | 0 |
| Towards Objectively Benchmarking Social Intelligence for Language Agents at Action Level | Apr 8, 2024 | Benchmarking | Code Available | 0 |
| HOEG: A New Approach for Object-Centric Predictive Process Monitoring | Apr 8, 2024 | Benchmarking, Graph Neural Network | Code Available | 0 |
| EFSA: Towards Event-Level Financial Sentiment Analysis | Apr 8, 2024 | Articles, Benchmarking | Code Available | 0 |
| MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models | Apr 7, 2024 | Benchmarking, knowledge editing | Code Available | 0 |
| A Comparison of Cryptocurrency Volatility-benchmarking New and Mature Asset Classes | Apr 7, 2024 | Benchmarking | —Unverified | 0 |
| Multicalibration for Confidence Scoring in LLMs | Apr 6, 2024 | Benchmarking, Question Answering | —Unverified | 0 |
| PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics | Apr 6, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| SDFR: Synthetic Data for Face Recognition Competition | Apr 6, 2024 | Benchmarking, Face Recognition | —Unverified | 0 |
| Enhancing Video Summarization with Context Awareness | Apr 6, 2024 | Benchmarking, Informativeness | Code Available | 0 |
| GNNBENCH: Fair and Productive Benchmarking for Single-GPU GNN System | Apr 5, 2024 | Benchmarking, GPU | —Unverified | 0 |
| Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2) | Apr 5, 2024 | Benchmarking | Code Available | 0 |
| Dynamic Risk Assessment Methodology with an LDM-based System for Parking Scenarios | Apr 5, 2024 | Benchmarking | —Unverified | 0 |
| Benchmarking and Improving Compositional Generalization of Multi-aspect Controllable Text Generation | Apr 5, 2024 | Attribute, Benchmarking | Code Available | 0 |
| Benchmarking ChatGPT on Algorithmic Reasoning | Apr 4, 2024 | Benchmarking | Code Available | 0 |
| Schroedinger's Threshold: When the AUC doesn't predict Accuracy | Apr 4, 2024 | Benchmarking | Code Available | 0 |
| Benchmarking Parameter Control Methods in Differential Evolution for Mixed-Integer Black-Box Optimization | Apr 4, 2024 | Benchmarking | Code Available | 0 |
| DiffBody: Human Body Restoration by Imagining with Generative Diffusion Prior | Apr 4, 2024 | Benchmarking, Image Restoration | —Unverified | 0 |
| A Comparative Analysis of Word-Level Metric Differential Privacy: Benchmarking The Privacy-Utility Trade-off | Apr 4, 2024 | Benchmarking | Code Available | 0 |
| NL2KQL: From Natural Language to Kusto Query | Apr 3, 2024 | Benchmarking, Natural Language Queries | —Unverified | 0 |
| PATCH! Psychometrics-AssisTed BenCHmarking of Large Language Models against Human Populations: A Case Study of Proficiency in 8th Grade Mathematics | Apr 2, 2024 | Benchmarking | Code Available | 0 |
| On the reduction of Linear Parameter-Varying State-Space models | Apr 2, 2024 | Benchmarking, Dimensionality Reduction | —Unverified | 0 |
| Stereotype Detection in LLMs: A Multiclass, Explainable, and Benchmark-Driven Approach | Apr 2, 2024 | Benchmarking, Common Sense Reasoning | —Unverified | 0 |
| IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations | Apr 1, 2024 | Benchmarking, Math | —Unverified | 0 |
| Diffusion-Driven Domain Adaptation for Generating 3D Molecules | Apr 1, 2024 | Benchmarking, Decoder | —Unverified | 0 |
| SpiralMLP: A Lightweight Vision MLP Architecture | Mar 31, 2024 | Benchmarking | —Unverified | 0 |