| Assessing Foundation Models for Sea Ice Type Segmentation in Sentinel-1 SAR Imagery | Mar 28, 2025 | Benchmarking, Segmentation | Unverified | 0 |
| Benchmarking Deep Learning-Based Methods for Irradiance Nowcasting with Sky Images | Mar 27, 2025 | Benchmarking | Unverified | 0 |
| ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition | Mar 27, 2025 | Benchmarking, scientific discovery | Unverified | 0 |
| CLAIMCHECK: How Grounded are LLM Critiques of Scientific Papers? | Mar 27, 2025 | Benchmarking, Specificity | Code Available | 0 |
| Evaluating Text-to-Image Synthesis with a Conditional Fréchet Distance | Mar 27, 2025 | Benchmarking, Image Generation | Unverified | 0 |
| GateLens: A Reasoning-Enhanced LLM Agent for Automotive Software Release Analytics | Mar 27, 2025 | Benchmarking, Natural Language Queries | Unverified | 0 |
| CSPO: Cross-Market Synergistic Stock Price Movement Forecasting with Pseudo-volatility Optimization | Mar 26, 2025 | Benchmarking | Unverified | 0 |
| Benchmarking and optimizing organism wide single-cell RNA alignment methods | Mar 26, 2025 | Benchmarking, Decoder | Code Available | 0 |
| Can geometric combinatorics improve RNA branching predictions? | Mar 26, 2025 | Benchmarking | Code Available | 0 |
| RxRx3-core: Benchmarking drug-target interactions in High-Content Microscopy | Mar 26, 2025 | Benchmarking, Representation Learning | Unverified | 0 |
| Benchmarking Machine Learning Methods for Distributed Acoustic Sensing | Mar 26, 2025 | Benchmarking, Data Augmentation | Unverified | 0 |
| Contextual Metric Meta-Evaluation by Measuring Local Metric Accuracy | Mar 25, 2025 | Benchmarking, speech-recognition | Unverified | 0 |
| Reservoir Computing with a Single Oscillating Gas Bubble: Emphasizing the Chaotic Regime | Mar 25, 2025 | Benchmarking, Learning Theory | Unverified | 0 |
| Writing as a testbed for open ended agents | Mar 25, 2025 | Benchmarking, Diversity | Unverified | 0 |
| Benchmarking Burst Super-Resolution for Polarization Images: Noise Dataset and Analysis | Mar 24, 2025 | Benchmarking, Image Reconstruction | Unverified | 0 |
| EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation | Mar 24, 2025 | Benchmarking, Data Augmentation | Unverified | 0 |
| Benchmarking Post-Hoc Unknown-Category Detection in Food Recognition | Mar 24, 2025 | Benchmarking, Food Recognition | Unverified | 0 |
| Mining-Gym: A Configurable RL Benchmarking Environment for Truck Dispatch Scheduling | Mar 24, 2025 | Benchmarking, OpenAI Gym | Code Available | 0 |
| Enhancing Multi-Label Emotion Analysis and Corresponding Intensities for Ethiopian Languages | Mar 24, 2025 | Benchmarking, Decision Making | Unverified | 0 |
| LLM Benchmarking with LLaMA2: Evaluating Code Development Performance Across Multiple Programming Languages | Mar 24, 2025 | Benchmarking | Code Available | 0 |
| Regularization of ML models for Earth systems by using longer model timesteps | Mar 23, 2025 | Benchmarking | Unverified | 0 |
| Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering | Mar 23, 2025 | Benchmarking, Chart Question Answering | Unverified | 0 |
| A Study on Neuro-Symbolic Artificial Intelligence: Healthcare Perspectives | Mar 23, 2025 | Benchmarking, Common Sense Reasoning | Unverified | 0 |
| Accurate Peak Detection in Multimodal Optimization via Approximated Landscape Learning | Mar 23, 2025 | Benchmarking | Code Available | 0 |
| CardioTabNet: A Novel Hybrid Transformer Model for Heart Disease Prediction using Tabular Medical Data | Mar 22, 2025 | Benchmarking, Disease Prediction | Unverified | 0 |