| Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing | Jun 11, 2024 | Benchmarking, Stance Detection | Unverified | 0 |
| Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning | Jun 11, 2024 | Benchmarking, Contrastive Learning | Code Available | 0 |
| DB3V: A Dialect Dominated Dataset of Bird Vocalisation for Cross-corpus Bird Species Recognition | Jun 11, 2024 | Benchmarking, Cross-corpus | Unverified | 0 |
| Benchmarking and Boosting Radiology Report Generation for 3D High-Resolution Medical Images | Jun 11, 2024 | Benchmarking, GPU | Unverified | 0 |
| MultiTrust: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models | Jun 11, 2024 | Benchmarking, Fairness | Unverified | 0 |
| INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition | Jun 10, 2024 | Benchmarking, Emotion Recognition | Code Available | 0 |
| Can Language Models Serve as Text-Based World Simulators? | Jun 10, 2024 | Benchmarking, Decision Making | Unverified | 0 |
| Multivariate Stochastic Dominance via Optimal Transport and Applications to Models Benchmarking | Jun 10, 2024 | Benchmarking, Econometrics | Unverified | 0 |
| Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture | Jun 10, 2024 | Benchmarking, Decoder | Code Available | 0 |
| JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models | Jun 10, 2024 | Benchmarking, Code Generation | Code Available | 0 |
| Data-driven Power Flow Linearization: Simulation | Jun 10, 2024 | Benchmarking, Computational Efficiency | Unverified | 0 |
| Benchmarking Neural Decoding Backbones towards Enhanced On-edge iBCI Applications | Jun 8, 2024 | Benchmarking, Mamba | Unverified | 0 |
| 1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR'24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation | Jun 8, 2024 | Benchmarking, Instance Segmentation | Unverified | 0 |
| GenzIQA: Generalized Image Quality Assessment using Prompt-Guided Latent Diffusion Models | Jun 7, 2024 | Benchmarking, Denoising | Unverified | 0 |
| Deep Jansen-Rit Parameter Inference for Model-Driven Analysis of Brain Activity | Jun 7, 2024 | Benchmarking, EEG | Code Available | 0 |
| Scenarios and Approaches for Situated Natural Language Explanations | Jun 7, 2024 | Benchmarking, In-Context Learning | Unverified | 0 |
| Behavior Structformer: Learning Players Representations with Structured Tokenization | Jun 7, 2024 | Benchmarking | Unverified | 0 |
| VisionAD, a software package of performant anomaly detection algorithms, and Proportion Localised, an interpretable metric | Jun 7, 2024 | Anomaly Detection, Benchmarking | Code Available | 0 |
| Hints-In-Browser: Benchmarking Language Models for Programming Feedback Generation | Jun 7, 2024 | Benchmarking | Unverified | 0 |
| Better Late Than Never: Formulating and Benchmarking Recommendation Editing | Jun 6, 2024 | Benchmarking, Recommendation Systems | Code Available | 0 |
| Benchmarking AlphaFold3's protein-protein complex accuracy and machine learning prediction reliability for binding free energy changes upon mutation | Jun 6, 2024 | Benchmarking, Drug Discovery | Unverified | 0 |
| Performance of large language models in numerical vs. semantic medical knowledge: Benchmarking on evidence-based Q&As | Jun 6, 2024 | Articles, Benchmarking | Unverified | 0 |
| NATURAL PLAN: Benchmarking LLMs on Natural Language Planning | Jun 6, 2024 | Benchmarking, Scheduling | Unverified | 0 |
| BEADs: Bias Evaluation Across Domains | Jun 6, 2024 | Benchmarking, Bias Detection | Unverified | 0 |
| Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices | Jun 6, 2024 | Benchmarking, RAG | Unverified | 0 |