| Comparing Hyper-optimized Machine Learning Models for Predicting Efficiency Degradation in Organic Solar Cells | Mar 29, 2024 | Benchmarking | Unverified | 0 |
| IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context | Mar 29, 2024 | Benchmarking, Sentence | Code Available | 0 |
| Are Large Language Models Good at Utility Judgments? | Mar 28, 2024 | Answer Generation, Benchmarking | Code Available | 0 |
| Benchmarking Image Transformers for Prostate Cancer Detection from Ultrasound Data | Mar 27, 2024 | Benchmarking, Cancer Classification | Unverified | 0 |
| GPTs and Language Barrier: A Cross-Lingual Legal QA Examination | Mar 26, 2024 | Articles, Benchmarking | Unverified | 0 |
| Benchmarking Video Frame Interpolation | Mar 25, 2024 | Benchmarking, Computational Efficiency | Unverified | 0 |
| NSINA: A News Corpus for Sinhala | Mar 25, 2024 | Articles, Benchmarking | Code Available | 0 |
| DISL: Fueling Research with A Large Dataset of Solidity Smart Contracts | Mar 25, 2024 | Benchmarking | Unverified | 0 |
| On the Fragility of Active Learners for Text Classification | Mar 23, 2024 | Active Learning, Benchmarking | Code Available | 0 |
| TrustSQL: Benchmarking Text-to-SQL Reliability with Penalty-Based Scoring | Mar 23, 2024 | Benchmarking, Text to SQL | Code Available | 0 |
| Unifying Large Language Model and Deep Reinforcement Learning for Human-in-Loop Interactive Socially-aware Navigation | Mar 22, 2024 | Benchmarking, Deep Reinforcement Learning | Unverified | 0 |
| Transactive Local Energy Markets Enable Community-Level Resource Coordination Using Individual Rewards | Mar 22, 2024 | Benchmarking, energy management | Unverified | 0 |
| Subjective Quality Assessment of Compressed Tone-Mapped High Dynamic Range Videos | Mar 22, 2024 | Benchmarking, Tone Mapping | Unverified | 0 |
| Broadening the Scope of Neural Network Potentials through Direct Inclusion of Additional Molecular Attributes | Mar 22, 2024 | Benchmarking | Unverified | 0 |
| ChatGPT Alternative Solutions: Large Language Models Survey | Mar 21, 2024 | Benchmarking, Chatbot | Unverified | 0 |
| Embarrassingly Simple Scribble Supervision for 3D Medical Segmentation | Mar 19, 2024 | Benchmarking, Segmentation | Unverified | 0 |
| MARTA: a model for the automatic phonemic grouping of the parkinsonian speech | Mar 19, 2024 | Benchmarking, Classification | Code Available | 0 |
| Benchmarking Badminton Action Recognition with a New Fine-Grained Dataset | Mar 19, 2024 | Action Recognition, Benchmarking | Unverified | 0 |
| Leveraging Spatial and Semantic Feature Extraction for Skin Cancer Diagnosis with Capsule Networks and Graph Neural Networks | Mar 18, 2024 | Benchmarking, Classification | Unverified | 0 |
| A Sober Look at the Robustness of CLIPs to Spurious Features | Mar 18, 2024 | Benchmarking | Unverified | 0 |
| Benchmarking the Robustness of UAV Tracking Against Common Corruptions | Mar 18, 2024 | Benchmarking | Code Available | 0 |
| OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety | Mar 18, 2024 | Benchmarking, Mathematical Reasoning | Unverified | 0 |
| Granular Change Accuracy: A More Accurate Performance Metric for Dialogue State Tracking | Mar 17, 2024 | Benchmarking, Dialogue State Tracking | Unverified | 0 |
| FlowMind: Automatic Workflow Generation with LLMs | Mar 17, 2024 | Benchmarking, Question Answering | Unverified | 0 |
| Depression Detection on Social Media with Large Language Models | Mar 16, 2024 | Benchmarking, Depression Detection | Unverified | 0 |