| Benchmarks as Microscopes: A Call for Model Metrology | Jul 22, 2024 | Benchmarking, model | Unverified | 0 |
| Cascaded two-stage feature clustering and selection via separability and consistency in fuzzy decision systems | Jul 22, 2024 | Benchmarking, Clustering | Unverified | 0 |
| LCA-on-the-Line: Benchmarking Out-of-Distribution Generalization with Class Taxonomies | Jul 22, 2024 | Benchmarking, Out-of-Distribution Generalization | Code Available | 1 |
| HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning | Jul 22, 2024 | Benchmarking, Hallucination | Code Available | 1 |
| Open-CD: A Comprehensive Toolbox for Change Detection | Jul 22, 2024 | Benchmarking, Change Detection | Unverified | 0 |
| StylusAI: Stylistic Adaptation for Robust German Handwritten Text Generation | Jul 22, 2024 | Benchmarking, Text Generation | Unverified | 0 |
| Customized Retrieval Augmented Generation and Benchmarking for EDA Tool Documentation QA | Jul 22, 2024 | Benchmarking, Contrastive Learning | Code Available | 0 |
| Non-Reference Quality Assessment for Medical Imaging: Application to Synthetic Brain MRIs | Jul 20, 2024 | Benchmarking, Domain Adaptation | Unverified | 0 |
| POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding | Jul 20, 2024 | Benchmarking, Heuristic Search | Code Available | 1 |
| Benchmarking deep learning models for bearing fault diagnosis using the CWRU dataset: A multi-label approach | Jul 19, 2024 | Benchmarking, Binary Classification | Unverified | 0 |
| OCTrack: Benchmarking the Open-Corpus Multi-Object Tracking | Jul 19, 2024 | Benchmarking, Multi-Object Tracking | Unverified | 0 |
| Realistic Evaluation of Test-Time Adaptation Algorithms: Unsupervised Hyperparameter Selection | Jul 19, 2024 | Benchmarking, Model Selection | Unverified | 0 |
| Thinking Racial Bias in Fair Forgery Detection: Models, Datasets and Evaluations | Jul 19, 2024 | Benchmarking, Fairness | Code Available | 1 |
| ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness? | Jul 19, 2024 | Benchmarking, Code Generation | Code Available | 7 |
| Vision-Based Power Line Cables and Pylons Detection for Low Flying Aircraft | Jul 19, 2024 | Benchmarking, Transfer Learning | Unverified | 0 |
| SHS: Scorpion Hunting Strategy Swarm Algorithm | Jul 19, 2024 | Benchmarking | Unverified | 0 |
| Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance | Jul 18, 2024 | Benchmarking | Unverified | 0 |
| RT-Pose: A 4D Radar Tensor-based 3D Human Pose Estimation and Localization Benchmark | Jul 18, 2024 | 3D Human Pose Estimation, Benchmarking | Unverified | 0 |
| Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle | Jul 18, 2024 | Benchmarking, Language Modeling | Unverified | 0 |
| Restore Anything Model via Efficient Degradation Adaptation | Jul 18, 2024 | 5-Degradation Blind All-in-One Image Restoration, Benchmarking | Code Available | 1 |
| Enhancing Biomedical Knowledge Discovery for Diseases: An Open-Source Framework Applied on Rett Syndrome and Alzheimer's Disease | Jul 18, 2024 | Benchmarking | Code Available | 0 |
| Comprehensive Review and Empirical Evaluation of Causal Discovery Algorithms for Numerical Data | Jul 17, 2024 | Articles, Benchmarking | Unverified | 0 |
| Temporal receptive field in dynamic graph learning: A comprehensive analysis | Jul 17, 2024 | Benchmarking, Dynamic Link Prediction | Code Available | 0 |
| Abstraction Alignment: Comparing Model-Learned and Human-Encoded Conceptual Relationships | Jul 17, 2024 | Benchmarking | Code Available | 0 |
| Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models? | Jul 17, 2024 | Benchmarking, Sarcasm Detection | Unverified | 0 |