| Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models | Jul 17, 2024 | Benchmarking, Red Teaming | Code Available | 2 |
| LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models | Jul 17, 2024 | Benchmarking, Language Modelling | Unverified | 0 |
| HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects | Jul 17, 2024 | Benchmarking, Human-Object Interaction Detection | Unverified | 0 |
| Benchmarking Robust Self-Supervised Learning Across Diverse Downstream Tasks | Jul 17, 2024 | Adversarial Robustness, Benchmarking | Code Available | 0 |
| Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models? | Jul 17, 2024 | Benchmarking, Sarcasm Detection | Unverified | 0 |
| Feature interpretability in BCIs: exploring the role of network lateralization | Jul 16, 2024 | Benchmarking, EEG | Code Available | 0 |
| Benchmarking the Attribution Quality of Vision Models | Jul 16, 2024 | Benchmarking, Explainable Models | Code Available | 0 |
| GV-Bench: Benchmarking Local Feature Matching for Geometric Verification of Long-term Loop Closure Detection | Jul 16, 2024 | Benchmarking, Loop Closure Detection | Code Available | 2 |
| A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification | Jul 16, 2024 | Benchmarking, Few-Shot Learning | Unverified | 0 |
| Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models | Jul 16, 2024 | Benchmarking, Code Generation | Code Available | 1 |
| SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods with Realistic Validation On Diverse Modalities | Jul 16, 2024 | Benchmarking, Domain Adaptation | Code Available | 1 |
| REMM:Rotation-Equivariant Framework for End-to-End Multimodal Image Matching | Jul 16, 2024 | Benchmarking | Code Available | 0 |
| On Machine Learning Approaches for Protein-Ligand Binding Affinity Prediction | Jul 15, 2024 | Active Learning, Benchmarking | Unverified | 0 |
| Separable Operator Networks | Jul 15, 2024 | Benchmarking, GPU | Code Available | 1 |
| CIBench: Evaluating Your LLMs with a Code Interpreter Plugin | Jul 15, 2024 | Benchmarking | Code Available | 1 |
| AstroMLab 1: Who Wins Astronomy Jeopardy!? | Jul 15, 2024 | Astronomy, Benchmarking | Unverified | 0 |
| ConvBench: A Comprehensive Benchmark for 2D Convolution Primitive Evaluation | Jul 15, 2024 | Benchmarking | Unverified | 0 |
| When Heterophily Meets Heterogeneity: Challenges and a New Large-Scale Graph Benchmark | Jul 15, 2024 | Benchmarking, Graph Learning | Code Available | 1 |
| Benchmarking Vision Language Models for Cultural Understanding | Jul 15, 2024 | Benchmarking, Question Answering | Unverified | 0 |
| Experimental Benchmarking of Energy-saving Sub-Optimal Sliding Mode Control | Jul 14, 2024 | Benchmarking | Unverified | 0 |
| Automated detection of gibbon calls from passive acoustic monitoring data using convolutional neural networks in the "torch for R" ecosystem | Jul 13, 2024 | Benchmarking, Deep Learning | Unverified | 0 |
| OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling | Jul 13, 2024 | Benchmarking, Math | Code Available | 1 |
| NativQA: Multilingual Culturally-Aligned Natural Query for LLMs | Jul 13, 2024 | Benchmarking, Question Answering | Unverified | 0 |
| Retrospective for the Dynamic Sensorium Competition for predicting large-scale mouse primary visual cortex activity from videos | Jul 12, 2024 | Benchmarking, Pupil Dilation | Code Available | 1 |
| Deep Attention Driven Reinforcement Learning (DAD-RL) for Autonomous Decision-Making in Dynamic Environment | Jul 12, 2024 | Benchmarking, Decision Making | Code Available | 0 |