| WebCanvas: Benchmarking Web Agents in Online Environments | Jun 18, 2024 | AI Agent, Benchmarking | Code Available | 3 |
| Refusal in Language Models Is Mediated by a Single Direction | Jun 17, 2024 | Instruction Following | Code Available | 3 |
| HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model | Jun 17, 2024 | Computational Efficiency, Earth Observation | Code Available | 3 |
| Unveiling Encoder-Free Vision-Language Models | Jun 17, 2024 | Decoder, Inductive Bias | Code Available | 3 |
| GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement | Jun 17, 2024 | speech-recognition, Speech Recognition | Code Available | 3 |
| DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models | Jun 17, 2024 | Document Classification, Visual Grounding | Code Available | 3 |
| AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning | Jun 17, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| An Imitative Reinforcement Learning Framework for Autonomous Dogfight | Jun 17, 2024 | Imitation Learning, reinforcement-learning | Code Available | 3 |
| GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding | Jun 16, 2024 | | Code Available | 3 |
| Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference | Jun 16, 2024 | | Code Available | 3 |
| Step-level Value Preference Optimization for Mathematical Reasoning | Jun 16, 2024 | Learning-To-Rank, Math | Code Available | 3 |
| AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models | Jun 16, 2024 | Hallucination, Hallucination Evaluation | Code Available | 3 |
| CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph | Jun 16, 2024 | Drug Design, Fairness | Code Available | 3 |
| AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology | Jun 16, 2024 | Code Generation | Code Available | 3 |
| IMDL-BenCo: A Comprehensive Benchmark and Codebase for Image Manipulation Detection & Localization | Jun 15, 2024 | GPU, Image Manipulation | Code Available | 3 |
| TGB 2.0: A Benchmark for Learning on Temporal Knowledge Graphs and Heterogeneous Graphs | Jun 14, 2024 | Benchmarking, Knowledge Graphs | Code Available | 3 |
| DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning | Jun 14, 2024 | Offline RL | Code Available | 3 |
| CarLLaVA: Vision language models for camera-only closed-loop driving | Jun 14, 2024 | Autonomous Driving, Bench2Drive | Code Available | 3 |
| Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | Jun 14, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 3 |
| VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | Jun 13, 2024 | Dense Video Captioning, MVBench | Code Available | 3 |
| Dispelling the Mirage of Progress in Offline MARL through Standardised Baselines and Evaluation | Jun 13, 2024 | Multi-agent Reinforcement Learning | Code Available | 3 |
| DrivAerNet++: A Large-Scale Multimodal Car Dataset with Computational Fluid Dynamics Simulations and Deep Learning Benchmarks | Jun 13, 2024 | Benchmarking | Code Available | 3 |
| Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models | Jun 13, 2024 | Math, object-detection | Code Available | 3 |
| OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation | Jun 13, 2024 | Video Generation, Video Prediction | Code Available | 3 |
| MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning | Jun 13, 2024 | Instruction Following, Math | Code Available | 3 |