| VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection | Nov 22, 2024 | Question Answering, Video Question Answering | Code Available | 2 |
| EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality | Nov 22, 2024 | Efficient Neural Network, Image Classification | Code Available | 2 |
| Open-Vocabulary Online Semantic Mapping for SLAM | Nov 22, 2024 | Segmentation, Semantic SLAM | Code Available | 2 |
| AnyText2: Visual Text Generation and Editing With Customizable Attributes | Nov 22, 2024 | Image Generation, Text Generation | Code Available | 2 |
| Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI | Nov 22, 2024 | counterfactual, Counterfactual Explanation | Code Available | 2 |
| Zero-Shot Coreset Selection: Efficient Pruning for Unlabeled Data | Nov 22, 2024 | | Code Available | 2 |
| RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts | Nov 22, 2024 | AI Agent, Language Modeling | Code Available | 2 |
| MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation | Nov 22, 2024 | Video Generation | Code Available | 2 |
| DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models | Nov 22, 2024 | | Code Available | 2 |
| ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data | Nov 22, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI | Nov 21, 2024 | Decision Making, Language Modeling | Code Available | 2 |
| Natural Language Reinforcement Learning | Nov 21, 2024 | Decision Making, reinforcement-learning | Code Available | 2 |
| MMGenBench: Evaluating the Limits of LMMs from the Text-to-Image Generation Perspective | Nov 21, 2024 | Image Comprehension, Image Generation | Code Available | 2 |
| EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild | Nov 21, 2024 | 3D Reconstruction, Object | Code Available | 2 |
| CodeSAM: Source Code Representation Learning by Infusing Self-Attention with Multi-Code-View Graphs | Nov 21, 2024 | Clone Detection, Code Search | Code Available | 2 |
| BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models | Nov 21, 2024 | image-classification, Image Classification | Code Available | 2 |
| FunctionChat-Bench: Comprehensive Evaluation of Language Models' Generative Capabilities in Korean Tool-use Dialogs | Nov 21, 2024 | Relevance Detection | Code Available | 2 |
| Empower Structure-Based Molecule Optimization with Gradient Guided Bayesian Flow Networks | Nov 20, 2024 | Bayesian Inference, Drug Design | Code Available | 2 |
| Quantized symbolic time series approximation | Nov 20, 2024 | Anomaly Detection, Astronomy | Code Available | 2 |
| Disentangling Memory and Reasoning Ability in Large Language Models | Nov 20, 2024 | Decision Making, Retrieval | Code Available | 2 |
| DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving | Nov 20, 2024 | Autonomous Driving, motion prediction | Code Available | 2 |
| RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation | Nov 20, 2024 | Image Generation, object-detection | Code Available | 2 |
| Find Any Part in 3D | Nov 20, 2024 | 3D Part Segmentation, Diversity | Code Available | 2 |
| SimPhony: A Device-Circuit-Architecture Cross-Layer Modeling and Simulation Framework for Heterogeneous Electronic-Photonic AI System | Nov 20, 2024 | | Code Available | 2 |
| Practical Compact Deep Compressed Sensing | Nov 20, 2024 | compressed sensing | Code Available | 2 |