| ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data | Nov 22, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI | Nov 22, 2024 | counterfactual, Counterfactual Explanation | Code Available | 2 |
| MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation | Nov 22, 2024 | Video Generation | Code Available | 2 |
| Open-Vocabulary Online Semantic Mapping for SLAM | Nov 22, 2024 | Segmentation, Semantic SLAM | Code Available | 2 |
| VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection | Nov 22, 2024 | Question Answering, Video Question Answering | Code Available | 2 |
| RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts | Nov 22, 2024 | AI Agent, Language Modeling | Code Available | 2 |
| EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality | Nov 22, 2024 | Efficient Neural Network, Image Classification | Code Available | 2 |
| AnyText2: Visual Text Generation and Editing With Customizable Attributes | Nov 22, 2024 | Image Generation, Text Generation | Code Available | 2 |
| Zero-Shot Coreset Selection: Efficient Pruning for Unlabeled Data | Nov 22, 2024 | | Code Available | 2 |
| DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models | Nov 22, 2024 | | Code Available | 2 |
| Natural Language Reinforcement Learning | Nov 21, 2024 | Decision Making, reinforcement-learning | Code Available | 2 |
| MMGenBench: Evaluating the Limits of LMMs from the Text-to-Image Generation Perspective | Nov 21, 2024 | Image Comprehension, Image Generation | Code Available | 2 |
| BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models | Nov 21, 2024 | image-classification, Image Classification | Code Available | 2 |
| CodeSAM: Source Code Representation Learning by Infusing Self-Attention with Multi-Code-View Graphs | Nov 21, 2024 | Clone Detection, Code Search | Code Available | 2 |
| EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild | Nov 21, 2024 | 3D Reconstruction, Object | Code Available | 2 |
| FunctionChat-Bench: Comprehensive Evaluation of Language Models' Generative Capabilities in Korean Tool-use Dialogs | Nov 21, 2024 | Relevance Detection | Code Available | 2 |
| GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI | Nov 21, 2024 | Decision Making, Language Modeling | Code Available | 2 |
| Disentangling Memory and Reasoning Ability in Large Language Models | Nov 20, 2024 | Decision Making, Retrieval | Code Available | 2 |
| Find Any Part in 3D | Nov 20, 2024 | 3D Part Segmentation, Diversity | Code Available | 2 |
| Practical Compact Deep Compressed Sensing | Nov 20, 2024 | compressed sensing | Code Available | 2 |
| DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving | Nov 20, 2024 | Autonomous Driving, motion prediction | Code Available | 2 |
| RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation | Nov 20, 2024 | Image Generation, object-detection | Code Available | 2 |
| Empower Structure-Based Molecule Optimization with Gradient Guided Bayesian Flow Networks | Nov 20, 2024 | Bayesian Inference, Drug Design | Code Available | 2 |
| Quantized symbolic time series approximation | Nov 20, 2024 | Anomaly Detection, Astronomy | Code Available | 2 |
| SimPhony: A Device-Circuit-Architecture Cross-Layer Modeling and Simulation Framework for Heterogeneous Electronic-Photonic AI System | Nov 20, 2024 | | Code Available | 2 |
| SymphonyQG: Towards Symphonious Integration of Quantization and Graph for Approximate Nearest Neighbor Search | Nov 19, 2024 | Quantization, Re-Ranking | Code Available | 2 |
| Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus | Nov 19, 2024 | Formal Logic, Logical Reasoning | Code Available | 2 |
| Motif Channel Opened in a White-Box: Stereo Matching via Motif Correlation Graph | Nov 19, 2024 | Autonomous Driving, Stereo Matching | Code Available | 2 |
| From Text to Pose to Image: Improving Diffusion Model Control and Quality | Nov 19, 2024 | Image Generation, Prompt Engineering | Code Available | 2 |
| CV-Cities: Advancing Cross-View Geo-Localization in Global Cities | Nov 19, 2024 | Cross-View Geo-Localisation, Drone-view target localization | Code Available | 2 |
| Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution | Nov 19, 2024 | Image Enhancement, Image Super-Resolution | Code Available | 2 |
| HyperGAN-CLIP: A Unified Framework for Domain Adaptation, Image Synthesis and Manipulation | Nov 19, 2024 | Domain Adaptation, Image Generation | Code Available | 2 |
| GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving | Nov 19, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 2 |
| Arabic-Nougat: Fine-Tuning Vision Transformers for Arabic OCR and Markdown Extraction | Nov 19, 2024 | document understanding, Optical Character Recognition (OCR) | Code Available | 2 |
| Real-Time Fitness Exercise Classification and Counting from Video Frames | Nov 18, 2024 | | Code Available | 2 |
| AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning | Nov 18, 2024 | Mathematical Reasoning | Code Available | 2 |
| Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics | Nov 18, 2024 | Vision-Language-Action | Code Available | 2 |
| Syllabus: Portable Curricula for Reinforcement Learning Agents | Nov 18, 2024 | NetHack, reinforcement-learning | Code Available | 2 |
| Enhancing LLM Reasoning with Reward-guided Tree Search | Nov 18, 2024 | Mathematical Reasoning | Code Available | 2 |
| IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos | Nov 18, 2024 | Pose Estimation, Semantic Segmentation | Code Available | 2 |
| CNMBERT: A Model for Converting Hanyu Pinyin Abbreviations to Chinese Characters | Nov 18, 2024 | fill-mask, Fill Mask | Code Available | 2 |
| DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation | Nov 18, 2024 | Autonomous Driving, Decision Making | Code Available | 2 |
| Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering | Nov 18, 2024 | | Code Available | 2 |
| MC-LLaVA: Multi-Concept Personalized Vision-Language Model | Nov 18, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Newclid: A User-Friendly Replacement for AlphaGeometry | Nov 18, 2024 | | Code Available | 2 |
| BianCang: A Traditional Chinese Medicine Large Language Model | Nov 17, 2024 | Diagnostic, Language Modeling | Code Available | 2 |
| StableV2V: Stablizing Shape Consistency in Video-to-Video Editing | Nov 17, 2024 | Video Editing | Code Available | 2 |
| VeGaS: Video Gaussian Splatting | Nov 17, 2024 | 3DGS | Code Available | 2 |
| AMAGO-2: Breaking the Multi-Task Barrier in Meta-Reinforcement Learning with Transformers | Nov 17, 2024 | In-Context Learning, Meta-Learning | Code Available | 2 |
| RPN 2: On Interdependence Function Learning Towards Unifying and Advancing CNN, RNN, GNN, and Transformer | Nov 17, 2024 | | Code Available | 2 |