| FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing | Dec 10, 2024 | Text-based Image Editing | Code Available | 3 |
| CBraMod: A Criss-Cross Brain Foundation Model for EEG Decoding | Dec 10, 2024 | EEG, EEG Decoding | Code Available | 3 |
| Normalizing Flows are Capable Generative Models | Dec 9, 2024 | Conditional Image Generation, Density Estimation | Code Available | 3 |
| BatchTopK Sparse Autoencoders | Dec 9, 2024 | Language Modeling | Code Available | 3 |
| GraphNeuralNetworks.jl: Deep Learning on Graphs with Julia | Dec 9, 2024 | Deep Learning, GPU | Code Available | 3 |
| Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey | Dec 9, 2024 | Speech Synthesis, Survey | Code Available | 3 |
| StarWhisper Telescope: Agent-Based Observation Assistant System to Approach AI Astrophysicist | Dec 9, 2024 | | Code Available | 3 |
| Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation | Dec 9, 2024 | Denoising, Photo geolocation estimation | Code Available | 3 |
| APOLLO: SGD-like Memory, AdamW-level Performance | Dec 6, 2024 | GPU, Quantization | Code Available | 3 |
| UniMLVG: Unified Framework for Multi-view Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving | Dec 6, 2024 | Autonomous Driving, Diversity | Code Available | 3 |
| Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction | Dec 5, 2024 | Multimodal Reasoning, Natural Language Visual Grounding | Code Available | 3 |
| Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail | Dec 5, 2024 | Stereo Matching, Zero-shot Generalization | Code Available | 3 |
| Cubify Anything: Scaling Indoor 3D Object Detection | Dec 5, 2024 | 3D Object Detection, Object | Code Available | 3 |
| VisionZip: Longer is Better but Not Necessary in Vision Language Models | Dec 5, 2024 | Video Understanding, Visual Question Answering | Code Available | 3 |
| Reinforcement Learning Enhanced LLMs: A Survey | Dec 5, 2024 | Reinforcement Learning | Code Available | 3 |
| Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion | Dec 5, 2024 | Contrastive Learning, Hallucination | Code Available | 3 |
| ARC Prize 2024: Technical Report | Dec 5, 2024 | ARC, Program Synthesis | Code Available | 3 |
| PANGAEA: A Global and Inclusive Benchmark for Geospatial Foundation Models | Dec 5, 2024 | Earth Observation | Code Available | 3 |
| PaliGemma 2: A Family of Versatile VLMs for Transfer | Dec 4, 2024 | Language Modeling | Code Available | 3 |
| From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents | Dec 4, 2024 | Language Modeling | Code Available | 3 |
| ChatTS: Aligning Time Series with LLMs via Synthetic Data for Enhanced Understanding and Reasoning | Dec 4, 2024 | Attribute, Time Series | Code Available | 3 |
| TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation | Dec 4, 2024 | Image Generation, Image Reconstruction | Code Available | 3 |
| Prithvi-EO-2.0: A Versatile Multi-Temporal Foundation Model for Earth Observation Applications | Dec 3, 2024 | Benchmarking, Disaster Response | Code Available | 3 |
| Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey | Dec 3, 2024 | Change Detection, Descriptive | Code Available | 3 |
| Advancing Speech Language Models by Scaling Supervised Fine-Tuning with Over 60,000 Hours of Synthetic Speech Dialogue Data | Dec 2, 2024 | Language Modeling | Code Available | 3 |
| HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing | Dec 2, 2024 | Language Modeling | Code Available | 3 |
| Inspiring the Next Generation of Segment Anything Models: Comprehensively Evaluate SAM and SAM 2 with Diverse Prompts Towards Context-Dependent Concepts under Different Scenes | Dec 2, 2024 | In-Context Learning, Video Segmentation | Code Available | 3 |
| MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost | Dec 2, 2024 | Image Generation | Code Available | 3 |
| Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle | Dec 2, 2024 | Human Instance Segmentation, Pose-Based Human Instance Segmentation | Code Available | 3 |
| XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation | Dec 2, 2024 | Image Reconstruction, Quantization | Code Available | 3 |
| HUGSIM: A Real-Time, Photo-Realistic and Closed-Loop Simulator for Autonomous Driving | Dec 2, 2024 | Autonomous Driving, Novel View Synthesis | Code Available | 3 |
| Towards Universal Soccer Video Understanding | Dec 2, 2024 | Action Classification, Sports Understanding | Code Available | 3 |
| FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration | Dec 2, 2024 | Image Restoration, Incremental Learning | Code Available | 3 |
| emg2pose: A Large and Diverse Benchmark for Surface Electromyographic Hand Pose Estimation | Dec 2, 2024 | Anatomy, Hand Pose Estimation | Code Available | 3 |
| Advanced Video Inpainting Using Optical Flow-Guided Efficient Diffusion | Dec 1, 2024 | Denoising, Optical Flow Estimation | Code Available | 3 |
| Speedy-Splat: Fast 3D Gaussian Splatting with Sparse Pixels and Sparse Primitives | Nov 30, 2024 | 3D Scene Reconstruction, NeRF | Code Available | 3 |
| o1-Coder: an o1 Replication for Coding | Nov 29, 2024 | Reinforcement Learning (RL) | Code Available | 3 |
| Scaling Transformers for Low-Bitrate High-Quality Speech Coding | Nov 29, 2024 | Quantization | Code Available | 3 |
| Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models | Nov 29, 2024 | Decision Making, RAG | Code Available | 3 |
| Differentiable Voxel-based X-ray Rendering Improves Sparse-View 3D CBCT Reconstruction | Nov 28, 2024 | 3D Reconstruction, Diagnostic | Code Available | 3 |
| Cyber-Attack Technique Classification Using Two-Stage Trained Large Language Models | Nov 27, 2024 | Classification, Sentence | Code Available | 3 |
| ChatRex: Taming Multimodal LLM for Joint Perception and Understanding | Nov 27, 2024 | | Code Available | 3 |
| HI-SLAM2: Geometry-Aware Gaussian SLAM for Fast Monocular Scene Reconstruction | Nov 27, 2024 | 3DGS | Code Available | 3 |
| TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution | Nov 27, 2024 | Image Restoration, Image Super-Resolution | Code Available | 3 |
| Large Language Model-Brained GUI Agents: A Survey | Nov 27, 2024 | Code Generation, Language Modeling | Code Available | 3 |
| CLOVER: Cross-Layer Orthogonal Vectors Pruning and Fine-Tuning | Nov 26, 2024 | | Code Available | 3 |
| Star Attention: Efficient LLM Inference over Long Sequences | Nov 26, 2024 | Computational Efficiency | Code Available | 3 |
| On the Efficiency of NLP-Inspired Methods for Tabular Deep Learning | Nov 26, 2024 | Computational Efficiency, Deep Learning | Code Available | 3 |
| A Distractor-Aware Memory for Visual Object Tracking with SAM2 | Nov 26, 2024 | Object Tracking, Semi-Supervised Video Object Segmentation | Code Available | 3 |
| Pushing the Limits of Large Language Model Quantization via the Linearity Theorem | Nov 26, 2024 | GPU, Language Modeling | Code Available | 3 |