| Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation | Sep 25, 2024 | text-to-speech, Text to Speech | Code Available | 5 |
| Underwater Camouflaged Object Tracking Meets Vision-Language SAM2 | Sep 25, 2024 | Object, Object Tracking | Code Available | 5 |
| Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models | Sep 21, 2024 | Language Modeling, Language Modelling | Code Available | 5 |
| 3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion | Sep 19, 2024 | | Code Available | 5 |
| FuXi-2.0: Advancing machine learning weather forecasting model for practical applications | Sep 11, 2024 | Weather Forecasting | Code Available | 5 |
| SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning | Sep 9, 2024 | AI Agent, Knowledge Graphs | Code Available | 5 |
| MarS: a Financial Market Simulation Engine Powered by Generative Foundation Model | Sep 4, 2024 | Language Modeling, Language Modelling | Code Available | 5 |
| DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos | Sep 3, 2024 | Depth Estimation, Diversity | Code Available | 5 |
| ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis | Sep 3, 2024 | 3D Generation, 3D Reconstruction | Code Available | 5 |
| rerankers: A Lightweight Python Library to Unify Ranking Methods | Aug 30, 2024 | Re-Ranking, Retrieval | Code Available | 5 |
| WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling | Aug 29, 2024 | Language Modeling, Language Modelling | Code Available | 5 |
| OmniRe: Omni Urban Scene Reconstruction | Aug 29, 2024 | 3DGS | Code Available | 5 |
| 3D Reconstruction with Spatial Memory | Aug 28, 2024 | 3D Reconstruction | Code Available | 5 |
| Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning | Aug 26, 2024 | Denoising, reinforcement-learning | Code Available | 5 |
| Unleashing the Potential of SAM2 for Biomedical Images and Videos: A Survey | Aug 23, 2024 | Image Segmentation, Segmentation | Code Available | 5 |
| Show-o: One Single Transformer to Unify Multimodal Understanding and Generation | Aug 22, 2024 | 10-shot image generation | Code Available | 5 |
| Jamba-1.5: Hybrid Transformer-Mamba Models at Scale | Aug 22, 2024 | Chatbot, Instruction Following | Code Available | 5 |
| MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models | Aug 21, 2024 | GPU, Quantization | Code Available | 5 |
| The Vizier Gaussian Process Bandit Algorithm | Aug 21, 2024 | Bayesian Optimization | Code Available | 5 |
| Multi-Agent Reinforcement Learning for Autonomous Driving: A Survey | Aug 19, 2024 | Autonomous Driving, Decision Making | Code Available | 5 |
| Automated Design of Agentic Systems | Aug 15, 2024 | | Code Available | 5 |
| RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation | Aug 15, 2024 | Diagnostic, RAG | Code Available | 5 |
| LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs | Aug 13, 2024 | | Code Available | 5 |
| ControlNeXt: Powerful and Efficient Control for Image and Video Generation | Aug 12, 2024 | Video Generation | Code Available | 5 |
| A Survey of Text-to-SQL in the Era of LLMs: Where are we, and where are we going? | Aug 9, 2024 | Natural Language Queries, Text to SQL | Code Available | 5 |
| SAM2-Adapter: Evaluating & Adapting Segment Anything 2 in Downstream Tasks: Camouflage, Shadow, Medical Image Segmentation, and More | Aug 8, 2024 | Image Segmentation, Medical Image Segmentation | Code Available | 5 |
| Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters | Aug 6, 2024 | | Code Available | 5 |
| Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid | Aug 4, 2024 | document understanding | Code Available | 5 |
| Active Learning for Neural PDE Solvers | Aug 2, 2024 | Active Learning | Code Available | 5 |
| Penzai + Treescope: A Toolkit for Interpreting, Visualizing, and Editing Models As Data | Aug 1, 2024 | | Code Available | 5 |
| MuJoCo MPC for Humanoid Control: Evaluation on HumanoidBench | Aug 1, 2024 | Humanoid Control, MuJoCo | Code Available | 5 |
| Segment Anything for Videos: A Systematic Survey | Jul 31, 2024 | Image Segmentation, Robot Manipulation Generalization | Code Available | 5 |
| Tora: Trajectory-oriented Diffusion Transformer for Video Generation | Jul 31, 2024 | Video Compression, Video Generation | Code Available | 5 |
| Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget | Jul 22, 2024 | Mixture-of-Experts | Code Available | 5 |
| CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models | Jul 21, 2024 | All, Fashion Synthesis | Code Available | 5 |
| Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems | Jul 17, 2024 | Autonomous Web Navigation, Denoising | Code Available | 5 |
| IMAGDressing-v1: Customizable Virtual Dressing | Jul 17, 2024 | Denoising, Image Generation | Code Available | 5 |
| VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark | Jul 16, 2024 | Diversity, Speaker Identification | Code Available | 5 |
| Semantic Operators: A Declarative Model for Rich, AI-based Data Processing | Jul 16, 2024 | Extreme Multi-Label Classification, Fact Checking | Code Available | 5 |
| BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval | Jul 16, 2024 | Question Answering, Retrieval | Code Available | 5 |
| GRUtopia: Dream General Robots in a City at Scale | Jul 15, 2024 | Language Modelling, Large Language Model | Code Available | 5 |
| Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients | Jul 11, 2024 | Quantization | Code Available | 5 |
| OffsetBias: Leveraging Debiased Data for Tuning Evaluators | Jul 9, 2024 | | Code Available | 5 |
| Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI | Jul 9, 2024 | Survey | Code Available | 5 |
| TAPVid-3D: A Benchmark for Tracking Any Point in 3D | Jul 8, 2024 | Point Tracking | Code Available | 5 |
| Fast On-device LLM Inference with NPUs | Jul 8, 2024 | CPU, GPU | Code Available | 5 |
| Structural Generalization in Autonomous Cyber Incident Response with Message-Passing Neural Networks and Reinforcement Learning | Jul 8, 2024 | | Code Available | 5 |
| Learning to (Learn at Test Time): RNNs with Expressive Hidden States | Jul 5, 2024 | 16k, 8k | Code Available | 5 |
| BM25S: Orders of magnitude faster lexical search via eager sparse scoring | Jul 4, 2024 | Passage Retrieval, Retrieval | Code Available | 5 |
| Fake News Detection: It's All in the Data! | Jul 2, 2024 | All, Diversity | Code Available | 5 |