| Adversarial Diffusion Compression for Real-World Image Super-Resolution | Nov 20, 2024 | Decoder, Denoising | Code Available | 4 |
| FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on | Nov 15, 2024 | Virtual Try-on | Code Available | 4 |
| MARS: Unleashing the Power of Variance Reduction for Training Large Models | Nov 15, 2024 | Stochastic Optimization | Code Available | 4 |
| JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation | Nov 14, 2024 | Image Animation, Motion Generation | Code Available | 4 |
| A Preview of XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL | Nov 13, 2024 | Diversity, In-Context Learning | Code Available | 4 |
| Cut Your Losses in Large-Vocabulary Language Models | Nov 13, 2024 | | Code Available | 4 |
| SAMPart3D: Segment Any Part in 3D Objects | Nov 11, 2024 | 3D Generation, 3D Part Segmentation | Code Available | 4 |
| Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement | Nov 10, 2024 | Attribute, Image Generation | Code Available | 4 |
| Autoregressive Models in Vision: A Survey | Nov 8, 2024 | 3D Generation, Image Generation | Code Available | 4 |
| Convolutional Differentiable Logic Gate Networks | Nov 7, 2024 | | Code Available | 4 |
| Taming Rectified Flow for Inversion and Editing | Nov 7, 2024 | Image Generation, Text-to-Image Generation | Code Available | 4 |
| BitNet a4.8: 4-bit Activations for 1-bit LLMs | Nov 7, 2024 | Quantization | Code Available | 4 |
| LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation | Nov 7, 2024 | Contrastive Learning, Image Captioning | Code Available | 4 |
| SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models | Nov 7, 2024 | GPU, Quantization | Code Available | 4 |
| Training-free Regional Prompting for Diffusion Transformers | Nov 4, 2024 | Image Generation, Text to Image Generation | Code Available | 4 |
| TableGPT2: A Large Multimodal Model with Tabular Data Integration | Nov 4, 2024 | Benchmarking, Data Integration | Code Available | 4 |
| WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning | Nov 4, 2024 | | Code Available | 4 |
| Sample-Efficient Alignment for LLMs | Nov 3, 2024 | Thompson Sampling | Code Available | 4 |
| TabM: Advancing Tabular Deep Learning with Parameter-Efficient Ensembling | Oct 31, 2024 | Deep Learning, Retrieval | Code Available | 4 |
| No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images | Oct 31, 2024 | 3D Reconstruction, Generalizable Novel View Synthesis | Code Available | 4 |
| HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models | Oct 30, 2024 | Video Generation | Code Available | 4 |
| TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters | Oct 30, 2024 | model | Code Available | 4 |
| MutaPLM: Protein Language Modeling for Mutation Explanation and Engineering | Oct 30, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving | Oct 29, 2024 | Autonomous Driving, Scene Understanding | Code Available | 4 |
| Orb: A Fast, Scalable Neural Network Potential | Oct 29, 2024 | | Code Available | 4 |
| SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement | Oct 26, 2024 | Large Language Model | Code Available | 4 |
| Blendify -- Python rendering framework for Blender | Oct 23, 2024 | 10-shot image generation | Code Available | 4 |
| SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree | Oct 21, 2024 | Heuristic Search, Object | Code Available | 4 |
| InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale LEAN Problems | Oct 21, 2024 | Automated Theorem Proving, CPU | Code Available | 4 |
| Improving Parallel Program Performance with LLM Optimizers via Agent-System Interfaces | Oct 21, 2024 | Code Generation, scientific discovery | Code Available | 4 |
| SNAC: Multi-Scale Neural Audio Codec | Oct 18, 2024 | Audio Compression, Audio Generation | Code Available | 4 |
| Chain of Ideas: Revolutionizing Research Via Novel Idea Development with LLM Agents | Oct 17, 2024 | Experimental Design | Code Available | 4 |
| One Step Diffusion via Shortcut Models | Oct 16, 2024 | Denoising, Scheduling | Code Available | 4 |
| Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance | Oct 16, 2024 | Human Agent Collaboration | Code Available | 4 |
| MoH: Multi-Head Attention as Mixture-of-Head Attention | Oct 15, 2024 | Mixture-of-Experts | Code Available | 4 |
| MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from Microwatts to Megawatts for Sustainable AI | Oct 15, 2024 | Benchmarking | Code Available | 4 |
| DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads | Oct 14, 2024 | GPU, Quantization | Code Available | 4 |
| When Does Perceptual Alignment Benefit Vision Representations? | Oct 14, 2024 | Depth Estimation, Image Generation | Code Available | 4 |
| Depth Any Video with Scalable Synthetic Data | Oct 14, 2024 | Depth Estimation | Code Available | 4 |
| VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents | Oct 14, 2024 | RAG, Retrieval | Code Available | 4 |
| Generalizable Humanoid Manipulation with 3D Diffusion Policies | Oct 14, 2024 | Camera Calibration, Point Cloud Segmentation | Code Available | 4 |
| Agent-as-a-Judge: Evaluate Agents with Agents | Oct 14, 2024 | Code Generation | Code Available | 4 |
| EasyRAG: Efficient Retrieval-Augmented Generation Framework for Automated Network Operations | Oct 14, 2024 | Answer Generation, Question Answering | Code Available | 4 |
| LLMMapReduce: Simplified Long-Sequence Processing using Large Language Models | Oct 12, 2024 | document understanding | Code Available | 4 |
| SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights | Oct 11, 2024 | GSM8K, Math | Code Available | 4 |
| Generalizable and Animatable Gaussian Head Avatar | Oct 10, 2024 | | Code Available | 4 |
| CoBa: Convergence Balancer for Multitask Finetuning of Large Language Models | Oct 9, 2024 | Multi-Task Learning | Code Available | 4 |
| MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts | Oct 9, 2024 | GPU, Mixture-of-Experts | Code Available | 4 |
| Taking a turn for the better: Conversation redirection throughout the course of mental-health therapy | Oct 9, 2024 | | Code Available | 4 |
| Restructuring Vector Quantization with the Rotation Trick | Oct 8, 2024 | Quantization | Code Available | 4 |