| Restructuring Vector Quantization with the Rotation Trick | Oct 8, 2024 | Quantization | Code Available | 4 |
| Story-Adapter: A Training-free Iterative Framework for Long Story Visualization | Oct 8, 2024 | Image Generation, Story Visualization | Code Available | 4 |
| Timer-XL: Long-Context Transformers for Unified Time Series Forecasting | Oct 7, 2024 | Time Series, Time Series Forecasting | Code Available | 4 |
| Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration | Oct 3, 2024 | Diversity, Language Modeling | Code Available | 4 |
| shapiq: Shapley Interactions for Machine Learning | Oct 2, 2024 | Benchmarking, Data Valuation | Code Available | 4 |
| OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data | Oct 2, 2024 | Arithmetic Reasoning, Large Language Model | Code Available | 4 |
| Evaluating Deep Regression Models for WSI-Based Gene-Expression Prediction | Oct 1, 2024 | Prediction, regression | Code Available | 4 |
| Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration | Oct 1, 2024 | Blind Face Restoration, Image Colorization | Code Available | 4 |
| Old Optimizer, New Norm: An Anthology | Sep 30, 2024 | | Code Available | 4 |
| Replace Anyone in Videos | Sep 30, 2024 | Video Generation, Video Inpainting | Code Available | 4 |
| Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers | Sep 30, 2024 | | Code Available | 4 |
| Data-Prep-Kit: getting your data ready for LLM application development | Sep 26, 2024 | CPU, Language Modeling | Code Available | 4 |
| Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction | Sep 26, 2024 | 3D Reconstruction, Denoising | Code Available | 4 |
| VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models | Sep 25, 2024 | Quantization | Code Available | 4 |
| Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models | Sep 25, 2024 | Image Captioning | Code Available | 4 |
| Whisper in Medusa's Ear: Multi-head Efficient Decoding for Transformer-based ASR | Sep 24, 2024 | | Code Available | 4 |
| Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts | Sep 24, 2024 | Computational Efficiency, Mixture-of-Experts | Code Available | 4 |
| Zero-shot forecasting of chaotic systems | Sep 24, 2024 | Attribute, In-Context Learning | Code Available | 4 |
| KISS-Matcher: Fast and Robust Point Cloud Registration Revisited | Sep 23, 2024 | Point Cloud Registration | Code Available | 4 |
| Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding | Sep 22, 2024 | Anomaly Detection, GPU | Code Available | 4 |
| HLLM: Enhancing Sequential Recommendations via Hierarchical Large Language Models for Item and User Modeling | Sep 19, 2024 | Large Language Model, Recommendation Systems | Code Available | 4 |
| StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation | Sep 19, 2024 | Image Generation, Personalized Image Generation | Code Available | 4 |
| Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think | Sep 17, 2024 | Conditional Image Generation, Depth Estimation | Code Available | 4 |
| UltimateDO: An Efficient Framework to Marry Occupancy Prediction with 3D Object Detection via Channel2height | Sep 17, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 4 |
| Kolmogorov-Arnold Transformer | Sep 16, 2024 | Image Classification | Code Available | 4 |
| On the limits of agency in agent-based models | Sep 14, 2024 | Computational Efficiency, counterfactual | Code Available | 4 |
| Evaluating Pre-trained Convolutional Neural Networks and Foundation Models as Feature Extractors for Content-based Medical Image Retrieval | Sep 14, 2024 | Contrastive Learning, Image Retrieval | Code Available | 4 |
| Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale | Sep 12, 2024 | | Code Available | 4 |
| GeoCalib: Learning Single-image Calibration with Geometric Optimization | Sep 10, 2024 | 3D geometry, Visual Localization | Code Available | 4 |
| RealisDance: Equip controllable character animation with realistic hands | Sep 10, 2024 | | Code Available | 4 |
| Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation | Sep 6, 2024 | Image Generation, Image Reconstruction | Code Available | 4 |
| One-Shot Diffusion Mimicker for Handwritten Text Generation | Sep 6, 2024 | Handwriting generation, Text Generation | Code Available | 4 |
| xLAM: A Family of Large Action Models to Empower AI Agent Systems | Sep 5, 2024 | AI Agent | Code Available | 4 |
| iText2KG: Incremental Knowledge Graphs Construction Using Large Language Models | Sep 5, 2024 | Few-Shot Learning, Information Retrieval | Code Available | 4 |
| MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark | Sep 4, 2024 | Optical Character Recognition (OCR) | Code Available | 4 |
| LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA | Sep 4, 2024 | Question Answering, Sentence | Code Available | 4 |
| Large Language Model-Based Agents for Software Engineering: A Survey | Sep 4, 2024 | AI Agent, Language Modeling | Code Available | 4 |
| OLMoE: Open Mixture-of-Experts Language Models | Sep 3, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| Diffusion Policy Policy Optimization | Sep 1, 2024 | continuous-control, Continuous Control | Code Available | 4 |
| IGEV++: Iterative Multi-range Geometry Encoding Volumes for Stereo Matching | Sep 1, 2024 | Patch Matching, Stereo Matching | Code Available | 4 |
| CrisperWhisper: Accurate Timestamps on Verbatim Speech Transcriptions | Aug 29, 2024 | Dynamic Time Warping, speech-recognition | Code Available | 4 |
| Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders | Aug 28, 2024 | Optical Character Recognition | Code Available | 4 |
| MegActor-Σ: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer | Aug 27, 2024 | Portrait Animation | Code Available | 4 |
| Text2SQL is Not Enough: Unifying AI and Databases with TAG | Aug 27, 2024 | RAG, Retrieval-augmented Generation | Code Available | 4 |
| Relationships are Complicated! An Analysis of Relationships Between Datasets on the Web | Aug 26, 2024 | Decision Making, Multi-class Classification | Code Available | 4 |
| EmbodiedSAM: Online Segment Any 3D Thing in Real Time | Aug 21, 2024 | 3D Instance Segmentation, GPU | Code Available | 4 |
| SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition | Aug 20, 2024 | Emotion Recognition, Multimodal Emotion Recognition | Code Available | 4 |
| RUMI: Rummaging Using Mutual Information | Aug 19, 2024 | Model Predictive Control, Object | Code Available | 4 |
| DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search | Aug 15, 2024 | Automated Theorem Proving, Language Modeling | Code Available | 4 |
| FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance | Aug 15, 2024 | TAR, Video Generation | Code Available | 4 |