| Images Speak in Images: A Generalist Painter for In-Context Visual Learning | Dec 5, 2022 | In-Context Learning, Keypoint Detection | Code Available | 4 |
| DreamGen: Unlocking Generalization in Robot Learning through Video World Models | May 19, 2025 | Video Generation | Code Available | 4 |
| MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning | Mar 10, 2025 | Multimodal Reasoning, Reinforcement Learning (RL) | Code Available | 4 |
| Cognitive Architectures for Language Agents | Sep 5, 2023 | Decision Making | Code Available | 4 |
| AnimateLCM: Computation-Efficient Personalized Style Video Generation without Personalized Video Data | Feb 1, 2024 | Conditional Image Generation, Denoising | Code Available | 4 |
| Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition | Oct 24, 2023 | | Code Available | 4 |
| Parameter-Efficient Prompt Tuning Makes Generalized and Calibrated Neural Text Retrievers | Jul 14, 2022 | Retrieval, Text Retrieval | Code Available | 4 |
| Mamba YOLO: A Simple Baseline for Object Detection with State Space Model | Jun 9, 2024 | GPU, Mamba | Code Available | 4 |
| Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements | Sep 30, 2022 | | Code Available | 4 |
| Compressible-composable NeRF via Rank-residual Decomposition | May 30, 2022 | NeRF | Code Available | 4 |
| Structured Pruning for Deep Convolutional Neural Networks: A survey | Mar 1, 2023 | Network Pruning, Neural Architecture Search | Code Available | 4 |
| From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge | Nov 25, 2024 | | Code Available | 4 |
| AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks | Mar 21, 2024 | Image to Video Generation, Style Transfer | Code Available | 4 |
| Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance | Oct 16, 2024 | Human Agent Collaboration | Code Available | 4 |
| Orb: A Fast, Scalable Neural Network Potential | Oct 29, 2024 | | Code Available | 4 |
| Spirit LM: Interleaved Spoken and Written Language Model | Feb 8, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments | Jul 15, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights | Oct 11, 2024 | GSM8K, Math | Code Available | 4 |
| I Think, Therefore I am: Benchmarking Awareness of Large Language Models Using AwareBench | Jan 31, 2024 | Benchmarking, Multiple-choice | Code Available | 4 |
| MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens | Jun 17, 2024 | | Code Available | 4 |
| Modern Neighborhood Components Analysis: A Deep Tabular Baseline Two Decades Later | Jul 3, 2024 | | Code Available | 4 |
| DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection | Mar 7, 2022 | Object Detection, Real-Time Object Detection | Code Available | 4 |
| TabM: Advancing Tabular Deep Learning with Parameter-Efficient Ensembling | Oct 31, 2024 | Deep Learning, Retrieval | Code Available | 4 |
| INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation | Jun 13, 2023 | Language Modeling, Language Modelling | Code Available | 4 |
| SegGPT: Segmenting Everything In Context | Apr 6, 2023 | Few-Shot Semantic Segmentation, In-Context Learning | Code Available | 4 |
| TinyLLaVA: A Framework of Small-scale Large Multimodal Models | Feb 22, 2024 | Visual Question Answering | Code Available | 4 |
| Building reliable sim driving agents by scaling self-play | Feb 20, 2025 | Autonomous Vehicles, Benchmarking | Code Available | 4 |
| Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts | Mar 13, 2024 | Image Animation, Image to Video Generation | Code Available | 4 |
| Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN | May 27, 2022 | Image Classification, Instance Segmentation | Code Available | 4 |
| SkyReels-A2: Compose Anything in Video Diffusion Transformers | Apr 3, 2025 | Human-Domain Subject-to-Video, Open-Domain Subject-to-Video | Code Available | 4 |
| Croissant: A Metadata Format for ML-Ready Datasets | Mar 28, 2024 | Friction, Management | Code Available | 4 |
| DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning | Feb 28, 2025 | Information Retrieval, reinforcement-learning | Code Available | 4 |
| LLMMapReduce-V2: Entropy-Driven Convolutional Test-Time Scaling for Generating Long-Form Articles from Extremely Long Resources | Apr 8, 2025 | Articles, Form | Code Available | 4 |
| KISS-Matcher: Fast and Robust Point Cloud Registration Revisited | Sep 23, 2024 | Point Cloud Registration | Code Available | 4 |
| Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control | Mar 18, 2025 | | Code Available | 4 |
| Prototypical Verbalizer for Prompt-based Few-shot Tuning | Mar 18, 2022 | Contrastive Learning, Entity Typing | Code Available | 4 |
| OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning | May 2, 2024 | Autonomous Driving, counterfactual | Code Available | 4 |
| NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis | Jul 20, 2022 | Image Outpainting, Text-to-Image Generation | Code Available | 4 |
| Autoregressive Video Generation without Vector Quantization | Dec 18, 2024 | Image Generation, Prediction | Code Available | 4 |
| Best-of-N Jailbreaking | Dec 4, 2024 | | Code Available | 4 |
| InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale LEAN Problems | Oct 21, 2024 | Automated Theorem Proving, CPU | Code Available | 4 |
| Continual Learning of Large Language Models: A Comprehensive Survey | Apr 25, 2024 | Continual Learning, Survey | Code Available | 4 |
| KTO: Model Alignment as Prospect Theoretic Optimization | Feb 2, 2024 | Attribute, model | Code Available | 4 |
| Evaluating Pre-trained Convolutional Neural Networks and Foundation Models as Feature Extractors for Content-based Medical Image Retrieval | Sep 14, 2024 | Contrastive Learning, Image Retrieval | Code Available | 4 |
| Text2SQL is Not Enough: Unifying AI and Databases with TAG | Aug 27, 2024 | RAG, Retrieval-augmented Generation | Code Available | 4 |
| Towards No.1 in CLUE Semantic Matching Challenge: Pre-trained Language Model Erlangshen with Propensity-Corrected Loss | Aug 5, 2022 | Language Modeling, Language Modelling | Code Available | 4 |
| Convolutional Differentiable Logic Gate Networks | Nov 7, 2024 | | Code Available | 4 |
| Billion-scale similarity search with GPUs | Feb 28, 2017 | GPU, Image Similarity Search | Code Available | 4 |
| Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers | Sep 30, 2024 | | Code Available | 4 |
| Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level | Mar 7, 2024 | | Code Available | 4 |