| Language Models are Few-Shot Learners | May 28, 2020 | answerability prediction, Articles | Code Available | 3 |
| Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models | Dec 24, 2024 | Attribute | Code Available | 3 |
| A Smart Multimodal Healthcare Copilot with Powerful LLM Reasoning | Jun 3, 2025 | Decision Making, Diagnostic | Code Available | 3 |
| TCFormer: Visual Recognition via Token Clustering Transformer | Jul 16, 2024 | Clustering, image-classification | Code Available | 3 |
| TSI-Bench: Benchmarking Time Series Imputation | Jun 18, 2024 | Benchmarking, Deep Learning | Code Available | 3 |
| Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint | Dec 18, 2023 | Language Modeling, Language Modelling | Code Available | 3 |
| A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization | Jul 12, 2024 | Anomaly Detection, Defect Detection | Code Available | 3 |
| Seamless Human Motion Composition with Blended Positional Encodings | Feb 23, 2024 | Denoising, Motion Generation | Code Available | 3 |
| AdaRevD: Adaptive Patch Exiting Reversible Decoder Pushes the Limit of Image Deblurring | Jun 13, 2024 | Deblurring, Decoder | Code Available | 3 |
| LocalMamba: Visual State Space Model with Windowed Selective Scan | Mar 14, 2024 | Mamba, State Space Models | Code Available | 3 |
| CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models | Jan 30, 2024 | Knowledge Base Construction, Question Answering | Code Available | 3 |
| Will LLMs be Professional at Fund Investment? DeepFund: A Live Arena Perspective | Mar 24, 2025 | Decision Making | Code Available | 3 |
| Event-Enhanced Blurry Video Super-Resolution | Apr 17, 2025 | Deblurring, Motion Estimation | Code Available | 3 |
| Generative Data Augmentation using LLMs improves Distributional Robustness in Question Answering | Sep 3, 2023 | Data Augmentation, Domain Adaptation | Code Available | 3 |
| Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning | May 1, 2024 | ARC, GSM8K | Code Available | 3 |
| A Survey of Large Language Models in Finance (FinLLMs) | Feb 4, 2024 | Named Entity Recognition (NER), Question Answering | Code Available | 3 |
| INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning | Jan 12, 2024 | Diversity, document understanding | Code Available | 3 |
| DM-VIO: Delayed Marginalization Visual-Inertial Odometry | Jan 11, 2022 | | Code Available | 3 |
| Semi-supervised Credit Card Fraud Detection via Attribute-Driven Graph Representation | Dec 24, 2024 | Attribute, Fraud Detection | Code Available | 3 |
| UniST: A Prompt-Empowered Universal Model for Urban Spatio-Temporal Prediction | Feb 19, 2024 | Decision Making, Management | Code Available | 3 |
| Multimodal-Conditioned Latent Diffusion Models for Fashion Image Editing | Mar 21, 2024 | Denoising, Virtual Try-on | Code Available | 3 |
| Separate Anything You Describe | Aug 9, 2023 | Audio Source Separation, Natural Language Queries | Code Available | 3 |
| TAP-Vid: A Benchmark for Tracking Any Point in a Video | Nov 7, 2022 | Optical Flow Estimation, Point Tracking | Code Available | 3 |
| Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement | Nov 1, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| A Practical Probabilistic Benchmark for AI Weather Models | Jan 27, 2024 | Diagnostic, Weather Forecasting | Code Available | 3 |
| Evolution of Heuristics: Towards Efficient Automatic Algorithm Design Using Large Language Model | Jan 4, 2024 | Combinatorial Optimization, Language Modeling | Code Available | 3 |
| Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review | Mar 4, 2024 | Medical Report Generation, Question Answering | Code Available | 3 |
| ALS-HAR: Harnessing Wearable Ambient Light Sensors to Enhance IMU-based Human Activity Recogntion | Aug 18, 2024 | Activity Recognition, Human Activity Recognition | Code Available | 3 |
| PGL at TextGraphs 2020 Shared Task: Explanation Regeneration using Language and Graph Learning Methods | Dec 1, 2020 | Graph Learning, Language Modeling | Code Available | 3 |
| Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors | Jun 30, 2023 | Image to 3D | Code Available | 3 |
| Metadata Embeddings for User and Item Cold-start Recommendations | Jul 30, 2015 | TAG | Code Available | 3 |
| DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT | Dec 27, 2024 | Autonomous Driving, Video Generation | Code Available | 3 |
| Fundus: A Simple-to-Use News Scraper Optimized for High Quality Extractions | Mar 22, 2024 | Articles | Code Available | 3 |
| ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation | Jun 22, 2025 | GPU, Image Generation | Code Available | 3 |
| SVIT: Scaling up Visual Instruction Tuning | Jul 9, 2023 | Diversity, Image Captioning | Code Available | 3 |
| XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation | Dec 2, 2024 | Image Reconstruction, Quantization | Code Available | 3 |
| Scaling Transformers for Low-Bitrate High-Quality Speech Coding | Nov 29, 2024 | Quantization | Code Available | 3 |
| Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey | Dec 9, 2024 | Speech Synthesis, Survey | Code Available | 3 |
| Learning Bipedal Walking for Humanoids with Current Feedback | Mar 7, 2023 | Deep Reinforcement Learning, Reinforcement Learning (RL) | Code Available | 3 |
| Resolution-robust Large Mask Inpainting with Fourier Convolutions | Sep 15, 2021 | Image Inpainting, Seeing Beyond the Visible | Code Available | 3 |
| Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation | Jun 4, 2024 | 2D Object Detection, 3D Instance Segmentation | Code Available | 3 |
| AutoVFX: Physically Realistic Video Editing from Natural Language Instructions | Nov 4, 2024 | Code Generation, Video Editing | Code Available | 3 |
| MAGREF: Masked Guidance for Any-Reference Video Generation | May 29, 2025 | Human-Domain Subject-to-Video, Open-Domain Subject-to-Video | Code Available | 3 |
| ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech | Jul 13, 2022 | Denoising, GPU | Code Available | 3 |
| DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector | Apr 13, 2024 | Data Augmentation, Key Point Matching | Code Available | 3 |
| SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents | Jan 17, 2024 | Natural Language Visual Grounding | Code Available | 3 |
| Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents | Feb 22, 2025 | AI Agent | Code Available | 3 |
| MEMORYLLM: Towards Self-Updatable Large Language Models | Feb 7, 2024 | Model Editing | Code Available | 3 |
| BatchTopK Sparse Autoencoders | Dec 9, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| On the Efficiency of NLP-Inspired Methods for Tabular Deep Learning | Nov 26, 2024 | Computational Efficiency, Deep Learning | Code Available | 3 |