| Tarsier: Recipes for Training and Evaluating Large Video Description Models | Jun 30, 2024 | Video Captioning, Video Description | Code Available | 4 |
| YuLan: An Open-source Large Language Model | Jun 28, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| TabReD: Analyzing Pitfalls and Filling the Gaps in Tabular Deep Learning Benchmarks | Jun 27, 2024 | Feature Engineering, Model Selection | Code Available | 4 |
| On Scaling Up 3D Gaussian Splatting Training | Jun 26, 2024 | 3DGS, 3D Reconstruction | Code Available | 4 |
| T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge | Jun 25, 2024 | Computational Efficiency, CPU | Code Available | 4 |
| Long Context Transfer from Language to Vision | Jun 24, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| PVUW 2024 Challenge on Complex Video Understanding: Methods and Results | Jun 24, 2024 | Segmentation, Semantic Segmentation | Code Available | 4 |
| RaTEScore: A Metric for Radiology Report Generation | Jun 24, 2024 | Diagnostic, Entity Embeddings | Code Available | 4 |
| Enabling more efficient and cost-effective AI/ML systems with Collective Mind, virtualized MLOps, MLPerf, Collective Knowledge Playground and reproducible optimization tournaments | Jun 24, 2024 | Benchmarking | Code Available | 4 |
| Trace is the Next AutoDiff: Generative Optimization with Rich Feedback, Execution Traces, and LLMs | Jun 23, 2024 | | Code Available | 4 |
| BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions | Jun 22, 2024 | Benchmarking, Code Generation | Code Available | 4 |
| Convolutional Kolmogorov-Arnold Networks | Jun 19, 2024 | Kolmogorov-Arnold Networks | Code Available | 4 |
| Improving Multi-modal Recommender Systems by Denoising and Aligning Multi-modal Content and User Feedback | Jun 18, 2024 | Denoising, Recommendation Systems | Code Available | 4 |
| Nemotron-4 340B Technical Report | Jun 17, 2024 | Synthetic Data Generation | Code Available | 4 |
| Graspness Discovery in Clutters for Fast and Accurate Grasp Detection | Jun 17, 2024 | | Code Available | 4 |
| Diffusion Models in Low-Level Vision: A Survey | Jun 17, 2024 | Denoising, Survey | Code Available | 4 |
| MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens | Jun 17, 2024 | | Code Available | 4 |
| Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning | Jun 17, 2024 | Emotion Recognition, Multimodal Emotion Recognition | Code Available | 4 |
| A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery | Jun 16, 2024 | scientific discovery, Survey | Code Available | 4 |
| Panoptic-FlashOcc: An Efficient Baseline to Marry Semantic Occupancy with Panoptic via Instance Center | Jun 15, 2024 | | Code Available | 4 |
| Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs | Jun 14, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| Gender Representation in TV and Radio: Automatic Information Extraction methods versus Manual Analyses | Jun 14, 2024 | | Code Available | 4 |
| MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations | Jun 13, 2024 | 3D visual grounding, Attribute | Code Available | 4 |
| HelpSteer2: Open-source dataset for training top-performing reward models | Jun 12, 2024 | Attribute | Code Available | 4 |
| One-Step Effective Diffusion Network for Real-World Image Super-Resolution | Jun 12, 2024 | Image Restoration, Image Super-Resolution | Code Available | 4 |
| Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing | Jun 12, 2024 | | Code Available | 4 |
| Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling | Jun 11, 2024 | 4k, Language Modeling | Code Available | 4 |
| AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising | Jun 11, 2024 | Denoising | Code Available | 4 |
| Simple and Effective Masked Diffusion Language Models | Jun 11, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| PufferLib: Making Reinforcement Learning Libraries and Environments Play Nice | Jun 11, 2024 | NetHack, reinforcement-learning | Code Available | 4 |
| Mamba YOLO: A Simple Baseline for Object Detection with State Space Model | Jun 9, 2024 | GPU, Mamba | Code Available | 4 |
| MotionClone: Training-Free Motion Cloning for Controllable Video Generation | Jun 8, 2024 | Denoising, Motion Generation | Code Available | 4 |
| The CLRS-Text Algorithmic Reasoning Language Benchmark | Jun 6, 2024 | | Code Available | 4 |
| Lean Workbook: A large-scale Lean problem set formalized from natural language math problems | Jun 6, 2024 | Automated Theorem Proving, Math | Code Available | 4 |
| ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search | Jun 6, 2024 | | Code Available | 4 |
| Nomic Embed Vision: Expanding the Latent Space | Jun 6, 2024 | | Code Available | 4 |
| Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving | Jun 6, 2024 | Autonomous Driving, Bench2Drive | Code Available | 4 |
| AgentGym: Evolving Large Language Model-based Agents across Diverse Environments | Jun 6, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| Scaling and evaluating sparse autoencoders | Jun 6, 2024 | Language Modelling | Code Available | 4 |
| DenoDet: Attention as Deformable Multi-Subspace Feature Denoising for Target Detection in SAR Images | Jun 5, 2024 | 2D Object Detection, Denoising | Code Available | 4 |
| Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models | Jun 4, 2024 | Common Sense Reasoning | Code Available | 4 |
| Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation | Jun 4, 2024 | Face Swapping, GPU | Code Available | 4 |
| Guiding a Diffusion Model with a Bad Version of Itself | Jun 4, 2024 | Image Generation | Code Available | 4 |
| RaDe-GS: Rasterizing Depth in Gaussian Splatting | Jun 3, 2024 | Computational Efficiency, Novel View Synthesis | Code Available | 4 |
| UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation | Jun 3, 2024 | Image Animation, Video Generation | Code Available | 4 |
| Self-Supervised Geometry-Guided Initialization for Robust Monocular Visual Odometry | Jun 3, 2024 | Depth Estimation, Monocular Depth Estimation | Code Available | 4 |
| Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models | Jun 3, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| COS-Mix: Cosine Similarity and Distance Fusion for Improved Information Retrieval | Jun 2, 2024 | Information Retrieval, RAG | Code Available | 4 |
| End-to-End Hybrid Refractive-Diffractive Lens Design with Differentiable Ray-Wave Model | Jun 2, 2024 | Image Reconstruction | Code Available | 4 |
| R^2-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction | May 31, 2024 | 3DGS, NeRF | Code Available | 4 |