| DiTFastAttnV2: Head-wise Attention Compression for Multi-Modality Diffusion Transformers | Mar 28, 2025 | 2k, Image Generation | Unverified | 0 |
| Nonparametric MLE for Gaussian Location Mixtures: Certified Computation and Generic Behavior | Mar 26, 2025 | 2k | Unverified | 0 |
| Ultra-Resolution Adaptation with Ease | Mar 20, 2025 | 2k, 4k | Code Available | 2 |
| REPA: Russian Error Types Annotation for Evaluating Text Generation and Judgment Capabilities | Mar 17, 2025 | 2k, Text Generation | Unverified | 0 |
| Evaluating the Suitability of Different Intraoral Scan Resolutions for Deep Learning-Based Tooth Segmentation | Feb 26, 2025 | 16k, 2k | Unverified | 0 |
| Stackelberg Game Preference Optimization for Data-Efficient Alignment of Language Models | Feb 25, 2025 | 2k, Models Alignment | Unverified | 0 |
| Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks | Feb 24, 2025 | 2k, ARC | Unverified | 0 |
| Exact Recovery of Sparse Binary Vectors from Generalized Linear Measurements | Feb 21, 2025 | 2k, Quantization | Unverified | 0 |
| Facilitating Long Context Understanding via Supervised Chain-of-Thought Reasoning | Feb 18, 2025 | 2k, Long-Context Understanding | Unverified | 0 |
| MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction | Feb 17, 2025 | 2k, Autonomous Driving | Code Available | 3 |
| Improved Regret in Stochastic Decision-Theoretic Online Learning under Differential Privacy | Feb 16, 2025 | 2k | Unverified | 0 |
| CascadeV: An Implementation of Wurstchen Architecture for Video Generation | Jan 28, 2025 | 2k, Video Generation | Code Available | 1 |
| Domaino1s: Guiding LLM Reasoning for Explainable Answers in High-Stakes Domains | Jan 24, 2025 | 2k, Legal Reasoning | Unverified | 0 |
| TimeLogic: A Temporal Logic Benchmark for Video QA | Jan 13, 2025 | 2k, Action Segmentation | Unverified | 0 |
| LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation | Jan 9, 2025 | 2k, 8k | Unverified | 0 |
| Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos | Jan 7, 2025 | 2k, Language Modeling | Code Available | 5 |
| Toward Corpus Size Requirements for Training and Evaluating Depression Risk Models Using Spoken Language | Dec 31, 2024 | 2k | Unverified | 0 |
| Social-LLaVA: Enhancing Robot Navigation through Human-Language Reasoning in Social Spaces | Dec 30, 2024 | 2k, Robot Navigation | Unverified | 0 |
| Multimodal Preference Data Synthetic Alignment with Reward Model | Dec 23, 2024 | 2k, Caption Generation | Code Available | 0 |
| AnalogXpert: Automating Analog Topology Synthesis by Incorporating Circuit Design Expertise into Large Language Models | Dec 17, 2024 | 2k, Code Generation | Unverified | 0 |
| Block-Based Multi-Scale Image Rescaling | Dec 16, 2024 | 2k, 4k | Unverified | 0 |
| Do Large Language Models Show Biases in Causal Learning? | Dec 13, 2024 | 2k, Misinformation | Unverified | 0 |
| Elevating Flow-Guided Video Inpainting with Reference Generation | Dec 12, 2024 | 2k, Video Inpainting | Code Available | 2 |
| MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects | Dec 6, 2024 | 2k, Anomaly Detection | Unverified | 0 |
| Lightweight Multiplane Images Network for Real-Time Stereoscopic Conversion from Planar Video | Dec 4, 2024 | 2k | Unverified | 0 |