| Cross-Layer Feature Pyramid Transformer for Small Object Detection in Aerial Images | Jul 29, 2024 | object-detection, Object Detection | Code Available | 2 |
| Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process | Jul 29, 2024 | GSM8K, Math | Code Available | 2 |
| Efficient Face Super-Resolution via Wavelet-based Feature Enhancement Network | Jul 29, 2024 | Decoder, Super-Resolution | Code Available | 2 |
| NAVIX: Scaling MiniGrid Environments with JAX | Jul 28, 2024 | CPU, Deep Reinforcement Learning | Code Available | 2 |
| Temporal Feature Matters: A Framework for Diffusion Model Quantization | Jul 28, 2024 | Denoising, Image Generation | Code Available | 2 |
| Perm: A Parametric Representation for Multi-Style 3D Hair Modeling | Jul 28, 2024 | Image Generation | Code Available | 2 |
| Contrastive Learning of Asset Embeddings from Financial Time Series | Jul 26, 2024 | Contrastive Learning, Management | Code Available | 2 |
| Towards A Generalizable Pathology Foundation Model via Unified Knowledge Distillation | Jul 26, 2024 | Knowledge Distillation, Question Answering | Code Available | 2 |
| HybridDepth: Robust Metric Depth Fusion by Leveraging Depth from Focus and Single-Image Priors | Jul 26, 2024 | Depth Estimation, GPU | Code Available | 2 |
| Multi-Agent Trajectory Prediction with Difficulty-Guided Feature Enhancement Network | Jul 26, 2024 | Autonomous Driving, Decoder | Code Available | 2 |
| VSSD: Vision Mamba with Non-Causal State Space Duality | Jul 26, 2024 | image-classification, Image Classification | Code Available | 2 |
| RefMask3D: Language-Guided Transformer for 3D Referring Segmentation | Jul 25, 2024 | 3D visual grounding, Image Segmentation | Code Available | 2 |
| Towards Localized Fine-Grained Control for Facial Expression Generation | Jul 25, 2024 | Anatomy, Face Generation | Code Available | 2 |
| VGGHeads: 3D Multi Head Alignment with a Large-Scale Synthetic Dataset | Jul 25, 2024 | Head Detection, Keypoint Estimation | Code Available | 2 |
| The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models | Jul 25, 2024 | | Code Available | 2 |
| Reshape Dimensions Network for Speaker Recognition | Jul 25, 2024 | Speaker Recognition | Code Available | 2 |
| LoRA-Pro: Are Low-Rank Adapters Properly Optimized? | Jul 25, 2024 | Code Generation, Computational Efficiency | Code Available | 2 |
| Exploring the Effect of Dataset Diversity in Self-Supervised Learning for Surgical Computer Vision | Jul 25, 2024 | Diversity, Medical Image Analysis | Code Available | 2 |
| Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning | Jul 25, 2024 | Knowledge Distillation, Mathematical Reasoning | Code Available | 2 |
| FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing | Jul 25, 2024 | Text-based Image Editing | Code Available | 2 |
| RegionDrag: Fast Region-Based Image Editing with Diffusion Models | Jul 25, 2024 | | Code Available | 2 |
| DreamCar: Leveraging Car-specific Prior for in-the-wild 3D Car Reconstruction | Jul 24, 2024 | Camera Pose Estimation, Pose Estimation | Code Available | 2 |
| Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal | Jul 24, 2024 | Raindrop Removal, Rain Removal | Code Available | 2 |
| u-μP: The Unit-Scaled Maximal Update Parametrization | Jul 24, 2024 | | Code Available | 2 |
| Adaptive Training of Grid-Dependent Physics-Informed Kolmogorov-Arnold Networks | Jul 24, 2024 | Kolmogorov-Arnold Networks, Physics-informed machine learning | Code Available | 2 |
| dlordinal: a Python package for deep ordinal classification | Jul 24, 2024 | Classification, Ordinal Classification | Code Available | 2 |
| LoFormer: Local Frequency Transformer for Image Deblurring | Jul 24, 2024 | Deblurring, Image Deblurring | Code Available | 2 |
| Fréchet Video Motion Distance: A Metric for Evaluating Motion Consistency in Videos | Jul 23, 2024 | Image Generation, Point Tracking | Code Available | 2 |
| Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions | Jul 23, 2024 | Depth Estimation, Depth Prediction | Code Available | 2 |
| FakingRecipe: Detecting Fake News on Short Video Platforms from the Perspective of Creative Process | Jul 23, 2024 | | Code Available | 2 |
| Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems | Jul 23, 2024 | Colorization, Deblurring | Code Available | 2 |
| COALA: A Practical and Vision-Centric Federated Learning Platform | Jul 23, 2024 | Benchmarking, Continual Learning | Code Available | 2 |
| PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects | Jul 23, 2024 | Instance Segmentation, Object | Code Available | 2 |
| ESOD: Efficient Small Object Detection on High-Resolution Images | Jul 23, 2024 | GPU, Object | Code Available | 2 |
| A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data | Jul 23, 2024 | Autonomous Driving, Autonomous Racing | Code Available | 2 |
| Harmonizing Visual Text Comprehension and Generation | Jul 23, 2024 | multimodal generation, Reading Comprehension | Code Available | 2 |
| Audio Prompt Adapter: Unleashing Music Editing Abilities for Text-to-Music with Lightweight Finetuning | Jul 23, 2024 | | Code Available | 2 |
| MOMAland: A Set of Benchmarks for Multi-Objective Multi-Agent Reinforcement Learning | Jul 23, 2024 | Benchmarking, Decision Making | Code Available | 2 |
| MonoWAD: Weather-Adaptive Diffusion Model for Robust Monocular 3D Object Detection | Jul 23, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 2 |
| KAN or MLP: A Fairer Comparison | Jul 23, 2024 | Continual Learning | Code Available | 2 |
| MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity | Jul 22, 2024 | Diversity, Multiple-choice | Code Available | 2 |
| DiffArtist: Towards Structure and Appearance Controllable Image Stylization | Jul 22, 2024 | Disentanglement, Image Stylization | Code Available | 2 |
| Mini-Sequence Transformer: Optimizing Intermediate Memory for Long Sequences Training | Jul 22, 2024 | | Code Available | 2 |
| Retrieval with Learned Similarities | Jul 22, 2024 | Question Answering, Recommendation Systems | Code Available | 2 |
| LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding | Jul 22, 2024 | Multiple-choice, Question Answering | Code Available | 2 |
| AICircuit: A Multi-Level Dataset and Benchmark for AI-Driven Analog Integrated Circuit Design | Jul 22, 2024 | | Code Available | 2 |
| A Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model | Jul 22, 2024 | Diagnostic, whole slide images | Code Available | 2 |
| Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval | Jul 21, 2024 | General Knowledge, Highlight Detection | Code Available | 2 |
| Efficient Non-stationary Online Learning by Wavelets with Applications to Online Distribution Shift Adaptation | Jul 21, 2024 | | Code Available | 2 |
| MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation | Jul 21, 2024 | Diversity, Music Generation | Code Available | 2 |