| KernelWarehouse: Rethinking the Design of Dynamic Convolution | Jun 12, 2024 | | Code Available | 2 |
| Unveiling the Power of Wavelets: A Wavelet-based Kolmogorov-Arnold Network for Hyperspectral Image Classification | Jun 12, 2024 | Hyperspectral Image Classification, image-classification | Code Available | 2 |
| LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning | Jun 12, 2024 | text-to-speech, Text to Speech | Code Available | 2 |
| OpenCOLE: Towards Reproducible Automatic Graphic Design Generation | Jun 12, 2024 | | Code Available | 2 |
| GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices | Jun 12, 2024 | Navigate | Code Available | 2 |
| Spoof Diarization: "What Spoofed When" in Partially Spoofed Audio | Jun 12, 2024 | Clustering | Code Available | 2 |
| Real-world Image Dehazing with Coherence-based Pseudo Labeling and Cooperative Unfolding Network | Jun 12, 2024 | Image Dehazing | Code Available | 2 |
| Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models | Jun 12, 2024 | Audio captioning, Hallucination | Code Available | 2 |
| Time-MMD: Multi-Domain Multimodal Dataset for Time Series Analysis | Jun 12, 2024 | Time Series, Time Series Analysis | Code Available | 2 |
| Discovering Preference Optimization Algorithms with and for Large Language Models | Jun 12, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network | Jun 11, 2024 | 3D Object Detection, Active Learning | Code Available | 2 |
| Improving Autoformalization using Type Checking | Jun 11, 2024 | Informal-to-formal Style Transfer | Code Available | 2 |
| Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation | Jun 11, 2024 | | Code Available | 2 |
| Let Go of Your Labels with Unsupervised Transfer | Jun 11, 2024 | Image Clustering, Unsupervised Image Classification | Code Available | 2 |
| Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions | Jun 11, 2024 | Hallucination, Image Description | Code Available | 2 |
| Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena | Jun 11, 2024 | Multiple-choice, Selection bias | Code Available | 2 |
| OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding | Jun 11, 2024 | Action Understanding, Diversity | Code Available | 2 |
| GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection | Jun 11, 2024 | Anomaly Detection, Denoising | Code Available | 2 |
| Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning | Jun 11, 2024 | Contrastive Learning | Code Available | 2 |
| Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring | Jun 11, 2024 | Deblurring, Optical Flow Estimation | Code Available | 2 |
| Treeffuser: Probabilistic Predictions via Conditional Diffusions with Gradient-Boosted Trees | Jun 11, 2024 | | Code Available | 2 |
| RS-Agent: Automating Remote Sensing Tasks through Intelligent Agent | Jun 11, 2024 | AI Agent, Descriptive | Code Available | 2 |
| A Synthetic Dataset for Personal Attribute Inference | Jun 11, 2024 | Attribute, Author Profiling | Code Available | 2 |
| Is One GPU Enough? Pushing Image Generation at Higher-Resolutions with Foundation Models | Jun 11, 2024 | Diversity, GPU | Code Available | 2 |
| A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Speech Translation | Jun 11, 2024 | Decoder, Simultaneous Speech-to-Speech Translation | Code Available | 2 |