| Discovering Preference Optimization Algorithms with and for Large Language Models | Jun 12, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Unveiling the Power of Wavelets: A Wavelet-based Kolmogorov-Arnold Network for Hyperspectral Image Classification | Jun 12, 2024 | Hyperspectral Image Classification, image-classification | Code Available | 2 |
| Real-world Image Dehazing with Coherence-based Pseudo Labeling and Cooperative Unfolding Network | Jun 12, 2024 | Image Dehazing | Code Available | 2 |
| LVBench: An Extreme Long Video Understanding Benchmark | Jun 12, 2024 | Decision Making, Video Understanding | Code Available | 2 |
| Time-MMD: Multi-Domain Multimodal Dataset for Time Series Analysis | Jun 12, 2024 | Time Series, Time Series Analysis | Code Available | 2 |
| KernelWarehouse: Rethinking the Design of Dynamic Convolution | Jun 12, 2024 | | Code Available | 2 |
| Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models | Jun 12, 2024 | Audio captioning, Hallucination | Code Available | 2 |
| Spoof Diarization: "What Spoofed When" in Partially Spoofed Audio | Jun 12, 2024 | Clustering | Code Available | 2 |
| GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices | Jun 12, 2024 | Navigate | Code Available | 2 |
| EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech | Jun 12, 2024 | Emotional Speech Synthesis, text-to-speech | Code Available | 2 |
| EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network | Jun 11, 2024 | 3D Object Detection, Active Learning | Code Available | 2 |
| Autoregressive Pretraining with Mamba in Vision | Jun 11, 2024 | Mamba | Code Available | 2 |
| OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding | Jun 11, 2024 | Action Understanding, Diversity | Code Available | 2 |
| Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation | Jun 11, 2024 | | Code Available | 2 |
| RWKV-CLIP: A Robust Vision-Language Representation Learner | Jun 11, 2024 | Image-text Retrieval, Representation Learning | Code Available | 2 |
| Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning | Jun 11, 2024 | Contrastive Learning | Code Available | 2 |
| Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring | Jun 11, 2024 | Deblurring, Optical Flow Estimation | Code Available | 2 |
| Let Go of Your Labels with Unsupervised Transfer | Jun 11, 2024 | Image Clustering, Unsupervised Image Classification | Code Available | 2 |
| Is One GPU Enough? Pushing Image Generation at Higher-Resolutions with Foundation Models | Jun 11, 2024 | Diversity, GPU | Code Available | 2 |
| Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions | Jun 11, 2024 | Hallucination, Image Description | Code Available | 2 |
| Improving Autoformalization using Type Checking | Jun 11, 2024 | Informal-to-formal Style Transfer | Code Available | 2 |
| CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence | Jun 11, 2024 | | Code Available | 2 |
| QuickLLaMA: Query-aware Inference Acceleration for Large Language Models | Jun 11, 2024 | | Code Available | 2 |
| Description and Discussion on DCASE 2024 Challenge Task 2: First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring | Jun 11, 2024 | Attribute, Domain Generalization | Code Available | 2 |
| GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection | Jun 11, 2024 | Anomaly Detection, Denoising | Code Available | 2 |
| Needle In A Multimodal Haystack | Jun 11, 2024 | Retrieval | Code Available | 2 |
| Treeffuser: Probabilistic Predictions via Conditional Diffusions with Gradient-Boosted Trees | Jun 11, 2024 | | Code Available | 2 |
| A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Speech Translation | Jun 11, 2024 | Decoder, Simultaneous Speech-to-Speech Translation | Code Available | 2 |
| Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena | Jun 11, 2024 | Multiple-choice, Selection bias | Code Available | 2 |
| RS-Agent: Automating Remote Sensing Tasks through Intelligent Agent | Jun 11, 2024 | AI Agent, Descriptive | Code Available | 2 |
| A Synthetic Dataset for Personal Attribute Inference | Jun 11, 2024 | Attribute, Author Profiling | Code Available | 2 |
| Meent: Differentiable Electromagnetic Simulator for Machine Learning | Jun 11, 2024 | | Code Available | 2 |
| MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models | Jun 10, 2024 | Language Modelling | Code Available | 2 |
| FRAG: Frequency Adapting Group for Diffusion Video Editing | Jun 10, 2024 | Denoising, Video Editing | Code Available | 2 |
| ProcessPainter: Learn Painting Process from Sequence Data | Jun 10, 2024 | Denoising, Image Generation | Code Available | 2 |
| Towards Lifelong Learning of Large Language Models: A Survey | Jun 10, 2024 | Continual Pretraining, Incremental Learning | Code Available | 2 |
| EpiLearn: A Python Library for Machine Learning in Epidemic Modeling | Jun 10, 2024 | | Code Available | 2 |
| Generalizable Human Gaussians from Single-View Image | Jun 10, 2024 | Novel View Synthesis, SSIM | Code Available | 2 |
| Compositional Video Generation as Flow Equalization | Jun 10, 2024 | Video Editing, Video Generation | Code Available | 2 |
| UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor | Jun 10, 2024 | RAG, Retrieval | Code Available | 2 |
| Vript: A Video Is Worth Thousands of Words | Jun 10, 2024 | Video Captioning, Video Understanding | Code Available | 2 |
| NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing | Jun 10, 2024 | Scheduling, Video Editing | Code Available | 2 |
| RepoQA: Evaluating Long Context Code Understanding | Jun 10, 2024 | Code Search | Code Available | 2 |
| MVGamba: Unify 3D Content Generation as State Space Sequence Modeling | Jun 10, 2024 | 3D Generation, Attribute | Code Available | 2 |
| Safety Alignment Should Be Made More Than Just a Few Tokens Deep | Jun 10, 2024 | Safety Alignment | Code Available | 2 |
| STimage-1K4M: A histopathology image-gene expression dataset for spatial transcriptomics | Jun 10, 2024 | | Code Available | 2 |
| Low-Rank Quantization-Aware Training for LLMs | Jun 10, 2024 | GPU, parameter-efficient fine-tuning | Code Available | 2 |
| Compute Better Spent: Replacing Dense Layers with Structured Matrices | Jun 10, 2024 | | Code Available | 2 |
| CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models | Jun 10, 2024 | Fairness | Code Available | 2 |
| Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset | Jun 10, 2024 | Instance Segmentation, Salient Object Detection | Code Available | 2 |