| Bag of Design Choices for Inference of High-Resolution Masked Generative Transformer | Nov 16, 2024 | Text Generation | Code Available | 1 |
| TDSM: Triplet Diffusion for Skeleton-Text Matching in Zero-Shot Action Recognition | Nov 16, 2024 | Action Recognition, Skeleton Based Action Recognition | Code Available | 1 |
| MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map | Nov 16, 2024 | image-classification, Image Classification | Code Available | 1 |
| SAM Decoding: Speculative Decoding via Suffix Automaton | Nov 16, 2024 | Retrieval, Text Generation | Code Available | 1 |
| Underwater Image Enhancement with Cascaded Contrastive Learning | Nov 16, 2024 | Contrastive Learning, Image Enhancement | Code Available | 1 |
| Explainable DNN-based Beamformer with Postfilter | Nov 16, 2024 | Speech Enhancement | Code Available | 1 |
| HJ-Ky-0.1: an Evaluation Dataset for Kyrgyz Word Embeddings | Nov 16, 2024 | Sentiment Analysis, Word Embeddings | Code Available | 1 |
| C-DiffSET: Leveraging Latent Diffusion for SAR-to-EO Image Translation with Confidence-Guided Reliable Object Generation | Nov 16, 2024 | Image-to-Image Translation, Translation | Code Available | 1 |
| Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts | Nov 16, 2024 | Mixture-of-Experts, Optical Character Recognition (OCR) | Code Available | 1 |
| XLSR-Mamba: A Dual-Column Bidirectional State Space Model for Spoofing Attack Detection | Nov 15, 2024 | Audio Deepfake Detection, Automatic Speech Recognition | Code Available | 1 |
| RETR: Multi-View Radar Detection Transformer for Indoor Perception | Nov 15, 2024 | Instance Segmentation, object-detection | Code Available | 1 |
| M3TR: A Generalist Model for Real-World HD Map Completion | Nov 15, 2024 | Autonomous Vehicles | Code Available | 1 |
| AmoebaLLM: Constructing Any-Shape Large Language Models for Efficient and Instant Deployment | Nov 15, 2024 | | Code Available | 1 |
| InvestESG: A multi-agent reinforcement learning benchmark for studying climate investment as a social dilemma | Nov 15, 2024 | Multi-agent Reinforcement Learning | Code Available | 1 |
| Efficient Density Control for 3D Gaussian Splatting | Nov 15, 2024 | 3DGS, Novel View Synthesis | Code Available | 1 |
| Unveiling Topological Structures in Text: A Comprehensive Survey of Topological Data Analysis Applications in NLP | Nov 15, 2024 | Topological Data Analysis | Code Available | 1 |
| TESGNN: Temporal Equivariant Scene Graph Neural Networks for Efficient and Robust Multi-View 3D Scene Understanding | Nov 15, 2024 | Graph Matching, Graph Neural Network | Code Available | 1 |
| Step-wise Distribution Alignment Guided Style Prompt Tuning for Source-free Cross-domain Few-shot Learning | Nov 15, 2024 | Cross-Domain Few-Shot, cross-domain few-shot learning | Code Available | 1 |
| OnlyFlow: Optical Flow based Motion Conditioning for Video Diffusion Models | Nov 15, 2024 | Optical Flow Estimation, Text-to-Video Generation | Code Available | 1 |
| Efficient Depth Estimation for Unstable Stereo Camera Systems on AR Glasses | Nov 15, 2024 | Depth Estimation, Multi-Task Learning | Code Available | 1 |
| Vision Eagle Attention: a new lens for advancing image classification | Nov 15, 2024 | image-classification, Image Classification | Code Available | 1 |
| A unifying framework for generalised Bayesian online learning in non-stationary environments | Nov 15, 2024 | Continual Learning, Multi-Armed Bandits | Code Available | 1 |
| Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting | Nov 15, 2024 | Image Stitching | Code Available | 1 |
| SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers | Nov 15, 2024 | Image Generation, Speech Synthesis | Code Available | 1 |
| Free Lunch in Pathology Foundation Model: Task-specific Model Adaptation with Concept-Guided Feature Enhancement | Nov 15, 2024 | model | Code Available | 1 |
| Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination | Nov 15, 2024 | Hallucination, Multimodal Reasoning | Code Available | 1 |
| OneNet: A Channel-Wise 1D Convolutional U-Net | Nov 14, 2024 | Decoder, Image Segmentation | Code Available | 1 |
| DTELS: Towards Dynamic Granularity of Timeline Summarization | Nov 14, 2024 | Informativeness, Timeline Summarization | Code Available | 1 |
| On the Surprising Effectiveness of Attention Transfer for Vision Transformers | Nov 14, 2024 | | Code Available | 1 |
| Beyond the Heatmap: A Rigorous Evaluation of Component Impact in MCTS-Based TSP Solvers | Nov 14, 2024 | Combinatorial Optimization | Code Available | 1 |
| MFTIQ: Multi-Flow Tracker with Independent Matching Quality Estimation | Nov 14, 2024 | Optical Flow Estimation, Visual Tracking | Code Available | 1 |
| Spider: Any-to-Many Multimodal LLM | Nov 14, 2024 | multimodal interaction | Code Available | 1 |
| OpenLS-DGF: An Adaptive Open-Source Dataset Generation Framework for Machine Learning Tasks in Logic Synthesis | Nov 14, 2024 | Dataset Generation | Code Available | 1 |
| Initial Nugget Evaluation Results for the TREC 2024 RAG Track with the AutoNuggetizer Framework | Nov 14, 2024 | Question Answering, RAG | Code Available | 1 |
| OpenGeMM: A High-Utilization GeMM Accelerator Generator with Lightweight RISC-V Control and Tight Memory Coupling | Nov 14, 2024 | | Code Available | 1 |
| Local-Global Attention: An Adaptive Mechanism for Multi-Scale Feature Integration | Nov 14, 2024 | Computational Efficiency, Object | Code Available | 1 |
| A multidimensional measurement of photorealistic avatar quality of experience | Nov 13, 2024 | SSIM | Code Available | 1 |
| A Chinese Multi-label Affective Computing Dataset Based on Social Media Network Users | Nov 13, 2024 | Marketing, valid | Code Available | 1 |
| Conditional Variable Flow Matching: Transforming Conditional Densities with Amortized Conditional Optimal Transport | Nov 13, 2024 | | Code Available | 1 |
| The Systems Engineering Approach in Times of Large Language Models | Nov 13, 2024 | | Code Available | 1 |
| Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models | Nov 13, 2024 | | Code Available | 1 |
| Causal Explanations for Image Classifiers | Nov 13, 2024 | | Code Available | 1 |
| Refusal in LLMs is an Affine Function | Nov 13, 2024 | | Code Available | 1 |
| ClevrSkills: Compositional Language and Visual Reasoning in Robotics | Nov 13, 2024 | Visual Reasoning | Code Available | 1 |
| Neural Topic Modeling with Large Language Models in the Loop | Nov 13, 2024 | Topic coverage, Topic Models | Code Available | 1 |
| APDDv2: Aesthetics of Paintings and Drawings Dataset with Artist Labeled Scores and Comments | Nov 13, 2024 | | Code Available | 1 |
| Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers | Nov 13, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| HMIL: Hierarchical Multi-Instance Learning for Fine-Grained Whole Slide Image Classification | Nov 12, 2024 | Classification, Contrastive Learning | Code Available | 1 |
| Privacy-Preserving Verifiable Neural Network Inference Service | Nov 12, 2024 | Privacy Preserving | Code Available | 1 |
| ImageRAG: Enhancing Ultra High Resolution Remote Sensing Imagery Analysis with ImageRAG | Nov 12, 2024 | RAG, Retrieval | Code Available | 1 |