| Instruction-guided Multi-Granularity Segmentation and Captioning with Large Multimodal Model | Sep 20, 2024 | Image Captioning, Panoptic Segmentation | Code Available | 1 |
| From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models | Sep 19, 2024 | Survey | Code Available | 1 |
| DiffEditor: Enhancing Speech Editing with Semantic Enrichment and Acoustic Consistency | Sep 19, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| A Lightweight and Real-Time Binaural Speech Enhancement Model with Spatial Cues Preservation | Sep 19, 2024 | Speech Enhancement | Code Available | 1 |
| Prompting Segment Anything Model with Domain-Adaptive Prototype for Generalizable Medical Image Segmentation | Sep 19, 2024 | Domain Generalization, Image Segmentation | Code Available | 1 |
| A Case Study of Web App Coding with OpenAI Reasoning Models | Sep 19, 2024 | Code Generation | Code Available | 1 |
| Unlocking Reasoning Potential in Large Langauge Models by Scaling Code-form Planning | Sep 19, 2024 | Form, Instruction Following | Code Available | 1 |
| Enhancing Agricultural Environment Perception via Active Vision and Zero-Shot Learning | Sep 19, 2024 | 3D Reconstruction, Zero-Shot Learning | Code Available | 1 |
| PRAGA: Prototype-aware Graph Adaptive Aggregation for Spatial Multi-modal Omics Analysis | Sep 19, 2024 | Contrastive Learning | Code Available | 1 |
| ViolinDiff: Enhancing Expressive Violin Synthesis with Pitch Bend Conditioning | Sep 19, 2024 | Audio Synthesis | Code Available | 1 |
| Familiarity-Aware Evidence Compression for Retrieval-Augmented Generation | Sep 19, 2024 | RAG, Retrieval | Code Available | 1 |
| Infrared Small Target Detection in Satellite Videos: A New Dataset and A Novel Recurrent Feature Refinement Framework | Sep 19, 2024 | Motion Compensation, Video Generation | Code Available | 1 |
| Fundus image enhancement through direct diffusion bridges | Sep 19, 2024 | Image Enhancement | Code Available | 1 |
| Reinforcement Learning-based Model Predictive Control for Greenhouse Climate Control | Sep 19, 2024 | Model Predictive Control, Prediction | Code Available | 1 |
| Language Models Learn to Mislead Humans via RLHF | Sep 19, 2024 | Question Answering | Code Available | 1 |
| Enhancing Perception of Key Changes in Remote Sensing Image Change Captioning | Sep 19, 2024 | Change Detection, Decoder | Code Available | 1 |
| MEXMA: Token-level objectives improve sentence representations | Sep 19, 2024 | Sentence | Code Available | 1 |
| MambaClinix: Hierarchical Gated Convolution and Mamba-Based U-Net for Enhanced 3D Medical Image Segmentation | Sep 19, 2024 | Computational Efficiency, Image Segmentation | Code Available | 1 |
| PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs) | Sep 19, 2024 | Code Generation, Contrastive Learning | Code Available | 1 |
| Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC | Sep 19, 2024 | Disentanglement, speech-recognition | Code Available | 1 |
| CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs | Sep 19, 2024 | GPU | Code Available | 1 |
| Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering | Sep 19, 2024 | Hallucination, Hallucination Evaluation | Code Available | 1 |
| Accurate Automatic 3D Annotation of Traffic Lights and Signs for Autonomous Driving | Sep 19, 2024 | Autonomous Driving, Self-Driving Cars | Code Available | 1 |
| DenoMamba: A fused state-space model for low-dose CT denoising | Sep 19, 2024 | Denoising, Diagnostic | Code Available | 1 |
| Development and bilingual evaluation of Japanese medical large language model within reasonably low computational resources | Sep 18, 2024 | GPU, Language Modeling | Code Available | 1 |
| LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension | Sep 18, 2024 | Referring Expression, Referring Expression Comprehension | Code Available | 1 |
| Mastering Chess with a Transformer Model | Sep 18, 2024 | Decision Making, model | Code Available | 1 |
| MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts | Sep 18, 2024 | Memorization | Code Available | 1 |
| Multi-Grid Graph Neural Networks with Self-Attention for Computational Mechanics | Sep 18, 2024 | | Code Available | 1 |
| BRDF-NeRF: Neural Radiance Fields with Optical Satellite Images and BRDF Modelling | Sep 18, 2024 | NeRF | Code Available | 1 |
| SRIF: Semantic Shape Registration Empowered by Diffusion-based Image Morphing and Flow Estimation | Sep 18, 2024 | Image Morphing | Code Available | 1 |
| Generalized compression and compressive search of large datasets | Sep 18, 2024 | | Code Available | 1 |
| Linguini: A benchmark for language-agnostic linguistic reasoning | Sep 18, 2024 | | Code Available | 1 |
| DAF-Net: A Dual-Branch Feature Decomposition Fusion Network with Domain Adaptive for Infrared and Visible Image Fusion | Sep 18, 2024 | Infrared And Visible Image Fusion, Scene Understanding | Code Available | 1 |
| To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning | Sep 18, 2024 | Math, MMLU | Code Available | 1 |
| MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion | Sep 18, 2024 | Motion Generation, Retrieval | Code Available | 1 |
| Measuring Human and AI Values Based on Generative Psychometrics with Large Language Models | Sep 18, 2024 | | Code Available | 1 |
| MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning | Sep 18, 2024 | Math | Code Available | 1 |
| Massively Multi-Person 3D Human Motion Forecasting with Scene Context | Sep 18, 2024 | Decoder, Denoising | Code Available | 1 |
| Self-Supervised Speed of Sound Recovery for Aberration-Corrected Photoacoustic Computed Tomography | Sep 17, 2024 | Image Reconstruction | Code Available | 1 |
| HS3-Bench: A Benchmark and Strong Baseline for Hyperspectral Semantic Segmentation in Driving Scenarios | Sep 17, 2024 | Autonomous Driving, Hyperspectral Image Segmentation | Code Available | 1 |
| Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement | Sep 17, 2024 | Active Learning, Diversity | Code Available | 1 |
| Towards Fair RAG: On the Impact of Fair Ranking in Retrieval-Augmented Generation | Sep 17, 2024 | Fairness, RAG | Code Available | 1 |
| Leveraging Symmetry to Accelerate Learning of Trajectory Tracking Controllers for Free-Flying Robotic Systems | Sep 17, 2024 | Reinforcement Learning (RL) | Code Available | 1 |
| Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent | Sep 17, 2024 | GSM8K, Question Answering | Code Available | 1 |
| Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs | Sep 17, 2024 | Question Answering, Token Reduction | Code Available | 1 |
| Ultrasound Image Enhancement with the Variance of Diffusion Models | Sep 17, 2024 | Denoising, Image Enhancement | Code Available | 1 |
| Temporal As a Plugin: Unsupervised Video Denoising with Pre-Trained Image Denoisers | Sep 17, 2024 | Denoising, Image Denoising | Code Available | 1 |
| Contrasformer: A Brain Network Contrastive Transformer for Neurodegenerative Condition Identification | Sep 17, 2024 | | Code Available | 1 |
| LOLA -- An Open-Source Massively Multilingual Large Language Model | Sep 17, 2024 | Diversity, Language Modeling | Code Available | 1 |