| MMedAgent: Learning to Use Medical Tools with Multi-modal Agent | Jul 2, 2024 | | Code Available | 3 |
| Searching for Best Practices in Retrieval-Augmented Generation | Jul 1, 2024 | Question Answering, RAG | Code Available | 3 |
| BERGEN: A Benchmarking Library for Retrieval-Augmented Generation | Jul 1, 2024 | Benchmarking, RAG | Code Available | 3 |
| Evaluation of Text-to-Video Generation Models: A Dynamics Perspective | Jul 1, 2024 | Text-to-Video Generation, Video Generation | Code Available | 3 |
| xLSTM-UNet can be an Effective 2D & 3D Medical Image Segmentation Backbone with Vision-LSTM (ViL) better than its Mamba Counterpart | Jul 1, 2024 | 3D Medical Imaging Segmentation, image-classification | Code Available | 3 |
| CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents | Jul 1, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| Retrieval-augmented generation in multilingual settings | Jul 1, 2024 | Prompt Engineering, RAG | Code Available | 3 |
| StyleShot: A Snapshot on Any Style | Jul 1, 2024 | Image Generation, Style Transfer | Code Available | 3 |
| Tree Search for Language Model Agents | Jul 1, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| Instruct-IPT: All-in-One Image Processing Transformer via Weight Modulation | Jun 30, 2024 | All, Deblurring | Code Available | 3 |
| Deep Frequency Derivative Learning for Non-stationary Time Series Forecasting | Jun 29, 2024 | Time Series, Time Series Forecasting | Code Available | 3 |
| SpotlessSplats: Ignoring Distractors in 3D Gaussian Splatting | Jun 28, 2024 | 3DGS, 3D Reconstruction | Code Available | 3 |
| LLaRA: Supercharging Robot Learning Data for Vision-Language Policy | Jun 28, 2024 | Vision-Language-Action, World Knowledge | Code Available | 3 |
| EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model | Jun 28, 2024 | Interactive Segmentation, Language Modeling | Code Available | 3 |
| Segment Anything without Supervision | Jun 28, 2024 | Clustering, Image Segmentation | Code Available | 3 |
| HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale | Jun 27, 2024 | Visual Question Answering (VQA) | Code Available | 3 |
| Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs | Jun 26, 2024 | Arithmetic Reasoning, GSM8K | Code Available | 3 |
| A Survey on Mixture of Experts | Jun 26, 2024 | In-Context Learning, Mixture-of-Experts | Code Available | 3 |
| Diffusion Model-Based Video Editing: A Survey | Jun 26, 2024 | model, Survey | Code Available | 3 |
| A Review of Large Language Models and Autonomous Agents in Chemistry | Jun 26, 2024 | Property Prediction, scientific discovery | Code Available | 3 |
| AlphaForge: A Framework to Mine and Dynamically Combine Formulaic Alpha Factors | Jun 26, 2024 | Diversity | Code Available | 3 |
| Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text | Jun 25, 2024 | 3D Generation, Denoising | Code Available | 3 |
| Point-SAM: Promptable 3D Segmentation Model for Point Clouds | Jun 25, 2024 | Image Segmentation, Segmentation | Code Available | 3 |
| Vaporetto: Efficient Japanese Tokenization Based on Improved Pointwise Linear Classification | Jun 24, 2024 | | Code Available | 3 |
| Adam-mini: Use Fewer Learning Rates To Gain More | Jun 24, 2024 | | Code Available | 3 |
| Panza: Design and Analysis of a Fully-Local Personalized Text Writing Assistant | Jun 24, 2024 | RAG, Retrieval-augmented Generation | Code Available | 3 |
| Lossless data compression by large models | Jun 24, 2024 | Data Compression | Code Available | 3 |
| GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization | Jun 24, 2024 | Image Manipulation, Image Manipulation Detection | Code Available | 3 |
| HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis | Jun 23, 2024 | Benchmarking, Representation Learning | Code Available | 3 |
| AudioBench: A Universal Benchmark for Audio Large Language Models | Jun 23, 2024 | Audio Scene Understanding, Instruction Following | Code Available | 3 |
| Are Language Models Actually Useful for Time Series Forecasting? | Jun 22, 2024 | Time Series, Time Series Forecasting | Code Available | 3 |
| Taming 3DGS: High-Quality Radiance Fields with Limited Resources | Jun 21, 2024 | 3DGS, Attribute | Code Available | 3 |
| A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models | Jun 20, 2024 | Video Editing | Code Available | 3 |
| ∇²DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials | Jun 20, 2024 | Drug Discovery, Molecular Property Prediction | Code Available | 3 |
| Consistency Models Made Easy | Jun 20, 2024 | Computational Efficiency, GPU | Code Available | 3 |
| Visible-Thermal Tiny Object Detection: A Benchmark Dataset and Baselines | Jun 20, 2024 | Diversity, object-detection | Code Available | 3 |
| LLM4CP: Adapting Large Language Models for Channel Prediction | Jun 20, 2024 | Prediction, Time Series Analysis | Code Available | 3 |
| AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents | Jun 19, 2024 | | Code Available | 3 |
| Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models | Jun 19, 2024 | Instruction Following | Code Available | 3 |
| GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation | Jun 19, 2024 | Benchmarking, Image Generation | Code Available | 3 |
| VisualRWKV: Exploring Recurrent Neural Networks for Visual Language Models | Jun 19, 2024 | GPU, Language Modeling | Code Available | 3 |
| APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts | Jun 19, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| SpatialBot: Precise Spatial Understanding with Vision Language Models | Jun 19, 2024 | Spatial Reasoning | Code Available | 3 |
| Detecting hallucinations in large language models using semantic entropy | Jun 19, 2024 | Large Language Model, Question Answering | Code Available | 3 |
| Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? | Jun 19, 2024 | RAG, Retrieval | Code Available | 3 |
| Evaluating representation learning on the protein structure universe | Jun 19, 2024 | Representation Learning | Code Available | 3 |
| DF40: Toward Next-Generation Deepfake Detection | Jun 19, 2024 | DeepFake Detection, Face Reenactment | Code Available | 3 |
| TSI-Bench: Benchmarking Time Series Imputation | Jun 18, 2024 | Benchmarking, Deep Learning | Code Available | 3 |
| VoCo-LLaMA: Towards Vision Compression with Large Language Models | Jun 18, 2024 | Computational Efficiency, Question Answering | Code Available | 3 |
| Open-Source Web Service with Morphological Dictionary-Supplemented Deep Learning for Morphosyntactic Analysis of Czech | Jun 18, 2024 | Deep Learning, Dependency Parsing | Code Available | 3 |