SOTAVerified

2k

Papers

Showing 1–50 of 288 papers

Title | Status | Hype
Long-context LLMs Struggle with Long In-context Learning | Code | 5
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos | Code | 5
MovieChat+: Question-aware Sparse Memory for Long Video Question Answering | Code | 4
CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets | Code | 4
Highly Accurate Dichotomous Image Segmentation | Code | 4
SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection | Code | 4
Scaling Granite Code Models to 128K Context | Code | 4
Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis | Code | 3
CAMixerSR: Only Details Need More "Attention" | Code | 3
FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution | Code | 3
1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data | Code | 3
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction | Code | 3
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension | Code | 2
Linear Attention Sequence Parallelism | Code | 2
Task Me Anything | Code | 2
Elevating Flow-Guided Video Inpainting with Reference Generation | Code | 2
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models | Code | 2
GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis | Code | 2
STICKERCONV: Generating Multimodal Empathetic Responses from Scratch | Code | 2
FaceVerse: a Fine-grained and Detail-controllable 3D Face Morphable Model from a Hybrid Dataset | Code | 2
HHAvatar: Gaussian Head Avatar with Dynamic Hairs | Code | 2
XGen-7B Technical Report | Code | 2
360MonoDepth: High-Resolution 360deg Monocular Depth Estimation | Code | 2
VFIMamba: Video Frame Interpolation with State Space Models | Code | 2
Hyena Hierarchy: Towards Larger Convolutional Language Models | Code | 2
Any-resolution Training for High-resolution Image Synthesis | Code | 2
High-fidelity 3D Human Digitization from Single 2K Resolution Images | Code | 2
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes | Code | 2
Towards Metrical Reconstruction of Human Faces | Code | 2
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training | Code | 2
MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization | Code | 2
MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization | Code | 2
RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars | Code | 2
Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models | Code | 2
Ultra-Resolution Adaptation with Ease | Code | 2
FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning | Code | 2
LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models | Code | 1
MESH2IR: Neural Acoustic Impulse Response Generator for Complex 3D Scenes | Code | 1
How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs | Code | 1
Hierarchical Losses and New Resources for Fine-grained Entity Typing and Linking | Code | 1
360MonoDepth: High-Resolution 360° Monocular Depth Estimation | Code | 1
Identifying concept libraries from language about object structure | Code | 1
Meticulous Object Segmentation | Code | 1
ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic | Code | 1
Gated Linear Attention Transformers with Hardware-Efficient Training | Code | 1
CascadeV: An Implementation of Wurstchen Architecture for Video Generation | Code | 1
ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression | Code | 1
CABM: Content-Aware Bit Mapping for Single Image Super-Resolution Network with Large Input | Code | 1
BuildingNet: Learning to Label 3D Buildings | Code | 1
End-to-End Speech Recognition from Federated Acoustic Models | Code | 1

No leaderboard results yet.