SOTAVerified

Attribute

Papers

Showing 51-100 of 5387 papers

Title | Status | Hype
GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning | Code | 2
DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling | Code | 2
Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation | Code | 2
Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs | Code | 2
DreamID: High-Fidelity and Fast diffusion-based Face Swapping via Triplet ID Group Learning | Code | 2
Objaverse++: Curated 3D Object Dataset with Quality Annotations | Code | 2
OpenFACADES: An Open Framework for Architectural Caption and Attribute Data Enrichment via Street View Imagery | Code | 2
Exploring CLIP's Dense Knowledge for Weakly Supervised Semantic Segmentation | Code | 2
Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models | Code | 2
Is CLIP ideal? No. Can we fix it? Yes! | Code | 2
Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving | Code | 2
Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models | Code | 2
EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents | Code | 2
MagicFace: High-Fidelity Facial Expression Editing with Action-Unit Control | Code | 2
DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution | Code | 2
LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations | Code | 2
QUEEN: QUantized Efficient ENcoding of Dynamic Gaussians for Streaming Free-viewpoint Videos | Code | 2
DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting | Code | 2
ResCLIP: Residual Attention for Training-free Dense Vision-language Inference | Code | 2
Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings | Code | 2
Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis | Code | 2
On the Role of Attention Heads in Large Language Model Safety | Code | 2
TRESTLE: A Model of Concept Formation in Structured Domains | Code | 2
PerCo (SD): Open Perceptual Compression | Code | 2
LLMEmb: Large Language Model Can Be a Good Embedding Generator for Sequential Recommendation | Code | 2
Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-One Image Restoration | Code | 2
Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks | Code | 2
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation | Code | 2
ColorPeel: Color Prompt Learning with Diffusion Models via Color and Shape Disentanglement | Code | 2
UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models | Code | 2
RouteFinder: Towards Foundation Models for Vehicle Routing Problems | Code | 2
Task Me Anything | Code | 2
A Synthetic Dataset for Personal Attribute Inference | Code | 2
Description and Discussion on DCASE 2024 Challenge Task 2: First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring | Code | 2
MVGamba: Unify 3D Content Generation as State Space Sequence Modeling | Code | 2
Binarized Diffusion Model for Image Super-Resolution | Code | 2
Non-destructive Degradation Pattern Decoupling for Ultra-early Battery Prototype Verification Using Physics-informed Machine Learning | Code | 2
DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution | Code | 2
LVOS: A Benchmark for Large-scale Long-term Video Object Segmentation | Code | 2
CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Understanding | Code | 2
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching | Code | 2
LLM Attributor: Interactive Visual Attribution for LLM Generation | Code | 2
Measuring Style Similarity in Diffusion Models | Code | 2
SeaBird: Segmentation in Bird's View with Dice Loss Improves Monocular 3D Detection of Large Objects | Code | 2
Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding | Code | 2
Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions | Code | 2
Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt | Code | 2
Faceptor: A Generalist Model for Face Perception | Code | 2
Task Attribute Distance for Few-Shot Learning: Theoretical Analysis and Applications | Code | 2
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations | Code | 2
Page 2 of 108

No leaderboard results yet.