MTMamba++: Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders

Jan 1, 2025 · Baijiong Lin, Weisen Jiang, Pengguang Chen, Shu Liu, Ying-Cong Chen
Type: Journal article
Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
Last updated on Mar 19, 2026
Authors
Ying-Cong Chen, Assistant Professor

© 2026 Me. This work is licensed under CC BY-NC-ND 4.0.
