Selected Publications

TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026.
UniCalli: A Unified Diffusion Framework for Column-Level Generation and Recognition of Chinese Calligraphy. International Conference on Learning Representations (ICLR), 2026.
T-Rex-Omni: Integrating Negative Visual Prompt in Generic Object Detection. AAAI Conference on Artificial Intelligence (AAAI), 2026.
Dual-balancing for multi-task learning. Neural Networks, 2026.
STANCE: Motion Coherent Video Generation Via Sparse-to-Dense Anchored Encoding. arXiv preprint arXiv:2510.14588, 2025.
Uni-Renderer: Unifying Rendering and Inverse Rendering Via Dual Stream Diffusion. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.
Uni-IR: One Stage is Enough for Ambiguity-Reduced Inverse Rendering. Pacific Graphics, 2025.
TransPixeler: Advancing Text-to-Video Generation with Transparency. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.
SURGEON: Memory-Adaptive Fully Test-Time Adaptation via Dynamic Activation Sparsity. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025.
StreamGS: Online Generalizable Gaussian Splatting Reconstruction for Unposed Image Streams. IEEE/CVF International Conference on Computer Vision (ICCV), 2025.
SEED-Story: Multimodal Long Story Generation with Large Language Model. IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2025.
Scene Graph Guided Generation: Enable Accurate Relations Generation in Text-to-Image Models via Textural Rectification. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025.
Sat2City: 3D City Generation from A Single Satellite Image with Cascaded Latent Diffusion. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025.
RhythmGaussian: Repurposing Generalizable Gaussian Model For Remote Physiological Measurement. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (Highlight), 2025.
RectifiedHR: Enable Efficient High-Resolution Image Generation via Energy Rectification. arXiv preprint arXiv:2503.02537, 2025.
PRM: Photometric Stereo based Large Reconstruction Model. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (Highlight), 2025.
PreGenie: An Agentic Framework for High-quality Visual Presentation Generation. Empirical Methods in Natural Language Processing (EMNLP), 2025.
POSTA: A Go-to Framework for Customized Artistic Poster Generation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025.
PARM: Multi-Objective Test-Time Alignment via Preference-Aware Autoregressive Reward Model. Proceedings of the International Conference on Machine Learning (ICML), 2025.
Orchestrating Audio: Multi-Agent Framework for Long-Video Audio Synthesis. Empirical Methods in Natural Language Processing (EMNLP), 2025.
MTMamba++: Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025.
Motion inversion for video customization. SIGGRAPH, 2025.
LucidFusion: Reconstructing 3D Gaussians with Arbitrary Unposed Images. Computer Graphics Forum (Pacific Graphics), 2025.
Lotus: Diffusion-based visual foundation model for high-quality dense prediction. International Conference on Learning Representations (ICLR), 2025.
Large Language Models for Transforming Healthcare: A Perspective on DeepSeek‐R1. MedComm – Future Medicine, 2025.
Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.
GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025.
FlexGen: Flexible Multi-View Generation from Text and Image Inputs. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025.
Event-Guided Consistent Video Enhancement with Modality-Adaptive Diffusion Pipeline. Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025.
DrivingRecon: Large 4D Gaussian Reconstruction Model For Autonomous Driving. Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025.
DivPro: diverse protein sequence design with direct structure recovery guidance. Bioinformatics, 2025.
DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation. International Conference on Learning Representations (ICLR), 2025.
ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback. Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025.
Co-Painter: Fine-Grained Controllable Image Stylization via Implicit Decoupling and Adaptive Injection. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025.
OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction. arXiv preprint arXiv:2410.04932, 2024.
Text-Anchored Score Composition: Tackling Condition Misalignment in Text-to-Image Diffusion Models. Proceedings of the European Conference on Computer Vision (ECCV), 2024.
MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders. Proceedings of the European Conference on Computer Vision (ECCV), 2024.
Defect Spectrum: A Granular Look of Large-scale Defect Datasets with Rich Semantics. Proceedings of the European Conference on Computer Vision (ECCV), 2024.
Bi-TTA: Bidirectional Test-Time Adapter for Remote Physiological Measurement. Proceedings of the European Conference on Computer Vision (ECCV), 2024.
An Incremental Unified Framework for Small Defect Inspection. Proceedings of the European Conference on Computer Vision (ECCV), 2024.
Bridging Data Gaps in Diffusion Models with Adversarial Noise-Based Transfer Learning. International Conference on Machine Learning (ICML), 2024.
LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Spotlight, Top 2.81%), 2024.
Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
Learning to Remove Wrinkled Transparent Film with Polarized Prior. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
GNeRP: Gaussian-guided Neural Reconstruction of Reflective Objects with Noisy Polarization Priors. International Conference on Learning Representations (ICLR), 2024.
Denoising Diffusion Step-aware Models. International Conference on Learning Representations (ICLR), 2024.
Backdoor Contrastive Learning via Bi-level Trigger Optimization. International Conference on Learning Representations (ICLR), 2024.
MantraNet: Label Name is Mantra: Unifying Point Cloud Segmentation across Heterogeneous Datasets. Computer Graphics Forum (Pacific Graphics), 2024.
Adv3D: Generating 3D Adversarial Examples for 3D Object Detection in Driving Scenarios with NeRF. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024.
Rethinking Rendering in Generalizable Neural Surface Reconstruction: A Learning-based Solution. Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023.
Ref-NeuS: Ambiguity-Reduced Neural Implicit Surface Learning for Multi-View Reconstruction with Reflection. Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2023.
Not All Steps are Created Equal: Selective Diffusion Distillation for Image Manipulation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2023.
Photo-Realistic Out-of-domain GAN inversion via Invertibility Decomposition. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2023.
Lift3D: Synthesize 3D Training Data by Lifting 2D GAN to 3D Generative Radiance Field. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
Neuron Structure Modeling for Generalizable Remote Physiological Measurement. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
Real-time 6K Image Rescaling with Rate-distortion Optimization. IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2023.
CP-NeRF: Conditionally Parameterized Neural Radiance Fields for Cross-scene Novel View Synthesis. Computer Graphics Forum (Pacific Graphics), 2023.
Artificial intelligence-enabled detection and assessment of Parkinson’s disease using nocturnal breathing signals. Nature Medicine, 2022.
Semi-supervised Monocular 3D Object Detection by Multi-view Consistency. Proceedings of the European Conference on Computer Vision (ECCV), 2022.
RC-MVSNet: Unsupervised Multi-View Stereo with Neural Rendering. Proceedings of the European Conference on Computer Vision (ECCV), 2022.
DecoupleNet: Decoupled Network for Domain Adaptive Semantic Segmentation. Proceedings of the European Conference on Computer Vision (ECCV), 2022.
Representation Compensation Networks for Continual Semantic Segmentation. In Computer Vision and Pattern Recognition (CVPR), 2022.
SC-GAN: Image Synthesis via Semantic Composition. In Proceedings of the IEEE International Conference on Computer Vision, 2021.
Learning to Know Where to See: A Visibility-Aware Approach for Occluded Person Re-identification. In Proceedings of the IEEE International Conference on Computer Vision, 2021.
Text-Guided Human Image Manipulation via Image-Text Shared Space. In IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
PointINS: Point-based instance segmentation. In IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
Delving into Deep Imbalanced Regression. In International Conference on Machine Learning (Long Talk, Acceptance Rate: 3%), 2021.
VCNet: A Robust Approach to Blind Image Inpainting. In European Conference on Computer Vision, 2020.
Homomorphic Interpolation Network for Unpaired Image-to-image Translation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
Domain Adaptive Image-to-image Translation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020.
Attentive normalization for conditional image generation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (Oral, Acceptance Rate: 5.7%), 2020.
View Independent Generative Adversarial Network for Novel View Synthesis. Proceedings of the IEEE International Conference on Computer Vision (Oral, Acceptance Rate: 2.1%), 2019.
Semantic component decomposition for face attribute manipulation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.
Homomorphic latent space interpolation for unpaired image-to-image translation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (Oral, Acceptance Rate: 5.6%), 2019.
Person Re-Identification by Camera Correlation Aware Feature Augmentation. In IEEE Transactions on Pattern Analysis and Machine Intelligence (ESI highly cited paper), 2018.
Facelet-bank for fast portrait manipulation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
Makeup-go: Blind reversion of portrait edit. Proceedings of the IEEE International Conference on Computer Vision (Oral, Acceptance Rate: 2.09%), 2017.
An enhanced deep feature representation for person re-identification. IEEE Winter Conference on Applications of Computer Vision (WACV), 2016.
An asymmetric distance model for cross-view feature mapping in person reidentification. In IEEE Transactions on Circuits and Systems for Video Technology, 2016.
Mirror representation for modeling view-specific transform in person re-identification. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2015.