Continuous-Time Distribution Matching for Few-Step Diffusion Distillation

Liu, Tao; Yan, Hao; Chen, Mengting; Hu, Taihang; Yue, Zhengrong; Pan, Zihao; Lan, Jinsong; Zhu, Xiaoyong; Cheng, Ming-Ming; Zheng, Bo; Wang, Yaxing

Tao Liu¹, Hao Yan², Mengting Chen²,*, Taihang Hu², Zhengrong Yue², Zihao Pan², Jinsong Lan², Xiaoyong Zhu², Ming-Ming Cheng¹, Bo Zheng²,†, Yaxing Wang³,†

¹VCIP, College of Computer Science, Nankai University · ²Alibaba Group · ³College of Artificial Intelligence, Jilin University
*Project leader · †Co-corresponding authors

CDM teaser - comparison with DMD2 on Longcat-Image at 4 NFE

CDM enables high-fidelity few-step text-to-image generation. We compare our Continuous-Time Distribution Matching (CDM) against DMD2, both distilled from Longcat-Image (1024×1024) and evaluated at 4 NFE. Without relying on any GAN or reward-model auxiliary objectives, CDM produces sharper textures, richer fine-grained details, and overall higher visual fidelity.

Abstract

Step distillation has become a leading technique for accelerating diffusion models, with Distribution Matching Distillation (DMD) and Consistency Distillation as two representative paradigms. While consistency methods enforce self-consistency along the full probability-flow ODE (PF-ODE) trajectory to steer it toward the clean data manifold, vanilla DMD relies on sparse supervision at a few predefined discrete timesteps. This restricted discrete-time formulation, together with the mode-seeking nature of the reverse KL divergence, tends to produce visual artifacts and over-smoothed outputs, often necessitating complex auxiliary modules (such as GANs or reward models) to restore visual fidelity.

In this work, we introduce Continuous-Time Distribution Matching (CDM), migrating the DMD framework from discrete anchoring to continuous optimization for the first time. CDM achieves this through two continuous-time designs. First, we replace the fixed discrete schedule with a dynamic continuous schedule of random length, so that distribution matching is enforced at arbitrary points along sampling trajectories rather than only at a few fixed anchors. Second, we propose a continuous-time alignment objective that performs active off-trajectory matching on latents extrapolated via the student's velocity field, improving generalization and preserving fine visual details. Extensive experiments on different architectures, including SD3-Medium and Longcat-Image, demonstrate that CDM provides highly competitive visual fidelity for few-step image generation without relying on complex auxiliary objectives.

Method Overview

Overview of Continuous-Time Distribution Matching (CDM). Top: Our approach employs a dynamic continuous time schedule during backward simulation, sampling intermediate anchors uniformly from (0, 1]. Bottom Left: CFG augmentation (CA) and distribution matching (DM) operate on this dynamic schedule to align text-image conditions and data distributions at on-trajectory anchors. Bottom Right: To address inter-anchor inconsistency, the proposed CDM objective explicitly extrapolates off-trajectory latents using the predicted velocity.
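
The dynamic schedule in the top panel can be pictured with a minimal sketch. It assumes that time runs from t = 1 (noise) down to t = 0 (data), that every schedule starts at t = 1, and that the schedule length is drawn uniformly between two bounds. The function name `sample_dynamic_schedule` and the bounds `min_steps` and `max_steps` are hypothetical and are not taken from the released code.

```python
import random


def sample_dynamic_schedule(min_steps=2, max_steps=8, rng=random):
    """Return a strictly decreasing list of timesteps that starts at 1.0.

    Hypothetical helper: the schedule length is random, and the
    intermediate anchors are drawn uniformly from (0, 1].
    """
    num_steps = rng.randint(min_steps, max_steps)  # random schedule length
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1].
    anchors = {1.0 - rng.random() for _ in range(num_steps - 1)}
    anchors.discard(1.0)  # the start point 1.0 is added explicitly below
    return [1.0] + sorted(anchors, reverse=True)


# A fresh schedule would be drawn at every training iteration.
schedule = sample_dynamic_schedule(rng=random.Random(0))
print(schedule)
```

Because the anchors change at every iteration, supervision is not tied to one fixed set of timesteps.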

Our unified training objective combines three complementary losses:

CFG Augmentation Loss (L_CA) — anchors structure and semantic text-image alignment;
Distribution Matching Loss (L_DM) — provides on-trajectory distributional supervision, aligning the student to the teacher's CFG-free distribution;
CDM Loss (L_CDM) — extends supervision to off-trajectory latents via velocity-driven extrapolation, mitigating numerical truncation errors during few-step inference.
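
The velocity-driven extrapolation behind L_CDM and the combination of the three losses can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes a first-order (Euler) step along the predicted velocity with the convention dx/dt = v, and it assumes the losses are combined as a weighted sum. The names `extrapolate_latent` and `total_loss` and the weights `w_ca`, `w_dm`, `w_cdm` are hypothetical.

```python
def extrapolate_latent(x_t, velocity, t, t_target):
    """First-order extrapolation of a latent along a predicted velocity.

    Assumes dx/dt = v, so moving from time t to t_target shifts the
    latent by (t_target - t) * v.
    """
    dt = t_target - t
    return [x + dt * v for x, v in zip(x_t, velocity)]


def total_loss(l_ca, l_dm, l_cdm, w_ca=1.0, w_dm=1.0, w_cdm=1.0):
    """Weighted sum of the three losses; the weights are hypothetical."""
    return w_ca * l_ca + w_dm * l_dm + w_cdm * l_cdm


# Toy latent and student velocity at t = 0.75, extrapolated to t = 0.5.
x_off = extrapolate_latent([0.5, -1.0], [1.0, 2.0], t=0.75, t_target=0.5)
print(x_off)
```

The extrapolated latent generally does not lie on the sampled anchor set, which is what makes the supervision off-trajectory.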

Key Insight: Schedule Decoupling

Empirical evidence of schedule decoupling. (a) Conventional distillation strictly anchors backward simulation to predefined discrete inference timesteps. In contrast, our dynamic scheduling optimizes over uniformly sampled continuous timesteps t ∈ (0, 1] at each iteration. (b) Visually, the dynamically scheduled model produces finer details and fewer artifacts than the strictly aligned baseline. (c) Quantitatively, it also attains a higher HPSv3 score, indicating that exact discrete alignment is not only unnecessary but in fact restrictive—motivating our continuous-time formulation.
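
For the inference side of panel (a), a minimal sketch of few-step sampling shows where the NFE count comes from: each timestep costs one call to the velocity network. The evenly spaced timesteps, the convention that t = 1 is noise and t = 0 is data, and the toy constant velocity field are assumptions made for illustration; they are not the schedule or the model used in the paper.

```python
def euler_sample(x, velocity_fn, timesteps):
    """Integrate dx/dt = v from timesteps[0] down to 0 with Euler steps.

    Returns the final latent and the number of function evaluations.
    """
    nfe = 0
    steps = list(timesteps) + [0.0]  # the last step lands on t = 0
    for t, t_next in zip(steps, steps[1:]):
        v = velocity_fn(x, t)  # one network call per step
        nfe += 1
        x = [xi + (t_next - t) * vi for xi, vi in zip(x, v)]
    return x, nfe


NOISE = [1.0, -2.0]  # toy starting latent at t = 1
DATA = [0.5, 0.25]   # toy clean latent at t = 0


def toy_velocity(x, t):
    """Constant velocity of a straight path from DATA to NOISE."""
    return [n - d for n, d in zip(NOISE, DATA)]


FIXED_TIMESTEPS = [1.0, 0.75, 0.5, 0.25]  # assumed inference schedule
sample, nfe = euler_sample(NOISE, toy_velocity, FIXED_TIMESTEPS)
print(sample, nfe)
```

With a straight path the four Euler steps recover the clean latent exactly. A real velocity field is curved, so the choice of timesteps matters at inference even when training samples them continuously.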

Understanding the DM Loss

Visual evidence on the role of the DM loss. Samples from teacher models (SD3-Medium and Longcat-Image) with and without CFG, compared against student models distilled with the DM loss alone. Students distilled with the DM loss alone closely match their teachers' CFG-free samples, indicating that the DM loss is not a mere stabilizer but the key driver that aligns the student to the teacher's CFG-free distribution.
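
To make the "with and without CFG" distinction concrete, the standard classifier-free guidance combination can be written in a few lines. This sketch assumes the usual formulation with a guidance scale, where a scale of 1 reduces to the conditional prediction; that case is taken here to correspond to the CFG-free samples. The function name `cfg_combine` is hypothetical.

```python
def cfg_combine(v_cond, v_uncond, scale):
    """Classifier-free guidance: move from the unconditional prediction
    toward the conditional one, amplified by the guidance scale."""
    return [u + scale * (c - u) for c, u in zip(v_cond, v_uncond)]


v_cond = [1.0, 2.0]    # toy prediction with the text prompt
v_uncond = [0.0, 1.0]  # toy prediction with an empty prompt

print(cfg_combine(v_cond, v_uncond, scale=1.0))  # no guidance
print(cfg_combine(v_cond, v_uncond, scale=3.0))  # guided prediction
```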

Quantitative Results

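Comparison of different methods on SD3-Medium and Longcat-Image. At only 4 NFE, CDM ranks first among the distilled models on Aesthetic, DPGBench, PickScore, and HPSv3 for both teachers, and second on CLIPScore (behind Flash on SD3-Medium and DMD2 on Longcat-Image), without requiring any real images, GANs, or reward models.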

SD3-Medium (1024×1024)

Method	NFE	Aesthetic ↑	DPGBench ↑	PickScore ↑	HPSv3 ↑	CLIPScore ↑	Image-Free	Continuous
Base (SD3-Medium)	100	5.885	85.04	21.73	8.189	28.60	-	-
Hyper-SD	8	5.180	80.43	20.82	6.054	27.93	✗	✓
Flash	4	5.968	80.47	21.69	8.282	28.18	✗	✗
TDM	4	6.013	83.12	21.61	8.468	27.63	✓	✗
DMD2	4	6.038	83.96	21.58	8.419	27.56	✓	✗
D-DMD	4	6.038	84.52	21.85	9.176	27.69	✓	✗
CDM (Ours)	4	6.075	85.26	21.95	9.561	27.98	✓	✓

Longcat-Image (1024×1024)

Method	NFE	Aesthetic ↑	DPGBench ↑	PickScore ↑	HPSv3 ↑	CLIPScore ↑	Image-Free	Continuous
Base (Longcat-Image)	100	5.926	87.08	21.65	9.450	26.78	-	-
DMD2	4	5.800	87.12	21.07	8.803	26.99	✓	✗
D-DMD	4	5.782	88.04	21.23	9.629	26.57	✓	✗
CDM (Ours)	4	5.919	88.35	21.53	10.65	26.72	✓	✓

Best and second-best results are highlighted in bold and underline, respectively. The base model serves as a reference and is excluded from the ranking.
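
The ranking can be reproduced directly from the SD3-Medium table. The short script below copies the distilled-model rows (the base model is excluded, as stated above) and reports the two highest-scoring methods per metric. The helper name `top_two` is hypothetical, and ties are broken by row order.

```python
METRICS = ["Aesthetic", "DPGBench", "PickScore", "HPSv3", "CLIPScore"]

# Distilled-model rows copied from the SD3-Medium table.
SD3_RESULTS = {
    "Hyper-SD": [5.180, 80.43, 20.82, 6.054, 27.93],
    "Flash": [5.968, 80.47, 21.69, 8.282, 28.18],
    "TDM": [6.013, 83.12, 21.61, 8.468, 27.63],
    "DMD2": [6.038, 83.96, 21.58, 8.419, 27.56],
    "D-DMD": [6.038, 84.52, 21.85, 9.176, 27.69],
    "CDM": [6.075, 85.26, 21.95, 9.561, 27.98],
}


def top_two(results, metric_index):
    """Return the best and second-best method for one metric (higher is better)."""
    ranked = sorted(results, key=lambda m: results[m][metric_index], reverse=True)
    return ranked[0], ranked[1]


for i, name in enumerate(METRICS):
    best, second = top_two(SD3_RESULTS, i)
    print(f"{name}: best={best}, second={second}")
```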

Qualitative Comparison on SD3-Medium

Qualitative comparison on SD3-Medium. All results are generated from the same initial noise and random seed for a fair comparison. CDM produces more photorealistic results than competing methods, with sharper textures, finer details, and stronger semantic adherence to multi-entity compositional prompts.

More Qualitative Results

SD3-Medium (4 NFE)

Longcat-Image (4 NFE)

Ablation Study

Qualitative ablation of loss components across training steps. Left: Individual losses (CA, DM, CDM) in isolation. Right: Pairwise and full combinations. Partial combinations suffer from brightness collapse or degraded local fidelity at later stages, whereas our full objective (CA+DM+CDM) effectively preserves both global semantic coherence and local details.

Loss Components Ablation

Configuration	AES ↑	DPG ↑	PICK ↑	HPSv3 ↑	CLIP ↑
(a) Single Loss Ablation
w/o L_CA	5.861	72.87	21.05	8.128	24.78
w/o L_DM	6.016	84.57	21.75	8.954	27.66
w/o L_CDM	6.067	85.12	21.85	9.153	27.91
(b) Dual Loss Ablation
w/o L_DM & L_CDM	4.634	3.45	17.50	-10.15	14.60
w/o L_CA & L_CDM	5.787	70.60	20.82	7.258	25.31
w/o L_CA & L_DM	5.778	72.38	20.80	7.331	24.78
Full CDM	6.075	85.26	21.95	9.561	27.98
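
One way to read the single-loss rows is as a drop relative to the full objective. The sketch below copies the HPSv3 column from the table above and computes that drop. The helper name `hpsv3_drop` is hypothetical.

```python
FULL_CDM_HPSV3 = 9.561

# HPSv3 values copied from the single-loss ablation rows.
SINGLE_LOSS_HPSV3 = {
    "w/o L_CA": 8.128,
    "w/o L_DM": 8.954,
    "w/o L_CDM": 9.153,
}


def hpsv3_drop(variant_score, full_score=FULL_CDM_HPSV3):
    """HPSv3 lost relative to the full objective, rounded to 3 decimals."""
    return round(full_score - variant_score, 3)


for name, score in SINGLE_LOSS_HPSV3.items():
    print(f"{name}: drop={hpsv3_drop(score)}")
```

Removing L_CA costs the most HPSv3 (1.433), followed by L_DM (0.607) and L_CDM (0.408), so each loss contributes and none is redundant on this metric.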

Core Mechanism Design Ablation

Model Variant	AES ↑	DPG ↑	PICK ↑	HPSv3 ↑	CLIP ↑
(a) Time Schedule
w/ Fixed Schedule	6.051	83.84	21.89	9.482	27.75
(b) Off-trajectory Perturbation
w/o Perturbation (on-traj)	6.027	84.43	21.94	9.374	27.90
w/ Gaussian Perturbation	6.040	84.65	21.92	9.516	27.88
(c) Target Latent Construction
w/ Full-trajectory target	6.026	85.14	21.92	9.346	27.97
Full CDM	6.075	85.26	21.95	9.561	27.98

BibTeX

@misc{liu2026continuoustimedistributionmatchingfewstep,
      title={Continuous-Time Distribution Matching for Few-Step Diffusion Distillation}, 
      author={Tao Liu and Hao Yan and Mengting Chen and Taihang Hu and Zhengrong Yue and Zihao Pan and Jinsong Lan and Xiaoyong Zhu and Ming-Ming Cheng and Bo Zheng and Yaxing Wang},
      year={2026},
      eprint={2605.06376},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.06376}, 
}
