Highlights
- Diffusion LLMs (Large Language Models) reimagine AI text generation by sculpting text from noise, much like image diffusion models create visuals, turning chaos into coherent meaning.
- Unlike GPT-style models that type token by token, diffusion LLMs refine whole passages iteratively, closer to how humans think, draft, and rewrite.
- Breakthroughs like DiffuGPT, DiffuLLaMA, Dream 7B, and Mercury show diffusion is reshaping how AI writes, codes, and composes.
- Faster, multimodal, and more flexible, diffusion LLMs open new frontiers for storytelling, coding, and creative AI generation.
In the realm of generative AI, diffusion models have emerged as a game-changing approach: first transforming image generation (think Stable Diffusion, DALL·E 2) and now making waves in natural language processing (NLP) and large language models (LLMs). But what happens when we take the diffusion framework, known for gradually turning noise into a meaningful signal, and apply it to AI text generation?
Welcome to the world of Diffusion LLMs (Large Language Models): a fresh direction in language modeling that challenges the autoregressive stronghold of models like GPT. If GPTs are the typists of AI — fast, linear, efficient — then diffusion LLMs are the writers who sketch, rewrite, and refine. The result is not just faster or more fluent language, but a whole new way of thinking about AI text generators.
Across the world, researchers are rapidly advancing the frontiers of diffusion, and we are keeping pace — not only by following the research closely, but by applying it in our own work. This blog is designed to be a foundation: a simple, clear, and comprehensive base for anyone who wishes to understand diffusion in AI, and a starting point for the series where we’ll share our own explorations and contributions.
From noise to narrative: How Diffusion LLMs work
Unlike traditional large language models (LLMs) that generate one token at a time, diffusion-based language models work in reverse. They start with random noise and gradually sculpt it — step by step — into a coherent sentence, paragraph, or story. This reverse journey is made possible by training the model to denoise text embeddings over many steps.
Since raw text is inherently discrete (unlike pixels), diffusion-based AI models for NLP take a detour. Some operate in continuous embedding or latent spaces, where smooth transformations are possible; others use discrete "masked" diffusion, corrupting tokens with a special mask symbol and learning to restore them. Either way, the model learns to maintain fluency, structure, and coherence throughout the generation process.
This shift allows the model to “see the big picture” from the start. Rather than being locked into one-directional thinking, diffusion LLMs are free to explore possibilities, revise context mid-stream, and craft globally coherent outputs in parallel.
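To make that loop concrete, here is a minimal runnable sketch of masked-diffusion sampling in the style popularized by LLaDA and Dream 7B. Everything in it (the mock model, the confidence-based schedule) is illustrative rather than any real system's API: start from an all-mask sequence, predict every position in parallel, commit the most confident guesses, and refine the rest on the next pass.

```python
import numpy as np

VOCAB_SIZE, SEQ_LEN, STEPS = 1000, 16, 8
MASK_ID = VOCAB_SIZE  # reserve one extra id to act as the [MASK] token

rng = np.random.default_rng(0)

def model_logits(tokens):
    """Stand-in for a trained denoiser. A real model would attend to the
    whole sequence bidirectionally; here we return random logits just to
    keep the sketch runnable."""
    return rng.normal(size=(len(tokens), VOCAB_SIZE))

tokens = np.full(SEQ_LEN, MASK_ID)           # pure "noise": everything masked
for step in range(STEPS):
    masked = tokens == MASK_ID
    if not masked.any():
        break
    logits = model_logits(tokens)
    # Softmax over the vocabulary at each position.
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    preds = probs.argmax(axis=-1)            # parallel guess for every slot
    conf = np.where(masked, probs.max(axis=-1), -np.inf)
    # Unmask the most confident predictions; leave the rest masked for the
    # next refinement pass. This schedule finishes in exactly STEPS steps.
    n_unmask = max(1, int(np.ceil(masked.sum() / (STEPS - step))))
    for i in np.argsort(-conf)[:n_unmask]:
        tokens[i] = preds[i]

print(tokens)  # fully unmasked sequence (random ids under the mock model)
```

With a trained denoiser in place of the mock, each pass sharpens the whole sequence at once, which is exactly the "see the big picture" behaviour described above.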
It’s not just writing. It's an iterative imagination — one denoising step at a time.
Not just models — A movement in AI
In just over a year, diffusion LLMs have gone from experiments to powerful creative engines. Early models like DiffuGPT proved you could adapt existing autoregressive architectures to diffusion. DiffuLLaMA scaled that recipe to open-source LLMs like LLaMA, while LLaDA showed a diffusion language model could be trained from scratch at the 8B scale. Then Dream 7B arrived with a diffusion-native mindset, tuned to think like a denoiser rather than a predictor.
What makes diffusion language models so compelling isn’t just the models — it’s the philosophy. Instead of rigidly marching left-to-right, they dream, revise, and refine — closer to how humans write, think, and imagine.
This paradigm shift isn’t about replacing autoregressive AI models. It’s about expanding the toolkit. It invites developers, researchers, and creators to rethink what’s possible in AI generation: from storytelling and scientific writing to code synthesis and structured summarization.
And the momentum is growing.
Fast, flexible, and diffused: The new wave of AI generators
Leading this new wave are Mercury, MMaDA, and Apple’s DiffuCoder — each offering unique strengths built around speed, flexibility, and creativity in AI text generation.
- Mercury (by Inception Labs): A performance-focused diffusion model that generates text and code at lightning speed (over 1,000 tokens per second). Its coarse-to-fine method sketches, then refines, making it up to 10x faster than popular LLMs today. Especially powerful in AI coding tasks.
- MMaDA: A multimodal diffusion LLM that blends text, image, and reasoning into one model. By treating vision and language equally, it achieves performance comparable to top multimodal AI systems, all with a diffusion backbone.
- DiffuCoder (Apple): A diffusion-powered AI code generator that doesn’t just write in a straight line. It edits, revises, and restructures code like a real programmer. This flexibility makes it a powerful tool for developers.
Together, these models show that diffusion isn’t just an experimental AI idea — it’s becoming a real, usable, and creative tool for the next era of AI text generators and large language models.
Not the future — the upgrade
We’re not just witnessing an evolution of language models. We’re witnessing a creative leap. Diffusion LLMs aren’t trying to beat GPT at its own game. They’re playing a different one.
They think in drafts.
They sculpt ideas from scratch.
They turn noise into nuance.
Diffusion LLMs don’t just predict the next word — they imagine whole sentences, structures, and meaning. That’s a paradigm shift in AI text generation.
In less than two years, we’ve gone from prototypes like DiffuGPT, to bold reinventions like Dream 7B, to polished engines like Mercury that don’t just work in theory — they’re shipping real output right now.
Diffusion is reshaping how AI text generators learn, iterate, and create — blurring the line between writing and reasoning. As diffusion-based language models gain momentum, we are actively exploring this space by building real-world inference pipelines, testing fine-tuning strategies, and analyzing how alignment techniques can optimize these models for practical AI applications. And now that you have a clear picture of what diffusion models mean for AI and LLMs, learn more about how we put these ideas into practice in our next blog.
AI is evolving. So are we. At KeyValue, we’re shaping the next frontier of intelligence. Let’s build the future together.
FAQs
- What is a diffusion LLM (Large Language Model)?
A diffusion LLM is a large language model that treats AI text generation as an iterative denoising process—starting from corrupted or masked text and progressively reconstructing it through multiple refinement steps. This approach allows richer, bidirectional context understanding compared to traditional next-token prediction.
- What sets diffusion LLMs apart from traditional autoregressive models?
Diffusion LLMs generate text through iterative denoising rather than step-by-step next-token prediction. This bidirectional process enables richer context integration, reducing exposure bias and improving output stability.
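As a toy illustration of that bidirectional point (not drawn from any particular model), compare the two attention masks: an autoregressive decoder lets each position see only the positions before it, while a diffusion denoiser conditions every position on the entire sequence at each refinement step.

```python
import numpy as np

L = 5  # toy sequence length
causal = np.tril(np.ones((L, L), dtype=int))  # GPT-style: position i sees only 0..i
full = np.ones((L, L), dtype=int)             # denoiser: every position sees all others
print(causal)
print(full)
```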
- What are the key applications of Diffusion LLMs?
Diffusion LLMs are ideal for tasks needing iterative refinement and global coherence. They power creative writing, code generation, and scientific summarization, while also enabling multimodal reasoning that blends text, vision, and structured data.
- How do Diffusion LLMs handle discrete text data?
There are two main strategies. Continuous approaches map tokens into embedding or latent spaces, run the denoising process on those vectors, and then decode ("round") them back into text. Discrete approaches, used by models such as LLaDA and Dream 7B, skip the detour: they corrupt text by masking tokens and train the model to progressively unmask it. Both routes yield smooth, coherent language generation.
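The discrete route is sketched earlier in this post; here is a matching numpy-only sketch of the continuous route, with made-up sizes and a random table standing in for the learned embedding. It shows the embed, noise, and nearest-neighbour rounding steps; a trained denoiser (omitted here) would sit between the last two.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 50, 8
embedding = rng.normal(size=(VOCAB, DIM))   # stand-in for a learned lookup table

def embed(ids):
    return embedding[ids]                   # discrete ids -> continuous vectors

def decode(vectors):
    # "Rounding": map each vector back to the id of its nearest embedding.
    d2 = ((vectors[:, None, :] - embedding[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=-1)

ids = rng.integers(0, VOCAB, size=6)        # a toy "sentence" of token ids
x0 = embed(ids)
xt = x0 + 0.3 * rng.normal(size=x0.shape)   # forward process: add Gaussian noise
# A trained denoiser would estimate x0 from xt; we decode the noisy vectors
# directly just to show the rounding step.
print(ids)
print(decode(xt))                           # with mild noise, usually matches ids
```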
- Are Diffusion LLMs faster than GPT models?
In some implementations, yes. Models like Mercury use a coarse-to-fine denoising process that accelerates text generation to over 1,000 tokens per second, up to 10× faster than conventional autoregressive LLMs. However, overall efficiency depends on factors like model architecture, the number of diffusion steps, and inference optimization.
- Why are Diffusion LLMs considered a major leap in AI?
Because they challenge the long-standing autoregressive approach that has dominated NLP. Diffusion LLMs think in drafts, not sequences: they revise and refine instead of predicting one token at a time. This makes them closer to how humans create and reason, marking a fundamental change in generative AI.