Google Veo 3 and Veo 3.1: AI video with audio and lip sync

Blog post | Google Veo 3 and Veo 3.1 explained: AI video with audio, lip sync, and image-to-video

Highlights

Audio and lip sync are native in Veo 3: create speech, ambience, and precise mouth movements in a single render.
Veo 2 shines at editing: inpainting, outpainting, and expansion to remove objects and extend frames.
Veo 3.1 increases control: better prompt adherence, improved audiovisual quality, and image reference for character consistency.

Imagine describing a scene to a friend — the colors, the mood, the little details — and in just moments, watching that scene come alive as a video. That’s what Google’s Veo models bring to the table. They transform imagination into moving pictures, giving creators, marketers, educators, and dreamers a new way to tell stories. With Veo, you don’t need to be a filmmaker — you just need a thought.

What can Google Veo create from a single prompt?

Veo 3 generates cinematic video directly from text prompts, with native audio if needed. No cameras, no crew, no edits — just words becoming visuals.

Description: It’s the night after a summer rain in the city. Neon reflections ripple in puddles. A young girl, Mira, walks along a quiet street, holding a mobile phone and searching for her pet holographic butterfly. She pauses beneath a lantern, listening.

This entire moment can begin with a single description.
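One way to keep a description like this reusable is to hold its parts in a small structure and join them into a single prompt string. The sketch below is only an illustration: `ScenePrompt` and its field names are hypothetical and not part of any Veo API, and the resulting text would still need to be passed to whichever Veo surface you use.

```python
from dataclasses import dataclass, field


@dataclass
class ScenePrompt:
    """Hypothetical helper: parts of a scene description, joined into one prompt."""
    setting: str
    subject: str
    action: str
    details: list[str] = field(default_factory=list)

    def to_text(self) -> str:
        # Order: setting, extra details, then the subject and what they do.
        parts = [self.setting, *self.details, f"{self.subject} {self.action}"]
        # Normalize each non-empty part so it ends with exactly one period.
        return " ".join(p.strip().rstrip(".") + "." for p in parts if p.strip())


scene = ScenePrompt(
    setting="Night after a summer rain in the city",
    subject="A young girl, Mira,",
    action="walks along a quiet street, searching for her pet holographic butterfly",
    details=["Neon reflections ripple in puddles"],
)
prompt_text = scene.to_text()
print(prompt_text)
```

Keeping the setting, subject, and action separate makes it easier to reuse the same character across several clips while changing only what happens in each one.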

How does image-to-video work in Veo?

Now we zoom in on the heart of the story: the holographic butterfly itself. Suppose you’ve snapped or sketched a still image of the glowing butterfly resting on a streetlamp. Veo can bring that still to life, turning the image into a moving video so the story keeps flowing.

Sketch image (generated with Imagen 4):

Generated video with Veo 2:

Adding music to dreams with Veo 3

Veo 3 supports native audio generation, including ambiance and dialogue.

Stories feel fuller when they carry a soundscape. With Veo 3, you can weave music and ambience into the scene as the video is created. Keep the visuals, add emotion.

We’ll reuse the same moment from the first video — Mira searching for her holographic butterfly beneath the lantern — and let the city hum with subtle background music.

To keep the character consistent, we can provide Mira’s image as an input, ensuring she looks the same across every generated clip.

Input image:

Video generated:

Voices, songs, and lip-sync

Now we return to Mira. She cups her hands and calls out: “Sarah!” Moments later, she spots her holographic pet and smiles: “There you are.”

With Veo 3, voices can be generated with natural timing — and the lips match the words. This lets us script dialogue with precision, placing different lines at specific moments within a single generated video. Each scene can carry its own tone, emotion, and delivery, exactly where we want it.

Prompt (with duration): “Night city street after the rain, neon reflections shimmering. A young girl, Mira, cups her hands and calls out, ‘Sarah!’ (4sec). Moments later, she spots her holographic pet and smiles warmly, saying, ‘There you are’ (6secs). Natural lip-sync with clear voice timing.”
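The prompt above places a seconds marker after each scripted beat. A minimal sketch of assembling that kind of prompt programmatically follows. The function `timed_dialogue_prompt` is a hypothetical helper, not a Veo API, and it normalizes the markers to the form "(N sec)". This post does not specify how the model interprets the markers, so treat them as hints rather than guarantees.

```python
def timed_dialogue_prompt(scene: str, beats: list[tuple[str, int]]) -> str:
    """Hypothetical helper: append a "(N sec)" marker after each scripted beat."""
    marked = []
    for text, seconds in beats:
        if seconds <= 0:
            raise ValueError("each beat needs a positive number of seconds")
        marked.append(f"{text.strip().rstrip('.')} ({seconds} sec).")
    closing = "Natural lip-sync with clear voice timing."
    return " ".join([scene.strip().rstrip(".") + ".", *marked, closing])


dialogue_prompt = timed_dialogue_prompt(
    "Night city street after the rain, neon reflections shimmering",
    [
        ("A young girl, Mira, cups her hands and calls out, 'Sarah!'", 4),
        ("Moments later, she spots her holographic pet and smiles warmly, "
         "saying, 'There you are'", 6),
    ],
)
print(dialogue_prompt)
```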

The editing powers of Veo 2

Veo 2 doesn’t stop at generating first drafts. It gives creators the power to refine, reshape, and reimagine videos with intuitive editing abilities:

Erase unwanted elements

Example: Imagine a city scene where a random passerby walks through the frame. With Veo 2, you can erase them seamlessly — leaving only the clean street and your main characters.

Fill missing spaces

Example: A drone shot cuts off part of the skyline. Veo 2 can intelligently fill the missing skyline so the horizon looks natural and complete.

Expand a scene

Example: You filmed a café interior, but now you want to see what’s outside the window. Veo 2 can expand the scene beyond the original frame, revealing a lively street outside.

Copy character styles

Example: You love the outfit and hairstyle of a character in one clip. With a few instructions, you can apply the same look to the same character in a different video, keeping consistency.

Transfer styles

Example: A dramatic night scene can be reimagined as a cozy morning scene, or an action shot can be transformed into a dreamy watercolor style.

This means creators aren’t limited to their first draft — every video can be fine-tuned until it feels just right.

Other use cases where Veo can be applied include:

Advertising: for example, we created an ad for a perfume called Essence.

Social media content creation: eye-catching vertical videos (9:16) optimized for platforms like Instagram Reels or TikTok.
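To make the 9:16 ratio concrete, the sketch below works out pixel dimensions from an aspect ratio and the length of the shorter side. It is plain arithmetic, not a Veo call: `frame_size` is a hypothetical helper, and the 1080-pixel short side is an assumption taken from the 1080p figure mentioned in the FAQ. The exact output sizes Veo produces for vertical video are not stated in this post.

```python
def frame_size(aspect_w: int, aspect_h: int, short_side: int) -> tuple[int, int]:
    """Return (width, height) in pixels for an aspect ratio and a short-side length."""
    if aspect_w <= 0 or aspect_h <= 0 or short_side <= 0:
        raise ValueError("all arguments must be positive")
    if aspect_w <= aspect_h:
        # Portrait or square: the width is the shorter side.
        width = short_side
        height = round(short_side * aspect_h / aspect_w)
    else:
        # Landscape: the height is the shorter side.
        width = round(short_side * aspect_w / aspect_h)
        height = short_side
    return width, height


print(frame_size(9, 16, 1080))  # (1080, 1920), vertical
print(frame_size(16, 9, 1080))  # (1920, 1080), horizontal
```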

Veo 2 vs. Veo 3: What changed?

Feature	Veo 2	Veo 3
Create videos from text	Yes	Yes
Animate from images	Yes	Yes
Add music	No	Yes
Voice & lip-sync	No	Yes
Editing tools	Erase, fill, expand, styles	Focused on audio generation
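
The table can also be read as a lookup: given the features a project needs, which model covers all of them? The sketch below encodes the rows as a dictionary. The key names and the `models_supporting` function are hypothetical, and marking Veo 3 as lacking editing tools is an interpretation of the "Focused on audio generation" cell, not something the table states outright.

```python
# Feature support transcribed from the comparison table.
# Assumption: "Focused on audio generation" is read as "no editing tools" for Veo 3.
FEATURES = {
    "Veo 2": {
        "text_to_video": True,
        "image_to_video": True,
        "music": False,
        "voice_lip_sync": False,
        "editing_tools": True,
    },
    "Veo 3": {
        "text_to_video": True,
        "image_to_video": True,
        "music": True,
        "voice_lip_sync": True,
        "editing_tools": False,
    },
}


def models_supporting(required: set[str]) -> list[str]:
    """Return the models whose table row covers every required feature."""
    return [
        name
        for name, row in FEATURES.items()
        if all(row.get(feature, False) for feature in required)
    ]


print(models_supporting({"music", "voice_lip_sync"}))  # ['Veo 3']
print(models_supporting({"editing_tools"}))            # ['Veo 2']
print(models_supporting({"editing_tools", "music"}))   # []
```

An empty result, as in the last call, suggests a two-step workflow: generate with one model and refine with the other.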

What is new in Veo 3.1?

With the release of Veo 3.1, the creative canvas gets bigger. Veo 3.1 improves prompt adherence and audiovisual quality, and it adds an image reference feature: you supply images of your characters or objects, and they can play an active role in your videos while staying consistent from shot to shot.

Conclusion: The future is moving

Veo 2 and Veo 3 aren’t just tools — they’re doorways into new ways of storytelling. Whether you’re painting memories, crafting lessons, building ads, or creating art, Veo empowers you to go from thought to video, from image to motion, and from silence to music. It’s a reminder that imagination doesn’t just live in our heads anymore — with Veo, it lives on screen.

Frequently asked questions

1. Is Veo 3 available in India?

Yes. Google has rolled out Veo 3 to Gemini app Pro subscribers in India (and, in some promotions, to other tiers), so Pro subscribers there can access Veo 3 through the Gemini app.

2. What is Google Veo 3?

Veo 3 is Google DeepMind’s text-and-image-to-video model. It generates short cinematic clips with native audio, synchronized sound effects, and lip-synced dialogue for scripted lines. It is available through the Gemini app, Flow, and Vertex AI.

3. Can Veo animate a still image into a video?

Yes. Both Veo 2 and Veo 3 support image-to-video, letting you bring a photograph or sketch to life with camera moves and motion cues.

4. Does Veo support vertical video for social?

Yes. Veo 3 supports 9:16 vertical output and 1080p resolution, making it suitable for Reels, TikTok, and Shorts workflows.

5. How is Veo 3 different from Veo 2?

Veo 2 focuses on editing capabilities such as inpainting and outpainting. Veo 3 adds native audio, ambience, and precise lip sync, so you can time dialogue inside a single render.
