How do animators sync up their characters talking with audio?


How do they make their character’s mouths move at just the right speed? Is there a lot of overshooting/undershooting and then having to go back and fix the frames? Or is there some general rule of thumb (x amount of frames to each second of audio)?

In: Technology

6 Answers

Anonymous 0 Comments

Back in the analog days they would write a chart of what sounds were being made each 1/24th (or 1/12th) of a second on the soundtrack.
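That chart (often called an exposure sheet or "dope sheet") is easy to sketch in code. This is just an illustrative mock-up, not any real tool's format; the function name and timings are made up:

```python
# Sketch of a digital "exposure sheet": map phoneme timings (in seconds)
# to the frame each mouth shape starts on, at 24 frames per second.
FPS = 24

def to_exposure_sheet(phoneme_times, fps=FPS):
    """phoneme_times: list of (start_seconds, phoneme) pairs."""
    return [(round(start * fps), phoneme) for start, phoneme in phoneme_times]

# "hello" spoken over roughly a third of a second
timings = [(0.00, "H"), (0.08, "EH"), (0.20, "L"), (0.35, "OH")]
print(to_exposure_sheet(timings))  # [(0, 'H'), (2, 'EH'), (5, 'L'), (8, 'OH')]
```

Working on twos (12 drawings per second) would just mean snapping those frame numbers to even values.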

Occasionally they would also film the voice actors while recording, and that can be used as a reference.

Nowadays computers make it easier. When I animate dialogue in TVPaint, I can import the audio file into the timeline, and when I scrub frame-by-frame, the program plays just that frame’s snippet of audio and I can usually tell what sound is being made.

Anonymous 0 Comments

Well, most film (still, I think) runs at 24 fps. It’s fast enough to not catch the eye, but is still a workable speed. It may have changed since everything is digital now, it may not have.

Nowadays at least some animation is done with digital skins over live performers (this is how they did Gollum in LOTR), where a computer program tracks markers placed on the actor’s joints and the character is acted out by a real live human. ([or dog](https://images.app.goo.gl/eWNPkgy8DwcPvQEL7))

Anonymous 0 Comments

Usually they would record the audio in advance then synchronize the animation to that voice over. With computers these days they can produce some reference positions for the mouth and lip movements for given language sounds, like a “p” sound is going to require closing the lips, etc.

The computer then runs a program that recognizes those language sounds (perhaps also referencing a script) and can animate the movement of the face between those points automatically. On big budget productions they would then go in and tweak those animations to perfection.
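The core idea, grouping many phonemes into a small set of reusable mouth shapes (often called visemes) and keying them on the timeline, can be sketched like this. The groupings and names below are simplified assumptions for illustration, not any production pipeline's actual table:

```python
# Rough sketch of automatic lip sync: map recognized phonemes to a small
# set of mouth shapes (visemes). The groupings here are simplified guesses.
VISEMES = {
    "P": "closed", "B": "closed", "M": "closed",   # lips pressed together
    "F": "lip_bite", "V": "lip_bite",              # teeth on lower lip
    "AA": "open_wide", "AE": "open_wide",
    "EE": "smile", "OH": "round", "OO": "round",
}

def phonemes_to_keys(phonemes, default="rest"):
    """Turn (frame, phoneme) pairs into mouth-shape keyframes."""
    return [(frame, VISEMES.get(p, default)) for frame, p in phonemes]

keys = phonemes_to_keys([(0, "M"), (3, "AA"), (7, "P")])
print(keys)  # [(0, 'closed'), (3, 'open_wide'), (7, 'closed')]
```

On a big production those auto-generated keys would then be hand-tweaked, as the answer above says.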

Anonymous 0 Comments

I don’t know how it is done now, with computers. Before then, they drew them 1 frame at a time. That is why a 2 hour movie would take a good 2-3 years to do. I am sure it has been reduced to a science by now.

Anonymous 0 Comments

actual animator here:

experience and timing.

if you do as others in this thread say and just plug in whatever lip shape matches each sound, you get terribly ugly uncanny valley lip sync. It looks terrible and fucking creepy, it looks unprofessional and slapped-on. The mouth shape you think should be there is not actually what would be there. The shape you’d use to make a sound in one instance is not the same elsewhere; there are like 4 different ways to do an “eeeeee” noise, and which one you pick changes with the dialogue it’s in. An animator’s talent lies in their ability to see how people actually move, not how they think they move, but also vice versa. By compounding that experience and observation we animate characters, including dialogue, to not just look realistic but to “feel” good as well.

but the vast majority is just a large amount of eyeballing it with expertise and experience.

Anonymous 0 Comments

There are several different ways to do that like the other posters already mentioned.

A specific example would be using software that helps you align phonemes (essentially descriptions of mouth shapes) onto a timeline.
For example, the open source program Papagayo-NG (Disclaimer: slight self-promotion on my part) is one such tool:

https://github.com/morevnaproject/papagayo-ng

You load your sound file and type in your text, then you let it break that down into its smaller building blocks and you can arrange those on a timeline.

The final result can then be exported to a lot of different formats which you can then use in your animation software.
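Several of those export formats are just plain text, one "frame phoneme" pair per line, possibly with a header line on top (the exact header varies by format, so treat this layout as an assumption). A minimal reader in your animation tool's scripting layer might look like:

```python
# Minimal reader for a plain-text lip sync export: one "frame phoneme"
# pair per line, skipping any header or malformed lines. The sample
# header name and phoneme labels below are illustrative assumptions.
def read_lipsync(text):
    keys = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            keys.append((int(parts[0]), parts[1]))
    return keys

sample = "MohoSwitch1\n1 rest\n5 MBP\n9 AI\n"
print(read_lipsync(sample))  # [(1, 'rest'), (5, 'MBP'), (9, 'AI')]
```

From there you'd drive whatever mouth-shape layers or switch frames your animation software uses.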