The Voice Sync CHOP detects phonemes in an audio channel given some audio phoneme samples and produces lip control channels.
In order to set up a lip-synching system based on phoneme detection, you need to model different mouth shapes for each viseme (visually similar phonemes) and create a "phoneme library" of sample phonemes.
Once these steps are complete, the Voice Sync CHOP matches the phoneme library to the audio voice track in input 1. The Voice Sync CHOP outputs a channel for each phoneme (or viseme). Each channel contains the occurrences of the phoneme in the voice track (as 1 when detected, 0 otherwise) over its full length.
Parameters
Voice Sync
Quality
Allows you to trade off between performance and accuracy.
Silence Level
Voice Detection will only be performed on the input audio if its average volume for a given segment is higher than this value. It should be adjusted so that background noise is not interpreted as voice.
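The silence gate described above can be sketched as follows. This is a minimal illustration, assuming "average volume" means the mean absolute amplitude of the segment; the measure the CHOP actually uses is not stated here, and the function name is hypothetical.

```python
def is_voiced_segment(samples, silence_level):
    """Return True when a segment is loud enough to run voice detection on.

    Assumption: average volume is the mean absolute amplitude of the segment.
    """
    if not samples:
        return False
    average_volume = sum(abs(s) for s in samples) / len(samples)
    # Detection only runs when the average is higher than the silence level.
    return average_volume > silence_level


# Quiet background noise is skipped, speech-level audio is analyzed.
print(is_voiced_segment([0.01, -0.02, 0.015], 0.05))
print(is_voiced_segment([0.4, -0.5, 0.3], 0.05))
```

Raising the silence level in this sketch makes more segments count as silence, which mirrors the advice to adjust the parameter until background noise is no longer treated as voice.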
Silence Chan Name
Creates a channel which acts as a 'silence phoneme'.
Vocal Range (Hz)
The approximate vocal range of the voice actor. The defaults are for an average male speaker. This parameter is only used by Voiced Phonemes.
Max Pitch Shift
The maximum number of octaves that a phoneme in the voice track can differ in pitch from the sample phoneme.
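Since the limit is expressed in octaves, the pitch difference between two frequencies is the base-2 logarithm of their ratio. The sketch below shows that arithmetic; the function names are hypothetical and the comparison is an assumption about how such a limit could be applied, not the CHOP's actual matching code.

```python
import math


def pitch_shift_octaves(track_hz, sample_hz):
    """Pitch difference in octaves between a track phoneme and a sample phoneme."""
    return abs(math.log2(track_hz / sample_hz))


def within_max_pitch_shift(track_hz, sample_hz, max_pitch_shift):
    """True when the pitch difference does not exceed the allowed number of octaves."""
    return pitch_shift_octaves(track_hz, sample_hz) <= max_pitch_shift


# A 220 Hz phoneme is exactly one octave above a 110 Hz sample.
print(pitch_shift_octaves(220.0, 110.0))
print(within_max_pitch_shift(220.0, 110.0, 0.5))
```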
Noise Filter (Hz)
Controls the size of the noise filter used to smooth out non-voiced phonemes.
Setup Phoneme Library
This button builds the phoneme library from the second and third inputs. It is only used in time slice mode. Before starting realtime synching, initialize the phoneme library by clicking this button.
Output
Convert Scores into On/Off States
Outputs a binary on/off state channel for each viseme rather than outputting the raw scores.
Minimum Score
The viseme with the highest score is chosen to be the match. However, if its score is not above the minimum score, then no visemes will be chosen. This parameter helps to eliminate poor matches.
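The selection rule for on/off states can be sketched for a single output sample as below. This is an illustration under stated assumptions: the raw scores are represented as a dictionary keyed by viseme name, and the function name is hypothetical. Only the rule itself (highest score wins, and only if it is above the minimum score) comes from the descriptions above.

```python
def scores_to_states(scores, minimum_score):
    """Convert raw viseme scores for one sample into 0/1 states.

    scores: dict mapping viseme name to raw score (hypothetical layout).
    """
    states = {name: 0 for name in scores}
    if not scores:
        return states
    best = max(scores, key=scores.get)
    # The best viseme is chosen only when its score is above the minimum.
    if scores[best] > minimum_score:
        states[best] = 1
    return states


sample_scores = {"ah": 0.8, "oo": 0.3, "mm": 0.1}
print(scores_to_states(sample_scores, 0.5))
print(scores_to_states(sample_scores, 0.9))
```

With a minimum score of 0.9 in this sketch, no viseme is switched on, which is how the parameter suppresses poor matches.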
Voiced Bias
Voiced and Non-Voiced phonemes are detected using different methods. If you find that either the Voiced or Non-Voiced phonemes are dominating the output, you can shift the bias to balance them. Zero bias doesn’t favor either method, +1 completely favors Voiced phonemes and -1 completely favors Non-Voiced phonemes. Normally values should remain close to zero.
Sample Rate
The sample rate of the output channels. This also partially determines how many matches are done on the input audio.
Super Sample
How many comparisons are done per output sample. A higher super sample will increase the matching accuracy, but will affect performance greatly.
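As a rough illustration of why a higher Super Sample affects performance, the sketch below estimates the number of comparisons over a clip. It assumes the count is simply the number of output samples multiplied by Super Sample; the CHOP's real cost model is not documented here, and the function name is hypothetical.

```python
def estimate_comparisons(sample_rate, super_sample, duration_seconds):
    """Estimate comparisons as output samples times comparisons per sample."""
    output_samples = int(round(sample_rate * duration_seconds))
    return output_samples * super_sample


# Ten seconds at 30 samples per second, with Super Sample at 1 and at 4.
print(estimate_comparisons(30, 1, 10.0))
print(estimate_comparisons(30, 4, 10.0))
```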
Common
Some of these parameters may not be available on all CHOP nodes.
Scope
To determine which channels get affected, some CHOPs have a scope string. Patterns can be used in the scope, for example * (match all), and ? (match single character).
The following are examples of possible channel name matching options:
chan2
Matches a single channel name.
chan3 tx ty tz
Matches four channel names, separated by spaces.
chan*
Matches each channel that starts with chan.
*foot*
Matches each channel that has foot in it.
t?
The ? matches a single character. t? matches two-character channels starting with t.
r[xyz]
Matches channels rx, ry and rz.
blend[3-7:2]
Matches number ranges giving blend3, blend5, and blend7.
blend[2-3,5,13]
Matches channels blend2, blend3, blend5, blend13.
t[xyz]
[xyz] matches any one of the three characters, giving channels tx, ty and tz.
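The matching options listed above can be approximated with a small pattern matcher. This is a sketch written for illustration, not Houdini's implementation: it handles only the forms shown in the examples, all function names are hypothetical, and it assumes that a bracket containing only digits, commas, hyphens and colons is a number list (so a bracket such as [123] would be read as the number 123, not as three characters).

```python
import re


def _expand_bracket(body):
    """Translate the inside of [...] into a regular expression fragment."""
    if re.fullmatch(r"[\d,:\-]+", body) and any(ch.isdigit() for ch in body):
        # Numeric form: comma-separated numbers or start-end[:step] ranges.
        numbers = []
        for part in body.split(","):
            m = re.fullmatch(r"(\d+)(?:-(\d+))?(?::(\d+))?", part)
            if m is None:
                raise ValueError("unsupported range: " + part)
            start = int(m.group(1))
            end = int(m.group(2)) if m.group(2) else start
            step = int(m.group(3)) if m.group(3) else 1
            numbers.extend(range(start, end + 1, step))
        return "(?:" + "|".join(str(n) for n in numbers) + ")"
    # Character form: match any one of the listed characters.
    return "[" + re.escape(body) + "]"


def pattern_to_regex(pattern):
    """Compile one scope pattern into a regular expression."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            out.append(".*")
        elif ch == "?":
            out.append(".")
        elif ch == "[":
            close = pattern.find("]", i)
            if close == -1:
                raise ValueError("unclosed '[' in pattern: " + pattern)
            out.append(_expand_bracket(pattern[i + 1:close]))
            i = close
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out))


def match_scope(scope, channel_names):
    """Return the channels matched by any space-separated pattern in scope."""
    regexes = [pattern_to_regex(p) for p in scope.split()]
    return [n for n in channel_names if any(r.fullmatch(n) for r in regexes)]


channels = ["chan2", "chan3", "tx", "ty", "tz", "rx", "blend3", "blend4", "blend5", "blend7"]
print(match_scope("blend[3-7:2]", channels))
print(match_scope("chan3 tx ty tz", channels))
```

Results are returned in channel order, not pattern order, which is a choice made for this sketch.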
Sample Rate Match
The Sample Rate Match Options handle cases where multiple input CHOPs’ sample rates are different.
Resample At First Input’s Rate
Use rate of first input to resample others.
Resample At Maximum Rate
Resample to highest sample rate.
Resample At Minimum Rate
Resample to the lowest sample rate.
Error if Rates Differ
Does not accept conflicting sample rates.
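The four options amount to a rule for choosing one target rate from the input rates. The sketch below expresses that rule; the mode strings ("first", "max", "min", "error") and the function name are hypothetical labels chosen for this example, not the parameter's actual token values.

```python
def resolve_sample_rate(input_rates, mode):
    """Pick the sample rate that all inputs are resampled to."""
    if not input_rates:
        raise ValueError("at least one input rate is required")
    if mode == "first":
        return input_rates[0]
    if mode == "max":
        return max(input_rates)
    if mode == "min":
        return min(input_rates)
    if mode == "error":
        # Conflicting rates are rejected instead of resampled.
        if len(set(input_rates)) > 1:
            raise ValueError("input sample rates differ: %r" % (input_rates,))
        return input_rates[0]
    raise ValueError("unknown mode: " + mode)


rates = [30.0, 48.0, 24.0]
print(resolve_sample_rate(rates, "first"))
print(resolve_sample_rate(rates, "max"))
print(resolve_sample_rate(rates, "min"))
```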
Units
The units in which time parameters are specified.
For example, you can specify the amount of time a lag should last for in seconds (default), frames (at the Houdini FPS), or samples (in the CHOP’s sample rate).
Note
When you change the Units parameter, it does not convert the existing parameters to the new units.
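The relationship between the three units can be sketched as a conversion to seconds. The function name, the unit strings and the default rates are assumptions for this example. It also shows why the note above matters: the same stored number means a different duration once the units change.

```python
def to_seconds(value, units, fps=24.0, sample_rate=24.0):
    """Interpret a time parameter value in the given units as seconds."""
    if units == "seconds":
        return value
    if units == "frames":
        return value / fps
    if units == "samples":
        return value / sample_rate
    raise ValueError("unknown units: " + units)


# A stored value of 2 is two seconds, but only 2/24 of a second as frames.
print(to_seconds(2.0, "seconds"))
print(to_seconds(2.0, "frames", fps=24.0))
```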
Time Slice
Time Slicing is a feature which boosts cooking performance and reduces memory usage. Traditionally, CHOPs calculate the channel over its entire frame range. If the channel does need to be evaluated every frame, then cooking the entire range of the channel is unnecessary. It is more efficient to calculate only the fraction of the channel that is needed. This fraction is known as a Time Slice.
Unload
Causes the memory consumed by a CHOP to be released after it is cooked and the data passed to the next CHOP.
Export Prefix
The Export prefix is prepended to CHOP channel names to determine where to export to.
For example, if the CHOP channel was named geo1:tx, and the prefix was /obj, the channel would be exported to /obj/geo1/tx.
Note
You can leave the Export Prefix blank, but then your CHOP track names need to be absolute paths, such as obj:geo1:tx.
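The mapping from prefix and channel name to an export path can be sketched as below. The function name is hypothetical, and the blank-prefix branch assumes that an absolute track name such as obj:geo1:tx resolves to /obj/geo1/tx, which is inferred from the note above and not stated explicitly.

```python
def export_path(prefix, channel_name):
    """Build the export destination from the Export Prefix and a channel name."""
    # Colons in the channel name separate path components.
    relative = channel_name.replace(":", "/")
    if prefix:
        return prefix.rstrip("/") + "/" + relative
    # Assumption: with a blank prefix the name itself is the absolute path.
    return "/" + relative.lstrip("/")


print(export_path("/obj", "geo1:tx"))
print(export_path("", "obj:geo1:tx"))
```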
Graph Color
Every CHOP has this option. Each CHOP gets a default color assigned for display in the Graph port, but you can override the color in the Common page under Graph Color. There are 36 RGB color combinations in the Palette.
Graph Color Step
When the graph displays the animation curves and a CHOP has two or more channels, this defines the difference in color from one channel to the next, giving a rainbow spectrum of colors.