I hooked up all the useful parameters to sliders, and played around with them. I didn't find any settings that made the vocal sound great, but I did find some that made the output really bad.

One of the other problems with the voice is that it's currently at an absolutely flat pitch, which makes it sound really robotic. I'll fix that eventually, but for now it helps me focus on hearing what's wrong with the voice.

The main problem is that it's still fuzzy and buzzy - a bit too "out of focus".

Anyway, here's the list of parameters I've been playing with:

FFT Resolution

This is a balancing act. Too low (256 samples) and it's too rough. Too high (4096) and it's losing too much detail because what it gains in spectral fidelity, it loses in temporal fidelity.
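
To put rough numbers on that tradeoff, here's a minimal sketch. The 44100 Hz sample rate and the function name are assumptions for illustration, not taken from the actual synth code.

```python
def fft_tradeoff(fft_size, sample_rate=44100):
    """Return (bin width in Hz, frame length in ms) for one FFT frame.

    sample_rate defaults to 44100 Hz, which is only an assumed value.
    """
    bin_width_hz = sample_rate / fft_size        # spectral resolution
    frame_ms = 1000.0 * fft_size / sample_rate   # temporal resolution
    return bin_width_hz, frame_ms


for size in (256, 1024, 4096):
    bin_hz, frame_ms = fft_tradeoff(size)
    print(f"{size:5d} samples: {bin_hz:6.1f} Hz per bin, {frame_ms:5.1f} ms per frame")
```

The product of the two numbers is constant, so every doubling of the FFT size halves the bin width but doubles the stretch of time each frame smears together.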

Frequency Range

This is the range of frequency data that's being captured. Too low (3000 Hz) and it's boxy. Too high and it's capturing a lot of information that's not really audible. At least, not to me with my mild hearing loss.

Mel Bands

This is the number of bands that the captured frequency range is divided into. Fewer than 20 is too low a resolution. Too many bands and the audio gets "ringy".
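
As a sketch of how the band count and frequency range interact, here's one way to place band centers evenly on the mel scale. The conversion formula is one common variant (2595 * log10(1 + f/700)); whether it matches what the synth actually uses is an assumption, and the function names are made up for the example.

```python
import math


def hz_to_mel(hz):
    # One common mel formula; other variants exist.
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(num_bands, max_hz, min_hz=0.0):
    """Center frequencies (Hz) of num_bands bands spaced evenly in mels."""
    lo, hi = hz_to_mel(min_hz), hz_to_mel(max_hz)
    step = (hi - lo) / (num_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(num_bands)]


centers = mel_band_centers(20, 8000.0)
print([round(c) for c in centers])
```

The bands crowd together at low frequencies and spread out at high ones, so adding bands mostly adds detail at the low end of the captured range.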

Bandwidth

Each Mel Band has a single bandpass filter that the glottal pulse goes through. Too narrow, and the output is shrill. Too wide, and the audio is muffled. I'm still working on figuring this value out.
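
For illustration, here's a minimal sketch of one such bandpass filter. It uses the constant 0 dB peak gain bandpass biquad from the widely used Audio EQ Cookbook, which is a stand-in and not necessarily the filter design used here. The sample rate, the function names, and setting Q to center / bandwidth are all assumptions.

```python
import math


def bandpass_coeffs(center_hz, bandwidth_hz, sample_rate=44100):
    """Biquad bandpass coefficients, normalized so that a[0] == 1."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    q = center_hz / bandwidth_hz          # assumed: Q is roughly center / bandwidth
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def apply_biquad(signal, b, a):
    """Run a list of samples through the filter (direct form I)."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x0 in signal:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out
```

A smaller bandwidth_hz means a higher Q, so the filter passes only a sliver around the center frequency and rings for longer after each pulse.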

Percent Pulse

For the glottal pulse, there are two basic parameters. The first is what percent of the pulse period is "the pulse" and not flat. This corresponds to the "tension" of the voice.

Up/Down Percent

Simple enough - what percent of the pulse duration is spent going up vs. going down. This doesn't seem to have a large impact on the voice.
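
Here's a minimal sketch of how those two pulse parameters could shape one period. The raised-cosine rise and fall, the default values, and the function name are all assumptions for illustration; the actual pulse shape may differ.

```python
import math


def glottal_pulse(period_samples, percent_pulse=0.6, up_percent=0.7):
    """One period of a synthetic glottal pulse, as a list of floats in [0, 1].

    percent_pulse: fraction of the period that is "the pulse" (the rest is flat).
    up_percent:    fraction of the pulse spent rising (the rest is falling).
    Both are assumed to lie strictly between 0 and 1.
    """
    pulse_len = int(round(period_samples * percent_pulse))
    up_len = int(round(pulse_len * up_percent))
    down_len = pulse_len - up_len
    samples = []
    for n in range(period_samples):
        if n < up_len:
            # Rising segment: 0 up toward 1
            samples.append(0.5 * (1.0 - math.cos(math.pi * n / up_len)))
        elif n < pulse_len:
            # Falling segment: 1 back down toward 0
            samples.append(0.5 * (1.0 + math.cos(math.pi * (n - up_len) / down_len)))
        else:
            # Flat (closed) part of the period
            samples.append(0.0)
    return samples


pulse = glottal_pulse(100)
```

A smaller percent_pulse squeezes the same shape into less of the period, which makes the pulse sharper and puts more energy into the upper harmonics.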

Aspiration

This is how "breathy" the voice is. It's more of a "mix to taste" sort of thing.
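
As a rough sketch of "mix to taste", aspiration can be treated as a crossfade between the pulse and white noise. The linear crossfade, the uniform noise source, and the names are assumptions for the example, not a description of the actual mixing code.

```python
import random


def add_aspiration(pulse, amount, seed=None):
    """Blend white noise into a pulse.

    amount = 0.0 leaves the pulse untouched; amount = 1.0 is pure noise.
    """
    rng = random.Random(seed)
    return [(1.0 - amount) * s + amount * rng.uniform(-1.0, 1.0) for s in pulse]


breathy = add_aspiration([0.0, 0.5, 1.0, 0.5, 0.0], amount=0.2, seed=1)
```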

I suspect that one of the main problems is a mismatch between the frequency content of the synthetic glottal pulse and the spectral pulse.

That is, running the glottal pulse through the bandpass filter ends up with a copy that has less power at the high frequencies than the original signal.

I've tried various ways to compensate for this, with varying degrees of success.

In any case, I'm not quite ready to move on to the next step until I've run out of ideas for making the voice better. I keep hoping that I'll find a useful hint in the next research paper I read.

