synsinger posted:
I know I'd been saying that I'd use the IFFT for rendering unvoiced sounds, but there was just too much overhead to it, so I decided to take another approach.
I decided to try implementing a proper Mel Filterbank instead. Mel space is constructed so that equal distances in mels correspond to perceptually equal steps in pitch.
It seems to work "well enough", and it's basically the approach I used to encode unvoiced audio in prior implementations. It still sounds a bit rough, but it'll do for now.
What follows is a whole bunch of code, probably not of interest to many people. Those who don't know about this sort of stuff will (rightfully) continue to not care, and those who do know about these things will be using far better implementations than what I'm presenting below.
Converting back and forth between Mel and Frequency looks like this:
```lua
-- Convert frq from frequency to non-linear mel space
function frequencyToMel( frq )
  return 1127.01048 * math.log(1+frq/700)
end

-- Convert a mel from mel space to frequency space
function melToFrequency( mel )
  return 700 * (math.exp(mel/1127.01048) - 1)
end
```
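As a quick sanity check of the two conversions, the sketch below repeats them as standalone local functions and round-trips a few frequencies. The constant 1127.01048 is the natural-log form of the mel formula, chosen so that 1000 Hz maps to 1000 mels; the list of test frequencies is just an illustration.

```lua
-- Standalone copies of the conversions above (natural log / exp form).
local function frequencyToMel(frq)
  return 1127.01048 * math.log(1 + frq / 700)
end

local function melToFrequency(mel)
  return 700 * (math.exp(mel / 1127.01048) - 1)
end

-- Equal 1000 Hz steps add fewer and fewer mels as frequency rises,
-- and converting back recovers the original frequency.
for _, hz in ipairs({ 0, 1000, 2000, 4000, 8000 }) do
  local mel = frequencyToMel(hz)
  print(string.format("%5d Hz -> %7.2f mel -> %7.2f Hz", hz, mel, melToFrequency(mel)))
end
```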
A Mel Filterbank is a set of overlapping bands that are evenly spaced (linear) in Mel space, and therefore non-linear in Frequency (Hz) space:
[Figure: A Mel Filterbank of overlapping bands. Note that it's non-linear in frequency space. Not my artwork, but I'm not sure who to credit here.]
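To see that non-linearity numerically, the sketch below divides 0 to 5000 Hz into eight intervals of equal width in mel space and prints their edges in Hz. The helper melSpacedEdges is hypothetical, written only for this illustration; the intervals get steadily wider as frequency rises.

```lua
local function frequencyToMel(frq) return 1127.01048 * math.log(1 + frq / 700) end
local function melToFrequency(mel) return 700 * (math.exp(mel / 1127.01048) - 1) end

-- Hypothetical helper: split 0..maxFrq into `steps` intervals of equal
-- width in mel space and return the interval edges in Hz.
local function melSpacedEdges(maxFrq, steps)
  local maxMel = frequencyToMel(maxFrq)
  local edges = {}
  for k = 0, steps do
    edges[#edges + 1] = melToFrequency(maxMel * k / steps)
  end
  return edges
end

local edges = melSpacedEdges(5000, 8)
for k = 2, #edges do
  print(string.format("%7.1f .. %7.1f Hz (width %6.1f Hz)",
    edges[k - 1], edges[k], edges[k] - edges[k - 1]))
end
```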
So once a chunk of audio is analyzed via FFT, the amplitudes are placed into the Mel Filterbank. For aspirated sounds, the resynthesis is accomplished by passing white noise through the filterbank.
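The post calls a whiteNoise() function without showing it. A minimal stand-in, assuming uniformly distributed samples in [-1, 1) are acceptable as the excitation (uniform noise has a flat spectrum just like Gaussian noise), could be:

```lua
-- Hypothetical noise source: uniform white noise in [-1, 1).
-- math.random() with no arguments returns a float in [0, 1).
local function whiteNoise()
  return math.random() * 2 - 1
end

-- Usage: the mean of many samples should sit close to zero.
math.randomseed(12345)
local sum = 0
for _ = 1, 10000 do sum = sum + whiteNoise() end
print("mean of 10000 samples:", sum / 10000)
```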
The filterbank is created by specifying the maximum frequency, and the number of bands:
```lua
-- create a filter bank
local fb = filterBank( 5000, 16 )
```
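To get a feel for what filterBank(5000, 16) lays out, the sketch below reproduces the constructor's band geometry without creating any resonators: band i (counting from 0) is centred at (i+1)·halfWide mels and spans i·halfWide to (i+2)·halfWide, and the resonator bandwidth is that full span divided by 3.5. The helper name bandLayout is hypothetical.

```lua
local function frequencyToMel(frq) return 1127.01048 * math.log(1 + frq / 700) end
local function melToFrequency(mel) return 700 * (math.exp(mel / 1127.01048) - 1) end

-- Hypothetical helper mirroring the constructor's layout; returns centre
-- frequencies and resonator bandwidths in Hz, indexed 0 .. bankCount-1.
local function bandLayout(maxFrq, bankCount)
  local halfWide = frequencyToMel(maxFrq) / bankCount
  local centres, bandwidths = {}, {}
  for i = 0, bankCount - 1 do
    local left, mid, right = i * halfWide, (i + 1) * halfWide, (i + 2) * halfWide
    centres[i] = melToFrequency(mid)
    -- full band width, shrunk by the same factor of 3.5
    bandwidths[i] = (melToFrequency(right) - melToFrequency(left)) / 3.5
  end
  return centres, bandwidths, halfWide
end

local centres, bandwidths, halfWide = bandLayout(5000, 16)
print(string.format("halfWide = %.2f mels", halfWide))
for i = 0, 15 do
  print(string.format("band %2d: centre %7.1f Hz, bandwidth %6.1f Hz",
    i, centres[i], bandwidths[i]))
end
```

Note that the highest band is centred exactly on maxFrq, so its upper half extends above the frequencies that add() accepts.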
At this point, I'm just performing copy synthesis. A buffer of audio is analyzed by the FFT to get the real and imaginary parts of each bin, from which the bin's magnitude is calculated. Each magnitude is then added to the Mel Filterbank. Once that's done, I can pass white noise through the filterbank and write the result to a buffer:
```lua
-- perform analysis
local re, im = fft( buffer )

-- clear the filterBank values
fb:clear()

-- calculate the frequencies and magnitudes
for i = 1, fftSize/2 do
  local frq = binFrequency( i, fftSize )
  local mag = sqrt( re[i]*re[i] + im[i]*im[i] )
  fb:add( frq, mag )
end

-- generate output
for i = startSample, endSample do
  -- send white noise into the filter bank and place in the outBuffer
  outBuffer[i] = fb:tick( whiteNoise() )
end
```
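The fft routine itself isn't shown in the post. For experimenting with the analysis loop without an FFT library, a naive DFT with the same return shape can stand in: two 1-based tables where index 1 is DC, matching binFrequency's (binNumber-1) offset. This dft function is a hypothetical O(N²) substitute, fine for testing but far too slow for real-time use.

```lua
-- Hypothetical stand-in for fft(): naive O(N^2) DFT.
-- Returns 1-based re/im tables; index k+1 holds the component that
-- completes k cycles over the buffer (index 1 is DC).
local function dft(buffer)
  local n = #buffer
  local re, im = {}, {}
  for k = 0, n - 1 do
    local sumRe, sumIm = 0, 0
    for t = 0, n - 1 do
      local angle = -2 * math.pi * k * t / n
      sumRe = sumRe + buffer[t + 1] * math.cos(angle)
      sumIm = sumIm + buffer[t + 1] * math.sin(angle)
    end
    re[k + 1], im[k + 1] = sumRe, sumIm
  end
  return re, im
end

-- Usage: a unit cosine completing 5 cycles in 64 samples lands in index 6,
-- with a magnitude of about n / 2.
local n = 64
local buffer = {}
for t = 0, n - 1 do buffer[t + 1] = math.cos(2 * math.pi * 5 * t / n) end
local re, im = dft(buffer)
print(math.sqrt(re[6] ^ 2 + im[6] ^ 2))
```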
For completeness, here's the binFrequency routine, which calculates the frequency associated with a given FFT bin, along with its inverse, frequencyToBin:
```lua
local floor = math.floor
-- SAMPLE_RATE is defined elsewhere in the program

-- Return the frequency of a bin (bin 1 is DC)
local function binFrequency( binNumber, fftSize )
  -- frequency per bin is SAMPLE_RATE / FFT_SIZE
  return (binNumber-1) * SAMPLE_RATE / fftSize
end

-- Return the rounded bin number a frequency belongs to
local function frequencyToBin( frequency, fftSize )
  -- inverse of binFrequency: round to the nearest bin, then add 1
  -- for the 1-based bin numbering
  return floor( frequency * fftSize / SAMPLE_RATE + 0.5 ) + 1
end
```
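A quick way to check that the two helpers really are inverses is to round-trip every bin. The sketch below assumes SAMPLE_RATE = 44100 and an fftSize of 1024 (both chosen for illustration), and uses the rounded, 1-based inverse floor(frequency * fftSize / SAMPLE_RATE + 0.5) + 1 so that it lines up with binFrequency's (binNumber-1) offset.

```lua
local SAMPLE_RATE = 44100   -- assumed sample rate for this sketch
local floor = math.floor

-- Frequency of a 1-based FFT bin (bin 1 is DC).
local function binFrequency(binNumber, fftSize)
  return (binNumber - 1) * SAMPLE_RATE / fftSize
end

-- Nearest 1-based bin for a frequency: the inverse of binFrequency.
local function frequencyToBin(frequency, fftSize)
  return floor(frequency * fftSize / SAMPLE_RATE + 0.5) + 1
end

local fftSize = 1024
print(binFrequency(2, fftSize))          -- width of one bin in Hz
print(frequencyToBin(1000, fftSize))     -- nearest bin to 1 kHz
```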
The filter bank uses band pass filters, the same as Dennis Klatt used in MITalk:
```lua
----------------------
-- BAND PASS FILTER --
----------------------

local exp, cos, pi = math.exp, math.cos, math.pi

-- Klatt resonator constants; SAMPLE_RATE is defined elsewhere in the program
local MINUS_PI_T = -pi / SAMPLE_RATE
local TWO_PI_T = 2 * pi / SAMPLE_RATE

-- Calculate the response parameters for a simple bandpass filter
-- given the frequency and bandwidth
local function resonator_set( self, frequency, bandwidth )
  -- calculate resonator response
  local r = exp(MINUS_PI_T * bandwidth)
  self.c = -(r*r)
  self.b = r * 2 * cos(TWO_PI_T * frequency)
  -- normalize for unity gain at DC
  self.a = 1 - self.b - self.c
end

-- pass sampleIn into the resonator, getting out as the result
local function resonator_tick( self, sampleIn )
  -- two-pole recursion: out = a*in + b*out[n-1] + c*out[n-2]
  local out = self.a*sampleIn + self.b*self.z1 + self.c*self.z2
  self.z2 = self.z1
  self.z1 = out
  return out
end

-- clear the resonator history
local function resonator_clear( self )
  self.z1, self.z2 = 0, 0
end

-- create a new resonator with the methods tick, set and clear
function resonator( frequency, bandwidth )
  -- set the initial values
  local self = { z1=0, z2=0, a=0, b=0, c=0,
    tick=resonator_tick, set=resonator_set, clear=resonator_clear }
  -- set the resonator
  self:set( frequency, bandwidth )
  return self
end
```
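The coefficient formulas can be checked in isolation. The sketch below rebuilds the same two-pole resonator as a closure, assuming a 16 kHz sample rate and a 1000 Hz centre with 100 Hz bandwidth chosen only for illustration, and measures its gain. Because a = 1 - b - c, a constant input settles to the same constant (unity gain at DC), while a sine at the centre frequency is amplified and one well above it is attenuated. The helper names newResonator and gainAt are hypothetical.

```lua
local SAMPLE_RATE = 16000                 -- assumed for this sketch
local MINUS_PI_T = -math.pi / SAMPLE_RATE
local TWO_PI_T = 2 * math.pi / SAMPLE_RATE

-- Klatt-style two-pole resonator with the same coefficient formulas.
local function newResonator(frequency, bandwidth)
  local r = math.exp(MINUS_PI_T * bandwidth)
  local c = -(r * r)
  local b = 2 * r * math.cos(TWO_PI_T * frequency)
  local a = 1 - b - c
  local z1, z2 = 0, 0
  return function(x)
    local y = a * x + b * z1 + c * z2
    z2, z1 = z1, y
    return y
  end
end

-- Drive a fresh resonator with a unit sine at `frq` and return the peak
-- output once the start-up transient has died away.
local function gainAt(frq, centre, bandwidth)
  local tick = newResonator(centre, bandwidth)
  local peak = 0
  for n = 0, 7999 do
    local y = tick(math.sin(TWO_PI_T * frq * n))
    if n >= 6000 then peak = math.max(peak, math.abs(y)) end
  end
  return peak
end

-- A constant input of 1 settles to an output of 1.
local step = newResonator(1000, 100)
local dcOut = 0
for _ = 1, 5000 do dcOut = step(1) end
print("DC gain", dcOut)
print("gain at 1000 Hz", gainAt(1000, 1000, 100))
print("gain at 4000 Hz", gainAt(4000, 1000, 100))
```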
Here's the complete Mel Filterbank class. No guarantees that it's not riddled with bugs.
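The add method converts the frequency to mels, works out which filters (at most two, since adjacent bands overlap) it falls in, and adds the amplitude, weighted by how close the frequency is to each filter's center, to each filter's sum.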
The tick method runs a sample through all the filters in parallel, outputting the net result. Note that I "cheat" the bandwidth a bit by dividing it by the somewhat arbitrary value of 3.5 to give the output sound a bit more focus.
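In the constructor, band k (counting from 0) is centred at (k+1)·halfWide mels and spans k·halfWide to (k+2)·halfWide. A frequency whose mel value lies between k·halfWide and (k+1)·halfWide therefore sits on band k's rising slope and band k-1's falling slope, and filterBank_add gives the first a fraction t of the amplitude and the second 1 - t. The sketch below isolates that split in a hypothetical helper, splitAmplitude. One side effect shows up in it: below the first band's centre (index 0) there is no band -1, so the (1 - t) share is simply dropped.

```lua
-- Hypothetical helper: the amplitude split performed by filterBank_add,
-- returned as { [bandIndex] = weightedAmplitude }.
local function splitAmplitude(mels, halfWide, amp)
  local index = math.floor(mels / halfWide)       -- band whose rising slope holds mels
  local t = (mels - index * halfWide) / halfWide  -- 0 at left edge, 1 at centre
  local weights = { [index] = t * amp }
  if index > 0 then
    weights[index - 1] = (1 - t) * amp           -- falling slope of the band below
  end
  return weights
end

-- Usage: 250 mels is halfway between the centres of bands 1 (200) and 2 (300).
local w = splitAmplitude(250, 100, 1)
print(w[2], w[1])
```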
```lua
-- Add amp to the corresponding banks in self.
local function filterBank_add( self, frq, amp )
  -- reject frequencies greater than maximum frequency
  if frq > self.maxFrq then return end

  -- convert to mels
  local mels = frequencyToMel( frq )

  -- get the half-width of the bank
  local halfWide = self.halfWide

  -- find the mel band whose left edge lies at or below mels
  local index = math.floor( mels / halfWide )
  local left = index * halfWide

  -- get the weight: 0 at the band's left edge, 1 at its center
  local t = (mels-left) / halfWide

  -- add to value
  self[index] = (self[index] or 0) + (t * amp)

  -- add to the overlapping band on the left?
  if index > 0 then
    -- overlaps with band to the left, put the remaining weight into it
    self[index-1] = (self[index-1] or 0) + ( (1-t) * amp)
  end
end

-- run the sample through all the filterbank resonators, returns result
local function filterBank_tick( self, sample )
  local res = self.res
  -- holds sum
  local out = 0
  -- bands are numbered 0 .. count-1 (ipairs would skip band 0)
  for i = 0, self.count-1 do
    -- run the value through the resonator, scale and add to the sum
    out = out + (res[i]:tick( sample ) * self[i])
  end
  -- return the sum
  return out
end

-- zero the band values (bands are numbered 0 .. count-1)
local function filterBank_clear( self )
  for i = 0, self.count-1 do
    self[i] = 0
  end
end

-- Return a table with empty values, and halfWide representing
-- half-width of a single filter, in mel space
function filterBank( maxFrq, bankCount )
  -- the banks
  local self = {}

  -- convert frequency to mels
  local mels = frequencyToMel( maxFrq )
  local halfWide = mels / bankCount

  -- set values
  self.halfWide = halfWide
  self.count = bankCount
  self.maxFrq = maxFrq

  -- zero the bank values
  for i = 0, bankCount-1 do
    self[i] = 0
  end

  -- create the resonators
  local res = {}
  self.res = res
  local priorMid = 0
  for i = 0, bankCount-1 do
    -- center frequency for the band
    local nextMid = priorMid + halfWide
    local frq = melToFrequency( nextMid )

    -- bandwidth extends full width
    local bw = melToFrequency( nextMid+halfWide ) - melToFrequency( priorMid )

    -- create a resonator
    -- Note the arbitrary shrinking of the bandwidth to focus it a bit more
    res[i] = resonator( frq, bw / 3.5 )

    -- change history
    priorMid = nextMid
  end

  -- add methods
  self.add = filterBank_add
  self.tick = filterBank_tick
  self.clear = filterBank_clear

  return self
end
```
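The constructor stores both the band weights and the resonators at indices 0 through bankCount-1. Lua's ipairs starts at index 1 and stops at the first nil, so any loop written with ipairs over those tables silently skips the lowest band, and a clear loop running from 1 to count leaves index 0 untouched. A tiny demonstration, using a made-up four-band table:

```lua
-- Bands stored at indices 0 .. 3, as the constructor does.
local bands = {}
for i = 0, 3 do bands[i] = "band " .. i end

-- ipairs starts at index 1, so band 0 is never visited.
local viaIpairs = {}
for _, name in ipairs(bands) do viaIpairs[#viaIpairs + 1] = name end

-- An explicit numeric loop from 0 covers every band.
local viaLoop = {}
for i = 0, 3 do viaLoop[#viaLoop + 1] = bands[i] end

print(#viaIpairs, #viaLoop)
```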
That's about all there is to this. All code is subject to change, and I may decide tomorrow to take an entirely different approach.