I know I'd been saying that I'd use the IFFT for rendering unvoiced sounds, but there was just too much overhead to it, so I decided to take another approach.

I decided to try implementing a proper Mel Filterbank instead. The mel scale is set up so that equal distances in mel space correspond to roughly equal perceived differences in pitch.

It seems to work "well enough", and it's basically the approach I used to encode unvoiced audio in prior implementations. It still sounds a bit rough, but it'll do for now.

What follows is a whole bunch of code, probably not of interest to many people. Those who don't know about this sort of stuff will (rightfully) continue to not care, and those who do know about these things will be using far better implementations than what I'm presenting below.

Converting back and forth from Mel to Frequency looks like this:

-- Convert **frq** from frequency to non-linear mel space
function frequencyToMel( frq )
  return 1127.01048 * math.log(1+frq/700)
end

-- Convert a **mel** from mel space to frequency space
function melToFrequency(mel)
  return 700 * (math.exp(mel/1127.01048)-1)
end
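As a quick sanity check, here's a minimal sketch that repeats the two formulas (as locals, so the snippet runs on its own) and prints a few values. The constant 1127.01048 is chosen so that 1000 Hz comes out at very nearly 1000 mels, and the loop shows that equal steps in mel space are unequal steps in Hz.

```lua
-- Same formulas as above, repeated so this snippet is self-contained
local function frequencyToMel( frq )
  return 1127.01048 * math.log( 1 + frq/700 )
end

local function melToFrequency( mel )
  return 700 * ( math.exp( mel/1127.01048 ) - 1 )
end

-- 1000 Hz lands at almost exactly 1000 mels
print( frequencyToMel( 1000 ) )

-- equal steps in mel space are increasingly large steps in Hz
for mel = 0, 2000, 500 do
  print( mel, melToFrequency( mel ) )
end
```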

A Mel Filterbank is a set of overlapping bands that are linear in Mel space, and non-linear in Frequency (Hz) space:

A Mel Filterbank of overlapping bands. Note that it's non-linear in frequency space. Not my artwork, but I'm not sure who to credit here.

So once a chunk of audio is analyzed via FFT, the amplitudes are placed into the Mel Filterbank. For aspirated sounds, the resynthesis is accomplished by passing white noise through the filterbank.

The filterbank is created by specifying the maximum frequency, and the number of bands:

-- create a filter bank
local fb = filterBank( 5000, 16 )
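To get a feel for where those 16 bands end up, here's a rough sketch that prints their center frequencies. It assumes the layout used by the filterBank code in this post: the mel range from 0 to the maximum frequency is split into equal half-widths, and band i (counting from 0) is centered i+1 half-widths up the mel axis.

```lua
local function frequencyToMel( frq )
  return 1127.01048 * math.log( 1 + frq/700 )
end

local function melToFrequency( mel )
  return 700 * ( math.exp( mel/1127.01048 ) - 1 )
end

local maxFrq, bankCount = 5000, 16

-- half-width of one band, in mels
local halfWide = frequencyToMel( maxFrq ) / bankCount

-- band i (0-based) is centered (i+1) half-widths up the mel axis
local centers = {}
for i = 0, bankCount-1 do
  centers[i] = melToFrequency( (i+1) * halfWide )
  print( i, string.format( "%.1f Hz", centers[i] ) )
end
```

The centers are evenly spaced in mels, so in Hz they bunch together at the low end and spread out toward 5000 Hz, where the last band is centered.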

At this point, I'm just performing copy synthesis. A buffer of audio is analyzed by the FFT to determine the real and imaginary elements. From this, the magnitude of the value in the bin can be calculated. This is then stored in the Mel Filterbank. Once that's done, I can pass in white noise and write that to a buffer:

  -- perform analysis
  local re, im = fft( buffer )

  -- clear the filterBank values
  fb:clear()

  -- calculate the frequencies and magnitudes
  for i = 1, fftSize/2 do
    local frq = binFrequency( i, fftSize )
    local mag = math.sqrt( re[i]*re[i] + im[i]*im[i] )
    fb:add( frq, mag )
  end

  -- generate output
  for i = startSample, endSample do
    -- send white noise into the filter bank and place in the **outBuffer**
    outBuffer[i] = fb:tick( whiteNoise() )
  end
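The whiteNoise routine isn't shown in this post. A minimal stand-in, assuming uniformly distributed samples are good enough for this purpose, might look like the following; the name matches the call above, but the body is only a guess at one reasonable implementation.

```lua
-- Hypothetical stand-in for the whiteNoise() call above:
-- returns a uniformly distributed sample in the range [-1, 1)
local function whiteNoise()
  return math.random() * 2 - 1
end

-- fill a short buffer with noise
local noise = {}
for i = 1, 8 do
  noise[i] = whiteNoise()
end
```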

For completeness, here's the binFrequency routine, which calculates the frequency associated with a given FFT bin:

-- Return the frequency of a bin
local function binFrequency( binNumber, fftSize )
  -- frequency per bin is SAMPLE_RATE / FFT_SIZE
  return (binNumber-1) * SAMPLE_RATE / fftSize
end

-- Return the rounded bin number a frequency belongs to
local function frequencyToBin( frequency, fftSize )
  -- inverse of binFrequency: bins are 1-based, so bin 1 is 0 Hz
  return math.floor( frequency * fftSize / SAMPLE_RATE + 0.5 ) + 1
end

The filter bank uses band pass filters, the same as Dennis Klatt used in MITalk:

----------------------
-- BAND PASS FILTER --
----------------------

-- MINUS_PI_T and TWO_PI_T are constants defined elsewhere, presumably
-- -pi/SAMPLE_RATE and 2*pi/SAMPLE_RATE
local exp, cos = math.exp, math.cos

-- Calculate the response parameters for a simple bandpass filter
-- given the **frequency** and **bandwidth**
local function resonator_set( self, frequency, bandwidth )
  -- calculate resonator response
  local r = exp(MINUS_PI_T * bandwidth)
  self.c = -(r*r)
  self.b = r * 2 * cos(TWO_PI_T * frequency)
  self.a = 1 - self.b - self.c
end

-- pass **sampleIn** into the resonator, getting **out** as the result
local function resonator_tick( self, sampleIn )
  local out = self.a*sampleIn + self.b*self.z1 + self.c*self.z2
  self.z2 = self.z1
  self.z1 = out
  return out
end

-- clear the resonator history
local function resonator_clear( self )
  self.z1, self.z2 = 0, 0
end

-- create a new resonator with the methods **tick**, **set** and **clear**
function resonator( frequency, bandwidth )
  -- set the initial values
  local self = { z1=0, z2=0, a=0, b=0, c=0, tick=resonator_tick, set=resonator_set, clear=resonator_clear }
  -- set the resonator
  self:set( frequency, bandwidth )
  return self
end
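One property worth knowing about this resonator: because a is set to 1 - b - c, its gain at 0 Hz is exactly 1, whatever the frequency and bandwidth. The sketch below checks that by feeding in a constant input and letting the output settle. The 16000 Hz sample rate and the definitions of MINUS_PI_T and TWO_PI_T are my assumptions here, not values taken from the code above.

```lua
local SAMPLE_RATE = 16000                   -- assumed sample rate
local MINUS_PI_T  = -math.pi / SAMPLE_RATE  -- assumed definition
local TWO_PI_T    = 2 * math.pi / SAMPLE_RATE

-- compute the three coefficients the same way resonator_set does
local function coefficients( frequency, bandwidth )
  local r = math.exp( MINUS_PI_T * bandwidth )
  local c = -(r*r)
  local b = r * 2 * math.cos( TWO_PI_T * frequency )
  local a = 1 - b - c
  return a, b, c
end

local a, b, c = coefficients( 1000, 100 )

-- feed a constant input of 1 and let the output settle
local z1, z2, out = 0, 0, 0
for i = 1, 5000 do
  out = a*1 + b*z1 + c*z2
  z2 = z1
  z1 = out
end

-- settles at the input level, since the gain at 0 Hz is 1
print( out )
```

The gain at the center frequency is considerably higher than 1 for narrow bandwidths, which is worth keeping in mind when summing several of these in parallel.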

Here's the complete Mel Filterbank class. No guarantees that it's not riddled with bugs.

The add method works in Mel space to determine which filters (if any) the frequency falls in, and adds the weight to the filter's sum.

The tick method runs a sample through all the filters in parallel, outputting the net result. Note that I "cheat" the bandwidth a bit by dividing it by the somewhat arbitrary value of 3.5 to give the output sound a bit more focus.

-- Add **amp** to the corresponding banks in **self**.
local function filterBank_add( self, frq, amp )

  -- reject frequencies greater than maximum frequency
  if frq > self.maxFrq then
    return
  end

  -- convert to mels
  local mels = frequencyToMel( frq )

  -- get the half-width of the bank
  local halfWide = self.halfWide

  -- find the band edge that lies on the left side of the frequency
  local index = math.floor( mels / halfWide )
  local left = index * halfWide

  -- get the weight - how close the frequency is to the center of the band
  local t = (mels-left) / halfWide

  -- add to value
  self[index] = (self[index] or 0) + (t * amp)

  -- add to the overlapping band on the left?
  if index > 0 then
    -- overlaps with band to the left, put the remaining weight into it
    self[index-1] = (self[index-1] or 0) + ( (1-t) * amp)
  end

end

-- run the sample through all the filterbank resonators, returns result
local function filterBank_tick( self, sample )

  -- the filterbank resonators
  local res = self.res

  -- holds sum
  local out = 0

  -- iterate through all the resonators; they are indexed from 0,
  -- so ipairs (which starts at 1) would skip the first one
  for i = 0, self.count-1 do
    -- run the value through the resonator, scale and add to the sum
    out = out + (res[i]:tick( sample ) * self[i])
  end

  -- return the sum
  return out

end

-- reset all the bank values to zero (banks are indexed from 0)
local function filterBank_clear( self )
  for i = 0, self.count-1 do
    self[i] = 0
  end
end

-- Return a table with empty values, and **halfWide** representing
-- half-width of a single filter, in mel space
function filterBank( maxFrq, bankCount )

  -- the banks
  local self = {}

  -- convert frequency to mels
  local mels = frequencyToMel( maxFrq )
  local halfWide = mels / bankCount

  -- set values
  self.halfWide = halfWide
  self.count = bankCount
  self.maxFrq = maxFrq

  -- zero the bank values
  for i = 0, bankCount-1 do
    self[i] = 0
  end

  -- create the resonators
  local res = {}
  self.res = res
  local priorMid = 0
  for i = 0, bankCount-1 do
    -- center frequency for the band
    local nextMid = priorMid + halfWide
    local frq = melToFrequency( nextMid )

    -- bandwidth extends full width
    local bw = melToFrequency( nextMid+halfWide ) - melToFrequency( priorMid )

    -- create a resonator
    -- Note the arbitrary shrinking of the bandwidth to focus it a bit more
    res[i] = resonator( frq, bw / 3.5 )

    -- change history
    priorMid = nextMid
  end

  -- add methods
  self.add = filterBank_add
  self.tick = filterBank_tick
  self.clear = filterBank_clear

  return self

end
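To make the weighting in the add method a bit more concrete, here's a small standalone sketch of just that calculation for a 5000 Hz, 16 band filterbank. The bandWeights helper is a name I made up for illustration; it returns the band index along with the share of the amplitude that goes to that band and the share that goes to its left neighbour.

```lua
local function frequencyToMel( frq )
  return 1127.01048 * math.log( 1 + frq/700 )
end

local maxFrq, bankCount = 5000, 16
local halfWide = frequencyToMel( maxFrq ) / bankCount

-- Hypothetical helper: returns the band index for **frq**, the weight
-- for that band, and the weight for the band to its left
local function bandWeights( frq )
  local mels = frequencyToMel( frq )
  local index = math.floor( mels / halfWide )
  local t = ( mels - index*halfWide ) / halfWide
  return index, t, 1 - t
end

-- a 1000 Hz component falls in band 6, with most of its weight there
local index, upper, lower = bandWeights( 1000 )
print( index, upper, lower )
```

The two weights always sum to 1, so the amplitude is split between the two overlapping bands rather than counted twice. The exception is band 0, which has no left neighbour, so that share is simply dropped.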

That's about all there is to this. All code is subject to change, and I may decide tomorrow to take an entirely different approach.

