Previous Up Next

21.4.5  Short-time Fourier transform

The stft command computes the short-time Fourier transform (STFT) of a real signal. Input data is split into chunks of the specified size with half-overlap, and FFT is computed on each chunk separately, producing a time-evolving spectrum.

The istft command computes the inverse short-time Fourier transform by restoring the individual chunks and summing them up to restore the original signal.

The STFT is a tool for spectral manipulation of a signal. Individual chunk spectra can be modified (e.g. in order to remove the noise or do a sharp lowpass/bandpass/highpass filtering) after calling stft and before calling istft. Overlap summation in inverse STFT provides for seamless crossfading between the modified chunks.

Note that by choosing the window size m you also set the number of bins for frequency spectra to m. Thus smaller m means finer time resolution and coarser frequency resolution, while larger m means coarser time resolution and finer frequency resolution.

Example

With the file CantinaBand60.wav downloaded from here, enter:

music:=readwav("/home/luka/Downloads/CantinaBand60.wav")(0.0..15.0)
     
a sound clip with 330750 samples at 22050 Hz (16 bit, mono)           

To hear the loaded audio, enter:

playsnd(music)

The clip contains 15 seconds of a band playing. To hear only the bass part and hi-hat drum, you can use stft to transform the original clip, apply a spectral band-reject filter between 180 and 6000 Hz, and obtain the filtered version with istft. In this example the filter decays over time, so it appears as if the rest of the band enter with a fade-in.

A window of half-size m=8192 is used in order to obtain a good frequency resolution. Enter:

data,N,lo,hi,sr:=channel_data(music),8192,180,6000,samplerate(music):;

Boundary bin indices for the band-reject filter are obtained by multiplying m resp. n with the quotient of lo resp. hi and the upper frequency bound of the spectrum (i.e. the Nyquist frequency sr/2):

m,n:=round(2*N*lo/sr,2*N*hi/sr)
     
134,4458           

To compute the STFT:

spctrm:=stft(data,N):;

To attenuate the bins between m and n from zero to one over time, enter:

tmp:=tran(spctrm):; alpha:=linspace(0.0,1.0,length(spctrm)).^4:; band:=mid(tmp,m,n-m):; for k from 0 to n-m-1 do band[k]=<band[k].*alpha; od:;
spctrm2:=tran(concat(left(tmp,m),band,right(tmp,N-n+m))):;

The last commandline assembles the modified spectrum. Now compute the inverse STFT and create an audio clip from the obtained data:

fadein:=createwav(istft(spctrm2),samplerate=sr,normalize=-1)
     
a sound clip with 341366 samples at 22050 Hz (16 bit, mono)           

Now by entering

playsnd(fadein)

you can clearly hear the bass part accompanied by the hi-hat rhythm, but no other instruments in the beginning. About halfway into the excerpt, other instruments fade in gradually, reaching the full intensity in the end.


Previous Up Next