Short-time Fourier transform

21.4.5 Short-time Fourier transform

The stft command computes the short-time Fourier transform (STFT) of a real signal. Input data is split into chunks of the specified size with half-overlap, and FFT is computed on each chunk separately, producing a time-evolving spectrum.

stft takes one mandatory argument and one or two optional arguments:
- L, a list of n real numbers.
- Optionally, m, a positive integer which is a power of 2 (by default, m=128). This is the half-size of a chunk.
- Optionally, wf, a window function (by default, wf=hamming_window).
stft(L ⟨,m,wf ⟩) returns a matrix with N=n/m−1 rows and m columns, where n is first snapped to the smallest multiple of m greater than or equal to n (the list L is padded with zeros). The kth row of the matrix is the left half of the FFT spectrum (since the signal L is real, we only go up to the Nyquist frequency) of the kth chunk of length 2m starting at offset k m for k=0,1,…,N−1. The window function wf is applied to each chunk before computing FFT.

The istft command computes the inverse short-time Fourier transform by restoring the individual chunks and summing them up to restore the original signal.

istft takes S, a matrix of complex numbers with N rows and m columns.
istft returns L, a list of (m+1)N real numbers represeting the signal with STFT equal to S. Note that the restoration is generally not perfect: the restored signal will have a fade-in and fade-out of length m, resulting from applying the window function when perfoming STFT, and possibly trailing zeros at the end. Also, the window function for STFT must be chosen so that it sums to a constant on the half-overlap (e.g. Hamming, Hann, and Bartlett-Hann window, see Section 21.5).

The STFT is a tool for spectral manipulation of a signal. Individual chunk spectra can be modified (e.g. in order to remove the noise or do a sharp lowpass/bandpass/highpass filtering) after calling stft and before calling istft. Overlap summation in inverse STFT provides for seamless crossfading between the modified chunks.

Note that by choosing the window size m you also set the number of bins for frequency spectra to m. Thus smaller m means finer time resolution and coarser frequency resolution, while larger m means coarser time resolution and finer frequency resolution.

Example

With the file CantinaBand60.wav downloaded from here, enter:

music:=readwav("/home/luka/Downloads/CantinaBand60.wav")(0.0..15.0)

a sound clip with 330750 samples at 22050 Hz (16 bit, mono)

To hear the loaded audio, enter:

playsnd(music)

The clip contains 15 seconds of a band playing. To hear only the bass part and hi-hat drum, you can use stft to transform the original clip, apply a spectral band-reject filter between 180 and 6000 Hz, and obtain the filtered version with istft. In this example the filter decays over time, so it appears as if the rest of the band enter with a fade-in.

A window of half-size m=8192 is used in order to obtain a good frequency resolution. Enter:

data,N,lo,hi,sr:=channel_data(music),8192,180,6000,samplerate(music):;

Boundary bin indices for the band-reject filter are obtained by multiplying m resp. n with the quotient of lo resp. hi and the upper frequency bound of the spectrum (i.e. the Nyquist frequency sr/2):

m,n:=round(2*N*lo/sr,2*N*hi/sr)

134,4458

To compute the STFT:

spctrm:=stft(data,N):;

To attenuate the bins between m and n from zero to one over time, enter:

tmp:=tran(spctrm):; alpha:=linspace(0.0,1.0,length(spctrm)).^4:; band:=mid(tmp,m,n-m):; for k from 0 to n-m-1 do band[k]=<band[k].*alpha; od:;

spctrm2:=tran(concat(left(tmp,m),band,right(tmp,N-n+m))):;

The last commandline assembles the modified spectrum. Now compute the inverse STFT and create an audio clip from the obtained data:

fadein:=createwav(istft(spctrm2),samplerate=sr,normalize=-1)

a sound clip with 341366 samples at 22050 Hz (16 bit, mono)

Now by entering

playsnd(fadein)

you can clearly hear the bass part accompanied by the hi-hat rhythm, but no other instruments in the beginning. About halfway into the excerpt, other instruments fade in gradually, reaching the full intensity in the end.