21.4.5 Short-time Fourier transform
The stft command computes the short-time
Fourier transform (STFT) of a real signal. Input data is split into chunks of the
specified size with half-overlap, and FFT is computed on each chunk separately,
producing a time-evolving spectrum.
-
stft takes one mandatory argument and one or two optional arguments:
-
L, a list of n real numbers.
- Optionally, m, a positive integer which is a power of 2 (by default, m=128).
This is the half-size of a chunk.
- Optionally, wf, a window function
(by default, wf=hamming_window).
- stft(L ⟨,m,wf ⟩) returns a matrix
with N=n/m−1 rows and m columns, where n is first snapped to the smallest multiple of m
greater than or equal to n (the list L is padded with zeros). The kth row of
the matrix is the left half of the FFT spectrum (since the signal L is real, we only
go up to the Nyquist frequency) of the kth chunk of length 2m starting at
offset k m for k=0,1,…,N−1. The window function wf is applied
to each chunk before computing FFT.
The istft command computes the inverse
short-time Fourier transform by restoring the individual chunks and summing them
up to restore the original signal.
-
istft takes S, a matrix of complex numbers with N rows and m columns.
- istft returns L, a list of (m+1)N real numbers represeting the
signal with STFT equal to S. Note that the restoration is generally not perfect:
the restored signal will have a fade-in and fade-out of length m, resulting from
applying the window function when perfoming STFT, and possibly trailing zeros at the end.
Also, the window function for STFT must be chosen so that it sums to a constant on the
half-overlap (e.g. Hamming, Hann, and Bartlett-Hann window, see Section 21.5).
The STFT is a tool for spectral manipulation of a signal. Individual chunk spectra
can be modified (e.g. in order to remove the noise or do a sharp lowpass/bandpass/highpass filtering)
after calling stft and before calling istft. Overlap summation in inverse
STFT provides for seamless crossfading between the modified chunks.
Note that by choosing the window size m you also set the number of bins for frequency
spectra to m. Thus smaller m means finer time resolution and coarser frequency resolution,
while larger m means coarser time resolution and finer frequency resolution.
Example
With the file CantinaBand60.wav downloaded from
here, enter:
music:=readwav("/home/luka/Downloads/CantinaBand60.wav")(0.0..15.0) |
|
a sound clip with 330750 samples at 22050 Hz (16 bit, mono)
| | | | | | | | | | |
|
To hear the loaded audio, enter:
The clip contains 15 seconds of a band playing.
To hear only the bass part and hi-hat drum, you can use stft to transform
the original clip, apply a spectral band-reject filter between 180 and 6000 Hz,
and obtain the filtered version with istft. In this example the
filter decays over time, so it appears as if the rest of the band enter with a fade-in.
A window of half-size m=8192 is used in order to obtain a good frequency
resolution. Enter:
data,N,lo,hi,sr:=channel_data(music),8192,180,6000,samplerate(music):; |
Boundary bin indices for the band-reject filter are obtained by multiplying m resp. n
with the quotient of lo resp. hi and the upper frequency bound of
the spectrum (i.e. the Nyquist frequency sr/2):
m,n:=round(2*N*lo/sr,2*N*hi/sr) |
To compute the STFT:
To attenuate the bins between m and n from zero to one over time, enter:
tmp:=tran(spctrm):;
alpha:=linspace(0.0,1.0,length(spctrm)).^4:;
band:=mid(tmp,m,n-m):;
for k from 0 to n-m-1 do band[k]=<band[k].*alpha; od:; |
spctrm2:=tran(concat(left(tmp,m),band,right(tmp,N-n+m))):; |
The last commandline assembles the modified spectrum.
Now compute the inverse STFT and create an audio clip
from the obtained data:
fadein:=createwav(istft(spctrm2),samplerate=sr,normalize=-1) |
|
a sound clip with 341366 samples at 22050 Hz (16 bit, mono)
| | | | | | | | | | |
|
Now by entering
you can clearly hear the bass part accompanied by the hi-hat rhythm, but no other instruments
in the beginning. About halfway into the excerpt, other instruments fade in gradually, reaching
the full intensity in the end.