## Overview

As almost all bioacoustic studies examine frequency in some way or another, it’s important to understand how audio signals are projected from the time domain to the frequency domain. The term frequency domain simply refers to viewing the content of a signal with respect to frequency, as opposed to time. Power spectral density or spectrogram plots are common frequency domain representations of signals. This post will deal with the process of executing this projection and provide context to the interpretation of measures derived from the frequency domain.

As this material can be quite intimidating for biologists, I’ve tried to present the information in a visual manner and write in a less formal language than you’d come across in an acoustics text. Don’t stress about getting lost in the formulas as it may take a few readings before it sinks in (at least it did for me).

## Discrete Fourier Transform

The discrete Fourier transform (DFT) lies at the heart of signal processing. Any periodic signal with $$N$$ samples can be reconstructed using $$N$$ weighted sine functions across a spectrum of frequencies. This allows us to project time domain information into the frequency domain. If $$\mathbf{x} = {x_0 , x_1 , . . . , x_{N −1} }$$ is a vector of pressure measurements over time from some recorded signal, the discrete Fourier transform can be written as:

This may look a bit abstract to those who are new to the concept. A more intuitive way to write this is to apply Euler’s formula $$e^{ix} = \cos(x) + i \cdot \sin(x)$$ and rewrite it with respect to trigonometric functions you’re likely more familiar with.

Here, the term $$k \in [0, N − 1]$$ determines the frequency being examined. $$k$$ can be converted to hertz with

where $\text{Fs}$ is the sample rate of the signal. The image below provides a visual breakdown of the components of the discrete Fourier transform when applied to a 100Hz pure tone. We’ve shown the DFT with respect to two $k$ values: 100Hz and 33.3Hz. The top panels show the pure tone in the time domain and the circular domain the DFT operates in. In the middle and lower plots, blue lines show the measured signal, $\color{blue}x_n$. Shaded green and red areas represent $$\color{green} x_n \cdot \cos \left( 2 \pi n \frac{k}{N} \right)$$ and $$\color{red} -x_n \cdot \sin \left( 2 \pi n \frac{k}{N} \right)$$ which are averaged and combined into a complex value where the complex modulus and argument show the amplitude and phase of the sine wave at the given $k$ value.

The complex modulus of these resulting values in $X_k$ represent the spectral density of each frequency in the signal. If we plot the modulus of $X_k$ from the DFT above and convert our $k$ values to Hz, we may see something like the following spectral density plots. The mirrored values on the two-sided spectrum are a result of the conjugate symmetric property of the DFT when analysing $k$ values beyond the Nyquist frequency. The one-sided spectrum can be calculated by simply removing the negative frequencies and normalizing the remaining y-axis values by multiplying them by 2.

It’s important to note that as the Fourier transform works in a circular domain, it assumes periodic signals. This is an assumption that is never met in practice and simply means that the signal consists of a full cycle of a repeating pattern. To be clear, the figure below provides an illustrated guide to what we mean by periodicity. When we apply the DFT to our non-periodic 100Hz tone, the negative values of the signal are over represented. This causes the incorrect attribution of spectral energy to frequenices outside 100Hz.

## Power Spectral Density

As mentioned, measuring real signals inevitably results in violating the DFT assumption of periodicity. To reduce the effects of spectral leakage on our spectral density estimates, we can apply windowing and averaging.

Windowing reduces spectral leakage by applying an envelope function to our measured signal which tapers the amplitude toward the start and end of the signal. This will result in a more periodic signal at the cost of altering the information within signal, hence the choice of window function is important. Two common choices are the Hann and Hamming windows. Hann windows are designed to minimize the spectral leakage across the whole spectrum while Hamming windows offer less of a reduction of spectral leakage across the spectrum but provides more precise estimates of those frequencies present in the signal. The choice of window type needs to be determined with respect to the purpose of the analysis. The figure above shows DFT results with and without the application of a Hann window. You may have noticed that the areas under each line in the DFT plot are equal. This is termed Parseval’s theorem. When we apply a DFT to a signal, the resulting energy projected onto the frequency domain must be equal to the energy found in the original time domain. Hence, high degrees of spectral leakage can lead to underestimates of the true spectral densities of frequencies (peaks in the DFT will be underestimated more severely) as the spectral energy from the true frequency is incorrectly attributed to other frequencies.

As a consequence of windowing, information located near the beginning or end of our signal is going to be under-represented in the spectral density estimates. With ideal stationary signals this isn’t much of a concern, but when dealing with noisy, non-stationary signals this needs to be addressed. A popular routine to control for this problem is:

• Split the signal into multiple small windows which overlap by 50%.

• Apply a window function to each of these window segments to reduce the spectral leakage.

• Average the DFT results across all window segments.

This protocol for estimating spectral density is termed Welch’s method and is the kernel of all power spectral density (PSD) and spectrogram plots reported in the bioacoustic literature. By choosing smaller window sizes, you’re reducing the number of samples in the DFTs and thus reducing the frequency resolution of the resulting PSD plot. However, smaller window sizes allow averaging across more window segments which results in a smoother plot providing a more accurate description of the spectral power. Converting our spectral density estimates to PSD values is simple as we just need to convert from pressure (root power quantity) to power by squaring. We can then report our PSD values as decibels

where $P$ and $P_0$ represent your measured and reference power values, respectively. If we want to represent our measured signal as a spectrogram rather than a PSD plot, instead of averaging our PSD values over time we simply calculate the PSD values for each window segment and plot them with respect to time and frequency.

Updated on: 2019-07-20