3 Analyses: Formulas & Code

This section provides the definitions for all the analysis types covered by paPAMv2. In the formulae, signal vectors are represented by \(x\) and specific sample \(i\) in a signal is denoted by \(x_i\). Additionally, example code is given in both R and python (blue and peach shaded boxes, respectively). In the example code blocks, the calibrated signal to be analyzed will be represented by the variable x — an R vector or single channel numpy array in Python. For multi-channel analysis types, the data for each channel will be represented as x_x, x_y, x_z, and x_p — the suffix _p indicates the pressure channel of the vector sensor.

For measures presented in decibels, the root-power reference units for pressure, displacement, velocity, and acceleration are 1μPa, 1pm, 1nms⁻¹ and 1μms⁻², respectively.

The analyses below may return either power quantities or root-power quantities. To clarify this, power units will be denoted by the upper case variable P, such as \(P\), and root-power by lower case, \(p\). A power quantity simply means that the returned unit scales linearly with power or energy units. Root-power values on the other hand must be squared to scale linearly. Hence, the decibel formula must be adjusted for root-power quantities: \[P \triangleq p^2.\]

To convert a power quantity, \(P\), into a decibel level, we use \[L = 10 \cdot \mathrm{log}_{10} \left( \frac{P}{P_0} \right),\] where \(P_0\) is our reference power value. When working with root-power quantities, \(p\) and \(p_0\) , we must square our root-power terms, giving \[L = 10 \cdot \mathrm{log}_{10} \left( \frac{p^2}{p_0^2} \right),\] or the following equivalent form \[L = 20 \cdot \mathrm{log}_{10} \left( \frac{p}{p_0} \right).\] The latter form may seem more familiar as its the typical equation given for Sound Pressure Level calculations.

L = 10*log10(P/P_0)
L = 20*log10(p/p_0)

import numpy as np
L = 10*np.log10(P/P_0)
L = 20*np.log10(p/p_0)

3.1 Continuous Sounds

3.1.1 Root-mean-square

A measure of the mean acoustic energy, i.e. acoustic power, of a signal. This has previously been referred to as Sound Pressure Level, Sound Velocity Level, and Sound Acceleration Level, depending of which acoustic quantity is being examined. \[p = \left[ \frac{\sum_{i = 1}^n x_i^2 }{n} \right]^{0.5}, \] where \(n\) is the number of samples in \(x\).

p = sqrt(mean(x^2))

p = mean(x**2)**0.5

3.1.2 Kurtosis

Kurtosis characterizes the shape of your sample distribution. For a recording ofGaussian noise, Kurtosis is expected to be 3. Higher values suggest that your sample has heavy-tails — i.e. you have a relatively more extreme-valued measurements.

\[\beta = \left(t_n - t_0\right) \cdot \frac{\sum^n_{i = 1} x_i^4 \cdot dt} {\left[\sum^n_{i = 1} x_i^2 \cdot dt \right]^2},\] where \(t_0\) and \(t_n\) are the starting and stopping times of your signal, in seconds, and \(dt\) is the sampling period.

# fs: sample rate of signal
t = c(0, length(x)/fs)  # start and stop times of signal
dt = 1/fs # Sample period
num = sum((x^4) * dt)
dnm =  sum((x^2) * dt) ^ 2
kurtosis = (t[2] - t[1]) * num/dnm

# fs: sample rate of signal
t = (0, len(x)/fs)  # start and stop times of signal
dt = 1/fs # Sample period
num = sum((x**4) * dt)
dnm =  sum((x**2) * dt) ** 2
kurtosis = (t[1] - t[0]) * num/dnm

3.2 Transient Sounds

3.2.1 Exposure

The cumulative acoustic energy measured by the sensor. Samples are normalized with respect to 1 second, hence exposure is reported in units such as μPa²·s.

\[ P = \sum_{i = 1}^n x_i^2 \cdot dt,\] where \(dt\) is the sampling period.

# fs: sample rate
dt = 1/fs
P = sum(x^2)*dt

# fs: sample rate
dt = 1/fs
P = sum(x**2)*dt

3.2.2 Zero-to-peak

The maximum absolute value in your sample. A measurement of how loud a transient sound is.

\[p = \textrm{max}\left(|x|\right)\]

p = max(abs(x))

p = max(abs(x))

3.2.3 Impuse Kurtosis

Impulse kurtosis is taken from Mueller et al. (2020). This measure of Kurtosis is unaffected by quiet regions at the start and end of a signal. It was developed to calculate the kurtosis of seismic survey shots in a standardized manner.

\[\beta_\textrm{impulse} = \frac{\sum^n_{i = 1} x_i^4 \cdot dt} {\left[\sum^n_{i = 1} x_i^2 \cdot dt \right]^2}.\]

# fs: sample rate of signal
dt = 1/fs # Sample period
num = sum((x^4) * dt)
dnm =  sum((x^2) * dt) ^ 2
kurtosis_impulse = num/dnm

# fs: sample rate of signal
dt = 1/fs # Sample period
num = sum((x**4) * dt)
dnm =  sum((x**2) * dt) ** 2
kurtosis_impulse = num/dnm

3.2.4 Impuse Root-mean-square

A measure of root-mean-square that is unaffected by quiet periods at the start and end of recordings.

\[p = \left[ \frac{\sum_{i = i_{0}}^{i_n} x_i^2 }{n} \right]^{0.5}, \] where \(i_0\) and \(i_n\) are the start and stop samples of the measurement period and \(n\) is the number of samples within that period. The measurement period can be taken to be the interval where the cumulative exposure lies between the 5th and 95th percentile of the total exposure in the sample.

#fs: sample rate
dt = 1/fs
E = sum(x^2) * dt
n = length(E)

# Cumulative energy quantiles
p_0 = 0.05
p_1 = 0.95

i_0 = which(E >= E[n]*p_0)[1]
i_1 = which(E >= E[n]*p_1)[1]

idx = i_0:i_1
p = sqrt(mean(x[idx]^2))

#fs: sample rate
dt = 1/fs
E = sum(x**2) * dt
n = len(E)

# Cumulative energy quantiles
p_0 = 0.05
p_1 = 0.95

# Get indices of exposure values within period spanning the 5th and 95th
# percent exposure values.
i_all = [i for i, v in enumerate(E) if (v >= E[n]*p_0 or v <= E[n]*p_1) ]
i_0 = i_all[0]
i_1 = i_all[len(i_all)]

idx = range(i_0, i_1)
p = mean(x[idx]**2)**0.5

3.2.5 Crest factor

Crest factor has been used as a measure of for detecting whether or not a sound is “impulsive” (Martin, et al. 2020). It’s the zero-to-peak of the signal divided by the root-mean-square.

\[p = \frac{\textrm{max} \left( | x | \right) }{\left[ \frac{\sum_{i = 1}^n x_i^2 }{n} \right]^{0.5}}\]

p = max(abs(x)) / mean(x^2)^0.5)

import numpy as np
p = np.max(np.abs(x)) / np.mean(x**2)**0.5)

3.2.6 Rise time

Rise time is the number of seconds it takes for a signal to pass between two thresholds defined as some percentiles of the zero-to-peak of the signal. For instance, 5th and 95th percentiles. For an impulse, this is roughly interpreted how long the initial rising slope of the signal lasts.

\[t_\textrm{rise} = t_{95} - t_{5},\] where \(t_5\) and \(t_{95}\) represent the seconds where the 5th and 95th percentiles of zero-to-peak of the signal are first exceeded.

# Set zero-to-peak percentiles
limits = c(0.05, 0.95)

# Calculate peak
peak = max(abs(x))

# Mark start and end points
start = which(abs(x) >= peak*limits[1])[1]
end = which(abs(x) >= peak*limits[2])[1]

# return value
# fs: Sample rate
seconds_rise = (end - start) / fs

import numpy as np

# Set zero-to-peak percentiles
limits = (0.05, 0.95)

# Calculate peak
peak = np.max(abs(x))

# Mark start and end points
start = np.where(abs(x) >= peak*limits[0])[0][0]
end = np.where(abs(x) >= peak*limits[1])[0][0]

# return value
# fs: Sample rate
seconds_rise = (end - start) / fs

3.2.7 Pulse length

The period of time spanning between two thresholds of cumulative exposure. This is a method of measuring the duration of a transient signal which has is robust to quiet periods at the start or end of the signal.

\[t_\textrm{pulse} = t_{95} - t_{5},\] where \(t_5\) and \(t_{95}\) represent the seconds where the cumulative exposure exceeds the 5th and 95th percentiles of total exposure.

# Calculate (un-normalized) cumulative exposure
E = cumsum(x^2)
# total exposure
E_n = E[length(E)]

# Percentiles to use for thresholds
limits = c(0.05, 0.95)

# Mark start and end points
start = which(E >= E_n*limits[1])[1]
end = which(E >= E_n*limits[2])[1]

# return value
# fs: sample rate
t_pulse = (end - start) / fs

import numpy as np
# Calculate (un-normalized) exposure
E = np.cumsum(x**2)

# Percentiles to use for thresholds
limits = (0.05, 0.95)

# Mark start and end points
start = np.where(E >= E[-1]*limits[0])[0][0]
end = np.where(E >= E[-1]*limits[1])[0][0]

# return value
# fs: sample rate
t_pulse = (end - start) / fs

3.3 Frequency Domain

Frequency domain analyses utilize Fourier transformations on overlapping binned segments of a signal. These bins typically have window functions applied to them, such as Hann or Hamming functions. Using overlapping window segments to generate a spectrum is broadly referred to as Welch’s method (Welch, 1967) in the literature. We’ll refer to the vector holding a windowed bin with the variable \(w_b\), where \(b\) refers to the bin index. \(w_{b,i}\) refers to a specific measurement within that bin’s vector.

As windowing reduces the energy of a signal, windowed bins must have correction factors to compensate for this loss. Here, \(w_b\) is assumed to have the appropriate energy correction factor applied to it.

Fourier transformations will be denoted by \(\hat{w}_{b} = \mathcal{F}(w_b)\), where \(\hat{w}_b\) holds the single-sided fourier spectrum which has been scaled by the window length, \(n\). \(\hat{w}_{b,f}\) indexes a specific frequency element from that Fourier-transformed window.

3.3.1 Power/Energy Spectral Density

Power spectral density is the power of a signal distributed over the frequency spectrum. A common method to show the acoustic spectrum of continuous sounds.

\[ S_{xx}(f) = \frac{\sum_{b = 1}^B |\hat{w}_{b,f}|^2}{B} \cdot \frac{1}{f_s},\] where \(f_s\) is the sample rate of the signal. The term \(\frac{1}{f_s}\) normalizes the measure to 1Hz frequency bins.

Energy spectral density shows the total energy of the signal, rather than power. Converting power to energy just requires a multiplication by the duration of the signal.

\[ \bar{S}_{xx}(f) = \frac{\sum_{b = 1}^B |\hat{w}_{b,f}|^2}{B} \cdot \frac{1}{f_s} \cdot \frac{f_s}{N},\] where \(N\) is the number of samples in the whole signal.

# ---- user parameters
wl = 512    # window length
olap = 0.5  # proportion of overlap

# Lengh of signal in seconds
t = length(x)/fs
# Length of single-sided spectrum
n_single = floor(wl/2) + 1
# Find number of bins in signal
bins = floor((length(x)-wl)/(wl*olap))
# window(wl) is a function that returns a window envelope, such as Hann or Hamming
win = window(wl)
# Calculate energy correction factor for window function
# This compensates for energy that the windowing removed
c = 1/sqrt(mean(win^2))
# init output vector
psd = rep.int(0, times = n_single)
# ---- loop bins
for(i in 0:bins){
  
  # ---- Extract window segment
  w_idx <- ((i)*(wl*olap)):((i)*(wl*olap) + wl - 1) + 1 
  w = x[w_idx] * c * win
  # ---- make 2 sided fft (scale by window length)
  w_hat = fft(w)/wl  
  # ---- Extract complex magnitudes
  w_hat = abs(w_hat)
  # ---- Convert to power
  w_hat = w_hat^2
  # ---- Convert to density
  w_hat = w_hat / (fs)
  # ---- convert to one-sided spectrum
  w_hat = w_hat[1:n_single]
  if (n_single %% 2 == 1){
    idx <- 2:n_single     # if odd window length
  }else{
    idx <- 2:(n_single-1) # if even window length
  } 
  w_hat[idx] = w_hat[idx]*2
  # ---- add to output vector
  psd[] = psd[] + w_hat/bins
}
# ---- get frequency labels
freq = (0:(n_single-1)) * (fs/wl)

# ---- Verify result with Paserval's theorem, These should be near 0dB
# Power
10*log10(sum(esd)*(fs/wl)) - 10*log10(mean(x^2))
# Exposure
10*log10(sum(esd)*(fs/wl)*t) - 10*log10(sum(x^2)*(1/fs))

# ---- Plot PSD
plot(freq, 10*log10(psd), t = 'l', ylab = "PSD (dB)", xlab = 'Frequency (Hz)')
# ---- Plot ESD
plot(freq, 10*log10(psd*t), t = 'l', ylab = "ESD (dB)", xlab = 'Frequency (Hz)')

import numpy as np

3.3.2 Spectrogram

A spectrogram is simply a plot of the binned power spectral density values over both time and frequency. Summing or averaging a spectrogram over its time bins will result in a energy- or power-spectral-density plot, respectively.

\[ S_{xx}(b,f) = |\hat{w}_{b,f}|^2 \cdot \frac{1}{f_s},\]

3.3.3 Decidecade Exposure/Root-mean-square

Decidecades are roughly equal to one-third-octave intervals. The upper and lower limits of a decidecade band are given by

\[f_u = 100\cdot 10^{(k + 0.5)/10}\] and \[f_l = 100\cdot 10^{(k - 0.5)/10},\] where \(k\) is some integer number.

Within these bands, we can calculate root-mean-square pressure and exposure measures. Decidecade measures are essentially power spectral density plots with a window length equal to the sample rate. This results in 1Hz frequency bins. The power spectral density values are then summed within these bins to generate the root-mean-square and exposure, respectively.

Decidecade root-mean-square is given by \[ P(k) = \sum_{f=f_l}^{f_u}\frac{\sum_{b = 1}^B |\hat{w}_{b,f}|^2}{B} \cdot \frac{1}{f_s} \cdot df,\] where \(df \triangleq \frac{f_s}{n}= 1\), as we’ve chosen our window length to equal our sample rate. Decidecade exposure is then given then by \[ P(k) = \sum_{f=f_l}^{f_u}\frac{\sum_{b = 1}^B |\hat{w}_{b,f}|^2}{B} \cdot \frac{1}{f_s} \cdot \frac{f_s}{N} \cdot df,\]

3.4 Sound Intensity

Mean sound intensity is a measure of the direction of acoustic propagation. For a pressure channel and particle motion channel, instantaneous sound intensity is given by \[I_{x,i} = x_{p,i} \cdot x_{x,i},\] and the mean intensity with \[\bar{I_x} = \frac{\sum_{i=1}^N I_{x,i}}{N}.\] By taking the mean sound intensity of all vector sensor channels, the result is a vector pointing in the direction of sound propagation.

For directional spectrograms, mean sound intensity is calculated directly in the frequency domain. For two sine functions which have a phase offset: \(a\cdot \mathrm{sin}(t)\) and \(b\cdot \mathrm{sin}(t + \theta_f)\), where \(\theta_f\) is the offset in radians for a frequency \(f\); the product is given by \[2ab \cdot \mathrm{sin}(t+\theta_f) \cdot \mathrm{sin}(t) = ab\cdot \mathrm{cos}(t - \theta_f - t) - cos(t + t + \theta_f) = ab\cdot \mathrm{cos}(-\theta_f) - \mathrm{cost}(2t+ \theta_f).\] Therefore, calculating the sound intensity in the frequency domain simply requires finding the phase difference between the particle motion and pressure channels \[\theta_f = \mathrm{Arg}(\hat{w}_{x,b,f}) - \mathrm{Arg}(\hat{w}_{p,b,f}),\] and calculating the expectation of the product between the two sine functions \[\bar{I}_{x,f} = E\left[ \frac{ab\cdot \mathrm{cos}(-\theta_f) - \mathrm{cos}(2t + \theta_f)}{2} \right] = \frac{ab\cdot\mathrm{cos}(-\theta_f)}{2}.\] This is the method that paPAMv2 uses to calculate sound intensity vectors for directional spectrograms.

3.5 References

Martin SB, Lucke K, Barclay DR. (2020). Techniques for distinguishing between impulsive and non-impulsive sound in the context of regulating sound exposure for marine mammals. J Acoust Soc Am. 147(4):2159. doi: 10.1121/10.0000971.
Müller, R. A. J., Benda-Beckmann, A. M. von, Halvorsen, M. B. and Ainslie, M. A. (2020). Application of kurtosis to underwater sound. J Acoust Soc Am 148, 780–792.
Welch, P. (1967). The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms. IEEE Transactions on Audio and Electroacoustics 15(2), 70–73.