Chirplet Decomposition
Joint estimation of Gaussian chirplet parameters via mixture modeling
Chirplet Decomposition
Introduction
Many natural signals are nonstationary or transient: bat and whale echoes, evoked potentials, and radar clutter are common examples. Such signals can be analyzed with wavelet transforms, short-time Fourier transforms, adaptive time-frequency distributions, matching pursuit, and related methods.
Representations that adapt their tiling of the time-frequency plane can achieve high resolution in time and frequency simultaneously. Atomic decomposition [1, 2, 3] is one such class of methods. It approximates a signal as a weighted combination of atoms chosen from a dictionary. If the dictionary is finite, some prior knowledge of the signal is needed so the dictionary contains atoms suited to the signal structure. If an efficient atom-selection method is available, an infinite dictionary is another possibility.
Chirplets, which are windowed linear frequency-modulated signals, provide flexible tilings through shearing, translation, and rotation. Gaussian chirplets are especially attractive because they are computationally tractable and attain the minimum time-bandwidth product. Many algorithms decompose a signal as a weighted combination of chirplets, but most use a divide-and-conquer strategy: repeatedly decompose the residual until its energy is negligible. Since chirplets are not orthogonal, such decompositions are not unique. In addition, most methods do not formally estimate the number of components, and estimator efficiency is rarely characterized through Cramer-Rao bounds except in single-component cases.
In this work, we jointly estimate chirplet parameters while treating the number of components as unknown. We model the spectrogram of the signal as a Gaussian mixture model (GMM), estimate the mixture parameters using the Expectation-Maximization (EM) algorithm, and use functional merging to estimate the number of components. Chirplet parameters, except phase, are obtained by the method of moments and refined using Fisher scoring. Cramer-Rao bounds are then derived for all parameters.
Chirplet Decomposition
Employing chirplets to decompose a signal unifies the time-frequency tilings such as rotation, shearing, scaling and shifting. Hence, we proceed with the following assumption that any signal can be represented as a weighted sum of Gaussian chirplets given by:
\[ y(t) = s(t) + w(t) \tag{1a} \]
\[ s(t) = \sum_{k=1}^{K} (a_k + jb_k)\, g_k(t) \tag{1b} \]
where
\[ g_{k;t_c,\omega_c,\beta,d^2}(t) = \left(\frac{1}{\pi d^2}\right)^{\frac{1}{4}} \exp\left(-\frac{(t-t_c)^2}{4d^2} + j\frac{\beta(t-t_c)^2}{2} + j\omega_c(t-t_c)\right) \tag{2} \]
\(t_c, \omega_c, \beta, d^2\) represent the location in time, location in frequency, chirp rate and the duration parameter of the chirplet and \(w(t)\) is additive complex white Gaussian noise with unknown variance. Further
\[ \| g_{t_c,\omega_c,\beta,d^2} \| = 1 \tag{3} \]
to make them unit energy signals.
Time-frequency modeling
We model the time-frequency plane as a mixture of normal PDFs [4] and estimate the parameters of the chirplets. It requires interpreting the time-frequency representation as an arbitrary bivariate PDF which would be modeled as GMM. It may appear strange that we are trying to view the time-frequency space as a mixture of normal PDFs by applying a random process modeling technique to a deterministic problem. However, we are using it just as a means to accomplish the task of decomposing the signal into chirplets. The initial choice of the time-frequency representation affects the model parameter estimation. We observe that the spectrogram of a Gaussian chirplet and normal density have structural similarities. We will utilize this fact to derive a set of mapping rules. Let \(T(t, f)\) be the spectrogram of the signal under consideration. Also assume that \(t\) and \(f\) axis, representing time and frequency respectively, are sampled sufficiently smoothly on an \(N \times M\) grid and let \(\theta_{mn} = [t(n), f(m)]^T\)
\[ f(\theta_{mn}; \Phi) = \sum_{k=1}^{K} \lambda_k f_k(\theta_{mn}; \phi_k) \tag{4a} \]
\[ T(\theta_{mn}) \approx f(\theta_{mn}; \Phi) \tag{4b} \]
where
- \(K\) is the total number of components
- \(\theta_{mn}\) is the \(mn\)th sample in the \(N \times M\) grid
- \(\phi_k = [\mu_k, \Sigma_k]\) represent the mean vector and covariance matrix of the \(k\)th density
- \(f_k \sim \mathcal{N}(\mu_k, \Sigma_k)\)
- \(\Phi = [\phi_1 \ldots \phi_K, \lambda_1 \ldots \lambda_K]\)
- \(\{\lambda_k\}\) represent the mixing weights with constraint \(\sum_{k=1}^{K} \lambda_k = 1\)
We can estimate these parameters by the application of EM algorithm as:
\[ \hat{\lambda}_{k,new} = \sum_{n=1}^{N} \sum_{m=1}^{M} \gamma_k(\theta_{mn}; \hat{\Phi}) \tag{5} \]
\[ \hat{\mu}_{k,new} = \frac{1}{\hat{\lambda}_{k,new}} \sum_{n=1}^{N} \sum_{m=1}^{M} \gamma_k(\theta_{mn}; \hat{\Phi})\, \theta_{mn} \tag{6} \]
\[ \hat{\Sigma}_{k,new} = \frac{1}{\hat{\lambda}_{k,new}} \sum_{n=1}^{N} \sum_{m=1}^{M} \gamma_k(\theta_{mn}; \hat{\Phi})(\theta_{mn} - \hat{\mu}_k)^T (\theta_{mn} - \hat{\mu}_k) \tag{7} \]
where
\[ \gamma_k(\theta_{mn}; \hat{\Phi}) = \frac{\hat{\lambda}_k f_k(\theta_{mn}; \hat{\theta}_k) T(\theta_{mn}; \Phi)}{f(\theta_{mn}; \hat{\Phi})} \tag{8} \]
In [4], time-frequency projection filters were used synthesize time domain signals based on the regions identified by the individual densities in the mixture. However, we show that, if the initial time-frequency representation is chosen as spectrogram, one can immediately synthesize corresponding time domain signals except for the phase. This is so because phase information is lost in the spectrogram decomposition. We map the parameters of a bivariate normal density to a chirplet by matching the moments. A Gaussian window with duration parameter \(\alpha^2\) is chosen in computing the spectrogram. The spectrogram of a chirplet \(g_k\) is given by [5]:
\[ T_{g_k}(t, f) = \frac{P(t - t_c)}{\sqrt{2\pi\sigma^2_{\omega/t}}} \exp\left(-\frac{(\omega - \omega_c - \langle \omega_t \rangle)^2}{2\sigma^2_{\langle \omega_t \rangle}}\right) \tag{9a} \]
\[ P(t - t_c) = \left(\frac{1}{\pi(d^2 + \alpha^2)}\right)^{\frac{1}{2}} \exp\left(\frac{1}{\pi(d^2 + \alpha^2)} t^2\right) \tag{9b} \]
\[ \langle \omega_t \rangle = \left(\frac{1}{\pi(d^2 + \alpha^2)}\right) \beta(t - t_c) \tag{9c} \]
and a bivariate normal density is given as:
\[ f_k(\theta) = \frac{1}{\sqrt{2\pi|\Sigma_k|}} \exp\left(-\frac{1}{2}(\mu_k - \theta)^T \Sigma_k^{-1} (\mu_k - \theta)\right) \tag{10a} \]
\[ \Sigma_k = \begin{bmatrix} \sigma^2_{k1} & \sigma_{k1}\sigma_{k2} \\ \sigma_{k1}\sigma_{k2} & \sigma^2_{k2} \end{bmatrix} \tag{10b} \]
\[ \mu_k = \begin{bmatrix} \mu_{k1} \\ \mu_{k2} \end{bmatrix} \tag{10c} \]
By matching the moments (method of moments) of the spectrogram (a time-frequency distribution) and the bivariate normal density given above, we obtain:
\[ \frac{\alpha^2}{(d^2 + \alpha^2)}\beta = \rho\left(\frac{\sigma_1}{\sigma_2}\right) \tag{11a} \]
\[ \frac{1}{(d^2 + \alpha^2)} = \frac{1}{2\sigma^2_1} \tag{11b} \]
\[ \sigma^2_2(1 - \rho^2) = \frac{d^2 + \alpha^2}{2d^2\alpha^2} + \frac{\beta^2 d^2 \alpha^2}{2(d^2 + \alpha^2)} \tag{11c} \]
\[ \begin{bmatrix} \mu_{k1} \\ \mu_{k2} \end{bmatrix} = \begin{bmatrix} t_c \\ \omega_c \end{bmatrix} \tag{11d} \]
The parameters of the chirplet are obtained by solving the above equations simultaneously as:
\[ d^2 = \frac{1 + (\alpha^2 \rho\, r)^2}{r^2(1 - \rho^2)} \tag{12a} \]
\[ \beta = \frac{\rho\, r(d^2 + \alpha^2)}{d^2 \alpha^2} \tag{12b} \]
\[ \begin{bmatrix} \mu_{k1} \\ \mu_{k2} \end{bmatrix} = \begin{bmatrix} t_c \\ \omega_c \end{bmatrix} \tag{12c} \]
where \(r = \sigma_1/\sigma_2\). It should be noted that we cannot estimate the complex weight \(c_k\) since we lose phase information in the spectrogram. We consider using quasi-Newton optimization algorithms to refine the parameters and also to estimate the amplitudes of the individual components. We use the estimates provided by the EM algorithms as initial estimates to the quasi-Newton algorithms. Besides providing faster convergence, if the initial estimates are carefully chosen, they also provide error covariances which could be used in designing time-frequency detectors.
Nested Fisher scoring
We rewrite the model in vector form as:
\[ \mathbf{y} = \mathbf{s} + \mathbf{w} \tag{13a} \]
\[ \mathbf{s} = \mathbf{G}\boldsymbol{\lambda} \tag{13b} \]
\[ \mathbf{G} = [\mathbf{g}_1, j\mathbf{g}_1, \ldots \mathbf{g}_K, j\mathbf{g}_K] \tag{13c} \]
where
- \(\mathbf{y}\): \(N \times 1\) vector of measurements
- \(\mathbf{w} \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I})\) complex AWGN with unknown variance \(\sigma^2\)
- \(\boldsymbol{\theta}_k = [t_{ck}, f_{ck}, \alpha_k, \beta_k]^T\)
- \(\boldsymbol{\theta} = [\boldsymbol{\theta}_1^T, \ldots, \boldsymbol{\theta}_K^T]^T\)
- \(\boldsymbol{\lambda} = [a_1, b_1, \ldots, a_K, b_K]^T\)
- \(\boldsymbol{\phi} = [\boldsymbol{\lambda}^T, \boldsymbol{\theta}^T]\)
- \(\boldsymbol{\gamma} = [\sigma^2, \boldsymbol{\phi}^T]^T\)
- \(\mathbf{g}_k\) is a function of \(\boldsymbol{\theta}_k\)
The log-likelihood is given as:
\[ \ell(\mathbf{y}; \boldsymbol{\gamma}) = -N \log(\pi\sigma^2) - \frac{1}{\sigma^2}(\mathbf{y} - \mathbf{s})^H (\mathbf{y} - \mathbf{s}) \tag{14} \]
and the Fisher’s method of scoring produces the estimates using the following iteration,
\[ \hat{\boldsymbol{\gamma}}^{i+1} = \hat{\boldsymbol{\gamma}}^i + \mathcal{I}_{\boldsymbol{\gamma}}^{-1}\, 2\mathcal{R}e\left\{\frac{\partial}{\partial\boldsymbol{\gamma}}\ell^H(\mathbf{y}; \boldsymbol{\gamma})\right\}\bigg|_{\boldsymbol{\gamma} = \hat{\boldsymbol{\gamma}}^i} \tag{15} \]
Using [6, Appendix 15.c], the Fisher Information matrix is given as:
\[ \mathcal{I}_{\boldsymbol{\gamma}} = \begin{bmatrix} \mathcal{I}_{\sigma^2} & \mathbf{0} \\ \mathbf{0} & \mathcal{I}_{\boldsymbol{\phi}} \end{bmatrix} \tag{16} \]
where
\[ \mathcal{I}_{\sigma^2} = \frac{N}{\sigma^4} \quad \text{and} \quad \mathcal{I}_{\boldsymbol{\phi}} = \frac{2}{\sigma^2}\left[\frac{\partial \mathbf{s}^H}{\partial\boldsymbol{\phi}} \frac{\partial \mathbf{s}}{\partial\boldsymbol{\phi}}\right] \tag{17} \]
Ignoring the nuisance parameter \(\sigma^2\),
\[ \hat{\boldsymbol{\phi}}^{i+1} = \hat{\boldsymbol{\phi}}^i + \mathcal{I}_{\boldsymbol{\phi}}^{-1}\, 2\mathcal{R}e\left\{\frac{\partial}{\partial\boldsymbol{\phi}}\ell^H(\mathbf{y}; \boldsymbol{\gamma})\right\}\bigg|_{\boldsymbol{\phi} = \hat{\boldsymbol{\phi}}^i} \tag{18} \]
Since the parameters \(\boldsymbol{\lambda}\) enter linearly, we could use nested-Fisher scoring [7]. This also becomes the separable non-linear least squares since the noise is Gaussian. Fisher-scoring iteration on the concentrated log-likelihood is then given as:
\[ \hat{\boldsymbol{\theta}}^{i+1} = \hat{\boldsymbol{\theta}}^i + \mathcal{I}_{\boldsymbol{\theta}}'^{-1}\, 2\mathcal{R}e\left\{\frac{\partial}{\partial\boldsymbol{\phi}}\ell^H(\mathbf{y}; \boldsymbol{\gamma})\right\} \tag{19a} \]
\[ \mathcal{I}_{\boldsymbol{\theta}}' = \frac{2}{\sigma^2} \frac{\partial \mathbf{s}^H}{\partial\boldsymbol{\theta}} \left(\mathbf{I} - \frac{\partial \mathbf{s}^H}{\partial\boldsymbol{\lambda}} \left(\frac{\partial \mathbf{s}}{\partial\boldsymbol{\lambda}} \frac{\partial \mathbf{s}^H}{\partial\boldsymbol{\lambda}}\right)^{-1} \frac{\partial \mathbf{s}^H}{\partial\boldsymbol{\lambda}}\right) \frac{\partial \mathbf{s}}{\partial\boldsymbol{\theta}} \tag{19b} \]
\[ \frac{\partial}{\partial\boldsymbol{\phi}}\ell^H(\mathbf{y}; \boldsymbol{\gamma}) = \frac{2}{\sigma^2} \frac{\partial \mathbf{s}^H}{\partial\boldsymbol{\theta}} (\mathbf{y} - \mathbf{s}) \tag{19c} \]
The score vector and the Fisher information matrix are evaluated at:
\[ \boldsymbol{\theta} \leftarrow \hat{\boldsymbol{\theta}}^i \tag{20a} \]
\[ \boldsymbol{\lambda} \leftarrow (\mathbf{G}^H\mathbf{G})^{-1}\mathbf{G}^H\mathbf{y} \tag{20b} \]
Cramer-Rao bound
The Cramer-Rao lower bounds on the variance of the parameters are given as:
- \(\mathrm{CRLB}_{\sigma^2} = \dfrac{\sigma^4}{N}\)
- \(\mathrm{CRLB}_{\boldsymbol{\lambda}} = \dfrac{1}{\sigma^2}\mathbf{G}^H\mathbf{G}\)
- \(\mathrm{CRLB}_{\boldsymbol{\theta}} = \mathcal{I}_{\boldsymbol{\theta}}'^{-1}\)
Gradients
Here analytical expressions for the gradients are given. The discrete version of the signal is given by:
\[ g_k(n) = \left(\frac{1}{\pi d^2_k}\right)^{\frac{1}{4}} \exp\left(-\frac{(n - t_{c,k})^2}{4d^2_k} + j\frac{\beta(n - t_{c,k})^2}{2} + j\omega_{c,k}(n - t_{c,k})\right) \tag{21} \]
By noting that,
\[ \frac{\partial s(n)}{\partial\boldsymbol{\theta}_k} = \frac{\partial g_k(n)}{\partial\boldsymbol{\theta}_k} \tag{22} \]
we calculate the gradients of a chirplet and drop the subscript \(k\) for simplicity
\[ \frac{\partial g(n)}{\partial t_c} = g(n)\left(\frac{n - t_c}{4d^2} - j\beta(n - t_c) - j\omega_c\right) \tag{23a} \]
\[ \frac{\partial g(n)}{\partial\omega_c} = g(n)\left(j\omega(n - t_c)\right) \tag{23b} \]
\[ \frac{\partial g(n)}{\partial d^2} = g(n)\left(\frac{(n - t_c)^2}{4d^4} - \frac{1}{d^2}\right) \tag{23c} \]
\[ \frac{\partial g(n)}{\partial\beta} = g(n)\left(\frac{j}{2}(n - t_c)^2\right) \tag{23d} \]
We can use [8] to obtain closed-form expressions for the Cramer-Rao bound. This completes the estimation process.
References
[1] A. Bultan, “A four-parameter atomic decomposition of chirplets,” IEEE Trans. on Signal Processing, vol. 41, pp. 731–745, 1999.
[2] J. C. O’Neill and P. Flandrin, “Chirp hunting,” in Proc. of the IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis, 1998, pp. 425–428.
[3] R. N. Bracewell and D Mihovilovic, “Adaptive chirplet representation of signals on time-frequency plane,” Electronics Letters, vol. 27, pp. 1159–1161, 1991.
[4] Mark Coates, Time-frequency modeling, Ph. D, Univ. of Cambridge, Signal Processing and Communications Laboratory, Dept. of Engineering, August 1998.
[5] Leon Cohen, Time-Frequency Analysis, Prentice-Hall, Englewood Cliffs, New Jersey, 1995.
[6] Steven M. Kay, Fundamentals of Statistical Signal Processing, Vol-1 Estimation theory, Prentice-Hall PTR, 1993.
[7] Gordon K Smith, “Partitioned algorithms for maximum likelihood and other nonlinear estimation,” Statistics and Computing, vol. 6, pp. 201–216, 1996.
[8] Kostas Triantafyllopoulos, “Moments and cumulants of the multivariate real and complex Gaussian distributions,” 2002, www.stats.bris.ac.uk/research/stats/pub/ResRept/2002.html.