
Mel Cepstrum and All-Pass Filters

Posted on: June 21, 2020 at 12:00 AM

1. Cepstrum

The cepstrum is the inverse Fourier transform of the log spectrum. For a spectrum sampled at $K$ points it is computed with an inverse DFT:

$$
\begin{aligned}
c(m) &= \frac{1}{2\pi} \int_{-\pi}^{\pi} \log X(\omega)\, e^{j\omega m}\, d\omega \\
&= \frac{1}{K} \sum_{k=0}^{K-1} \log X(k)\, e^{j 2\pi \frac{mk}{K}}
\end{aligned} \tag{1}
$$
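
The discrete form of (1) can be sketched directly in Python. This is a minimal illustration, assuming `spectrum` holds $K$ strictly positive magnitude (or power) samples indexed $k = 0, \dots, K-1$. The function name `cepstrum` is hypothetical, and the naive $O(K^2)$ sum stands in for the FFT that would normally be used.

```python
import cmath
import math


def cepstrum(spectrum):
    """Inverse DFT of the log spectrum, following the discrete form of Eq. (1).

    `spectrum` is assumed to contain K strictly positive samples X(k).
    """
    K = len(spectrum)
    log_spec = [math.log(x) for x in spectrum]
    return [
        sum(log_spec[k] * cmath.exp(2j * math.pi * m * k / K) for k in range(K)) / K
        for m in range(K)
    ]


# A flat spectrum X(k) = e has log X(k) = 1 everywhere,
# so all of the energy lands in c(0) and the other terms vanish.
c = cepstrum([math.e] * 8)
```

For a flat spectrum only `c[0]` is non-zero, which makes a convenient sanity check.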

2. Mel Cepstrum

Mel scale: the frequency resolution of the human ear is not uniform; it decreases as frequency increases.

$$
f_{mel} = 2595\ \log_{10}\left(1 + \frac{f}{700}\right) \tag{2}
$$
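
As a quick illustration of (2), the sketch below converts between Hz and mel. The function names `hz_to_mel` and `mel_to_hz` are illustrative, and the inverse is simply (2) solved for $f$.

```python
import math


def hz_to_mel(f_hz):
    """Eq. (2): map a frequency in Hz to the mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(f_mel):
    """Inverse of Eq. (2), obtained by solving it for f."""
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)


# Equal steps in Hz shrink on the mel scale as frequency grows.
low_step = hz_to_mel(200.0) - hz_to_mel(100.0)
high_step = hz_to_mel(4100.0) - hz_to_mel(4000.0)
```

Here `low_step` is larger than `high_step`, which reflects the decreasing resolution at high frequencies.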

Based on (2), construct a triangular filter bank $H_{mel}$:

$$
X_{mel} = H_{mel} X \tag{3}
$$

$X_{mel}$ is the mel spectrum, from which the mel cepstrum follows:

$$
\begin{aligned}
c_{mel}(m) &= \frac{1}{2\pi} \int_{-\pi}^{\pi} \log\left[H_{mel}(\omega) X(\omega)\right] e^{j\omega m}\, d\omega \\
&= \frac{1}{L} \sum_{l=1}^{L} \log X_{mel}(l)\, e^{j 2\pi \frac{ml}{L}}
\end{aligned} \tag{4}
$$

$$
X_{mel}(l) = \sum_{k=0}^{K-1} H_{mel}(l,k)\, X(k) \tag{5}
$$

The entries of the mel triangular filter bank are given below, where $f_k$ is the frequency of the $k$-th spectral bin and $f_{l-1}$, $f_l$, $f_{l+1}$ are the lower edge, center, and upper edge of the $l$-th filter, uniformly spaced on the mel scale:

$$
H_{mel}(l,k) =
\begin{cases}
0, & f_k < f_{l-1} \text{ or } f_k > f_{l+1} \\
\dfrac{f_k - f_{l-1}}{(f_l - f_{l-1})(f_{l+1} - f_{l-1})}, & f_{l-1} \leq f_k \leq f_l \\
\dfrac{f_{l+1} - f_k}{(f_{l+1} - f_l)(f_{l+1} - f_{l-1})}, & f_l \leq f_k \leq f_{l+1}
\end{cases}
\quad \text{for } l = 1, \dots, L,\ k = 0, \dots, K-1 \tag{6}
$$
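
The filter bank entries in (6) and the product in (5) can be sketched as follows. This is an illustration under stated assumptions: `edges_hz` is a hypothetical list holding $f_0, \dots, f_{L+1}$ in ascending order, and `bin_freqs_hz` holds the bin frequencies $f_k$. In practice the edges would be spaced uniformly on the mel scale; the values in the usage lines are arbitrary.

```python
def mel_filterbank(edges_hz, bin_freqs_hz):
    """Build the L x K matrix of Eq. (6) as a list of rows.

    edges_hz: f_0 .. f_{L+1} (ascending); row l uses f_{l-1}, f_l, f_{l+1}.
    bin_freqs_hz: frequency f_k of each spectral bin.
    """
    L = len(edges_hz) - 2
    H = []
    for l in range(1, L + 1):
        lo, mid, hi = edges_hz[l - 1], edges_hz[l], edges_hz[l + 1]
        row = []
        for fk in bin_freqs_hz:
            if fk < lo or fk > hi:
                w = 0.0                                        # outside the triangle
            elif fk <= mid:
                w = (fk - lo) / ((mid - lo) * (hi - lo))       # rising edge
            else:
                w = (hi - fk) / ((hi - mid) * (hi - lo))       # falling edge
            row.append(w)
        H.append(row)
    return H


def apply_filterbank(H, spectrum):
    """Eq. (5): X_mel(l) = sum_k H(l, k) X(k)."""
    return [sum(h * x for h, x in zip(row, spectrum)) for row in H]


# Two filters over seven bins (arbitrary example values).
H = mel_filterbank([0.0, 100.0, 200.0, 400.0],
                   [0.0, 50.0, 100.0, 150.0, 200.0, 300.0, 400.0])
X_mel = apply_filterbank(H, [1.0] * 7)
```

Each row peaks at its center frequency and is zero outside $[f_{l-1}, f_{l+1}]$, so neighbouring filters overlap by half their width.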

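The mel cepstrum computed this way is commonly called the MFCC feature, typically with $L = 13$.

By (3), $H_{mel}$ is an $L \times K$ matrix. Since $L < K$, recovering $X$ from $X_{mel}$ is an underdetermined problem, so MFCC features are usually applied to tasks that do not need to recover the spectral envelope, such as speech recognition and music genre classification. For tasks that do need to recover the spectral envelope (such as speech synthesis), frequency warping is generally done with an all-pass filter instead.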

3. All-Pass Filters

3.1 mcep

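The following first-order all-pass filter warps the frequency axis in a controllable way:

$$
H(z) = \frac{z^{-1} - \alpha}{1 - \alpha z^{-1}}, \qquad |\alpha| < 1 \tag{7}
$$

On the unit circle $H(e^{j\omega}) = e^{-j\beta_\alpha(\omega)}$, so the warped frequency $\omega_\alpha$ is the negated phase of $H$. The inverse mapping is obtained by replacing $\alpha$ with $-\alpha$:

$$
\begin{aligned}
\omega_\alpha = \beta_\alpha(\omega) = -\angle H(e^{j\omega}) &= \arctan\frac{(1-\alpha^2)\sin\omega}{(1+\alpha^2)\cos\omega - 2\alpha} \\
\omega = \beta_\alpha^{-1}(\omega_\alpha) &= \arctan\frac{(1-\alpha^2)\sin\omega_\alpha}{(1+\alpha^2)\cos\omega_\alpha + 2\alpha}
\end{aligned}
$$

Here $\arctan$ is the four-quadrant arctangent, so that $[0, \pi]$ is mapped onto $[0, \pi]$.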

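Adjusting $\alpha$ controls the degree of warping. For $\alpha > 0$, the sampling density along the frequency axis (the frequency resolution) decreases as frequency increases; for $\alpha < 0$, it increases with frequency. At a sampling rate of 16 kHz, $\alpha = 0.42$ is a good approximation of the mel scale.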
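
The warping function can be evaluated numerically to see the effect of $\alpha$. The sketch below is a minimal illustration; the names `warp` and `unwarp` are hypothetical, and `math.atan2` is used as the four-quadrant arctangent so that the result stays in $[0, \pi]$ for $\omega \in [0, \pi]$.

```python
import math


def warp(omega, alpha):
    """Warped frequency beta_alpha(omega) for omega in [0, pi]."""
    return math.atan2((1 - alpha ** 2) * math.sin(omega),
                      (1 + alpha ** 2) * math.cos(omega) - 2 * alpha)


def unwarp(omega_alpha, alpha):
    """Inverse mapping: the same formula with alpha replaced by -alpha."""
    return warp(omega_alpha, -alpha)


# With alpha = 0.42 low frequencies are stretched: the warped value
# of omega = 1.0 rad is larger than 1.0, while the endpoints stay fixed.
w = warp(1.0, 0.42)
back = unwarp(w, 0.42)
```

With $\alpha = 0$ the mapping is the identity, and `unwarp(warp(omega, alpha), alpha)` returns the original `omega` up to rounding.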

Define $c_\alpha$ as the frequency-warped cepstrum. Substituting the cepstral expansion $\log X(e^{j\omega}) = \sum_n c(n)\, e^{-j\omega n}$ implied by (1) gives:

$$
\begin{aligned}
c_\alpha(m) &= \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{j\omega_\alpha m} \log X(e^{j\omega})\, d\omega_\alpha \\
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{j\omega_\alpha m} \sum_n c(n)\, e^{-j\omega n}\, d\omega_\alpha \\
&= \frac{1}{2\pi} \sum_n c(n) \int_{-\pi}^{\pi} e^{j\omega_\alpha m} e^{-j\omega n}\, d\omega_\alpha \\
&= \sum_n A_\alpha(m,n)\, c(n)
\end{aligned} \tag{8}
$$

$$
\begin{aligned}
A_\alpha(m,n) &= \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{j\omega_\alpha m} e^{-j\omega n}\, d\omega_\alpha \\
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{j\omega_\alpha m} e^{-j\beta_\alpha^{-1}(\omega_\alpha) n}\, d\omega_\alpha
\end{aligned} \tag{9}
$$

Hence the conversion from the linear-frequency cepstrum to the frequency-warped cepstrum is a linear transform:

$$
c_\alpha = A_\alpha c
$$

$$
A_\alpha = \begin{pmatrix}
1 & \alpha & \alpha^2 & \cdots & \alpha^{M-1} \\
0 & 1-\alpha^2 & 2\alpha(1-\alpha^2) & \cdots & (M-1)\alpha^{M-2}(1-\alpha^2) \\
0 & -\alpha(1-\alpha^2) & \cdots & \cdots & \cdots \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & (-1)^M \alpha^{M-2}(1-\alpha^2) & \cdots & \cdots & \cdots
\end{pmatrix}
$$

With rows and columns indexed from 1, the entries outside the first row and first column follow from the recursion:

$$
A_\alpha(k,l) = A_\alpha(k-1,l-1) + \alpha\left[A_\alpha(k,l-1) - A_\alpha(k-1,l)\right] \qquad \forall\, k > 1,\ l > 1
$$
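
The recursion gives a simple way to fill the matrix. The sketch below is one possible implementation, assuming a square $M \times M$ truncation and 0-based indices (so the recursion applies for `k >= 1` and `l >= 1`). The names `warp_matrix` and `mat_vec` are illustrative.

```python
def warp_matrix(alpha, M):
    """M x M frequency-warping matrix A_alpha, stored as a list of rows (0-indexed)."""
    A = [[0.0] * M for _ in range(M)]
    for l in range(M):
        A[0][l] = alpha ** l              # first row: 1, alpha, alpha^2, ...
    # The first column below A[0][0] stays zero.
    for k in range(1, M):
        for l in range(1, M):
            A[k][l] = A[k - 1][l - 1] + alpha * (A[k][l - 1] - A[k - 1][l])
    return A


def mat_vec(A, c):
    """Apply c_alpha = A_alpha c."""
    return [sum(a * x for a, x in zip(row, c)) for row in A]


A = warp_matrix(0.5, 4)
c_alpha = mat_vec(A, [1.0, 0.0, 0.0, 0.0])
```

For $\alpha = 0$ the matrix reduces to the identity, and the entries produced by the recursion agree with the closed forms shown in the second row and second column of the matrix above.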

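The mel cepstrum computed this way is usually called mcep in speech synthesis.

The spectral envelope can be recovered from mcep, either by evaluating the warped series directly:

$$
X(e^{j\omega}) = \exp \sum_{m=0}^{M} c_\alpha(m)\, e^{-j\omega_\alpha m}
$$

or by first undoing the warping with $A_{-\alpha}$ and then applying an FFT:

$$
c = A_{-\alpha}\, c_\alpha, \qquad X(e^{j\omega}) = \exp \text{FFT}(c)
$$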
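
As a rough sketch of the direct evaluation, the code below computes the magnitude envelope $|X(e^{j\omega})|$ from a list of mcep coefficients. It assumes real coefficients and keeps only the real part of the exponent, $\sum_m c_\alpha(m)\cos(m\,\omega_\alpha)$, which does not depend on the sign convention of the complex exponential. The function name `envelope_from_mcep` is hypothetical.

```python
import math


def envelope_from_mcep(mcep, alpha, omegas):
    """Magnitude envelope |X(e^{j omega})| at each omega in `omegas` (radians, 0..pi)."""
    envelope = []
    for omega in omegas:
        # Warped frequency omega_alpha = beta_alpha(omega); atan2 keeps it in [0, pi].
        omega_a = math.atan2((1 - alpha ** 2) * math.sin(omega),
                             (1 + alpha ** 2) * math.cos(omega) - 2 * alpha)
        # Real part of the exponent: sum_m c_alpha(m) cos(m * omega_alpha).
        log_mag = sum(c * math.cos(m * omega_a) for m, c in enumerate(mcep))
        envelope.append(math.exp(log_mag))
    return envelope


# A single coefficient c_alpha(0) = 1 gives a flat envelope equal to e.
flat = envelope_from_mcep([1.0], 0.42, [0.0, 1.0, 3.0])
```

Because the series is evaluated at arbitrary frequencies, the envelope can be sampled on whatever grid the synthesis stage needs.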

References

  1. Warped Discrete-Fourier Transform: Theory and Applications https://pdfs.semanticscholar.org/bc4b/26ca9d820190a3e363be82ec4f2988e58600.pdf

  2. Frequency Warping Revisited http://www-ist.massey.ac.nz/dbailey/sprg/pdfs/2004_DELTA_23.pdf

  3. Frequency-Warped Signal Processing for Audio Applications https://pdfs.semanticscholar.org/a850/8bc734d2c80f5e304404859cb6bf3e1f2e49.pdf