1. Cepstrum
Cepstrum:
$$
\begin{aligned}
c(m) &= \frac{1}{2\pi} \int_{-\pi}^{\pi} \log X(\omega)\, e^{j\omega m}\, d\omega \\
&= \frac{1}{K} \sum_{k=0}^{K} \log X(k)\, e^{j2\pi \frac{mk}{K}}
\end{aligned}
\tag{1}
$$
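The discrete form of (1) can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: it assumes the real cepstrum (it takes $\log|X(k)|$), sums over the $K$ DFT bins $k = 0, \dots, K-1$, and uses a naive $O(K^2)$ DFT. The names `dft` and `real_cepstrum` and the small `eps` guard are hypothetical choices for this example.

```python
import cmath
import math

def dft(x):
    """Naive DFT: X(k) = sum_n x(n) * exp(-j*2*pi*n*k/K)."""
    K = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / K) for n in range(K))
            for k in range(K)]

def real_cepstrum(x, eps=1e-12):
    """c(m) = (1/K) * sum_k log|X(k)| * exp(j*2*pi*m*k/K), k = 0..K-1."""
    K = len(x)
    # eps (an assumption of this sketch) avoids log(0) for empty bins.
    log_mag = [math.log(abs(X_k) + eps) for X_k in dft(x)]
    c = [sum(log_mag[k] * cmath.exp(2j * math.pi * m * k / K) for k in range(K)) / K
         for m in range(K)]
    # For a real input, log|X| is real and even, so only the real part remains.
    return [value.real for value in c]

signal = [1.0, 0.5, 0.25, 0.125, 0.0, 0.0, 0.0, 0.0]
cep = real_cepstrum(signal)
```

In practice an FFT routine would replace the naive `dft`; the structure (transform, log, inverse transform) stays the same.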
2. Mel Cepstrum
Mel scale: the frequency resolution of the human ear is not uniform; it decreases as frequency increases.
$$
f_{mel} = 2595 \log_{10}\left(1+\frac{f}{700}\right) \tag{2}
$$
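As a quick illustration of (2), the conversion and its algebraic inverse can be written as below. The function names `hz_to_mel` and `mel_to_hz` are hypothetical; the inverse is obtained simply by solving (2) for $f$.

```python
import math

def hz_to_mel(f_hz):
    """Equation (2): map a frequency in Hz to the mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(f_mel):
    """Inverse of equation (2), solved for f."""
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)

# The same 100 Hz step spans fewer mels at high frequencies than at low ones.
low_step = hz_to_mel(1100.0) - hz_to_mel(1000.0)
high_step = hz_to_mel(8100.0) - hz_to_mel(8000.0)
```

Comparing `low_step` with `high_step` shows the decreasing resolution described above.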
Based on (2), construct a triangular filter bank $H_{mel}$:
$$
X_{mel} = H_{mel} X \tag{3}
$$
$X_{mel}$ is the mel spectrum, from which the mel cepstrum is obtained:
$$
\begin{aligned}
c_{mel}(m) &= \frac{1}{2\pi} \int_{-\pi}^{\pi} \log\left[H_{mel}(\omega) X(\omega)\right] e^{j\omega m}\, d\omega \\
&= \frac{1}{L} \sum_{l=0}^{L} \log X_{mel}(l)\, e^{j2\pi \frac{ml}{L}}
\end{aligned}
\tag{4}
$$
$$
X_{mel}(l) = \sum_{k=0}^{K} H_{mel}(l,k)\, X(k) \tag{5}
$$
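The discrete sum in (4) can be sketched as follows, taking an already computed mel spectrum as input. The function name `mel_cepstrum`, the `eps` floor, and the choice to sum over $l = 0, \dots, L-1$ are assumptions of this example. Note that many practical MFCC implementations apply a DCT to the log mel spectrum instead of the complex exponential used here.

```python
import cmath
import math

def mel_cepstrum(mel_spectrum, eps=1e-12):
    """c_mel(m) = (1/L) * sum_l log X_mel(l) * exp(j*2*pi*m*l/L), l = 0..L-1."""
    L = len(mel_spectrum)
    # eps (an assumption of this sketch) keeps the logarithm finite.
    log_spec = [math.log(max(value, eps)) for value in mel_spectrum]
    return [sum(log_spec[l] * cmath.exp(2j * math.pi * m * l / L) for l in range(L)) / L
            for m in range(L)]

# A flat mel spectrum has all of its cepstral energy in c_mel(0).
flat = mel_cepstrum([math.e] * 13)
```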
The entries of the mel triangular filter bank are:
$$
H_{mel}(l,k) =
\begin{cases}
0 & f_k < f_{l-1} \text{ or } f_k > f_{l+1} \\
\dfrac{f_k - f_{l-1}}{(f_l - f_{l-1})(f_{l+1}-f_{l-1})} & f_{l-1} \leq f_k \leq f_{l} \\
\dfrac{f_{l+1} - f_k}{(f_{l+1} - f_{l})(f_{l+1}-f_{l-1})} & f_{l} \leq f_k \leq f_{l+1}
\end{cases}
\quad \text{for } l = 1 \text{ to } L,\ k = 1 \text{ to } K
\tag{6}
$$
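A minimal sketch of building $H_{mel}$ from (6) and applying (5) is shown below. Several details are assumptions not stated in the text: the band edges $f_0, \dots, f_{L+1}$ are spaced uniformly on the mel scale between 0 Hz and the Nyquist frequency, the bin frequencies $f_k$ are spread linearly over the same range, and the names `mel_filterbank` and `apply_filterbank` are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(f_mel):
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)

def mel_filterbank(num_filters, num_bins, sample_rate):
    """Return H_mel as num_filters rows of num_bins weights, following (6)."""
    nyquist = sample_rate / 2.0
    mel_max = hz_to_mel(nyquist)
    # Band edges f_0 .. f_{L+1}, equally spaced in mel (assumed layout).
    edges = [mel_to_hz(mel_max * i / (num_filters + 1))
             for i in range(num_filters + 2)]
    # Bin frequencies f_k, linear from 0 Hz to Nyquist (assumed layout).
    bin_freqs = [nyquist * k / (num_bins - 1) for k in range(num_bins)]
    H = []
    for l in range(1, num_filters + 1):
        lo, mid, hi = edges[l - 1], edges[l], edges[l + 1]
        row = []
        for f_k in bin_freqs:
            if f_k < lo or f_k > hi:
                weight = 0.0
            elif f_k <= mid:
                weight = (f_k - lo) / ((mid - lo) * (hi - lo))   # rising slope
            else:
                weight = (hi - f_k) / ((hi - mid) * (hi - lo))   # falling slope
            row.append(weight)
        H.append(row)
    return H

def apply_filterbank(H, spectrum):
    """Equation (5): X_mel(l) = sum_k H_mel(l, k) * X(k)."""
    return [sum(h * x for h, x in zip(row, spectrum)) for row in H]

H = mel_filterbank(13, 257, 16000)
mel_spectrum = apply_filterbank(H, [1.0] * 257)
```

Here 257 bins would correspond to the non-negative half of a 512-point FFT at 16 kHz; both numbers are only example values.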
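The mel cepstrum computed this way is usually called the MFCC feature, typically with $L = 13$.
According to (3), $H_{mel}$ is an $(L \times K)$ matrix. Since $L < K$, recovering $X$ from $X_{mel}$ is an underdetermined problem, so MFCC features are usually used in tasks that do not need to recover the spectral envelope, such as speech recognition and music genre classification. For tasks that do need to recover the spectral envelope (such as speech synthesis), frequency warping with an all-pass filter is generally used instead.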
3. All-Pass Filter
3.1 mcep
The following all-pass filter warps the frequency axis in a controllable way:
$$
\begin{aligned}
H(z) &= \frac{z^{-1}-\alpha}{1-\alpha z^{-1}} \\
\omega_{\alpha} = \beta_{\alpha}(\omega) &= -\angle H(e^{j\omega}) = \arctan\frac{(1-\alpha^2)\sin \omega}{(1+\alpha^2)\cos\omega - 2\alpha} \\
\beta^{-1}_{\alpha}(\omega) &= \arctan\frac{(1-\alpha^2)\sin \omega}{(1+\alpha^2)\cos\omega + 2\alpha}
\end{aligned}
\tag{7}
$$
Adjusting $\alpha$ controls the degree of warping. When $\alpha > 0$, the frequency sampling density decreases as frequency increases; when $\alpha < 0$, it increases with frequency. At a sampling rate of 16 kHz, $\alpha = 0.42$ is a reasonable approximation of the mel scale.
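The warping function $\beta_\alpha(\omega)$ can be evaluated numerically as in the sketch below. It uses `math.atan2` rather than a plain arctangent so that the result stays in $[0, \pi]$ when the denominator becomes negative; that choice, and the name `warp_frequency`, are assumptions of this example.

```python
import math

def warp_frequency(omega, alpha):
    """beta_alpha(omega) for omega in [0, pi], in radians."""
    numerator = (1.0 - alpha ** 2) * math.sin(omega)
    denominator = (1.0 + alpha ** 2) * math.cos(omega) - 2.0 * alpha
    return math.atan2(numerator, denominator)

alpha = 0.42
# Warp a few normalized frequencies; with alpha > 0 the low band is stretched.
warped = [warp_frequency(math.pi * i / 8, alpha) for i in range(9)]

# Warping with -alpha undoes the warp, matching the inverse formula.
restored = warp_frequency(warp_frequency(1.0, alpha), -alpha)
```

With $\alpha = 0$ the function reduces to the identity, so the ordinary linear frequency axis is recovered.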
Define $c_\alpha$ as the frequency-warped cepstrum. Then:
$$
\begin{aligned}
c_\alpha(m) &= \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{j\omega_\alpha m} \log X(e^{j\omega})\, d\omega_\alpha \\
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{j\omega_\alpha m} \sum_n c(n)\, e^{j\omega n}\, d\omega_\alpha \\
&= \frac{1}{2\pi} \sum_n c(n) \int_{-\pi}^{\pi} e^{j\omega_\alpha m} e^{j\omega n}\, d\omega_\alpha \\
&= \sum_n A_\alpha(m,n)\, c(n)
\end{aligned}
\tag{8}
$$
$$
\begin{aligned}
A_\alpha(m,n) &= \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{j\omega_\alpha m} e^{j\omega n}\, d\omega_\alpha \\
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{j\omega_\alpha m} e^{j\beta^{-1}_{\alpha}(\omega_\alpha) n}\, d\omega_\alpha
\end{aligned}
\tag{9}
$$
Therefore, the conversion from the linear-frequency cepstrum to the frequency-warped cepstrum is a linear transform:
$$
c_\alpha = A_{\alpha} c
$$
$$
A_{\alpha} = \begin{pmatrix}
1 & \alpha & \alpha^2 & \cdots & \alpha^{M-1} \\
0 & 1-\alpha^2 & 2\alpha(1-\alpha^2) & \cdots & (M-1)\alpha^{M-2}(1-\alpha^2) \\
0 & -\alpha(1-\alpha^2) & \cdots & \cdots & \cdots \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & (-1)^M(1-\alpha^2)\alpha^{M-2} & \cdots & \cdots & \cdots
\end{pmatrix}
$$
$$
A_\alpha(k,l) = A_\alpha(k-1,l-1) + \alpha\left[A_\alpha(k,l-1) - A_\alpha(k-1,l)\right] \qquad \forall k>1,\ l>1
$$
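The recursion can be turned into a small routine that fills $A_\alpha$ entry by entry. The sketch below assumes the boundary values visible in the matrix: a first row of $1, \alpha, \alpha^2, \dots$ and a first column that is zero below the top-left entry. Indices are 0-based in the code, and the names `warp_matrix` and `apply_matrix` are hypothetical.

```python
def warp_matrix(alpha, size):
    """Build the size x size matrix A_alpha from the recursion (0-based indices)."""
    A = [[0.0] * size for _ in range(size)]
    for l in range(size):
        A[0][l] = alpha ** l          # first row: 1, alpha, alpha^2, ...
    # First column below A[0][0] stays 0 (assumed boundary condition).
    for k in range(1, size):
        for l in range(1, size):
            A[k][l] = A[k - 1][l - 1] + alpha * (A[k][l - 1] - A[k - 1][l])
    return A

def apply_matrix(A, vector):
    """Matrix-vector product, e.g. c_alpha = A_alpha c."""
    return [sum(a * v for a, v in zip(row, vector)) for row in A]

alpha = 0.42
A = warp_matrix(alpha, 6)
c = [0.5, 0.3, -0.2, 0.1, 0.0, 0.0]
c_alpha = apply_matrix(A, c)
```

Setting `alpha = 0.0` yields the identity matrix, which corresponds to no warping.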
In speech synthesis, the mel cepstrum computed this way is usually called mcep.
The spectral envelope can be recovered from mcep:
$$
X(e^{j\omega}) = \exp \sum_{m=0}^{M} c_\alpha(m)\, e^{j\omega_{\alpha} m}
$$
or
$$
\begin{aligned}
c &= A_{-\alpha}\, c_{\alpha} \\
X(e^{j\omega}) &= \exp \text{FFT}(c)
\end{aligned}
$$
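The second route (unwarp with $A_{-\alpha}$, then exponentiate the transform of $c$) can be sketched as below. This is an illustration under several assumptions: the matrix is built with the recursion and boundary values given earlier, a naive DFT with kernel $e^{-j2\pi nk/K}$ stands in for the FFT, and the matrices are truncated to a finite size, so the round trip is only approximate. All function names are hypothetical.

```python
import cmath
import math

def warp_matrix(alpha, size):
    """A_alpha from the recursion, first row alpha^l, first column 0 below the top."""
    A = [[0.0] * size for _ in range(size)]
    for l in range(size):
        A[0][l] = alpha ** l
    for k in range(1, size):
        for l in range(1, size):
            A[k][l] = A[k - 1][l - 1] + alpha * (A[k][l - 1] - A[k - 1][l])
    return A

def apply_matrix(A, vector):
    return [sum(a * v for a, v in zip(row, vector)) for row in A]

def spectrum_from_mcep(c_alpha, alpha, num_bins):
    """Return (c, X): the unwarped cepstrum and X(k) = exp(DFT(c)(k))."""
    c = apply_matrix(warp_matrix(-alpha, len(c_alpha)), c_alpha)
    spectrum = []
    for k in range(num_bins):
        log_x = sum(c[n] * cmath.exp(-2j * math.pi * n * k / num_bins)
                    for n in range(len(c)))
        spectrum.append(cmath.exp(log_x))
    return c, spectrum

# Round trip: warp a short cepstrum, then recover it and its spectrum.
M = 40
c_true = [0.5, 0.3, -0.2, 0.1] + [0.0] * (M - 4)
c_alpha = apply_matrix(warp_matrix(0.42, M), c_true)
c_back, X = spectrum_from_mcep(c_alpha, 0.42, 64)
```

Because $c$ is one-sided here, `X` is complex; its magnitude `abs(X[k])` gives the envelope value at bin `k`.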
References
Warped Discrete-Fourier Transform: Theory and Applications
https://pdfs.semanticscholar.org/bc4b/26ca9d820190a3e363be82ec4f2988e58600.pdf
Frequency Warping Revisited
http://www-ist.massey.ac.nz/dbailey/sprg/pdfs/2004_DELTA_23.pdf
Frequency-Warped Signal Processing for Audio Applications
https://pdfs.semanticscholar.org/a850/8bc734d2c80f5e304404859cb6bf3e1f2e49.pdf