An Intuitive Understanding of the Law of Total Variance (Variance Decomposition Formula)

Anonymous 2024-04-18 10:55:06


Law of Iterated Expectations (LIE)

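Before discussing the variance decomposition, we first need the law of iterated expectations. Given a random variable X, we can split its outcomes into several groups according to the value of another variable Y.

Under such a partition, the overall mean of X equals the mean of the per-group means, with each group weighted by its probability.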

$$
\operatorname{E}[X]=\operatorname{E}[\operatorname{E}[X\mid Y]]
$$

For example, suppose the partition has three equally likely groups whose means are 70, 60, and 80. Then

$$
\begin{aligned}
E[X] &= E[E[X\mid Y]]\\
&= \sum_{i=1}^{3} P(Y=y_i)\,E[X\mid Y=y_i]\\
&= \frac{70+60+80}{3}\\
&= 70
\end{aligned}
$$
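The three-group example can be checked numerically. The sketch below is only an illustration: the data and the names `groups`, `overall_mean`, `iterated_mean`, and `naive_mean` are hypothetical, chosen so that the group means are 70, 60, and 80 while the group sizes differ. It suggests that the identity holds when each group mean is weighted by P(Y = y), and that the plain average of the group means equals E[X] only when the groups are equally likely.

```python
from fractions import Fraction

# Hypothetical data: values of X grouped by the value of Y.
# Group means are 70, 60 and 80, but the groups have different sizes.
groups = {
    "y1": [60, 70, 80],
    "y2": [50, 70],
    "y3": [80],
}

def mean(values):
    """Exact arithmetic mean of a non-empty list."""
    return Fraction(sum(values), len(values))

n_total = sum(len(v) for v in groups.values())

# E[X]: the plain mean over every observation.
overall_mean = mean([x for v in groups.values() for x in v])

# E[E[X|Y]]: group means weighted by P(Y = y) = group size / total size.
iterated_mean = sum(Fraction(len(v), n_total) * mean(v) for v in groups.values())

# Unweighted average of the group means, shown for comparison only.
naive_mean = mean([mean(v) for v in groups.values()])

print(overall_mean, iterated_mean, naive_mean)
```

With these unequal group sizes the weighted version matches E[X] exactly, while the unweighted average of 70, 60, and 80 does not.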

Formally, for continuous X and Y with joint density p(x, y):

$$
\begin{aligned}
E[E[X\mid Y]] &= \int p(y)\int x\,p(x\mid y)\,dx\,dy\\
&= \iint p(x,y)\,x\,dx\,dy\\
&= \int p(x)\,x\,dx\\
&= E[X]
\end{aligned}
$$

Mathematical Derivation of the Law of Total Variance

Another important rule is the law of total variance:

$$
Var(X)=\operatorname{E}[Var(X\mid Y)]+Var(\operatorname{E}[X\mid Y])
$$

It splits the variance into two components. Expanding each one:

$$
\begin{aligned}
\operatorname{E}[Var(X\mid Y)] &= \operatorname{E}\big[\operatorname{E}[X^2\mid Y]-(\operatorname{E}[X\mid Y])^2\big] && \text{Def. of variance}\\
&= \operatorname{E}\big[\operatorname{E}[X^2\mid Y]\big]-\operatorname{E}\big[(\operatorname{E}[X\mid Y])^2\big] && \text{Linearity of expectation}\\
&= \operatorname{E}[X^2]-\operatorname{E}\big[(\operatorname{E}[X\mid Y])^2\big] && \text{Law of iterated expectations}
\end{aligned}
$$

$$
\begin{aligned}
Var(\operatorname{E}[X\mid Y]) &= \operatorname{E}\big[(\operatorname{E}[X\mid Y])^2\big]-\big(\operatorname{E}[\operatorname{E}[X\mid Y]]\big)^2 && \text{Def. of variance}\\
&= \operatorname{E}\big[(\operatorname{E}[X\mid Y])^2\big]-(\operatorname{E}[X])^2 && \text{Law of iterated expectations}
\end{aligned}
$$

$$
\therefore\ \operatorname{E}[Var(X\mid Y)]+Var(\operatorname{E}[X\mid Y])=\operatorname{E}[X^2]-(\operatorname{E}[X])^2=Var(X)
$$

How should we interpret this?

What is $\operatorname{E}[Var(X\mid Y)]$? Intuitively, it is the average of the variances within each group, so it measures the average within-group variation.
What is $Var(\operatorname{E}[X\mid Y])$? It measures how much the group means differ from one another, so it captures the between-group variation.

The total variance is therefore the sum of the within-group and between-group variation, which is exactly the Law of Total Variance.
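The decomposition can also be verified on a small dataset. The following is a minimal sketch under stated assumptions: the grouped data and the names `groups`, `within`, and `between` are hypothetical, and variance here means the population variance (dividing by n), since that is the form for which the identity holds exactly on an empirical distribution.

```python
from fractions import Fraction

# Hypothetical data: values of X grouped by the value of Y.
groups = {"y1": [60, 70, 80], "y2": [50, 70], "y3": [80]}

def mean(values):
    """Exact arithmetic mean of a non-empty list."""
    return Fraction(sum(values), len(values))

def var(values):
    """Population variance (divides by n, not n - 1)."""
    m = mean(values)
    return mean([(x - m) ** 2 for x in values])

all_x = [x for v in groups.values() for x in v]
n = len(all_x)

# P(Y = y) estimated as the share of observations in each group.
weights = {y: Fraction(len(v), n) for y, v in groups.items()}

# Var(X): variance over all observations.
total_var = var(all_x)

# E[Var(X|Y)]: weighted average of the within-group variances.
within = sum(weights[y] * var(v) for y, v in groups.items())

# Var(E[X|Y]): weighted variance of the group means around the grand mean.
grand_mean = sum(weights[y] * mean(v) for y, v in groups.items())
between = sum(weights[y] * (mean(v) - grand_mean) ** 2 for y, v in groups.items())

print(total_var, within, between)
```

Exact fractions are used instead of floats so that the equality `total_var == within + between` can be checked without rounding tolerance.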

Connection to k-means clustering

Readers familiar with clustering algorithms may notice that k-means can be formulated in two equivalent ways. The first is to minimize the within-cluster sum of squares (WCSS):

$$
\underset{\mathbf{S}}{\operatorname{arg\,min}}\sum_{i=1}^{k}\sum_{\mathbf{x}\in S_i}\Vert\mathbf{x}-\boldsymbol{\mu}_i\Vert^2=\underset{\mathbf{S}}{\operatorname{arg\,min}}\sum_{i=1}^{k}|S_i|\operatorname{Var}S_i
$$
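The link to the law of total variance can be illustrated with a fixed cluster assignment. In the sketch below, the points, the assignment in `clusters`, and the names `wcss`, `bcss`, and `tss` are all hypothetical. It checks that the total sum of squares around the grand mean splits into a within-cluster part and a between-cluster part. Because the total does not depend on the assignment, minimizing WCSS amounts to maximizing the between-cluster sum of squares.

```python
from fractions import Fraction

# Hypothetical 2-D points with a fixed, hand-picked cluster assignment.
clusters = {
    0: [(0, 0), (2, 0), (1, 3)],
    1: [(8, 8), (10, 8)],
}

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(Fraction(sum(coord), n) for coord in zip(*points))

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

all_points = [p for pts in clusters.values() for p in pts]
grand = centroid(all_points)

# Within-cluster sum of squares: distances to each cluster's own centroid.
wcss = sum(sq_dist(p, centroid(pts)) for pts in clusters.values() for p in pts)

# Between-cluster sum of squares: size-weighted distances of centroids to the grand mean.
bcss = sum(len(pts) * sq_dist(centroid(pts), grand) for pts in clusters.values())

# Total sum of squares: distances of every point to the grand mean.
tss = sum(sq_dist(p, grand) for p in all_points)

print(wcss, bcss, tss)
```

Dividing each of the three quantities by the number of points gives E[Var(X|Y)], Var(E[X|Y]), and Var(X) respectively, with Y playing the role of the cluster label.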
