エンジニアを目指す浪人のブログ

Loose notes on applied mathematics that may be useful in computer science

Deriving the conditional distribution of a multivariate normal distribution

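This article uses the content of the following earlier posts.

Summary of the definition of the Schur complement, its background, and its relation to the matrix inversion lemma - エンジニアを目指す浪人のブログ

Proof that the Schur complement of a symmetric matrix is symmetric - エンジニアを目指す浪人のブログ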


The conditional distribution (conditional probability distribution) of the multivariate normal distribution seems to come up often in applications, so I decided to write down its derivation here.


To set up the problem, we first make some preparations.

We introduce the notation.

{\displaystyle \;\;\; X_1 \in \mathbb{R}^{r} \;\;\; } random variable
{\displaystyle \;\;\; X_2 \in \mathbb{R}^{s} \;\;\; } random variable
{\displaystyle \;\;\; x_1 \in \mathbb{R}^{r} \;\;\; } realization of {\displaystyle X_1 }
{\displaystyle \;\;\; x_2 \in \mathbb{R}^{s} \;\;\; } realization of {\displaystyle X_2 }
{\displaystyle \;\;\; \mu_1 \in \mathbb{R}^{r} \;\;\; } mean of {\displaystyle X_1 }
{\displaystyle \;\;\; \mu_2 \in \mathbb{R}^{s} \;\;\; } mean of {\displaystyle X_2 }

{\displaystyle \;\;\; X  = \begin{bmatrix} X_{1}^T & X_{2}^T \end{bmatrix}^T \in \mathbb{R}^{r+s}  }
{\displaystyle \;\;\; x  = \begin{bmatrix} x_{1}^T & x_{2}^T \end{bmatrix}^T \in \mathbb{R}^{r+s}  }
{\displaystyle \;\;\; \mu  = \begin{bmatrix} \mu_{1}^T & \mu_{2}^T \end{bmatrix}^T \in \mathbb{R}^{r+s}  }


Let the covariance matrix {\displaystyle \Sigma \ (= \Sigma^T ) } be partitioned as follows.

{\displaystyle \;\;\; \Sigma  = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22}  \end{bmatrix} \in \mathbb{R}^{ (r+s) \times (r+s)},   \;\;\;\;\;\; \Sigma_{11} \in \mathbb{R}^{r \times r}, \ \Sigma_{12} \in \mathbb{R}^{r \times s}, \ \Sigma_{21} \in \mathbb{R}^{s \times r}, \ \Sigma_{22} \in \mathbb{R}^{s \times s} }

(c.1) {\displaystyle \;\;\; \Sigma_{11} = \left( \Sigma_{11} \right)^T }
(c.2) {\displaystyle \;\;\; \Sigma_{22} = \left( \Sigma_{22} \right)^T }
(c.3) {\displaystyle \;\;\; \Sigma_{12} = \left( \Sigma_{21} \right)^T }


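Assume that the random variable (vector) {\displaystyle X } follows a multivariate normal distribution with mean (vector) {\displaystyle \mu } and covariance matrix {\displaystyle \Sigma }, that is, the following holds. We also assume that {\displaystyle \Sigma } is positive definite, so that the inverses and the density function used below exist. The definition of the multivariate normal distribution is given in reference [4].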

{\displaystyle \;\;\;\;\;\; X  \;\; \sim \mathcal{N}(\mu,\Sigma) }

{\displaystyle  \Leftrightarrow  \begin{bmatrix} X_{1} \\ X_{2} \end{bmatrix} \sim \mathcal{N} \left( \begin{bmatrix} \mu_{1} \\ \mu_{2} \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22}  \end{bmatrix} \right) }
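As a minimal sketch of this setup, the block partition can be written with NumPy slicing. The numerical values of `mu` and `Sigma` and the choice {\displaystyle r = 2, \ s = 1 } are hypothetical, picked only for illustration.

```python
import numpy as np

# Hypothetical example values (not from the article): r = 2, s = 1.
r, s = 2, 1
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])  # symmetric and diagonally dominant

# Partition the mean vector: mu = [mu_1; mu_2].
mu_1, mu_2 = mu[:r], mu[r:]

# Partition the covariance matrix into its four blocks.
Sigma_11 = Sigma[:r, :r]   # r x r
Sigma_12 = Sigma[:r, r:]   # r x s
Sigma_21 = Sigma[r:, :r]   # s x r
Sigma_22 = Sigma[r:, r:]   # s x s
```

With a symmetric `Sigma`, the blocks satisfy (c.1), (c.2), and (c.3) automatically.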


The following property of the transpose is given in reference [5].

(t.1) {\displaystyle \;\;\; \left( A_1 A_2 A_3 \right)^T = \left( A_3 \right)^T  \left( A_2 \right)^T \left( A_1 \right)^T }


The following property of symmetric matrices is given in reference [6].

(s.1) {\displaystyle \;\; } A square matrix {\displaystyle A } is nonsingular {\displaystyle \; \Longrightarrow  \; \left[ A = A^T \;\; \Leftrightarrow  \;\; A^{-1} = \left( A^{-1} \right)^T \right] }
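Properties (t.1) and (s.1) can be sanity-checked numerically. The following is only an illustrative check on random matrices, not a proof; the sizes and the seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, used only for reproducibility

# (t.1): the transpose of a product reverses the order of the factors.
A_1 = rng.standard_normal((2, 3))
A_2 = rng.standard_normal((3, 4))
A_3 = rng.standard_normal((4, 2))
t1_holds = np.allclose((A_1 @ A_2 @ A_3).T, A_3.T @ A_2.T @ A_1.T)

# (s.1): the inverse of a nonsingular symmetric matrix is again symmetric.
B = rng.standard_normal((3, 3))
A = B @ B.T + 3.0 * np.eye(3)   # symmetric positive definite, hence nonsingular
A_inv = np.linalg.inv(A)
s1_holds = np.allclose(A_inv, A_inv.T)
```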



With this setup, we turn to the goal of this article. The following fact holds. Note that the covariance matrix of the conditional distribution does not depend on {\displaystyle x_2 }.

'--------------------------------------------------------------------------------------------------------------------------------------------
Fact.

Given {\displaystyle X_2 = x_2 }, the random variable {\displaystyle X_1| X_2 = x_2 } follows a multivariate normal distribution with mean {\displaystyle \mu_{1|2}  } and covariance matrix {\displaystyle \Sigma_{1|2} }, that is, the following holds.

{\displaystyle \;\;\; X_1| X_2 = x_2 \ \sim \mathcal{N}( \mu_{1|2},\Sigma_{1|2} ) }


{\displaystyle \;\;\; \mu_{1|2} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} ( x_2 - \mu_{2} ) }

{\displaystyle \;\;\; \Sigma_{1|2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} }
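The two formulas in the Fact translate almost directly into code. The function below is a minimal sketch; its name `conditional_mvn` and its argument layout are my own choices for illustration. It assumes {\displaystyle \Sigma } is symmetric and {\displaystyle \Sigma_{22} } is nonsingular.

```python
import numpy as np

def conditional_mvn(mu, Sigma, r, x_2):
    """Return (mu_{1|2}, Sigma_{1|2}) for X_1 | X_2 = x_2.

    The first r coordinates form X_1 and the remaining ones form X_2.
    Sigma is assumed symmetric with a nonsingular Sigma_22 block.
    """
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    x_2 = np.asarray(x_2, dtype=float)

    mu_1, mu_2 = mu[:r], mu[r:]
    Sigma_11 = Sigma[:r, :r]
    Sigma_21, Sigma_22 = Sigma[r:, :r], Sigma[r:, r:]

    # K = Sigma_12 Sigma_22^{-1}. By symmetry this is (Sigma_22^{-1} Sigma_21)^T,
    # which is obtained from a linear solve instead of an explicit inverse.
    K = np.linalg.solve(Sigma_22, Sigma_21).T

    mu_cond = mu_1 + K @ (x_2 - mu_2)
    Sigma_cond = Sigma_11 - K @ Sigma_21
    return mu_cond, Sigma_cond

# Hypothetical bivariate example (r = s = 1).
mu = [0.0, 0.0]
Sigma = [[1.0, 0.5],
         [0.5, 2.0]]
mu_a, Sigma_a = conditional_mvn(mu, Sigma, 1, [2.0])    # mean 0.5, variance 0.875
mu_b, Sigma_b = conditional_mvn(mu, Sigma, 1, [-1.0])   # same variance as above
```

In this example only the conditional mean changes with the value of `x_2`; the conditional covariance is the same in both calls, as the Fact states.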


Proof.

It suffices to show that the probability density function of the conditional distribution of {\displaystyle X_1| X_2 = x_2 } is the following.

{\displaystyle \;\;\; f_{X_1} \left( x_1 | X_2 = x_2 \right) = \frac{ 1 }{ (2 \pi )^{r/2} \ \left[ \mathrm{det} \left( \Sigma_{1|2} \right) \right]^{1/2} }  \exp \left( - \frac{1}{2} ( x_1 - \mu_{1|2} )^T \left( \Sigma_{1|2} \right)^{-1} ( x_1 - \mu_{1|2} ) \right)  }


We define the following. As in Definition 1 of the earlier post cited at the top (definition of the Schur complement), {\displaystyle S } is the Schur complement of {\displaystyle \Sigma_{22} } in {\displaystyle \Sigma }.

{\displaystyle \;\;\; \Sigma^{-1}  = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}^{-1} }

{\displaystyle \;\;\;\;\;\;\;\;\;\;  = \begin{bmatrix} \Lambda_{11} & \Lambda_{12} \\ \Lambda_{21} & \Lambda_{22} \end{bmatrix} }

{\displaystyle \;\;\;\;\;\;\; S = \Sigma_{1|2}  }
{\displaystyle \;\;\;\;\;\;\;\;\; \left( = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right) }


From Fact 1 (1.2) of the earlier post cited at the top (definition of the Schur complement), we obtain the following.

{\displaystyle \;\;\; \Lambda_{11} = ( \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} )^{-1} }
{\displaystyle \;\;\;\;\;\;\;\;\; = S^{-1} }

{\displaystyle \;\;\; \Lambda_{12} = - ( \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} )^{-1} \Sigma_{12} \Sigma_{22}^{-1} }
{\displaystyle \;\;\;\;\;\;\;\;\; = - S^{-1} \Sigma_{12} \Sigma_{22}^{-1} }

{\displaystyle \;\;\; \Lambda_{21} = - \Sigma_{22}^{-1} \Sigma_{21} ( \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} )^{-1} }
{\displaystyle \;\;\;\;\;\;\;\;\; = - \Sigma_{22}^{-1} \Sigma_{21} S^{-1} }

{\displaystyle \;\;\; \Lambda_{22} = \Sigma_{22}^{-1} + \Sigma_{22}^{-1} \Sigma_{21} ( \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} )^{-1} \Sigma_{12} \Sigma_{22}^{-1} }
{\displaystyle \;\;\;\;\;\;\;\;\; = \Sigma_{22}^{-1} + \Sigma_{22}^{-1} \Sigma_{21} S^{-1} \Sigma_{12} \Sigma_{22}^{-1} }


From the Fact in the earlier post cited at the top (Schur complement of a symmetric matrix), we obtain the following.

(sc.1) {\displaystyle \;\;\; S = S^T  }
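The block formulas for {\displaystyle \Lambda_{11}, \Lambda_{12}, \Lambda_{21}, \Lambda_{22} } and the symmetry (sc.1) can be checked numerically. The sketch below uses a randomly generated positive definite {\displaystyle \Sigma }; the sizes and the seed are arbitrary assumptions, and the check is an illustration rather than a proof.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
r, s = 2, 3
n = r + s

# Hypothetical covariance matrix: symmetric positive definite by construction.
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)

Sigma_11, Sigma_12 = Sigma[:r, :r], Sigma[:r, r:]
Sigma_21, Sigma_22 = Sigma[r:, :r], Sigma[r:, r:]

# Schur complement of Sigma_22 in Sigma.
Sigma_22_inv = np.linalg.inv(Sigma_22)
S = Sigma_11 - Sigma_12 @ Sigma_22_inv @ Sigma_21
S_inv = np.linalg.inv(S)

# Blocks of Sigma^{-1} as given by the formulas in the text.
Lambda_11 = S_inv
Lambda_12 = -S_inv @ Sigma_12 @ Sigma_22_inv
Lambda_21 = -Sigma_22_inv @ Sigma_21 @ S_inv
Lambda_22 = Sigma_22_inv + Sigma_22_inv @ Sigma_21 @ S_inv @ Sigma_12 @ Sigma_22_inv
Lambda = np.block([[Lambda_11, Lambda_12],
                   [Lambda_21, Lambda_22]])

# Compare with a direct inverse, and check (sc.1).
blocks_match = np.allclose(Lambda, np.linalg.inv(Sigma))
S_is_symmetric = np.allclose(S, S.T)
```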


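From here on, note that everything other than {\displaystyle x_1 } is a constant. The symbol {\displaystyle \mathrm{const} } denotes a term that does not depend on {\displaystyle x_1 }, and its value may change from line to line.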

{\displaystyle (x - \mu )^T \Sigma^{-1} ( x - \mu  ) = \begin{bmatrix} x_1 - \mu_{1} \\ x_2 - \mu_{2} \end{bmatrix}^T \begin{bmatrix} \Lambda_{11} & \Lambda_{12} \\ \Lambda_{21} & \Lambda_{22} \end{bmatrix} \begin{bmatrix} x_1 - \mu_{1} \\ x_2 - \mu_{2} \end{bmatrix}  }

{\displaystyle \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; = \begin{bmatrix} x_1 - \mu_{1} \\ x_2 - \mu_{2} \end{bmatrix}^T  \begin{bmatrix} \Lambda_{11} ( x_1 - \mu_{1} ) + \Lambda_{12} ( x_2 - \mu_{2} ) \\ \Lambda_{21} ( x_1 - \mu_{1} ) + \Lambda_{22} ( x_2 - \mu_{2} ) \end{bmatrix}  }

{\displaystyle \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; = \begin{bmatrix} ( x_1 - \mu_{1} )^T  & ( x_2 - \mu_{2} )^T \end{bmatrix}  \begin{bmatrix} \Lambda_{11} ( x_1 - \mu_{1} ) + \Lambda_{12} ( x_2 - \mu_{2} ) \\ \Lambda_{21} ( x_1 - \mu_{1} ) + \Lambda_{22} ( x_2 - \mu_{2} ) \end{bmatrix}  }

{\displaystyle \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; = ( x_1 - \mu_{1} )^T \Lambda_{11} ( x_1 - \mu_{1} ) + ( x_1 - \mu_{1} )^T \Lambda_{12} ( x_2 - \mu_{2} )   }
{\displaystyle \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; + ( x_2 - \mu_{2} )^T \Lambda_{21} ( x_1 - \mu_{1} ) + ( x_2 - \mu_{2} )^T \Lambda_{22} ( x_2 - \mu_{2} )  \;\;\; } (※)


We transform each term as follows.

{\displaystyle \;\;\; ( x_1 - \mu_{1} )^T \Lambda_{11} ( x_1 - \mu_{1} ) = ( x_1 - \mu_{1} )^T S^{-1} ( x_1 - \mu_{1} ) }

{\displaystyle \;\;\; ( x_1 - \mu_{1} )^T \Lambda_{12} ( x_2 - \mu_{2} ) = ( x_1 - \mu_{1})^T ( - S^{-1} \Sigma_{12} \Sigma_{22}^{-1} ) ( x_2 - \mu_{2} )  }

{\displaystyle \;\;\; ( x_2 - \mu_{2} )^T \Lambda_{21} ( x_1 - \mu_{1} ) = \left( ( x_2 - \mu_{2} )^T \Lambda_{21} ( x_1 - \mu_{1} ) \right)^T }
{\displaystyle  \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; = ( x_1 - \mu_{1} )^T   \Lambda_{21}^T  ( x_2 - \mu_{2} ) \;\;\; \because } (t.1)
{\displaystyle \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; = ( x_1 - \mu_{1} )^T   \left( - \Sigma_{22}^{-1} \Sigma_{21} S^{-1} \right)^T  ( x_2 - \mu_{2} ) }
{\displaystyle \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; = ( x_1 - \mu_{1} )^T   \left( - ( S^{-1} )^T  ( \Sigma_{21} )^T ( \Sigma_{22}^{-1} )^T   \right)  ( x_2 - \mu_{2} ) \;\;\; \because } (t.1)
{\displaystyle \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; = ( x_1 - \mu_{1} )^T   \left( - ( S^{-1} )^T  ( \Sigma_{21} )^T \Sigma_{22}^{-1}   \right)  ( x_2 - \mu_{2} ) \;\;\;\;\;\;\; \because } (c.2)(s.1)
{\displaystyle \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; = ( x_1 - \mu_{1} )^T   \left( - ( S^{-1} )^T  \Sigma_{12} \Sigma_{22}^{-1}   \right)  ( x_2 - \mu_{2} ) \;\;\;\;\;\;\;\;\;\;\; \because } (c.3)
{\displaystyle \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; = ( x_1 - \mu_{1} )^T   \left( - S^{-1} \Sigma_{12} \Sigma_{22}^{-1} \right)  ( x_2 - \mu_{2} ) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\; \because } (sc.1)(s.1)

{\displaystyle \;\;\; ( x_2 - \mu_{2} )^T \Lambda_{22} ( x_2 - \mu_{2} ) = ( x_2 - \mu_{2} )^T \left( \Sigma_{22}^{-1} + \Sigma_{22}^{-1} \Sigma_{21} S^{-1} \Sigma_{12} \Sigma_{22}^{-1} \right) ( x_2 - \mu_{2} )   }
{\displaystyle \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;  = \mathrm{const}  }


(※)
{\displaystyle = ( x_1 - \mu_{1} )^T S^{-1} ( x_1 - \mu_{1} ) + 2 ( x_1 - \mu_{1})^T ( - S^{-1} \Sigma_{12} \Sigma_{22}^{-1} ) ( x_2 - \mu_{2} ) + \mathrm{const}  }

{\displaystyle = ( x_1 - \mu_{1} )^T S^{-1} ( x_1 - \mu_{1} ) }
{\displaystyle + 2 \left( x_1^T \left( - S^{-1} \Sigma_{12} \Sigma_{22}^{-1} \right)  ( x_2 - \mu_{2} ) - \mu_{1}^T   \left( - S^{-1} \Sigma_{12} \Sigma_{22}^{-1} \right)  ( x_2 - \mu_{2} )  \right) + \mathrm{const}  }

{\displaystyle = ( x_1 - \mu_{1} )^T S^{-1} ( x_1 - \mu_{1} ) }
{\displaystyle + 2 x_1^T \left( - S^{-1} \Sigma_{12} \Sigma_{22}^{-1} \right)  ( x_2 - \mu_{2} ) - 2 \mu_{1}^T   \left( - S^{-1} \Sigma_{12} \Sigma_{22}^{-1} \right)  ( x_2 - \mu_{2} ) + \mathrm{const}  }

{\displaystyle = ( x_1 - \mu_{1} )^T S^{-1} ( x_1 - \mu_{1} ) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; + 2 x_1^T \left( - S^{-1} \Sigma_{12} \Sigma_{22}^{-1} \right)  ( x_2 - \mu_{2} ) + \mathrm{const} }
{\displaystyle = ( x_1 - \mu_{1} )^T \left( S^{-1} x_1 - S^{-1} \mu_{1} \right) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; + 2 x_1^T \left( - S^{-1} \Sigma_{12} \Sigma_{22}^{-1} \right) ( x_2 - \mu_{2} ) + \mathrm{const}  }
{\displaystyle = x_1^T S^{-1} x_1 - x_1^T S^{-1} \mu_1 - \mu_1^T S^{-1} x_1 + \mu_1^T S^{-1} \mu_1 + 2 x_1^T \left( - S^{-1} \Sigma_{12} \Sigma_{22}^{-1} \right) ( x_2 - \mu_{2} ) + \mathrm{const}  }
{\displaystyle = x_1^T S^{-1} x_1 - x_1^T S^{-1} \mu_1 - \mu_1^T S^{-1} x_1 \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; + 2 x_1^T \left( - S^{-1} \Sigma_{12} \Sigma_{22}^{-1} \right) ( x_2 - \mu_{2} ) + \mathrm{const} }
{\displaystyle = x_1^T S^{-1} x_1 - x_1^T S^{-1} \mu_1 - ( \mu_1^T S^{-1} x_1 )^T \;\;\;\;\;\;\;\;\;\;\;\;\; + 2 x_1^T \left( - S^{-1} \Sigma_{12} \Sigma_{22}^{-1} \right) ( x_2 - \mu_{2} ) + \mathrm{const} }
{\displaystyle = x_1^T S^{-1} x_1 - x_1^T S^{-1} \mu_1 - x_1^T  \left( S^{-1} \right)^T (\mu_1^T)^T \;\;\;\; + 2 x_1^T \left( - S^{-1} \Sigma_{12} \Sigma_{22}^{-1} \right) ( x_2 - \mu_{2} ) + \mathrm{const} \because } (t.1)
{\displaystyle = x_1^T S^{-1} x_1 - x_1^T S^{-1} \mu_1 - x_1^T \left(S^{-1} \right)^T \mu_1 \;\;\;\;\;\;\;\;\; + 2 x_1^T \left( - S^{-1} \Sigma_{12} \Sigma_{22}^{-1} \right) ( x_2 - \mu_{2} ) + \mathrm{const} }
{\displaystyle = x_1^T S^{-1} x_1 - x_1^T S^{-1} \mu_1 - x_1^T S^{-1} \mu_1 \;\;\;\;\;\;\; + 2 x_1^T ( - S^{-1} \Sigma_{12} \Sigma_{22}^{-1} ) ( x_2 - \mu_{2} ) + \mathrm{const} \because } (sc.1)(s.1)
{\displaystyle = x_1^T S^{-1} x_1 - 2 x_1^T S^{-1} \mu_1  \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; + 2 x_1^T \left( - S^{-1} \Sigma_{12} \Sigma_{22}^{-1} \right) ( x_2 - \mu_{2} ) + \mathrm{const} }
{\displaystyle = x_1^T S^{-1} x_1 - 2 x_1^T S^{-1} \mu_1  \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; - 2 x_1^T S^{-1} \left( \Sigma_{12} \Sigma_{22}^{-1} \right) ( x_2 - \mu_{2} ) + \mathrm{const} }
{\displaystyle = x_1^T S^{-1} x_1 - 2 x_1^T S^{-1} \left( \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} ( x_2 - \mu_{2} ) \right) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; + \mathrm{const} }

{\displaystyle = x_1^T S^{-1} x_1 -  2 x_1^T S^{-1} \mu_{1|2} \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; + \mathrm{const} }
{\displaystyle = x_1^T S^{-1} x_1 - x_1^T S^{-1} \mu_{1|2} - x_1^T S^{-1} \mu_{1|2} \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; + \mathrm{const} }
{\displaystyle = x_1^T S^{-1} x_1 - x_1^T S^{-1} \mu_{1|2} - ( x_1^T S^{-1} \mu_{1|2} )^T \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; + \mathrm{const} }
{\displaystyle = x_1^T S^{-1} x_1 - x_1^T S^{-1} \mu_{1|2} - ( \mu_{1|2} )^T \left( S^{-1} \right)^T ( x_1^T )^T \;\;\;\;\;\;\;\;\;\;\;\;\;\;\; + \mathrm{const} \;\;\; \because } (t.1)
{\displaystyle = x_1^T S^{-1} x_1 - x_1^T S^{-1} \mu_{1|2} - (\mu_{1|2})^T \left( S^{-1} \right)^T x_1 \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; + \mathrm{const} }
{\displaystyle = x_1^T S^{-1} x_1 - x_1^T S^{-1} \mu_{1|2} - (\mu_{1|2})^T S^{-1} x_1 \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; + \mathrm{const} \;\;\; \because } (sc.1)(s.1)
{\displaystyle = x_1^T S^{-1} x_1 - x_1^T S^{-1} \mu_{1|2} - (\mu_{1|2})^T S^{-1} x_1 + (\mu_{1|2})^T S^{-1} \mu_{1|2} + \mathrm{const} }
{\displaystyle = ( x_1 - \mu_{1|2} )^T \left( S^{-1} x_1 -  S^{-1} \mu_{1|2} \right) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; + \mathrm{const} }
{\displaystyle = ( x_1 - \mu_{1|2} )^T S^{-1} ( x_1 - \mu_{1|2} ) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; + \mathrm{const} }
{\displaystyle = ( x_1 - \mu_{1|2} )^T \left( \Sigma_{1|2} \right)^{-1} ( x_1 - \mu_{1|2} ) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; + \mathrm{const} \;\;\; } (※※)
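The completion of the square leading to (※※) can be sanity-checked numerically: for a fixed {\displaystyle x_2 }, the difference between the full quadratic form and the conditional one should be the same number for every {\displaystyle x_1 }. The sketch below uses hypothetical random inputs; the helper names `full_quadratic` and `conditional_quadratic` are introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed for reproducibility
r, s = 2, 2
n = r + s

# Hypothetical test problem: random mean, random positive definite covariance.
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)
mu = rng.standard_normal(n)
x_2 = rng.standard_normal(s)   # the fixed conditioning value

mu_1, mu_2 = mu[:r], mu[r:]
Sigma_11, Sigma_12 = Sigma[:r, :r], Sigma[:r, r:]
Sigma_21, Sigma_22 = Sigma[r:, :r], Sigma[r:, r:]

# Conditional mean and Schur complement S = Sigma_{1|2}.
mu_c = mu_1 + Sigma_12 @ np.linalg.solve(Sigma_22, x_2 - mu_2)
S = Sigma_11 - Sigma_12 @ np.linalg.solve(Sigma_22, Sigma_21)

def full_quadratic(x_1):
    """(x - mu)^T Sigma^{-1} (x - mu) with x = [x_1; x_2]."""
    d = np.concatenate([x_1, x_2]) - mu
    return d @ np.linalg.solve(Sigma, d)

def conditional_quadratic(x_1):
    """(x_1 - mu_{1|2})^T S^{-1} (x_1 - mu_{1|2})."""
    d = x_1 - mu_c
    return d @ np.linalg.solve(S, d)

# The gap between the two forms plays the role of "const" in the text.
samples = rng.standard_normal((5, r))
gaps = np.array([full_quadratic(x) - conditional_quadratic(x) for x in samples])
gap_is_constant = np.allclose(gaps, gaps[0])
```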


{\displaystyle f_{X_1} \left( x_1 | X_2 = x_2 \right) = \frac{ f_{X_1,X_2} (x_1, x_2)}{ f_{X_2}(x_2) } \;\;\; \because } reference [7]

{\displaystyle \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; \propto f_{X_1,X_2} (x_1, x_2) \;\;\;\; \because X_2 } is fixed at {\displaystyle x_2 }

{\displaystyle \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; = f_{X} (x) }

{\displaystyle \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; = \frac{ 1 }{ (2 \pi )^{(r+s)/2} \ \left[ \mathrm{det} \left( \Sigma \right) \right]^{1/2} } \exp \left( - \frac{1}{2} ( x - \mu )^T \Sigma^{-1} ( x - \mu ) \right) }

{\displaystyle \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; \propto \exp \left( - \frac{1}{2} ( x - \mu )^T \Sigma^{-1} ( x - \mu ) \right) }

{\displaystyle \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; = \exp \left( - \frac{1}{2} \left( ( x_1 - \mu_{1|2} )^T \left( \Sigma_{1|2} \right)^{-1} ( x_1 - \mu_{1|2} ) + \mathrm{const}  \right) \right) \;\;\; \because } (※※)

{\displaystyle \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; = \exp \left( - \frac{1}{2} ( x_1 - \mu_{1|2} )^T \left( \Sigma_{1|2} \right)^{-1} ( x_1 - \mu_{1|2} ) \right) \exp \left( - \frac{1}{2} \left( \mathrm{const} \right) \right) }

{\displaystyle \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; \propto \exp \left( - \frac{1}{2} ( x_1 - \mu_{1|2} )^T \left( \Sigma_{1|2} \right)^{-1} ( x_1 - \mu_{1|2} ) \right)  }

{\displaystyle \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; \propto  \frac{ 1 }{ (2 \pi )^{r/2} \ \left[ \mathrm{det} \left( \Sigma_{1|2} \right) \right]^{1/2} }  \exp \left( - \frac{1}{2} ( x_1 - \mu_{1|2} )^T \left( \Sigma_{1|2} \right)^{-1} ( x_1 - \mu_{1|2} ) \right)  }


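Both {\displaystyle f_{X_1} \left( x_1 | X_2 = x_2 \right) } and the last expression are probability density functions of {\displaystyle x_1 } and therefore integrate to 1, so the constant of proportionality is 1.

{\displaystyle \;\;\; \Rightarrow f_{X_1} \left( x_1 | X_2 = x_2 \right) = \frac{ 1 }{ (2 \pi )^{r/2} \ \left[ \mathrm{det} \left( \Sigma_{1|2} \right) \right]^{1/2} }  \exp \left( - \frac{1}{2} ( x_1 - \mu_{1|2} )^T \left( \Sigma_{1|2} \right)^{-1} ( x_1 - \mu_{1|2} ) \right)  }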

(End of proof)
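The result of the proof can also be checked numerically through the ratio {\displaystyle f_{X_1,X_2}(x_1, x_2) / f_{X_2}(x_2) }. The sketch below relies on the standard fact, not derived in this article, that the marginal of {\displaystyle X_2 } is {\displaystyle \mathcal{N}(\mu_2, \Sigma_{22}) }. The helper `mvn_logpdf` is a hand-written log density introduced only for this illustration, and all inputs are hypothetical random values.

```python
import numpy as np

def mvn_logpdf(x, mu, Sigma):
    """Log density of N(mu, Sigma) at x, written out from the density formula."""
    d = x - mu
    k = d.shape[0]
    _, logdet = np.linalg.slogdet(Sigma)   # log det(Sigma), Sigma positive definite
    return -0.5 * (k * np.log(2.0 * np.pi) + logdet + d @ np.linalg.solve(Sigma, d))

rng = np.random.default_rng(2)  # arbitrary seed for reproducibility
r, s = 2, 3
n = r + s

# Hypothetical test problem.
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)
mu = rng.standard_normal(n)
x = rng.standard_normal(n)

x_1, x_2 = x[:r], x[r:]
mu_1, mu_2 = mu[:r], mu[r:]
Sigma_11, Sigma_12 = Sigma[:r, :r], Sigma[:r, r:]
Sigma_21, Sigma_22 = Sigma[r:, :r], Sigma[r:, r:]

# Conditional parameters from the Fact.
mu_c = mu_1 + Sigma_12 @ np.linalg.solve(Sigma_22, x_2 - mu_2)
Sigma_c = Sigma_11 - Sigma_12 @ np.linalg.solve(Sigma_22, Sigma_21)

# log f(x_1, x_2) - log f(x_2) should equal the log of the conditional density.
log_joint_over_marginal = mvn_logpdf(x, mu, Sigma) - mvn_logpdf(x_2, mu_2, Sigma_22)
log_conditional = mvn_logpdf(x_1, mu_c, Sigma_c)
```

Working with log densities avoids underflow when the dimension grows; the comparison itself is the same as comparing the densities.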

'--------------------------------------------------------------------------------------------------------------------------------------------

This completes the derivation of the conditional distribution of a multivariate normal distribution.



References
[1] Cross Validated https://stats.stackexchange.com/questions/30588/deriving-the-conditional-distributions-of-a-multivariate-normal-distribution
[2] @mochio's blog https://qiita.com/mochio/items/280c229bee5fe282852b
[3] Stanford University, lecture notes by Andrew Ng http://cs229.stanford.edu/notes/cs229-notes9.pdf
[4] Wikipedia, Multivariate normal distribution page https://en.wikipedia.org/wiki/Multivariate_normal_distribution
[5] Wikipedia, Transpose page https://en.wikipedia.org/wiki/Transpose
[6] Wikipedia, Symmetric matrix page https://en.wikipedia.org/wiki/Symmetric_matrix
[7] Wikipedia, Conditional probability distribution page https://en.wikipedia.org/wiki/Conditional_probability_distribution