The first step in analyzing multivariate data is computing the mean vector and the variance-covariance matrix.
Example: Sample data matrix
Consider the following matrix:
\[x=\left[ \begin{matrix}4.0 & 2.0 & .60 \\4.2 & 2.1 & .59 \\3.9 & 2.0 & .58 \\4.3 & 2.1 & .62 \\4.1 & 2.2 & .63 \\\end{matrix}\right]\]
This set of 5 observations of 3 variables can be described by its mean vector and variance-covariance matrix. The three variables, from left to right, might be, for example, the length, width, and height of a certain object. Each row vector $X_i$ is one observation of the three variables (or components).
Definition of mean vector and variance-covariance matrix
The mean vector consists of the means of each variable, and the variance-covariance matrix consists of the variances of the variables along the main diagonal and the covariances between each pair of variables in the off-diagonal positions.
The formula for computing the covariance of the variables $X$ and $Y$ is
\[\operatorname{cov}(X,Y)=\frac{\sum\limits_{i=1}^{n}\left( {{X}_{i}}-\bar{x} \right)\left( {{Y}_{i}}-\bar{y} \right)}{n-1}\]
where $\bar{x}$ and $\bar{y}$ denote the means of $X$ and $Y$, respectively, and $n$ is the number of observations.
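The pairwise formula translates almost line for line into code. The following is a minimal sketch using only the Python standard library; the function name `sample_cov` is a hypothetical helper, and the two columns are taken from the example matrix (length and width).

```python
def sample_cov(xs, ys):
    """Unbiased sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    x_bar = sum(xs) / n  # mean of X
    y_bar = sum(ys) / n  # mean of Y
    # Sum of products of deviations from the respective means, divided by n - 1
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


length = [4.0, 4.2, 3.9, 4.3, 4.1]
width = [2.0, 2.1, 2.0, 2.1, 2.2]

print(round(sample_cov(length, width), 6))   # 0.0075 (covariance of length and width)
print(round(sample_cov(length, length), 6))  # 0.025 (a variable's covariance with itself is its variance)
```

The second call illustrates why the diagonal of the variance-covariance matrix holds variances: $\operatorname{cov}(X,X)$ reduces to the sample variance of $X$. On Python 3.10 or later, `statistics.covariance(length, width)` should return the same value.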
The results for the example data are:
\[\bar{x}=\left[ \begin{matrix}
4.10 & 2.08 & .604 \\
\end{matrix} \right]\]
\[S=\left[ \begin{matrix} 0.025 & 0.0075 & 0.00175 \\
0.0075 & 0.0070 & 0.00135 \\
0.00175 & 0.00135 & 0.00043 \\ \end{matrix} \right]\]
where the mean vector contains the arithmetic averages of the three variables, and the (unbiased) variance-covariance matrix $S$ is calculated by
\[S=\frac{1}{n-1}\sum\limits_{i=1}^{n}{\left( {{X}_{i}}-\bar{x} \right){{\left( {{X}_{i}}-\bar{x} \right)}^{\prime }}}\]
where $X_i$ is the $i$-th observation written as a column vector, $\bar{x}$ is the mean vector, the prime denotes the transpose, and $n=5$ for this example.
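The matrix formula can be checked numerically. The sketch below (standard library only; `mean_vector` and `covariance_matrix` are hypothetical helper names) accumulates the outer product of each observation's deviation vector and divides by $n-1$. It should reproduce the mean vector and $S$ given above.

```python
# Example data from the text: rows are observations, columns are
# (length, width, height).
X = [
    [4.0, 2.0, 0.60],
    [4.2, 2.1, 0.59],
    [3.9, 2.0, 0.58],
    [4.3, 2.1, 0.62],
    [4.1, 2.2, 0.63],
]


def mean_vector(rows):
    """Column-wise arithmetic means (the centroid)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Unbiased covariance matrix: sum of outer products of deviations / (n - 1)."""
    n = len(rows)
    p = len(rows[0])
    x_bar = mean_vector(rows)
    S = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [row[j] - x_bar[j] for j in range(p)]  # deviation vector X_i - x_bar
        for j in range(p):
            for k in range(p):
                S[j][k] += d[j] * d[k]  # entry (j, k) of the outer product d d'
    return [[S[j][k] / (n - 1) for k in range(p)] for j in range(p)]


x_bar = mean_vector(X)
S = covariance_matrix(X)
print([round(m, 3) for m in x_bar])
for r in S:
    print([round(v, 5) for v in r])
```

The explicit loops mirror the summation in the formula rather than aiming for speed. If NumPy is available, `numpy.cov(X, rowvar=False)` computes the same unbiased matrix, since by default it divides by $n-1$ and `rowvar=False` treats columns as variables.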
Thus, $0.025$ is the variance of the length variable, $0.0075$ is the covariance between the length and width variables, $0.00175$ is the covariance between the length and height variables, $0.007$ is the variance of the width variable, $0.00135$ is the covariance between the width and height variables, and $0.00043$ is the variance of the height variable.
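As a worked check of one off-diagonal entry, the length-width covariance follows from the column means and the deviations of each observation from them:
\[
\begin{aligned}
\bar{x}_{\text{length}} &= \frac{4.0+4.2+3.9+4.3+4.1}{5}=\frac{20.5}{5}=4.10, \qquad
\bar{x}_{\text{width}} = \frac{2.0+2.1+2.0+2.1+2.2}{5}=\frac{10.4}{5}=2.08 \\
\operatorname{cov}(\text{length},\text{width}) &= \frac{(-0.10)(-0.08)+(0.10)(0.02)+(-0.20)(-0.08)+(0.20)(0.02)+(0.00)(0.12)}{5-1} \\
&= \frac{0.008+0.002+0.016+0.004+0}{4}=\frac{0.030}{4}=0.0075
\end{aligned}
\]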
Centroid, dispersion matrix
The mean vector is often referred to as the centroid, and the variance-covariance matrix as the dispersion, or dispersion matrix. The terms variance-covariance matrix and covariance matrix are also used interchangeably.