Homework 1

1. The document discusses various methods for visually representing multivariate data, including linking multiple two-dimensional scatter plots, rotated three-dimensional plots, arrays of growth curves, star plots, and Chernoff faces.
2. It provides examples of each method using different datasets to illustrate how each can be used to better understand relationships within multivariate data.
3. Specifically, it shows linked scatter plots of variables from an automotive dataset, rotated 3D plots of engine displacement, power, and fuel efficiency, arrays of bear weight curves over time, a star plot of damage factors for composite materials, and a Chernoff face mapping various variables to facial features.

Aspects of Multivariate Analysis

1. Prove that the sample correlation coefficient satisfies 𝑟𝑖𝑘 = 𝑟𝑘𝑖.

The sample covariance 𝑠𝑖𝑘 measures the association between the 𝑖th and 𝑘th variables. It reduces to the sample variance when 𝑖 = 𝑘, and it is evident that 𝑠𝑖𝑘 = 𝑠𝑘𝑖 by the symmetry property of the covariance, 𝑐𝑜𝑣(𝑋, 𝑌) = 𝑐𝑜𝑣(𝑌, 𝑋).

$$s_{ik} = \frac{1}{n}\sum_{j=1}^{n}(x_{ji}-\bar{x}_i)(x_{jk}-\bar{x}_k) = \frac{1}{n}\sum_{j=1}^{n}(x_{jk}-\bar{x}_k)(x_{ji}-\bar{x}_i) = s_{ki}$$

for $i = 1, 2, \ldots, p$ and $k = 1, 2, \ldots, p$.

The symmetry holds because each summand is the product of two scalar factors, one involving subscript 𝑖 and one involving subscript 𝑘, and multiplication of scalars is commutative; interchanging the subscripts therefore does not change the sum. In the same way, the correlation coefficient is unchanged when the subscripts are interchanged:

$$r_{ik} = \frac{s_{ik}}{\sqrt{s_{ii}}\sqrt{s_{kk}}} = \frac{\sum_{j=1}^{n}(x_{ji}-\bar{x}_i)(x_{jk}-\bar{x}_k)}{\sqrt{\sum_{j=1}^{n}(x_{ji}-\bar{x}_i)^2}\,\sqrt{\sum_{j=1}^{n}(x_{jk}-\bar{x}_k)^2}}$$

$$= \frac{\sum_{j=1}^{n}(x_{jk}-\bar{x}_k)(x_{ji}-\bar{x}_i)}{\sqrt{\sum_{j=1}^{n}(x_{jk}-\bar{x}_k)^2}\,\sqrt{\sum_{j=1}^{n}(x_{ji}-\bar{x}_i)^2}} = \frac{s_{ki}}{\sqrt{s_{kk}}\sqrt{s_{ii}}} = r_{ki}$$
Example:
According to Example 1.2,

$$\bar{\mathbf{x}} = \begin{bmatrix}\bar{x}_1 \\ \bar{x}_2\end{bmatrix} = \begin{bmatrix}50 \\ 4\end{bmatrix}$$

$$s_{11} = 34, \qquad s_{22} = 0.5, \qquad s_{12} = s_{21} = -1.5$$

The sample variance–covariance matrix and the sample correlation matrix are:

$$\mathbf{S} = \begin{bmatrix}34 & -1.5 \\ -1.5 & 0.5\end{bmatrix} \qquad\text{and}\qquad \mathbf{R} = \begin{bmatrix}1 & -0.36 \\ -0.36 & 1\end{bmatrix}$$
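The off-diagonal entry of 𝑹 can be checked numerically from the summary statistics above (the raw data of Example 1.2 are not reproduced here); a minimal sketch:

```python
import math

# summary statistics from Example 1.2
s11, s22, s12 = 34.0, 0.5, -1.5

# r_ik = s_ik / (sqrt(s_ii) * sqrt(s_kk)); symmetry gives r12 == r21
r12 = s12 / (math.sqrt(s11) * math.sqrt(s22))
r21 = s12 / (math.sqrt(s22) * math.sqrt(s11))

print(round(r12, 2))   # -0.36
print(r12 == r21)      # True
```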
2. Correlation properties

Prove that the value of 𝑟𝑖𝑘 remains unchanged if the measurements of the 𝑖th variable are changed to 𝑦𝑗𝑖 = 𝑎𝑥𝑗𝑖 + 𝑏, 𝑗 = 1, 2, …, 𝑛, and the values of the 𝑘th variable are changed to 𝑦𝑗𝑘 = 𝑐𝑥𝑗𝑘 + 𝑑, 𝑗 = 1, 2, …, 𝑛, provided that the constants 𝑎 and 𝑐 have the same sign.

First, recall two properties of the covariance:

• 𝑐𝑜𝑣(𝑎𝑋, 𝑏𝑌) = 𝑎𝑏 𝑐𝑜𝑣(𝑋, 𝑌)
• 𝑐𝑜𝑣(𝑋 + 𝑎, 𝑌 + 𝑏) = 𝑐𝑜𝑣(𝑋, 𝑌)

Substituting 𝑦𝑗𝑘 = 𝑐𝑥𝑗𝑘 + 𝑑 and 𝑦𝑗𝑖 = 𝑎𝑥𝑗𝑖 + 𝑏 into 𝑟𝑖𝑘 we have:

$$r_{ik} = \frac{\sum_{j=1}^{n}(y_{ji}-\bar{y}_i)(y_{jk}-\bar{y}_k)}{\sqrt{\sum_{j=1}^{n}(y_{ji}-\bar{y}_i)^2}\,\sqrt{\sum_{j=1}^{n}(y_{jk}-\bar{y}_k)^2}} = \frac{\sum_{j=1}^{n}(ax_{ji}+b-a\bar{x}_i-b)(cx_{jk}+d-c\bar{x}_k-d)}{\sqrt{\sum_{j=1}^{n}(ax_{ji}+b-a\bar{x}_i-b)^2}\,\sqrt{\sum_{j=1}^{n}(cx_{jk}+d-c\bar{x}_k-d)^2}}$$

Factoring out the constants, and noting that $\sqrt{a^2}\sqrt{c^2} = |a||c| = ac$ because 𝑎 and 𝑐 have the same sign, we recover the original expression for 𝑟𝑖𝑘:

$$r_{ik} = \frac{ac\sum_{j=1}^{n}(x_{ji}-\bar{x}_i)(x_{jk}-\bar{x}_k)}{\sqrt{a^2\sum_{j=1}^{n}(x_{ji}-\bar{x}_i)^2}\,\sqrt{c^2\sum_{j=1}^{n}(x_{jk}-\bar{x}_k)^2}} = \frac{ac\sum_{j=1}^{n}(x_{ji}-\bar{x}_i)(x_{jk}-\bar{x}_k)}{ac\sqrt{\sum_{j=1}^{n}(x_{ji}-\bar{x}_i)^2}\,\sqrt{\sum_{j=1}^{n}(x_{jk}-\bar{x}_k)^2}}$$

$$= \frac{\sum_{j=1}^{n}(x_{ji}-\bar{x}_i)(x_{jk}-\bar{x}_k)}{\sqrt{\sum_{j=1}^{n}(x_{ji}-\bar{x}_i)^2}\,\sqrt{\sum_{j=1}^{n}(x_{jk}-\bar{x}_k)^2}}$$
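The invariance can also be confirmed numerically. A small check with illustrative, made-up data (the values below are not from any dataset in this document):

```python
import numpy as np

# illustrative data for the ith and kth variables (hypothetical values)
xi = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
xk = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

r = np.corrcoef(xi, xk)[0, 1]

# affine changes y_ji = a*x_ji + b and y_jk = c*x_jk + d,
# with a = 2.5 and c = 0.4 of the same sign
yi = 2.5 * xi + 7.0
yk = 0.4 * xk - 3.0
r_same = np.corrcoef(yi, yk)[0, 1]

# with opposite signs (a > 0, c < 0) the correlation flips sign instead
yk_neg = -0.4 * xk - 3.0
r_flip = np.corrcoef(yi, yk_neg)[0, 1]

print(np.isclose(r, r_same))   # True
print(np.isclose(r, -r_flip))  # True
```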

3. Data displays and pictorial representations.

a. Linking multiple two-dimensional scatter plots.


Linking multiple two-dimensional scatter plots shows plots of all pairs of variables organized in a p × p array. That is, for a pair (𝑥1, 𝑥2), the 𝑥1 values are plotted along the horizontal axis and the 𝑥2 values along the vertical axis. This example shows how to visualize multivariate data using various statistical plots. We use a dataset that contains measured variables for about 400 automobiles and illustrate multivariate visualization using the values for fuel efficiency (in miles per gallon, MPG), acceleration (time from 0 to 60 MPH in seconds), engine displacement (in cubic inches), weight, and horsepower.
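A scatter-plot matrix of this kind can be sketched as follows. The automobile figures here are randomly generated stand-ins for three of the variables, not the real measurements:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# hypothetical stand-in for the automobile data (not the actual dataset)
rng = np.random.default_rng(0)
disp = rng.uniform(70.0, 450.0, 40)                  # displacement (cu in)
hp = 0.3 * disp + rng.normal(0.0, 20.0, 40)          # horsepower
mpg = 45.0 - 0.06 * disp + rng.normal(0.0, 3.0, 40)  # fuel efficiency (MPG)
data = {"Displacement": disp, "Horsepower": hp, "MPG": mpg}

names = list(data)
p = len(names)
fig, axes = plt.subplots(p, p, figsize=(7, 7))
for i, row_var in enumerate(names):        # row variable -> vertical axis
    for j, col_var in enumerate(names):    # column variable -> horizontal axis
        axes[i, j].scatter(data[col_var], data[row_var], s=8)
        if i == p - 1:
            axes[i, j].set_xlabel(col_var)
        if j == 0:
            axes[i, j].set_ylabel(row_var)
fig.tight_layout()
fig.savefig("scatter_matrix.png")
```

Each row and column of the array shares one variable, which is what "links" the panels visually.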

b. Rotated plots in three dimensions

Here, we plot displacement along the x axis, power along the y axis, and fuel efficiency along the z axis, and we represent each car with a dot. This 3D visualization is shown from four different perspectives. These views are obtained by continually rotating and turning the three-dimensional coordinate axes. Spinning the coordinate axes allows one to get a better understanding of the three-dimensional aspects of the data.
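Rotating the coordinate axes amounts to multiplying each data point by an orthogonal rotation matrix. A minimal sketch of a rotation about the z axis, with two sample points standing in for the car data:

```python
import numpy as np

def rotate_z(points, theta):
    """Rotate an (n, 3) array of points about the z axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return points @ R.T

# two illustrative (x, y, z) points, e.g. (displacement, power, MPG) triples
pts = np.array([[1.0, 0.0, 2.0],
                [0.0, 1.0, 5.0]])
rot = rotate_z(pts, np.pi / 2)  # a quarter turn maps (x, y) to (-y, x)
print(np.round(rot, 6))
```

Because the rotation is orthogonal, distances between points are preserved; only the viewing angle changes.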

c. Graphs of growth curves (arrays of growth curves).

In general, repeated measurements of the same characteristic on the same unit or subject can give rise to a growth curve when an increasing, decreasing, or even an increasing-then-decreasing pattern is expected. First, for each bear, we plot the weights versus ages.

The next figures give the array of seven individual weight curves.
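The data structure behind such an array of curves can be sketched with hypothetical values (the bear weights below are invented for illustration, not the ones in the figures):

```python
import numpy as np

# ages (years) at which every bear is measured
ages = np.array([2.0, 2.5, 3.0, 3.5, 4.0])

# one row of weights (lb) per bear -- invented values with an increasing pattern
weights = np.array([
    [55.0, 70.0,  95.0, 110.0, 125.0],
    [60.0, 80.0, 100.0, 120.0, 140.0],
    [48.0, 62.0,  85.0, 100.0, 118.0],
    [65.0, 85.0, 105.0, 130.0, 150.0],
    [52.0, 68.0,  90.0, 108.0, 122.0],
    [58.0, 74.0,  98.0, 115.0, 133.0],
    [50.0, 66.0,  88.0, 104.0, 120.0],
])

# one growth curve per bear: an (age, weight) pair for each measurement
curves = [np.column_stack((ages, w)) for w in weights]
print(len(curves), curves[0].shape)  # 7 (5, 2)
```

Plotting each element of `curves` in its own panel reproduces the array-of-curves layout described above.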

d. Stars
For nonnegative observations in two dimensions, we can construct circles of a fixed reference radius with p equally spaced rays emanating from the center of the circle, one ray per variable. The length of each ray is the value of the corresponding variable, and the ends of the rays are connected by straight lines to form a star. It is useful to standardize the observations first, so that the center of the circle represents the smallest standardized observation. The next figures show the representation of damage for every layer orientation in a test of a composite laminate.
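The geometry of a single star can be sketched as follows; `star_coordinates` is a hypothetical helper name and the input values are illustrative:

```python
import numpy as np

def star_coordinates(values, center=(0.0, 0.0)):
    """Ray endpoints for one star: p equally spaced rays whose lengths
    are the observation's values scaled to [0, 1]."""
    v = np.asarray(values, dtype=float)
    v = (v - v.min()) / (v.max() - v.min())  # smallest value maps to the center
    p = len(v)
    angles = 2 * np.pi * np.arange(p) / p    # rays at angles 2*pi*m/p
    cx, cy = center
    return np.column_stack((cx + v * np.cos(angles),
                            cy + v * np.sin(angles)))

pts = star_coordinates([3.0, 1.0, 2.0, 1.0])  # four variables -> four rays
```

Connecting consecutive rows of `pts` (and closing the polygon) draws the star outline.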

e. Chernoff Faces

Chernoff faces, invented by applied mathematician, statistician, and physicist Herman Chernoff in 1973, display multivariate data in the shape of a human face. The individual parts, such as the eyes, ears, mouth, and nose, represent values of the variables by their shape, size, placement, and orientation. The idea behind using faces is that humans easily recognize faces and notice small changes without difficulty.
The example below maps data onto a real geographical space, but that is not, strictly speaking, necessary; the map could have been a set of two-dimensional data of a more abstract nature. The following key is used to map facial features to variables:
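Such a key is just a mapping from variables to facial-feature parameters. A hypothetical sketch, with invented feature names and parameter ranges (not the key used in the figure):

```python
# hypothetical key: each variable drives one facial feature, min-max scaled
# into that feature's allowed parameter range
FEATURE_RANGES = {
    "eye_size":        (0.1, 0.5),
    "mouth_curvature": (-1.0, 1.0),
    "nose_length":     (0.2, 0.8),
}

def face_parameters(row, data_min, data_max):
    """Map one observation (dict of variable -> value) to facial parameters,
    pairing variables with features in insertion order."""
    params = {}
    for (var, value), feature in zip(row.items(), FEATURE_RANGES):
        lo, hi = FEATURE_RANGES[feature]
        t = (value - data_min[var]) / (data_max[var] - data_min[var])
        params[feature] = lo + t * (hi - lo)
    return params

# illustrative observation and variable ranges (made-up numbers)
row  = {"mpg": 30.0, "hp": 100.0, "wt": 3000.0}
mins = {"mpg": 10.0, "hp": 50.0,  "wt": 2000.0}
maxs = {"mpg": 50.0, "hp": 250.0, "wt": 5000.0}
print(face_parameters(row, mins, maxs))
```

A drawing routine would then render each face from its parameter dict, one face per observation.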
