
7.2.1  Covariance and correlation: covariance correlation covariance_correlation

The covariance of two random variables measures their connectedness; i.e., whether they tend to change with each other. If X and Y are two random variables, then their covariance is the expected value of (X−X̄)(Y−Ȳ), where X̄ and Ȳ are the means of X and Y, respectively. You can calculate covariances with the covariance command.

If X and Y are given by lists of the same size, then covariance(X,Y) will return their covariance. For example, if you enter

covariance([1,2,3,4],[1,4,9,16])

then you will get

25/4
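
The value 25/4 can be checked by hand from the definition. The following is a minimal Python sketch, not Xcas code; the helper name population_covariance is made up for illustration. It assumes, as the result above suggests, that the sum of products is divided by n rather than n−1.

```python
from fractions import Fraction

def population_covariance(xs, ys):
    """Mean of (x - mean_x) * (y - mean_y), in exact rational arithmetic."""
    if len(xs) != len(ys):
        raise ValueError("lists must have the same length")
    n = len(xs)
    mean_x = Fraction(sum(xs), n)
    mean_y = Fraction(sum(ys), n)
    # Assumption: divide by n (population covariance), not n - 1.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

print(population_covariance([1, 2, 3, 4], [1, 4, 9, 16]))  # 25/4
```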

Alternatively, you could use a matrix with two columns instead of two lists to enter X and Y; the command

covariance([[1,1],[2,4],[3,9],[4,16]])

is another way to enter the above calculation.

If the entries in the lists X=[a0,…,an−1] and Y=[b0,…,bn−1] have different weights, say aj and bj have weight wj, then covariance can be given a third list W=[w0,…,wn−1] (or alternatively, you could use a matrix with three columns). For example, if you enter

covariance([1,2,3,4],[1,4,9,16],[3,1,5,2])

then you will get

662/121
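
As a cross-check of this value, here is a minimal Python sketch (again not Xcas code; weighted_covariance is a hypothetical helper). It assumes the weights act like frequencies, so every average is a weighted sum divided by the total weight, and it uses the identity cov(X,Y) = E[XY] − E[X]E[Y].

```python
from fractions import Fraction

def weighted_covariance(xs, ys, ws):
    """Covariance where the pair (xs[j], ys[j]) carries weight ws[j]."""
    if not (len(xs) == len(ys) == len(ws)):
        raise ValueError("all three lists must have the same length")
    total = sum(ws)  # assumption: weights are normalised by their sum
    mean_x = Fraction(sum(w * x for x, w in zip(xs, ws)), total)
    mean_y = Fraction(sum(w * y for y, w in zip(ys, ws)), total)
    mean_xy = Fraction(sum(w * x * y for x, y, w in zip(xs, ys, ws)), total)
    return mean_xy - mean_x * mean_y

print(weighted_covariance([1, 2, 3, 4], [1, 4, 9, 16], [3, 1, 5, 2]))  # 662/121
```

With all weights equal, the same function reproduces the unweighted result 25/4.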

If each pair of entries in the lists X=[a0,…,am−1] and Y=[b0,…,bn−1] has a different weight, say aj and bk have weight wjk, then covariance can be given a third argument of an m×n matrix W=(wjk). (Note that in this case the lists X and Y don’t have to be the same length.) For example, the covariance computed above could also have been computed by entering

covariance([1,2,3,4],[1,4,9,16], [[3,0,0,0],[0,1,0,0],[0,0,5,0],[0,0,0,2]])

which would of course return

662/121
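
The matrix form can be checked the same way. This is a minimal Python sketch under the assumption that wjk is the weight of the pair (aj, bk) and that averages are taken over all m·n pairs, normalised by the sum of all weights; the function name covariance_weight_matrix is hypothetical and not part of Xcas.

```python
from fractions import Fraction

def covariance_weight_matrix(xs, ys, w):
    """Covariance where the pair (xs[j], ys[k]) carries weight w[j][k]."""
    if len(w) != len(xs) or any(len(row) != len(ys) for row in w):
        raise ValueError("w must have len(xs) rows and len(ys) columns")
    pairs = [(j, k) for j in range(len(xs)) for k in range(len(ys))]
    total = sum(w[j][k] for j, k in pairs)
    mean_x = Fraction(sum(w[j][k] * xs[j] for j, k in pairs), total)
    mean_y = Fraction(sum(w[j][k] * ys[k] for j, k in pairs), total)
    mean_xy = Fraction(sum(w[j][k] * xs[j] * ys[k] for j, k in pairs), total)
    return mean_xy - mean_x * mean_y

W = [[3, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 5, 0],
     [0, 0, 0, 2]]
print(covariance_weight_matrix([1, 2, 3, 4], [1, 4, 9, 16], W))  # 662/121
```

A diagonal W only gives weight to the pairs (aj, bj), which is why the result agrees with the weighted-list example.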

In this case, to make it simpler to enter the data in a spreadsheet, the lists X and Y and the matrix W can be combined into a single matrix, by augmenting W with the list Y on the top and the transpose of the list X on the left, with a filler in the upper left hand corner:

    "XY"   Y
    Xᵀ     W


When you use this method, you need to give covariance a second argument of -1. The above covariance can then be computed with the command

covariance([["XY",1,4,9,16],[1,3,0,0,0],[2,0,1,0,0],[3,0,0,5,0],[4,0,0,0,2]],-1)
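
To make the layout of the combined matrix concrete, the following Python sketch takes such a table apart again. It is only an illustration of the layout described above, using the diagonal weight matrix from the earlier example; split_weight_table is a hypothetical helper, not an Xcas command.

```python
def split_weight_table(table):
    """Split [["XY", Y...], [x_j, w_j0, w_j1, ...], ...] into X, Y and W."""
    ys = list(table[0][1:])                    # top row after the filler: Y
    xs = [row[0] for row in table[1:]]         # left column below the filler: X
    w = [list(row[1:]) for row in table[1:]]   # remaining block: the weights W
    return xs, ys, w

table = [["XY", 1, 4, 9, 16],
         [1, 3, 0, 0, 0],
         [2, 0, 1, 0, 0],
         [3, 0, 0, 5, 0],
         [4, 0, 0, 0, 2]]
xs, ys, w = split_weight_table(table)
print(xs)  # [1, 2, 3, 4]
print(ys)  # [1, 4, 9, 16]
print(w)   # [[3, 0, 0, 0], [0, 1, 0, 0], [0, 0, 5, 0], [0, 0, 0, 2]]
```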

The linear correlation coefficient of two random variables is another way to measure their connectedness. Given random variables X and Y, their correlation is defined as cov(X,Y)/(σ(X)σ(Y)), where σ(X) and σ(Y) are the standard deviations of X and Y, respectively. The correlation can be computed with the correlation command, which takes the same types of arguments as the covariance command. If you enter

correlation([1,2,3,4],[1,4,9,16])

you will get

100/(4*sqrt(645))
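
This value simplifies to 25/sqrt(645), roughly 0.984. As a cross-check, here is a minimal Python sketch of the definition cov(X,Y)/(σ(X)σ(Y)); it is not Xcas code, the name linear_correlation is made up for illustration, and it assumes population (divide by n) covariance and standard deviations.

```python
import math

def linear_correlation(xs, ys):
    """cov(X, Y) / (sigma(X) * sigma(Y)) with population statistics."""
    if len(xs) != len(ys):
        raise ValueError("lists must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)

r = linear_correlation([1, 2, 3, 4], [1, 4, 9, 16])
print(math.isclose(r, 100 / (4 * math.sqrt(645))))  # True
```

Because the same divisor appears in the covariance and in both variances, it cancels, so the correlation does not depend on whether n or n−1 is used.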

The covariance_correlation command will compute both the covariance and correlation simultaneously, and return a list with both values. This command takes the same type of arguments as the covariance and correlation commands. For example, if you enter

covariance_correlation([1,2,3,4],[1,4,9,16])

you will get

[25/4,100/(4*sqrt(645))]
