### 7.2.4  Linear regression: linear_regression linear_regression_plot

Given a set of points $(x_0,y_0),\dots,(x_{n-1},y_{n-1})$, linear regression finds the line $y=mx+b$ that comes closest to passing through all of the points; i.e., that makes $\sqrt{(y_0 - (m x_0 + b))^2 + \dots + (y_{n-1} - (m x_{n-1} + b))^2}$ as small as possible. Given a set of points (a two-column matrix) or two lists of numbers (the x- and y-coordinates), the linear_regression command will find the values of m and b which determine the line. For example, if you enter

linear_regression([[0,0],[1,1],[2,4],[3,9],[4,16]])

or

linear_regression([0,1,2,3,4],[0,1,4,9,16])

you will get

4, -2

which means that the line y = 4x − 2 is the best fit line.
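The result above can be cross-checked outside of the software. The following is a minimal Python sketch, not the implementation used by linear_regression; it assumes the standard closed-form least-squares solution, and the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b.

    Hypothetical helper for illustration; uses the textbook closed form
    m = Sxy / Sxx, b = y_mean - m * x_mean.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of products of deviations, and sum of squared x-deviations
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    b = y_mean - m * x_mean
    return m, b


print(fit_line([0, 1, 2, 3, 4], [0, 1, 4, 9, 16]))  # (4.0, -2.0)
```

For these data the deviations give Sxy = 40 and Sxx = 10, so m = 4 and b = 6 − 4·2 = −2, in agreement with the output above.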

The best fit line can be drawn with the linear_regression_plot command; if you enter

linear_regression_plot([0,1,2,3,4],[0,1,4,9,16])

you will get a graph of the best fit line (in this case $y=4x-2$), with the equation of the line displayed at the top along with the $R^2$ value, which is

$$R^2 = \frac{\sum_{j=0}^{n-1} (m x_j + b - \bar{y})^2}{\sum_{j=0}^{n-1} (y_j - \bar{y})^2}$$

where $\bar{y}$ is the mean of $y_0,\dots,y_{n-1}$.

(The $R^2$ value will be between 0 and 1 and is one measure of how well the line fits the data; a value close to 1 indicates a good fit, a value close to 0 indicates a bad fit.)
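As an illustration of the R² formula, here is a short Python sketch that evaluates it for the example data with m = 4 and b = −2. It is an independent check under the definition given in this section, not the code used by linear_regression_plot, and the function name `r_squared` is hypothetical.

```python
def r_squared(xs, ys, m, b):
    """Evaluate R^2 for the line y = m*x + b as the ratio of the
    explained sum of squares to the total sum of squares.

    Hypothetical helper for illustration only.
    """
    y_mean = sum(ys) / len(ys)
    # Numerator: squared deviations of the fitted values from the mean
    explained = sum((m * x + b - y_mean) ** 2 for x in xs)
    # Denominator: squared deviations of the data from the mean
    total = sum((y - y_mean) ** 2 for y in ys)
    return explained / total


r2 = r_squared([0, 1, 2, 3, 4], [0, 1, 4, 9, 16], 4, -2)
print(round(r2, 4))  # 0.9195
```

Here the numerator is 64 + 16 + 0 + 16 + 64 = 160 and the denominator is 36 + 25 + 4 + 9 + 100 = 174, so R² = 160/174 ≈ 0.9195: the line fits the data well, though not perfectly, since the points actually lie on a parabola.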