f 如何用sql 算出線性回歸(Least squares) ~ 迪貝之家

Pages

如何用sql 算出線性回歸(Least squares)


大學念農,統計是重點科目中的重點
浪費了
現在data science 當道阿.....
抄了放在這裡,供將來有機會參考
mysql> SELECT age, score FROM testscore;
+-----+-------+
| age | score |
+-----+-------+
| 5 | 5 |
| 5 | 4 |
| 5 | 6 |
| 5 | 7 |
| 6 | 8 |
| 6 | 9 |
| 6 | 4 |
| 6 | 6 |
| 7 | 8 |
| 7 | 6 |
| 7 | 9 |
| 7 | 7 |
| 8 | 9 |
| 8 | 6 |
| 8 | 7 |
| 8 | 10 |
| 9 | 9 |
| 9 | 7 |
| 9 | 10 |
| 9 | 9 |
+-----+-------+
The following equation expresses the regression line, where a and b are the intercept 
and slope of the line:
Y = bX + a
Letting age be X and score be Y, begin by computing the terms needed for the regression
equation. These include the number of observations; the means, sums, and sums of
squares for each variable; and the sum of the products of each variable:
mysql> SELECT
-> @n := COUNT(score) AS N,
-> @meanX := AVG(age) AS 'X mean',
-> @sumX := SUM(age) AS 'X sum',
-> @sumXX := SUM(age*age) AS 'X sum of squares',
-> @meanY := AVG(score) AS 'Y mean',
-> @sumY := SUM(score) AS 'Y sum',
-> @sumYY := SUM(score*score) AS 'Y sum of squares',
-> @sumXY := SUM(age*score) AS 'X*Y sum'
-> FROM testscore\G
*************************** 1. row ***************************
N: 20
X mean: 7.000000000
X sum: 140
X sum of squares: 1020
Y mean: 7.300000000
Y sum: 146
Y sum of squares: 1130
X*Y sum: 1053
From those terms, calculate the regression slope and intercept as follows:
mysql> SET @b := (@n*@sumXY - @sumX*@sumY) / (@n*@sumXX - @sumX*@sumX);
mysql> SET @a := (@meanY - @b*@meanX);
mysql> SELECT @b AS slope, @a AS intercept;
+-------------+----------------------+
| slope | intercept |
+-------------+----------------------+
| 0.775000000 | 1.875000000000000000 |
+-------------+----------------------+
The regression equation then is:
mysql> SELECT CONCAT('Y = ',@b,'X + ',@a) AS 'least-squares regression';
+-----------------------------------------+
| least-squares regression |
+-----------------------------------------+
| Y = 0.775000000X + 1.875000000000000000 |
+-----------------------------------------+
To compute the correlation coefficient, use many of the same terms:
mysql> SELECT
-> (@n*@sumXY - @sumX*@sumY)
-> / SQRT((@n*@sumXX - @sumX*@sumX) * (@n*@sumYY - @sumY*@sumY))
-> AS correlation;
+--------------------+
| correlation |
+--------------------+
| 0.6117362044219903 |
+--------------------+

這是我看了幾篇資料,比較看得懂的
最小平方迴歸分析