With the Coronavirus pandemic in 2020, the question has been raised as to whether the virus is seasonal; in other words, does Coronavirus spread more slowly as the temperature increases? I've decided to investigate this question as an example of regression analysis. This should not be taken as a good example of methodology in terms of selecting appropriate data to answer questions (there are quite a few methodological flaws in the choice of data); it's more a mathematical demonstration.
Here are the raw data. On March 3, there were 52 countries that had reported at least 2 cases of the virus. These are listed along with the average temperatures of their capital cities in March (except for China, where I used Wuhan's average temperature, and Italy, where I used Milan's). Under the assumption that the number of Coronavirus cases grows exponentially, we'll look at the natural logarithm of the number of cases (specifically, the difference in the logarithms) in order to be able to use linear regression.
Country | Average Temperature in March (°C) | Cases, March 3 | ln (Cases, March 3) | Cases, March 25 | ln (Cases, March 25) | ln (Cases, March 25) − ln (Cases, March 3) |
---|---|---|---|---|---|---|
China | 6.7 | 80304 | 11.2935747118947 | 81848 | 11.3126191475588 | 0.0190444356640604 |
South Korea | 5.7 | 4812 | 8.47886807709457 | 9137 | 9.12008738299862 | 0.641219305904052 |
Australia | 17.6 | 33 | 3.49650756146648 | 2252 | 7.71957398925958 | 4.2230664277931 |
Malaysia | 27.6 | 29 | 3.36729582998647 | 1624 | 7.39264752072162 | 4.02535169073515 |
Japan | 8.7 | 268 | 5.59098698051086 | 1193 | 7.08422642209792 | 1.49323944158706 |
Singapore | 28.0 | 108 | 4.68213122712422 | 558 | 6.32435896238131 | 1.64222773525709 |
Philippines | 28.7 | 3 | 1.09861228866811 | 552 | 6.31354804627709 | 5.21493575760898 |
Vietnam | 20.0 | 16 | 2.77258872223978 | 134 | 4.89783979995091 | 2.12525107771113 |
New Zealand | 15.8 | 2 | 0.693147180559945 | 189 | 5.24174701505964 | 4.5485998344997 |
Italy | 9.0 | 2036 | 7.61874237767041 | 69176 | 11.1444092606403 | 3.52566688296985 |
Spain | 11.2 | 114 | 4.7361984483945 | 39673 | 10.5884261345462 | 5.85222768615169 |
Germany | 5.1 | 157 | 5.05624580534831 | 31554 | 10.3594556428174 | 5.30320983746909 |
France | 8.8 | 191 | 5.25227342804663 | 22025 | 9.99993345080438 | 4.74766002275775 |
Switzerland | 5.3 | 30 | 3.40119738166216 | 8789 | 9.08125621856465 | 5.68005883690249 |
United Kingdom | 6.9 | 39 | 3.66356164612965 | 8081 | 8.99727090623345 | 5.3337092601038 |
Netherlands | 6.1 | 18 | 2.89037175789616 | 5560 | 8.62335338724463 | 5.73298162934846 |
Austria | 5.7 | 18 | 2.89037175789616 | 5282 | 8.57206009285708 | 5.68168833496091 |
Belgium | 6.8 | 8 | 2.07944154167984 | 4269 | 8.35913488675796 | 6.27969334507813 |
Norway | 3.3 | 25 | 3.2188758248682 | 2566 | 7.85010354517558 | 4.63122772030738 |
Sweden | 0.1 | 15 | 2.70805020110221 | 2272 | 7.72841577984104 | 5.02036557873883 |
Portugal | 14.9 | 2 | 0.693147180559945 | 2362 | 7.76726399675731 | 7.07411681619736 |
Denmark | 3.5 | 5 | 1.6094379124341 | 1591 | 7.37211802833779 | 5.76268011590369 |
Czechia (Czech Republic) | 3.6 | 3 | 1.09861228866811 | 1394 | 7.23993259132047 | 6.14132030265236 |
Israel | 15.0 | 10 | 2.30258509299405 | 1329 | 7.19218205871325 | 4.8895969657192 |
Finland | -1.3 | 7 | 1.94591014905531 | 792 | 6.67456139181443 | 4.72865124275911 |
Greece | 13.2 | 7 | 1.94591014905531 | 743 | 6.61069604471776 | 4.66478589566245 |
Iceland | 0.5 | 9 | 2.19722457733622 | 648 | 6.47389069635227 | 4.27666611901605 |
Russia | -1.0 | 3 | 1.09861228866811 | 658 | 6.48920493132532 | 5.39059264265721 |
Romania | 5.4 | 3 | 1.09861228866811 | 762 | 6.63594655568665 | 5.53733426701854 |
Croatia | 6.4 | 8 | 2.07944154167984 | 382 | 5.94542060860658 | 3.86597906692674 |
San Marino | 10.0 | 8 | 2.07944154167984 | 187 | 5.23110861685459 | 3.15166707517475 |
Azerbaijan | 7.0 | 3 | 1.09861228866811 | 87 | 4.46590811865458 | 3.36729582998647 |
Georgia | 6.6 | 3 | 1.09861228866811 | 73 | 4.29045944114839 | 3.19184715248028 |
Thailand | 29.5 | 43 | 3.76120011569356 | 934 | 6.83947643822884 | 3.07827632253528 |
Indonesia | 26.4 | 2 | 0.693147180559945 | 686 | 6.53087762772588 | 5.83773044716594 |
India | 22.1 | 5 | 1.6094379124341 | 562 | 6.33150184989369 | 4.72206393745959 |
Iran | 10.0 | 1501 | 7.31388683163346 | 24811 | 10.1190423822017 | 2.8051555505682 |
Pakistan | 17.0 | 5 | 1.6094379124341 | 991 | 6.89871453432999 | 5.28927662189589 |
Qatar | 17.0 | 7 | 1.94591014905531 | 526 | 6.26530121273771 | 4.3193910636824 |
Bahrain | 21.2 | 49 | 3.89182029811063 | 392 | 5.97126183979046 | 2.07944154167984 |
Egypt | 16.9 | 2 | 0.693147180559945 | 402 | 5.99645208861902 | 5.30330490805908 |
Lebanon | 16.0 | 13 | 2.56494935746154 | 304 | 5.71702770140622 | 3.15207834394469 |
Iraq | 16.6 | 26 | 3.25809653802148 | 316 | 5.75574221358691 | 2.49764567556543 |
Kuwait | 19.3 | 56 | 4.02535169073515 | 195 | 5.27299955856375 | 1.2476478678286 |
United Arab Emirates | 22.3 | 21 | 3.04452243772342 | 248 | 5.51342874616498 | 2.46890630844156 |
Oman | 25.0 | 6 | 1.79175946922805 | 99 | 4.59511985013459 | 2.80336038090653 |
United States | 8.3 | 64 | 4.15888308335967 | 51914 | 10.8573437822964 | 6.69846069893675 |
Canada | -2.2 | 27 | 3.29583686600433 | 1739 | 7.46106551435428 | 4.16522864834995 |
Brazil | 21.5 | 2 | 0.693147180559945 | 2201 | 7.69666708152646 | 7.00351990096652 |
Ecuador | 14.3 | 6 | 1.79175946922805 | 1049 | 6.9555926083963 | 5.16383313916824 |
Mexico | 18.1 | 5 | 1.6094379124341 | 370 | 5.91350300563827 | 4.30406509320417 |
Algeria | 13.2 | 5 | 1.6094379124341 | 264 | 5.57594910314632 | 3.96651119071222 |
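The last column of the table can be reproduced directly from the case counts. Here is a minimal sketch for the first three rows; the names `cases`, `log_growth`, and `growth` are mine, chosen for illustration.

```python
import math

# (country, cases on March 3, cases on March 25), copied from the table above
cases = [
    ("China", 80304, 81848),
    ("South Korea", 4812, 9137),
    ("Australia", 33, 2252),
]

def log_growth(start, end):
    """Difference of the natural logs, which equals ln(end / start)."""
    return math.log(end) - math.log(start)

growth = {country: log_growth(c0, c1) for country, c0, c1 in cases}

for country, g in growth.items():
    print(f"{country}: {g:.6f}")
```

Because the difference of two logarithms is the logarithm of the ratio, this column measures how many "e-folds" each country's case count grew between the two dates.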
Plotting this data, we get:
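In the linear regression model, the line of best fit has the form

$$y = a + bx$$

where y is the dependent variable (in this case, the growth in cases), x is the independent variable (in this case, average temperature), a is the y-intercept, and b is the slope. a and b are calculated as follows:

$$b = \frac{\sum xy - \frac{1}{n}\sum x \sum y}{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}, \qquad a = \bar{y} - b\,\bar{x} = \frac{\sum y}{n} - b\,\frac{\sum x}{n}$$

Here n is the number of observations, and the denominator of b is equal to $\sum (x_i - \bar{x})^2$.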
Looking at our data again:
Row # | x | y | xy | x² |
---|---|---|---|---|
1 | 6.7 | 0.0190444356640604 | 0.127597718949205 | 44.89 |
2 | 5.7 | 0.641219305904052 | 3.6549500436531 | 32.49 |
3 | 17.6 | 4.2230664277931 | 74.3259691291586 | 309.76 |
4 | 27.6 | 4.02535169073515 | 111.09970666429 | 761.76 |
5 | 8.7 | 1.49323944158706 | 12.9911831418074 | 75.69 |
6 | 28.0 | 1.64222773525709 | 45.9823765871985 | 784 |
7 | 28.7 | 5.21493575760898 | 149.668656243378 | 823.69 |
8 | 20.0 | 2.12525107771113 | 42.5050215542226 | 400 |
9 | 15.8 | 4.5485998344997 | 71.8678773850953 | 249.64 |
10 | 9.0 | 3.52566688296985 | 31.7310019467287 | 81 |
11 | 11.2 | 5.85222768615169 | 65.5449500848989 | 125.44 |
12 | 5.1 | 5.30320983746909 | 27.0463701710924 | 26.01 |
13 | 8.8 | 4.74766002275775 | 41.7794082002682 | 77.44 |
14 | 5.3 | 5.68005883690249 | 30.1043118355832 | 28.09 |
15 | 6.9 | 5.3337092601038 | 36.8025938947162 | 47.61 |
16 | 6.1 | 5.73298162934846 | 34.9711879390256 | 37.21 |
17 | 5.7 | 5.68168833496091 | 32.3856235092772 | 32.49 |
18 | 6.8 | 6.27969334507813 | 42.7019147465313 | 46.24 |
19 | 3.3 | 4.63122772030738 | 15.2830514770144 | 10.89 |
20 | 0.1 | 5.02036557873883 | 0.502036557873883 | 0.01 |
21 | 14.9 | 7.07411681619736 | 105.404340561341 | 222.01 |
22 | 3.5 | 5.76268011590369 | 20.1693804056629 | 12.25 |
23 | 3.6 | 6.14132030265236 | 22.1087530895485 | 12.96 |
24 | 15.0 | 4.8895969657192 | 73.343954485788 | 225 |
25 | −1.3 | 4.72865124275911 | −6.14724661558684 | 1.69 |
26 | 13.2 | 4.66478589566245 | 61.5751738227443 | 174.24 |
27 | 0.5 | 4.27666611901605 | 2.13833305950802 | 0.25 |
28 | −1.0 | 5.39059264265721 | −5.39059264265721 | 1 |
29 | 5.4 | 5.53733426701854 | 29.9016050419001 | 29.16 |
30 | 6.4 | 3.86597906692674 | 24.7422660283311 | 40.96 |
31 | 10.0 | 3.15166707517475 | 31.5166707517475 | 100 |
32 | 7.0 | 3.36729582998647 | 23.5710708099053 | 49 |
33 | 6.6 | 3.19184715248028 | 21.0661912063698 | 43.56 |
34 | 29.5 | 3.07827632253528 | 90.8091515147908 | 870.25 |
35 | 26.4 | 5.83773044716594 | 154.116083805181 | 696.96 |
36 | 22.1 | 4.72206393745959 | 104.357613017857 | 488.41 |
37 | 10.0 | 2.8051555505682 | 28.051555505682 | 100 |
38 | 17.0 | 5.28927662189589 | 89.9177025722301 | 289 |
39 | 17.0 | 4.3193910636824 | 73.4296480826008 | 289 |
40 | 21.2 | 2.07944154167984 | 44.0841606836126 | 449.44 |
41 | 16.9 | 5.30330490805908 | 89.6258529461984 | 285.61 |
42 | 16.0 | 3.15207834394469 | 50.433253503115 | 256 |
43 | 16.6 | 2.49764567556543 | 41.4609182143861 | 275.56 |
44 | 19.3 | 1.2476478678286 | 24.079603849092 | 372.49 |
45 | 22.3 | 2.46890630844156 | 55.0566106782468 | 497.29 |
46 | 25.0 | 2.80336038090653 | 70.0840095226633 | 625 |
47 | 8.3 | 6.69846069893675 | 55.597223801175 | 68.89 |
48 | −2.2 | 4.16522864834995 | −9.16350302636989 | 4.84 |
49 | 21.5 | 7.00351990096652 | 150.57567787078 | 462.25 |
50 | 14.3 | 5.16383313916824 | 73.8428138901058 | 204.49 |
51 | 18.1 | 4.30406509320417 | 77.9035781869955 | 327.61 |
52 | 13.2 | 3.96651119071222 | 52.3579477174013 | 174.24 |
Σ | 643.4 | 220.669855974774 | 2591.69559117111 | 11643.76 |
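Plugging all those numbers into the formula above (with n = 52), we get

$$b = \frac{2591.6956 - \frac{(643.4)(220.6699)}{52}}{11643.76 - \frac{(643.4)^2}{52}} = \frac{-138.6695}{3682.9223} \approx -0.03765$$

$$a = \frac{220.6699}{52} - (-0.03765)\,\frac{643.4}{52} \approx 4.7095$$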
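As a check on the arithmetic, here is a small sketch that computes the slope and intercept from the Σ row of the table. The variable names (`sxy`, `sxx`, and so on) are my own shorthand, not standard notation from any library.

```python
# Column totals from the Σ row of the table above
n = 52
sum_x = 643.4
sum_y = 220.669855974774
sum_xy = 2591.69559117111
sum_x2 = 11643.76

# Centered sums: sxy = Σ(x - x̄)(y - ȳ), sxx = Σ(x - x̄)²
sxy = sum_xy - sum_x * sum_y / n
sxx = sum_x2 - sum_x ** 2 / n

# Slope and intercept of the least-squares line y = a + b*x
b = sxy / sxx
a = sum_y / n - b * sum_x / n

print(f"sxx = {sxx:.4f}")
print(f"slope b = {b:.6f}")
print(f"intercept a = {a:.6f}")
```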
The line of best fit looks like:
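The slope of the line of best fit is slightly negative. Note that, even though the slope is only slightly negative, y is the logarithm of the ratio of cases, so a small negative slope could mean a big difference in the growth in cases if it were statistically significant (and this is a question we'll investigate below). With a slope of about −0.03765 per °C, a 30° difference in temperature multiplies the growth factor by e^(−0.03765 × 30) ≈ 0.32. So, a hot country (30 °C) would show only about one-third the growth of a cold country (0 °C). Taking the exponential of the line of best fit, we get:

$$\frac{\text{Cases, March 25}}{\text{Cases, March 3}} \approx e^{4.7095 - 0.03765x} \approx 111.0\,e^{-0.03765x}$$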
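To make the "one-third" figure concrete, the following sketch evaluates the exponentiated line of best fit at 0 °C and 30 °C. It recomputes a and b from the Σ row of the regression table; `growth_factor` is a helper name I made up for this illustration.

```python
import math

# Slope and intercept recomputed from the Σ row of the regression table
n, sum_x, sum_y = 52, 643.4, 220.669855974774
sum_xy, sum_x2 = 2591.69559117111, 11643.76
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * sum_x / n

def growth_factor(temp_c):
    """Fitted ratio of March 25 cases to March 3 cases at a given temperature (°C)."""
    return math.exp(a + b * temp_c)

cold = growth_factor(0.0)
hot = growth_factor(30.0)
ratio = hot / cold  # equals exp(30 * b); the intercept cancels

print(f"0 °C: {cold:.1f}x   30 °C: {hot:.1f}x   hot/cold: {ratio:.3f}")
```

Keep in mind that this is just what the fitted line implies; whether the slope is distinguishable from zero is a separate question.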
Now to determine whether the result is statistically significant. In the simple linear regression model, a (1 − α) · 100% confidence interval for the slope parameter is given by:

$$b \pm t_{\alpha/2,\,n-2}\,\sqrt{\frac{MSE}{\sum (x_i - \bar{x})^2}}$$
Typically we would use a 95% confidence interval, so α would be 0.05. If this interval does not contain 0, we would reject the null hypothesis that b (the slope) is zero (meaning there is no relationship). If the interval does contain zero, there is insufficient evidence for rejecting the null hypothesis.
To find the mean squared error (denoted as MSE in the equation above), we'll find the sum of the squares of (yᵢ − ŷᵢ), where ŷᵢ is the value predicted by the line of best fit for xᵢ, and divide by the number of observations less 2 (52 − 2 = 50).
i | xᵢ | yᵢ | ŷᵢ | (yᵢ − ŷᵢ)² |
---|---|---|---|---|
1 | 6.7 | 0.0190444356640604 | 4.457253950441 | 19.6977036970566 |
2 | 5.7 | 0.641219305904052 | 4.494905983211 | 14.8509010068531 |
3 | 17.6 | 4.2230664277931 | 4.046846793248 | 0.0310533595992085 |
4 | 27.6 | 4.02535169073515 | 3.670326465548 | 0.126042910519187 |
5 | 8.7 | 1.49323944158706 | 4.381949884901 | 8.34464802531102 |
6 | 28.0 | 1.64222773525709 | 3.65526565244 | 4.05232165601611 |
7 | 28.7 | 5.21493575760898 | 3.628909229501 | 2.51548014786225 |
8 | 20.0 | 2.12525107771113 | 3.9564819146 | 3.35340637797271 |
9 | 15.8 | 4.5485998344997 | 4.114620452234 | 0.188338104231718 |
10 | 9.0 | 3.52566688296985 | 4.37065427507 | 0.714003692808212 |
11 | 11.2 | 5.85222768615169 | 4.287819802976 | 2.44737202494224 |
12 | 5.1 | 5.30320983746909 | 4.517497202873 | 0.61734434416393 |
13 | 8.8 | 4.74766002275775 | 4.378184681624 | 0.136512027705901 |
14 | 5.3 | 5.68005883690249 | 4.509966796319 | 1.36911538343684 |
15 | 6.9 | 5.3337092601038 | 4.449723543887 | 0.781430746475328 |
16 | 6.1 | 5.73298162934846 | 4.479845170103 | 1.57035098549025 |
17 | 5.7 | 5.68168833496091 | 4.494905983211 | 1.40845235042505 |
18 | 6.8 | 6.27969334507813 | 4.453488747164 | 3.33502323344271 |
19 | 3.3 | 4.63122772030738 | 4.585270861859 | 0.00211203283844446 |
20 | 0.1 | 5.02036557873883 | 4.705757366723 | 0.0989783270677977 |
21 | 14.9 | 7.07411681619736 | 4.148507281727 | 8.55919114818388 |
22 | 3.5 | 5.76268011590369 | 4.577740455305 | 1.40408199925974 |
23 | 3.6 | 6.14132030265236 | 4.573975252028 | 2.45657050771668 |
24 | 15.0 | 4.8895969657192 | 4.14474207845 | 0.554808803088813 |
25 | −1.3 | 4.72865124275911 | 4.758470212601 | 0.000889170962431547 |
26 | 13.2 | 4.66478589566245 | 4.212515737436 | 0.204548296022178 |
27 | 0.5 | 4.27666611901605 | 4.690696553615 | 0.171421200774195 |
28 | −1.0 | 5.39059264265721 | 4.74717460277 | 0.4139867740523 |
29 | 5.4 | 5.53733426701854 | 4.506201593042 | 1.06323459134201 |
30 | 6.4 | 3.86597906692674 | 4.468549560272 | 0.36309119945035 |
31 | 10.0 | 3.15166707517475 | 4.3330022423 | 1.39555277708684 |
32 | 7.0 | 3.36729582998647 | 4.44595834061 | 1.16351281182466 |
33 | 6.6 | 3.19184715248028 | 4.461019153718 | 1.61079756872576 |
34 | 29.5 | 3.07827632253528 | 3.598787603285 | 0.270931993387713 |
35 | 26.4 | 5.83773044716594 | 3.715508904872 | 4.50382427457647 |
36 | 22.1 | 4.72206393745959 | 3.877412645783 | 0.713435804530933 |
37 | 10.0 | 2.8051555505682 | 4.3330022423 | 2.33431551343581 |
38 | 17.0 | 5.28927662189589 | 4.06943801291 | 1.48800623197263 |
39 | 17.0 | 4.3193910636824 | 4.06943801291 | 0.06247652759043 |
40 | 21.2 | 2.07944154167984 | 3.911299475276 | 3.35570348887919 |
41 | 16.9 | 5.30330490805908 | 4.073203216187 | 1.51315017234655 |
42 | 16.0 | 3.15207834394469 | 4.10709004568 | 0.912047350451372 |
43 | 16.6 | 2.49764567556543 | 4.084498826018 | 2.51810292110125 |
44 | 19.3 | 1.2476478678286 | 3.982838337539 | 7.4812669055946 |
45 | 22.3 | 2.46890630844156 | 3.869882239229 | 1.96273355864573 |
46 | 25.0 | 2.80336038090653 | 3.76822175075 | 0.930957463016217 |
47 | 8.3 | 6.69846069893675 | 4.397010698009 | 5.29667210677034 |
48 | −2.2 | 4.16522864834995 | 4.792357042094 | 0.393290022239993 |
49 | 21.5 | 7.00351990096652 | 3.900003865445 | 9.63181178273921 |
50 | 14.3 | 5.16383313916824 | 4.171098501389 | 0.98552206104668 |
51 | 18.1 | 4.30406509320417 | 4.028020776863 | 0.0762004645842644 |
52 | 13.2 | 3.96651119071222 | 4.212515737436 | 0.0605182370087725 |
Total | | | | 129.493244162627 |
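Each row of the table above is a fitted value and a squared residual. As a sketch of how one row is produced, the code below recomputes the first three rows; a and b are recomputed from the Σ row of the earlier regression table, and `fitted_and_squared_residual` is a name I invented for this example.

```python
# Slope and intercept recomputed from the Σ row of the regression table
n, sum_x, sum_y = 52, 643.4, 220.669855974774
sum_xy, sum_x2 = 2591.69559117111, 11643.76
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * sum_x / n

# (x, y) for the first three rows of the table above
rows = [
    (6.7, 0.0190444356640604),
    (5.7, 0.641219305904052),
    (17.6, 4.2230664277931),
]

def fitted_and_squared_residual(x, y):
    """Return (ŷ, (y - ŷ)²) for one observation."""
    y_hat = a + b * x
    return y_hat, (y - y_hat) ** 2

results = [fitted_and_squared_residual(x, y) for x, y in rows]

for (x, y), (y_hat, sq) in zip(rows, results):
    print(f"x={x:5.1f}  y={y:.6f}  y_hat={y_hat:.6f}  squared residual={sq:.6f}")
```

Summing the last quantity over all 52 rows gives the total in the table.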
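Dividing 129.493244162627 by 50 gives 2.58986488325253; this is the MSE in the formula above. Next, we need $\sum (x_i - \bar{x})^2$, which is the denominator of the slope formula: $\sum x^2 - \frac{1}{n}\left(\sum x\right)^2 = 11643.76 - \frac{(643.4)^2}{52} = 3682.9223$. Finally, we need $t_{0.025,\,50}$. Consulting a table, $t_{0.025,\,40} = 2.021$ and $t_{0.025,\,60} = 2.000$. Since $t_{0.025,\,50}$ falls between those two values, we'll use 2.01. So, with b ≈ −0.03765, a 95% confidence interval for the slope parameter is

$$-0.03765 \pm 2.01\sqrt{\frac{2.5899}{3682.9223}} \approx -0.03765 \pm 0.05330$$

that is, approximately (−0.0910, 0.0156).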
Since this range includes zero, we cannot conclude, at the 95% confidence level, that the slope is not zero, and so the nonzero slope that we found is not statistically significant. That doesn't necessarily mean that, in the real world, there is no relationship whatsoever between temperature and spread of the virus, but it does mean that, if there is one, our data don't provide statistically significant evidence of such a relationship.
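The whole significance check can be sketched in a few lines. The inputs are the totals quoted in this post; the critical value is hard-coded as 2.01, as in the text, rather than taken from a statistics library, and the variable names are my own.

```python
import math

n = 52
sse = 129.493244162627   # total of the squared residuals from the table
sxx = 3682.9223          # Σ(x_i - x̄)², as quoted in the text

# Slope recomputed from the Σ row of the regression table
sum_x, sum_y, sum_xy, sum_x2 = 643.4, 220.669855974774, 2591.69559117111, 11643.76
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)

mse = sse / (n - 2)           # mean squared error with n - 2 degrees of freedom
se_b = math.sqrt(mse / sxx)   # standard error of the slope

# Linear interpolation between the two table values quoted in the text
t_40, t_60 = 2.021, 2.000
t_50 = t_40 + (50 - 40) / (60 - 40) * (t_60 - t_40)  # 2.0105; the text rounds to 2.01
t_crit = 2.01

lower = b - t_crit * se_b
upper = b + t_crit * se_b
contains_zero = lower <= 0.0 <= upper

print(f"MSE = {mse:.6f}, SE(b) = {se_b:.6f}")
print(f"95% CI for slope: ({lower:.5f}, {upper:.5f})")
print("interval contains zero:", contains_zero)
```

Linear interpolation in a t-table is only an approximation, but the interval here is wide enough that the conclusion does not hinge on the third decimal place of the critical value.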
I had also collected these Coronavirus statistics by country but ended up not using them (for this, at any rate):