Let $x_i$ and $y_i$ both be uniform on $[0, 1]$. Let $w_i$ be the smaller of the two, and let $z_i$ be the larger. Let $h_i = \sqrt{w_i^2 + z_i^2}$. So $(x_i, y_i)$ is a random point in the unit square, and $h_i$ is its distance from the origin. We can predict this distance using linear regression. For example, in R, we can pick $10^4$ such points and execute the code

x=runif(10^4,0,1) y=runif(10^4,0,1) w=rep(0,10^4); for(i in 1:10^4){w[i]=min(x[i],y[i])} z=rep(0,10^4); for(i in 1:10^4){z[i]=max(x[i],y[i])} h=sqrt(w^2+z^2) lm(h~0+w+z)
to fit a linear model of the form $h = aw+bz$. The least-squares model here is, for this particular simulation, $h = 0.4278w + 0.9339z$, with $R^2 = 0.9995$. In other words, the formula

$h = 0.4278 \min(x,y) + 0.9339 \max(x,y)$

appears to predict a as a linear function of $\min(x,y)$ and $\max(x,y)$ quite well, and so the hypotenuse of a triangle is 0.4278 times its shorter leg, plus 0.9339 times its longer leg. For a particular famous special case, try x = 3, y = 4; then we predict the hypotenuse is 0.4278(3) + 0.9339(4) = 5.019, quite close to the true value of 5.

Andrew Gelman and Deborah Nolan, in Teaching Statistics: A bag of tricks, give a very similar example, with slightly different numerical parameters and quip that “if Pythagoras knew about multiple regression, he might never have discovered his famous theorem”. (p. 146). They fit a model that is allowed to have nonzero constant term; I choose to fit a model with zero constant term. I think that our anachronistic Pythagoras would have had the sense to observe that if we double x and y, we should double the hypotenuse as well.

The natural question, to me, is to determine the “true” constants. So what constants a and B give the linear function $ax+by$ that best approximates $\sqrt{x^2+y^2}$, when we restrict to $0 < x < y < 1$? The reason for the triangular-shaped region is that we’re restricting to the case where $x$ is smaller and $y$ is larger. To be consistent with our Pythagoras-as-linear-regressor model, we’ll make the approximation in the least-squares sense. So we want to minimize

$f(a,b) = latex \int_0^1 \int_0^y \left( \sqrt{x^2+y^2} – (ax+by) \right)^2 \: dx \: dy$

as a function of a and b. This is a calculus problem. Expand the integrand to get

$\int_0^1 \int_0^y x^2+y^2+a^2 x^2 + b^2 y^2 + 2ab xy - 2ax \sqrt{x^2+y^2} - 2by \sqrt{x^2+y^2} \: dx \: dy.$

The polynomials are easy to integrate; the square-root terms somewhat less so, if it’s been a while since you’ve done freshman calculus. But after a bit of work this is

$f(a,b) = {1 \over 12} a^2 + {1 \over 4} ab + {1 \over 4} b^2 + {1 \over 3} - C_1 a - C_2 b$

where $C_1 = (2\sqrt{2}-1)/6, C_2 = (\sqrt{2}+\sinh^{-1} 1)/4$. Differentiating we get

${\partial \over \partial a} f(a,b) = {1 \over 6} a + {1 \over 4} b - C_1$

and

${\partial \over \partial b} f(a,b) = {1 \over 4} a + {1 \over 2} b - C_2$.

Set both of these equal to zero and solve to get
$a = 24C_1 - 12C_2 = 5 \sqrt{2} - 4 - 3 \sinh^{-1} 1 = 0.4269, b = -12C_1 + 8C_2 = -2\sqrt{2} + 2 + 2 \sinh^{-1} 1 = 0.9343$

which are tolerably close to the coefficients that came out of the regression. (Those coefficients had standard errors of 0.0009 and 0.0005 respectively.)

Of course our hypothetical Pythagoras couldn’t have done these integrals, and would not have liked that they turn out to be irrational. Perhaps he would have just said that the length of the hypotenuse of a triangle was three-sevenths of the shorter leg, plus fourteen-fifteenths of the longer leg.