Miscalculating the cost function for a linear regression model

I’m trying to render a 3D plot of a cost function. Given a dataset and two parameters (theta0 and theta1), I’d like to render the bowl-like graph we all see in the classic literature. My hypothesis function is just a simple h(x) = theta_0 + theta_1 * x. However, my cost function is being rendered as follows:

[image: cost function]

Is this plot expected? If it is, how can I plot such a “bowl”?

import matplotlib.pyplot as plt
import numpy as np

training_set = np.array([
    [20, 400],
    [30, 460],
    [10, 300],
    [50, 780],
    [15, 350],
    [60, 800],
    [19, 360],
    [31, 410],
    [5, 50],
    [46, 650],
    [37, 400],
    [39, 900]])

cost_factor = (1.0 / (len(training_set) * 2))

hypothesis = lambda theta0, theta1, x: theta0 + theta1 * x

cost = lambda theta0, theta1: cost_factor * sum(map(
    lambda entry: (hypothesis(theta0, theta1, entry[0]) - entry[1]) ** 2, training_set))

theta1 = np.arange(0, 10, 1)
theta2 = np.arange(0, 10, 1)

X, Y = np.meshgrid(theta1, theta1)

Z = cost(X, Y)

ax = plt.axes(projection='3d')
ax.plot_surface(X, Y, Z, cmap='viridis', edgecolor='none')
ax.set_xlabel(r'$\theta_0$')
ax.set_ylabel(r'$\theta_1$')
ax.set_zlabel(r'$J(\theta_0, \theta_1)$')
ax.set_title('Cost function')

plt.show()

Answer

Side notes:

  • I have renamed theta1 to theta0 and theta2 to theta1 in your code, to avoid confusion between the code and the labels of the plot.
  • Your code contains a typo: X, Y = np.meshgrid(theta1, theta1) should be X, Y = np.meshgrid(theta0, theta1).
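Putting the two side notes together, the grid and cost computation could look like the sketch below. It is only one possible way to write it: the function name cost is kept from the question, but the vectorized body (NumPy broadcasting instead of sum and map) is my own assumption, not the original code. The plotting calls stay as in the question.

```python
import numpy as np

# Same data as in the question: column 0 is x, column 1 is y.
training_set = np.array([
    [20, 400], [30, 460], [10, 300], [50, 780], [15, 350], [60, 800],
    [19, 360], [31, 410], [5, 50], [46, 650], [37, 400], [39, 900]])
x, y = training_set[:, 0], training_set[:, 1]


def cost(theta0, theta1):
    """J(theta0, theta1) = 1 / (2m) * sum((theta0 + theta1 * x - y) ** 2)."""
    # A trailing axis lets scalars or grids of any shape broadcast
    # against the m training samples.
    residuals = (np.asarray(theta0)[..., None]
                 + np.asarray(theta1)[..., None] * x - y)
    return (residuals ** 2).sum(axis=-1) / (2 * len(x))


theta0 = np.arange(0, 10, 1)
theta1 = np.arange(0, 10, 1)

# X holds the theta0 values (columns), Y holds the theta1 values (rows).
X, Y = np.meshgrid(theta0, theta1)
Z = cost(X, Y)
```

With this layout, Z[i, j] is the cost at theta0[j] and theta1[i], which matches what plot_surface(X, Y, Z) expects.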

Your Z surface may have a point of absolute/relative minimum/maximum that lies outside the domain you chose: 0 < theta0 < 10 and 0 < theta1 < 10. You can try to expand this interval to see whether there actually is a stationary point:

theta0 = np.arange(-100, 100, 5)
theta1 = np.arange(-100, 100, 5)


So there is a minimum zone for -50 < theta1 < 50. It seems your 2D surface does not have a minimum along the theta0 direction; however, you can try to expand this domain as well:

theta0 = np.arange(-1000, 1000, 100)
theta1 = np.arange(-50, 50, 1)


So you can see that, at this scale, your Z surface does not show a sharp minimum point but a long, nearly flat minimum zone that is aligned with neither theta0 nor theta1.
Since I do not know what theta0 and theta1 actually represent, I may have assigned them values that make no sense: for example, if they were latitude and longitude respectively, their domains would be -90 < theta0 < 90 and -180 < theta1 < 180. This depends on the physical meaning of theta0 and theta1.
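As a cross-check on where the bottom of that zone lies: for the straight-line hypothesis in the question, the cost is a quadratic function of theta0 and theta1, so its lowest point can be located directly with a least-squares fit instead of reading it off the plot. The sketch below is an illustration under that assumption; the variable names (theta0_min, theta1_min, hessian) are mine.

```python
import numpy as np

# Same data as in the question: column 0 is x, column 1 is y.
training_set = np.array([
    [20, 400], [30, 460], [10, 300], [50, 780], [15, 350], [60, 800],
    [19, 360], [31, 410], [5, 50], [46, 650], [37, 400], [39, 900]])
x, y = training_set[:, 0], training_set[:, 1]

# Least-squares straight line; for deg=1 polyfit returns [slope, intercept].
theta1_min, theta0_min = np.polyfit(x, y, 1)

# Second derivatives of J(theta0, theta1) = 1 / (2m) * sum((theta0 + theta1 * x - y) ** 2).
# They do not depend on theta0 or theta1.
hessian = np.array([[1.0, x.mean()],
                    [x.mean(), (x ** 2).mean()]])
eigenvalues = np.linalg.eigvalsh(hessian)  # ascending order

print("lowest point (theta0, theta1):", theta0_min, theta1_min)
print("curvatures along the two principal directions:", eigenvalues)
print("curvature ratio:", eigenvalues[1] / eigenvalues[0])
```

Both curvatures are positive as long as the x values are not all equal, so there is a single lowest point. One curvature is much smaller than the other, which is why the surface looks like a nearly flat valley along one diagonal direction rather than a round bowl. Rescaling x (for example, subtracting its mean and dividing by its standard deviation) is one common way to make the plot look more bowl-like.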


However, you can always compute the gradient of the surface with np.gradient and plot its components:

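# np.gradient returns one array per axis of Z, in units of change per grid step:
# g1 along the rows (theta1 direction), g2 along the columns (theta0 direction)
g1, g2 = np.gradient(Z)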

fig = plt.figure()
ax1 = fig.add_subplot(1, 3, 1, projection='3d')
ax2 = fig.add_subplot(1, 3, 2, projection='3d')
ax3 = fig.add_subplot(1, 3, 3, projection='3d')
ax1.plot_surface(X, Y, Z, cmap='viridis', edgecolor='none')
ax2.plot_surface(X, Y, g1, cmap='viridis', edgecolor='none')
ax3.plot_surface(X, Y, g2, cmap='viridis', edgecolor='none')

ax1.set_xlabel(r'$\theta_0$')
ax1.set_ylabel(r'$\theta_1$')
ax1.set_zlabel(r'$J(\theta_0, \theta_1)$')
ax1.set_title('Cost function')

ax2.set_xlabel(r'$\theta_0$')
ax2.set_ylabel(r'$\theta_1$')

ax3.set_xlabel(r'$\theta_0$')
ax3.set_ylabel(r'$\theta_1$')

plt.show()


You can see that, at this resolution, the region where the gradient is close to zero looks like a line, not a single point.
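One detail worth knowing: called with Z alone, np.gradient measures the change per grid step, so the two components are not on the same scale when theta0 and theta1 use different step sizes. A minimal sketch of passing the grid coordinates as well, and comparing the result with the partial derivatives worked out by hand, could look like this (the names dZ_dtheta0, exact_dtheta0 and so on are my own):

```python
import numpy as np

# Same data as in the question: column 0 is x, column 1 is y.
training_set = np.array([
    [20, 400], [30, 460], [10, 300], [50, 780], [15, 350], [60, 800],
    [19, 360], [31, 410], [5, 50], [46, 650], [37, 400], [39, 900]])
x, y = training_set[:, 0], training_set[:, 1]

theta0 = np.arange(-1000.0, 1000.0, 100.0)
theta1 = np.arange(-50.0, 50.0, 1.0)
X, Y = np.meshgrid(theta0, theta1)

# Residuals for every grid point: shape (len(theta1), len(theta0), m).
residuals = X[..., None] + Y[..., None] * x - y
Z = (residuals ** 2).mean(axis=-1) / 2  # same as sum(...) / (2 * m)

# Rows of Z follow theta1 and columns follow theta0 (meshgrid's default
# 'xy' indexing), so the coordinates are passed in that order.
dZ_dtheta1, dZ_dtheta0 = np.gradient(Z, theta1, theta0)

# Partial derivatives of J derived by hand, for comparison.
exact_dtheta0 = residuals.mean(axis=-1)
exact_dtheta1 = (residuals * x).mean(axis=-1)

inner = (slice(1, -1), slice(1, -1))  # edges use one-sided differences
print(np.allclose(dZ_dtheta0[inner], exact_dtheta0[inner]))
print(np.allclose(dZ_dtheta1[inner], exact_dtheta1[inner]))
```

The comparison is restricted to interior grid points because np.gradient falls back to one-sided differences on the border of the grid.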


If your Z surface had a different expression, for example:

Z = np.exp(-X**2 - Y**2)

you would see that both gradient components are zero at the point (0, 0), where the surface has a maximum.
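This can also be checked numerically rather than by eye. The sketch below assumes a symmetric grid from -2 to 2 (my choice, so that the point (0, 0) is actually one of the grid nodes) and looks up the gradient at the highest point of the surface:

```python
import numpy as np

# Symmetric grid from -2 to 2 in steps of 0.1; it contains 0 exactly.
axis = np.arange(-20, 21) / 10.0
X, Y = np.meshgrid(axis, axis)
Z = np.exp(-X**2 - Y**2)

# Axis 0 of Z follows Y and axis 1 follows X.
dZ_dy, dZ_dx = np.gradient(Z, axis, axis)

# Locate the highest point of the surface and read the gradient there.
row, col = np.unravel_index(np.argmax(Z), Z.shape)
print("maximum at:", X[row, col], Y[row, col])
print("gradient there:", dZ_dx[row, col], dZ_dy[row, col])
```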