Get `bs` (splines) equivalent in Python

In the r programming language, the following

require(stats)
require(splines)
knots = quantile(women$height, seq(0.1,0.9,length.out = 5))
bs(women$height, knots=knots, degree=3)

returns

1   2   3   4   5   6   7   8
0.0000000000    0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.0000000000    0.00000000
0.6284418529    0.323939099 0.024295432 0.000000000 0.000000000 0.000000000 0.0000000000    0.00000000
0.2155814707    0.599894720 0.182883868 0.001639942 0.000000000 0.000000000 0.0000000000    0.00000000
0.0349854227    0.495626822 0.438289602 0.031098154 0.000000000 0.000000000 0.0000000000    0.00000000
0.0001619695    0.245586330 0.620809038 0.133442663 0.000000000 0.000000000 0.0000000000    0.00000000
0.0000000000    0.072886297 0.584548105 0.338678328 0.003887269 0.000000000 0.0000000000    0.00000000
0.0000000000    0.009110787 0.384718173 0.561892614 0.044278426 0.000000000 0.0000000000    0.00000000
0.0000000000    0.000000000 0.166666667 0.666666667 0.166666667 0.000000000 0.0000000000    0.00000000
0.0000000000    0.000000000 0.044278426 0.561892614 0.384718173 0.009110787 0.0000000000    0.00000000
0.0000000000    0.000000000 0.003887269 0.338678328 0.584548105 0.072886297 0.0000000000    0.00000000
0.0000000000    0.000000000 0.000000000 0.133442663 0.620809038 0.245586330 0.0001619695    0.00000000
0.0000000000    0.000000000 0.000000000 0.031098154 0.438289602 0.495626822 0.0349854227    0.00000000
0.0000000000    0.000000000 0.000000000 0.001639942 0.182883868 0.599894720 0.2155814707    0.00000000
0.0000000000    0.000000000 0.000000000 0.000000000 0.024295432 0.323939099 0.6284418529    0.02332362
0.0000000000    0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.0000000000    1.00000000

Is there a Python equivalent? I have tried BSpline from scipy, but it requires that the coefficients already be known and be passed in.

How can I just generate the B-spline basis matrix, having passed in an array, knots, and degree?


To reproduce the input Python, you can do:

import numpy as np

women_height = np.array([58,59,60,61,62,63,64,65,66,67,68,69,70,71,72])
knots = array([59.4, 62.2, 65. , 67.8, 70.6])

Answer

Turning a comment into an answer, BSpline.design_matrix is constructing what you are after, in the csr sparse format. It’ll be available from scipy 1.8 when it is released. Until then, you can either grab the master branch of scipy, or use a workaround suggested by the docs (https://scipy.github.io/devdocs/reference/generated/scipy.interpolate.BSpline.design_matrix.html#scipy.interpolate.BSpline.design_matrix) :

t = ... 
c = np.eye(len(t) - k - 1)
design_matrix_gh = BSpline(t, c, k)(x)

EDIT: R documentation, https://www.rdocumentation.org/packages/splines/versions/3.6.2/topics/bs, states that knots argument is internal knots. scipy’s BSpline does not pad the knots automatically, so you need to do it yourself. Using the OP data:

In [22]: women_height = np.array([58,59,60,61,62,63,64,65,66,67,68,69,70,71,72])
    ...: knots = np.array([59.4, 62.2, 65. , 67.8, 70.6])

In [23]: t = np.r_[(58,)*4, knots, (72,)*4]   # <<<<<< here

In [24]: m = BSpline.design_matrix(women_height, t, k=3)

In [25]: with np.printoptions(linewidth=120, precision=5):
    ...:     print(m.toarray())
    ...:
[[1.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00]
 [2.33236e-02 6.28442e-01 3.23939e-01 2.42954e-02 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00]
 [0.00000e+00 2.15581e-01 5.99895e-01 1.82884e-01 1.63994e-03 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00]
 [0.00000e+00 3.49854e-02 4.95627e-01 4.38290e-01 3.10982e-02 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00]
 [0.00000e+00 1.61970e-04 2.45586e-01 6.20809e-01 1.33443e-01 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00]
 [0.00000e+00 0.00000e+00 7.28863e-02 5.84548e-01 3.38678e-01 3.88727e-03 0.00000e+00 0.00000e+00 0.00000e+00]
 [0.00000e+00 0.00000e+00 9.11079e-03 3.84718e-01 5.61893e-01 4.42784e-02 0.00000e+00 0.00000e+00 0.00000e+00]
 [0.00000e+00 0.00000e+00 0.00000e+00 1.66667e-01 6.66667e-01 1.66667e-01 0.00000e+00 0.00000e+00 0.00000e+00]
 [0.00000e+00 0.00000e+00 0.00000e+00 4.42784e-02 5.61893e-01 3.84718e-01 9.11079e-03 0.00000e+00 0.00000e+00]
 [0.00000e+00 0.00000e+00 0.00000e+00 3.88727e-03 3.38678e-01 5.84548e-01 7.28863e-02 0.00000e+00 0.00000e+00]
 [0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 1.33443e-01 6.20809e-01 2.45586e-01 1.61970e-04 0.00000e+00]
 [0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 3.10982e-02 4.38290e-01 4.95627e-01 3.49854e-02 0.00000e+00]
 [0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 1.63994e-03 1.82884e-01 5.99895e-01 2.15581e-01 0.00000e+00]
 [0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 2.42954e-02 3.23939e-01 6.28442e-01 2.33236e-02]
 [0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 1.00000e+00]]

which looks similar to the R’s output from the OP modulo the first column. From the R’s docs it’s not immediately clear how exactly it pads the knot vector, but if you want the same output, you can just chop the first column off (m.toarray()[1:, :] or some such).