Turn 2D Numpy Array Into 3D Array

X_train[['x_1', 'x_2']].values

> array([[array([  8,  14,  28, 101,  49,  11,  48,  32,  75,  88]),
        array([107,  23,  75,  88,  53, 120, 114, 112,  11,  30])],
       [array([107,  23,  75,  88,  53, 120, 114, 112,  11,  30]),
        array([  8,  14,  28, 101,  49,  11,  48,  32,  75,  88])],
       [array([ 40,  46,  21,  67,  17, 167, 125, 165,  89,  90]),
        array([ 10,  58,  73,  61,  94,  46, 122,  46,   6,  15])],
       ...,
       [array([ 778,  356, 1091,  912,  866,  763,  170,  456,  539, 1059]),
        array([ 434,  992, 1437,  980,  949,  916,  714, 2000, 2000,  768])],
       [array([ 583,   90,  666,  224,  819,  154, 1399,  340,   99,  201]),
        array([1051,  663, 1018,  581, 1188, 2000,  867,  211,  441,  660])],
       [array([1051,  663, 1018,  581, 1188, 2000,  867,  211,  441,  660]),
        array([ 583,   90,  666,  224,  819,  154, 1399,  340,   99,  201])]],
      dtype=object)

I converted this partial dataframe into a numpy array, but it is giving me a 2D shape.

X_train[['x_1', 'x_2']].shape

> (335334, 2)

However, if I copy and paste the output into a Jupyter block, I get a 3D shape all of a sudden.

np.array([[np.array([  8,  14,  28, 101,  49,  11,  48,  32,  75,  88]),
    np.array([107,  23,  75,  88,  53, 120, 114, 112,  11,  30])],
   [np.array([107,  23,  75,  88,  53, 120, 114, 112,  11,  30]),
    np.array([  8,  14,  28, 101,  49,  11,  48,  32,  75,  88])],
   [np.array([ 40,  46,  21,  67,  17, 167, 125, 165,  89,  90]),
    np.array([ 10,  58,  73,  61,  94,  46, 122,  46,   6,  15])]
  ]).shape

> (3, 2, 10)

Answer

Making a dataframe like yours (assuming df starts out as an empty pd.DataFrame(), with numpy and pandas imported as np and pd):

In [6]: df['C1']=[np.arange(i,i+3) for i in range(3)]
In [7]: df['C2']=[np.arange(i,i+3) for i in range(5,8)]
In [8]: df
Out[8]: 
          C1         C2
0  [0, 1, 2]  [5, 6, 7]
1  [1, 2, 3]  [6, 7, 8]
2  [2, 3, 4]  [7, 8, 9]
In [9]: df.values
Out[9]: 
array([[array([0, 1, 2]), array([5, 6, 7])],
       [array([1, 2, 3]), array([6, 7, 8])],
       [array([2, 3, 4]), array([7, 8, 9])]], dtype=object)

One column, a Series, does have a to_list method:

In [12]: df['C1'].to_list()
Out[12]: [array([0, 1, 2]), array([1, 2, 3]), array([2, 3, 4])]
In [13]: np.array(df['C1'].to_list())
Out[13]: 
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])

but a dataframe does not. That makes sense, since a list is 1d, and a frame is 2d.

values (or to_numpy()) is a 2d object array:

In [14]: df.values
Out[14]: 
array([[array([0, 1, 2]), array([5, 6, 7])],
       [array([1, 2, 3]), array([6, 7, 8])],
       [array([2, 3, 4]), array([7, 8, 9])]], dtype=object)

np.array(df.values) doesn’t change that. But we can make a nested list from it:

In [15]: df.values.tolist()
Out[15]: 
[[array([0, 1, 2]), array([5, 6, 7])],
 [array([1, 2, 3]), array([6, 7, 8])],
 [array([2, 3, 4]), array([7, 8, 9])]]

and recreate an array from that:

In [16]: np.array(df.values.tolist())
Out[16]: 
array([[[0, 1, 2],
        [5, 6, 7]],

       [[1, 2, 3],
        [6, 7, 8]],

       [[2, 3, 4],
        [7, 8, 9]]])

The copy-and-paste effectively does the same thing: np.array is handed a nested list of arrays rather than an existing object array.
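To make the difference concrete without pandas, here is a minimal sketch that builds a 2d object array by hand (the name cells is just for illustration, standing in for df.values) and compares np.array applied to the object array with np.array applied to its nested-list form:

```python
import numpy as np

# A (3, 2) object array whose cells are 1d integer arrays,
# mimicking what df.values returns for the example frame.
cells = np.empty((3, 2), dtype=object)
for i in range(3):
    cells[i, 0] = np.arange(i, i + 3)
    cells[i, 1] = np.arange(i + 5, i + 8)

# np.array on an existing object array keeps it as a 2d object array.
same = np.array(cells)

# tolist() yields a nested list of arrays; np.array then inspects the
# inner arrays and builds a numeric 3d array from them.
rebuilt = np.array(cells.tolist())

print(same.shape, same.dtype)        # 2d, object dtype
print(rebuilt.shape, rebuilt.dtype)  # 3d, integer dtype
```

This only works when every inner array has the same length; otherwise numpy cannot form a regular 3d array.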

np.stack (or vstack) can join the arrays of a Series:

In [20]: np.stack(df['C1'])
Out[20]: 
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])

That doesn’t help with the 2d object array from a dataframe (df.values), since stacking its rows just gives back a 2d object array. But it does work on the raveled array, followed by a reshape:

np.stack(df.values.ravel()).reshape(-1,2,3)
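The reshape above hard-codes the 2 columns and the inner length of 3. A rough sketch of a more general version follows; frame_to_3d is a hypothetical helper name (not a pandas or numpy function), and it assumes every cell holds a 1d array of the same length. It also checks that the result matches the tolist() route:

```python
import numpy as np
import pandas as pd

def frame_to_3d(frame):
    """Hypothetical helper: turn a frame of equal-length array cells
    into a numeric array of shape (rows, columns, length)."""
    n_rows, n_cols = frame.shape
    # ravel() flattens row by row, so stack gives (rows * columns, length)
    flat = np.stack(frame.to_numpy().ravel())
    return flat.reshape(n_rows, n_cols, -1)

# Same example frame as in the session above.
df = pd.DataFrame({
    "C1": [np.arange(i, i + 3) for i in range(3)],
    "C2": [np.arange(i, i + 3) for i in range(5, 8)],
})

via_stack = frame_to_3d(df)
via_list = np.array(df.to_numpy().tolist())

print(via_stack.shape)
print(np.array_equal(via_stack, via_list))
```

For the frame in the question this would be called as frame_to_3d(X_train[['x_1', 'x_2']]).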
