IndexError when plotting pandas dataframe with subplots

I’m working through a beginner tutorial that uses this dataset:

http://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/sonar/sonar.all-data

I’ve loaded it like so:

dataset = pd.read_csv("sonar.all-data.csv", header=None)

All the numbers and metrics seem to be correct.

If I plot a histogram or a density plot, it works fine. But if I try a box plot, I get an exception:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py in __setitem__(self, key, value)
    975             if is_integer(key) and not self.index.inferred_type == "integer":
    976                 # positional setter
--> 977                 values[key] = value
    978             else:
    979                 # GH#12862 adding a new key to the Series

IndexError: index 0 is out of bounds for axis 0 with size 0

It draws the first box plot before failing. I checked the CSV file and there doesn’t seem to be anything unusual in the second column.

The plotting code is just:

dataset.plot(kind='box', subplots=True, layout=(8,8), sharex=False, sharey=False, fontsize=1)
plt.show()
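
For anyone who wants to try this without downloading the file, here is a rough reproduction sketch. It is not the tutorial code: the random frame `demo` only stands in for the sonar data, `boxplot_raises_index_error` is a made-up helper name, the layout is shrunk to (2, 3), and the Agg backend is an assumption so that it runs without a display. It only reports whether the call fails on the installed pandas version, since other versions may behave differently.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the sonar data: 20 rows, 6 numeric columns.
# The columns get default integer labels 0..5, as with read_csv(header=None).
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((20, 6)))


def boxplot_raises_index_error(frame):
    """Return True if the subplot box plot raises IndexError, else False."""
    try:
        frame.plot(kind="box", subplots=True, layout=(2, 3),
                   sharex=False, sharey=False)
        return False
    except IndexError:
        return True
    finally:
        plt.close("all")  # release the figure either way


print("pandas", pd.__version__,
      "raises IndexError:", boxplot_raises_index_error(demo))
```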

Versions:

scipy: 1.6.2
numpy: 1.20.1
matplotlib: 3.3.4
pandas: 1.2.4
sklearn: 0.24.1

Sample data in case the link dies:

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0.02,0.0371,0.0428,0.0207,0.0954,0.0986,0.1539,0.1601,0.3109,0.2111,0.1609,0.1582,0.2238,0.0645,0.066,0.2273,0.31,0.2999,0.5078,0.4797
0.0453,0.0523,0.0843,0.0689,0.1183,0.2583,0.2156,0.3481,0.3337,0.2872,0.4918,0.6552,0.6919,0.7797,0.7464,0.9444,1.0,0.8874,0.8024,0.7818
0.0262,0.0582,0.1099,0.1083,0.0974,0.228,0.2431,0.3771,0.5598,0.6194,0.6333,0.706,0.5544,0.532,0.6479,0.6931,0.6759,0.7551,0.8929,0.8619
0.01,0.0171,0.0623,0.0205,0.0205,0.0368,0.1098,0.1276,0.0598,0.1264,0.0881,0.1992,0.0184,0.2261,0.1729,0.2131,0.0693,0.2281,0.406,0.3973
0.0762,0.0666,0.0481,0.0394,0.059,0.0649,0.1209,0.2467,0.3564,0.4459,0.4152,0.3952,0.4256,0.4135,0.4528,0.5326,0.7306,0.6193,0.2032,0.4636
0.0286,0.0453,0.0277,0.0174,0.0384,0.099,0.1201,0.1833,0.2105,0.3039,0.2988,0.425,0.6343,0.8198,1.0,0.9988,0.9508,0.9025,0.7234,0.5122
0.0317,0.0956,0.1321,0.1408,0.1674,0.171,0.0731,0.1401,0.2083,0.3513,0.1786,0.0658,0.0513,0.3752,0.5419,0.544,0.515,0.4262,0.2024,0.4233
0.0519,0.0548,0.0842,0.0319,0.1158,0.0922,0.1027,0.0613,0.1465,0.2838,0.2802,0.3086,0.2657,0.3801,0.5626,0.4376,0.2617,0.1199,0.6676,0.9402
0.0223,0.0375,0.0484,0.0475,0.0647,0.0591,0.0753,0.0098,0.0684,0.1487,0.1156,0.1654,0.3833,0.3598,0.1713,0.1136,0.0349,0.3796,0.7401,0.9925
0.0164,0.0173,0.0347,0.007,0.0187,0.0671,0.1056,0.0697,0.0962,0.0251,0.0801,0.1056,0.1266,0.089,0.0198,0.1133,0.2826,0.3234,0.3238,0.4333
0.0039,0.0063,0.0152,0.0336,0.031,0.0284,0.0396,0.0272,0.0323,0.0452,0.0492,0.0996,0.1424,0.1194,0.0628,0.0907,0.1177,0.1429,0.1223,0.1104
0.0123,0.0309,0.0169,0.0313,0.0358,0.0102,0.0182,0.0579,0.1122,0.0835,0.0548,0.0847,0.2026,0.2557,0.187,0.2032,0.1463,0.2849,0.5824,0.7728
0.0079,0.0086,0.0055,0.025,0.0344,0.0546,0.0528,0.0958,0.1009,0.124,0.1097,0.1215,0.1874,0.3383,0.3227,0.2723,0.3943,0.6432,0.7271,0.8673
0.009,0.0062,0.0253,0.0489,0.1197,0.1589,0.1392,0.0987,0.0955,0.1895,0.1896,0.2547,0.4073,0.2988,0.2901,0.5326,0.4022,0.1571,0.3024,0.3907
0.0124,0.0433,0.0604,0.0449,0.0597,0.0355,0.0531,0.0343,0.1052,0.212,0.164,0.1901,0.3026,0.2019,0.0592,0.239,0.3657,0.3809,0.5929,0.6299
0.0298,0.0615,0.065,0.0921,0.1615,0.2294,0.2176,0.2033,0.1459,0.0852,0.2476,0.3645,0.2777,0.2826,0.3237,0.4335,0.5638,0.4555,0.4348,0.6433
0.0352,0.0116,0.0191,0.0469,0.0737,0.1185,0.1683,0.1541,0.1466,0.2912,0.2328,0.2237,0.247,0.156,0.3491,0.3308,0.2299,0.2203,0.2493,0.4128
0.0192,0.0607,0.0378,0.0774,0.1388,0.0809,0.0568,0.0219,0.1037,0.1186,0.1237,0.1601,0.352,0.4479,0.3769,0.5761,0.6426,0.679,0.7157,0.5466
0.027,0.0092,0.0145,0.0278,0.0412,0.0757,0.1026,0.1138,0.0794,0.152,0.1675,0.137,0.1361,0.1345,0.2144,0.5354,0.683,0.56,0.3093,0.3226
0.0126,0.0149,0.0641,0.1732,0.2565,0.2559,0.2947,0.411,0.4983,0.592,0.5832,0.5419,0.5472,0.5314,0.4981,0.6985,0.8292,0.7839,0.8215,0.9363

Answer

  • I don’t know why, but using subplots=True with numeric column names seems to be causing the issue.
  • The fix is to convert the column names to strings:
import pandas as pd
import matplotlib.pyplot as plt

# load the data
df = pd.read_csv("sonar.all-data.csv", header=None)

# check the column name type
print(type(df.columns[0]))
[out]:
<class 'numpy.int64'>

# convert the column names to strings
df.columns = [f'{v}' for v in df.columns]

# check the column name type
print(type(df.columns[0]))
[out]:
<class 'str'>

# plot the dataframe
df.plot(kind='box', layout=(10, 6), figsize=(20, 20), subplots=True)
plt.show()

  • With subplots=False the plot works with numeric column names
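
Two other ways to do the same conversion are sketched below on synthetic data, in case the list comprehension feels clunky. The frame `demo` and the (2, 3) layout are illustrative only, and the Agg backend is an assumption so that the snippet runs without a display. Judging by the traceback in the question (the "positional setter" branch of `Series.__setitem__` failing on size 0), the integer label appears to be treated as a position in an empty Series, which would explain why string labels avoid the error. That is a reading of the traceback, not something confirmed against the pandas source.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the sonar data, with integer column labels 0..5.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((20, 6)))

# Option 1: rename returns a new frame and leaves the original untouched.
renamed = demo.rename(columns=str)

# Option 2: convert the column index on a copy.
demo_str = demo.copy()
demo_str.columns = demo_str.columns.astype(str)

# With string labels, the subplot box plot gets one panel per column.
renamed.plot(kind="box", subplots=True, layout=(2, 3),
             sharex=False, sharey=False, figsize=(9, 6))
plt.close("all")
```

`rename(columns=str)` is handy when the integer labels are still needed elsewhere, because `demo` itself is not modified.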
