I am attempting to perform a train/test split on some data, wine.data, but I run into a problem when initializing X and y:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

wine = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data")
print(wine.shape)
wine.head()

X = wine[np.arange(1,14)]
y = wine
The rest of the code below this segment will not run as I get the error message:
KeyError: "None of [Int64Index([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], dtype='int64')] are in the [columns]"
I have tried changing the range used for X and changing the np.arange call, but neither resolves the problem.
Any help or advice would be greatly appreciated, thank you!
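You forgot to pass header=None to pd.read_csv. The CSV you are downloading doesn't have a header line, so if you don't specify header=None, the first line of data is used as the header. The column labels then become strings taken from that row instead of the integers 0 to 13, which is why none of the integer keys from np.arange(1, 14) are found.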
wine = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data",
    header=None,
)
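To see the difference without downloading anything, here is a minimal sketch that reads a small in-memory sample both ways. The three rows in `sample` are made-up values in the same headerless layout (one label column followed by 13 feature columns), and treating column 0 as the label for `y` is an assumption on my part, since the question's `y` line is cut off.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample: made-up values, no header line,
# 14 comma-separated fields per row (assumed: label first, then 13 features).
sample = (
    "1,14.2,1.7,2.4,15.6,127,2.8,3.0,0.28,2.3,5.6,1.04,3.9,1065\n"
    "2,12.4,1.1,2.3,16.8,100,2.0,1.1,0.45,1.4,3.3,1.25,1.7,520\n"
    "3,12.9,1.4,2.1,21.0,113,1.4,1.3,0.21,1.2,4.1,0.76,1.3,630\n"
)

# Default behaviour: the first data row is consumed as the header,
# so one row is lost and the column labels are strings.
with_default = pd.read_csv(io.StringIO(sample))
print(with_default.shape)
print(list(with_default.columns)[:3])

# With header=None: every row is data and the columns are labelled 0..13.
with_header_none = pd.read_csv(io.StringIO(sample), header=None)
print(with_header_none.shape)
print(list(with_header_none.columns)[:3])

# Integer column selection now works as in the question.
X = with_header_none[np.arange(1, 14)]
y = with_header_none[0]  # assumption: column 0 holds the label
print(X.shape, y.shape)
```

With the default settings the integer keys do not match the string column labels, which is what produces the KeyError. Once header=None is passed, X and y can go straight into train_test_split.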