I am facing the following problem. I am trying to download this dataset: Dataset link
in this way:
    import numpy as np
    import pandas as pd

    data_file_url = 'http://cs.joensuu.fi/sipu/datasets/s1.txt'
    D = np.array(pd.read_csv(data_file_url, header=0))
    # shuffle the rows
    D = D[np.random.choice(np.arange(D.shape[0]), D.shape[0], replace=False), :]
    Dx = D[:, 0:2]
    Dy = D[:, 2]
but it seems that it comes in a .txt array format. That's not really the problem, but the string itself is. It comes in this form:
    [[' 665845 557965']
     [' 597173 575538']
     [' 618600 551446']
     ...
     [' 650661 861267']
     [' 599647 858702']
     [' 684091 842566']]
where each row is one giant weird string with a lot of blank spaces and two numbers, which are the coordinates. I am trying to get the two coordinates of each row as separate numeric values.
The dataset can be downloaded in .txt or .ts format.
I tried to split each string and then cast to int, but I am getting all the numbers instead of 2, obviously.
Thanks for the help or advice!
Have you tried using the optional arguments of pd.read_csv?
Try the following:
D = np.array(pd.read_csv(data_file_url,header=0,delimiter=' '))
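If a single-space delimiter still leaves empty or NaN columns (which can happen when the lines start with blanks or the columns are separated by several spaces), a regex separator may work better. This is only a sketch: it assumes the file has no header row and two whitespace-separated integer columns, and it reads three rows copied from the output in the question from an in-memory string instead of the URL. The column names x and y are made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# Three rows taken from the output in the question. The real file is
# assumed to have the same layout: leading blanks, whitespace between columns.
sample = "    665845    557965\n    597173    575538\n    618600    551446\n"

# sep=r"\s+" treats any run of whitespace as one separator and skips leading blanks.
# header=None keeps the first line as data instead of using it as column names.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None, names=["x", "y"])
D = df.to_numpy()

# Shuffle the rows; D.shape[0] is the number of rows.
rng = np.random.default_rng(0)
D = D[rng.permutation(D.shape[0]), :]

print(D.shape, D.dtype)
```

To read the actual dataset, pass data_file_url in place of io.StringIO(sample). Note that header=0 in the original call would use the first data row as column names, so that row would be missing from D.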