I have 200 datasets and I want to iterate through them, pick random rows, and add them to another (empty) dataset, using iloc and values. When I execute the code it does not give an error, but it also does not add anything to the empty dataset. However, when I run the single command to check whether the random row has any value, it gives an error: AttributeError: ‘str’ object has no attribute ‘iloc’.
My code is given below:
    Tdata = np.zeros([20, 6])
    k = 0
    for j in range(200):
        for j1 in range(0, 20):
            Tdata[k:k+1,:] = (('dataset'+j)).iloc[random.randint(100)].values
            k += 1
(‘dataset’+j) is basically selecting different datasets. The names of my datasets are dataset0, dataset1, dataset2, and so on; they are already defined.
There are multiple issues with your code.
1. Using a str in place of the actual DataFrame variable
You are trying to use .iloc over a string rather than over an actual DataFrame variable, dataset1 for example. This won’t work, since str has no attribute .iloc, as the error reads for you.
Since you want to work with DataFrame variable names, you may need to use eval() to interpret the string as a variable name. NOTE: BE EXTRA CAREFUL while using eval(). Please read about the dangers of using eval() carefully.
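To make the difference concrete, here is a minimal sketch with two small stand-in DataFrames (the data is made up for illustration). It shows that the concatenated name is only a str, and that eval() turns it into the variable it names. The dictionary lookup at the end is my own suggestion rather than part of the approach above: it is one common way to avoid eval() if you can store the DataFrames under string keys.

```python
import numpy as np
import pandas as pd

# Two tiny stand-in DataFrames (illustrative data only)
dataset0 = pd.DataFrame(np.arange(6).reshape(3, 2))
dataset1 = pd.DataFrame(np.arange(6, 12).reshape(3, 2))

j = 1
name = "dataset" + str(j)      # str(j) is required; 'dataset' + j raises TypeError
print(type(name))              # a plain str, so name.iloc raises AttributeError

df = eval(name)                # evaluates the string as an expression -> dataset1
first_row = df.iloc[0].values  # now .iloc works, because df is a DataFrame

# Assumed alternative (not from the answer above): look up by key, no eval()
datasets = {"dataset0": dataset0, "dataset1": dataset1}
same_row = datasets[name].iloc[0].values
```

Both first_row and same_row hold the first row of dataset1; the dictionary version never executes a string as code.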
2. Sampling 20 rows from each DataFrame.
If you are trying to get 20 rows by using for j1 in range(0, 20): along with random.randint(100), there is a better way that avoids this iteration. Instead, use np.random.randint(0, 100, (n,)) to get n random indexes at once. In this case, n is 20.
Or, even better, simply use df.sample(20) to sample 20 rows from a given dataframe.
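As a rough side-by-side of the two options, assuming a stand-in DataFrame of 100 rows and 6 columns (the shape is my assumption, chosen to match the 6-column Tdata in the question):

```python
import numpy as np
import pandas as pd

# Stand-in DataFrame: 100 rows, 6 columns (assumed shape)
df = pd.DataFrame(np.random.random((100, 6)))
n = 20

# Option 1: draw n random positions at once and index with .iloc.
# randint draws with replacement, so the same row can appear more than once.
idx = np.random.randint(0, 100, (n,))
by_index = df.iloc[idx]

# Option 2: let pandas do the sampling.
# By default, sample() draws without replacement, so every row is distinct.
by_sample = df.sample(n)

print(by_index.shape, by_sample.shape)
```

If you need repeated rows to be possible with the second option too, df.sample(n, replace=True) does that.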
3. Forcing update over views of the dataframe
It's better to use a different approach than forcing an update over a view of the dataframe with Tdata[k:k+1,:] = .... Since you want to combine dataframes, it's better to collect them in a list and pass that list to pd.concat, which is much more useful.
Here is sample code with a simple setting which should help guide you to what you are looking for.
    import pandas as pd
    import numpy as np

    dataset0 = pd.DataFrame(np.random.random((100,3)))
    dataset1 = pd.DataFrame(np.random.random((100,3)))
    dataset2 = pd.DataFrame(np.random.random((100,3)))
    dataset3 = pd.DataFrame(np.random.random((100,3)))

    ## Using random.randint
    ## samples = [eval('dataset'+str(i)).iloc[np.random.randint(0,100,(3,))] for i in range(4)]

    ## Using df.sample()
    samples = [eval('dataset'+str(i)).sample(3) for i in range(4)]

    ## Change -
    ## 1. The 3 to 20 for 20 samples per dataframe
    ## 2. range(4) to range(200) to work with 200 dataframes

    output = pd.concat(samples)
    print(output)
               0         1         2
    42  0.372626  0.445972  0.030467
    20  0.376201  0.445504  0.835735
    56  0.214806  0.083550  0.582863
    85  0.691495  0.346022  0.619638
    24  0.290397  0.202795  0.704082
    16  0.112986  0.013269  0.903917
    51  0.521951  0.115386  0.632143
    73  0.946870  0.531085  0.437418
    98  0.745897  0.718701  0.280326
    56  0.679253  0.010143  0.124667
    4   0.028559  0.769682  0.737377
    84  0.857553  0.866464  0.827472
4. Storing 200 dataframes??
Last but not least, you should ask yourself why you are storing 200 dataframes as individual variables, only to sample some rows from each.
Why not try to:
- Read each of the files iteratively
- Sample rows from each
- Store them in a list of dataframes
- Call pd.concat once you are done iterating over the 200 files
… instead of saving 200 dataframes and then doing the same.
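A minimal sketch of that workflow follows. I don't know how your data is stored, so the CSV format and the file names dataset0.csv, dataset1.csv, ... are assumptions. To keep the example runnable on its own, it first writes a few stand-in files to a temporary directory; in your case you would skip that step and point the loop at your real files.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

n_files, n_rows, n_samples = 4, 100, 3  # use 200 files and 20 samples for your case

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)

    # Stand-in step: create hypothetical files dataset0.csv, dataset1.csv, ...
    for i in range(n_files):
        fake = pd.DataFrame(np.random.random((n_rows, 3)))
        fake.to_csv(folder / f"dataset{i}.csv", index=False)

    # Read each file, sample from it, and keep only the sampled rows
    samples = []
    for i in range(n_files):
        df = pd.read_csv(folder / f"dataset{i}.csv")
        samples.append(df.sample(n_samples))

# Combine once, after the loop
output = pd.concat(samples)
print(output.shape)
```

Only one full DataFrame is held in memory at a time, and no eval() is needed because the file name is built as a string, which is exactly what read_csv expects.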