I am attempting to add all rows that contain a specific DateTime value to a list and then print said list. I iterate through all rows in a dataframe, looking for a specific value within a column. If the value appears, I would like to add that specific row to a list.
with open('Layer1.csv', newline = '') as csvfile2: df = pd.read_csv('Layer1.csv') AudioMothIDs = getID() AudioMothIDs.remove('NA') csv_reader = csv.reader(csvfile2) for row in csv_reader: orig_list =  #Iterates through each unique ID for x in AudioMothIDs: ID_df = df[df['AudioMothID'] == x] #Iterates through all rows in the ID dataframe for index, rows in ID_df.iterrows(): #Searches for a specific DateTime within the StartDateTime column if '16.06.2019 15:00' in ID_df.StartDateTime.values: #Attempts to add rows with the specific DateTime to a list current_list = [rows.AudioMothID,rows.StartDateTime] orig_list.append(current_list) print(orig_list)
Appending rows.AudioMothID and rows.StartDatetime appends all rows within the ID to the list instead of just the rows with ‘16.06.2019 15:00’ in the StartDateTime column. I have tried using ID_df.iloc[index] as well which similarly adds all rows instead of only the rows that contain the specified string.
How would I only append the rows to a list that contain ‘16.06.2019 15:00’ in the StartDateTime column?
I think you’re making quite a few unnecessary steps, e.g. reading the csv file twice, iterating over each row and then over each ID..
Let’s simply use
pandas for this:
# read dataframe using pandas df = pd.read_csv('Layer1.csv') # filter for date, select specific columns and convert to list df[df.StartDateTime == '16.06.2019 15:00'][[df.AudioMothID, df.StartDateTime]].values.tolist()
[EDIT] adding on to address your comments:
- How to select values within a certain hour
# ensure date column is in the right format df['StartDateTime'] = pd.to_datetime(df['StartDateTime']) # filter by hour of date df[df.StartDateTime.dt.hour == 15]
- How to randomly sample for each hour and each ID
# ensure date column is in the right format df['StartDateTime'] = pd.to_datetime(df['StartDateTime']) # round date column to nearest hour df['StartDateTime_nearest_hour'] = df['StartDateTime'].dt.round('H') # randomly sample 1 by each hour of each day and ID df.groupby(['AudioMothID', 'StartDateTime_nearest_hour'].sample(n=1) # if you want to sample an hour of a random day instead: df['StartDateTime_hour'] = df['StartDateTime'].dt.hour df.groupby(['AudioMothID', 'StartDateTime_hour'].sample(n=1)