Edit strings in every row of a column of a CSV


I have a CSV with a date column where the dates are listed as MM/DD/YY. I want to change the years from 00, 02, 03 to 1900, 1902, 1903 so that the dates are listed as MM/DD/YYYY instead.

This is what works for me:

df2['Date'] = df2['Date'].str.replace(r'00', '1900')

but I’d have to do this for every year up to 68 (i.e. repeat it once for each year from 00 to 68). I’m not sure how to write a loop that runs the code above for every year in that range. I tried this:

ogyear=00 
newyear=1900 
while ogyear <= 68:
    df2['date']=df2['Date'].str.replace(r'ogyear','newyear')
    ogyear += 1
    newyear += 1

but this returns an empty data set. Is there another way to do this?

I can’t use datetime because it assumes that 02 refers to 2002 instead of 1902, and when I try to edit that as a date I get an error message from Python saying that dates are immutable and that they must be changed in the original data set. For this reason I need to keep the dates as strings. I also attached the CSV here in case that’s helpful.

Answer

I would do it like this:

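import pandas as pd

# create a data frame
d = pd.DataFrame({'date': ['20/01/00','20/01/20','20/01/50']})

# create year column: take the two-digit year after the last '/' and add 1900
d['year'] = d['date'].str.split('/').str[2].astype(int) + 1900

# strip the old two-digit year and append the new four-digit year
# (regex=True is needed in newer pandas, where patterns are literal by default)
d['new_data'] = d['date'].str.replace(r'\d{2}$', '', regex=True) + d['year'].astype(str)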

       date  year    new_data
0  20/01/00  1900  20/01/1900
1  20/01/20  1920  20/01/1920
2  20/01/50  1950  20/01/1950
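
If you would rather not build a helper column, the same result can probably be had with a single regex replacement on the original strings. This is only a sketch: the sample rows below are made up to match the MM/DD/YY layout from the question, and it assumes every year in the column belongs to the 1900s.

```python
import pandas as pd

# Hypothetical sample data in the question's MM/DD/YY layout.
df2 = pd.DataFrame({'Date': ['01/20/00', '03/15/02', '12/31/68']})

# Match a '/' followed by exactly two digits at the end of the string,
# capture those digits, and write them back prefixed with '19'.
# regex=True is passed explicitly because newer pandas versions
# treat the pattern as a literal string by default.
df2['Date'] = df2['Date'].str.replace(r'/(\d{2})$', r'/19\1', regex=True)

print(df2)
```

Because the pattern only matches a two-digit year at the very end, values that already have a four-digit year are left alone, so running it twice should be harmless. As for the loop in the question, it most likely changes nothing because r'ogyear' and 'newyear' are literal strings rather than the values of the variables, and the result is assigned to a new 'date' column instead of 'Date'.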