Pyhthon: Getting “list index out of range” error; I know why but don’t know how to resolve this

I am currently working on a data science project. The Idea is to clean the data from “glassdoor_jobs.csv”, and present it in a much more understandable manner.

import pandas as pd

df = pd.read_csv('glassdoor_jobs.csv')

#salary parsing
#Removing "-1" Ratings
#Clean up "Founded"
#state field
#Parse out job description

df['hourly'] = df['Salary Estimate'].apply(lambda x: 1 if 'per hour' in x.lower() else 0)
df['employer_provided'] = df['Salary Estimate'].apply(lambda x: 1 if 'employer provided salary' in x.lower() else 0)
df = df[df['Salary Estimate'] != '-1']
Salary = df['Salary Estimate'].apply(lambda x: x.split('(')[0])
minus_Kd = Salary.apply(lambda x: x.replace('K', '').replace('$',''))

minus_hr = minus_Kd.apply(lambda x: x.lower().replace('per hour', '').replace('employer provided salary:', ''))

df['min_salary'] = minus_hr.apply(lambda x: int(x.split('-')[0]))
df['max_salary'] = minus_hr.apply(lambda x: int(x.split('-')[1]))

I am getting the error at that last line. After digging a bit, I found out in minus_hr, some of the ‘Salary Estimate’ only has one number instead of range:

index Salary Estimate
0 150
1 58
2 130
3 125-150
4 110-140
5 200
6 67- 77

And so on. Now I’m trying to figure out how to work around the “list index out of range”, and make max_salary the same as the min_salary for the cells with only one value.

I am also trying to get average between the min and max salary, and if the cell only has a single value, make that value the average

So in the end, something like index 0 would look like:

index min max average
0 150 150 150

Answer

You’ll have to add in a conditional statement somewhere.

df['min_salary'] = minus_hr.apply(lambda x: int(x.split('-')[0]) if '-' in x else x)

The above might do it, or you can define a function.

def max_salary(cell_value):
    if '-' in cell_value:
        max_salary = split(cell_value, '-')[1]
    else:
        max_salary = cell_value
return max_salary

df['max_salary'] = minus_hr.apply(lambda x: max_salary(x))


def avg_salary(cell_value):
    if '-' in cell_value:
        salaries = split(cell_value,'-')
        avg = sum(salaries)/len(salaries)
    else:
        avg = cell_value
return avg

df['avg_salary'] = minus_hr.apply(lambda x: avg_salary(x))

Swap in min_salary and repeat