Errors reading CSV with Pandas

I have a dataset of 100 million rows that I need to analyze. I use this function to read the file:

df = pd.read_csv(…,
                 usecols=['field1', 'field2', 'field3', 'field4'],
                 dtype={'field1': int, 'field2': float, 'field3': float, 'field4': float})

But I’m getting an error because one of the lines contains a value that cannot be converted to a float:

ValueError: could not convert string to float: 'ORCH'

I would like to skip any lines where this error occurs, but I don’t know how to do that other than with the error_bad_lines argument. Help?



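The error_bad_lines option is not meant for this: it only applies to lines with an incorrect number of fields, not to values that fail a dtype conversion. (In pandas 1.3 it was deprecated in favor of on_bad_lines, and it was removed in pandas 2.0.)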
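A minimal sketch of that distinction, assuming pandas 1.3 or later (where on_bad_lines='skip' plays the role of error_bad_lines=False) and a small hypothetical in-memory sample: the line with an extra field is skipped, but the non-numeric value still makes the dtype conversion fail.

```python
import io

import pandas as pd

# Hypothetical sample: line 3 has an extra field, line 4 has a non-numeric value.
data = "field1,field2\n1,2.5\n2,3.5,EXTRA\n3,ORCH\n"

# on_bad_lines="skip" silently drops the line with too many fields...
df = pd.read_csv(io.StringIO(data), on_bad_lines="skip")
print(df)  # field2 comes back as object dtype because of 'ORCH'

# ...but it does not help when a value cannot be converted to the requested dtype.
conversion_failed = False
try:
    pd.read_csv(io.StringIO(data), on_bad_lines="skip",
                dtype={"field1": int, "field2": float})
except ValueError as exc:
    conversion_failed = True
    print("ValueError:", exc)
```

So skipping "bad lines" and skipping unconvertible values are separate problems in pandas.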

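Read your file without the dtype option and do the conversion afterwards using pandas.to_numeric with the errors='coerce' option. This turns every value that cannot be parsed (such as 'ORCH') into NaN instead of raising, so you can then drop those rows with dropna: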

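import pandas as pd

cols = ['field1', 'field2', 'field3', 'field4']
df = pd.read_csv(…, usecols=cols)
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')  # unparseable values become NaN
df = df.dropna(subset=cols)  # omit the lines that failed to convert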
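To see the whole approach end to end, here is a self-contained sketch on a tiny hypothetical sample (the column names come from the question; the data and the extra other column are made up). The final astype restores field1 to int, as the question originally requested. That cast is only safe because dropna has already removed the NaN rows.

```python
import io

import pandas as pd

# Hypothetical sample data: the third row contains the stray value 'ORCH'.
data = """field1,field2,field3,field4,other
1,0.5,1.5,2.5,x
2,1.0,2.0,3.0,y
3,ORCH,2.5,3.5,z
"""
cols = ["field1", "field2", "field3", "field4"]

# Read without dtype: a column holding a bad value comes back as object dtype.
df = pd.read_csv(io.StringIO(data), usecols=cols)

# errors="coerce" turns anything unparseable into NaN instead of raising.
for col in cols:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# Drop rows where any conversion failed, then cast field1 back to int.
df = df.dropna(subset=cols).astype({"field1": int})
print(df)
```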