Filter Pandas DataFrames Using Dynamic URL Query String

Currently i am having an question in python pandas. I want to filter a dataframe using url query string dynamically.

For eg: CSV:

enter image description here

url: http://example.com/filter?Name=Sam&Age=21&Gender=male

Hardcoded:

filtered_data = data[
    (data['Name'] == 'Sam') &
    (data['Age'] == 21) &
    (data['Gender'] == 'male') 
];

I don’t want to hard code the filter keys like before because the csv file changes anytime with different column headers. Any suggestions

Answer

Typically, your web framework will return the arguments in a dict-like structure. Let’s say your args are like this:

args = {
    'Name': ['Sam'],
    'Age': ['21'],         # Note that Age is a string
    'Gender': ['male']
}

You can filter your dataset successively like this:

for key, values in args.items():
    data = data[data[key].isin(values)]

However, this is likely not to match any data for Age, which may have been loaded as an integer. In that case, you could load the CSV file as a string via pd.read_csv(filename, dtype=object), or convert to string before comparison:

for key, values in args.items():
    data = data[data[key].astype(str).isin(values)]

Incidentally, this will also match multiple values. For example, take the URL http://example.com/filter?Name=Sam&Name=Ben&Age=21&Gender=male — which leads to the structure:

args = {
    'Name': ['Sam', 'Ben'],    # There are 2 names
    'Age': ['21'],
    'Gender': ['male']
}

In this case, both Ben and Sam will be matched, since we’re using .isin to match.

Leave a Reply

Your email address will not be published. Required fields are marked *