What is the fastest way to filter a 2.5 GB JSON file?

I have a 2.5 GB JSON file with 25 columns and about 4 million rows. When I filter the JSON with the following script, it takes at least 10 minutes.

import json

product_list = ['Horse','Rabit','Cow']
year_list = ['2008','2009','2010']
country_list = ['USA','GERMANY','ITALY']

with open('./products/animal_production.json', 'r', encoding='utf8') as r:
    result = r.read()
result = json.loads(result)

# Iterate over a copy and remove every row that fails any of the three filters
for item in result[:]:
    if (not str(item["Year"]) in year_list) or (not item["Name"] in product_list) or (not item["Country"] in country_list):
        result.remove(item)

I need to produce the result in at most 1 minute, so what do you suggest, or what is the fastest way to filter the JSON?


Removing items from a list inside a loop is slow: each remove is O(n), and doing that n times gives O(n^2). Appending to a new list is O(1), so building a new list in a loop is O(n) overall. So you can try this:

result = [item for item in result if str(item["Year"]) in year_list and item["Name"] in product_list and item["Country"] in country_list]

Filter based on the condition you need and add only those that match.
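
As a minimal sketch of the same idea, the version below also swaps the three lists for sets, since a set membership test is O(1) on average while a list is scanned linearly. The set names, the `filter_rows` helper, and the sample rows are made up for illustration; only the keys "Year", "Name", and "Country" come from the question.

```python
import json

# Sets instead of lists: membership tests are O(1) on average.
# Values are copied from the question as-is.
product_set = {"Horse", "Rabit", "Cow"}
year_set = {"2008", "2009", "2010"}
country_set = {"USA", "GERMANY", "ITALY"}


def filter_rows(rows):
    """Return a new list holding only the rows that pass all three filters."""
    return [
        row for row in rows
        if str(row["Year"]) in year_set
        and row["Name"] in product_set
        and row["Country"] in country_set
    ]


# Small in-memory sample standing in for the real file (hypothetical data).
sample = json.loads("""[
    {"Year": 2008, "Name": "Horse", "Country": "USA"},
    {"Year": 2011, "Name": "Horse", "Country": "USA"},
    {"Year": "2009", "Name": "Cow", "Country": "FRANCE"},
    {"Year": 2010, "Name": "Rabit", "Country": "ITALY"}
]""")

filtered = filter_rows(sample)
print(len(filtered))  # prints 2
```

With the real data you would pass the list parsed from the file to `filter_rows` instead of `sample`. Whether this alone gets under one minute depends on how much of the time goes into parsing the file, which is not measured here.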