I have this code to scrape an Oddsportal page:
https://www.oddsportal.com/soccer/england/premier-league/
    from selenium import webdriver
    import pandas as pd

    browser = webdriver.Chrome()
    browser.get("https://www.oddsportal.com/soccer/england/premier-league/")
    df = pd.read_html(browser.page_source, header=0)[0]

    timeList = []
    dateList = []
    gameList = []
    home_odds = []
    draw_odds = []
    away_odds = []

    for row in df.itertuples():
        # Skip rows whose first cell is not text (e.g. NaN filler rows)
        if not isinstance(row[1], str):
            continue
        # Rows without a ':' are date headers, not matches
        elif ':' not in row[1]:
            date = row[1].split('-')[0]
            continue
        time = timeList.append(row[1])
        dateList.append(date)
        gameList.append(row[2])
        home_odds.append(row[4])
        draw_odds.append(row[5])
        away_odds.append(row[6])

    result = pd.DataFrame({'date': dateList,
                           'time': time,
                           'game': gameList,
                           'Home': home_odds,
                           'Draw': draw_odds,
                           'Away': away_odds})
I am getting the output as:
        date           time    game                             Home    Draw    Away
    --  -------------  ------  -----------------------------  ------  ------  ------
     0  Today, 08 Mar          Chelsea - Everton                1.62    3.93    6.07
     1  Today, 08 Mar          West Ham - Leeds                 2.25    3.61    3.18
     2  10 Mar 2021            Manchester City - Southampton    1.22    6.94   13.75
     3  12 Mar 2021            Newcastle - Aston Villa           3.8    3.59       2
     4  13 Mar 2021            Leeds - Chelsea                  4.45    3.97    1.77
     5  13 Mar 2021            Crystal Palace - West Brom        2.1    3.34    3.77
     6  13 Mar 2021            Everton - Burnley                1.84    3.61    4.54
     7  13 Mar 2021            Fulham - Manchester City        10.05    5.16    1.34
     8  14 Mar 2021            Southampton - Brighton            2.8    3.11    2.77
     9  14 Mar 2021            Leicester - Sheffield Utd         1.5    4.34    7.06
    10  14 Mar 2021            Arsenal - Tottenham              2.48    3.47    2.87
    11  14 Mar 2021            Manchester Utd - West Ham        1.86    3.62    4.44
    12  15 Mar 2021            Wolves - Liverpool               4.65    3.66     1.8
    13  19 Mar 2021            Fulham - Leeds                   2.55    3.53    2.72
    14  20 Mar 2021            Brighton - Newcastle             1.76    3.39    5.58
    15  21 Mar 2021            West Ham - Arsenal               2.86    3.51    2.44
    16  21 Mar 2021            Aston Villa - Tottenham          3.24     3.4    2.27
I am not getting any value for time.
Can anybody help me understand what I am missing? Am I defining time correctly?
Answer
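timeList.append(row[1]) doesn't return anything (list.append modifies the list in place and returns None), so time is always None. I suspect you want:

    time = row[1]
    timeList.append(time)

Also note that the DataFrame is built with 'time': time, which is a single value rather than the list. Pass 'time': timeList instead, otherwise every row ends up with the same value.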
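To check the fix without launching a browser, here is a minimal sketch that runs the same loop over a small hand-built DataFrame. The `sample` frame and its column layout are assumptions standing in for what `pd.read_html` returns, and the kick-off times are made up (the original output did not show any). The odds are taken from the output above.

```python
import pandas as pd

# Hypothetical stand-in for the scraped table: date header rows have no ':'
# in the first column, match rows start with a (made-up) kick-off time.
sample = pd.DataFrame([
    ["Today, 08 Mar - Premier League", None, None, None, None, None],
    ["18:00", "Chelsea - Everton", None, 1.62, 3.93, 6.07],
    ["20:00", "West Ham - Leeds", None, 2.25, 3.61, 3.18],
    [float("nan"), None, None, None, None, None],  # filler row, skipped
    ["10 Mar 2021 - Premier League", None, None, None, None, None],
    ["20:00", "Manchester City - Southampton", None, 1.22, 6.94, 13.75],
])

timeList, dateList, gameList = [], [], []
home_odds, draw_odds, away_odds = [], [], []
date = None

for row in sample.itertuples():
    first = row[1]  # row[0] is the index, row[1] the first column
    if not isinstance(first, str):
        continue
    if ':' not in first:
        # Date header row: keep the part before the first '-'
        date = first.split('-')[0].strip()
        continue
    # Append the value itself; do not assign the result of append(),
    # because list.append returns None.
    timeList.append(first)
    dateList.append(date)
    gameList.append(row[2])
    home_odds.append(row[4])
    draw_odds.append(row[5])
    away_odds.append(row[6])

# Use the list (timeList), not a single variable, for the 'time' column.
result = pd.DataFrame({'date': dateList, 'time': timeList, 'game': gameList,
                       'Home': home_odds, 'Draw': draw_odds, 'Away': away_odds})
print(result)
```

With this change the time column holds one entry per match, the same length as the other lists.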