How to omit some results while web scraping?

I’m new to web scraping and wanted to get a list of hotels and prices from booking.com. The page uses JavaScript and marks some hotels as ‘sold out’ after it loads, so those entries have no price. As a result, the prices in the list I retrieve are shifted up by one or more positions, depending on how many hotels are ‘sold out’.

This is the code I’m using:

from selenium import webdriver
chrome_path = r"C:\Users\shiks\Desktop\chromedriver_win32\chromedriver.exe"
dr = webdriver.Chrome(chrome_path)
dr.get("https://www.booking.com/searchresults.html?label=gen173nr-1FCAEoggJCAlhYSDNYBGhsiAEBmAExwgEKd2luZG93cyAxMMgBDNgBAegBAfgBApICAXmoAgM;sid=83ab0db61cc2291ed9a9875978c46395;checkin_month=8&checkin_monthday=7&checkin_year=2018&checkout_month=8&checkout_monthday=8&checkout_year=2018&class_interval=1&dest_id=20088325&dest_type=city&dtdisc=0&from_sf=1&group_adults=2&group_children=0&inac=0&index_postcard=0&label_click=undef&no_rooms=1&offset=0&postcard=0&raw_dest_type=city&room1=A%2CA&sb_price_type=total&search_selected=1&src=index&src_elem=sb&ss=New%20York%2C%20New%20York%20State%2C%20USA&ss_all=0&ss_raw=new%20yor&ssb=empty&sshis=0&")
hotel = dr.find_elements_by_class_name("sr-hotel__name")
price = dr.find_elements_by_css_selector("strong.price")
for hotel1, price1 in zip(hotel, price):
    print(hotel1.text + " - " + price1.text)

This is the output I get:

The Watson Hotel - ₹ 14,824
citizenM New York Times Square - ₹ 21,984
Studio Plus - Midtown Spacious Apartment - ₹ 21,984
HGU New York - ₹ 22,397
Comfortable 2 bedroom by Wall street - ₹ 31,632
Gansevoort Meatpacking - ₹ 20,261
MOXY NYC Times Square - ₹ 15,782
La Quinta Inn & Suites New York City Central Park - ₹ 19,165
Madison LES Hotel - ₹ 33,079
Courtyard by Marriott New York Manhattan/Central Park - ₹ 23,362
LUMA Hotel - Times Square - ₹ 16,884
The Assemblage John Street - ₹ 29,709
Broadway at Times Square Hotel - ₹ 15,919
Splendid Apartment by Times SQ - ₹ 36,456
Candlewood Suites NYC -Times Square - ₹ 20,399

But the hotel HGU New York was sold out, and the ‘22,397’ shown next to it is actually the price of ‘Comfortable 2 bedroom by Wall street’. How do I fix this?

Answer

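I would keep the same approach, but instead of collecting all names and all prices into two page-wide lists, I would first select each search-result card and then look up the name and the price inside that card. That way a sold-out hotel simply has no price element in its own card and cannot shift the prices of the others. I select the cards with an XPath rather than a CSS selector, because targeting them with CSS didn’t work well for me. I really like the SelectorGadget Chrome extension for getting element XPaths: https://chrome.google.com/webstore/detail/selectorgadget/mhjhnkcfbdhnjickkkdbjoemdmbfginb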
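To see why the prices in the question drift, here is a browser-free sketch. The `cards` data and both function names are made up for illustration and only stand in for the page's result cards; the point is that `zip` pairs items purely by position, so one missing price shifts every later one.

```python
# Made-up stand-ins for result cards; a sold-out card has no price.
cards = [
    {"name": "Hotel A", "price": "100"},
    {"name": "Hotel B", "price": None},  # sold out
    {"name": "Hotel C", "price": "300"},
]


def pair_by_position(cards):
    """Mimic the question: build two page-wide lists, then zip them."""
    names = [card["name"] for card in cards]
    # Only cards that actually have a price contribute to this list.
    prices = [card["price"] for card in cards if card["price"] is not None]
    return list(zip(names, prices))


def pair_by_card(cards):
    """Mimic the fix: read the name and the price from the same card."""
    return [(card["name"], card["price"] or "Unknown") for card in cards]


print(pair_by_position(cards))  # Hotel B wrongly gets Hotel C's price
print(pair_by_card(cards))      # Hotel B is reported as 'Unknown'
```

Note that `zip` also stops at the shorter list, so the positional version silently drops the last hotel as well.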

My full code:

from selenium import webdriver

chrome_path = r"C:\Users\shiks\Desktop\chromedriver_win32\chromedriver.exe"
dr = webdriver.Chrome(chrome_path)

dr.get("https://www.booking.com/searchresults.html?label=gen173nr-1FCAEoggJCAlhYSDNYBGhsiAEBmAExwgEKd2luZG93cyAxMMgBDNgBAegBAfgBApICAXmoAgM;sid=83ab0db61cc2291ed9a9875978c46395;checkin_month=8&checkin_monthday=7&checkin_year=2018&checkout_month=8&checkout_monthday=8&checkout_year=2018&class_interval=1&dest_id=20088325&dest_type=city&dtdisc=0&from_sf=1&group_adults=2&group_children=0&inac=0&index_postcard=0&label_click=undef&no_rooms=1&offset=0&postcard=0&raw_dest_type=city&room1=A%2CA&sb_price_type=total&search_selected=1&src=index&src_elem=sb&ss=New%20York%2C%20New%20York%20State%2C%20USA&ss_all=0&ss_raw=new%20yor&ssb=empty&sshis=0&")

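# Select every search-result card (any element whose class list contains "sr_flex_layout").
search_results = dr.find_elements_by_xpath('//*[contains(concat( " ", @class, " " ), concat( " ", "sr_flex_layout", " " ))]')

for result_card in search_results:
    # Look up the name and the price inside this card only, so they stay paired.
    hotel_name = result_card.find_elements_by_class_name("sr-hotel__name")[0].text
    price_obj = result_card.find_elements_by_css_selector("strong.price")

    # Sold-out hotels have no price element, so the list is empty for them.
    if price_obj:
        price = price_obj[0].text
    else:
        price = 'Unknown'

    print(hotel_name, price)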
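
One caveat: the `find_elements_by_*` helpers were removed in Selenium 4.3, so on a current install the same per-card loop has to go through `find_elements(by, value)`. Below is a rough sketch under that assumption. `extract_rows` and `main` are names I made up, the class names are the ones from the question and may no longer match Booking.com's markup, and `CSS` is just the plain string behind `By.CSS_SELECTOR`, so the extraction function itself needs no Selenium import.

```python
CSS = "css selector"  # same value as selenium.webdriver.common.by.By.CSS_SELECTOR


def extract_rows(driver):
    """Return (hotel name, price) pairs, one per result card.

    `driver` is anything offering Selenium 4's find_elements(by, value).
    """
    rows = []
    for card in driver.find_elements(CSS, ".sr_flex_layout"):
        # Search inside this card only, so name and price stay paired.
        names = card.find_elements(CSS, ".sr-hotel__name")
        prices = card.find_elements(CSS, "strong.price")
        if not names:
            continue  # not a hotel card, skip it
        price = prices[0].text if prices else "Unknown"
        rows.append((names[0].text, price))
    return rows


def main(url):
    # Imported here so extract_rows can be tried without a browser.
    from selenium import webdriver

    dr = webdriver.Chrome()  # assumes chromedriver can be located
    try:
        dr.get(url)
        for name, price in extract_rows(dr):
            print(name, price)
    finally:
        dr.quit()
```

Calling `main()` with the search URL from the question should print one line per card. If the ‘sold out’ labels arrive late, adding a `WebDriverWait` before `extract_rows` may help.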