Unable to scrape a website that uses styled-components JavaScript

My goal

Get basic information from this page using the Scrapy framework, although the question is not specific to this framework. Let’s take the p element inside the h1 node as an example.


All the selections I make with the response I get from my Scrapy requests are failing to return what’s inside the h1 node.

scrapy shell 'url'
>>> 200
>>> []
Fetching the response:

When fetching the response, I see a structure I can’t really understand: all the main HTML markup is condensed and placed just after a bunch of styled-components JavaScript. The file is here (line 1725).

My process

Testing the selector from the dev tools:

After disabling JavaScript in the dev tools and testing my selector, I get the desired result. For example, I get the <p> element inside the <h1> with a simple //h1/p query from the console.

Testing the selector with the Scrapy shell:

Not working; see the issue described above.

Testing the selector with Splash:

I get the exact same result as shown in the issue.


I can’t explain the error, but I can hopefully provide an answer to your problem:

response.xpath('//*[@class="summary__StyledAddress-e4c4ok-6 zWwUF textIntent-title1"]/text()').get()

returns: '12-14 31st Avenue, Unit 2 '

Which is hopefully what you need?

Dr P.