I have been gathering book previews from Google Books and Amazon manually in Firefox, using the developer tools (Inspect Element, then the Network tab with the image filter). It is tedious, so I would like to automate it if I can.
I found a convenient tool written in C (getxbook) with three utilities: one for Google, another for Amazon, and a third for Barnes & Noble. Only the Google utility seems to work.
I’m attempting to understand the request URL for the Amazon images so that I can automate this in Node.js.
Here is the URL for a hi-res book image:
From this page, it is clear that Amazon is using CloudFront signed URLs in order to secure the transaction.
First the pieces that I understand:
- 1405193557 is the ISBN10
- S00R is the page number (page 14 in this case); the next will be S00S (page 15), etc.
- JUMBOXXX gives the hi-res (800×1205) [XXXXXXXX gives the low-res (600×903)]
- Key-Pair-Id: is the same for all pages
- Expires: a time value in seconds (in CloudFront signed URLs this is a Unix timestamp after which the URL stops working)
- Signature: 172 characters, always ends with ‘=’, seems to be base64 encoding
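To check the observations in the list above, the query parameters can be pulled apart in Node.js. This is a minimal sketch, not Amazon's actual format: the host, path, and every parameter value below are placeholders, and `describeSignedUrl` is a hypothetical helper name. It assumes `Expires` is a Unix timestamp in seconds and that `Signature` is ordinary base64.

```javascript
// Sketch: inspect the query parameters of a CloudFront-style signed URL.
// Uses only Node.js globals (URL, Buffer); no external packages.
function describeSignedUrl(signedUrl) {
  const params = new URL(signedUrl).searchParams;
  const signature = params.get('Signature') ?? '';
  const expires = Number(params.get('Expires'));
  return {
    keyPairId: params.get('Key-Pair-Id'),
    // Assumption: Expires is seconds since the Unix epoch.
    expiresAt: new Date(expires * 1000),
    signatureChars: signature.length,
    // Number of raw bytes the base64 text decodes to.
    signatureBytes: Buffer.from(signature, 'base64').length,
  };
}

// Placeholder data only: 128 dummy bytes stand in for a real signature.
const fakeSignature = Buffer.alloc(128, 1).toString('base64');
const exampleUrl =
  'https://images.example.com/placeholder.jpg' +
  '?Expires=1500000000' +
  '&Signature=' + encodeURIComponent(fakeSignature) +
  '&Key-Pair-Id=EXAMPLEKEYID';

const info = describeSignedUrl(exampleUrl);
console.log(info.signatureChars, info.signatureBytes, info.expiresAt.toISOString());
```

A 172-character base64 string with a single trailing ‘=’ decodes to 128 bytes, which would be consistent with a 1024-bit RSA signature. Either way, the signature is produced server-side with a private key, so it cannot be regenerated on the client.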
The piece I still need to understand is:
- Signature: at some point during loading, the library (sitb-library-js.js) fetches all of the image URLs from the server, with the signatures pregenerated
What I’d like is a way to insert code that prints the list of jumboImageUrls once it has been populated. I’m not sure how to proceed with this; would Greasemonkey work?
Any thoughts or experience appreciated.
The page makes a POST request to https://www.amazon.ca/gp/search-inside/service-data. The response contains a list of image URLs, each with its signature.
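Once that response body has been saved (for example, copied from the Network tab), the URLs can be extracted in Node.js without knowing the exact layout. The sketch below is only an illustration: the shape of `sampleResponse` is invented, `collectByKey` is a hypothetical helper, and the only name taken from the page is `jumboImageUrls`. It walks any parsed JSON value and gathers the string values stored under a given key, whether they sit in an array or in an object keyed by page.

```javascript
// Sketch: recursively collect string values stored under `key`
// anywhere inside a parsed JSON value.
function collectByKey(node, key, found = []) {
  if (Array.isArray(node)) {
    for (const item of node) collectByKey(item, key, found);
  } else if (node !== null && typeof node === 'object') {
    for (const [name, value] of Object.entries(node)) {
      if (name !== key) {
        collectByKey(value, key, found);
      } else if (typeof value === 'string') {
        found.push(value);
      } else if (value !== null && typeof value === 'object') {
        // Handles both an array of URLs and an object mapping page -> URL.
        found.push(...Object.values(value).filter((v) => typeof v === 'string'));
      }
    }
  }
  return found;
}

// Invented response shape, for illustration only.
const sampleResponse = {
  status: 'ok',
  data: {
    jumboImageUrls: {
      14: 'https://images.example.com/page14.jpg?Signature=PLACEHOLDER',
      15: 'https://images.example.com/page15.jpg?Signature=PLACEHOLDER',
    },
  },
};

const urls = collectByKey(sampleResponse, 'jumboImageUrls');
for (const url of urls) console.log(url);
```

Since the signed URLs carry an expiry time, they would need to be downloaded soon after the response is captured.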