Unicode characters cannot be decoded

I use browserless.js (headless Chrome) to fetch the HTML of a website, and then use a regular expression to find certain image URLs.

One example is the following:

https://vignette.wikia.nocookie.net/moviepedia/images/8/88/Adrien_Brody.jpg/revision/latest/top-crop/width/360/height/450?cb\u003d20141113231800\u0026path-prefix\u003dde

The URL contains Unicode escape sequences such as \u003d, which should be decoded (in this case to =). I want to include these images in a site, and without decoding, some of them cannot be displayed. The one above is an example: paste the URL into a browser and it returns broken-image.webp.

I have tried several things, but none of them worked:

  • JSON.parse(JSON.stringify(...))
  • String.prototype.normalize()
  • decodeURIComponent

Curiously, the regular expression for “\u003d” (i.e. “\\u003d” in js) does not match that string above, but “u003d” does.

This is all very weird, and my current guess is that browserless does some odd formatting behind the scenes. When I log the URL to the console and copy-paste it somewhere else, every method mentioned above decodes it correctly.

I hope that someone can help me on this.

Answer

Just to mark this one as answered. Thomas replied:

JSON.parse(`"${url}"`)
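For anyone landing here later, this is a minimal sketch of why that works, assuming the fetched string really contains the literal six-character sequences (backslash, u, four hex digits) rather than the decoded characters. Wrapping the string in double quotes turns it into a JSON string literal, so the JSON parser resolves the escapes. The function names decodeViaJson and decodeUnicodeEscapes are hypothetical, chosen for illustration; the second one is an alternative I am suggesting, not part of Thomas's reply.

```javascript
// String.raw keeps the backslashes literal, mimicking what the scraper returns
// (assumption: the fetched HTML contains the escapes as plain text).
const raw = String.raw`https://vignette.wikia.nocookie.net/moviepedia/images/8/88/Adrien_Brody.jpg/revision/latest/top-crop/width/360/height/450?cb\u003d20141113231800\u0026path-prefix\u003dde`;

// Approach from the answer: treat the text as the body of a JSON string literal.
// Throws a SyntaxError if the input contains an unescaped double quote
// or any other character that is invalid inside a JSON string.
function decodeViaJson(s) {
  return JSON.parse(`"${s}"`);
}

// Alternative sketch: replace only \uXXXX sequences and leave everything else alone.
// Note the doubled backslash: /\\u.../ matches a literal backslash followed by "u",
// whereas /\u003d/ would match the "=" character itself.
function decodeUnicodeEscapes(s) {
  return s.replace(/\\u([0-9a-fA-F]{4})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

console.log(decodeViaJson(raw));
console.log(decodeUnicodeEscapes(raw));
```

Both calls should print the URL ending in ?cb=20141113231800&path-prefix=de. As for the attempts listed in the question: JSON.stringify escapes the backslash again, so JSON.parse(JSON.stringify(url)) just round-trips the same text; normalize() only changes Unicode normalization forms; and decodeURIComponent only handles percent-encoding such as %3D.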
