How to find quoted strings except when wrapped in inline code or code block?

I would like to find all occurrences of /".*?"/ except when wrapped in inline code (single backtick) or code block (triple backtick).

This is what I have so far (doesn’t work as expected).

/(?<!(`|```))".*?"(?!\1)/g

In the following markdown snippet, I would like to only find "rabbit hole". Unfortunately, I cannot include an example of a code block (I don’t know how to escape nested triple backticks), but the same logic applies.

When copy/pasting commands that start with `cat << "EOF"`, select all lines at once (from `cat << "EOF"` to `EOF` inclusively) as they are part of the same (single) command

Figuring out how to ignore the above quoted strings is an interesting "rabbit hole".

Answer

With plain JavaScript regex, it is not possible to match only the double-quoted strings that lie outside single or triple backticks. It is possible with PCRE and with the PyPI regex library for Python, because both support the (*SKIP)(*F) construct.

In JavaScript, you can combine a regex with a replacement callback to get what you need:

text.replace(/(?<!`)(`(?:`{2})?)(?:(?!\1).)*?\1|"([^"]*)"/g,
      (x,y,z) => z ? `‘${z}’` : x)

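The first alternative consumes inline code and code blocks, so quotes inside them are never reached by the second alternative. In the callback, x is the whole match, y is Group 1 and z is Group 2. When z is non-empty, the match is a quoted string outside code and its straight quotes are replaced with curly quotes; otherwise x, the whole match value, is returned unchanged.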
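Since the question asks to find the quoted strings rather than replace them, the same pattern can also drive a search. Below is a minimal sketch, assuming a runtime that supports lookbehind and String.prototype.matchAll (for example a recent Node.js). The names findQuotedOutsideCode and sample are hypothetical and introduced here only for illustration; they are not part of the original answer.

```javascript
// Same pattern as in the answer: alternative 1 consumes code spans,
// alternative 2 captures the content of a quoted string in Group 2.
const CODE_OR_QUOTED = /(?<!`)(`(?:`{2})?)(?:(?!\1).)*?\1|"([^"]*)"/g;

// Hypothetical helper: returns every quoted string found outside code spans.
function findQuotedOutsideCode(text) {
  const found = [];
  for (const m of text.matchAll(CODE_OR_QUOTED)) {
    // m[2] is undefined when the code-span alternative matched,
    // so those matches are skipped.
    if (m[2] !== undefined) {
      found.push({ match: m[0], index: m.index });
    }
  }
  return found;
}

// Shortened version of the snippet from the question.
const sample =
  'When copy/pasting commands that start with `cat << "EOF"`, select all lines ' +
  'at once (from `cat << "EOF"` to `EOF` inclusively).\n' +
  'Figuring out how to ignore the above quoted strings is an interesting "rabbit hole".';

console.log(findQuotedOutsideCode(sample).map((r) => r.match));
```

Unlike the callback in the answer, this sketch tests Group 2 against undefined instead of relying on truthiness, so an empty quoted string ("") is reported as well.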

Regex details

  • (?<!`)(`(?:`{2})?)(?:(?!\1).)*?\1:
    • (?<!`) – no backtick immediately to the left is allowed
    • (`(?:`{2})?) – Group 1: a backtick and then an optional double backtick sequence
    • (?:(?!\1).)*? – any char other than a line break char, zero or more occurrences but as few as possible, that does not start the same char sequence that is captured in Group 1
    • \1 – the same char sequence that is captured in Group 1
  • | – or
  • "([^"]*)" – a ", then Group 2: any zero or more chars other than ", and then a ".
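Because . does not match line break chars, the pattern as written only recognizes code whose opening and closing backticks sit on the same line. The sketch below is a variant, not the pattern from the answer: it assumes triple-backtick blocks span several lines and adds the s (dotAll) flag so that . also matches line breaks. The names FENCE, SINGLE_LINE, DOT_ALL, quotedOutsideCode and doc are hypothetical.

```javascript
// Build the fence with repeat() so this example contains no literal triple backticks.
const FENCE = '`'.repeat(3);

// The pattern from the answer, and a variant with the s (dotAll) flag added.
const SINGLE_LINE = /(?<!`)(`(?:`{2})?)(?:(?!\1).)*?\1|"([^"]*)"/g;
const DOT_ALL = /(?<!`)(`(?:`{2})?)(?:(?!\1).)*?\1|"([^"]*)"/gs;

// Hypothetical helper: keep only matches where Group 2 (the quoted string) participated.
function quotedOutsideCode(text, pattern) {
  return [...text.matchAll(pattern)]
    .filter((m) => m[2] !== undefined)
    .map((m) => m[0]);
}

// A fenced block spanning three lines, followed by ordinary prose.
const doc = [FENCE, 'echo "hi"', FENCE, 'say "yes"'].join('\n');

// Without the s flag the fence is not consumed, so the quote inside it leaks through.
console.log(quotedOutsideCode(doc, SINGLE_LINE));

// With the s flag the whole block is consumed and only the prose quote remains.
console.log(quotedOutsideCode(doc, DOT_ALL));
```

One trade-off of this variant: with the s flag an unclosed backtick can swallow text across lines up to the next backtick, so it suits input whose code delimiters are balanced.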
