Extract substring from URLs stored in a pandas column

I have a pandas column containing a series of URLs, and I'd like to extract a substring from each URL. MRE code below.

import pandas as pd

s = pd.Series(['https://url-location/img/xxxyyy_image1.png'])

s.apply(lambda x: x[x.find("/")+1:x.find("_")])

I'd like to extract xxxyyy from each URL and store the results in a new column.

Answer

You can use Series.str.extract:

>>> s.str.extract(r'.*/([^_]+)')
        0
0  xxxyyy

Regex details:

  • .* – zero or more chars other than line break chars, as many as possible (greedy, so the match runs up to the last slash that still lets the rest of the pattern match)
  • / – a slash
  • ([^_]+) – Capturing group 1 (the value captured into this group will be the actual return value of Series.str.extract): one or more chars other than _ char.
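
To store the result in a new column, as the question asks, something like the following sketch should work. The DataFrame and the column names url and code are assumptions made for illustration, not names from the question. Passing expand=False makes str.extract return a Series rather than a one-column DataFrame, which is convenient for assignment.

```python
import pandas as pd

# Hypothetical DataFrame; the column names "url" and "code" are illustrative only.
df = pd.DataFrame({"url": ["https://url-location/img/xxxyyy_image1.png"]})

# Capture the text between the last slash and the first underscore after it.
# expand=False returns a Series because the pattern has a single capturing group.
df["code"] = df["url"].str.extract(r".*/([^_]+)", expand=False)

print(df["code"].tolist())  # ['xxxyyy']
```

Rows where the pattern does not match come back as NaN. Note also that a file name without an underscore would be captured whole, extension included, so the pattern may need tightening if such URLs can occur.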