Regular expression to find all the image urls in a string

Question:

I am trying to construct a regular expression that finds all image URLs in a string.
An image URL can be either an absolute or a relative path.

All these should be valid matches:

 ../example/test.png
 https://www.test.com/abc.jpg
 images/test.webp

For example, if we define

inputString="img src=https://www.test.com/abc.jpg background:../example/test.png <div> images/test.webp image.pnghello"

then we should find these 3 matches:

https://www.test.com/abc.jpg
../example/test.png
images/test.webp

I am currently doing the following (I am using Python). It only finds absolute paths, finds only some of the images, and sometimes returns bad matches: a string that contains an image URL but also includes a lot of text that comes after it.

imageurls = re.findall(r'(?:"|\')((?:https?://|/)\S+\.(?:jpg|png|gif|jpeg|webp))(?:"|\')', inputString)
Asked By: AJ222


Answers:

You can try:

(?i)https?[^<>\s'"=]+(?:jpg|png|webp)\b|[^:<>\s'"=]+(?:jpg|png|webp)\b

Regex demo.


import re

s = '''img src=https://www.test.com/abc.jpg background:../example/test.png <div> images/test.webp image.pnghellobackground-image: url('../images/pics/mobile/img.JPG')'''
pat = re.compile(r'(?i)https?[^<>\s\'"=]+(?:jpg|png|webp)\b|[^:<>\s\'"=]+(?:jpg|png|webp)\b')

for m in pat.findall(s):
    print(m)

Prints:

https://www.test.com/abc.jpg
../example/test.png
images/test.webp
../images/pics/mobile/img.JPG
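
To make the two alternatives easier to read, here is a minimal sketch of the same expression written with re.VERBOSE and wrapped in a helper. The names IMAGE_URL and find_image_urls are only illustrative, and the sample is the input string from the question.

```python
import re

# Same two alternatives as the pattern above, spelled out with re.VERBOSE.
# IMAGE_URL and find_image_urls are illustrative names, not part of any library.
IMAGE_URL = re.compile(
    r"""
    https?[^<>\s'"=]+(?:jpg|png|webp)\b   # absolute URL: a colon is allowed after the scheme
    |
    [^:<>\s'"=]+(?:jpg|png|webp)\b        # relative path: colons are excluded, so a prefix like background: is skipped
    """,
    re.IGNORECASE | re.VERBOSE,
)


def find_image_urls(text):
    """Return every substring of text that looks like an image URL or path."""
    return IMAGE_URL.findall(text)


sample = (
    "img src=https://www.test.com/abc.jpg background:../example/test.png "
    "<div> images/test.webp image.pnghello"
)
print(find_image_urls(sample))
# ['https://www.test.com/abc.jpg', '../example/test.png', 'images/test.webp']
```

The trailing \b is what rejects image.pnghello: there is no word boundary between the "g" of png and the "h" that follows it.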
Answered By: Andrej Kesely

What do you think of this:

re.findall(r'(?=:[^S])?(?:https?://)?[./]*[\w/.]+\.(?:jpg|png|gif|jpeg|webp)', inputString)

With:

"img src=http://another.org/hola.gif https://www.test.com/abc.jpg background:../example/test.png <div> images/test.webp image.pnghello"

Gives:

 ['http://another.org/hola.gif',
 'https://www.test.com/abc.jpg',
 '../example/test.png',
 'images/test.webp',
 'image.png']

This probably needs more test samples 🙂
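
As a starting point for those extra samples, here is a small sketch of a check harness. It makes several assumptions that are not part of the expression above: a trailing \b and re.IGNORECASE are added, the optional leading lookahead is left out, and the sample strings and the name check_samples are hypothetical.

```python
import re

# Assumption: a variant of this answer's expression with a trailing \b and
# IGNORECASE added, and without the optional leading lookahead.
PATTERN = re.compile(
    r'(?:https?://)?[./]*[\w/.]+\.(?:jpg|png|gif|jpeg|webp)\b',
    re.IGNORECASE,
)

# Hypothetical samples: input text -> the matches we would like to get back.
SAMPLES = {
    "src=http://another.org/hola.gif": ["http://another.org/hola.gif"],
    "background:../example/test.png": ["../example/test.png"],
    "<div> images/test.webp": ["images/test.webp"],
    "image.pnghello": [],
    "url('../images/pics/mobile/img.JPG')": ["../images/pics/mobile/img.JPG"],
}


def check_samples(pattern, samples):
    """Return the (text, expected, found) triples where the pattern disagrees."""
    failures = []
    for text, expected in samples.items():
        found = pattern.findall(text)
        if found != expected:
            failures.append((text, expected, found))
    return failures


# Report only the samples whose result differs from the expectation.
for text, expected, found in check_samples(PATTERN, SAMPLES):
    print(f"{text!r}: expected {expected}, found {found}")
```

With the trailing \b in place, image.pnghello no longer yields image.png, which is what the question asks for. New edge cases can be added to SAMPLES as they turn up.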

Answered By: cidonia