How can I match the start and end in Python's regex?

Question

I have a string and I want to match something at the start and end with a single search pattern. How can this be done?

Let’s say we have a string like:

 string = "ftp://www.somewhere.com/over/the/rainbow/image.jpg"

I want to do something like this:

 re.search("^ftp:// & .jpg$" ,string)

Obviously, it’s incorrect, but I hope it gets my point across. Is this possible?

Asked By: user427390

Answer 1

How about not using a regular expression at all?

if string.startswith("ftp://") and string.endswith(".jpg"):

Don’t you think this reads nicer?

You can also support multiple options for start and end:

if (string.startswith(("ftp://", "http://")) and 
    string.endswith((".jpg", ".png"))):
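
For reference, here is a minimal runnable sketch of that check. The helper name is_remote_image and the second sample URL are made up for illustration; they are not part of the question.

```python
def is_remote_image(url):
    """Return True if url starts with an allowed scheme and ends with an allowed extension."""
    # str.startswith and str.endswith both accept a tuple of alternatives.
    return url.startswith(("ftp://", "http://")) and url.endswith((".jpg", ".png"))


print(is_remote_image("ftp://www.somewhere.com/over/the/rainbow/image.jpg"))  # True
print(is_remote_image("ftp://www.somewhere.com/over/the/rainbow/image.gif"))  # False
```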

Answered By: Sven Marnach

Answer 2

Try

 re.search(r'^ftp://.*\.jpg$', string)

if you want a regular expression search. Note that you have to escape the period because it has a special meaning in regular expressions.
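
To see why the escape matters, here is a small sketch comparing the escaped and unescaped forms. The second test string (with an underscore in place of the period) is a hypothetical input chosen only to show the difference.

```python
import re

escaped = re.compile(r"^ftp://.*\.jpg$")    # \. matches a literal period only
unescaped = re.compile(r"^ftp://.*.jpg$")   # a bare . matches any character

good = "ftp://www.somewhere.com/over/the/rainbow/image.jpg"
odd = "ftp://www.somewhere.com/over/the/rainbow/image_jpg"  # no period before "jpg"

print(bool(escaped.search(good)), bool(unescaped.search(good)))  # True True
print(bool(escaped.search(odd)), bool(unescaped.search(odd)))    # False True
```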

Answered By: Howard

Answer 3

Don’t be greedy, use ^ftp://(.*?)\.jpg$
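
A quick sketch of what the lazy quantifier changes, using a made-up input that contains ".jpg" twice. As far as I can tell, with both ^ and $ in the pattern the greedy and lazy forms capture the same text; the difference only shows up once the trailing $ is dropped.

```python
import re

s = "ftp://host/a.jpg/b.jpg"  # hypothetical input containing ".jpg" twice

# With both ^ and $ present, the greedy and lazy forms capture the same text.
greedy_anchored = re.search(r"^ftp://(.*)\.jpg$", s).group(1)
lazy_anchored = re.search(r"^ftp://(.*?)\.jpg$", s).group(1)
print(greedy_anchored, lazy_anchored)  # host/a.jpg/b host/a.jpg/b

# Without the trailing $, the lazy form stops at the first ".jpg".
greedy_open = re.search(r"^ftp://(.*)\.jpg", s).group(1)
lazy_open = re.search(r"^ftp://(.*?)\.jpg", s).group(1)
print(greedy_open, lazy_open)  # host/a.jpg/b host/a
```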

Answered By: JKirchartz

Answer 4

re.match will match the string at the beginning, in contrast to re.search:

re.match(r'(ftp|http)://.*\.(jpg|png)$', s)

A few things to note here:

r'' is used for the string literal so that backslashes inside the regex need no extra escaping
string is the name of a standard module, so I chose s as the variable name
If you use a regex more than once, you can use r = re.compile(...) to build the state machine once and then call r.match(s) afterwards to match the strings
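
Here is a rough sketch of the compile-once approach that also shows how match and search differ. The name URL_RE and the second and third sample strings are illustrative assumptions.

```python
import re

# Compile once, reuse for many strings. The name URL_RE is illustrative.
URL_RE = re.compile(r"(ftp|http)://.*\.(jpg|png)$")

urls = [
    "ftp://www.somewhere.com/over/the/rainbow/image.jpg",
    "http://www.somewhere.com/logo.png",
    "see ftp://www.somewhere.com/image.jpg",  # scheme is not at the start
]

# match() only succeeds at the start of the string; search() scans the whole string.
results = [(bool(URL_RE.match(u)), bool(URL_RE.search(u))) for u in urls]
print(results)  # [(True, True), (True, True), (False, True)]
```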

If you want, you can also let urlparse parse the URL for you (it lives in urllib.parse on Python 3 and in the urlparse module on Python 2), though you still need to extract the extension:

>>> allowed_schemes = ('http', 'ftp')
>>> allowed_exts = ('png', 'jpg')
>>> from urllib.parse import urlparse
>>> url = urlparse("ftp://www.somewhere.com/over/the/rainbow/image.jpg")
>>> url.scheme in allowed_schemes
True
>>> url.path.rsplit('.', 1)[1] in allowed_exts
True
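
Note that rsplit('.', 1)[1] raises an IndexError when the path contains no period at all. A possible variant, assuming Python 3, wraps the check in a function and uses posixpath.splitext instead; the function name is_allowed, the uppercase constants, and the case-insensitive comparison are my own additions, not part of the answer above.

```python
from posixpath import splitext
from urllib.parse import urlparse

ALLOWED_SCHEMES = ("http", "ftp")
ALLOWED_EXTS = (".png", ".jpg")


def is_allowed(url):
    """Check the scheme and the file extension of the URL path."""
    parts = urlparse(url)
    # splitext returns (root, ext); ext is "" when the path has no extension.
    ext = splitext(parts.path)[1].lower()  # lower() is an assumption: treat .JPG like .jpg
    return parts.scheme in ALLOWED_SCHEMES and ext in ALLOWED_EXTS


print(is_allowed("ftp://www.somewhere.com/over/the/rainbow/image.jpg"))  # True
print(is_allowed("ftp://www.somewhere.com/over/the/rainbow"))            # False
```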

Answered By: Niklas B.

Answer 5

import re

s = "ftp://www.somewhere.com/over/the/rainbow/image.jpg"
print(re.search(r"^ftp://.*\.jpg$", s).group(0))

Answered By: Roman Bataev

Answer 6

I wanted to extract all the numbers from a string, both ints and floats.

This works for me:

import re

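s = ('[11-09 22:55:41] [INFO ]  [  4560] source_loss: 0.717, target_loss: 1.279, '
     'transfer_loss:  0.001, total_loss:  0.718')

# \d matches a digit and \. a literal period; n avoids shadowing s inside the comprehension
print([float(n) if '.' in n else int(n) for n in re.findall(r'-?\d+\.?\d*', s)])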

refs: https://www.tutorialspoint.com/How-to-extract-numbers-from-a-string-in-Python

Answered By: Colin Wang

Answer 7

I had a similar issue and here’s what I came up with.

If you are looking for a substring within a string, you can use the str.find() method to locate where the substring starts and where it ends.

You could, in theory, reuse one variable name for all the variables ending in _text in my code, and another for those labeled substring_start or substring_end.
That would be slightly more memory-efficient, but I have named them differently in the hope of making this as clear as possible.

Let x = a string that represents the start of the substring you’re searching for, and let y = the same for the end of that substring.

full_text=yourstring

substring_start=full_text.find(x)  
# This will return the index of where your starting indicator first appears in your full string

backend_text=full_text[substring_start:]
# This truncates your string to start only where you indicated

substring_end=backend_text.find(y)
# This will find the index (relative to backend_text) where your substring should end

final_text=backend_text[0:substring_end]

Here’s a working example. Let’s say your string is this whole mess:

<article class="product_pod">
<div class="image_container">
<a href="a-light-in-the-attic_1000/index.html"><img alt="A Light in the Attic" class="thumbnail" src="../media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg"/></a>
</div>
<p class="star-rating Three">
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
</p>
<h3><a href="a-light-in-the-attic_1000/index.html" title="A Light in the Attic">A Light in the ...</a></h3>
<div class="product_price">
<p class="price_color">£51.77</p>
<p class="instock availability">
<i class="icon-ok"></i>
    
        In stock
    
</p>
<form>
<button class="btn btn-primary btn-block" data-loading-text="Adding..." type="submit">Add to basket</button>
</form>
</div>
</article>

The following code

title_start=full_text.find("title")
backend_text=full_text[title_start:]
title_end=backend_text.find('">')
final_text=backend_text[0:title_end]

would return:

'title="A Light in the Attic'
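
If you need this more than once, the steps could be wrapped in a small helper along these lines. The name extract_between is hypothetical, and unlike the code above it searches for the end marker only after the start marker and returns None when either marker is missing (str.find returns -1 in that case, which would otherwise give a misleading slice).

```python
def extract_between(text, start, end):
    """Return text from the first `start` up to (not including) the next `end`, or None."""
    begin = text.find(start)
    if begin == -1:
        return None
    # Look for the end marker only after the start marker.
    stop = text.find(end, begin + len(start))
    if stop == -1:
        return None
    return text[begin:stop]


html = ('<h3><a href="a-light-in-the-attic_1000/index.html" '
        'title="A Light in the Attic">A Light in the ...</a></h3>')
print(extract_between(html, "title", '">'))  # title="A Light in the Attic
```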

Answered By: BadgerTaco
