How to match to capture group 1 with regex
Question:
My goal is to capture the date from the following string:
<span class="ui_bubble_rating bubble_50"></span><span class="ratingDate relativeDate" title="November 9, 2017">Reviewed 2 days ago </span><a class="viaMobile" href="/apps" target="_blank" onclick="ta.util.cookie.setPIDCookie(24487)"><span class="ui_icon mobile-phone"></span>via mobile </a>
To do this I’m using the regex: title="(.*?)"
Which returns Match (group 0): title="November 9, 2017"
Group 1: November 9, 2017
I need the match returned by the regex to be just the date, which is currently group 1. Is there a simple way to do this? I am new to regex and couldn’t find direction on this online.
Note: I’m not writing regex for the structure of a date because some strings have multiple dates and I only want the date in title. Thanks!
Answers:
You can use re.findall:
import re
s = """
<span class="ui_bubble_rating bubble_50"></span><span class="ratingDate relativeDate" title="November 9, 2017">Reviewed 2 days ago </span><a class="viaMobile" href="/apps" target="_blank" onclick="ta.util.cookie.setPIDCookie(24487)"><span class="ui_icon mobile-phone"></span>via mobile </a>
"""
date = re.findall('title="(.*?)"', s)[0]
Output:
'November 9, 2017'
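If only the first title is needed, re.search with group(1) is an alternative to indexing the list from re.findall. This is a minimal sketch, assuming a shortened version of the string from the question; the variable names html, match and date are only illustrative.

```python
import re

# Shortened sample taken from the string in the question.
html = ('<span class="ratingDate relativeDate" '
        'title="November 9, 2017">Reviewed 2 days ago </span>')

# group(0) is the whole match, group(1) is only the part in parentheses.
match = re.search(r'title="(.*?)"', html)

# re.search returns None when nothing matches, so guard before reading the group.
date = match.group(1) if match else None
print(date)
```

Unlike re.findall(...)[0], this does not raise an IndexError when the string has no title attribute; date is simply None.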
You can use positive lookbehind / lookahead instead of capture groups
(?<=title=").+?(?=")
This ensures the match is preceded by title=" and followed by ", without including either of them in the match itself
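The lookaround pattern above can be tried in Python as follows. This is a sketch, assuming a shortened version of the string from the question and illustrative variable names; it works with the built-in re module because the lookbehind title=" has a fixed width.

```python
import re

# Shortened sample taken from the string in the question.
html = ('<span class="ratingDate relativeDate" '
        'title="November 9, 2017">Reviewed 2 days ago </span>')

# The lookbehind and lookahead assert the surrounding text without consuming it,
# so the whole match (group 0) is already just the date.
match = re.search(r'(?<=title=").+?(?=")', html)
date = match.group(0) if match else None
print(date)
```

Here no capture group is needed: the full match is the value the question asks for.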
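You can also use
title="\K.*?(?=")
This matches the value between title=" and the closing ". The \K discards everything matched before it from the reported match, so it only works in engines that support it (PCRE, for example); Python's built-in re module does not.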