Iterate through a string and append static values to a list for every occurrence of substring (Python)

Question:

I’m stuck on some basic Python. I have a very long HTML string that looks something like this:

<relative-time class="no-wrap" datetime="2023-03-07T02:38:29Z" title="Mar 6, 2023, 7:38 PM MST">Mar 6, 2023</relative-time>, <relative-time data-view-component="true" datetime="2023-03-06T10:25:38-07:00

I want to iterate through it and, at every occurrence of the substring "datetime", store the date that follows.

My current implementation uses two lists. The first stores the indexes that .find() returns for each "datetime" occurrence:

datetime_indexes = list(get_all_updates(string, "datetime"))
print(datetime_indexes)
#output 36, 168 etc
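The get_all_updates helper isn't shown in the question. Assuming it yields the start index of each occurrence (which is what the #output comment suggests), a hypothetical version built on the optional start argument of str.find might look like this:

```python
def get_all_updates(text, sub):
    """Hypothetical reconstruction: yield the start index of every occurrence of sub in text."""
    start = text.find(sub)
    while start != -1:
        yield start
        # Resume searching just past this match so the same hit isn't reported twice
        start = text.find(sub, start + len(sub))


print(list(get_all_updates("a datetime b datetime", "datetime")))  # [2, 13]
```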

Next, I loop through the string and, whenever the index I’m currently on matches a value stored in my index list, append the datetime value to a new list:

count = 0
all_datetimes = []
for i in string:
    if string.index(i) is datetime_indexes[count]:
        all_datetimes.append(string[string.index(i) + 10:(string.index(i) + 10 + 21)])
        count = count + 1
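Two things in this loop get in the way. string.index(i) returns the position of the first occurrence of the character i, not the position the loop is currently at, so once the first "datetime" has matched, the later stored indexes are never reached. And `is` compares object identity rather than value; it only happens to work for small integers that CPython caches, so `==` is the right test. A minimal sketch that keeps the same idea but tracks the real position with enumerate (the shortened sample string, the 10-character offset past `datetime="`, and the 19-character width matching the desired output are assumptions taken from the question):

```python
import re

html = ('<relative-time class="no-wrap" datetime="2023-03-07T02:38:29Z" '
        'title="Mar 6, 2023, 7:38 PM MST">Mar 6, 2023</relative-time>, '
        '<relative-time data-view-component="true" datetime="2023-03-06T10:25:38-07:00')

# Start index of every "datetime" occurrence (stand-in for the question's get_all_updates)
datetime_indexes = [m.start() for m in re.finditer("datetime", html)]

OFFSET = len('datetime="')  # 10 characters from the "d" to the first digit of the date
WIDTH = 19                  # length of "YYYY-MM-DDTHH:MM:SS"

wanted = set(datetime_indexes)
all_datetimes = []
for pos, _char in enumerate(html):  # enumerate yields the real position of each character
    if pos in wanted:               # membership test compares values, not identity
        all_datetimes.append(html[pos + OFFSET : pos + OFFSET + WIDTH])

print(all_datetimes)  # ['2023-03-07T02:38:29', '2023-03-06T10:25:38']
```

Since the indexes are already known, scanning every character isn't strictly needed: `[html[i + OFFSET : i + OFFSET + WIDTH] for i in datetime_indexes]` produces the same list.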

Currently, it outputs only the first "datetime" value that I’m looking for:

#output
#2023-03-07T02:38:29Z

The desired result would be all of the datetime values:

#desired output
#2023-03-07T02:38:29
#2023-03-06T10:25:38
Asked By: TripleCute


Answers:

This is what Beautiful Soup was made to do:

python -m pip install beautifulsoup4

Then you can do:

from bs4 import BeautifulSoup

html_text = """
<relative-time class="no-wrap" datetime="2023-03-07T02:38:29Z" title="Mar 6, 2023, 7:38 PM MST">Mar 6, 2023</relative-time>,
<relative-time data-view-component="true" datetime="2023-03-06T10:25:38-07:00">asdf</relative-time>
"""

soup = BeautifulSoup(html_text, "html.parser")
# attrs={"datetime": True} matches any tag that has a datetime attribute
date_list = [tag["datetime"] for tag in soup.find_all(attrs={"datetime": True})]
print(date_list)

That will give you:

['2023-03-07T02:38:29Z', '2023-03-06T10:25:38-07:00']
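These are the raw attribute values, so they still carry the Z (UTC) marker and the -07:00 offset that the desired output in the question leaves out. If you want that exact form, one option is to parse each value with the standard library's datetime.fromisoformat and reformat it. A sketch, assuming the list above as input; the "Z" to "+00:00" substitution is there because fromisoformat only accepts a trailing Z from Python 3.11 on:

```python
from datetime import datetime

date_list = ['2023-03-07T02:38:29Z', '2023-03-06T10:25:38-07:00']


def to_naive_string(value):
    # Python < 3.11 rejects a trailing "Z", so spell out the UTC offset explicitly
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # The parsed value is timezone-aware; this format string drops the offset
    # but keeps the wall-clock time exactly as written in the attribute
    return parsed.strftime("%Y-%m-%dT%H:%M:%S")


print([to_naive_string(v) for v in date_list])
# ['2023-03-07T02:38:29', '2023-03-06T10:25:38']
```

If only the text matters, `value[:19]` gives the same strings; parsing is useful when you also want to compare or convert the times.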

Since you were already using BeautifulSoup, I think the key part here is replacing find_all("relative-time") with a search on attrs={"datetime": True}, which returns every tag that has a datetime attribute.
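If adding a third-party dependency isn't an option, the standard library's html.parser.HTMLParser can collect the same attribute values. A minimal sketch (the DatetimeCollector class name is made up for this example): handle_starttag is called for every opening tag, including custom elements like relative-time, with the attributes passed as a list of (name, value) pairs.

```python
from html.parser import HTMLParser


class DatetimeCollector(HTMLParser):
    """Hypothetical helper: collects the value of every datetime attribute, on any tag."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # value is None for a bare attribute with no "=..." part
            if name == "datetime" and value is not None:
                self.values.append(value)


html_text = """
<relative-time class="no-wrap" datetime="2023-03-07T02:38:29Z" title="Mar 6, 2023, 7:38 PM MST">Mar 6, 2023</relative-time>,
<relative-time data-view-component="true" datetime="2023-03-06T10:25:38-07:00">asdf</relative-time>
"""

parser = DatetimeCollector()
parser.feed(html_text)
parser.close()
print(parser.values)  # ['2023-03-07T02:38:29Z', '2023-03-06T10:25:38-07:00']
```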

Answered By: JonSG