Use BeautifulSoup to get previous tag data

Question:

I’m trying to create a list of dates, each paired with the link associated with that date.

Currently, I have code that looks like this:

date_list = [tag["datetime"] for tag in new_soup.findAll(attrs={"datetime" : True})]

This will get me all of the values associated with "datetime" in the new_soup html.

Now, for every date I add to this list, I also want to add the link associated with it, which is in the preceding tag:

HTML example:
<a class="Link--secondary ml-2" 
data-pjax="#repo-content-pjax-container" 
data-turbo-frame="repo-content-turbo-frame" 
href="the link right here">

<relative-time class="no-wrap" 
    datetime="2023-03-07T02:38:29Z" 
    title="Mar 6, 2023, 7:38 PM MST">Mar 6, 2023
</relative-time>
Asked By: TripleCute

||

Answers:

You can use tag.find_previous(), which searches backward through the document from the given tag and returns the first match:

from bs4 import BeautifulSoup


html_doc = '''
<a class="Link--secondary ml-2"
data-pjax="#repo-content-pjax-container"
data-turbo-frame="repo-content-turbo-frame"
href="the link right here">

</a>

<relative-time class="no-wrap"
    datetime="2023-03-07T02:38:29Z"
    title="Mar 6, 2023, 7:38 PM MST">Mar 6, 2023
</relative-time>'''

soup = BeautifulSoup(html_doc, 'html.parser')

# Pair each datetime value with the href of the nearest preceding <a> tag.
date_list = [
    (tag["datetime"], tag.find_previous("a")["href"])
    for tag in soup.find_all(attrs={"datetime": True})
]

print(date_list)

Prints:

[('2023-03-07T02:38:29Z', 'the link right here')]
Answered By: Andrej Kesely
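
One caveat with the approach above: find_previous() returns None when no matching tag comes before the element, so indexing ['href'] directly would raise a TypeError. Here is a minimal sketch of a more defensive variant; it is not part of the original answer. The helper name dates_with_links and the sample HTML are made up for illustration, and I'm assuming you'd want None recorded when no link is found.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the first <relative-time> has no <a> before it.
html_doc = '''
<relative-time datetime="2023-01-01T00:00:00Z">Jan 1, 2023</relative-time>
<a href="/commit/abc">first</a>
<relative-time datetime="2023-03-07T02:38:29Z">Mar 6, 2023</relative-time>
'''


def dates_with_links(soup):
    """Return (datetime, href) pairs; href is None when no earlier link exists."""
    pairs = []
    for tag in soup.find_all(attrs={"datetime": True}):
        # href=True skips <a> tags that have no href attribute at all.
        link = tag.find_previous("a", href=True)
        pairs.append((tag["datetime"], link["href"] if link else None))
    return pairs


soup = BeautifulSoup(html_doc, "html.parser")
result = dates_with_links(soup)
print(result)
```

If you would rather drop dates that have no link, you could filter out the pairs whose second item is None afterwards.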