HTML table to JSON dict with BeautifulSoup in Python

Question:

I have the following HTML data:

<table>
  <tbody>
    <tr>
      <th class="left" colspan="7">
        <p>Some text</p>
      </th>
    </tr>
    <tr>
      <td class="left print-wide" colspan="2">  </td>
      <td class="print-wide" colspan="13">some-text</td>
    </tr>
    <tr>
      <td class="left"><br /></td>
      <td><strong>ABC   </strong></td>
      <td><strong>≤25%</strong></td>
      <td><strong>≤75%</strong></td>
      <td><strong>≤100%</strong></td>
    </tr>
    <tr>
      <td class="left">1 month</td>
      <td>3,93%</td>
      <td>4,05%</td>
      <td>4,09%</td>
      <td>4,18%</td>
    </tr>
    <tr>
      <td class="left">3 months</td>
      <td>4,12%</td>
      <td>4,24%</td>
      <td>4,28%</td>
      <td>4,37%</td>
    </tr>
    <tr>
      <td class="left">6 months</td>
      <td>4,23%</td>
      <td>4,35%</td>
      <td>4,39%</td>
      <td>4,48%</td>
    </tr>
  </tbody>
</table>

I want to convert that to:

{
    "1 month": {
        "ABC": "3,93%",
        "≤25%": "4,05%",
        "≤75%": "4,09%",
        "≤100%": "4,18%"
    },
    "3 months": {
        "ABC": "4,12%",
        "≤25%": "4,24%",
        "≤75%": "4,28%",
        "≤100%": "4,37%"
    },
    "6 months": {
        "ABC": "4,23%",
        "≤25%": "4,35%",
        "≤75%": "4,39%",
        "≤100%": "4,48%"
    }
}

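I wrote the following, which creates a list of the row labels (the months):

from bs4 import BeautifulSoup

# body holds the HTML shown above
soup = BeautifulSoup(body, "html.parser")
table = soup.find("table")
# Collect the text of every <td> that has the class "left"
headers = [header.text for header in table.find_all('td', class_="left")]
# Drop the first two matches, which are the empty spacer cells
del headers[:2]
print(headers)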

Prints out:

['1 month', '3 months', '6 months']

Now I need to iterate over that list and build the dictionary shown above, but I am stuck. I have tried several things without success. Can anyone point me in the right direction?

Asked By: C-nan


Answers:

Try:

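# Column names come from the <strong> cells of the header row
headers = [s.get_text(strip=True) for s in soup.select("strong")]

out = {}
# :-soup-contains(month) keeps only the rows whose text contains "month"
for tr in soup.select("tr:-soup-contains(month)"):
    # The first cell is the row label; the remaining cells pair up with the column names
    out[tr.td.text] = {k: v.text for k, v in zip(headers, tr.select("td")[1:])}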

print(out)

Prints (reformatted here for readability):

{
    "1 month": {
        "ABC": "3,93%",
        "≤25%": "4,05%",
        "≤75%": "4,09%",
        "≤100%": "4,18%",
    },
    "3 months": {
        "ABC": "4,12%",
        "≤25%": "4,24%",
        "≤75%": "4,28%",
        "≤100%": "4,37%",
    },
    "6 months": {
        "ABC": "4,23%",
        "≤25%": "4,35%",
        "≤75%": "4,39%",
        "≤100%": "4,48%",
    },
}
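
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. It assumes a trimmed copy of the table from the question (the `html` string below is only illustrative) and a soupsieve version recent enough to support `:-soup-contains`. It uses `json.dumps` rather than a plain `print`, since the question asks for JSON-style output.

```python
import json

from bs4 import BeautifulSoup

# Trimmed copy of the table from the question (assumption: the real page has the same shape)
html = """
<table>
  <tbody>
    <tr><th class="left" colspan="7"><p>Some text</p></th></tr>
    <tr>
      <td class="left"><br /></td>
      <td><strong>ABC   </strong></td>
      <td><strong>≤25%</strong></td>
      <td><strong>≤75%</strong></td>
      <td><strong>≤100%</strong></td>
    </tr>
    <tr><td class="left">1 month</td><td>3,93%</td><td>4,05%</td><td>4,09%</td><td>4,18%</td></tr>
    <tr><td class="left">3 months</td><td>4,12%</td><td>4,24%</td><td>4,28%</td><td>4,37%</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Column names: the text of every <strong> element, with whitespace stripped
headers = [s.get_text(strip=True) for s in soup.select("strong")]

out = {}
# Only rows whose text contains "month" are data rows
for tr in soup.select("tr:-soup-contains(month)"):
    cells = tr.select("td")
    # cells[0] is the row label, cells[1:] line up with the column names
    out[cells[0].get_text(strip=True)] = {
        k: v.get_text(strip=True) for k, v in zip(headers, cells[1:])
    }

# ensure_ascii=False keeps characters such as "≤" readable instead of escaping them
print(json.dumps(out, indent=4, ensure_ascii=False))
```

Stripping whitespace from the labels as well as the headers is a small precaution in case the real page pads its cells.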
Answered By: Andrej Kesely
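
If the row labels do not all contain the word "month", a possible variation is to rely on the table layout instead of the text. The sketch below is not part of the answer above. `table_to_dict` is a hypothetical helper name, and it assumes the header row is the first row containing `<strong>` cells and that every data row follows it.

```python
from bs4 import BeautifulSoup


def table_to_dict(html: str) -> dict:
    """Hypothetical helper: map each row label to a {column name: value} dict."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find("table").find_all("tr")

    # Assumption: the header row is the first row that contains <strong> cells.
    # next() raises StopIteration if no such row exists.
    header_idx = next(i for i, tr in enumerate(rows) if tr.find("strong"))
    headers = [s.get_text(strip=True) for s in rows[header_idx].find_all("strong")]

    result = {}
    for tr in rows[header_idx + 1:]:
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # Skip rows that do not have one label plus one value per column
        if len(cells) != len(headers) + 1:
            continue
        result[cells[0]] = dict(zip(headers, cells[1:]))
    return result


# Illustrative input only, shaped like the table in the question
sample = """
<table>
  <tr><th colspan="7"><p>Some text</p></th></tr>
  <tr><td class="left"><br /></td><td><strong>ABC   </strong></td><td><strong>≤25%</strong></td></tr>
  <tr><td class="left">1 month</td><td>3,93%</td><td>4,05%</td></tr>
  <tr><td class="left">6 months</td><td>4,23%</td><td>4,35%</td></tr>
</table>
"""

data = table_to_dict(sample)
print(data)
```

This version trades the text filter for a positional rule, so it would also pick up a row labelled "1 year", but it breaks if the page places other rows after the header.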