Converting an HTML table to a JSON dict with BeautifulSoup in Python
Question:
I have the following HTML data:
<table>
<tbody>
<tr>
<th class="left" colspan="7">
<p>Some text</p>
</th>
</tr>
<tr>
<td class="left print-wide" colspan="2"> </td>
<td class="print-wide" colspan="13">some-text</td>
</tr>
<tr>
<td class="left"><br /></td>
<td><strong>ABC </strong></td>
<td><strong>≤25%</strong></td>
<td><strong>≤75%</strong></td>
<td><strong>≤100%</strong></td>
</tr>
<tr>
<td class="left">1 month</td>
<td>3,93%</td>
<td>4,05%</td>
<td>4,09%</td>
<td>4,18%</td>
</tr>
<tr>
<td class="left">3 months</td>
<td>4,12%</td>
<td>4,24%</td>
<td>4,28%</td>
<td>4,37%</td>
</tr>
<tr>
<td class="left">6 months</td>
<td>4,23%</td>
<td>4,35%</td>
<td>4,39%</td>
<td>4,48%</td>
</tr>
</tbody>
</table>
I want to convert that to:
{
"1 month": {
"ABC": "3,93%",
"≤25%": "4,05%",
"≤75%": "4,09%",
"≤100%": "4,18%"
},
"3 months": {
"ABC": "4,12%",
"≤25%": "4,24%",
"≤75%": "4,28%",
"≤100%": "4,37%"
},
"6 months": {
"ABC": "4,23%",
"≤25%": "4,35%",
"≤75%": "4,39%",
"≤100%": "4,48%"
}
}
I wrote the following, which creates a list of the row labels (the months):
from bs4 import BeautifulSoup

soup = BeautifulSoup(body, "html.parser")  # body holds the HTML shown above
table = soup.find("table")
headers = [header.text for header in table.find_all('td', class_="left")]
del headers[:2]  # drop the two leading non-data cells that also carry class "left"
print(headers)
This prints:
['1 month', '3 months', '6 months']
Now I need to iterate over that list and build the structure shown above, but I am stuck. I have tried several things without success. Can anyone point me in the right direction?
Answers:
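Try:
# Column names come from the <strong> cells of the header row.
headers = [s.get_text(strip=True) for s in soup.select("strong")]
out = {}
# Keep only the data rows, i.e. rows whose text contains "month"
# (the :-soup-contains selector needs Soup Sieve 2.1 or newer).
for tr in soup.select("tr:-soup-contains(month)"):
    # The first cell is the row label; the remaining cells pair up with the headers.
    out[tr.td.text] = {k: v.text for k, v in zip(headers, tr.select("td")[1:])}
print(out)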
Prints (formatted here for readability):
{
"1 month": {
"ABC": "3,93%",
"≤25%": "4,05%",
"≤75%": "4,09%",
"≤100%": "4,18%",
},
"3 months": {
"ABC": "4,12%",
"≤25%": "4,24%",
"≤75%": "4,28%",
"≤100%": "4,37%",
},
"6 months": {
"ABC": "4,23%",
"≤25%": "4,35%",
"≤75%": "4,39%",
"≤100%": "4,48%",
},
}
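If the row labels will not always contain the word "month", a position-based variant may be more robust. The following is only a sketch under stated assumptions: the helper name table_to_dict is made up for illustration, the header row is assumed to be the first row containing <strong> cells, every later row is assumed to be a data row, and body is a trimmed copy of the table from the question. It also serializes the result with json.dumps, using ensure_ascii=False so that "≤" is not escaped.

```python
import json

from bs4 import BeautifulSoup

# Trimmed copy of the table from the question (same cells, compact layout).
body = """
<table><tbody>
<tr><th class="left" colspan="7"><p>Some text</p></th></tr>
<tr><td class="left print-wide" colspan="2"> </td>
    <td class="print-wide" colspan="13">some-text</td></tr>
<tr><td class="left"><br /></td><td><strong>ABC </strong></td>
    <td><strong>≤25%</strong></td><td><strong>≤75%</strong></td>
    <td><strong>≤100%</strong></td></tr>
<tr><td class="left">1 month</td><td>3,93%</td><td>4,05%</td>
    <td>4,09%</td><td>4,18%</td></tr>
<tr><td class="left">3 months</td><td>4,12%</td><td>4,24%</td>
    <td>4,28%</td><td>4,37%</td></tr>
<tr><td class="left">6 months</td><td>4,23%</td><td>4,35%</td>
    <td>4,39%</td><td>4,48%</td></tr>
</tbody></table>
"""


def table_to_dict(html):
    """Hypothetical helper: map each row label to a {column: value} dict."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find("table").find_all("tr")

    # Assumption: the header row is the first row that contains <strong> cells.
    header_idx = next(i for i, tr in enumerate(rows) if tr.find("strong"))
    headers = [s.get_text(strip=True) for s in rows[header_idx].find_all("strong")]

    out = {}
    # Assumption: every row after the header row is a data row.
    for tr in rows[header_idx + 1:]:
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) < 2:
            continue  # skip rows that have no value cells
        out[cells[0]] = dict(zip(headers, cells[1:]))
    return out


result = table_to_dict(body)
# ensure_ascii=False keeps characters such as "≤" readable in the JSON text.
print(json.dumps(result, ensure_ascii=False, indent=2))
```

Because zip stops at the shorter sequence, a row with fewer value cells than headers simply yields fewer keys rather than raising an error.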