Beautiful Soup web scraping: first character becomes a period
Question:
I tried to scrape this:
<table class="info">
<tr class="info"><th align="center" class="info" colspan="2">Nachrichten zum Tag</th></tr>
<br/>.</td></tr>><td class="info" colspan="2">Information: Room.
<br/>.</td></tr>2023 07:45 Uhr. colspan="2">Update:
<br/>.</td></tr>><td class="info" colspan="2">Heute.
</table>
with Beautiful Soup using soup.get_text(), but the first letter of each entry becomes a ".".
Output:
.nformation: Room. .r., 27.01.2023 07:45 Uhr. .eute.
Expected output:
Information: Room. Update: Heute.
Answers:
First, the HTML seems to be malformed.
I think the HTML you need is:
<table class="info">
<tr class="info">
<th align="center" class="info" colspan="2">Nachrichten zum Tag</th>
</tr>
<tr>
<td class="info" colspan="2">Information: Room. </td>
</tr>
<tr>
<td class="info" colspan="2">Update: Heute. </td>
</tr>
</table>
Beautiful Soup will behave the way you want with this HTML.
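Note that calling soup.get_text() on the whole document also returns the header text "Nachrichten zum Tag" plus the surrounding newlines, which the expected output leaves out. A minimal sketch, assuming the well-formed HTML above and the built-in "html.parser" backend, is to collect only the <td class="info"> cells and join their stripped text. The helper name extract_info is hypothetical, chosen for illustration only.

```python
from bs4 import BeautifulSoup

# A well-formed version of the table from the answer (every <tr> closed).
html = """
<table class="info">
<tr class="info">
<th align="center" class="info" colspan="2">Nachrichten zum Tag</th>
</tr>
<tr>
<td class="info" colspan="2">Information: Room. </td>
</tr>
<tr>
<td class="info" colspan="2">Update: Heute. </td>
</tr>
</table>
"""


def extract_info(markup: str) -> str:
    """Hypothetical helper: join the text of every <td class="info"> cell.

    The <th> header is skipped because only <td> elements are selected.
    """
    soup = BeautifulSoup(markup, "html.parser")
    cells = soup.find_all("td", class_="info")
    # strip=True trims the trailing space inside each cell before joining.
    return " ".join(td.get_text(strip=True) for td in cells)


print(extract_info(html))
```

With the original, malformed markup the cell boundaries are not where you expect them, so fixing the HTML (or the source that produces it) comes first; the selector approach only helps once the <td> elements are well-formed.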