How to remove big spaces in my scraped texts?

Question:

I am trying to remove the large gaps (blank lines and leading whitespace) from the output of this code:

from bs4 import BeautifulSoup
import requests


url = 'https://www.rucoyonline.com/characters/Something' 
response = requests.get(url)
print(response.status_code)

soup = BeautifulSoup(response.text, 'html.parser')

table = soup.find('table', class_ = 'character-table table table-bordered')
print(table.get_text())

Result after running code :

Character Information




Name
Something


Level
28


Last online

                    about 6 years ago



Born
September 03, 2016


string() is not working; I think it's because of BeautifulSoup.

Asked By: mr. one

||

Answers:

One-line answer:

print("\n".join([s for s in table.get_text().split("\n") if s]))

Output:

Character Information
Name
Something
Level
28
Last online
                    about 6 years ago
Born
September 03, 2016

And to remove leading and trailing spaces as well:

print("\n".join([s.strip() for s in table.get_text().split("\n") if s]))

Output:

Character Information
Name
Something
Level
28
Last online
about 6 years ago
Born
September 03, 2016
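
One caveat with the `if s` filter: a line that contains only spaces is still truthy, so it survives the filter and becomes an empty line after `strip()`. A minimal sketch of a stricter variant, assuming the scraped text may contain such lines (the `raw` sample and the `clean_lines` helper are made up for illustration):

```python
# Hypothetical sample of what table.get_text() might return,
# including blank and whitespace-only lines.
raw = (
    "Character Information\n"
    "\n"
    "   \n"
    "Name\n"
    "Something\n"
    "\n"
    "Last online\n"
    "\n"
    "                    about 6 years ago\n"
    "\n"
)


def clean_lines(text):
    """Return stripped lines, skipping empty and whitespace-only ones."""
    return [line.strip() for line in text.splitlines() if line.strip()]


cleaned = clean_lines(raw)
print("\n".join(cleaned))
```

Testing `line.strip()` instead of the raw line is the only difference from the one-liner; it drops the whitespace-only line instead of turning it into a blank one.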

Alternatively, you can use BeautifulSoup's get_text() to do the same:

print(table.get_text("\n", strip=True))

Output:

Character Information
Name
Something
Level
28
Last online
about 6 years ago
Born
September 03, 2016
Answered By: rafathasan

Since you are using BeautifulSoup, you can do this:

table_values = [item.text.strip() for item in table.find_all('tr')]
for item in table_values:
    print(item.replace('\n', ''))

Output

Character Information
NameSomething
Level28
Last online                    about 6 years ago
BornSeptember 03, 2016
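
The row-wise output above glues each label to its value. A possible variation, sketched here on made-up markup that only approximates the real page, joins the cells of each row with a separator by using the `stripped_strings` generator of each `tr`:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real character table.
html = """
<table class="character-table table table-bordered">
  <tr><th colspan="2">Character Information</th></tr>
  <tr><td>Name</td><td>Something</td></tr>
  <tr><td>Level</td><td>28</td></tr>
  <tr>
    <td>Last online</td>
    <td>
                    about 6 years ago
    </td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="character-table")

# stripped_strings yields each text node with surrounding whitespace removed
# and skips nodes that contain only whitespace.
rows = [": ".join(tr.stripped_strings) for tr in table.find_all("tr")]
print("\n".join(rows))
```

The ": " separator is an arbitrary choice; any string works, and a row with a single cell (such as the header) is left unchanged.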
Answered By: Rahul K P

There is no need for regex or for join() over list comprehension results. Simply use the standard parameters of get_text():

table.get_text('\n', strip=True)

Example

from bs4 import BeautifulSoup
import requests

url = 'https://www.rucoyonline.com/characters/Something' 
response = requests.get(url)
print(response.status_code)

soup = BeautifulSoup(response.text, 'html.parser')

table = soup.find('table', class_ = 'character-table table table-bordered')
print(table.get_text('\n', strip=True))

Output

Character Information
Name
Something
Level
28
Last online
about 6 years ago
Born
September 03, 2016
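
To try this without hitting the network, here is a small offline sketch. The inline markup is an assumption that only approximates the real page. It also shows that joining `stripped_strings` by hand should give the same text as `get_text()` with a separator and `strip=True`:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real character table.
html = """
<table class="character-table table table-bordered">
  <tr><th colspan="2">Character Information</th></tr>
  <tr><td>Name</td><td>Something</td></tr>
  <tr>
    <td>Last online</td>
    <td>
                    about 6 years ago
    </td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# First argument is the separator placed between text nodes;
# strip=True trims each node and drops whitespace-only ones.
via_get_text = table.get_text("\n", strip=True)

# The same result, built manually from the stripped_strings generator.
via_strings = "\n".join(table.stripped_strings)

print(via_get_text)
print(via_get_text == via_strings)
```

If the individual values are needed rather than one string, `list(table.stripped_strings)` gives them as a list.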
Answered By: HedgeHog