How to scrape specific text from kworb and export it to an Excel file?

Question:

I’m trying to scrape the positions, the artists and the songs from a ranking list on kworb. For example: https://kworb.net/spotify/country/us_weekly.html

I used the following script:

import requests
from bs4 import BeautifulSoup

response = requests.get("https://kworb.net/spotify/country/us_weekly.html")
content = response.content
soup = BeautifulSoup(response.content, 'html.parser')

print(soup.get_text())

And here is the output:

ITUNES
WORLDWIDE
ARTISTS
CHARTS
DON'T PRAY
RADIO
SPOTIFY
YOUTUBE
TRENDING
HOME


CountriesArtistsListenersCities




Spotify Weekly Chart - United States - 2023/02/09 | Totals


PosP+Artist and TitleWksPk(x?)StreamsStreams+Total

1
+1
SZA - Kill Bill
9
1(x5)
15,560,813
+247,052
148,792,089
2
-1
Miley Cyrus - Flowers
4
1(x3)
13,934,413
-4,506,662
75,009,251
3
+20
Morgan Wallen - Last Night
2
3(x1)
11,560,741
+6,984,649
16,136,833
...

How do I get only the positions, the artists and the songs as separate columns and store them in an Excel file?

Expected output:

Pos         Artist            Songs
1           SZA               Kill Bill
2           Miley Cyrus       Flowers
3           Morgan Wallen     Last Night
...
Asked By: Hi Hi try


Answers:

A convenient way to scrape HTML tables is pandas.read_html(), which does the parsing for you (it uses lxml by default and falls back to BeautifulSoup with html5lib).

import pandas as pd

# find the table by its id and take the first DataFrame from the returned list
df = pd.read_html('https://kworb.net/spotify/country/us_weekly.html', attrs={'id': 'spotifyweekly'})[0]

# split the combined column on the first ' - ' to create the expected columns
df[['Artist', 'Song']] = df['Artist and Title'].str.split(' - ', n=1, expand=True)

# pick the columns you need and export them to Excel
df[['Pos', 'Artist', 'Song']].to_excel('yourfile.xlsx', index=False)
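
To see what the split step does without hitting the site, here is a minimal offline sketch. The small DataFrame is hand-built from the rows shown in the question, and the third row ("Example Artist - Part One - Remix") is a made-up title, added only to show why n=1 matters when a song name itself contains " - ".

```python
import pandas as pd

# Hand-built stand-in for the scraped table (not fetched from kworb).
df = pd.DataFrame({
    "Pos": [1, 2, 3],
    "Artist and Title": [
        "SZA - Kill Bill",
        "Miley Cyrus - Flowers",
        "Example Artist - Part One - Remix",  # hypothetical row
    ],
})

# n=1 splits only on the first " - ", so the rest of the title stays intact.
df[["Artist", "Song"]] = df["Artist and Title"].str.split(" - ", n=1, expand=True)

result = df[["Pos", "Artist", "Song"]]
print(result.to_string(index=False))
```

Without n=1 the hypothetical third row would produce three columns instead of two, and the assignment to ['Artist', 'Song'] would fail.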

An alternative that parses the HTML directly with BeautifulSoup:

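import requests
from bs4 import BeautifulSoup
import pandas as pd

# download the chart page and parse it
soup = BeautifulSoup(requests.get("https://kworb.net/spotify/country/us_weekly.html").content, 'html.parser')

data = []

# iterate over the table rows that contain data cells (this skips the header row)
for e in soup.select('#spotifyweekly tr:has(td)'):
    data.append({
        'Pos': e.td.text,                          # first cell holds the position
        'Artist': e.a.text,                        # first link in the row is the artist
        'Song': e.a.find_next_sibling('a').text    # next link is the song title
    })

# build a DataFrame and export it to Excel
pd.DataFrame(data).to_excel('yourfile.xlsx', index=False)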

Output:

Pos         Artist            Song
1           SZA               Kill Bill
2           Miley Cyrus       Flowers
3           Morgan Wallen     Last Night
4           Metro Boomin      Creepin’
5           Lil Uzi Vert      Just Wanna Rock
6           Drake             Rich Flex
7           Metro Boomin      Superhero (Heroes & Villains) [with Future & Chris Brown]
8           Sam Smith         Unholy
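
The BeautifulSoup selector logic can also be tried offline. The sketch below runs the same kind of loop against a small inline HTML string. SAMPLE_HTML and extract_rows are hypothetical names, and the markup is only an assumed approximation of the page (a table with id "spotifyweekly" whose rows contain an artist link followed by a song link), so check it against the real page source before relying on it.

```python
from bs4 import BeautifulSoup

# Assumed, simplified stand-in for the chart table; the real page has more columns.
SAMPLE_HTML = """
<table id="spotifyweekly">
  <thead><tr><th>Pos</th><th>P+</th><th>Artist and Title</th></tr></thead>
  <tbody>
    <tr><td>1</td><td>+1</td><td><a href="#">SZA</a> - <a href="#">Kill Bill</a></td></tr>
    <tr><td>2</td><td>-1</td><td><a href="#">Miley Cyrus</a> - <a href="#">Flowers</a></td></tr>
  </tbody>
</table>
"""


def extract_rows(html):
    """Return a list of {'Pos', 'Artist', 'Song'} dicts from the chart table."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # tr:has(td) keeps only rows with data cells, so the header row is skipped.
    for tr in soup.select("#spotifyweekly tr:has(td)"):
        links = tr.find_all("a")
        if len(links) < 2:
            continue  # skip rows that lack both an artist and a song link
        rows.append({
            "Pos": tr.td.get_text(strip=True),
            "Artist": links[0].get_text(strip=True),
            "Song": links[1].get_text(strip=True),
        })
    return rows


rows = extract_rows(SAMPLE_HTML)
print(rows)
```

Collecting the links with find_all and checking their count is a slightly more defensive variant of e.a.find_next_sibling('a'), which raises an AttributeError if a row has no second link.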

Answered By: HedgeHog