csv.writer not writing entire output to CSV file

Question:

I am attempting to scrape the artists’ Spotify streaming rankings from Kworb.net into a CSV file and I’ve nearly succeeded except I’m running into a weird issue.

The code below successfully scrapes all 10,000 of the listed artists into the console:

import requests
from bs4 import BeautifulSoup
import csv

URL = "https://kworb.net/spotify/artists.html"
result = requests.get(URL)
src = result.content
soup = BeautifulSoup(src, 'html.parser')

table = soup.find('table', id="spotifyartistindex")

header_tags = table.find_all('th')
headers = [header.text.strip() for header in header_tags]

rows = []
data_rows = table.find_all('tr')

for row in data_rows:
    value = row.find_all('td')
    beautified_value = [dp.text.strip() for dp in value]
    print(beautified_value)

    if len(beautified_value) == 0:
        continue

    rows.append(beautified_value)

The issue arises when I use the following code to save the output to a CSV file:

with open('artist_rankings.csv', 'w', newline="") as output:
    writer = csv.writer(output)
    writer.writerow(headers)
    writer.writerows(rows)

For whatever reason, only 738 of the artists are saved to the file. Does anyone know what could be causing this?

Thanks so much for any help!

Asked By: nellygrl

Answers:

Printing a row to the console does not add it to the rows list, and only the rows list is written to the CSV file. Make sure each row is appended to rows before you write the file.

Here is how you can modify your code to fix this issue:

import requests
from bs4 import BeautifulSoup
import csv

URL = "https://kworb.net/spotify/artists.html"
result = requests.get(URL)
src = result.content
soup = BeautifulSoup(src, 'html.parser')

table = soup.find('table', id="spotifyartistindex")

header_tags = table.find_all('th')
headers = [header.text.strip() for header in header_tags]

rows = []
data_rows = table.find_all('tr')

for row in data_rows:
    value = row.find_all('td')
    beautified_value = [dp.text.strip() for dp in value]

    # Skip rows without <td> cells, such as the header row
    if not beautified_value:
        continue

    # Append the data to the rows list
    rows.append(beautified_value)

# Write the data to the CSV file
with open('artist_rankings.csv', 'w', newline="") as output:
    writer = csv.writer(output)
    writer.writerow(headers)
    writer.writerows(rows)

In this modified code, the data is first appended to the rows list, and then it is written to the CSV file. This will ensure that all of the data is saved to the file, and not just the first 738 rows.
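
If you want to check which step is losing rows, one option is to compare len(rows) with the number of rows that actually end up in the file. The following is only a minimal sketch with made-up sample data: write_and_count is a hypothetical helper name, and passing encoding="utf-8" is an assumption on my part (it avoids depending on the platform's default encoding), not something confirmed as the cause of the 738-row file.

```python
import csv
import os
import tempfile


def write_and_count(headers, rows, path):
    """Write headers and rows to path, then return the number of data rows in the file.

    Hypothetical helper for illustration; not part of the original code.
    """
    # encoding="utf-8" is an assumption: it allows names with non-Latin
    # characters to be written whatever the platform's default encoding is.
    with open(path, "w", newline="", encoding="utf-8") as output:
        writer = csv.writer(output)
        writer.writerow(headers)
        writer.writerows(rows)

    # Read the file back and count every row after the header.
    with open(path, newline="", encoding="utf-8") as check:
        return sum(1 for _ in csv.reader(check)) - 1


# Made-up sample data standing in for the scraped table.
sample_headers = ["Artist", "Streams"]
sample_rows = [["Artist A", "1,000"], ["Ärtist B", "900"], ["アーティスト C", "800"]]

sample_path = os.path.join(tempfile.mkdtemp(), "artist_rankings_check.csv")
written = write_and_count(sample_headers, sample_rows, sample_path)
print(len(sample_rows), written)
```

If the two printed numbers match, the writing step is fine and the problem lies in how rows is built. If they differ, or the write raises an exception partway through, the file itself is where rows are being lost.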

Note that you may also want to add some error handling to your code in case the request to the URL fails, or if the HTML of the page is not in the expected format. This will help prevent your code from crashing when it encounters unexpected data. You can do this by adding a try-except block to your code, like this:

import requests
from bs4 import BeautifulSoup
import csv

URL = "https://kworb.net/spotify/artists.html"

try:
    result = requests.get(URL)
    result.raise_for_status()  # Raise an error for HTTP failure status codes
    src = result.content
    soup = BeautifulSoup(src, 'html.parser')

    table = soup.find('table', id="spotifyartistindex")

    if table is None:
        raise Exception("Could not find table with id 'spotifyartistindex'")

    header_tags = table.find_all('th')
    headers = [header.text.strip() for header in header_tags]

    rows = []
    data_rows = table.find_all('tr')

    for row in data_rows:
        value = row.find_all('td')
        beautified_value = [dp.text.strip() for dp in value]
        # Skip rows without <td> cells, such as the header row
        if not beautified_value:
            continue
        # Append the data to the rows list
        rows.append(beautified_value)

    # Write the data to the CSV file
    with open('artist_rankings.csv', 'w', newline="") as output:
        writer = csv.writer(output)
        writer.writerow(headers)
        writer.writerows(rows)
except Exception as error:
    # Report a failed request, a missing table, or a write error
    print(f"Scraping failed: {error}")

Answered By: Boatti

As an alternative approach, you might want to make your life easier next time and use pandas.

Here’s how:

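from io import StringIO

import requests
import pandas as pd

source = requests.get("https://kworb.net/spotify/artists.html")
# Wrap the HTML in StringIO; recent pandas versions warn when given a raw HTML string
df = pd.concat(pd.read_html(StringIO(source.text), flavor="bs4"))
df.to_csv("artists.csv", index=False)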

This outputs a .csv file with 10,000 artists.

Answered By: baduker