How can I read a CSV from a zip file in Python?

Question:

I am trying to read a CSV that is inside a zip file. I need to read rad_15min.csv, but when I pass the zip file's URL to pandas (I copied the link address from the download button), I get an error:

Code:

import pandas as pd
df = pd.read_csv('https://www.kaggle.com/datasets/lucafrance/bike-traffic-in-munich/download?datasetVersionNumber=7')

Error:
ParserError: Error tokenizing data. C error: Expected 1 fields in line 9, saw 2

Data: https://www.kaggle.com/datasets/lucafrance/bike-traffic-in-munich

Zip file Link: https://www.kaggle.com/datasets/lucafrance/bike-traffic-in-munich/download?datasetVersionNumber=7

I have to read this CSV dynamically; I don't want to download it manually. I just want to build a download link and read the CSV straight from it. Is there another approach I can try?

Asked By: Hamza


Answers:

For me, the URL redirects to an HTML page instead of downloading the file.
Why not use the Kaggle API that is provided? (You first need to provide a token.)

This is what I tried:

import csv
import requests

url = 'https://www.kaggle.com/datasets/lucafrance/bike-traffic-in-munich/download?datasetVersionNumber=7'

# Open the URL and create a response object
response = requests.get(url)

# Create a CSV reader object
csv_reader = csv.reader(response.iter_lines(decode_unicode=True), delimiter=',')

# Iterate over each row in the CSV file
for row in csv_reader:
    # Process each row as needed
    print(row)

What I got back is this:

[]
[]
['<!DOCTYPE html>']
['<html lang="en">']
[]
['<head>']
['  <title>Bike Traffic in Munich | Kaggle</title>']
['  <meta charset="utf-8" />']
['    <meta name="robots" content="index', ' follow" />']
['  <meta name="description" content="Bike traffic measured over time at different stations in Munich." />']
['  <meta name="turbolinks-cache-control" content="no-cache" />']
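
The HTML above suggests a cheap guard before any parsing: check whether the downloaded bytes are actually a zip archive. A minimal sketch, assuming the payload is already in memory as bytes; the helper name looks_like_zip is made up for illustration, while zipfile.is_zipfile is the standard-library check it relies on:

```python
from io import BytesIO
from zipfile import ZipFile, is_zipfile


def looks_like_zip(payload: bytes) -> bool:
    """Return True if the bytes form a zip archive (hypothetical helper)."""
    return is_zipfile(BytesIO(payload))


# An HTML page, like the one returned by the unauthenticated request
html_payload = b"<!DOCTYPE html>\n<html lang=\"en\">\n<head>"

# A tiny in-memory zip built only for this demonstration
buffer = BytesIO()
with ZipFile(buffer, "w") as zf:
    zf.writestr("example.csv", "a,b\n1,2\n")
zip_payload = buffer.getvalue()

print(looks_like_zip(html_payload))
print(looks_like_zip(zip_payload))
```

With a real request, response.content would take the place of the demo payloads, and a False result would point to a login page or redirect rather than the dataset.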
Answered By: physicsuser

I tried using the Kaggle API, but I don't want to download the data, just read it dynamically.
I want to read only one file in the zip, named rad15_min.csv, with pandas.

You can try making the request with the __Host-KAGGLEID cookie.

I'm not sure whether there is a programmatic way to get it, but you can always hardcode it. Make sure you are logged in to Kaggle, press CTRL+SHIFT+I to open your browser's Developer Tools, go to Application > Cookies, and copy that cookie's value.

from io import BytesIO
from zipfile import ZipFile

import pandas as pd
import requests

# Parentheses let the URL span several lines as a single string
url = ("https://www.kaggle.com/datasets/"
       "lucafrance/bike-traffic-in-munich/"
       "download?datasetVersionNumber=7")

cookies = {"__Host-KAGGLEID": "CfDJ8IPkmlRqhQhDn1PidxljKKQWcrozwJuFfsIn..."}

response = requests.get(url, cookies=cookies)

# Open the archive in memory and read only the member we need
with ZipFile(BytesIO(response.content)) as zf:
    df = pd.read_csv(zf.open("rad_15min.csv")) # not rad15_min.csv

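NB: If the zip contains only one CSV, you can pass BytesIO(response.content) directly to read_csv with compression="zip" (pandas cannot infer the compression from an in-memory buffer). If the dataset is not an archive (i.e., a single CSV), pass BytesIO(response.content) as-is.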

Output :

print(df)

              datum uhrzeit_start  ... richtung_2 gesamt
0        2017.01.01         00:00  ...          0      0
1        2017.01.01         00:00  ...          0      0
2        2017.01.01         00:00  ...          0      0
3        2017.01.01         00:00  ...          0      0
4        2017.01.01         00:00  ...          0      0
...             ...           ...  ...        ...    ...
1255761  2022.12.31         23:45  ...          2      7
1255762  2022.12.31         23:45  ...          0      0
1255763  2022.12.31         23:45  ...          0      0
1255764  2022.12.31         23:45  ...          0      0
1255765  2022.12.31         23:45  ...          5     17

[1255766 rows x 7 columns]
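
Since the member name is easy to mistype (rad_15min.csv versus rad15_min.csv), it may help to list the archive's contents before opening a member. A sketch using only the standard library, run on a small in-memory zip built for illustration (the second member name and all rows are invented, not the real dataset); with the real download, response.content would take the place of archive_bytes:

```python
import csv
from io import BytesIO, TextIOWrapper
from zipfile import ZipFile

# Build a stand-in archive; "other.csv" and the rows are invented
buffer = BytesIO()
with ZipFile(buffer, "w") as zf:
    zf.writestr("rad_15min.csv", "datum,gesamt\n2017.01.01,0\n2022.12.31,17\n")
    zf.writestr("other.csv", "datum,gesamt\n2017.01.01,5\n")
archive_bytes = buffer.getvalue()

with ZipFile(BytesIO(archive_bytes)) as zf:
    # namelist() shows the exact member names stored in the archive
    names = zf.namelist()
    # Decode the member's byte stream as text before handing it to csv
    with zf.open("rad_15min.csv") as raw:
        text = TextIOWrapper(raw, encoding="utf-8", newline="")
        rows = list(csv.DictReader(text))

print(names)
print(rows[-1]["gesamt"])
```

Once the right name is known, the same zf.open(name) handle can be passed to pd.read_csv as in the code above.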
Answered By: Timeless