Uploading a CSV file to Google Colab

Question:

I have a 1.2 GB CSV file, and uploading it to Google Colab takes over an hour. Is that normal, or am I doing something wrong?

Code:

import io
import pandas as pd
from google.colab import files

# Opens a browser upload dialog; returns a dict of {filename: bytes}
uploaded = files.upload()

df = pd.read_csv(io.BytesIO(uploaded['IF 10 PERCENT.csv']), index_col=None)

Thanks.

Asked By: Ali Youssef

||

Answers:

files.upload is perhaps the slowest method to transfer data into Colab.

The fastest is syncing using Google Drive. Download the desktop sync client. Then, mount your Drive in Colab and you’ll find the file there.
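As a rough sketch of the mounting step: the snippet below mounts Drive at /content/drive and builds the path the file would have afterwards. The helper name drive_path is hypothetical, and it assumes the CSV sits at the top level of your Drive (shown under MyDrive once mounted); adjust the relative path if you synced it into a subfolder.

```python
import os

try:
    # google.colab is only importable inside a Colab runtime.
    from google.colab import drive
except ImportError:
    drive = None

MOUNT_POINT = "/content/drive"  # conventional mount point (assumption)


def drive_path(relative_path, mount_point=MOUNT_POINT):
    """Hypothetical helper: path of a Drive file once Drive is mounted."""
    return f"{mount_point}/MyDrive/{relative_path}"


if drive is not None:
    # Asks for authorization the first time it runs in a session.
    drive.mount(MOUNT_POINT)

# File name taken from the question; assumed to be at the top level of Drive.
csv_path = drive_path("IF 10 PERCENT.csv")
print(csv_path, "exists:", os.path.exists(csv_path))
```

Once the path exists, it can be passed straight to pd.read_csv(csv_path) without going through files.upload.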

A middle ground that is faster than files.upload but still slower than Drive is to click the upload button in the file browser.

Answered By: Bob Smith

1.2 GB is a large dataset, so uploading it will take time. I ran into the same problem on a previous project. There are several ways to handle it.

Solution 1:

Put your dataset in Google Drive and work on your project in Google Colab. In Colab you can mount your Drive and read the file directly by its path.

import pandas as pd
from google.colab import drive

# Mount Google Drive so its files appear under /content/drive
drive.mount('/content/drive')

df = pd.read_csv('Enter file path')

Solution 2:

I assume you are using this dataset for a machine learning project. When developing the initial model, your first task is to check whether the model works at all. To do that, open the CSV file in Excel, copy the first 500 or 1,000 rows into another sheet, and work with that small dataset. Once everything works, upload the full dataset and train your model on it.

This technique is a little tedious because you still have to redo the EDA and feature engineering when you load the entire 1.2 GB dataset. Apart from that, it works fine.

NOTE: This technique is very helpful when your first priority is experimenting, because loading a huge dataset before you can start working is very time-consuming.
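If copying rows by hand in Excel is awkward, the small dataset from Solution 2 can also be produced with a short script run on your own machine before uploading. This is only a sketch using the Python standard library; the function name write_sample and the file names in the usage note are made up for illustration.

```python
import csv
from itertools import islice


def write_sample(src_path, dst_path, n_rows=1000):
    """Hypothetical helper: copy the header plus the first n_rows data rows.

    Returns the number of data rows written (the header is not counted).
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)

        header = next(reader, None)
        if header is None:
            return 0  # source file is empty, nothing to copy
        writer.writerow(header)

        copied = 0
        # islice stops after n_rows, so the rest of the file is never read.
        for row in islice(reader, n_rows):
            writer.writerow(row)
            copied += 1
        return copied
```

For example, write_sample('full.csv', 'sample.csv', n_rows=1000) would write a file small enough to upload in seconds. If the full file is already reachable from the notebook, pd.read_csv(path, nrows=1000) gives the same kind of sample without writing a second file.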