Why does my google colab session keep crashing?

Question:

I am using Google Colab on a dataset with 4 million rows and 29 columns. When I run the statement sns.heatmap(dataset.isnull()), it runs for some time, but after a while the session crashes and the instance restarts. This has been happening a lot, and so far I have never actually seen an output. What could be the possible reason? Is the data/calculation too much? What can I do?

Asked By: Callmeat911 True


Answers:

I’m not sure what is causing your specific crash, but a common cause is an out-of-memory error. It sounds like you’re working with a large enough dataset that this is probable. You might try working with a subset of the dataset and see if the error recurs.
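
For example, a minimal sketch, assuming dataset is the pandas DataFrame from the question: plot the missing-value heatmap on a random sample instead of all 4 million rows, which keeps memory use bounded.

import seaborn as sns
import matplotlib.pyplot as plt

# Work on a random subset first; if this succeeds where the full plot crashed,
# the problem is almost certainly memory. (dataset is assumed to be the
# DataFrame from the question.)
sample = dataset.sample(n=100_000, random_state=0)
sns.heatmap(sample.isnull(), cbar=False)
plt.show()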

Otherwise, Colab keeps logs in /var/log/colab-jupyter.log. You may be able to get more insight into what is going on by printing its contents. Either run:

!cat /var/log/colab-jupyter.log

Or, to get the messages alone (easier to read):

import json

with open("/var/log/colab-jupyter.log", "r") as fo:
  for line in fo:
    print(json.loads(line)['msg'])

Answered By: Sam

Another cause: if you’re using PyTorch and move your model to the GPU but don’t move an internal tensor (e.g. a manually created hidden state) to the GPU as well.
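
A minimal sketch of that situation (an assumed example, not the answerer’s code): the model lives on the GPU, but a hand-made hidden state would stay on the CPU unless it is explicitly created on, or moved to, the same device.

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

rnn = nn.GRU(input_size=29, hidden_size=16, batch_first=True).to(device)
x = torch.randn(8, 10, 29, device=device)

# The hidden state is an "internal" tensor created by hand; leaving it on the
# CPU while the model sits on the GPU makes the forward pass fail.
h0 = torch.zeros(1, 8, 16, device=device)
out, hn = rnn(x, h0)
print(out.shape)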

Answered By: user1114

For me, passing certain arguments to the tfms augmentation broke the dataloader and crashed the session.
I wasted a lot of time checking that the images weren’t corrupt, running gc to clean up memory, and more…

Answered By: yoavs

This error mostly occurs if you have enabled the GPU but are not actually using it. Change your runtime type to "None" and you should not face this issue again.
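
A quick way to check whether the GPU runtime is actually being used, a minimal sketch assuming the TensorFlow and PyTorch builds that Colab ships by default: if neither framework reports a device anywhere in your code path, switching the runtime type to "None" costs you nothing.

import tensorflow as tf
import torch

# If the GPU never shows up in your own code, the accelerator is sitting idle
# and the runtime type can safely be set to "None".
print("TensorFlow GPUs:", tf.config.list_physical_devices("GPU"))
print("PyTorch CUDA available:", torch.cuda.is_available())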

Answered By: Muhammad Talha

I would first suggest closing your browser and restarting the notebook. Look at the runtime logs and check whether CUDA is mentioned anywhere. If not, do a factory runtime reset and run the notebook. Check your logs again and you should find CUDA mentioned there.

Answered By: RAP

What worked for me was to click on the RAM/Disk resources drop-down menu, then ‘Manage Sessions’, and terminate my current session, which had been active for days. Then reconnect and run everything again.

Before that, my code kept crashing even though it had been working perfectly the previous day, so I knew there was nothing wrong coding-wise.

After doing this, I also realized that the n_jobs parameter of GridSearchCV (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html) plays a massive role in RAM consumption. For example, for me everything works fine and execution doesn’t crash if n_jobs is set to None, 1 (same as None), or 2. Setting it to -1 (using all processors) or >3 crashes everything.
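
A minimal sketch with a hypothetical estimator and parameter grid (not the answerer’s code) of a conservative n_jobs setting; each parallel worker holds its own copy of the data, so memory use grows with n_jobs, and n_jobs=-1 can exhaust a Colab session’s RAM.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=5_000, n_features=29, random_state=0)
param_grid = {"n_estimators": [50, 100], "max_depth": [5, 10]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    n_jobs=2,  # keep parallelism modest; -1 uses every core and multiplies memory use
    cv=3,
)
search.fit(X, y)
print(search.best_params_)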

Answered By: Amadeo Amadei