How to avoid generating unwanted blank columns when looping through Pandas dataframe?

Question:

I’ve learnt a lot from the Stack Overflow community, thanks to all of you. I can’t find an answer to this question anywhere, so I’d appreciate your help.

I have created a smallish database (899 * 10) of scientific papers. I want to write a script to print the title and abstract of each, and then make a (human, non-automatable) decision to include or not include in a systematic review.

I have come close with the following script, which allows me to update the ‘decision’ column for each paper, and to save and exit so I don’t have to do it all at once.

import pandas as pd
import numpy as np

print "Please type the path of the database you would like to assess"
path = raw_input('>>> ')

data = pd.read_csv(path)

if 'jim_decision' not in data:
        data['jim_decision'] = pd.Series(np.nan)


def decision_maker(dataframe):
        current_row = 0
        while True:
                if pd.isnull(dataframe['jim_decision'][current_row]):
                        print "\n\n Title:\n\n %s \n Abstract:\n %s\n\n\n" % (dataframe.Title[current_row], dataframe.Abstract[current_row])
                        decision = raw_input("From the title and abstract, should this article be included for review of full manuscript?\n\nType 'Y' or 'N', or 'Save' to exit: ")

                        if decision == 'Save':  
                                dataframe.to_csv(path)
                                print "Your changes have been saved"
                                break
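                        else:
                                # .loc writes to the frame itself rather than a possible copy
                                dataframe.loc[current_row, 'jim_decision'] = decision
                # advance on every pass so rows that already have a decision are skipped
                current_row += 1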
decision_maker(data)

However, for some reason, every time I run it, I get an extra column called ‘Unnamed: [X]’, simply containing the index number, added before the first pandas column. I can’t work out where it comes from, how to get rid of it, or whether (as I presume) it risks contaminating the data.

I’m fairly new to all this, so I’m sure this isn’t very pretty or pythonic, but I’m just trying to learn to use python/pandas to make my research life easier… Any input would be gratefully received!

Asked By: Jim

Answers:

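In case anyone comes across this in future: by default, to_csv writes the data frame’s index to the file as an extra first column with no header. When the file is read back with read_csv, that column shows up as ‘Unnamed: 0’, and every further save and reload adds another one.

If you don’t want this, pass the index=False argument, for example: df.to_csv('./filepath.csv', index=False)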
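
To see the effect without touching a real file, here is a minimal sketch (Python 3, using an in-memory buffer instead of a path). The `papers` frame and the `round_trip` helper are made up for illustration and are not part of the original script; the last step is just one possible way to clean up a file that has already collected index columns.

```python
import io

import pandas as pd

# Hypothetical stand-in for the papers database.
papers = pd.DataFrame({
    "Title": ["Paper A", "Paper B"],
    "Abstract": ["First abstract", "Second abstract"],
})


def round_trip(frame, **to_csv_kwargs):
    """Write frame to an in-memory CSV and read it straight back."""
    buffer = io.StringIO()
    frame.to_csv(buffer, **to_csv_kwargs)
    buffer.seek(0)
    return pd.read_csv(buffer)


# Default behaviour: the index is written as a first column with no header,
# which read_csv then labels 'Unnamed: 0'.
with_index = round_trip(papers)
print(list(with_index.columns))

# A second save/load cycle adds one more index column.
twice = round_trip(with_index)
print(list(twice.columns))

# index=False leaves the columns unchanged.
clean = round_trip(papers, index=False)
print(list(clean.columns))

# One way to drop leftover index columns from an already-saved file.
repaired = twice.loc[:, ~twice.columns.str.startswith("Unnamed:")]
print(list(repaired.columns))
```

Alternatively, if you do want to keep the index in the file, reading it back with pd.read_csv(path, index_col=0) treats the first column as the index instead of as data.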

Answered By: Jim