How can I set index while converting dictionary to dataframe?

Question:

I have a dictionary that looks like this:

defaultdict(list,
        {'Open': ['47.47', '47.46', '47.38', ...],
         'Close': ['47.48', '47.45', '47.40', ...],
         'Date': ['2016/11/22 07:00:00', '2016/11/22 06:59:00','2016/11/22 06:58:00', ...]})

I want to convert this dictionary to a DataFrame and use the values under the 'Date' key as its index.

I can do this with the following commands:

df = pd.DataFrame(dictionary, columns=['Date', 'Open', 'Close'])
df.index = df.Date

Output:

               Date                  Date    Open   Close
2016/11/22 07:00:00   2016/11/22 07:00:00   47.47   47.48
2016/11/22 06:59:00   2016/11/22 06:59:00   47.46   47.45
2016/11/22 06:58:00   2016/11/22 06:58:00   47.38   47.40

But then 'Date' appears twice: once as the index and once as the original column.

Is there a way to set the index while converting the dictionary to a DataFrame, so that 'Date' is not duplicated, as in the output below?

               Date   Close    Open
2016/11/22 07:00:00   47.48   47.47
2016/11/22 06:59:00   47.45   47.46
2016/11/22 06:58:00   47.40   47.38
Asked By: maynull

Answers:

Use set_index:

df = pd.DataFrame(dictionary, columns=['Date', 'Open', 'Close'])
df = df.set_index('Date')
print(df)
                      Open  Close
Date                             
2016/11/22 07:00:00  47.47  47.48
2016/11/22 06:59:00  47.46  47.45
2016/11/22 06:58:00  47.38  47.40
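
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes only the three rows visible in the question; the elided values are not filled in.

```python
from collections import defaultdict

import pandas as pd

# Sample data: only the three rows shown in the question.
dictionary = defaultdict(list)
dictionary['Open'] = ['47.47', '47.46', '47.38']
dictionary['Close'] = ['47.48', '47.45', '47.40']
dictionary['Date'] = ['2016/11/22 07:00:00',
                      '2016/11/22 06:59:00',
                      '2016/11/22 06:58:00']

# Build the frame, then move the Date column into the index.
df = pd.DataFrame(dictionary, columns=['Date', 'Open', 'Close']).set_index('Date')

print(df.index.name)     # Date
print(list(df.columns))  # ['Open', 'Close']
```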

Or modify the existing frame in place with inplace=True:

df = pd.DataFrame(dictionary, columns=['Date', 'Open', 'Close'])
df.set_index('Date', inplace=True)
print(df)
                      Open  Close
Date                             
2016/11/22 07:00:00  47.47  47.48
2016/11/22 06:59:00  47.46  47.45
2016/11/22 06:58:00  47.38  47.40
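
The practical difference between the two forms is which object ends up changed. The sketch below illustrates this with a two-row sample taken from the question; the variable names (data, alias_inplace, alias_old) are only for illustration. With inplace=True, call the method on its own line rather than assigning its result back to df.

```python
import pandas as pd

# Two sample rows copied from the question; names here are illustrative.
data = {
    'Date': ['2016/11/22 07:00:00', '2016/11/22 06:59:00'],
    'Open': ['47.47', '47.46'],
    'Close': ['47.48', '47.45'],
}

# In place: the existing object is changed, so a second reference sees the new index.
df_inplace = pd.DataFrame(data)
alias_inplace = df_inplace
df_inplace.set_index('Date', inplace=True)
print(alias_inplace.index.name)     # Date

# Reassignment: set_index returns a new frame; the old object keeps its Date column.
df_new = pd.DataFrame(data)
alias_old = df_new
df_new = df_new.set_index('Date')
print('Date' in alias_old.columns)  # True
```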

Another possible solution is to filter the 'Date' key out of the dictionary and pass dictionary['Date'] as the index:

df = pd.DataFrame({k: v for k, v in dictionary.items() if k != 'Date'},
                  index=dictionary['Date'],
                  columns=['Open', 'Close'])
df.index.name = 'Date'
print(df)
                      Open  Close
Date                             
2016/11/22 07:00:00  47.47  47.48
2016/11/22 06:59:00  47.46  47.45
2016/11/22 06:58:00  47.38  47.40
Answered By: jezrael

If the original dictionary is not needed afterwards, an alternative is to pop the 'Date' key and pass its values as the index:

df = pd.DataFrame(mydict, index=pd.Series(mydict.pop('Date'), name='Date'))

That said, I think set_index is the more convenient and less verbose option, since it can be chained directly onto the newly created frame:

df = pd.DataFrame(mydict).set_index('Date')
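
To make the trade-off concrete, here is a small sketch under the assumption of a two-row sample taken from the question (the name original is hypothetical). The pop variant removes 'Date' from the dictionary as a side effect, while the set_index variant leaves the dictionary untouched.

```python
import pandas as pd

# Two sample rows copied from the question.
original = {
    'Open': ['47.47', '47.46'],
    'Close': ['47.48', '47.45'],
    'Date': ['2016/11/22 07:00:00', '2016/11/22 06:59:00'],
}

# pop variant: work on a shallow copy here so `original` survives for the comparison.
mydict = dict(original)
df_pop = pd.DataFrame(mydict, index=pd.Series(mydict.pop('Date'), name='Date'))
print('Date' in mydict)    # False: pop removed the key from mydict

# set_index variant: the source dictionary is not modified.
df_set = pd.DataFrame(original).set_index('Date')
print('Date' in original)  # True

# Both frames carry the same index name and the same columns.
print(df_pop.index.name, df_set.index.name)
```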

Answered By: cottontail