Python pandas: fill a dataframe row by row


The simple task of adding a row to a pandas.DataFrame object seems to be hard to accomplish. There are 3 stackoverflow questions relating to this, none of which give a working answer.

Here is what I’m trying to do. I have a DataFrame of which I already know the shape as well as the names of the rows and columns.

>>> df = pandas.DataFrame(columns=['a','b','c','d'], index=['x','y','z'])
>>> df
     a    b    c    d
x  NaN  NaN  NaN  NaN
y  NaN  NaN  NaN  NaN
z  NaN  NaN  NaN  NaN

Now, I have a function to compute the values of the rows iteratively. How can I fill in one of the rows with either a dictionary or a pandas.Series ? Here are various attempts that have failed:

>>> y = {'a':1, 'b':5, 'c':2, 'd':3} 
>>> df['y'] = y
AssertionError: Length of values does not match length of index

Apparently it tried to add a column instead of a row.

>>> y = {'a':1, 'b':5, 'c':2, 'd':3} 
>>> df.join(y)
AttributeError: 'builtin_function_or_method' object has no attribute 'is_unique'

Very uninformative error message.

>>> y = {'a':1, 'b':5, 'c':2, 'd':3} 
>>> df.set_value(index='y', value=y)
TypeError: set_value() takes exactly 4 arguments (3 given)

Apparently that is only for setting individual values in the dataframe.

>>> y = {'a':1, 'b':5, 'c':2, 'd':3} 
>>> df.append(y)
Exception: Can only append a Series if ignore_index=True

Well, I don’t want to ignore the index, otherwise here is the result:

>>> df.append(y, ignore_index=True)
     a    b    c    d
0  NaN  NaN  NaN  NaN
1  NaN  NaN  NaN  NaN
2  NaN  NaN  NaN  NaN
3    1    5    2    3

It did align the column names with the values, but lost the row labels.

>>> y = {'a':1, 'b':5, 'c':2, 'd':3} 
>>> df.ix['y'] = y
>>> df
                                  a                                 b  
x                               NaN                               NaN
y  {'a': 1, 'c': 2, 'b': 5, 'd': 3}  {'a': 1, 'c': 2, 'b': 5, 'd': 3}
z                               NaN                               NaN

                                  c                                 d
x                               NaN                               NaN
y  {'a': 1, 'c': 2, 'b': 5, 'd': 3}  {'a': 1, 'c': 2, 'b': 5, 'd': 3}
z                               NaN                               NaN

That also failed miserably.

So how do you do it ?

Asked By: xApple



df['y'] will set a column

since you want to set a row, use .loc

Note that .ix is equivalent here, yours failed because you tried to assign a dictionary
to each element of the row y probably not what you want; converting to a Series tells pandas
that you want to align the input (for example you then don’t have to to specify all of the elements)

In [6]: import pandas as pd

In [7]: df = pd.DataFrame(columns=['a','b','c','d'], index=['x','y','z'])

In [8]: df.loc['y'] = pd.Series({'a':1, 'b':5, 'c':2, 'd':3})

In [9]: df
     a    b    c    d
x  NaN  NaN  NaN  NaN
y    1    5    2    3
z  NaN  NaN  NaN  NaN
Answered By: Jeff

This is a simpler version

import pandas as pd
df = pd.DataFrame(columns=('col1', 'col2', 'col3'))
for i in range(5):
   df.loc[i] = ['<some value for first>','<some value for second>','<some value for third>']`
Answered By: Satheesh

Update: because append has been deprecated

df = pd.DataFrame(columns=["firstname", "lastname"])

entry = pd.DataFrame.from_dict({
     "firstname": ["John"],
     "lastname":  ["Johny"]

df = pd.concat([df, entry], ignore_index=True)
Answered By: flow

If your input rows are lists rather than dictionaries, then the following is a simple solution:

import pandas as pd
list_of_lists = []

pd.DataFrame(list_of_lists, columns=['A', 'B', 'C'])
#    A  B  C
# 0  1  2  3
# 1  4  5  6

The logic behind the code is quite simple and straight forward

Make a df with 1 row using the dictionary

Then create a df of shape (1, 4) that only contains NaN and has the same columns as the dictionary keys

Then concatenate a nan df with the dict df and then another nan df

import pandas as pd
import numpy as np

raw_datav = {'a':1, 'b':5, 'c':2, 'd':3} 

datav_df = pd.DataFrame(raw_datav, index=[0])

nan_df = pd.DataFrame([[np.nan]*4], columns=raw_datav.keys())

df = pd.concat([nan_df, datav_df, nan_df], ignore_index=True)

df.index = ["x", "y", "z"]



a    b    c    d
x  NaN  NaN  NaN  NaN
y  1.0  5.0  2.0  3.0
z  NaN  NaN  NaN  NaN

[Program finished]
Answered By: Subham
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.