Append an empty row in dataframe using pandas

Question:

I am trying to append an empty row at the end of a dataframe but am unable to do so. I have tried to understand how pandas' append function works, but I still don't get it.

Here’s the code:

import pandas as pd

excel_names = ["ARMANI+EMPORIO+AR0143-book.xlsx"]
excels = [pd.ExcelFile(name) for name in excel_names]
frames = [x.parse(x.sheet_names[0], header=None,index_col=None).dropna(how='all') for x in excels]
for f in frames:
    f.append(0, float('NaN'))
    f.append(2, float('NaN'))

There are two columns and a variable number of rows.

With “print f” in the for loop I get this:

                            0                 1
 0                 Brand Name    Emporio Armani
 2               Model number            AR0143
 4                Part Number            AR0143
 6                 Item Shape       Rectangular
 8  Dial Window Material Type           Mineral
10               Display Type          Analogue
12                 Clasp Type            Buckle
14              Case Material   Stainless steel
16              Case Diameter    31 millimetres
18              Band Material           Leather
20                Band Length  Women's Standard
22                Band Colour             Black
24                Dial Colour             Black
26           Special Features       second-hand
28                   Movement            Quartz
Asked By: Mansoor Akram


Answers:

You can add it by appending a Series to the dataframe as follows. I am assuming that by “blank” you mean a row containing only NaN.
First create a Series object filled with NaN, and make sure you specify the columns via the index parameter when defining the Series.
Then you can append it to the DataFrame. Hope it helps!

>>> from numpy import nan as Nan
>>> import pandas as pd

>>> df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
...                     'B': ['B0', 'B1', 'B2', 'B3'],
...                     'C': ['C0', 'C1', 'C2', 'C3'],
...                     'D': ['D0', 'D1', 'D2', 'D3']},
...                     index=[0, 1, 2, 3])

>>> s2 = pd.Series([Nan,Nan,Nan,Nan], index=['A', 'B', 'C', 'D'])
>>> result = df1.append(s2)
>>> result
     A    B    C    D
0   A0   B0   C0   D0
1   A1   B1   C1   D1
2   A2   B2   C2   D2
3   A3   B3   C3   D3
4  NaN  NaN  NaN  NaN
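
Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0. On current versions the same empty row can be added with pd.concat; a minimal sketch, reusing the df1 defined above:

import numpy as np
import pandas as pd

# Build a one-row frame of NaN with the same columns as df1 and concatenate it.
empty_row = pd.DataFrame([[np.nan] * len(df1.columns)], columns=df1.columns)
result = pd.concat([df1, empty_row], ignore_index=True)
# result matches the output shown above: rows 0-3 plus an all-NaN row 4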
Answered By: silent_dev

The code below worked for me.

df.append(pd.Series([np.nan]), ignore_index = True)
Answered By: kamal tanwar

Assuming df is your dataframe,

df_prime = pd.concat([df, pd.DataFrame([[np.nan] * df.shape[1]], columns=df.columns)], ignore_index=True)

where df_prime equals df with an additional last row of NaN’s.

Note that pd.concat is slow so if you need this functionality in a loop, it’s best to avoid using it.
In that case, assuming your index is incremental, you can use

df.loc[df.iloc[-1].name + 1,:] = np.nan
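
For example, a minimal sketch assuming a default integer RangeIndex (note that assigning NaN upcasts integer columns to float):

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})   # RangeIndex 0, 1
df.loc[df.iloc[-1].name + 1, :] = np.nan        # new row gets label 2
print(df)
#      a    b
# 0  1.0  3.0
# 1  2.0  4.0
# 2  NaN  NaN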
Answered By: Dave Reikher

You can add a new series, and name it at the same time. The name will be the index of the new row, and all the values will automatically be NaN.

df.append(pd.Series(name='Afterthought'))
Answered By: pocketdora

Add a new pandas.Series using pandas.DataFrame.append().

If you wish to specify the name (AKA the “index”) of the new row, use:

df.append(pandas.Series(name='NameOfNewRow'))

If you don’t wish to name the new row, use:

df.append(pandas.Series(), ignore_index=True)

where df is your pandas.DataFrame.
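
On pandas 2.0+, where DataFrame.append is no longer available, a rough equivalent for the named case is to concatenate an empty one-row frame; a minimal sketch:

import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# One empty (all-NaN) row whose index label is the desired row name.
new_row = pd.DataFrame(index=['NameOfNewRow'], columns=df.columns)
df = pd.concat([df, new_row])
# df now has rows 0, 1 and 'NameOfNewRow', with the new row all NaN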

Answered By: srcerer

Assuming your df.index is sorted you can use:

df.loc[df.index.max() + 1] = None

It handles different index and column types well.

[EDIT] It works with a pd.DatetimeIndex if there is a constant frequency; otherwise you must specify the new index explicitly, e.g.:

df.loc[df.index.max() + pd.Timedelta(milliseconds=1)] = None

A longer example:

df = pd.DataFrame([[pd.Timestamp(12432423), 23, 'text_field']], 
                    columns=["timestamp", "speed", "text"],
                    index=pd.DatetimeIndex(start='2111-11-11',freq='ms', periods=1))
df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1 entries, 2111-11-11 to 2111-11-11
Freq: L
Data columns (total 3 columns):
timestamp 1 non-null datetime64[ns]
speed 1 non-null int64
text 1 non-null object
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 32.0+ bytes

df.loc[df.index.max() + 1] = None
df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2 entries, 2111-11-11 00:00:00 to 2111-11-11 00:00:00.001000
Data columns (total 3 columns):
timestamp 1 non-null datetime64[ns]
speed 1 non-null float64
text 1 non-null object
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 64.0+ bytes

df.head()

                                            timestamp  speed        text
2111-11-11 00:00:00.000  1970-01-01 00:00:00.012432423   23.0  text_field
2111-11-11 00:00:00.001                            NaT    NaN         NaN
Answered By: Daniel R

You can also use:

your_dataframe.insert(loc=0, value=np.nan, column="")

where loc is the position at which the new column is inserted. Note that DataFrame.insert adds an empty column, not a row.

Answered By: Alberto Garcia

Append “empty” row to data frame and fill selected cells:

Generate an empty data frame (no rows, just columns a and b):

import pandas as pd    
col_names =  ["a","b"]
df  = pd.DataFrame(columns = col_names)

Append empty row at the end of the data frame:

df = df.append(pd.Series(), ignore_index = True)

Now fill the empty cell at the end (len(df)-1) of the data frame in column a:

df.loc[[len(df)-1],'a'] = 123

Result:

     a    b
0  123  NaN

And of course one can iterate over the rows and fill cells:

col_names =  ["a","b"]
df  = pd.DataFrame(columns = col_names)
for x in range(0,5):
    df = df.append(pd.Series(), ignore_index = True)
    df.loc[[len(df)-1],'a'] = 123

Result:

     a    b
0  123  NaN
1  123  NaN
2  123  NaN
3  123  NaN
4  123  NaN
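
Since DataFrame.append has been removed in pandas 2.0, and appending row by row is slow in any case, the same table can also be built by collecting the rows first and creating the DataFrame once; a minimal sketch of that alternative:

import numpy as np
import pandas as pd

col_names = ["a", "b"]

# Collect one dict per row, then build the frame in a single call.
rows = [{"a": 123, "b": np.nan} for _ in range(5)]
df = pd.DataFrame(rows, columns=col_names)

This produces the same five rows of 123 / NaN as the loop above.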
Answered By: Peter

@Dave Reikher's answer is the best solution.

df.loc[df.iloc[-1].name + 1,:] = np.nan

Here’s a similar answer without the NumPy library

df.loc[len(df.index)] = ['' for x in df.columns.values.tolist()]
  • len(df.index) is the number of rows; with a default RangeIndex it is always one more than the last index label.
  • By using df.loc[len(df.index)] you select the next available index label (row); a short usage sketch follows this list.
  • For a default RangeIndex, df.iloc[-1].name + 1 equals len(df.index), so both expressions target the same new row.
  • Instead of NumPy, you can also use a Python list comprehension.
  • Create a list of the column names: df.columns.values.tolist()
  • Create a new list of empty strings '' based on the number of columns:
  • ['' for x in df.columns.values.tolist()]
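
A minimal usage sketch, assuming the DataFrame still has its default RangeIndex (if a row labelled len(df.index) already existed, this assignment would overwrite it rather than append):

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})    # default RangeIndex 0, 1
df.loc[len(df.index)] = ['' for x in df.columns.values.tolist()]
# df now has a third row (label 2) whose cells are empty strings;
# the integer columns are upcast to object to hold them.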
Answered By: Ryan Jones