Extract first and last row of a dataframe in pandas

Question:

How can I extract the first and last rows of a given dataframe as a new dataframe in pandas?

I’ve tried to use iloc to select the desired rows and then concat as in:

import pandas as pd

df = pd.DataFrame({'a': range(1, 5), 'b': ['a', 'b', 'c', 'd']})
pd.concat([df.iloc[0, :], df.iloc[-1, :]])

but this does not produce a pandas dataframe:

a    1
b    a
a    4
b    d
dtype: object
Asked By: Bryan P

Answers:

df.iloc[0,:] and df.iloc[-1,:] each return a Series, so pd.concat stacks them into one long Series. Pass axis=1 to concat to place them side by side as columns, then transpose the result with .T:

print(df.iloc[0,:])
a    1
b    a
Name: 0, dtype: object

print(df.iloc[-1,:])
a    4
b    d
Name: 3, dtype: object

print(pd.concat([df.iloc[0,:], df.iloc[-1,:]], axis=1))
   0  3
a  1  4
b  a  d

print(pd.concat([df.iloc[0,:], df.iloc[-1,:]], axis=1).T)
   a  b
0  1  a
3  4  d
Answered By: jezrael
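
A note on the transpose approach above: each row Series mixes integers and strings, so it has dtype object, and the transposed frame carries that dtype into every column. The sketch below, assuming a reasonably recent pandas version, shows the effect and one possible way to recover the numeric dtype with infer_objects(). The variable names rows and restored are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': range(1, 5), 'b': ['a', 'b', 'c', 'd']})

# Each row comes back as an object-dtype Series because it mixes int and str.
rows = pd.concat([df.iloc[0, :], df.iloc[-1, :]], axis=1).T

# After the round trip, the numeric column 'a' is stored as object.
print(rows.dtypes)

# infer_objects() attempts to restore a more specific dtype per column.
restored = rows.infer_objects()
print(restored.dtypes)
```

If the original dtypes matter, selecting rows with a list of positions avoids the conversion in the first place.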

I think the simplest way is .iloc[[0, -1]]. Passing a list of positions returns a DataFrame rather than a Series.

df = pd.DataFrame({'a': range(1, 5), 'b': ['a', 'b', 'c', 'd']})
df2 = df.iloc[[0, -1]]

print(df2)

   a  b
0  1  a
3  4  d
Answered By: su79eu7k
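
To make the difference between the question's attempt and the list-based selection concrete, here is a minimal sketch. It only uses the example frame from the question; the variable names single, as_frame and both are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': range(1, 5), 'b': ['a', 'b', 'c', 'd']})

single = df.iloc[0]        # scalar position -> Series
as_frame = df.iloc[[0]]    # list with one position -> one-row DataFrame
both = df.iloc[[0, -1]]    # first and last row, column dtypes preserved

print(type(single).__name__, type(as_frame).__name__)
print(both.dtypes)
```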

You can also use head and tail:

In [29]: pd.concat([df.head(1), df.tail(1)])
Out[29]:
   a  b
0  1  a
3  4  d
Answered By: Colonel Beauvel

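The accepted answer duplicates the row when the dataframe contains only a single row. If that is a concern,

df[0::len(df)-1 if len(df) > 1 else 1]

works even for single-row dataframes. The slice step of len(df)-1 selects positions 0 and len(df)-1, and the fallback step of 1 avoids a zero step when there is only one row.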

Example: For the following dataframe this will not create a duplicate:

df = pd.DataFrame({'a': [1], 'b': ['a']})
df2 = df[0::len(df)-1 if len(df) > 1 else 1]

print(df2)

   a  b
0  1  a

whereas this does:

df3 = df.iloc[[0, -1]]

print(df3)

   a  b
0  1  a
0  1  a

because the single row is the first AND last row at the same time.

Answered By: joh-mue
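
The single-row concern can also be handled with an explicit length check instead of a computed slice step. The following is only a sketch: first_and_last is a hypothetical helper name, not a pandas function, and it assumes that an empty frame should simply come back empty.

```python
import pandas as pd

def first_and_last(frame):
    """Hypothetical helper: first and last row, without repeating a lone row."""
    if len(frame) <= 1:
        # Zero or one row: a positional slice returns it once (or nothing).
        return frame.iloc[:1]
    return frame.iloc[[0, -1]]

one_row = pd.DataFrame({'a': [1], 'b': ['a']})
four_rows = pd.DataFrame({'a': range(1, 5), 'b': ['a', 'b', 'c', 'd']})

print(first_and_last(one_row))
print(first_and_last(four_rows))
```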

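To mimic the truncated display that pandas uses for large datasets, concatenate the first five rows, a placeholder row of '...', and the last five rows: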

x = df[:5]   # first five rows
# one placeholder row with '...' in every column
y = pd.DataFrame([['...'] * df.shape[1]], columns=df.columns, index=['...'])
z = df[-5:]  # last five rows
result = pd.concat([x, y, z])

print(result)

Output:

                     date  temp
0     1981-01-01 00:00:00  20.7
1     1981-01-02 00:00:00  17.9
2     1981-01-03 00:00:00  18.8
3     1981-01-04 00:00:00  14.6
4     1981-01-05 00:00:00  15.8
...                   ...   ...
3645  1990-12-27 00:00:00    14
3646  1990-12-28 00:00:00  13.6
3647  1990-12-29 00:00:00  13.5
3648  1990-12-30 00:00:00  15.7
3649  1990-12-31 00:00:00    13
Answered By: Mina Gabriel
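
The ellipsis-row idea above could be wrapped in a small function. This is a sketch under assumptions: preview is a hypothetical name, the demo data is made up, and frames with at most 2*n rows are returned unchanged so that no rows are repeated. Note that the placeholder row turns every column into object dtype, so the result is meant for display rather than further computation.

```python
import pandas as pd

def preview(frame, n=5):
    """Hypothetical helper: first and last n rows around a '...' placeholder row."""
    if len(frame) <= 2 * n:
        # Too short to truncate without overlapping head and tail.
        return frame.copy()
    filler = pd.DataFrame([['...'] * frame.shape[1]],
                          columns=frame.columns, index=['...'])
    return pd.concat([frame.head(n), filler, frame.tail(n)])

# Made-up demo data, unrelated to the temperature data shown above.
demo = pd.DataFrame({'x': range(20), 'y': range(100, 120)})
out = preview(demo)
print(out)
```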

Alternatively, you can use take, which selects rows by position:

In [3]: df.take([0, -1])
Out[3]: 
   a  b
0  1  a
3  4  d
Answered By: rachwa
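
Since take works on positions rather than labels, it behaves the same way when the index is not the default integer range. A minimal sketch, assuming a made-up string index:

```python
import pandas as pd

# The string index labels are invented for illustration.
df = pd.DataFrame({'a': range(1, 5), 'b': ['a', 'b', 'c', 'd']},
                  index=['w', 'x', 'y', 'z'])

# Positions 0 and -1 refer to the first and last row regardless of labels.
ends = df.take([0, -1])
print(ends)
```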

I think this is a very useful question. I prefer to see the first and last n rows of a dataframe together rather than calling .head() and .tail() separately. I have found that

# set n = 1 to keep only the first and last row
df.drop(df.index[n:-n])

is slightly faster than pd.concat. Here is a test:

df = pd.DataFrame({'a': range(1, 5), 'b': ['a', 'b', 'c', 'd']})
n = 1

# IPython line magics; each one times a single statement
%time pd.concat([df.head(n), df.tail(n)])
%time df.drop(df.index[n:-n])

In my case, df.drop(df.index[1:-1]) took half the time of the pd.concat() version.
I hope this answer can still be of use.

Answered By: eliasmaxil
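
One caveat about the drop approach is that drop removes rows by label, not by position, so it only matches the concat result when the index labels are unique. The sketch below checks both cases and times the two variants with the standard-library timeit module instead of notebook magics. The frame with repeated labels is made up, and timings depend on the machine and pandas version, so no figures are claimed here.

```python
import timeit
import pandas as pd

n = 1
df = pd.DataFrame({'a': range(1, 5), 'b': ['a', 'b', 'c', 'd']})

via_concat = pd.concat([df.head(n), df.tail(n)])
via_drop = df.drop(df.index[n:-n])
print(via_concat)
print(via_drop)

# With repeated labels, drop() removes every row carrying a dropped label.
dup = df.copy()
dup.index = ['x', 'x', 'y', 'y']   # made-up duplicate labels
dup_drop = dup.drop(dup.index[n:-n])
dup_concat = pd.concat([dup.head(n), dup.tail(n)])
print(len(dup_drop), len(dup_concat))

# Rough timing comparison; results vary between environments.
t_concat = timeit.timeit(lambda: pd.concat([df.head(n), df.tail(n)]), number=200)
t_drop = timeit.timeit(lambda: df.drop(df.index[n:-n]), number=200)
print(f"concat: {t_concat:.4f}s  drop: {t_drop:.4f}s")
```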