Add a row with means of columns to pandas DataFrame

Question:

I have a pandas DataFrame consisting of some sensor readings taken over time like this:

       diode1  diode2  diode3  diode4
Time
0.530       7       0      10      16
1.218      17       7      14      19
1.895      13       8      16      17
2.570       8       2      16      17
3.240      14       8      17      19
3.910      13       6      17      18
4.594      13       5      16      19
5.265       9       0      12      16
5.948      12       3      16      17
6.632      10       2      15      17

I have written code to add another row with the means of each column:

# List of the averages for the test.
averages = [df[key].describe()['mean'] for key in df]
indexes = df.index.tolist()
indexes.append('mean')
df = df.reindex(indexes)  # reindex returns a new DataFrame, so assign it back
# Adding the mean row to the bottom of the DataFrame

i = 0
for key in df:
    df.at['mean', key] = averages[i]  # set_value was removed in pandas 1.0
    i += 1

This gives me the result I want, which is a DataFrame like this:

       diode1  diode2  diode3  diode4
Time
0.53      7.0     0.0    10.0    16.0
1.218    17.0     7.0    14.0    19.0
1.895    13.0     8.0    16.0    17.0
2.57      8.0     2.0    16.0    17.0
3.24     14.0     8.0    17.0    19.0
3.91     13.0     6.0    17.0    18.0
4.594    13.0     5.0    16.0    19.0
5.265     9.0     0.0    12.0    16.0
5.948    12.0     3.0    16.0    17.0
6.632    10.0     2.0    15.0    17.0
mean     11.6     4.1    14.9    17.5

However, I am sure this is not the most efficient way to add the row. I tried using append with the means stored in a pandas Series, but that did not produce the expected output. Is there a more efficient way to add a row with the index 'mean' and the average of each column to the bottom of a pandas DataFrame?

Asked By: willk


Answers:

Use loc for setting with enlargement:

df.loc['mean'] = df.mean()

The resulting output:

       diode1  diode2  diode3  diode4
Time                                 
0.53      7.0     0.0    10.0    16.0
1.218    17.0     7.0    14.0    19.0
1.895    13.0     8.0    16.0    17.0
2.57      8.0     2.0    16.0    17.0
3.24     14.0     8.0    17.0    19.0
3.91     13.0     6.0    17.0    18.0
4.594    13.0     5.0    16.0    19.0
5.265     9.0     0.0    12.0    16.0
5.948    12.0     3.0    16.0    17.0
6.632    10.0     2.0    15.0    17.0
mean     11.6     4.1    14.9    17.5
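
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the DataFrame from the values in the question; the name with_mean and the use of a copy are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question; "Time" is the index.
df = pd.DataFrame(
    {
        "diode1": [7, 17, 13, 8, 14, 13, 13, 9, 12, 10],
        "diode2": [0, 7, 8, 2, 8, 6, 5, 0, 3, 2],
        "diode3": [10, 14, 16, 16, 17, 17, 16, 12, 16, 15],
        "diode4": [16, 19, 17, 17, 19, 18, 19, 16, 17, 17],
    },
    index=pd.Index(
        [0.530, 1.218, 1.895, 2.570, 3.240, 3.910, 4.594, 5.265, 5.948, 6.632],
        name="Time",
    ),
)

# Work on a copy (hypothetical name) so the raw readings stay untouched.
with_mean = df.copy()

# Assigning to a label that does not exist yet appends it as a new row.
with_mean.loc["mean"] = df.mean()

print(with_mean)
```

Note that the index now mixes floats with the string 'mean', and the integer columns become floats, as the output above shows. It may be worth keeping the original df around for any further numeric work.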
Answered By: root

For a single statistic, loc works really well. Another way is to use concat.

df1 = pd.concat([df, df.apply(['mean'])])

It’s especially useful if multiple statistics need to be appended:

df1 = pd.concat([df, df.apply(['mean', 'sum', 'median'])])

To append a whole set of summary statistics such as mean, std and the quartiles (which the OP already computes via describe()), concat is again useful:

df1 = pd.concat([df, df.describe()])
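
As a rough runnable sketch of the concat approach, the snippet below uses a shortened, three-row version of the question's data (an assumption to keep it small). It calls DataFrame.agg instead of apply; agg accepts the same list of function names, and the intermediate name stats is only for illustration.

```python
import pandas as pd

# First three rows and two columns of the question's data, for brevity.
df = pd.DataFrame(
    {"diode1": [7, 17, 13], "diode2": [0, 7, 8]},
    index=pd.Index([0.530, 1.218, 1.895], name="Time"),
)

# One row per statistic, labelled by the function name.
stats = df.agg(["mean", "sum", "median"])

# concat returns a new DataFrame; df itself is left unchanged.
df1 = pd.concat([df, stats])

print(df1)
```

Unlike the loc assignment, concat does not modify df in place, so the statistics rows cannot leak into later calculations on the original data.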
Answered By: cottontail