Python: Divide each row of a DataFrame by another DataFrame vector


I have a DataFrame (df1) with 2000 rows x 500 columns (excluding the index), and I want to divide each row by another DataFrame (df2) with 1 row x 500 columns. Both have the same column headers. I tried:

df1.divide(df2) and
df1.divide(df2, axis='index') and several other variants, and I always get a DataFrame with NaN in every cell. What argument am I missing in df1.divide?

Asked By: Plug4



You can divide by a Series instead, i.e. the first row of df2:

In [11]: df = pd.DataFrame([[1., 2.], [3., 4.]], columns=['A', 'B'])

In [12]: df2 = pd.DataFrame([[5., 10.]], columns=['A', 'B'])

In [13]: df.div(df2)
     A    B
0  0.2  0.2
1  NaN  NaN

In [14]: df.div(df2.iloc[0])
     A    B
0  0.2  0.2
1  0.6  0.4
Answered By: Andy Hayden

Instead of df.divide(df2, axis='index'), pass a single row of df2 as a Series (e.g. df2.iloc[0]) and let it align on the columns:

import pandas as pd

data1 = {"a": [1., 3., 5., 2.]}
data2 = {"a": [4.]}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2) 

df1.div(df2.iloc[0], axis='columns')

Alternatively, you can use df1 / df2.values[0, :], which divides by position rather than by column label.
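
A minimal sketch comparing the two forms, using made-up two-column data (the frames `left` and `right` are hypothetical, not from the question). It illustrates that dividing by the Series matches on column labels, while dividing by the NumPy array goes purely by position, so the two agree only when the columns are in the same order.

```python
import pandas as pd

# Hypothetical data for illustration only.
left = pd.DataFrame({"a": [1., 3.], "b": [4., 8.]})
right = pd.DataFrame({"b": [2.], "a": [4.]})  # same labels, different order

by_label = left.div(right.iloc[0], axis="columns")  # aligns on column names
by_position = left / right.values[0, :]             # ignores column names

print(by_label)     # column a divided by 4, column b divided by 2
print(by_position)  # column a divided by 2, column b divided by 4
```

If the column order of the two frames is guaranteed to match, both forms give the same result; otherwise the label-based form is the safer choice.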

Answered By: kimal

A small clarification, just in case: the reason you got NaN everywhere, while Andy's first example (df.div(df2)) works for the first row, is that div tries to match both the index and the columns. In Andy's example, index 0 is found in both DataFrames, so the division is performed; index 1 is not, so that row is filled with NaN. This behavior becomes even more obvious if you run the following (only the 't' row is divided):

import numpy as np

df_a = pd.DataFrame(np.random.rand(3,5), index= ['x', 'y', 't'])
df_b = pd.DataFrame(np.random.rand(2,5), index= ['z','t'])
df_a.div(df_b)  # only the 't' row holds numbers; 'x', 'y' and 'z' are NaN

So in your case, the index label of the only row of df2 was apparently not present in df1. "Luckily", the column headers are the same in both DataFrames, so when you slice the first row you get a Series whose index is made up of the column headers of df2. This is what allows the division to work properly.
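
To make this concrete, here is a small sketch with made-up data in which the single row of the divisor carries an index label (5, chosen arbitrarily) that does not occur in the other frame. The names `big` and `one_row` are hypothetical stand-ins for df1 and df2.

```python
import pandas as pd

# Hypothetical frames; the label 5 is an arbitrary choice that is absent from `big`.
big = pd.DataFrame([[1., 2.], [3., 4.]], columns=["A", "B"])
one_row = pd.DataFrame([[5., 10.]], columns=["A", "B"], index=[5])

aligned = big.div(one_row)         # row labels 0, 1 and 5 never match
by_row = big.div(one_row.iloc[0])  # Series indexed by "A" and "B"

print(aligned.isna().all().all())  # True: every cell is NaN
print(by_row)                      # each row of `big` divided by [5, 10]
```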

For a case with index and column matching:

df_a = pd.DataFrame(np.random.rand(3,5), index= ['x', 'y', 't'], columns = range(5))
df_b = pd.DataFrame(np.random.rand(2,5), index= ['z','t'], columns = [1,2,3,4,5])
df_a.div(df_b)
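
A hedged sketch of what the combined index and column matching produces. It swaps the random values for fixed ones (np.ones and np.full, an assumption made only so the result is predictable) and keeps the same labels.

```python
import numpy as np
import pandas as pd

# Same labels as above; fixed values replace np.random.rand for readability.
df_a = pd.DataFrame(np.ones((3, 5)), index=["x", "y", "t"], columns=range(5))
df_b = pd.DataFrame(np.full((2, 5), 2.0), index=["z", "t"], columns=[1, 2, 3, 4, 5])

result = df_a.div(df_b)

# Only cells whose row label AND column label exist in both frames are computed.
print(result.notna().sum().sum())              # 4: row "t", columns 1 to 4
print(result.loc["t", [1, 2, 3, 4]].tolist())  # [0.5, 0.5, 0.5, 0.5]
```

Every other cell, including all of column 0 and column 5, comes back as NaN because its label is missing from one of the two frames.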
Answered By: etna

If you want to divide every value in a column by a specific number, you could try:

df['column_name'] = df['column_name'].div(10000)

For me, this code divided each value of 'column_name' by 10,000.

Answered By: Cornel Ciobanu

To divide a single row (across one or more columns), do the following:

df.loc['index_value'] = df.loc['index_value'].div(10000)
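
A small runnable sketch of both scalar cases on a made-up frame (the row labels "r1" and "r2" and the columns "x" and "y" are hypothetical). Note that the row step runs after the column step, so the cell at ("r2", "x") is scaled twice.

```python
import pandas as pd

# Hypothetical frame for illustration only.
df = pd.DataFrame({"x": [10000., 20000.], "y": [30000., 40000.]},
                  index=["r1", "r2"])

df["x"] = df["x"].div(10000)            # scale one column
df.loc["r2"] = df.loc["r2"].div(10000)  # scale one row, across all columns

print(df)
```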
Answered By: Motoman