# How do I operate on a DataFrame with a Series for every column?

## The question

Given a `Series` `s` and `DataFrame` `df`, how do I operate on each column of `df` with `s`?

``````df = pd.DataFrame(
[[1, 2, 3], [4, 5, 6]],
index=[0, 1],
columns=['a', 'b', 'c']
)

s = pd.Series([3, 14], index=[0, 1])
``````

When I attempt to add them, I get all `np.nan`

``````df + s

a   b   c   0   1
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
``````

What I thought I should get is

``````    a   b   c
0   4   5   6
1  18  19  20
``````

## Objective and motivation

I’ve seen this kind of question several times over and have seen many other questions that involve some element of this. Most recently, I had to spend a bit of time explaining this concept in comments while looking for an appropriate canonical Q&A. I did not find one and so I thought I’d write one.

These questions usually arises with respect to a specific operation, but equally applies to most arithmetic operations.

• How do I subtract a `Series` from every column in a `DataFrame`?
• How do I add a `Series` from every column in a `DataFrame`?
• How do I multiply a `Series` from every column in a `DataFrame`?
• How do I divide a `Series` from every column in a `DataFrame`?

It is helpful to create a mental model of what `Series` and `DataFrame` objects are.

# Anatomy of a `Series`

A `Series` should be thought of as an enhanced dictionary. This isn’t always a perfect analogy, but we’ll start here. Also, there are other analogies that you can make, but I am targeting a dictionary in order to demonstrate the purpose of this post.

## `index`

These are the keys that we can reference to get at the corresponding values. When the elements of the index are unique, the comparison to a dictionary becomes very close.

## `values`

These are the corresponding values that are keyed by the index.

# Anatomy of a `DataFrame`

A `DataFrame` should be thought of as a dictionary of `Series` or a `Series` of `Series`. In this case the keys are the column names and the values are the columns themselves as `Series` objects. Each `Series` agrees to share the same `index` which is the index of the `DataFrame`.

## `columns`

These are the keys that we can reference to get at the corresponding `Series`.

## `index`

This the the index that all of the `Series` values agree to share.

## Note: RE: `columns` and `index` objects

They are the same kind of things. A `DataFrame`s `index` can be used as another `DataFrame`s `columns`. In fact, this happens when you do `df.T` to get a transpose.

## `values`

This is a two-dimensional array that contains the data in a `DataFrame`. The reality is that `values` is not what is stored inside the `DataFrame` object. (Well, sometimes it is, but I’m not about to try to describe the block manager). The point is, it is better to think of this as access to a two-dimensional array of the data.

# Define Sample Data

These are sample `pandas.Index` objects that can be used as the `index` of a `Series` or `DataFrame` or can be used as the `columns` of a `DataFrame`:

``````idx_lower = pd.Index([*'abcde'], name='lower')
idx_range = pd.RangeIndex(5, name='range')
``````

These are sample `pandas.Series` objects that use the `pandas.Index` objects above:

``````s0 = pd.Series(range(10, 15), idx_lower)
s1 = pd.Series(range(30, 40, 2), idx_lower)
s2 = pd.Series(range(50, 10, -8), idx_range)
``````

These are sample `pandas.DataFrame` objects that use the `pandas.Index` objects above:

``````df0 = pd.DataFrame(100, index=idx_range, columns=idx_lower)
df1 = pd.DataFrame(
np.arange(np.product(df0.shape)).reshape(df0.shape),
index=idx_range, columns=idx_lower
)
``````

## `Series` on `Series`

When operating on two `Series`, the alignment is obvious. You align the `index` of one `Series` with the `index` of the other.

``````s1 + s0

lower
a    40
b    43
c    46
d    49
e    52
dtype: int64
``````

Which is the same as when I randomly shuffle one before I operate. The indices will still align.

``````s1 + s0.sample(frac=1)

lower
a    40
b    43
c    46
d    49
e    52
dtype: int64
``````

And is not the case when instead I operate with the values of the shuffled `Series`. In this case, Pandas doesn’t have the `index` to align with and therefore operates from a positions.

``````s1 + s0.sample(frac=1).values

lower
a    42
b    42
c    47
d    50
e    49
dtype: int64
``````

``````s1 + 1

lower
a    31
b    33
c    35
d    37
e    39
dtype: int64
``````

## `DataFrame` on `DataFrame`

The similar is true when operating between two `DataFrame`s. The alignment is obvious and does what we think it should do:

``````df0 + df1

lower    a    b    c    d    e
range
0      100  101  102  103  104
1      105  106  107  108  109
2      110  111  112  113  114
3      115  116  117  118  119
4      120  121  122  123  124
``````

It shuffles the second `DataFrame` on both axes. The `index` and `columns` will still align and give us the same thing.

``````df0 + df1.sample(frac=1).sample(frac=1, axis=1)

lower    a    b    c    d    e
range
0      100  101  102  103  104
1      105  106  107  108  109
2      110  111  112  113  114
3      115  116  117  118  119
4      120  121  122  123  124
``````

It is the same shuffling, but it adds the array and not the `DataFrame`. It is no longer aligned and will get different results.

``````df0 + df1.sample(frac=1).sample(frac=1, axis=1).values

lower    a    b    c    d    e
range
0      123  124  121  122  120
1      118  119  116  117  115
2      108  109  106  107  105
3      103  104  101  102  100
4      113  114  111  112  110
``````

Add a one-dimensional array. It will align with columns and broadcast across rows.

``````df0 + [*range(2, df0.shape + 2)]

lower    a    b    c    d    e
range
0      102  103  104  105  106
1      102  103  104  105  106
2      102  103  104  105  106
3      102  103  104  105  106
4      102  103  104  105  106
``````

Add a scalar. There isn’t anything to align with, so broadcasts to everything:

``````df0 + 1

lower    a    b    c    d    e
range
0      101  101  101  101  101
1      101  101  101  101  101
2      101  101  101  101  101
3      101  101  101  101  101
4      101  101  101  101  101
``````

## `DataFrame` on `Series`

If `DataFrame`s are to be thought of as dictionaries of `Series` and `Series` are to be thought of as dictionaries of values, then it is natural that when operating between a `DataFrame` and `Series` that they should be aligned by their "keys".

``````s0:
lower    a    b    c    d    e
10   11   12   13   14

df0:
lower    a    b    c    d    e
range
0      100  100  100  100  100
1      100  100  100  100  100
2      100  100  100  100  100
3      100  100  100  100  100
4      100  100  100  100  100
``````

And when we operate, the `10` in `s0['a']` gets added to the entire column of `df0['a']`:

``````df0 + s0

lower    a    b    c    d    e
range
0      110  111  112  113  114
1      110  111  112  113  114
2      110  111  112  113  114
3      110  111  112  113  114
4      110  111  112  113  114
``````

### The heart of the issue and point of the post

What about if I want `s2` and `df0`?

``````s2:               df0:

|    lower    a    b    c    d    e
range        |    range
0      50    |    0      100  100  100  100  100
1      42    |    1      100  100  100  100  100
2      34    |    2      100  100  100  100  100
3      26    |    3      100  100  100  100  100
4      18    |    4      100  100  100  100  100
``````

When I operate, I get the all `np.nan` as cited in the question:

``````df0 + s2

a   b   c   d   e   0   1   2   3   4
range
0     NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1     NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2     NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3     NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4     NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
``````

This does not produce what we wanted, because Pandas is aligning the `index` of `s2` with the `columns` of `df0`. The `columns` of the result includes a union of the `index` of `s2` and the `columns` of `df0`.

We could fake it out with a tricky transposition:

``````(df0.T + s2).T

lower    a    b    c    d    e
range
0      150  150  150  150  150
1      142  142  142  142  142
2      134  134  134  134  134
3      126  126  126  126  126
4      118  118  118  118  118
``````

But it turns out Pandas has a better solution. There are operation methods that allow us to pass an `axis` argument to specify the axis to align with.

`-` `sub`
`+` `add`
`*` `mul`
`/` `div`
`**` `pow`

And so the answer is simply:

``````df0.add(s2, axis='index')

lower    a    b    c    d    e
range
0      150  150  150  150  150
1      142  142  142  142  142
2      134  134  134  134  134
3      126  126  126  126  126
4      118  118  118  118  118
``````

It turns out `axis='index'` is synonymous with `axis=0`.
As is `axis='columns'` synonymous with `axis=1`:

``````df0.add(s2, axis=0)

lower    a    b    c    d    e
range
0      150  150  150  150  150
1      142  142  142  142  142
2      134  134  134  134  134
3      126  126  126  126  126
4      118  118  118  118  118
``````

### The rest of the operations

``````df0.sub(s2, axis=0)

lower   a   b   c   d   e
range
0      50  50  50  50  50
1      58  58  58  58  58
2      66  66  66  66  66
3      74  74  74  74  74
4      82  82  82  82  82
``````

``````df0.mul(s2, axis=0)

lower     a     b     c     d     e
range
0      5000  5000  5000  5000  5000
1      4200  4200  4200  4200  4200
2      3400  3400  3400  3400  3400
3      2600  2600  2600  2600  2600
4      1800  1800  1800  1800  1800
``````

``````df0.div(s2, axis=0)

lower         a         b         c         d         e
range
0      2.000000  2.000000  2.000000  2.000000  2.000000
1      2.380952  2.380952  2.380952  2.380952  2.380952
2      2.941176  2.941176  2.941176  2.941176  2.941176
3      3.846154  3.846154  3.846154  3.846154  3.846154
4      5.555556  5.555556  5.555556  5.555556  5.555556
``````

``````df0.pow(1 / s2, axis=0)

lower         a         b         c         d         e
range
0      1.096478  1.096478  1.096478  1.096478  1.096478
1      1.115884  1.115884  1.115884  1.115884  1.115884
2      1.145048  1.145048  1.145048  1.145048  1.145048
3      1.193777  1.193777  1.193777  1.193777  1.193777
4      1.291550  1.291550  1.291550  1.291550  1.291550
``````

It’s important to address some higher level concepts first. Since my motivation is to share knowledge and teach, I wanted to make this as clear as possible.

I prefer the method mentioned by piSquared (i.e., `df.add(s, axis=0)`), but another method uses `apply` together with `lambda` to perform an action on each column in the dataframe:

``````>>>> df.apply(lambda col: col + s)
a   b   c
0   4   5   6
1  18  19  20
``````

To apply the lambda function to the rows, use `axis=1`:

``````>>> df.T.apply(lambda row: row + s, axis=1)
0   1
a  4  18
b  5  19
c  6  20
``````

This method could be useful when the transformation is more complex, e.g.:

``````df.apply(lambda col: 0.5 * col ** 2 + 2 * s - 3)
``````

Just to add an extra layer from my own experience. It extends what others have done here. This shows how to operate on a `Series` with a `DataFrame` that has extra columns that you want to keep the values for. Below is a short demonstration of the process.

``````import pandas as pd

d = [1.056323, 0.126681,
0.142588, 0.254143,
0.15561, 0.139571,
0.102893, 0.052411]

df = pd.Series(d, index = ['const', '426', '428', '424', '425', '423', '427', '636'])

print(df)
const    1.056323
426      0.126681
428      0.142588
424      0.254143
425      0.155610
423      0.139571
427      0.102893
636      0.052411

d2 = {
'loc': ['D', 'D', 'E', 'E', 'F', 'F', 'G', 'G', 'E', 'D'],
'426': [9, 2, 3, 2, 4, 0, 2, 7, 2, 8],
'428': [2, 4, 1, 0, 2, 1, 3, 0, 7, 8],
'424': [1, 10, 5, 8, 2, 7, 10, 0, 3, 5],
'425': [9, 2, 6, 8, 9, 1, 7, 3, 8, 6],
'423': [4, 2, 8, 7, 9, 6, 10, 5, 9, 9],
'423': [2, 7, 3, 10, 8, 1, 2, 9, 3, 9],
'427': [4, 10, 4, 0, 8, 3, 1, 5, 7, 7],
'636': [10, 5, 6, 4, 0, 5, 1, 1, 4, 8],
'seq': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
}

df2 = pd.DataFrame(d2)

print(df2)
loc  426  428  424  425  423  427  636  seq
0   D    9    2    1    9    2    4   10    1
1   D    2    4   10    2    7   10    5    1
2   E    3    1    5    6    3    4    6    1
3   E    2    0    8    8   10    0    4    1
4   F    4    2    2    9    8    8    0    1
5   F    0    1    7    1    1    3    5    1
6   G    2    3   10    7    2    1    1    1
7   G    7    0    0    3    9    5    1    1
8   E    2    7    3    8    3    7    4    1
9   D    8    8    5    6    9    7    8    1
``````

To multiply a `DataFrame` by a `Series` and keep dissimilar columns

1. Create a list of the elements in the `DataFrame` and `Series` you want to operate on:
``````col = ['426', '428', '424', '425', '423', '427', '636']
``````
1. Perform your operation using the list and indicate the axis to use:
``````df2[col] = df2[col].mul(df[col], axis=1)

print(df2)
loc       426       428       424      425       423       427       636  seq
0   D  1.140129  0.285176  0.254143  1.40049  0.279142  0.411572  0.524110    1
1   D  0.253362  0.570352  2.541430  0.31122  0.976997  1.028930  0.262055    1
2   E  0.380043  0.142588  1.270715  0.93366  0.418713  0.411572  0.314466    1
3   E  0.253362  0.000000  2.033144  1.24488  1.395710  0.000000  0.209644    1
4   F  0.506724  0.285176  0.508286  1.40049  1.116568  0.823144  0.000000    1
5   F  0.000000  0.142588  1.779001  0.15561  0.139571  0.308679  0.262055    1
6   G  0.253362  0.427764  2.541430  1.08927  0.279142  0.102893  0.052411    1
7   G  0.886767  0.000000  0.000000  0.46683  1.256139  0.514465  0.052411    1
8   E  0.253362  0.998116  0.762429  1.24488  0.418713  0.720251  0.209644    1
9   D  1.013448  1.140704  1.270715  0.93366  1.256139  0.720251  0.419288    1
``````
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.