# How do I operate on a DataFrame with a Series for every column?

## Question:

## The question

Given a `Series` `s` and `DataFrame` `df`, how do I operate on each column of `df` with `s`?

```
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=[0, 1],
    columns=['a', 'b', 'c']
)
s = pd.Series([3, 14], index=[0, 1])
```

When I attempt to add them, I get all `np.nan`

```
df + s
    a   b   c   0   1
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
```

What I thought I should get is

```
    a   b   c
0   4   5   6
1  18  19  20
```

## Objective and motivation

I’ve seen this kind of question several times over and have seen many other questions that involve some element of this. Most recently, I had to spend a bit of time explaining this concept in comments while looking for an appropriate canonical Q&A. I did not find one and so I thought I’d write one.

These questions usually arise with respect to a specific operation, but the answer applies equally to most arithmetic operations.

- How do I subtract a `Series` from every column in a `DataFrame`?
- How do I add a `Series` to every column in a `DataFrame`?
- How do I multiply every column in a `DataFrame` by a `Series`?
- How do I divide every column in a `DataFrame` by a `Series`?

## Answers:

It is helpful to create a mental model of what `Series` and `DataFrame` objects are.

# Anatomy of a `Series`

A `Series` should be thought of as an enhanced dictionary. This isn’t always a perfect analogy, but we’ll start here. There are other analogies you could make, but I am targeting a dictionary in order to demonstrate the purpose of this post.

`index`

These are the keys that we can reference to get at the corresponding values. When the elements of the index are unique, the comparison to a dictionary becomes very close.

`values`

These are the corresponding values that are keyed by the index.
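To make the dictionary analogy concrete, here is a minimal sketch. The name `prices` and its data are hypothetical and not part of the original post; the sketch only illustrates that the `index` plays the role of keys and `values` the role of dictionary values.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
prices = pd.Series([10, 11, 12], index=["a", "b", "c"])

# Look up a value by its key, as you would with a dictionary.
assert prices["b"] == 11

# The keys and the values are available separately.
keys = list(prices.index)
values = list(prices.values)

# A Series with unique labels can round-trip through a plain dict.
as_dict = prices.to_dict()
rebuilt = pd.Series(as_dict)
print(rebuilt.equals(prices))
```

The round trip only works this cleanly because the labels are unique, which is the case where the dictionary comparison is closest.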

# Anatomy of a `DataFrame`

A `DataFrame` should be thought of as a dictionary of `Series` or a `Series` of `Series`. In this case the keys are the column names and the values are the columns themselves as `Series` objects. Each `Series` agrees to share the same `index`, which is the index of the `DataFrame`.

`columns`

These are the keys that we can reference to get at the corresponding `Series`.

`index`

This is the index that all of the `Series` values agree to share.

## Note: RE: `columns` and `index` objects

They are the same kind of thing. A `DataFrame`'s `index` can be used as another `DataFrame`'s `columns`. In fact, this happens when you do `df.T` to get a transpose.
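A small sketch of the points above, using a hypothetical `frame` that is not part of the original post: each column comes back as a `Series` that shares the frame's `index`, and transposing swaps the roles of `index` and `columns`.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
frame = pd.DataFrame({"a": [1, 4], "b": [2, 5]}, index=["x", "y"])

# Looking up a column by its key returns a Series...
col = frame["a"]
print(type(col).__name__)

# ...and every column shares the index of the DataFrame.
shares_index = col.index.equals(frame.index) and frame["b"].index.equals(frame.index)

# Transposing swaps the two: the old index becomes the new columns.
transposed = frame.T
swapped = transposed.columns.equals(frame.index) and transposed.index.equals(frame.columns)
print(shares_index, swapped)
```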

`values`

This is a two-dimensional array that contains the data in a `DataFrame`. The reality is that `values` is *not* what is stored inside the `DataFrame` object. (Well, sometimes it is, but I’m not about to try to describe the block manager.) The point is, it is better to think of this as access to a two-dimensional array of the data.

# Define Sample Data

These are sample `pandas.Index` objects that can be used as the `index` of a `Series` or `DataFrame`, or as the `columns` of a `DataFrame`:

```
import numpy as np
import pandas as pd

idx_lower = pd.Index([*'abcde'], name='lower')
idx_range = pd.RangeIndex(5, name='range')
```

These are sample `pandas.Series` objects that use the `pandas.Index` objects above:

```
s0 = pd.Series(range(10, 15), idx_lower)
s1 = pd.Series(range(30, 40, 2), idx_lower)
s2 = pd.Series(range(50, 10, -8), idx_range)
```

These are sample `pandas.DataFrame` objects that use the `pandas.Index` objects above:

```
df0 = pd.DataFrame(100, index=idx_range, columns=idx_lower)
df1 = pd.DataFrame(
    # np.prod replaces np.product, which was removed in NumPy 2.0
    np.arange(np.prod(df0.shape)).reshape(df0.shape),
    index=idx_range, columns=idx_lower
)
```

# `Series` on `Series`

When operating on two `Series`, the alignment is obvious. You align the `index` of one `Series` with the `index` of the other.

```
s1 + s0
lower
a 40
b 43
c 46
d 49
e 52
dtype: int64
```

The result is the same when I randomly shuffle one of them before I operate. The indices will still align.

```
s1 + s0.sample(frac=1)
lower
a 40
b 43
c 46
d 49
e 52
dtype: int64
```

This is *not* the case when I instead operate with the values of the shuffled `Series`. In this case, Pandas doesn’t have the `index` to align with and therefore operates by position. Because the shuffle is random, your numbers will differ.

```
s1 + s0.sample(frac=1).values
lower
a 42
b 42
c 47
d 50
e 49
dtype: int64
```
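Since `sample(frac=1)` is random, the effect is easier to check with a deterministic reordering. The sketch below uses hypothetical `left` and `right` series (not the `s0` and `s1` from the post) and reverses one of them, then compares label-based alignment with positional addition.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
left = pd.Series([30, 32, 34], index=["a", "b", "c"])
right = pd.Series([10, 11, 12], index=["a", "b", "c"])

# Reverse the order: labels are now c, b, a with values 12, 11, 10.
reversed_right = right.iloc[::-1]

# Series + Series aligns on labels, so the reordering has no effect.
aligned = left + reversed_right

# Series + array has no labels to align on, so it adds by position:
# 30 + 12, 32 + 11, 34 + 10.
positional = left + reversed_right.to_numpy()

print(aligned.sort_index())
print(positional)
```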

Add a scalar:

```
s1 + 1
lower
a 31
b 33
c 35
d 37
e 39
dtype: int64
```

# `DataFrame` on `DataFrame`

The same is true when operating between two `DataFrame`s. The alignment is obvious and does what we think it should do:

```
df0 + df1
lower a b c d e
range
0 100 101 102 103 104
1 105 106 107 108 109
2 110 111 112 113 114
3 115 116 117 118 119
4 120 121 122 123 124
```

Shuffle the second `DataFrame` on both axes. The `index` and `columns` will still align and give us the same thing.

```
df0 + df1.sample(frac=1).sample(frac=1, axis=1)
lower a b c d e
range
0 100 101 102 103 104
1 105 106 107 108 109
2 110 111 112 113 114
3 115 116 117 118 119
4 120 121 122 123 124
```

Do the same shuffling, but add the underlying array instead of the `DataFrame`. Nothing is aligned any longer, so the results differ (and vary from run to run).

```
df0 + df1.sample(frac=1).sample(frac=1, axis=1).values
lower a b c d e
range
0 123 124 121 122 120
1 118 119 116 117 115
2 108 109 106 107 105
3 103 104 101 102 100
4 113 114 111 112 110
```

Add a one-dimensional array. It will align with columns and broadcast across rows.

```
df0 + [*range(2, df0.shape[1] + 2)]
lower a b c d e
range
0 102 103 104 105 106
1 102 103 104 105 106
2 102 103 104 105 106
3 102 103 104 105 106
4 102 103 104 105 106
```
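The same broadcasting can be checked on a non-square frame, where the direction is unambiguous. This is a sketch with a hypothetical `frame`; it assumes, as I understand the flex arithmetic methods, that `DataFrame.add` accepts a one-dimensional array together with `axis=0` when the array length matches the number of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical 2 x 3 frame, used only for illustration.
frame = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "c"])

# A length-3 array lines up with the three columns and repeats down the rows.
across_columns = frame + np.array([10, 20, 30])

# With axis=0, a length-2 array lines up with the two rows instead.
down_rows = frame.add(np.array([100, 200]), axis=0)

print(across_columns)
print(down_rows)
```

An array whose length matches neither axis raises an error instead of broadcasting.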

Add a scalar. There isn’t anything to align with, so broadcasts to everything:

```
df0 + 1
lower a b c d e
range
0 101 101 101 101 101
1 101 101 101 101 101
2 101 101 101 101 101
3 101 101 101 101 101
4 101 101 101 101 101
```

# `DataFrame` on `Series`

If `DataFrame`s are to be thought of as dictionaries of `Series`, and `Series` are to be thought of as dictionaries of values, then it is natural that when operating between a `DataFrame` and a `Series`, they should be aligned by their "keys".

```
s0:
lower    a    b    c    d    e
        10   11   12   13   14

df0:
lower    a    b    c    d    e
range
0      100  100  100  100  100
1      100  100  100  100  100
2      100  100  100  100  100
3      100  100  100  100  100
4      100  100  100  100  100
```

And when we operate, the `10` in `s0['a']` gets added to the entire column of `df0['a']`:

```
df0 + s0
lower a b c d e
range
0 110 111 112 113 114
1 110 111 112 113 114
2 110 111 112 113 114
3 110 111 112 113 114
4 110 111 112 113 114
```

### The heart of the issue and point of the post

What if I want to combine `s2` and `df0`?

```
      s2:              df0:
             |    lower    a    b    c    d    e
range        |    range
0      50    |    0      100  100  100  100  100
1      42    |    1      100  100  100  100  100
2      34    |    2      100  100  100  100  100
3      26    |    3      100  100  100  100  100
4      18    |    4      100  100  100  100  100
```

When I operate, I get all `np.nan`, as cited in the question:

```
df0 + s2
a b c d e 0 1 2 3 4
range
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
```

This does not produce what we wanted, because Pandas is aligning the `index` of `s2` with the `columns` of `df0`. The `columns` of the result are the union of the `index` of `s2` and the `columns` of `df0`.
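The union behaviour can be confirmed directly. The sketch below uses a hypothetical `frame` and `ser` with string row labels (chosen so that all labels are the same type), not the `df0` and `s2` from the post.

```python
import pandas as pd

# Hypothetical sample data: the Series is labelled like the frame's rows.
frame = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=["x", "y"],
    columns=["a", "b", "c"],
)
ser = pd.Series([3, 14], index=["x", "y"])

# The + operator matches ser's labels against frame's COLUMNS.
result = frame + ser

# No column label matches a Series label, so the result has the union
# of both label sets as columns and every cell is NaN.
print(list(result.columns))
print(result.isna().all().all())
```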

We could fake it out with a tricky transposition:

```
(df0.T + s2).T
lower a b c d e
range
0 150 150 150 150 150
1 142 142 142 142 142
2 134 134 134 134 134
3 126 126 126 126 126
4 118 118 118 118 118
```

But it turns out Pandas has a better solution. There are operation methods that allow us to pass an `axis` argument to specify the axis to align with.

- `-`: `sub`
- `+`: `add`
- `*`: `mul`
- `/`: `div`
- `**`: `pow`

And so the answer is simply:

```
df0.add(s2, axis='index')
lower a b c d e
range
0 150 150 150 150 150
1 142 142 142 142 142
2 134 134 134 134 134
3 126 126 126 126 126
4 118 118 118 118 118
```

It turns out `axis='index'` is synonymous with `axis=0`, just as `axis='columns'` is synonymous with `axis=1`:

```
df0.add(s2, axis=0)
lower a b c d e
range
0 150 150 150 150 150
1 142 142 142 142 142
2 134 134 134 134 134
3 126 126 126 126 126
4 118 118 118 118 118
```
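Applied to the data from the question, the same call produces the result the asker expected. This is a short sketch that rebuilds the question's `df` and `s` so it can run on its own; the name `fixed` is mine.

```python
import pandas as pd

# The DataFrame and Series from the question.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=[0, 1],
    columns=["a", "b", "c"],
)
s = pd.Series([3, 14], index=[0, 1])

# Align s with the row labels and add it to every column.
fixed = df.add(s, axis="index")
print(fixed)

# axis=0 is the same thing spelled with a number.
same = fixed.equals(df.add(s, axis=0))
print(same)
```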

### The rest of the operations

```
df0.sub(s2, axis=0)
lower a b c d e
range
0 50 50 50 50 50
1 58 58 58 58 58
2 66 66 66 66 66
3 74 74 74 74 74
4 82 82 82 82 82
```

```
df0.mul(s2, axis=0)
lower a b c d e
range
0 5000 5000 5000 5000 5000
1 4200 4200 4200 4200 4200
2 3400 3400 3400 3400 3400
3 2600 2600 2600 2600 2600
4 1800 1800 1800 1800 1800
```

```
df0.div(s2, axis=0)
lower a b c d e
range
0 2.000000 2.000000 2.000000 2.000000 2.000000
1 2.380952 2.380952 2.380952 2.380952 2.380952
2 2.941176 2.941176 2.941176 2.941176 2.941176
3 3.846154 3.846154 3.846154 3.846154 3.846154
4 5.555556 5.555556 5.555556 5.555556 5.555556
```

```
df0.pow(1 / s2, axis=0)
lower a b c d e
range
0 1.096478 1.096478 1.096478 1.096478 1.096478
1 1.115884 1.115884 1.115884 1.115884 1.115884
2 1.145048 1.145048 1.145048 1.145048 1.145048
3 1.193777 1.193777 1.193777 1.193777 1.193777
4 1.291550 1.291550 1.291550 1.291550 1.291550
```

It’s important to address some higher level concepts first. Since my motivation is to share knowledge and teach, I wanted to make this as clear as possible.

I prefer the method mentioned by piSquared (i.e., `df.add(s, axis=0)`), but another method uses `apply` together with `lambda` to perform an action on each column in the dataframe:

```
>>> df.apply(lambda col: col + s)
a b c
0 4 5 6
1 18 19 20
```

To apply the lambda function to the rows, use `axis=1`:

```
>>> df.T.apply(lambda row: row + s, axis=1)
0 1
a 4 18
b 5 19
c 6 20
```

This method could be useful when the transformation is more complex, e.g.:

```
df.apply(lambda col: 0.5 * col ** 2 + 2 * s - 3)
```
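For comparison, the same transformation can usually be written with the axis-aware methods as well. The sketch below rebuilds the question's `df` and `s` and checks that both routes agree; `via_apply` and `via_methods` are names I made up for the illustration.

```python
import pandas as pd

# The DataFrame and Series from the question.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=[0, 1],
    columns=["a", "b", "c"],
)
s = pd.Series([3, 14], index=[0, 1])

# Column-by-column: inside the lambda, col and s are both Series
# indexed by row label, so they align naturally.
via_apply = df.apply(lambda col: 0.5 * col ** 2 + 2 * s - 3)

# Whole-frame: do the scalar arithmetic on the frame first, then add
# the Series-dependent part along the row axis.
via_methods = (0.5 * df ** 2).add(2 * s - 3, axis=0)

# Raises if the two results differ beyond floating-point tolerance.
pd.testing.assert_frame_equal(via_apply, via_methods)
print(via_apply)
```

Which one reads better is a matter of taste; `apply` keeps the formula in one place, while the method chain avoids a Python-level loop over columns.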

Just to add an extra layer from my own experience, extending what others have done here: this shows how to operate with a `Series` on a `DataFrame` that has extra columns whose values you want to keep. Below is a short demonstration of the process.

```
import pandas as pd
d = [1.056323, 0.126681,
0.142588, 0.254143,
0.15561, 0.139571,
0.102893, 0.052411]
df = pd.Series(d, index = ['const', '426', '428', '424', '425', '423', '427', '636'])
print(df)
const 1.056323
426 0.126681
428 0.142588
424 0.254143
425 0.155610
423 0.139571
427 0.102893
636 0.052411
d2 = {
'loc': ['D', 'D', 'E', 'E', 'F', 'F', 'G', 'G', 'E', 'D'],
'426': [9, 2, 3, 2, 4, 0, 2, 7, 2, 8],
'428': [2, 4, 1, 0, 2, 1, 3, 0, 7, 8],
'424': [1, 10, 5, 8, 2, 7, 10, 0, 3, 5],
'425': [9, 2, 6, 8, 9, 1, 7, 3, 8, 6],
'423': [2, 7, 3, 10, 8, 1, 2, 9, 3, 9],
'427': [4, 10, 4, 0, 8, 3, 1, 5, 7, 7],
'636': [10, 5, 6, 4, 0, 5, 1, 1, 4, 8],
'seq': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
}
df2 = pd.DataFrame(d2)
print(df2)
loc 426 428 424 425 423 427 636 seq
0 D 9 2 1 9 2 4 10 1
1 D 2 4 10 2 7 10 5 1
2 E 3 1 5 6 3 4 6 1
3 E 2 0 8 8 10 0 4 1
4 F 4 2 2 9 8 8 0 1
5 F 0 1 7 1 1 3 5 1
6 G 2 3 10 7 2 1 1 1
7 G 7 0 0 3 9 5 1 1
8 E 2 7 3 8 3 7 4 1
9 D 8 8 5 6 9 7 8 1
```

To multiply a `DataFrame` by a `Series` and keep dissimilar columns:

- Create a list of the elements in the `DataFrame` and `Series` you want to operate on:

```
col = ['426', '428', '424', '425', '423', '427', '636']
```

- Perform your operation using the list and indicate the axis to use:

```
df2[col] = df2[col].mul(df[col], axis=1)
print(df2)
loc 426 428 424 425 423 427 636 seq
0 D 1.140129 0.285176 0.254143 1.40049 0.279142 0.411572 0.524110 1
1 D 0.253362 0.570352 2.541430 0.31122 0.976997 1.028930 0.262055 1
2 E 0.380043 0.142588 1.270715 0.93366 0.418713 0.411572 0.314466 1
3 E 0.253362 0.000000 2.033144 1.24488 1.395710 0.000000 0.209644 1
4 F 0.506724 0.285176 0.508286 1.40049 1.116568 0.823144 0.000000 1
5 F 0.000000 0.142588 1.779001 0.15561 0.139571 0.308679 0.262055 1
6 G 0.253362 0.427764 2.541430 1.08927 0.279142 0.102893 0.052411 1
7 G 0.886767 0.000000 0.000000 0.46683 1.256139 0.514465 0.052411 1
8 E 0.253362 0.998116 0.762429 1.24488 0.418713 0.720251 0.209644 1
9 D 1.013448 1.140704 1.270715 0.93366 1.256139 0.720251 0.419288 1
```
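If you would rather not type the shared labels by hand, one option is to compute them. This is a sketch on smaller, hypothetical data (`weights`, `table`, `shared` and `scaled` are my names); it assumes every label present in both objects should take part in the operation.

```python
import pandas as pd

# Hypothetical sample data: 'const' exists only in the Series,
# 'loc' and 'seq' exist only in the DataFrame.
weights = pd.Series({"const": 2.0, "x": 0.5, "y": 10.0})
table = pd.DataFrame(
    {"loc": ["D", "E"], "x": [2, 4], "y": [1, 3], "seq": [1, 1]}
)

# Labels present both as DataFrame columns and in the Series index.
shared = table.columns.intersection(weights.index)

# Multiply only the shared columns; work on a copy so table is untouched.
scaled = table.copy()
scaled[shared] = table[shared].mul(weights[shared], axis=1)

print(scaled)
```

The `loc` and `seq` columns pass through unchanged because they are never part of the selection.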