pandas.DataFrame.explode produces too many rows

Question

Given the following data:

data = {'type': ['chisel', 'disc', 'user_defined'],
        'depth': [[152, 178, 203],  [127, 152, 178, 203], [0]],
        'residue': [[0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0], [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0], [0.0]],
        'timing': [["10-nov", "10-apr"], ["10-nov", "10-apr"], ["10-apr"]]}

Create `df`:

import pandas as pd

df = pd.DataFrame(data)

Output as expected:

`explode` `timing`:

df = df.explode('timing')

Output as expected:

One additional row for each item in timing

`explode` `depth`:

df = df.explode('depth')

Output not as expected:

I expect there to be 6 rows for chisel and 8 rows for disc:
- 3 each, for 10-apr & 10-nov, for chisel
- 4 each, for 10-apr & 10-nov, for disc
`explode` is producing twice as many rows as expected:
- 12 instead of 6, for chisel
- 16 instead of 8, for disc

Questions:

Is my expectation incorrect?
Am I using explode incorrectly?

Asked By: Trenton McKinney

Answer 1

pandas can produce unexpected results when the index contains duplicate labels. After the first `explode`, each new row keeps the index label of the row it came from, so the index is no longer unique.
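
The duplicated labels are easy to check directly. A minimal sketch using the question's data (the names `exploded`, `index_labels` and `index_is_unique` are only illustrative):

```python
import pandas as pd

data = {'type': ['chisel', 'disc', 'user_defined'],
        'depth': [[152, 178, 203], [127, 152, 178, 203], [0]],
        'residue': [[0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
                    [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
                    [0.0]],
        'timing': [["10-nov", "10-apr"], ["10-nov", "10-apr"], ["10-apr"]]}
df = pd.DataFrame(data)

# Each exploded row keeps the index label of the row it came from,
# so rows 0 and 1 (two timings each) now appear twice in the index.
exploded = df.explode('timing')
index_labels = exploded.index.tolist()
index_is_unique = exploded.index.is_unique

print(index_labels)     # [0, 0, 1, 1, 2]
print(index_is_unique)  # False
```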

Resetting the index after the first `explode` yields a DataFrame that behaves as you expect.

Fix it by passing `ignore_index=True` to the first `explode`:

df.explode('timing', ignore_index=True).explode('depth')

           type depth                              residue  timing
0        chisel   152  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-nov
0        chisel   178  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-nov
0        chisel   203  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-nov
1        chisel   152  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-apr
1        chisel   178  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-apr
1        chisel   203  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-apr
2          disc   127  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-nov
2          disc   152  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-nov
2          disc   178  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-nov
2          disc   203  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-nov
3          disc   127  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-apr
3          disc   152  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-apr
3          disc   178  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-apr
3          disc   203  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-apr
4  user_defined     0                                [0.0]  10-apr
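
To check that these row counts match the expectation in the question, the result can be grouped by `type` and `timing`. This is only a sketch: it assumes pandas 1.1 or later, where `ignore_index` is available, and the names `fixed` and `counts` are illustrative.

```python
import pandas as pd

data = {'type': ['chisel', 'disc', 'user_defined'],
        'depth': [[152, 178, 203], [127, 152, 178, 203], [0]],
        'residue': [[0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
                    [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
                    [0.0]],
        'timing': [["10-nov", "10-apr"], ["10-nov", "10-apr"], ["10-apr"]]}
df = pd.DataFrame(data)

# Reset the index on the first explode so the second one starts
# from unique labels (assumes pandas >= 1.1 for ignore_index).
fixed = df.explode('timing', ignore_index=True).explode('depth')

# Count rows per (type, timing) pair.
counts = fixed.groupby(['type', 'timing']).size()

print(len(fixed))  # 15
print(counts)
```

On versions without `ignore_index`, calling `.reset_index(drop=True)` between the two `explode` calls should have the same effect.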

Answered By: rafaelc
