pandas.DataFrame.explode produces too many rows

Question:

Given the following data:

data = {'type': ['chisel', 'disc', 'user_defined'],
        'depth': [[152, 178, 203],  [127, 152, 178, 203], [0]],
        'residue': [[0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0], [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0], [0.0]],
        'timing': [["10-nov", "10-apr"], ["10-nov", "10-apr"], ["10-apr"]]}

Create df:

import pandas as pd

df = pd.DataFrame(data)

The resulting DataFrame is as expected.

explode timing:

df = df.explode('timing')

Output as expected:

  • One additional row for each item in timing

explode depth:

df = df.explode('depth')

Output not as expected:

  • I expect there to be 6 rows for chisel and 8 rows for disc
    • 3 each, for 10-apr & 10-nov, for chisel
    • 4 each, for 10-apr & 10-nov, for disc
  • explode is producing twice as many rows as expected
    • 12 instead of 6, for chisel
    • 16 instead of 8, for disc

Questions:

  • Is my expectation incorrect?
  • Am I using explode incorrectly?
Asked By: Trenton McKinney

Answers:

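pandas can produce unexpected results when the index contains duplicate labels. Notice that after the first explode the index is no longer unique: every new row keeps the label of the row it came from, so both chisel rows are labelled 0 and both disc rows are labelled 1. The doubled row counts in the second explode are consistent with the exploded depth values being matched back to the frame by index label, so each value lands on every row that shares that label.

Resetting the index before the second explode yields a DataFrame that behaves as you expect.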
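To see the duplicated labels directly, you can inspect the index after the first explode. This is a minimal sketch using the data from the question; the variable name exploded is only for illustration.

```python
import pandas as pd

data = {'type': ['chisel', 'disc', 'user_defined'],
        'depth': [[152, 178, 203], [127, 152, 178, 203], [0]],
        'residue': [[0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
                    [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
                    [0.0]],
        'timing': [["10-nov", "10-apr"], ["10-nov", "10-apr"], ["10-apr"]]}

df = pd.DataFrame(data)

# Without ignore_index, each exploded row keeps its original label
exploded = df.explode('timing')

print(exploded.index.tolist())   # [0, 0, 1, 1, 2]
print(exploded.index.is_unique)  # False
```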

Fix it with

df.explode('timing', ignore_index=True).explode('depth')

           type depth                              residue  timing
0        chisel   152  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-nov
0        chisel   178  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-nov
0        chisel   203  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-nov
1        chisel   152  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-apr
1        chisel   178  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-apr
1        chisel   203  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-apr
2          disc   127  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-nov
2          disc   152  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-nov
2          disc   178  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-nov
2          disc   203  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-nov
3          disc   127  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-apr
3          disc   152  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-apr
3          disc   178  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-apr
3          disc   203  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  10-apr
4  user_defined     0                                [0.0]  10-apr
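
As a rough check of the row counts, the sketch below resets the index with reset_index(drop=True) between the two explodes, which should be equivalent to passing ignore_index=True and also works on pandas versions older than 1.1.0, where the ignore_index parameter is not available. The names result and counts are only for illustration.

```python
import pandas as pd

data = {'type': ['chisel', 'disc', 'user_defined'],
        'depth': [[152, 178, 203], [127, 152, 178, 203], [0]],
        'residue': [[0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
                    [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
                    [0.0]],
        'timing': [["10-nov", "10-apr"], ["10-nov", "10-apr"], ["10-apr"]]}

df = pd.DataFrame(data)

# Give every row a unique label before exploding the second column
result = (df.explode('timing')
            .reset_index(drop=True)
            .explode('depth'))

# Count the rows produced for each type
counts = result.groupby('type').size().to_dict()
print(counts)  # {'chisel': 6, 'disc': 8, 'user_defined': 1}
```

The residue column is left as lists here, as in the output above.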
Answered By: rafaelc