Split Dataframe when hyphen appears

Question:

enter image description here

I have a Dataframe with approx. 70000 rows and 6 columns. The inputs are numbers expect of some hyphens. I want to split the dataframe (by rows) every time a hyphen appears. The row with the hyphen can be deleted.

Example of Dataframe:

Timestamp ; power(kW) ; ....
2020-01-01 17:50:10 ; 4.32 ; ... 
2020-01-01 17:55:15 ; 4.30 ; ...
2020-01-01 18:00:20 ; 3.20 ; ...
2020-01-01 18:05:25 ; - ; ...
2020-01-03 12:00:20 ; 6.20 ; ...

Expectet Outcome:
2 Dataframe

Asked By: luscgu00

||

Answers:

You can use:

# identify rows with "-"
m = df['power(kW)'].eq('-')
# or based on non-numbers:
# m = pd.to_numeric(df['power(kW)'], errors='coerce').isna()

# split per group while removing the invalid rows
subdfs = [d for _,d in df[~m].groupby(m.cumsum())]

output list:

[             Timestamp power(kW) ....
 0  2020-01-01 17:50:10      4.32  ...
 1  2020-01-01 17:55:15      4.30  ...
 2  2020-01-01 18:00:20      3.20  ...,

              Timestamp power(kW) ....
 4  2020-01-03 12:00:20      6.20  ...]

Accessing subdataframes:

subdf[0]

             Timestamp power(kW) ....
0  2020-01-01 17:50:10      4.32  ...
1  2020-01-01 17:55:15      4.30  ...
2  2020-01-01 18:00:20      3.20  ...

NB. because your initial data had strings, the dtype of the output will be string/object. You must convert the types if you plan to perform vectorial operations.

One option:

subdfs = [d for _,d in df[~m].astype({'Timestamp': 'datetime64',
                                      'power(kW)': float})
                             .groupby(m.cumsum())]
Answered By: mozway
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.