Split period into two dates when the date has the same delimiter

Question

Goal: derive period start and period end from the column period, in the form of
dd.mm.yyyy – dd.mm.yyyy

period

28-02-2022 - 30.09.2022    
31.01.2022 - 31.12.2022
28.02.2019 - 30-04-2020
20.01.2019-22.02.2020
19.03.2020- 24.05.2021
13.09.2022-12-10.2022

df[['period_start', 'period_end']] = df['period'].str.split('-', expand=True)

will not work.
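To see why the plain split fails, it helps to count the pieces each row produces. The sketch below uses a hypothetical three-row frame built from the sample data; the names `parts` and `n_parts` are illustrative only.

```python
import pandas as pd

# Hypothetical subset of the sample data from the question
df = pd.DataFrame({"period": [
    "28-02-2022 - 30.09.2022",
    "31.01.2022 - 31.12.2022",
    "20.01.2019-22.02.2020",
]})

# Splitting on every '-' also cuts inside dash-delimited dates
n_parts = df["period"].str.split("-").str.len()
print(n_parts.tolist())

# With expand=True the result is as wide as the longest split
parts = df["period"].str.split("-", expand=True)
print(parts.shape)
```

The first row yields four pieces instead of two, so the expanded result has more columns than the two target names.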

Expected output

period_start    period_end
28.02.2022      30.09.2022
31.01.2022      31.12.2022
28.02.2019      30.04.2020
20.01.2019      22.02.2020
19.03.2020      24.05.2021
13.09.2022      12.10.2022

Asked By: luc

Answer 1

We can use str.extract here for one option:

df[["period_start", "period_end"]] = (
    df["period"].str.extract(r'(\S+)\s*-\s*(\S+)')
                # extract returns a DataFrame, so use DataFrame.replace instead of .str.replace
                .replace('-', '.', regex=True)
)

Answered By: Tim Biegeleisen
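A greedy `\S+` group can mis-split a row such as `13.09.2022-12-10.2022`, where the separator has no spaces and the second date uses a dash. One possible variant, sketched here under the assumption that every date is exactly `dd`, `mm`, `yyyy` with `.` or `-` between them, spells out the date shape in the pattern. The names `date`, `pattern` and `extracted` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"period": [
    "28-02-2022 - 30.09.2022    ",
    "31.01.2022 - 31.12.2022",
    "28.02.2019 - 30-04-2020",
    "20.01.2019-22.02.2020",
    "19.03.2020- 24.05.2021",
    "13.09.2022-12-10.2022",
]})

# Assumed date shape: 2 digits, delimiter, 2 digits, delimiter, 4 digits
date = r"\d{2}[.-]\d{2}[.-]\d{4}"
pattern = rf"({date})\s*-\s*({date})"

extracted = df["period"].str.extract(pattern)

# Normalise any '-' inside the captured dates to '.'
extracted = extracted.apply(lambda s: s.str.replace("-", ".", regex=False))

df[["period_start", "period_end"]] = extracted
print(df[["period_start", "period_end"]])
```

Because each group has a fixed shape, the dash between the two dates is the only one the pattern can treat as the separator.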

Answer 2

Use a regex to split on the dash with surrounding spaces:

out = (df['period'].str.split(r'\s+-\s+', expand=True)
         .set_axis(['period_start', 'period_end'], axis=1)
       )

or to remove the column and create new ones:

df[['period_start', 'period_end']] = df.pop('period').str.split(r'\s+-\s+', expand=True)

Output (for the first three sample rows):

  period_start  period_end
0   28-02-2022  30.09.2022
1   31.01.2022  31.12.2022
2   28.02.2019  30-04-2020

Answered By: mozway
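For rows where the separator has no surrounding spaces, one possible extension is to split only on a dash that directly follows a four-digit year. This is a sketch, assuming every first date ends in `yyyy` and that the installed pandas supports the `regex` argument of `str.split` (added in pandas 1.4).

```python
import pandas as pd

df = pd.DataFrame({"period": [
    "28-02-2022 - 30.09.2022    ",
    "31.01.2022 - 31.12.2022",
    "28.02.2019 - 30-04-2020",
    "20.01.2019-22.02.2020",
    "19.03.2020- 24.05.2021",
    "13.09.2022-12-10.2022",
]})

out = (
    df["period"]
    .str.strip()  # drop stray leading/trailing whitespace
    # split on a dash (with optional spaces) that comes right after 4 digits
    .str.split(r"(?<=\d{4})\s*-\s*", expand=True, regex=True)
    .set_axis(["period_start", "period_end"], axis=1)
)
print(out)
```

The dates keep their original delimiters here, so a row like `28-02-2022` would still need its dashes replaced with dots afterwards.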

Answer 3

The problem is that you were splitting on a bare dash, and some rows contain several dashes. This works for rows where the separator is written with a space on each side:

df[['period_start', 'period_end']] = df['period'].str.split(' - ', expand=True)

because we split on space + dash + space, which never occurs inside a date.

Answered By: grymlin
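A literal `' - '` separator leaves rows without those spaces unsplit. A quick check, sketched here on the six sample rows with illustrative names `split` and `unsplit`, lists the rows that would need another approach.

```python
import pandas as pd

df = pd.DataFrame({"period": [
    "28-02-2022 - 30.09.2022",
    "31.01.2022 - 31.12.2022",
    "28.02.2019 - 30-04-2020",
    "20.01.2019-22.02.2020",
    "19.03.2020- 24.05.2021",
    "13.09.2022-12-10.2022",
]})

split = df["period"].str.split(" - ", expand=True)
split.columns = ["period_start", "period_end"]

# Rows that contain no ' - ' come back with a missing second column
unsplit = split["period_end"].isna()
print(df.loc[unsplit, "period"].tolist())
```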

Answer 4

You can use pandas.Series.str.extract to capture the two dates.

Try this :

out = (
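        # the unescaped "." in the pattern matches either delimiter ("-" or ".")
        df['period'].str.replace(r'\s+', '', regex=True)
                    .str.extract(r'(\d{2}.\d{2}.\d{4}).(\d{2}.\d{2}.\d{4})', expand=True)
                    # pandas 2.x may need format='mixed' here because the delimiters vary by row
                    .apply(pd.to_datetime, dayfirst=True)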
                    .apply(lambda x: x.dt.strftime('%d.%m.%Y'))
                    .rename(columns={0: 'period_start', 1: 'period_end'})
      )

# Output :

print(out)

  period_start  period_end
0   28.02.2022  30.09.2022
1   31.01.2022  31.12.2022
2   28.02.2019  30.04.2020
3   20.01.2019  22.02.2020
4   19.03.2020  24.05.2021
5   13.09.2022  12.10.2022

Answered By: abokey
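If the parsing step should not depend on format inference, one option is to normalise the delimiter first and then parse with an explicit format. The sketch below uses a hypothetical series of already-extracted date strings; the last value is deliberately invalid to show how `errors="coerce"` flags it.

```python
import pandas as pd

# Hypothetical extracted strings with mixed delimiters; the last one is not a real date
raw = pd.Series(["28-02-2022", "31.01.2022", "12-10.2022", "31.02.2022"])

# Make every delimiter a '.' so a single explicit format applies
normalised = raw.str.replace("-", ".", regex=False)

# Invalid dates become NaT instead of raising
parsed = pd.to_datetime(normalised, format="%d.%m.%Y", errors="coerce")

# Back to dd.mm.yyyy strings; NaT stays missing
formatted = parsed.dt.strftime("%d.%m.%Y")
print(formatted)
```

Parsing also acts as a validity check: a string that looks like a date but is not one ends up as a missing value rather than passing through silently.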
