Duplicate row based on specific conditions in column
Question:
As input I have a table like this example:
ID_EXAMPLE
KRED_NUMMER
SYSTEM
BUKRS
1
100506
PPL
0002 (PG 1030);#1025;#1340
2
1656082
PASM
3333
3
1656099
DFMG
0716;#6008;#6104
And as you can see last column BUKRS can have different value formats. And I’d like for each code (4 digits) in BUKRS to create a separate row, but with the same values in other columns. Based on 2 conditions:
- duplicate row for first 4 digits in BUKRS (if there is a string in the brackets, ignore it)
- duplicate row for each 4 digits in BUKRS after #
The expected result:
ID_EXAMPLE
KRED_NUMMER
SYSTEM
BUKRS
My Comment
1
100506
PPL
0002
always take first 4 digits in BUKRS
1
100506
PPL
1025
4 digits after each # (after first #)
1
100506
PPL
1340
4 digits after each # (after second #)
2
1656082
PASM
3333
always take first 4 digits in field
3
1656099
DFMG
0716
always take first 4 digits in field
3
1656099
DFMG
6008
4 digits after each # (after first #)
3
1656099
DFMG
6104
4 digits after each # (after second #)
Answers:
You can use str.findall
and a short regex to get the numbers, then explode
:
out = (df
.assign(BUKRS=df['BUKRS'].str.findall(r'(?:^|#)(d{4})'))
.explode('BUKRS')
)
output:
ID_EXAMPLE KRED_NUMMER SYSTEM BUKRS
0 1 100506 PPL 0002
0 1 100506 PPL 1025
0 1 100506 PPL 1340
1 2 1656082 PASM 3333
2 3 1656099 DFMG 0716
2 3 1656099 DFMG 6008
2 3 1656099 DFMG 6104
As input I have a table like this example:
ID_EXAMPLE | KRED_NUMMER | SYSTEM | BUKRS |
---|---|---|---|
1 | 100506 | PPL | 0002 (PG 1030);#1025;#1340 |
2 | 1656082 | PASM | 3333 |
3 | 1656099 | DFMG | 0716;#6008;#6104 |
And as you can see last column BUKRS can have different value formats. And I’d like for each code (4 digits) in BUKRS to create a separate row, but with the same values in other columns. Based on 2 conditions:
- duplicate row for first 4 digits in BUKRS (if there is a string in the brackets, ignore it)
- duplicate row for each 4 digits in BUKRS after #
The expected result:
ID_EXAMPLE | KRED_NUMMER | SYSTEM | BUKRS | My Comment |
---|---|---|---|---|
1 | 100506 | PPL | 0002 | always take first 4 digits in BUKRS |
1 | 100506 | PPL | 1025 | 4 digits after each # (after first #) |
1 | 100506 | PPL | 1340 | 4 digits after each # (after second #) |
2 | 1656082 | PASM | 3333 | always take first 4 digits in field |
3 | 1656099 | DFMG | 0716 | always take first 4 digits in field |
3 | 1656099 | DFMG | 6008 | 4 digits after each # (after first #) |
3 | 1656099 | DFMG | 6104 | 4 digits after each # (after second #) |
You can use str.findall
and a short regex to get the numbers, then explode
:
out = (df
.assign(BUKRS=df['BUKRS'].str.findall(r'(?:^|#)(d{4})'))
.explode('BUKRS')
)
output:
ID_EXAMPLE KRED_NUMMER SYSTEM BUKRS
0 1 100506 PPL 0002
0 1 100506 PPL 1025
0 1 100506 PPL 1340
1 2 1656082 PASM 3333
2 3 1656099 DFMG 0716
2 3 1656099 DFMG 6008
2 3 1656099 DFMG 6104