Iterate over pairs of columns and add value from one under condition in new column Python
Question:
I have a dataframe with steps/actions in user behaviour. A sample is provided. There are many steps. Each step consists of two columns: subtitle and dimension.
import numpy as np
import pandas as pd

df = pd.DataFrame({'idVisit': [1, 2, 3],
                   'subtitle (step 0)': ['download', 'homepage', 'www.example.com'],
                   'dimension1 (step 0)': ['client', np.nan, 'internal'],
                   'subtitle (step 1)': ['pageview', 'pageview', 'map'],
                   'dimension1 (step 1)': ['client', 'client', np.nan],
                   'subtitle (step 2)': ['download', 'homepage', 'www.example.com'],
                   'dimension1 (step 2)': ['client', np.nan, 'internal'],
                   'subtitle (step 3)': ['pageview', 'pageview', 'map'],
                   'dimension1 (step 3)': ['client', 'client', np.nan]})
I need to merge the subtitle and dimension columns of each step into a new column: if dimension is empty, keep only subtitle; otherwise keep only dimension.
So for the new column step0: if df['dimension1 (step 0)'] is not null, use df['dimension1 (step 0)'];
if df['dimension1 (step 0)'] is null, use df['subtitle (step 0)'].
The same is then repeated for step 1, and so on.
I am a complete newbie.
Expected output:
[In]: df['step0']
[Out]: ['client', 'homepage', 'internal']
[In]: df['step1']
[Out]: ['client', 'client', 'map']
# etc.
Answers:
Assume idVisit is the index. Then you may try the .combine_first() method on every odd column (dimension) with every even one (subtitle):
# set the index just in case
df.set_index('idVisit', inplace=True)
# loop over subtitles and dimensions zipped together and enumerated
for n, (subtitle, dimension) in enumerate(zip(df.columns[0::2], df.columns[1::2])):
    df[f'step {n}'] = df[dimension].combine_first(df[subtitle])
# show only added columns
df.iloc[:, 8:]
Output:
# only the added columns are shown
           step 0  step 1    step 2  step 3
idVisit
1          client  client    client  client
2        homepage  client  homepage  client
3        internal     map  internal     map
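The loop above pairs columns purely by position, so it depends on subtitle and dimension columns strictly alternating. If the column order cannot be relied on, one possible variation is to pair the columns by the step number in their names instead. The following is only a sketch under that assumption: the regular expression, the `steps` list and the `merged` frame are illustrative names that do not come from the question, and the sample is shortened to two steps.

```python
import re

import numpy as np
import pandas as pd

# Shortened version of the sample from the question (two steps only).
df = pd.DataFrame({
    'idVisit': [1, 2, 3],
    'subtitle (step 0)': ['download', 'homepage', 'www.example.com'],
    'dimension1 (step 0)': ['client', np.nan, 'internal'],
    'subtitle (step 1)': ['pageview', 'pageview', 'map'],
    'dimension1 (step 1)': ['client', 'client', np.nan],
})

# Collect the step numbers found in the "subtitle (step N)" column names.
matches = (re.fullmatch(r'subtitle \(step (\d+)\)', col) for col in df.columns)
steps = sorted(int(m.group(1)) for m in matches if m)

# Build the merged columns in a separate frame (hypothetical name: merged).
merged = df[['idVisit']].copy()
for n in steps:
    dimension = df[f'dimension1 (step {n})']
    subtitle = df[f'subtitle (step {n})']
    # Keep dimension where it is present, otherwise fall back to subtitle.
    merged[f'step {n}'] = dimension.combine_first(subtitle)

print(merged)
```

This keeps the same .combine_first() logic as the answer; only the way the two columns of a step are found differs. It assumes every subtitle column has a matching dimension1 column with the same step number, and it will raise a KeyError if one is missing.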