How to use multiple string conditions and numerical calculations on multiple columns to create multiple columns?
Question:
Input:
(I am getting an error uploading an image; otherwise I would include one as usual.)
import pandas as pd
df = pd.DataFrame(
    {
        'keyword': ['apple store', 'apple marketing', 'apple store', 'apple marketing'],
        'rank': [10, 12, 10, 12],
        'impression': [100, 200, 100, 200],
        'landing page': ['nol.com/123', 'nol.com/123', 'oats.com/123', 'oats.com/123']
    }
)
df
Output:
import pandas as pd
df = pd.DataFrame(
    {
        'keyword': ['apple', 'store', 'marketing', 'apple', 'store', 'marketing'],
        'mean_rank': [11.0, 10.0, 12.0, 11.0, 10.0, 12.0],
        'impression': [300, 100, 200, 300, 100, 200],
        'landing page': ['nol.com/123', 'nol.com/123', 'nol.com/123', 'oats.com/123', 'oats.com/123', 'oats.com/123'],
        'keyword_length': [5, 5, 9, 5, 5, 9],
        'impression_per_char': [50.0, 16.67, 20.0, 50.0, 16.67, 20.0]
    }
)
df
Maybe this could be used to find the words in a keyword:
import re
words = 'apple store'
re.findall(r'\w+', words.casefold())
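The same regex idea can be applied to a whole column at once. This is a minimal sketch, assuming the intended pattern is \w+ (runs of word characters); the Series name keywords and its sample values are made up for illustration:

```python
import pandas as pd

# Hypothetical sample column; mixed case shows the effect of casefold
keywords = pd.Series(['Apple Store', 'apple marketing'])

# Casefold each keyword, then extract every run of word characters
tokens = keywords.str.casefold().str.findall(r'\w+')
print(tokens.tolist())
```

Each element of tokens is a list of words, which is the shape DataFrame.explode expects when turning words into separate rows.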
mean_rank = mean rank of the word over the keywords that contain it.
keyword_length = number of characters in the word.
impression_per_char = impression / (keyword_length + 1)
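As a quick check of these definitions against the expected output, here is the arithmetic for one word. It assumes, as the expected output suggests, that values are grouped per landing page and that impressions are summed over the keywords containing the word:

```python
# Worked check for the word 'apple' on landing page nol.com/123.
# It occurs in 'apple store' (rank 10, impression 100)
# and in 'apple marketing' (rank 12, impression 200).
word = 'apple'
ranks = [10, 12]
impressions = [100, 200]

mean_rank = sum(ranks) / len(ranks)          # plain average of the ranks
impression = sum(impressions)                # assumption: impressions are summed
keyword_length = len(word)                   # characters in the word
impression_per_char = impression / (keyword_length + 1)

print(mean_rank, impression, keyword_length, impression_per_char)
```

These values match the first row of the expected output (11.0, 300, 5, 50.0).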
The actual dataset has 10,000 rows. I made this sample myself, so please tell me if something is wrong with it. I'll be working on this in parallel for the next few hours.
Also, for the 'mean_rank' column, you can use a weighted mean or some made-up equation that (maybe also) uses 'impression', 'keyword_length' and/or 'impression_per_char' to find a sensible rank. If you do, I'll select that as the final answer instead.
Answers:
Use Series.str.casefold with Series.str.split and DataFrame.explode to get one row per word, get the word lengths with Series.str.len, then aggregate with sum and mean, and finally create the impression_per_char column:
# Split each casefolded keyword into words, one row per word
df = df.assign(keyword=df['keyword'].str.casefold().str.split()).explode('keyword')
# Length of each word
df['keyword_length'] = df['keyword'].str.len()
# Aggregate per word and landing page, keeping first-seen order
df = (df.groupby(['keyword', 'landing page', 'keyword_length'], as_index=False, sort=False)
        .agg(mean_rank=('rank', 'mean'), impression=('impression', 'sum')))
df['impression_per_char'] = df['impression'].div(df['keyword_length'].add(1))
print(df)
     keyword  landing page  keyword_length  mean_rank  impression  impression_per_char
0      apple   nol.com/123               5         11         300            50.000000
1      store   nol.com/123               5         10         100            16.666667
2  marketing   nol.com/123               9         12         200            20.000000
3      apple  oats.com/123               5         11         300            50.000000
4      store  oats.com/123               5         10         100            16.666667
5  marketing  oats.com/123               9         12         200            20.000000
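For the weighted-mean variant mentioned in the question, one possible sketch weights each rank by its impression count. Using impression as the weight, and the names words, rank_x_impression and weighted_rank, are assumptions made here for illustration; they are not part of the answer above:

```python
import pandas as pd

df = pd.DataFrame({
    'keyword': ['apple store', 'apple marketing', 'apple store', 'apple marketing'],
    'rank': [10, 12, 10, 12],
    'impression': [100, 200, 100, 200],
    'landing page': ['nol.com/123', 'nol.com/123', 'oats.com/123', 'oats.com/123'],
})

# One row per word; reset the index so labels are unique after explode
words = (df.assign(keyword=df['keyword'].str.casefold().str.split())
           .explode('keyword')
           .reset_index(drop=True))

# Numerator of the weighted mean: rank multiplied by its weight
words['rank_x_impression'] = words['rank'] * words['impression']

out = (words.groupby(['keyword', 'landing page'], as_index=False, sort=False)
            .agg(rank_x_impression=('rank_x_impression', 'sum'),
                 impression=('impression', 'sum')))

# Weighted mean = sum(rank * impression) / sum(impression)
out['weighted_rank'] = out['rank_x_impression'] / out['impression']
out = out.drop(columns='rank_x_impression')
print(out)
```

On the sample data this gives about 11.33 for apple instead of the unweighted 11.0, because the rank-12 keyword has twice the impressions of the rank-10 one. Words that occur in a single keyword per landing page (store, marketing) are unchanged.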