Replace values based on column names and other dataframe
Question:
Let’s say we have the following "cities" dataframe with cities as column names:
NY LA Rome London Milan
date
2023-01-01 1 81 26 55 95
2023-01-02 92 42 96 98 7
2023-01-03 14 4 60 88 73
In another "countries" dataframe I have cities and their countries:
City Country
0 NY US
1 LA US
2 London UK
3 Rome Italy
4 Milan Italy
I want to change values in the "cities" dataframe, replacing the value of each city with the median value of all cities in the same country on the date. Here’s the expected output. For example on 2023-01-01 the value for NY (41) is the median of 1 and 81.
NY LA Rome London Milan
date
2023-01-01 41 41 60.5 55 60.5
2023-01-02 67 67 51.5 98 51.5
2023-01-03 9 9 66.5 88 66.5
I think I need to use groupby but couldn’t make it work. Any help? Thanks.
Answers:
Lets reconstruct your data
import pandas as pd
cities = pd.DataFrame({'NY': {'2023-01-01': 1, '2023-01-02': 92, '2023-01-03': 14},
'LA': {'2023-01-01': 81, '2023-01-02': 42, '2023-01-03': 4},
'Rome': {'2023-01-01': 26, '2023-01-02': 96, '2023-01-03': 60},
'London': {'2023-01-01': 55, '2023-01-02': 98, '2023-01-03': 88},
'Milan': {'2023-01-01': 95, '2023-01-02': 7, '2023-01-03': 73}})
cities = cities.rename_axis('date')
countries = pd.DataFrame({'City': ['NY', 'LA', 'London', 'Rome', 'Milan'],
'Country': ['US', 'US', 'UK', 'Italy', 'Italy']})
Melt the cities df and merge w/countries. Then take the mean and pivot.
df = cities.reset_index().melt(id_vars='date', var_name='City').merge(countries, on='City')
df['mean'] = df.groupby(['date','Country'])['value'].transform('mean')
df = df.pivot_table(index='date', columns='City', values='mean')
print(df)
Output
City LA London Milan NY Rome
date
2023-01-01 41.0 55.0 60.5 41.0 60.5
2023-01-02 67.0 98.0 51.5 67.0 51.5
2023-01-03 9.0 88.0 66.5 9.0 66.5
We can also take advantage of pandas method chaining here:
import pandas as pd
imoprt numpy as np
(df.reset_index().melt(id_vars='date')
.merge(df2, left_on='variable', right_on='index')
.assign(value = lambda d: d.groupby(['date', 'City Country']).transform(np.median))
.pivot(index='date', columns='index', values='value'))
index LA London Milan NY Rome
date
2023-01-01 41.0 55.0 60.5 41.0 60.5
2023-01-02 67.0 98.0 51.5 67.0 51.5
2023-01-03 9.0 88.0 66.5 9.0 66.5
Let’s say we have the following "cities" dataframe with cities as column names:
NY LA Rome London Milan
date
2023-01-01 1 81 26 55 95
2023-01-02 92 42 96 98 7
2023-01-03 14 4 60 88 73
In another "countries" dataframe I have cities and their countries:
City Country
0 NY US
1 LA US
2 London UK
3 Rome Italy
4 Milan Italy
I want to change values in the "cities" dataframe, replacing the value of each city with the median value of all cities in the same country on the date. Here’s the expected output. For example on 2023-01-01 the value for NY (41) is the median of 1 and 81.
NY LA Rome London Milan
date
2023-01-01 41 41 60.5 55 60.5
2023-01-02 67 67 51.5 98 51.5
2023-01-03 9 9 66.5 88 66.5
I think I need to use groupby but couldn’t make it work. Any help? Thanks.
Lets reconstruct your data
import pandas as pd
cities = pd.DataFrame({'NY': {'2023-01-01': 1, '2023-01-02': 92, '2023-01-03': 14},
'LA': {'2023-01-01': 81, '2023-01-02': 42, '2023-01-03': 4},
'Rome': {'2023-01-01': 26, '2023-01-02': 96, '2023-01-03': 60},
'London': {'2023-01-01': 55, '2023-01-02': 98, '2023-01-03': 88},
'Milan': {'2023-01-01': 95, '2023-01-02': 7, '2023-01-03': 73}})
cities = cities.rename_axis('date')
countries = pd.DataFrame({'City': ['NY', 'LA', 'London', 'Rome', 'Milan'],
'Country': ['US', 'US', 'UK', 'Italy', 'Italy']})
Melt the cities df and merge w/countries. Then take the mean and pivot.
df = cities.reset_index().melt(id_vars='date', var_name='City').merge(countries, on='City')
df['mean'] = df.groupby(['date','Country'])['value'].transform('mean')
df = df.pivot_table(index='date', columns='City', values='mean')
print(df)
Output
City LA London Milan NY Rome
date
2023-01-01 41.0 55.0 60.5 41.0 60.5
2023-01-02 67.0 98.0 51.5 67.0 51.5
2023-01-03 9.0 88.0 66.5 9.0 66.5
We can also take advantage of pandas method chaining here:
import pandas as pd
imoprt numpy as np
(df.reset_index().melt(id_vars='date')
.merge(df2, left_on='variable', right_on='index')
.assign(value = lambda d: d.groupby(['date', 'City Country']).transform(np.median))
.pivot(index='date', columns='index', values='value'))
index LA London Milan NY Rome
date
2023-01-01 41.0 55.0 60.5 41.0 60.5
2023-01-02 67.0 98.0 51.5 67.0 51.5
2023-01-03 9.0 88.0 66.5 9.0 66.5