Replace values based on column names and other dataframe

Question

Let’s say we have the following "cities" dataframe with cities as column names:

            NY  LA  Rome  London  Milan
date                                   
2023-01-01   1  81    26      55     95
2023-01-02  92  42    96      98      7
2023-01-03  14   4    60      88     73

In another "countries" dataframe I have cities and their countries:

     City Country
0      NY      US
1      LA      US
2  London      UK
3    Rome   Italy
4   Milan   Italy

I want to change values in the "cities" dataframe, replacing the value of each city with the median value of all cities in the same country on the date. Here’s the expected output. For example on 2023-01-01 the value for NY (41) is the median of 1 and 81.

            NY  LA  Rome  London  Milan
date                                   
2023-01-01  41  41  60.5      55   60.5
2023-01-02  67  67  51.5      98   51.5
2023-01-03   9   9  66.5      88   66.5

I think I need to use groupby but couldn’t make it work. Any help? Thanks.

Asked By: younggotti

||

Source

Answer 1

Lets reconstruct your data

import pandas as pd
cities = pd.DataFrame({'NY': {'2023-01-01': 1, '2023-01-02': 92, '2023-01-03': 14},
 'LA': {'2023-01-01': 81, '2023-01-02': 42, '2023-01-03': 4},
 'Rome': {'2023-01-01': 26, '2023-01-02': 96, '2023-01-03': 60},
 'London': {'2023-01-01': 55, '2023-01-02': 98, '2023-01-03': 88},
 'Milan': {'2023-01-01': 95, '2023-01-02': 7, '2023-01-03': 73}})

cities = cities.rename_axis('date')

countries = pd.DataFrame({'City': ['NY', 'LA', 'London', 'Rome', 'Milan'],
 'Country': ['US', 'US', 'UK', 'Italy', 'Italy']})

Melt the cities df and merge w/countries. Then take the mean and pivot.

df = cities.reset_index().melt(id_vars='date', var_name='City').merge(countries, on='City')
df['mean'] = df.groupby(['date','Country'])['value'].transform('mean')

df = df.pivot_table(index='date', columns='City', values='mean')

print(df)

Output

City          LA  London  Milan    NY  Rome
date                                       
2023-01-01  41.0    55.0   60.5  41.0  60.5
2023-01-02  67.0    98.0   51.5  67.0  51.5
2023-01-03   9.0    88.0   66.5   9.0  66.5

Answered By: Chris

Answer 2

We can also take advantage of pandas method chaining here:

import pandas as pd
imoprt numpy as np

(df.reset_index().melt(id_vars='date')
 .merge(df2, left_on='variable', right_on='index')
 .assign(value = lambda d: d.groupby(['date', 'City Country']).transform(np.median))
 .pivot(index='date', columns='index', values='value'))

index         LA  London  Milan    NY  Rome
date                                       
2023-01-01  41.0    55.0   60.5  41.0  60.5
2023-01-02  67.0    98.0   51.5  67.0  51.5
2023-01-03   9.0    88.0   66.5   9.0  66.5

Answered By: Anoushiravan R

Replace values based on column names and other dataframe

Question:

Answers: