Is there a way to transform a pandas DataFrame so that a particular column's value acts as an ID?


I am working with a metrics dataset. I want to transform the given dataset into something where stringValue == ‘IN’ will be the ID of the row so that we can filter/extract info based on the stringValue == ‘IN’. We will have to group by timeComputed as well.

The following image is of the dataset that we have as an input:

[Image: input dataset; not available]

Our ultimate goal is to find other metrics for the specific country. Here the country is India – ‘IN’ (there will be different countries in the dataset). I want to find ‘col_stats:SUM:Quantity’ or other similar metrics for the country ‘IN’ given the same ‘timeComputed’.

I can do it by extracting ‘IN’ first, then getting its timeComputed, and then searching for other metrics with that timeComputed. But this seems like overkill.

I am expecting the resulting dataset to look similar to the following:

countryCode timeComputed metricId
IN 2021-04-04 records:COUNT_RECORDS
KR 2022-05-05 col_stats:SUM:Quantity

@jezrael I tried the updated solution and it gives me a dataframe as follows:

[Image: resulting dataframe; not available]

So now we need a solution where, in the output dataframe, every metricId for a given timeComputed (other than countryCode) becomes its own column:

countryCode timeComputed reporting:METRICS_COMPUTATION_DURATION basic:COUNT_COLUMNS col_stats:COUNT_NULL:EndCustomerAccount
IN 2023-02-21 13:28:15.705000+00:00 2282 25 75229
IN 2023-02-21 13:28:38.354000+00:00 2765 25 75229
Asked By: subtle_code



If you need the partition and timeComputed of the IN rows, and then all rows that match both, use:

df1 = df.loc[df['stringValue'].eq('IN'), ['partition','timeComputed']]

df2 = df.merge(df1.drop_duplicates())[['stringValue','timeComputed','metricId']]
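To make the merge approach easier to try out, here is a minimal, self-contained sketch on toy data. The rows, the `partition` values and the metricId `col_stats:MODE:countryCode` are made up for illustration; only the column names `partition`, `timeComputed`, `metricId` and `stringValue` come from the question.

```python
import pandas as pd

# Toy data (hypothetical values); the real dataset is only shown as an image.
df = pd.DataFrame({
    'partition': ['p1', 'p1', 'p1', 'p2', 'p2'],
    'timeComputed': ['2021-04-04', '2021-04-04', '2021-04-04',
                     '2022-05-05', '2022-05-05'],
    'metricId': ['col_stats:MODE:countryCode',   # hypothetical metric name
                 'records:COUNT_RECORDS',
                 'col_stats:SUM:Quantity',
                 'col_stats:MODE:countryCode',
                 'col_stats:SUM:Quantity'],
    'stringValue': ['IN', None, None, 'KR', None],
})

# Keys (partition, timeComputed) of the rows whose stringValue is 'IN'.
df1 = df.loc[df['stringValue'].eq('IN'), ['partition', 'timeComputed']]

# Inner merge on the shared columns keeps every row with a matching key.
df2 = df.merge(df1.drop_duplicates())[['stringValue', 'timeComputed', 'metricId']]
print(df2)
```

With no `on` argument, `merge` joins on the columns the two frames have in common, here `partition` and `timeComputed`, so only the rows belonging to the same computation as the IN row are kept.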

If you need only the timeComputed of the IN rows, and then all rows that match it, use:

s = df.loc[df['stringValue'].eq('IN'), 'timeComputed']

df2 = df.loc[df['timeComputed'].isin(s), ['stringValue','timeComputed','metricId']]
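For the follow-up in the question (one column per metricId for each timeComputed), one possible direction is to pivot the numeric metrics and attach the country code afterwards. This is only a sketch under assumptions: the numeric column name `doubleValue`, the metricId `col_stats:MODE:countryCode`, and all values are hypothetical, and it assumes each timeComputed has exactly one non-null stringValue, which is the country code.

```python
import pandas as pd

# Toy data (hypothetical values and hypothetical 'doubleValue' column).
df = pd.DataFrame({
    'timeComputed': ['t1', 't1', 't1', 't2', 't2', 't2'],
    'metricId': ['col_stats:MODE:countryCode', 'basic:COUNT_COLUMNS',
                 'col_stats:SUM:Quantity',
                 'col_stats:MODE:countryCode', 'basic:COUNT_COLUMNS',
                 'col_stats:SUM:Quantity'],
    'stringValue': ['IN', None, None, 'KR', None, None],
    'doubleValue': [None, 25.0, 100.0, None, 30.0, 200.0],
})

# Country code per timeComputed, taken from the rows that carry a stringValue.
country = (df.dropna(subset=['stringValue'])
             .drop_duplicates('timeComputed')
             .set_index('timeComputed')['stringValue']
             .rename('countryCode'))

# One column per numeric metricId, one row per timeComputed.
wide = (df.dropna(subset=['doubleValue'])
          .pivot(index='timeComputed', columns='metricId', values='doubleValue'))

# Attach the country code and put it first.
out = wide.join(country).reset_index()
out = out[['countryCode', 'timeComputed'] + list(wide.columns)]

# Filter for a single country.
in_only = out[out['countryCode'].eq('IN')]
print(in_only)
```

`pivot` raises an error if the same (timeComputed, metricId) pair occurs more than once; in that case `pivot_table` with an explicit `aggfunc` may be a better fit.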
Answered By: jezrael