group-by

mark duplicate as 0 in new column based on condition

Question: I have a dataframe as below: data = [['a', 96.21623993, 1], ['a', 99.88211060, 1], ['b', 99.90232849, 1], ['b', 99.91232849, 1], ['b', 99.91928864, 1], ['c', 99.89162445, 1], ['d', 99.95264435, 1], ['a', 99.82862091, 2], ['a', 99.84466553, 2], ['b', 99.89685059, 2], ['c', 78.10614777, 2], ['c', 97.73305511, 2], ['d', 95.42383575, 2]] df = pd.DataFrame(data, columns=['ename', 'score', 'groupid']) df I need to mark duplicates as 0 in a new column, but NOT the one with the highest score, and …

Total answers: 2
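A minimal pandas sketch of the stated requirement, assuming "duplicate" means rows that share both `ename` and `groupid`, and using a hypothetical new column named `flag`. The rest of the condition after "and …" is elided in the excerpt, so it is not handled here.

```python
import pandas as pd

data = [['a', 96.21623993, 1], ['a', 99.88211060, 1], ['b', 99.90232849, 1],
        ['b', 99.91232849, 1], ['b', 99.91928864, 1], ['c', 99.89162445, 1],
        ['d', 99.95264435, 1], ['a', 99.82862091, 2], ['a', 99.84466553, 2],
        ['b', 99.89685059, 2], ['c', 78.10614777, 2], ['c', 97.73305511, 2],
        ['d', 95.42383575, 2]]
df = pd.DataFrame(data, columns=['ename', 'score', 'groupid'])

# Assumption: a "duplicate" is any row sharing ename within the same groupid.
# idxmax returns the index label of the highest score in each (groupid, ename) pair.
best_rows = df.groupby(['groupid', 'ename'])['score'].idxmax()

# 'flag' is a hypothetical column name: 1 keeps the best row, 0 marks the rest.
df['flag'] = 0
df.loc[best_rows.to_numpy(), 'flag'] = 1
print(df)
```

`idxmax` keeps only the first row when two scores tie; comparing `score` against `groupby(...)['score'].transform('max')` instead would keep every tied top score.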

How to Get Proper Count of Values in Python

Question: I tried asking this question earlier and miscommunicated what I am having trouble with. I have a dataset in Python that I am using NumPy and pandas on, and I am trying to get a count of reports by job type. There are 100+ …

Total answers: 2
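A minimal sketch of counting rows per category in pandas. The dataset itself is not shown in the excerpt, so the `reports` frame and its `job_type` column below are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset: one row per report.
reports = pd.DataFrame({
    'job_type': ['Electrician', 'Plumber', 'Electrician',
                 'Carpenter', 'Plumber', 'Electrician'],
})

# value_counts() gives one count per distinct job type, largest first.
counts = reports['job_type'].value_counts()
print(counts)

# groupby(...).size() gives the same numbers (sorted by key) and
# extends naturally to counting by several columns at once.
counts_by_group = reports.groupby('job_type').size()
print(counts_by_group)
```

If some reports have a missing job type, `value_counts()` drops them by default; `value_counts(dropna=False)` counts them as their own entry.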

Why do these different outlier methods fail to detect outliers?

Question: I am trying to find the outliers by group for my dataframe. I have two groups, Group1 and Group2, and I am trying to find the best way to implement an outlier method: data = {'Group1': ['A', 'A', 'A', 'B', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'A', 'A', 'A', 'B', 'B', 'B', 'A', 'A', 'A', 'B', 'B', 'B', 'A', 'A', 'A', 'A', 'A', 'B', 'B'], 'Group2': ['C', 'C', 'C', …

Total answers: 1
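A minimal sketch of one per-group outlier method (Tukey's 1.5 × IQR fences), assuming a hypothetical numeric column `Value`; the question's actual values and methods are elided in the excerpt, and the numbers below are made up.

```python
import pandas as pd

# Hypothetical data: the question's real values are elided in the excerpt.
df = pd.DataFrame({
    'Group1': ['A'] * 6 + ['B'] * 6,
    'Group2': ['C'] * 6 + ['D'] * 6,
    'Value': [10, 11, 12, 11, 10, 60, 5, 6, 5, 7, 6, 5],
})

# Quartiles are computed within each (Group1, Group2) group and broadcast
# back to the original rows, so each row is judged against its own group.
g = df.groupby(['Group1', 'Group2'])['Value']
q1 = g.transform(lambda s: s.quantile(0.25))
q3 = g.transform(lambda s: s.quantile(0.75))
iqr = q3 - q1

# Tukey's fences: anything beyond 1.5 * IQR from the quartiles is flagged.
df['outlier'] = (df['Value'] < q1 - 1.5 * iqr) | (df['Value'] > q3 + 1.5 * iqr)
print(df[df['outlier']])
```

One common reason such methods miss outliers is very small groups: a single extreme value pulls the quartiles with it. For the group [20, 21, 95], pandas' default linear interpolation gives Q1 = 20.5 and Q3 = 58, so the upper fence is 114.25 and 95 is not flagged.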

how to merge tuple and dataframe data

Question: Maybe my question is too basic, but I am learning Python. Let me know if you need more information. I have a dataframe as below. ID Model MVersion dId sGroup eName eValue 0 1 Main V15 40 GROUP 1 dNumber U220059090(C) 1 2 Main V15 40 GROUP …

Total answers: 2
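A minimal sketch of one way to combine tuple data with a dataframe: convert the tuples into a DataFrame with named columns, then merge on a shared key. The tuple contents are not shown in the excerpt, so `extra`, the `dLabel` column, and the choice of `dId` as the join key are all hypothetical.

```python
import pandas as pd

# First sample row from the question (later rows are elided there).
df = pd.DataFrame(
    [[1, 'Main', 'V15', 40, 'GROUP 1', 'dNumber', 'U220059090(C)']],
    columns=['ID', 'Model', 'MVersion', 'dId', 'sGroup', 'eName', 'eValue'],
)

# Hypothetical tuples of (dId, dLabel); the question's actual tuple is not shown.
extra = [(40, 'label-40'), (41, 'label-41')]

# Turn the tuples into a DataFrame with named columns so they can be joined.
extra_df = pd.DataFrame(extra, columns=['dId', 'dLabel'])

# A left merge keeps every row of df and attaches the matching dLabel
# (NaN where no tuple matches).
merged = df.merge(extra_df, on='dId', how='left')
print(merged)
```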

Adding rank column for every numerical column in pandas

Question: Here is an example of my dataframe (my actual dataframe has 20+ columns and 100+ rows): df = pd.DataFrame([['Jim', 93, 87, 66], ['Bob', 88, 90, 65], ['Joe', 72, 100, 70]], columns=['Name', 'Score1', 'Score2', 'Score3']) Name Score1 Score2 Score3 Jim 93 87 66 Bob 88 90 …

Total answers: 2
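A minimal sketch that adds one rank column per numeric column, however many there are. It assumes the highest score should get rank 1 and that ties share the lowest rank; the `_rank` suffix is a hypothetical naming choice.

```python
import pandas as pd

df = pd.DataFrame([['Jim', 93, 87, 66], ['Bob', 88, 90, 65], ['Joe', 72, 100, 70]],
                  columns=['Name', 'Score1', 'Score2', 'Score3'])

# Pick up every numeric column, so 'Name' is skipped automatically.
num_cols = df.select_dtypes(include='number').columns

# Rank each column independently; highest score gets rank 1 (assumption),
# and ties share the lowest rank ('min'), e.g. 1, 2, 2, 4.
ranks = (df[num_cols]
         .rank(ascending=False, method='min')
         .astype(int)
         .add_suffix('_rank'))

df = df.join(ranks)
print(df)
```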

How can I calculate a date differential in Python across multiple rows and columns?

Question: I'm trying to calculate the differential between the first Sent date/time in an ID and the last Received date/time in an ID, grouping them by Source and Destination. A sample (named test_subset) looks like this (but the full data has thousands of rows): …

Total answers: 1
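A minimal sketch using pandas named aggregation. The sample data is elided in the excerpt, so the column names (`ID`, `Source`, `Destination`, `Sent`, `Received`) and values below are hypothetical, and "first" and "last" are read as earliest and latest timestamps.

```python
import pandas as pd

# Hypothetical stand-in for test_subset; the real sample is elided.
test_subset = pd.DataFrame({
    'ID': [1, 1, 2, 2],
    'Source': ['X', 'X', 'Y', 'Y'],
    'Destination': ['Z', 'Z', 'Z', 'Z'],
    'Sent': pd.to_datetime(['2023-01-01 08:00', '2023-01-01 09:00',
                            '2023-01-02 10:00', '2023-01-02 10:30']),
    'Received': pd.to_datetime(['2023-01-01 08:30', '2023-01-01 11:00',
                                '2023-01-02 10:15', '2023-01-02 12:00']),
})

# Earliest Sent and latest Received per ID/Source/Destination combination.
summary = (test_subset
           .groupby(['ID', 'Source', 'Destination'], as_index=False)
           .agg(first_sent=('Sent', 'min'), last_received=('Received', 'max')))

# Subtracting two datetime columns yields a Timedelta column.
summary['elapsed'] = summary['last_received'] - summary['first_sent']
print(summary)
```

If "first" and "last" should follow row order rather than time order, the `'min'` and `'max'` aggregations can be swapped for `'first'` and `'last'`.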

How To Remove Specific Rows With Consecutive Values

Question: I have a pandas dataframe, df_next, that is a monthly aggregation of crime-type incidents for specific jurisdictions. For example, something like: ID Year_Month Total AL0010000 1991-01 2024 AL0010000 1991-02 3017 … … … WV0550300 2018-11 30147 WV0550300 2018-12 32148 I want to reduce the size …

Total answers: 2
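The exact removal criterion is elided in the excerpt ("I want to reduce the size …"). As a minimal sketch under one possible reading, this drops rows whose `Total` repeats the previous month's value for the same `ID`, keeping the first row of each run. The `Total` values and the extra months are made up.

```python
import pandas as pd

# Hypothetical rows in the question's layout; the Totals are made up.
df_next = pd.DataFrame({
    'ID': ['AL0010000'] * 4 + ['WV0550300'] * 3,
    'Year_Month': ['1991-01', '1991-02', '1991-03', '1991-04',
                   '2018-10', '2018-11', '2018-12'],
    'Total': [10, 12, 12, 12, 7, 7, 9],
})

# Compare each Total with the previous month's Total for the same ID;
# shift() yields NaN on the first row of each ID, so those rows are kept.
prev_total = df_next.groupby('ID')['Total'].shift()
reduced = df_next[df_next['Total'] != prev_total]
print(reduced)
```

This relies on the rows already being sorted by `ID` and `Year_Month`; `df_next.sort_values(['ID', 'Year_Month'])` would be needed first otherwise.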

Python Polars group by on both time and categorical values

Question: There is a polars dataframe which consists of 3 fields listed below. user_id date part_of_day i32 datetime[ns] cat 173367 2021-08-03 00:00:00 "day" 132702 2021-10-28 00:00:00 "evening" 100853 2021-07-29 00:00:00 "night" 305810 2021-08-24 00:00:00 "day" 305239 2021-08-13 00:00:00 "day" My task is to calculate the …

Total answers: 1

Ratio after a groupby in pyspark

Question: I have a PySpark df like this: +------------+-------------+ |Gender | Language| +------------+-------------+ | Male| Spanish| | Female| English| | Female| Indian| | Female| Spanish| | Female| Indian| | Male| English| | Male| English| | Female|Latin Spanish| | Male| Spanish| | Female| English| | Male| Indian| | Male| Catalan| …

Total answers: 1

Group rows partially [Python] [Pandas]

Question: Good morning everyone. I have the following data: import pandas as pd info = {'states': [-1, -1, -1, 1, 1, -1, 0, 1, 1, 1], 'values': [34, 29, 28, 30, 35, 33, 33, 36, 40, 41]} df = pd.DataFrame(data=info) print(df) >>> states values 0 -1 34 …

Total answers: 1
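The excerpt stops before the desired output, so this is a minimal sketch under one assumption: "group rows partially" means grouping only consecutive rows that share the same state, so the two separate runs of -1 stay apart (a plain `groupby('states')` would merge them). The `run`, `state`, `total` and `rows` names are illustrative.

```python
import pandas as pd

info = {
    'states': [-1, -1, -1, 1, 1, -1, 0, 1, 1, 1],
    'values': [34, 29, 28, 30, 35, 33, 33, 36, 40, 41],
}
df = pd.DataFrame(data=info)

# A new run starts wherever 'states' differs from the previous row;
# cumsum turns those change points into run ids 1, 2, 3, ...
run_id = (df['states'] != df['states'].shift()).cumsum().rename('run')

# Aggregate each run of equal consecutive states separately.
grouped = df.groupby(run_id).agg(state=('states', 'first'),
                                 total=('values', 'sum'),
                                 rows=('values', 'size'))
print(grouped)
```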