Iterate dataframe and sum transactions by condition
Question:
I have the following sample of data:
id year type num
1 1994 A 0
2 1950 A 2333
3 1977 B 4444
4 1995 B 555
1 1994 A 0
6 1955 A 333
7 2006 B 4123
6 1975 A 0
9 1999 B 123
3 1950 A 1234
I’m looking for the easiest way to sum column ‘num’ for the rows where type == ‘A’ and year < 1999.
I’m iterating through the DataFrame df like this:
import pandas as pd

data = pd.read_csv('data.csv')
df = pd.DataFrame(data)
df_sum = pd.DataFrame
for index, row in df.iterrows():
    if row['type'] == 'A' and row['year'] < 1999:
        df_sum = df_sum.append(row)  # This doesn't work
I’m trying to store the rows that match the conditions in df_sum, where I’d then sum num by id. I have no idea how to iterate over the rows and store the matching ones in a new DataFrame.
The desired output would be:
id num_sum
1 0
2 2333
6 333
.....
Answers:
You can use df.query() to accomplish that.
filtered_df = df.query('type == "A" and year < 1999')
sum_df = filtered_df.groupby("id")["num"].sum().reset_index()
print(sum_df)
Output:
id num
0 1 0
1 2 2333
2 3 1234
3 6 333
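To get a column named num_sum, as in the desired output, a minimal sketch is to use named aggregation (this assumes pandas 0.25 or later, and builds the sample data inline instead of reading data.csv). It uses a boolean mask rather than query(), which selects the same rows.

```python
import pandas as pd

# Sample data from the question, built inline instead of read from data.csv
df = pd.DataFrame({
    "id":   [1, 2, 3, 4, 1, 6, 7, 6, 9, 3],
    "year": [1994, 1950, 1977, 1995, 1994, 1955, 2006, 1975, 1999, 1950],
    "type": ["A", "A", "B", "B", "A", "A", "B", "A", "B", "A"],
    "num":  [0, 2333, 4444, 555, 0, 333, 4123, 0, 123, 1234],
})

# Keep only type A rows from before 1999
mask = (df["type"] == "A") & (df["year"] < 1999)

# Sum num per id; named aggregation sets the output column name to num_sum
sum_df = (
    df[mask]
    .groupby("id")
    .agg(num_sum=("num", "sum"))
    .reset_index()
)
print(sum_df)
```

This gives one row per id (1, 2, 3 and 6) with num_sum values 0, 2333, 1234 and 333.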
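For summarised data you could filter, group by id, sum, then reset the index. Note that sum('num') does not restrict the sum to the num column: every numeric column is summed, which is why year appears in the output with meaningless totals (e.g. 3988 = 1994 + 1994 for id 1).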
df_sum = df[((df.type=='A')&(df.year<1999))].groupby('id').sum('num').reset_index()
print(df_sum)
Output:
id year num
0 1 3988 0
1 2 1950 2333
2 3 1950 1234
3 6 3930 333
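If you want to keep the row-by-row loop from the question, one way to make it work is to collect the labels of the matching rows and select them all at once at the end. In the original loop, pd.DataFrame without parentheses is the class rather than an empty DataFrame, and DataFrame.append was deprecated and then removed in pandas 2.0. A minimal sketch, with the sample data built inline (an assumption, in place of data.csv); the vectorized answers above are usually much faster than iterrows on large frames.

```python
import pandas as pd

# Sample data from the question, built inline instead of read from data.csv
df = pd.DataFrame({
    "id":   [1, 2, 3, 4, 1, 6, 7, 6, 9, 3],
    "year": [1994, 1950, 1977, 1995, 1994, 1955, 2006, 1975, 1999, 1950],
    "type": ["A", "A", "B", "B", "A", "A", "B", "A", "B", "A"],
    "num":  [0, 2333, 4444, 555, 0, 333, 4123, 0, 123, 1234],
})

# Collect index labels of matching rows in a plain list (cheap to append to)
matching_labels = []
for index, row in df.iterrows():
    if row["type"] == "A" and row["year"] < 1999:
        matching_labels.append(index)

# Select all matching rows in one step; this keeps the original column dtypes
filtered_df = df.loc[matching_labels]

# Sum num per id and name the result column num_sum
df_sum = filtered_df.groupby("id")["num"].sum().reset_index(name="num_sum")
print(df_sum)
```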