How to save multiple values on different rows as a variable or list in a CSV using Python Pandas
Question:
I’m currently trying to iterate through a dataframe/csv and compare the dates of the rows with the same ID. If the dates are different or are a certain time-frame apart I want to create a ‘1’ in another column (not shown) to mark that ID and row/s.
I’m looking to save the DATE values as variables and compare them against other DATE variables with the same ID. If the dates are a set amount of time apart, I’ll create a 1 in another column on the same row.
ID | DATE |
---|---|
1 | 11/11/2011 |
1 | 11/11/2011 |
2 | 5/05/2011 |
2 | 20/06/2011 |
3 | 2/04/2011 |
3 | 10/08/2011 |
4 | 8/12/2011 |
4 | 1/02/2012 |
4 | 12/03/2012 |
For this post, I’m mainly looking to save the multiple values as variables or a list. I’m hoping to figure out the rest once this roadblock has been removed.
Here’s what I got currently, but I don’t think it’ll be much help. Currently it iterates through and converts the date strings to dates. Which is what I want to happen AFTER getting a list of all the dates with the same ID value.
import pandas as pd
from datetime import *

filename = 'TestData.csv'
df = pd.read_csv(filename)
print (df.iloc[0,1])

x = 0
for i in df.iloc:
    FixDate = df.iloc[x, 1]
    d1, m1, y1 = FixDate.split('/')
    d1 = int(d1)
    m1 = int(m1)
    y1 = int(y1)
    finaldate = date(y1, m1, d1)
    print(finaldate)
    x = x + 1
Any help is appreciated, thank you!
Answers:
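In pandas it is best to avoid loops for performance reasons. If you need a new column that tests whether all DATE values within each ID group are the same, use GroupBy.transform with DataFrameGroupBy.nunique and then compare the result with 1 (use .ne(1) instead of .eq(1) if you want to flag the groups whose dates differ):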
df = pd.read_csv(filename)
df['test'] = df.groupby('ID')['DATE'].transform('nunique').eq(1).astype(int)
print (df)
   ID        DATE  test
0   1  11/11/2011     1
1   1  11/11/2011     1
2   2   5/05/2011     0
3   2  20/06/2011     0
4   3   2/04/2011     0
5   3  10/08/2011     0
6   4   8/12/2011     0
7   4   1/02/2012     0
8   4  12/03/2012     0
If you need to filter the matched rows:
mask = df.groupby('ID')['DATE'].transform('nunique').eq(1)
df1 = df[mask]
print (df1)
   ID        DATE
0   1  11/11/2011
1   1  11/11/2011
As a last step, convert the values to a list:
IDlist = df1['ID'].tolist()
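For the part of the question about saving all DATE values that share an ID as a list, one possible approach is to aggregate each group into a Python list. This is a sketch on the sample data, not part of the original answer; the name `dates_by_id` is hypothetical:

```python
import pandas as pd

# Sample data from the question, built inline instead of read from TestData.csv
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3, 3, 4, 4, 4],
    "DATE": ["11/11/2011", "11/11/2011", "5/05/2011", "20/06/2011",
             "2/04/2011", "10/08/2011", "8/12/2011", "1/02/2012", "12/03/2012"],
})

# Collect the DATE strings of each ID into one list, keyed by ID
dates_by_id = df.groupby("ID")["DATE"].agg(list).to_dict()

print(dates_by_id[4])
```

Each value in `dates_by_id` can then be looped over or compared as an ordinary list, although for larger data the vectorised groupby operations above are likely to be faster.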