Keeping duplicate values and trying pivot
Question:
I have a CSV file that has various repeating elements. Below is an example of the table.
id code date
314asre B1 20201010
343asr B2 20200405
150ase B1 20210615
314asre B1 20210506
343asr B3 20220408
I want to create something like this:
id
B1
B2
B3
314asre
20201010
NULL
NULL
343asr
NULL
20200405
NULL
150ase
20210615
NULL
NULL
314asre
20210506
NULL
NULL
343asr
NULL
NULL
20220408
I tried using pivot and pivot table, but I got an error saying duplicate values in the index for the pivot, and pivot_table removes the duplicate values completely. I want to keep the duplicate values as shown in the above table.
Any help will be appreciated.
Code I have tried:
df.pivot_table(index='id' , columns = 'code' , values= 'date')
df.reset_index(inplace=True)
Answers:
Make pivot table with random index and change index to id column as follows.
df['temp'] = range(len(df))
result = df.pivot_table(index= 'temp', columns = 'code' , values= 'date')
result.index = df['id']
result
the result is as follows.
code B1 B2 B3
id
314asre 20201010.0 NaN NaN
343asr NaN 20200405.0 NaN
150ase 20210615.0 NaN NaN
314asre 20210506.0 NaN NaN
343asr NaN NaN 20220408.0
I have a CSV file that has various repeating elements. Below is an example of the table.
id code date
314asre B1 20201010
343asr B2 20200405
150ase B1 20210615
314asre B1 20210506
343asr B3 20220408
I want to create something like this:
id | B1 | B2 | B3 |
---|---|---|---|
314asre | 20201010 | NULL | NULL |
343asr | NULL | 20200405 | NULL |
150ase | 20210615 | NULL | NULL |
314asre | 20210506 | NULL | NULL |
343asr | NULL | NULL | 20220408 |
I tried using pivot and pivot table, but I got an error saying duplicate values in the index for the pivot, and pivot_table removes the duplicate values completely. I want to keep the duplicate values as shown in the above table.
Any help will be appreciated.
Code I have tried:
df.pivot_table(index='id' , columns = 'code' , values= 'date')
df.reset_index(inplace=True)
Make pivot table with random index and change index to id column as follows.
df['temp'] = range(len(df))
result = df.pivot_table(index= 'temp', columns = 'code' , values= 'date')
result.index = df['id']
result
the result is as follows.
code B1 B2 B3
id
314asre 20201010.0 NaN NaN
343asr NaN 20200405.0 NaN
150ase 20210615.0 NaN NaN
314asre 20210506.0 NaN NaN
343asr NaN NaN 20220408.0