How to replace a column in a DataFrame with a column of tuples
Question:
I have an integer Series and I want to transform it into a Series of tuples, using a dictionary as the mapping.
The data is large, so speed is important.
The relationship between numbers here is not relevant to the problem (1 -> (any tuple))
int_series = pd.Series([1, 2, 3, 1, 5])
replacement_dict = {
1: (1, 11),
2: (2, 22),
3: (3, 33),
5: (5, 55)
}
# Expected output
0 (1, 11)
1 (2, 22)
2 (3, 33)
3 (1, 11)
4 (5, 55)
dtype: object
Using replace gives output I did not expect: instead of inserting the whole tuple, it appears to cycle through the tuple's elements from row to row.
# Using replace
tuple_series = int_series.replace(replacement_dict)
print(tuple_series)
# output
0 1
1 22
2 3
3 11
4 5
dtype: int64
I know I can do this with a list comprehension, but I was wondering whether a better solution exists.
# List comprehension solution
tuple_series = pd.Series([replacement_dict.get(value) for value in int_series.to_numpy()])
It's not actually important that a tuple is preserved, only that the information inside it is held in something that can be inserted into a np.ndarray (i.e. if a quicker solution exists with lists or some other object, that is also acceptable).
Answers:
Try using map:
int_series.map(replacement_dict)
Output:
0 (1, 11)
1 (2, 22)
2 (3, 33)
3 (1, 11)
4 (5, 55)
dtype: object
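For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same map call using the data from the question. The second Series (with the value 4, which is not a key in the dictionary) is an addition for illustration only; it shows that map leaves NaN for missing keys, whereas the list comprehension with dict.get would leave None.

```python
import pandas as pd

int_series = pd.Series([1, 2, 3, 1, 5])
replacement_dict = {1: (1, 11), 2: (2, 22), 3: (3, 33), 5: (5, 55)}

# map looks each value up in the dict and keeps the tuple whole.
tuple_series = int_series.map(replacement_dict)
print(tuple_series)

# Illustration only (not from the question): 4 is not a key,
# so map fills that row with NaN.
with_missing = pd.Series([1, 4]).map(replacement_dict)
print(with_missing)
```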
Use pd.Series.map:
int_series.map(replacement_dict)
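Since the question says the tuples only need to end up in a np.ndarray, another option is to skip the object Series and build a 2D array directly. The sketch below is not from either answer; it swaps in a NumPy lookup with np.searchsorted, and it assumes every value in the Series is a key of the dictionary and that all tuples have the same length. The names keys, values, positions and pairs are made up for this example. It has not been benchmarked here, so whether it beats map on your data is something to measure.

```python
import numpy as np
import pandas as pd

int_series = pd.Series([1, 2, 3, 1, 5])
replacement_dict = {1: (1, 11), 2: (2, 22), 3: (3, 33), 5: (5, 55)}

# Sorted keys and a matching (n_keys, 2) array of the tuple contents.
sorted_keys = sorted(replacement_dict)
keys = np.array(sorted_keys)
values = np.array([replacement_dict[k] for k in sorted_keys])

# For each element, find the position of its key in the sorted keys.
# Assumption: every element of int_series is present in the dict;
# a missing value would silently pick a neighbouring key or raise IndexError.
positions = np.searchsorted(keys, int_series.to_numpy())

# Fancy indexing pulls out one row of `values` per element.
pairs = values[positions]
print(pairs.shape)  # one row per element, one column per tuple position
```

If the Series of tuples from map is still needed, np.array(tuple_series.tolist()) gives the same 2D layout starting from that result.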