Add uuid to a new column in a pandas DataFrame
Question:
I’m looking to add a uuid for every row in a single new column in a pandas DataFrame. This obviously fills the column with the same uuid:
import uuid
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(4, 3), columns=list('abc'),
                  index=['apple', 'banana', 'cherry', 'date'])
df['uuid'] = uuid.uuid4()
print(df)
               a         b         c                                  uuid
apple   0.687601 -1.332904 -0.166018  34115445-c4b8-4e64-bc96-e120abda1653
banana -2.252191 -0.844470  0.384140  34115445-c4b8-4e64-bc96-e120abda1653
cherry -0.470388  0.642342  0.692454  34115445-c4b8-4e64-bc96-e120abda1653
date   -0.943255  1.450051 -0.296499  34115445-c4b8-4e64-bc96-e120abda1653
What I am looking for is a new uuid in each row of the ‘uuid’ column. I have also tried using .apply() and .map() without success.
Answers:
This is one way:
df['uuid'] = [uuid.uuid4() for _ in range(len(df.index))]
To create a new column, you must supply one value per row. Once we know the number of rows (the len of the DataFrame), we can build a list of that many values and assign it to the column.
import uuid
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(4, 3), columns=list('abc'),
                  index=['apple', 'banana', 'cherry', 'date'])
# you can create a simple list of values using a list comprehension
# based on the len (or number of rows) of the dataframe
df['uuid'] = [uuid.uuid4() for x in range(len(df))]
print(df)
               a         b         c                                  uuid
apple  -0.775699 -1.104219  1.144653  f98a9c76-99b7-4ba7-9c0a-9121cdf8ad7f
banana -1.540495 -0.945760  0.649370  179819a0-3d0f-43f8-8645-da9229ef3fc3
cherry -0.340872  2.445467 -1.071793  b48a9830-3a10-4ce0-bca0-0cc136f09732
date   -1.286273  0.244233  0.626831  e7b7c65c-0adc-4ba6-88ab-2160e9858fc4
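If you want to confirm that each row really did get its own value, a quick check along these lines may help. This is only a sketch: the names all_unique and uuid_str are made up for illustration, and converting to strings is an optional extra that assumes plain text is easier to store or compare downstream.

```python
import uuid

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(4, 3), columns=list('abc'),
                  index=['apple', 'banana', 'cherry', 'date'])

# One uuid4() call per row, so every row gets its own value.
df['uuid'] = [uuid.uuid4() for _ in range(len(df))]

# Sanity check: no value should repeat.
all_unique = df['uuid'].is_unique
print(all_unique)

# The column holds uuid.UUID objects. Converting to str is optional
# (illustrative column name, not part of the original answer).
df['uuid_str'] = df['uuid'].astype(str)
print(df[['uuid', 'uuid_str']])
```

The is_unique check is cheap and makes the difference from the single broadcast value in the question easy to see.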
from uuid import uuid4
df['uuid'] = df.index.to_series().map(lambda x: uuid4())
I can’t speak to computational efficiency, but I prefer this syntax because it’s consistent with the apply-lambda pattern I usually use to generate new columns:
df['uuid'] = df.apply(lambda _: uuid.uuid4(), axis=1)
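On the efficiency question, one way to find out is simply to time the variants on your own data. The sketch below is an assumption-laden illustration, not a benchmark: the 10,000-row frame, the helper function names, and the choice of three runs are all arbitrary. A row-wise apply generally does more work per row than a plain list comprehension, but the numbers depend on your pandas version and machine.

```python
import timeit
import uuid

import numpy as np
import pandas as pd

# Hypothetical test frame; 10_000 rows is an arbitrary size.
df = pd.DataFrame(np.random.randn(10_000, 3), columns=list('abc'))


def with_list_comprehension(frame):
    # Plain Python loop, pandas is not involved until assignment.
    return [uuid.uuid4() for _ in range(len(frame))]


def with_row_apply(frame):
    # axis=1 hands each row to the lambda, which then ignores it.
    return frame.apply(lambda _: uuid.uuid4(), axis=1)


def with_column_apply(frame):
    # Element-wise apply over a single column.
    return frame['a'].apply(lambda _: uuid.uuid4())


# Time three runs of each variant.
timings = {
    fn.__name__: timeit.timeit(lambda: fn(df), number=3)
    for fn in (with_list_comprehension, with_row_apply, with_column_apply)
}

for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f} s for 3 runs")
```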
You can also apply over any single existing column ('col' below is a placeholder) to remove the axis requirement (why axis=0 is the default, I’ll never understand):
df['uuid'] = df['col'].apply(lambda _: uuid.uuid4())
The downside to these is that you’re technically passing in a variable (_) that you don’t actually use. It would be mildly nice to be able to write something like lambda: uuid.uuid4(), but apply doesn’t support lambdas with no arguments, which is reasonable given that the use case would be rather limited.
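To illustrate the point about zero-argument lambdas: apply calls the function with each element, so a lambda that accepts nothing fails with a TypeError. The small frame and the zero_arg_error variable below are made up for this sketch.

```python
import uuid

import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0, 3.0]})

# apply passes each element to the function, so a lambda that takes
# no arguments cannot be called and raises TypeError.
try:
    df['a'].apply(lambda: uuid.uuid4())
    zero_arg_error = None
except TypeError as exc:
    zero_arg_error = exc

print(type(zero_arg_error).__name__, zero_arg_error)

# Accepting and ignoring the element works as described above.
df['uuid'] = df['a'].apply(lambda _: uuid.uuid4())
print(df)
```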
A revised version of S. A. Calder’s answer using Pandas v1.5.2:
from uuid import uuid4
df['uuid'] = df.index.map(lambda _: uuid4())
There is no need to convert the index to a Series. Replacing lambda x: with lambda _: signals to the reader that the index elements passed in by the map method are not used in calculating the UUIDs.
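For comparison, here is a small sketch of the two map variants side by side. The variable and column names (from_index, from_series, uuid_a, uuid_b) are invented for illustration. The practical difference, as far as I can tell, is only in the return type: Index.map gives back an Index, whose values are assigned by position, while to_series().map gives back a Series that is aligned on the index labels.

```python
from uuid import uuid4

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(4, 3), columns=list('abc'),
                  index=['apple', 'banana', 'cherry', 'date'])

# Mapping over the index directly returns a pandas Index.
from_index = df.index.map(lambda _: uuid4())

# Converting to a Series first returns a Series keyed by the same labels.
from_series = df.index.to_series().map(lambda _: uuid4())

df['uuid_a'] = from_index    # values assigned by position
df['uuid_b'] = from_series   # values aligned on the index labels

print(type(from_index).__name__, type(from_series).__name__)
print(df[['uuid_a', 'uuid_b']])
```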