Python & Pandas: How do I create a column of fake data containing dates in a specific format

Question:

I’m trying to create a column of fake dates in a Pandas DataFrame using the format year.month (e.g. 2022.01 for January 2022). The DataFrame has ~200,000 rows, and I would like to assign each row a random date between 2010.01 and 2020.12. How can I do this with Pandas? Ideally the new column's dtype would be float, because I am trying to recreate a training example I found that formats its dates this way.

Asked By: M. Y.


Answers:

Combine pandas.date_range and numpy.random.choice:

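import numpy as np
import pandas as pd

# Every month from January 2010 to December 2020 inclusive, as year.month floats.
# freq='MS' (month start) is required to include December 2020: end='2020-12' means
# 2020-12-01, so a month-end frequency would stop at 2020-11-30.
dates = (pd.date_range('2010-01', '2020-12', freq='MS')
           .strftime('%Y.%m').astype(float)
        )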

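N = 1000  # number of rows; use ~200_000 for the full dataset
# Draw one random month (with replacement) for each row
df = pd.DataFrame({'date': np.random.choice(dates, size=N)})

print(df)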
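For the ~200,000 rows in the question, a seeded NumPy Generator makes the fake column reproducible. The sketch below is an alternative to the date_range approach, not the answer's method: it draws a random month offset and builds year + month/100 arithmetically. The seed (42), the variable names and the rounding to two decimals are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # arbitrary seed, only for reproducibility
n_rows = 200_000

# 11 years * 12 months = 132 months from 2010-01 to 2020-12; the upper bound is exclusive
month_offsets = rng.integers(0, 11 * 12, size=n_rows)
years = 2010 + month_offsets // 12
months = 1 + month_offsets % 12

# year.month as a float, rounded to two decimals to remove float noise from the addition
df = pd.DataFrame({'date': np.round(years + months / 100, 2)})
print(df.head())
```

Both approaches sample uniformly from the same 132 year.month values; the arithmetic version simply skips creating timestamps and string formatting.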

NB. Floats are a tricky choice because they cannot keep trailing zeros: 2010.10 and 2010.1 are the same number, so October 2010 can appear as 2010.1, which reads like January.
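A short illustration of the trailing-zero issue, and one way to work around it: always format the floats with two decimals before displaying them or parsing them back into dates. The sample values are hypothetical.

```python
import pandas as pd

# October 2010 stored as a float: 2010.10 and 2010.1 are the same number
october = float('2010.10')
print(repr(october))     # 2010.1  -> the trailing zero is gone
print(f'{october:.2f}')  # 2010.10 -> two decimals restore the month

# Hypothetical sample values: format with two decimals, then parse as year.month
s = pd.Series([2010.10, 2015.03, 2020.12])
as_text = s.map('{:.2f}'.format)
as_dates = pd.to_datetime(as_text, format='%Y.%m')
print(as_dates.dt.month.tolist())  # [10, 3, 12]
```

When writing the column to a file, `df.to_csv(..., float_format='%.2f')` applies the same two-decimal formatting.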

Example output (the values are random):

        date
0    2015.03
1    2014.01
2    2014.06
3    2011.10
4    2010.11
..       ...
995  2018.07
996  2019.01
997  2015.05
998  2017.09
999  2016.03

[1000 rows x 1 columns]
Answered By: mozway