Assign Date to Missing Cell Based on Dataframe
Question:
I have a dataframe as below. My question is, i want to assign the missing dates based on the first day of the Month available in the dataset. The dates will only be for a single month per file. It wont be multiple months in a single file. One cell will always have a date.
Please note that the files could be from a previous month so i cant use Today’s date. I have used
fillna(method='ffill')
But this only works if the first cell has a date in it. If its blank itll make the rest blank as well.
EDIT: The filled dates can be the first of the month or ANY previously populated dates in the column. As long as they are from the same month, it should be fine.
Sample Data:
Other Columns
Date
Invoice
Other Columns
…
ABC232
…
…
5-27-2022
ABC232
…
…
5-27-2022
ANBFN2323
…
…
SADNF343
…
…
1232HHH
…
Expected Output:
Other Columns
Date
Invoice
Other Columns
…
5-01-2022
ABC232
…
…
5-27-2022
ABC232
…
…
5-27-2022
ANBFN2323
…
…
5-01-2022
SADNF343
…
…
5-01-2022
1232HHH
…
Please let me know if you have any questions. I cant find anything to even get me started.
Thanks
Answers:
import re
# Get the index of the first non-NA Date
idx = df["Date"].first_valid_index()
# Replace the day with 01
fill_value = re.sub(r"(d+)-(d+)-(d+)", r"1-01-3", df.loc[idx, "Date"])
# Fill
df["Date"] = df["Date"].fillna(fill_value)
I have a dataframe as below. My question is, i want to assign the missing dates based on the first day of the Month available in the dataset. The dates will only be for a single month per file. It wont be multiple months in a single file. One cell will always have a date.
Please note that the files could be from a previous month so i cant use Today’s date. I have used
fillna(method='ffill')
But this only works if the first cell has a date in it. If its blank itll make the rest blank as well.
EDIT: The filled dates can be the first of the month or ANY previously populated dates in the column. As long as they are from the same month, it should be fine.
Sample Data:
Other Columns | Date | Invoice | Other Columns |
---|---|---|---|
… | ABC232 | … | |
… | 5-27-2022 | ABC232 | … |
… | 5-27-2022 | ANBFN2323 | … |
… | SADNF343 | … | |
… | 1232HHH | … |
Expected Output:
Other Columns | Date | Invoice | Other Columns |
---|---|---|---|
… | 5-01-2022 | ABC232 | … |
… | 5-27-2022 | ABC232 | … |
… | 5-27-2022 | ANBFN2323 | … |
… | 5-01-2022 | SADNF343 | … |
… | 5-01-2022 | 1232HHH | … |
Please let me know if you have any questions. I cant find anything to even get me started.
Thanks
import re
# Get the index of the first non-NA Date
idx = df["Date"].first_valid_index()
# Replace the day with 01
fill_value = re.sub(r"(d+)-(d+)-(d+)", r"1-01-3", df.loc[idx, "Date"])
# Fill
df["Date"] = df["Date"].fillna(fill_value)