Merge Dataframe rows based on the date

Question:

I have a dataframe that contains the name of a company, a date, and the title of a headline that was published about that company on that day. Several headlines can be published on a single day, and each of them takes up its own row, even when the date is the same.

What I want to do is merge all the title rows by date, so that the Title column holds ALL the headlines that were published on that day.
I tried doing this myself, but I just messed up my dataframe.

Any help will be greatly appreciated!

Asked By: eme


Answers:

You can group by company and date, then aggregate the titles of each group into a list:

from datetime import date

import pandas as pd

df = pd.DataFrame(
    {
        "company": ["GOOG", "GOOG", "META", "META"],
        "date": [
            date(2022, 6, 1),
            date(2022, 6, 1),
            date(2022, 6, 1),
            date(2022, 6, 2),
        ],
        "title": ["google good", "google bad", "meta good", "meta bad"],
    }
)

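# Group by company and date, collecting each group's titles into a list;
# reset_index() turns the group keys back into ordinary columns.
df.groupby(["company", "date"]).aggregate(list).reset_index()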

gives

  company        date                      title
0    GOOG  2022-06-01  [google good, google bad]
1    META  2022-06-01                [meta good]
2    META  2022-06-02                 [meta bad]
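
A possible variation, as a sketch only: if a single string per day is more convenient than a list, the titles can be joined with a separator instead. The "; " separator and the variable name merged are assumptions chosen for illustration, and the sample data is the same as above.

```python
from datetime import date

import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame(
    {
        "company": ["GOOG", "GOOG", "META", "META"],
        "date": [
            date(2022, 6, 1),
            date(2022, 6, 1),
            date(2022, 6, 1),
            date(2022, 6, 2),
        ],
        "title": ["google good", "google bad", "meta good", "meta bad"],
    }
)

# Join all titles of each (company, date) group into one string.
# The "; " separator is an arbitrary choice for this sketch.
merged = df.groupby(["company", "date"])["title"].agg("; ".join).reset_index()
print(merged)
```

Joining keeps the title column as plain text, which is easier to write to CSV; the list version keeps the individual headlines separate if you need to process them later.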
Answered By: ljmc