Python: most efficient way to categorize transactions

Question

I have a large list of transactions that I want to categorize.
It looks like this:

transactions: [
     {
        "id": "20200117-16045-0",
        "date": "2020-01-17",
        "creationTime": null,
        "text": "SuperB Vesterbro T 74637",
        "originalText": "SuperB Vesterbro T 74637",
        "details": null,
        "category": null,
        "amount": {
            "value": -160.45,
            "currency": "DKK"
        },
        "balance": {
            "value": 12572.68,
            "currency": "DKK"
        },
        "type": "Card",
        "state": "Booked"
    },
    {
        "id": "20200117-4800-0",
        "date": "2020-01-17",
        "creationTime": null,
        "text": "Rent        45228",
        "originalText": "Rent        45228",
        "details": null,
        "category": null,
        "amount": {
            "value": -48.00,
            "currency": "DKK"
        },
        "balance": {
            "value": 12733.13,
            "currency": "DKK"
        },
        "type": "Card",
        "state": "Booked"
    },
    {
        "id": "20200114-1200-0",
        "date": "2020-01-14",
        "creationTime": null,
        "text": "Superbest          86125",
        "originalText": "SUPERBEST          86125",
        "details": null,
        "category": null,
        "amount": {
            "value": -12.00,
            "currency": "DKK"
        },
        "balance": {
            "value": 12781.13,
            "currency": "DKK"
        },
        "type": "Card",
        "state": "Booked"
    }
]

I loaded in the data like this:

import json
import pandas as pd

# Load the raw JSON, then build a DataFrame from the "transactions" list
with open('transactions.json') as transactions:
    file = json.load(transactions)

df = pd.DataFrame(file['transactions'])

So far I have the following categories that I want to group the transactions by:

CATEGORIES = {
    'Groceries': ['SuperB', 'Superbest'],
    'Housing': ['Insurance', 'Rent']
}

Now I would like to go through each row in the DataFrame and assign each transaction a category.
I would like to do this by checking whether text contains one of the values from the CATEGORIES dictionary.

If it does, the transaction should be categorized under the matching key of the CATEGORIES dictionary, for instance Groceries.

How do I do this most efficiently?

Asked By: Mathias Lund

Answer 1

If I understand your requirement correctly, we can build a pipe-delimited regex pattern from each list in your dictionary and assign the category with .loc:

# For each category, join its keywords into one regex alternation ("a|b")
# and label every row whose text matches it.
for k, v in CATEGORIES.items():
    pat = '|'.join(v)
    df.loc[df['text'].str.contains(pat), 'category'] = k

print(df[['text', 'category']])

Output:

                       text   category
0  SuperB Vesterbro T 74637  Groceries
1         Rent        45228    Housing
2  Superbest          86125  Groceries
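
If the DataFrame is large or the number of categories grows, a single-pass variant may be worth trying. The sketch below is only one possible approach, not part of the answer above: it combines all keywords into one pattern, extracts the first keyword found in each text, and maps it back to its category. The names categorize and keyword_to_category are made up for illustration, and matching case-insensitively is an assumption (the originalText values in the question suggest the casing can vary). Keywords are escaped with re.escape so that characters such as "." or "+" are matched literally.

```python
import re
import pandas as pd

CATEGORIES = {
    'Groceries': ['SuperB', 'Superbest'],
    'Housing': ['Insurance', 'Rent'],
}

def categorize(df, categories):
    """Return a copy of df with 'category' filled from keyword matches in 'text'."""
    # Reverse lookup: lower-cased keyword -> category name.
    keyword_to_category = {
        kw.lower(): cat for cat, kws in categories.items() for kw in kws
    }
    # Try longer keywords first so 'superbest' is preferred over 'superb'.
    keywords = sorted(keyword_to_category, key=len, reverse=True)
    # One capture group holding every escaped keyword as an alternative.
    pattern = '(' + '|'.join(re.escape(kw) for kw in keywords) + ')'
    # First keyword found in each text (NaN when nothing matches).
    matched = df['text'].str.extract(pattern, flags=re.IGNORECASE, expand=False)
    out = df.copy()
    out['category'] = matched.str.lower().map(keyword_to_category)
    return out

# Usage with the three texts from the question.
df = pd.DataFrame({'text': ['SuperB Vesterbro T 74637',
                            'Rent        45228',
                            'Superbest          86125']})
print(categorize(df, CATEGORIES)['category'].tolist())
# ['Groceries', 'Housing', 'Groceries']
```

Note one behavioural difference: here the first keyword found in the text decides the category, whereas in the loop above the last matching category in the dictionary wins. Rows without any match are left as NaN.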
