Fastest way of replacing strings with their counterparts

Question:

I need to replace slang acronyms within a string with their expanded forms. The slang dataset I use is this one, with over 3k items. This is my current code for the process:

import pandas as pd

slangs = pd.read_csv('slang.csv', index_col=[0])

def expand_slang_acronyms():
    word_list = 'foo brb bar'.split(' ')
    # compare every word against every row of the DataFrame
    for i in range(len(word_list)):
        for j in range(len(slangs)):
            if word_list[i] == slangs.loc[j, 'acronym']:
                word_list[i] = slangs.loc[j, 'expansion']

    print(' '.join(word_list))  # 'foo be right back bar'

Running it as is, it's quite fast, but I need to replace thousands of strings. Timing the code over just 100 executions:

from timeit import timeit
timeit(expand_slang_acronyms, number=100)

In this instance it returned 6.519681000005221 seconds, which is really slow considering it's only 100 runs. I need a faster way to do this.

Asked By: teduniq


Answers:

There are many ways to do this. Here is one way to speed up the process: build a lookup dictionary from the DataFrame once, then replace each word with a dictionary lookup instead of scanning the DataFrame.

import pandas as pd
from timeit import timeit

slangs = pd.read_csv('slang.csv')
# build the lookup table once: acronym -> expansion
slang_dict = dict(zip(slangs['acronym'], slangs['expansion']))

def expand_slang_acronyms():
    word_list = 'foo brb bar'.split(' ')
    for i in range(len(word_list)):
        # O(1) average-case dictionary lookup instead of scanning the DataFrame
        if word_list[i] in slang_dict:
            word_list[i] = slang_dict[word_list[i]]

    print(' '.join(word_list))  # 'foo be right back bar'

timeit(expand_slang_acronyms, number=100)

This should give a significant performance boost, as dictionary lookups are O(1) on average, compared to the O(n) scan over the entire DataFrame that the original code performs for every word.
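Since you mention replacing thousands of strings, the same idea scales by building the dictionary once and reusing it for every input. Below is a minimal sketch of that, assuming the same slang.csv columns; the function name expand and the texts list are just illustrative placeholders, not part of your data:

import pandas as pd

slangs = pd.read_csv('slang.csv')
slang_dict = dict(zip(slangs['acronym'], slangs['expansion']))

def expand(text):
    # replace each word if it is a known acronym, otherwise keep it unchanged
    return ' '.join(slang_dict.get(word, word) for word in text.split(' '))

texts = ['foo brb bar', 'foo ttyl bar']   # stand-in for your real strings
expanded = [expand(t) for t in texts]
print(expanded[0])                        # 'foo be right back bar'

Because slang_dict is built once and each word costs a constant-time lookup, the total work grows with the amount of text rather than with the size of the slang table.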

Answered By: Gihan