Replace word starting with string anywhere in sentence in each row python

Question:

I am trying to replace words that start with a given string in text stored across multiple rows of a DataFrame. The df has 6 columns: date, user, tweet, language, coordinates, place. The replacement applies to the tweet column, which contains, for example, in row 1:

« Nous ne sommes pas très favorables au télétravail mais nous avons de super locaux tout neufs »
Il manque un babyfoot et les courtiers auront rejoint les rangs de la Start-up Nation https://link.

In row 2 : « Une étude de la Spire Healthcare a révélé, que le #télétravail pouvait avoir un impact sur le #cyclemenstruel, et ce, notamment à cause de la fatigue, du #stress et du manque d’activité physique induits par le travail à distance »

Via @marieclaire_fr https://link.

In row 3 : @IrisDessine @fredchristian__ Mais c’est super pratique car j’ai horreur de passer la serpillière le mien aspire et lave et franchement c’est devenu mon meilleur ami il bosse tranquillou quand je suis en télétravail

Etc.

I would like to replace words starting with ‘@’ with ‘@user’ and replace links (words starting with ‘http’) with ‘http’. All columns of the df have dtype object. I have tried several things:

for individual_word in df["Tweet"]:
    #print(individual_word)
    if individual_word.startswith('@') and len(individual_word) > 1:
        individual_word = '@user'

With this code nothing happens: no error, no replacement. Another attempt:

for individual_word in df["Tweet"].split(' '):
    #print(individual_word)
    if individual_word.split(' ').startswith('@') and len(individual_word) > 1:
        individual_word = '@user'

With this code I get the error: ‘Series’ object has no attribute ‘split’. Another attempt:

for individual_word in df["Tweet"].str.split(' '):
    #print(individual_word)
    if individual_word.str.split(' ').startswith('@') and len(individual_word) > 1:
        individual_word = '@user'
        #print(individual_word)

With this code I get the error: ‘list’ object has no attribute ‘str’. I tried the same thing after converting the Tweet column to string, but nothing changed. Depending on the code, I think either each row is treated as a list, so I have to search a list of lists for words starting with ‘@’ and ‘http’ and replace them, or each row is treated as a single word rather than a sentence. In that case, a row whose first word starts with ‘@’ would be changed, but a word starting with ‘@’ later in the sentence would not.

I have also tried with a list of lists. I can have my data in a list called my_list with 3 columns: Type, Size, Value. Row 1 of my_list in column Value:

[‘être’, ‘favorable’, ‘télétravail’, ‘super’, ‘local’, ‘neuf’, ‘manque’, ‘babyfoot’, ‘courtier’, ‘rejoindre’, ‘rang’, ‘start’, ‘nation’, ‘https://link’]

Row 2 of my_list in column Value :

[‘étude’, ‘spire’, ‘healthcare’, ‘révéler’, ‘télétravail’, ‘impact’, ‘cyclemenstruel’, ‘cause’, ‘fatigue’, ‘stress’, ‘manque’, ‘activité’, ‘physique’, ‘induire’, ‘travail’, ‘distance’, ‘@marieclaire_fr’, ‘https://link’]

Row 3 of my_list in column Value :

[‘vive’, ‘télétravail’, ‘commeunlundi’, ‘https://link’]

I have tried the code :

for each_list in my_list:
    #print(each_list)
    for each_word in each_list:
        #print(each_word)
        if each_word.startswith('@') and len(each_word) > 1:
            #print(each_word)
            each_word = '@user'

I don’t get any errors, but the words aren’t changed in any of the lists in my_list.

Thank you for your help !

Asked By: MarionEtp

||

Answers:

Try saving the values to a new file: if a word starts with @, save @user, otherwise save the word itself, and then delete the original file. What is happening now is that the loop variable individual_word is assigned the value @user, but that only rebinds the variable; the underlying data is not changed.
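A minimal sketch of that point, using the list-of-lists data from the question: rebinding the loop variable leaves the lists untouched, while building a new list keeps the replacements. The helper name `mask_tokens` and the shortened sample rows are assumptions made for illustration, not part of the original code.

```python
def mask_tokens(tokens):
    """Return a new list with mentions and links replaced by placeholders."""
    masked = []
    for word in tokens:
        if word.startswith('@') and len(word) > 1:
            masked.append('@user')
        elif word.startswith('http'):
            masked.append('http')
        else:
            masked.append(word)
    return masked


# Shortened sample rows, taken from the question's data
my_list = [
    ['étude', 'spire', '@marieclaire_fr', 'https://link'],
    ['vive', 'télétravail', 'commeunlundi', 'https://link'],
]

# Rebinding the loop variable only changes the local name, not the list
for each_list in my_list:
    for each_word in each_list:
        if each_word.startswith('@'):
            each_word = '@user'
after_loop = [list(row) for row in my_list]  # snapshot: still the original words

# Building new lists keeps the replacements
masked_list = [mask_tokens(row) for row in my_list]
print(masked_list)
```

Writing to a new list (rather than a new file) is a simplification here; the same idea applies if the result is saved elsewhere.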

Answered By: Mareek Roy

You can try the pandas string methods. Also have a look at regex101 to check which regex works best for your case.

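# \S+ matches the run of non-whitespace characters after '@';
# regex=True is required in recent pandas versions, where the default is a literal match
df['tweets'] = df['tweets'].str.replace(r'@\S+', '@user', regex=True)
>>> df['tweets']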
    tweets
0   « Une étude de la Spire Healthcare a révélé, q...
1   @user @user Mais c'est super pratique car j'ai...
Answered By: Yolao_21
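The same regex idea can be extended to links as well. The following is a minimal sketch using only the standard `re` module on plain strings; the function name `mask_tweet` and the two patterns are assumptions chosen for illustration, and the sample strings are shortened from the question.

```python
import re

# Hypothetical patterns: '@' or 'http' followed by any non-whitespace characters
MENTION = re.compile(r'@\S+')
LINK = re.compile(r'http\S+')


def mask_tweet(text):
    """Replace every mention with '@user' and every link with 'http'."""
    text = MENTION.sub('@user', text)
    return LINK.sub('http', text)


tweets = [
    "Via @marieclaire_fr https://link.",
    "@IrisDessine @fredchristian__ Mais c'est super pratique",
]

masked = [mask_tweet(t) for t in tweets]
print(masked)
```

With a DataFrame, the function could presumably be applied row by row with `df['Tweet'].map(mask_tweet)`, or each pattern could be passed to `Series.str.replace(..., regex=True)`. Note that `\S+` also swallows trailing punctuation attached to a link or mention, and `@\S+` matches an `@` in the middle of a word (for example in an email address), so the patterns may need tightening for real data.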