Convert pandas series of strings to a series of lists

Question:

For instance, I have a dataframe as below:

import pandas as pd
df = pd.DataFrame({"col":['AM RLC, F C', 'AM/F C', 'DM','D C']})

    |col
----|--------------|
 0  |"AM RLC, F C" |
 1  |"AM/F C"      |
 2  |"DM"          |
 3  |"D C"         |

My expected output is as follows:

    |col
----|-----------------------|
 0  |["AM", "RLC", "F", "C"]|
 1  |["AM", "F", "C"]       |
 2  |["DM"]                 |
 3  |["D", "C"]             |

",", "/" and the space character should all be treated as delimiters.

The answers in this question do not solve my problem.

Asked By: Macosso


Answers:

Try this:

df["col"].apply(lambda x:x.replace(",","").replace("/"," ").split(" "))
Answered By: to_data
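
One caveat with the chained replace approach above: removing the comma outright glues tokens together when no space follows it, and split(" ") yields empty strings on repeated spaces. The sketch below compares it with a small variant; split_whitespace and the second sample string are hypothetical, introduced only for illustration.

```python
def split_chained(text):
    # Same logic as the lambda in the answer above.
    return text.replace(",", "").replace("/", " ").split(" ")


def split_whitespace(text):
    # Hypothetical variant: turn both delimiters into spaces,
    # then split on any run of whitespace.
    return text.replace(",", " ").replace("/", " ").split()


question_row = "AM RLC, F C"   # from the question
tricky_row = "AM,F  C"         # made-up input: no space after the comma, double space

chained_ok = split_chained(question_row)
chained_tricky = split_chained(tricky_row)
whitespace_tricky = split_whitespace(tricky_row)
```

On the question's data both functions agree; they only differ on inputs like the made-up one. With pandas, the variant could be applied as df["col"].apply(split_whitespace).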

I would use str.split or str.findall:

df['col'] = df['col'].str.split(r'[\s,/]+')

# or
df['col'] = df['col'].str.findall(r'\w+')

Output:

               col
0  [AM, RLC, F, C]
1       [AM, F, C]
2             [DM]
3           [D, C]

Regex:

[\s,/]+  # at least one of space/comma/slash with optional repeats

\w+      # one or more word characters
Answered By: mozway
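
The two patterns above give the same result on the question's data but can differ elsewhere. A minimal sketch using the standard re module (as far as I know, str.split and str.findall apply the same patterns element by element); the second and third sample strings are made up for illustration.

```python
import re

SPLIT_PAT = re.compile(r"[\s,/]+")  # delimiters: whitespace, comma, slash
WORD_PAT = re.compile(r"\w+")       # tokens: runs of word characters

samples = [
    "AM RLC, F C",  # from the question
    " AM/F C ",     # made-up: leading and trailing space
    "D-C",          # made-up: a character that is not in the delimiter set
]

split_result = [SPLIT_PAT.split(s) for s in samples]
findall_result = [WORD_PAT.findall(s) for s in samples]
```

Splitting leaves empty strings when a value starts or ends with a delimiter and keeps "D-C" whole, while matching \w+ never produces empty strings but breaks on every non-word character.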

A one-liner that replaces every punctuation character in the string with a space. You can then split on whitespace to get a clean list:

import string

df['col'].str.replace(f'[{string.punctuation}]', ' ', regex=True).str.split().to_frame()
Answered By: ali bakhtiari
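
Because string.punctuation contains characters that are special inside a character class (such as ], \ and ^), a slightly more defensive way to build the pattern is to pass it through re.escape first. A minimal sketch with the standard re module; tokenize and the last sample string are hypothetical.

```python
import re
import string

# Escape the punctuation characters before placing them in a character class.
PUNCT_PAT = re.compile(f"[{re.escape(string.punctuation)}]")


def tokenize(text):
    # Replace each punctuation character with a space, then split on whitespace.
    return PUNCT_PAT.sub(" ", text).split()


tokens_comma = tokenize("AM RLC, F C")  # from the question
tokens_slash = tokenize("AM/F C")       # from the question
tokens_other = tokenize("A_B-C")        # made-up: underscore and hyphen are punctuation too
```

Note that this treats every punctuation character as a delimiter, including the underscore, which is broader than the three delimiters the question asks for.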

Apply a function to each row of the col column to extract its tokens. Here the function is written as a lambda.

import pandas as pd
import re

df = pd.DataFrame({"col":['AM RLC, F C', 'AM/F C', 'DM','D C']})

# re.findall already returns a list; wrapping it in str() would store text, not a list
df['col'] = df['col'].apply(lambda x: re.findall(r"[\w']+", x))

print(df.head())

output:

               col
0  [AM, RLC, F, C]
1       [AM, F, C]
2             [DM]
3           [D, C]
Answered By: Ali