Extract a list of values from a column in a pandas dataframe

Question:

I’m trying to split each value in a dataframe column into a list of its number "name" items.

For example:

import pandas as pd

# dataframe with "num_fruit" column
fruit_df = pd.DataFrame({"num_fruit": ['1 "Apple"',
                                        '100 "Peach Juice3" 1234 "Not_fruit" 23 "Straw-berry" 2 "Orange"']})
# desired output: one list of number "name" items per row of the "num_fruit" column
[['1 "Apple"'],
 ['100 "Peach Juice3"', '1234 "Not_fruit"', '23 "Straw-berry"', '2 "Orange"']]

Any suggestions? Thanks a lot.

What I’ve tried:

import re

def split_fruit_val(val):
    return re.findall(r'(\d+ ".+")', val)

result_list = []
for val in fruit_df['num_fruit']:
    result = split_fruit_val(val)
    result_list.append(result)

print(result_list)
# output: the second row is not split; it comes back as a single string
[['1 "Apple"'],
 ['100 "Peach Juice3" 1234 "Not_fruit" 23 "Straw-berry" 2 "Orange"']]
Asked By: Stella
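The attempt above returns the whole second row as one string because `.+` is greedy: it keeps matching up to the last `"` in the value. A minimal sketch of a fix, assuming every name is wrapped in double quotes and contains no quote of its own, is to make the quantifier lazy (`.+?`) so that each match ends at the first closing quote. The `Series.str.findall` line is an equivalent vectorized form of the same loop.

```python
import re

import pandas as pd

fruit_df = pd.DataFrame({"num_fruit": ['1 "Apple"',
                                       '100 "Peach Juice3" 1234 "Not_fruit" 23 "Straw-berry" 2 "Orange"']})

# Lazy ".+?" stops at the first closing quote instead of the last one,
# so each match covers exactly one number "name" item.
PATTERN = re.compile(r'\d+ ".+?"')

def split_fruit_val(val):
    return PATTERN.findall(val)

# Loop version, same shape as the original attempt
result_list = [split_fruit_val(val) for val in fruit_df["num_fruit"]]

# Vectorized version: Series.str.findall returns one list per row
vectorized = fruit_df["num_fruit"].str.findall(r'\d+ ".+?"').tolist()

print(result_list)
```

Because the pattern is anchored on the quotes, digits inside a name (such as the `3` in `Juice3`) are consumed as part of the current match and never start a new one.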


Answers:

Let's split on whitespace that is followed by a number, using a positive lookahead:

fruit_df['num_fruit'].str.split(r'\s(?=\d+)')

0                                          [1 "Apple"]
1    [100 "Peach Juice3", 1234 "Not_fruit", 23 "Str...
Name: num_fruit, dtype: object
Answered By: Shubham Sharma
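For reference, a self-contained version of this answer that also turns the result into the nested list shown in the question, using `Series.tolist()`. It assumes a new item starts only where a whitespace character is immediately followed by a digit. pandas treats a multi-character `pat` as a regular expression by default, so no extra flag is passed here.

```python
import pandas as pd

fruit_df = pd.DataFrame({"num_fruit": ['1 "Apple"',
                                       '100 "Peach Juice3" 1234 "Not_fruit" 23 "Straw-berry" 2 "Orange"']})

# Split on one whitespace character only when a digit follows it.
# The lookahead checks for the digit without consuming it, so the number
# stays at the start of the next piece.
split_series = fruit_df["num_fruit"].str.split(r"\s(?=\d+)")

# Convert the Series of lists into a plain nested Python list
result_list = split_series.tolist()
print(result_list)
```

Because this split keys on digits rather than on quotes, a name that itself contains a space followed by a digit (a hypothetical `"Juice 3"`, say) would be split as well; a quote-anchored pattern such as `re.findall(r'\d+ ".+?"', val)` does not have that problem.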