Printing substring from one dataframe to another dataframe

Question:

In the df_input['Visit'] column, there are three different timepoints that I would like to extract and write into a new dataframe (df_output). The timepoints are Pre, Post, and Screening.

data_input

data_output

I essentially would like to make a for loop (or just a single code strand) stating:

if df_input['Visit'] contains the word "Pre", print "Pre" in df_output['VISIT']
elif df_input['Visit'] contains the word "Post", print "Post" in df_output['VISIT']
elif df_input['Visit'] contains the word "Screening", print "Screening" in df_output['VISIT']

I am just not sure of the proper way to do that.

So far, the only thing I have is this line of code:

df_output['VISIT'] = df_input[df_input['Visit'].str.contains('Pr|Po|Sc')]

that gives the error message "Columns must be the same length as key"

I've also tried: df_output['VISIT'] = df_input['Visit'].str.contains('Pr|Po|Sc'), which prints True or False into my output dataframe.

Asked By: A-Simone

||

Answers:

df_output['VISIT'] = df_input['Visit'].apply(lambda x: 'Pre' if 'Pre' in x else 'Post' if 'Post' in x else 'Screening' if 'Screening' in x else '')

You must create df_output first, though.
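
For reference, here is a minimal end-to-end sketch of the apply approach. The sample Visit values and the helper name label_visit are made up for illustration, since the real data isn't shown in the question. Inside apply, each x is a plain Python string, so the check uses "in" rather than the .str accessor.

```python
import pandas as pd

# Hypothetical sample data; the real Visit values are not shown in the question.
df_input = pd.DataFrame(
    {"Visit": ["Pre-dose Day 1", "Post-dose Day 1", "Screening", "Unscheduled"]}
)

def label_visit(value):
    """Return the first timepoint label found in value, or '' if none match."""
    for label in ("Pre", "Post", "Screening"):
        # str() keeps missing values (NaN/None) from raising a TypeError
        if label in str(value):
            return label
    return ""

# Create df_output first, reusing df_input's index so the rows line up.
df_output = pd.DataFrame(index=df_input.index)
df_output["VISIT"] = df_input["Visit"].apply(label_visit)

print(df_output)
```

Rows that match none of the three words get an empty string, same as the one-liner above.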

Answered By: Rodrigo Guzman

You can use np.select:

import numpy as np

df_output['VISIT'] = np.select([df_input['Visit'].str.contains('Pr'),   # condition #1
                                df_input['Visit'].str.contains('Po'),   # condition #2
                                df_input['Visit'].str.contains('Sc')],  # condition #3
                               ['Pre', 'Post', 'Screening'],  # corresponding value when true
                               '')  # default value
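
A self-contained version of the same idea, as a rough sketch: the sample data is invented, and na=False is an addition here (an assumption that the column may contain missing values) so that each condition is a clean boolean mask.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, including one missing value.
df_input = pd.DataFrame({"Visit": ["Pre-dose", "Post-dose", "Screening", None]})

visit = df_input["Visit"]

# One boolean mask per timepoint; na=False treats missing values as "no match".
conditions = [
    visit.str.contains("Pre", na=False),
    visit.str.contains("Post", na=False),
    visit.str.contains("Screening", na=False),
]
choices = ["Pre", "Post", "Screening"]

# df_output has to exist before a column can be assigned to it.
df_output = pd.DataFrame(index=df_input.index)

# np.select takes the first condition that is True in each row, else the default.
df_output["VISIT"] = np.select(conditions, choices, default="")

print(df_output)
```

np.select evaluates the conditions in order, so if a value could match more than one pattern, the earliest one in the list wins.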

Answered By: Naveed