Is there a way in Python/Pandas to use a generic variable name with a wildcard to select all similar columns?

Question:

In Stata, if I typed Week_*, it would select all columns Week_1, Week_2, etc. Is there a similar way to do this in Python/Pandas?

Code example below; the 'Week_*' entry in feat_cols shows what I would like to be able to write.

# One-hot Encode Week: Create variables Week_1, Week_2, ... etc.
dt_temp0 = dt_temp0.join(pd.get_dummies(dt_temp0['Week'],prefix='Week'))

# Features to Use
feat_cols = ['lag2_tfk_total','lag3_tfk_total','lag2_Trips_pp','lag3_Trips_pp',
             'ClinicID_fac', 'Week_*']

x_train = dt_temp1.loc[dt_temp1['train'] == 1,feat_cols]
Asked By: Ren


Answers:

You could select your week columns with a list comprehension:

week_cols = [col for col in dt_temp1.columns if col.startswith('Week_')]
feat_cols = ['lag2_tfk_total','lag3_tfk_total','lag2_Trips_pp','lag3_Trips_pp',
             'ClinicID_fac', *week_cols]

You can combine these into one line if you want.
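
As a rough sketch of the one-line version, here is a self-contained example. The toy DataFrame and the shortened base_cols list are made up for illustration and only stand in for the real dt_temp1 and feature list. The last part uses fnmatch.fnmatchcase from the standard library, which is one way (an assumption on my part, not something from the question) to write a literal Stata-style 'Week_*' pattern.

```python
import fnmatch

import pandas as pd

# Hypothetical toy data standing in for dt_temp1 (not the asker's real data)
dt_temp1 = pd.DataFrame({
    "Week": [1, 2, 1, 3],
    "ClinicID_fac": [10, 11, 10, 12],
    "lag2_tfk_total": [0.5, 0.7, 0.2, 0.9],
    "train": [1, 1, 0, 1],
})

# One-hot encode Week: adds columns Week_1, Week_2, Week_3
dt_temp1 = dt_temp1.join(pd.get_dummies(dt_temp1["Week"], prefix="Week"))

# Shortened, illustrative list of the non-week features
base_cols = ["lag2_tfk_total", "ClinicID_fac"]

# One line: fixed features plus every column whose name starts with "Week_"
feat_cols = base_cols + [col for col in dt_temp1.columns if col.startswith("Week_")]

# Same selection written with a shell-style wildcard (case-sensitive match)
glob_cols = base_cols + [col for col in dt_temp1.columns
                         if fnmatch.fnmatchcase(col, "Week_*")]

x_train = dt_temp1.loc[dt_temp1["train"] == 1, feat_cols]
```

Note that the plain 'Week' column is not picked up, because it does not start with 'Week_'.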

Answered By: jprebys

I actually found another way to do this as well, using filter(). Then you just have to concatenate the two lists of column names. Thanks for all the help!

week_cols = dt_temp0.filter(regex="Week_").columns.tolist()
feat_cols = ['ClinicID_fac','lag2_tfk_total','lag3_tfk_total','lag2_Trips_pp','lag3_Trips_pp'] + week_cols
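
One thing to watch with filter(): the regex is matched anywhere in the column name, so "Week_" would also select a column such as 'Prev_Week_1' if one existed. The sketch below illustrates this with made-up column names (the 'Prev_Week_1' column is hypothetical and is there only to show the difference); anchoring the pattern with "^" limits the match to names that start with 'Week_'.

```python
import pandas as pd

# Hypothetical column names; "Prev_Week_1" is included only to show the difference
df = pd.DataFrame(columns=["ClinicID_fac", "Week_1", "Week_2", "Prev_Week_1"])

# Unanchored regex: matches "Week_" anywhere in the name
unanchored = df.filter(regex="Week_").columns.tolist()

# Anchored regex: matches only names that start with "Week_"
anchored = df.filter(regex="^Week_").columns.tolist()

# like= is a plain substring test, so it behaves like the unanchored regex here
substring = df.filter(like="Week_").columns.tolist()

print(unanchored)
print(anchored)
print(substring)
```

If none of your other columns contain 'Week_', all three give the same result and the unanchored form is fine.
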
Answered By: Ren