Columns selection on specific text

Question:

I want to extract specific columns that contain specific names. Below you can see my data

import numpy as np
import pandas as pd


data = {
         'Names': ['Store (007) Total amount of Sales ',
                   'Store perc (65) Total amount of sales ',
                   'Mall store, aid (005) Total amount of sales',
                   'Increase in the value of sales / Additional seling (22) Total amount of sales',
                   'Dividends (0233) Amount of income tax',
                   'Other income (098) Total amount of Sales',
                   'Other income (0245) Amount of Income Tax',
                    ],
         'Sales':[10,10,9,7,5,5,5],
         
         
        }

df = pd.DataFrame(data, columns = ['Names',
                                   'Sales',
                             
                                   ])
df

This data have some specific columns that I need to be selected in the separate data frame. Keywords for this selection are words Total amount of Sales or Total amount of sales . These words are placed after the second brackets ). Also please take into account that text is no trimmed so empty spaces are possible.

enter image description here

So can anybody help me how to solve this ?

Asked By: silent_hunter

||

Answers:

Use Series.str.contains without test cases with case=False in boolean indexing:

df1 = df[df['Names'].str.contains('Total amount of Sales', case=False)]
print (df1)
                                               Names  Sales
0                 Store (007) Total amount of Sales      10
1             Store perc (65) Total amount of sales      10
2        Mall store, aid (005) Total amount of sales      9
3  Increase in the value of sales / Additional se...      7
5           Other income (098) Total amount of Sales      5

Or if need test sales or Sales use:

df2 = df[df['Names'].str.contains('Total amount of [Ss]ales')]
Answered By: jezrael

try this :

searchfor = ['Total amount of Sales','Total amount of sales']
df = df[df['Names'].str.contains('|'.join(searchfor))]
df.reset_index(inplace=True, drop=True)
print(df)
Answered By: Mohit
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.