Select dataframe columns that start with a certain string, plus additional columns
Question:
I have a dataframe with columns: 'Id', 'Category', 'Shop', ....., 'Brandtxsu1', 'Brandxyw2', ...
I want to select the columns Id and Category, plus every column whose name starts with Brand. I can select the columns that start with Brand using the following code, but how do I also select Id and Category?
df[df.columns[pd.Series(df.columns).str.startswith('Brand')]]
Answers:
You can provide a list of columns you want to filter for:
cols = [c for c in df.columns if c.startswith('Brand') or c in ('Id', 'Category', ...)]
df[cols]
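A self-contained sketch of this approach, assuming the sample frame used elsewhere in this thread and assuming Id and Category are the only extra columns wanted (the `extra` list is illustrative). The second selection is a vectorized variant that builds a boolean mask over the column index instead of a Python list.

```python
import pandas as pd

df = pd.DataFrame({
    'Id': ['001', '002', '003', '004'],
    'Category': ['A', 'A', 'S', 'B'],
    'Shop': ['Shop1', 'Shop2', 'Shop3', 'Shop4'],
    'Brandtxsu1': [1, 1, 1, 1],
    'Brandxyw2': [2, 2, 2, 2],
})

extra = ['Id', 'Category']  # assumed: the only non-Brand columns to keep

# List comprehension: keeps matching columns in their original order.
cols = [c for c in df.columns if c.startswith('Brand') or c in extra]
by_list = df[cols]

# Boolean mask over the column index; selects the same columns.
mask = df.columns.str.startswith('Brand') | df.columns.isin(extra)
by_mask = df.loc[:, mask]
```

Both forms silently skip a name in `extra` that is missing from the frame, whereas `df[['Id', 'Category']]` would raise a KeyError.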
You can try join with filter. Note that the column is named 'Id', not 'ID'; column labels are case-sensitive:
out = df[['Id', 'Category']].join(df.filter(regex='^Brand'))
Solution
- Select the ‘Id’ and ‘Category’ columns from the dataframe.
- Select the columns from the dataframe whose column names start with ‘Brand’.
- Join them together.
Sample Code
import pandas as pd
df = pd.DataFrame({
    'Id': ['001', '002', '003', '004'],
    'Category': ['A', 'A', 'S', 'B'],
    'Shop': ['Shop1', 'Shop2', 'Shop3', 'Shop4'],
    'Brandtxsu1': [1, 1, 1, 1],
    'Brandxyw2': [2, 2, 2, 2]
})
df_output = df[['Id', 'Category']].join(df.loc[:, df.columns.str.startswith('Brand')])
print(df_output)
Output
    Id Category  Brandtxsu1  Brandxyw2
0  001        A           1          2
1  002        A           1          2
2  003        S           1          2
3  004        B           1          2
One option is DataFrame.filter with a regex:
df.filter(regex="Id|Category|Brand.+")
Out[23]:
    Id Category  Brandtxsu1  Brandxyw2
0  001        A           1          2
1  002        A           1          2
2  003        S           1          2
3  004        B           1          2
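One caveat worth illustrating: DataFrame.filter applies its regex with re.search, so an unanchored alternative such as Id matches anywhere in a column name. The sketch below compares an unanchored pattern with an anchored one; the ShopId column is hypothetical and is added only to show the difference.

```python
import pandas as pd

df = pd.DataFrame({
    'Id': ['001', '002'],
    'Category': ['A', 'B'],
    'ShopId': [10, 20],        # hypothetical column, not in the original frame
    'Brandtxsu1': [1, 1],
    'Brandxyw2': [2, 2],
})

# Unanchored: 'Id' is found inside 'ShopId', so that column is kept too.
loose = df.filter(regex='Id|Category|Brand.+')

# Anchored: exact match for Id/Category, prefix match for Brand.
strict = df.filter(regex=r'^(Id|Category)$|^Brand')

print(list(loose.columns))
print(list(strict.columns))
```

On the frame from the question the two patterns select the same columns; anchoring only matters once other names contain Id, Category, or Brand as a substring.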
Another option is with pyjanitor select_columns:
# pip install pyjanitor
import pandas as pd
import janitor
df.select_columns('Id', 'Category', 'Brand*')
    Id Category  Brandtxsu1  Brandxyw2
0  001        A           1          2
1  002        A           1          2
2  003        S           1          2
3  004        B           1          2
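If installing pyjanitor is not an option, similar shell-style matching can be approximated with the standard library's fnmatch module. This is only a rough sketch, not pyjanitor's implementation: select_by_globs is a hypothetical helper, and it returns columns in the frame's order rather than in the order the patterns are given.

```python
from fnmatch import fnmatchcase

import pandas as pd


def select_by_globs(frame, *patterns):
    """Hypothetical helper: keep columns matching any shell-style pattern."""
    keep = [
        col for col in frame.columns
        if any(fnmatchcase(str(col), pat) for pat in patterns)
    ]
    return frame[keep]


df = pd.DataFrame({
    'Id': ['001', '002', '003', '004'],
    'Category': ['A', 'A', 'S', 'B'],
    'Shop': ['Shop1', 'Shop2', 'Shop3', 'Shop4'],
    'Brandtxsu1': [1, 1, 1, 1],
    'Brandxyw2': [2, 2, 2, 2],
})

result = select_by_globs(df, 'Id', 'Category', 'Brand*')
print(result)
```

fnmatchcase is used instead of fnmatch so that matching stays case-sensitive on every platform, consistent with pandas column labels.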