Can I get a sub-DataFrame according to first letter in columns names?
Question:
I want to get only columns whose names start with 'Q1'
and those starting with 'Q3'
, I know that this is possible by doing:
new_df=df[['Q1_1', 'Q1_2', 'Q1_3','Q3_1', 'Q3_2', 'Q3_3']]
But since my real df
is too large (more than 70 variables) I search a way to get the new_df
by using only desired first letters in the columns titles.
My example dataframe is:
df=pd.DataFrame({
'Q1_1': [np.random.randint(1,100) for i in range(10)],
'Q1_2': np.random.random(10),
'Q1_3': np.random.randint(2, size=10),
'Q2_1': [np.random.randint(1,100) for i in range(10)],
'Q2_2': np.random.random(10),
'Q2_3': np.random.randint(2, size=10),
'Q3_1': [np.random.randint(1,100) for i in range(10)],
'Q3_2': np.random.random(10),
'Q3_3': np.random.randint(2, size=10),
'Q4_1': [np.random.randint(1,100) for i in range(10)],
'Q4_2': np.random.random(10),
'Q4_3': np.random.randint(2, size=10)
})
df
has the following display:
Q1_1 Q1_2 Q1_3 Q2_1 Q2_2 Q2_3 Q3_1 Q3_2 Q3_3 Q4_1 Q4_2 Q4_3
0 92 0.551722 1 36 0.063269 1 95 0.541573 1 91 0.521076 1
1 89 0.951076 1 82 0.853572 1 49 0.782290 1 98 0.232572 0
2 88 0.909953 1 19 0.544450 1 66 0.021061 1 51 0.951225 0
3 66 0.904642 1 17 0.727190 1 85 0.697792 0 35 0.412844 1
4 78 0.802783 1 23 0.634575 1 77 0.759861 0 55 0.460012 0
5 41 0.943271 1 63 0.460578 1 95 0.004986 1 89 0.970059 0
6 54 0.600558 0 18 0.031487 0 84 0.716314 0 84 0.636364 1
7 2 0.458006 0 95 0.029421 0 10 0.927356 1 27 0.031572 1
8 38 0.029658 1 30 0.125706 1 94 0.096702 1 32 0.241613 1
9 52 0.584300 1 85 0.026642 0 78 0.358952 0 70 0.696008 0
I want a simpler way to get the following sub-df:
Q1_1 Q1_2 Q1_3 Q3_1 Q3_2 Q3_3
0 92 0.551722 1 95 0.541573 1
1 89 0.951076 1 49 0.782290 1
2 88 0.909953 1 66 0.021061 1
3 66 0.904642 1 85 0.697792 0
4 78 0.802783 1 77 0.759861 0
5 41 0.943271 1 95 0.004986 1
6 54 0.600558 0 84 0.716314 0
7 2 0.458006 0 10 0.927356 1
8 38 0.029658 1 94 0.096702 1
9 52 0.584300 1 78 0.358952 0
Please if you need more detail let me know in comments,
Any help from your side will be highly appreciated.
Answers:
You may use list comprehension to get the desired column headers like this:
cols = [col for col in df.columns if col[:2] in ('Q1', 'Q3')]
new_df = df[cols].copy()
You can use pd.DataFrame.filter for this:
df.filter(regex = r'Q1_d|Q3_d')
Q1_1 Q1_2 Q1_3 Q3_1 Q3_2 Q3_3
0 5 0.631041 0 46 0.768563 0
1 32 0.594106 1 46 0.982396 1
2 78 0.703139 1 38 0.252107 0
3 98 0.353230 0 35 0.324079 0
4 77 0.913203 1 11 0.456287 0
5 62 0.565350 1 77 0.387365 0
6 38 0.975652 1 59 0.276421 1
7 97 0.505808 1 84 0.035756 0
8 15 0.525452 0 57 0.675310 1
9 94 0.545259 0 25 0.628030 0
I want to get only columns whose names start with 'Q1'
and those starting with 'Q3'
, I know that this is possible by doing:
new_df=df[['Q1_1', 'Q1_2', 'Q1_3','Q3_1', 'Q3_2', 'Q3_3']]
But since my real df
is too large (more than 70 variables) I search a way to get the new_df
by using only desired first letters in the columns titles.
My example dataframe is:
df=pd.DataFrame({
'Q1_1': [np.random.randint(1,100) for i in range(10)],
'Q1_2': np.random.random(10),
'Q1_3': np.random.randint(2, size=10),
'Q2_1': [np.random.randint(1,100) for i in range(10)],
'Q2_2': np.random.random(10),
'Q2_3': np.random.randint(2, size=10),
'Q3_1': [np.random.randint(1,100) for i in range(10)],
'Q3_2': np.random.random(10),
'Q3_3': np.random.randint(2, size=10),
'Q4_1': [np.random.randint(1,100) for i in range(10)],
'Q4_2': np.random.random(10),
'Q4_3': np.random.randint(2, size=10)
})
df
has the following display:
Q1_1 Q1_2 Q1_3 Q2_1 Q2_2 Q2_3 Q3_1 Q3_2 Q3_3 Q4_1 Q4_2 Q4_3
0 92 0.551722 1 36 0.063269 1 95 0.541573 1 91 0.521076 1
1 89 0.951076 1 82 0.853572 1 49 0.782290 1 98 0.232572 0
2 88 0.909953 1 19 0.544450 1 66 0.021061 1 51 0.951225 0
3 66 0.904642 1 17 0.727190 1 85 0.697792 0 35 0.412844 1
4 78 0.802783 1 23 0.634575 1 77 0.759861 0 55 0.460012 0
5 41 0.943271 1 63 0.460578 1 95 0.004986 1 89 0.970059 0
6 54 0.600558 0 18 0.031487 0 84 0.716314 0 84 0.636364 1
7 2 0.458006 0 95 0.029421 0 10 0.927356 1 27 0.031572 1
8 38 0.029658 1 30 0.125706 1 94 0.096702 1 32 0.241613 1
9 52 0.584300 1 85 0.026642 0 78 0.358952 0 70 0.696008 0
I want a simpler way to get the following sub-df:
Q1_1 Q1_2 Q1_3 Q3_1 Q3_2 Q3_3
0 92 0.551722 1 95 0.541573 1
1 89 0.951076 1 49 0.782290 1
2 88 0.909953 1 66 0.021061 1
3 66 0.904642 1 85 0.697792 0
4 78 0.802783 1 77 0.759861 0
5 41 0.943271 1 95 0.004986 1
6 54 0.600558 0 84 0.716314 0
7 2 0.458006 0 10 0.927356 1
8 38 0.029658 1 94 0.096702 1
9 52 0.584300 1 78 0.358952 0
Please if you need more detail let me know in comments,
Any help from your side will be highly appreciated.
You may use list comprehension to get the desired column headers like this:
cols = [col for col in df.columns if col[:2] in ('Q1', 'Q3')]
new_df = df[cols].copy()
You can use pd.DataFrame.filter for this:
df.filter(regex = r'Q1_d|Q3_d')
Q1_1 Q1_2 Q1_3 Q3_1 Q3_2 Q3_3
0 5 0.631041 0 46 0.768563 0
1 32 0.594106 1 46 0.982396 1
2 78 0.703139 1 38 0.252107 0
3 98 0.353230 0 35 0.324079 0
4 77 0.913203 1 11 0.456287 0
5 62 0.565350 1 77 0.387365 0
6 38 0.975652 1 59 0.276421 1
7 97 0.505808 1 84 0.035756 0
8 15 0.525452 0 57 0.675310 1
9 94 0.545259 0 25 0.628030 0