Pandas – rearrange columns if column exists
Question:
I have dataframes whose columns are in a jumbled order. How can I rearrange the columns, if they exist?
One Two Three Six Four Five
1 2 3 6 4 5
1 2 3 6 4 5
...
How can I arrange these columns in order? The issue is that not all six columns will be present every time, so I need a simple line that arranges them in the order One Two Three Four Five Six, using only the columns that exist. For example, if Two is not in df, the result should be One Three Four Five Six.
Answers:
You can change the order with DataFrame.reindex and then drop the columns that contain only missing values:
df1 = (df.reindex(['One', 'Two', 'Three', 'Four', 'Five', 'Six'], axis=1)
         .dropna(how='all', axis=1))
print(df1)
One Three Four Five Six
0 1 3 4 5 6
1 1 3 4 5 6
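One caveat with the reindex-then-dropna approach: dropna(how='all', axis=1) cannot tell a column that reindex has just created from one that was already present but entirely NaN, so it drops both. A minimal sketch that filters the desired order against df.columns instead; the sample data and the name `wanted` are illustrative assumptions, not part of the answer above:

```python
import numpy as np
import pandas as pd

# Illustrative data: 'Two' is absent and 'Six' exists but holds only NaN.
df = pd.DataFrame({
    "One": [1, 1],
    "Three": [3, 3],
    "Six": [np.nan, np.nan],
    "Four": [4, 4],
    "Five": [5, 5],
})

wanted = ["One", "Two", "Three", "Four", "Five", "Six"]

# reindex adds 'Two' as all-NaN; dropna then removes 'Two' and also 'Six'.
via_dropna = df.reindex(wanted, axis=1).dropna(how="all", axis=1)

# Keep only the wanted names that are present, in the wanted order.
via_filter = df[[c for c in wanted if c in df.columns]]

print(list(via_dropna.columns))  # ['One', 'Three', 'Four', 'Five']
print(list(via_filter.columns))  # ['One', 'Three', 'Four', 'Five', 'Six']
```

If none of your columns can be entirely NaN, the two variants should give the same result.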
Alternatively, convert the column labels to an ordered categorical index and then sort the columns:
c = ['One', 'Two', 'Three', 'Four', 'Five', 'Six']
df.columns = pd.CategoricalIndex(df.columns, categories=c, ordered=True)
df1 = df.sort_index(axis=1)
print(df1)
One Three Four Five Six
0 1 3 4 5 6
1 1 3 4 5 6
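After the categorical approach, the column labels are stored in a CategoricalIndex rather than a plain Index. A small sketch on made-up data that sorts and then converts the labels back to ordinary objects; the final astype step is an optional addition, not something the answer above does:

```python
import pandas as pd

# Made-up frame matching the question's layout, with 'Two' missing.
df = pd.DataFrame([[1, 3, 6, 4, 5]] * 2,
                  columns=["One", "Three", "Six", "Four", "Five"])

c = ["One", "Two", "Three", "Four", "Five", "Six"]
df.columns = pd.CategoricalIndex(df.columns, categories=c, ordered=True)

# Columns are sorted by category position, not alphabetically.
df1 = df.sort_index(axis=1)

# Optional: go back to plain labels once the order is fixed.
df1.columns = df1.columns.astype(object)

print(list(df1.columns))
```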
That depends on the column names. If they are numbers, it's easy: take the column labels, put them in order with the built-in sorted() function, and select the columns in that order. This is a one-liner:
df = df[sorted(df.columns)]
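Note that assigning to df.columns only replaces the labels and leaves the data where it is, whereas selecting with a sorted list moves each column's values along with its label. A quick check on made-up numeric column names:

```python
import pandas as pd

# Each value is ten times its label, so mismatches are easy to spot.
df = pd.DataFrame([[30, 10, 20]], columns=[3, 1, 2])

# Selecting by the sorted labels reorders labels and data together.
reordered = df[sorted(df.columns)]

# Assigning sorted labels only renames: values stay in their old positions.
relabelled = df.copy()
relabelled.columns = sorted(relabelled.columns)

print(reordered.iloc[0].tolist())   # [10, 20, 30]
print(relabelled.iloc[0].tolist())  # [30, 10, 20]
```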
If the column names are words, then it's a little more complex. You need a map that associates each word with its numeric position. For example:
def word_to_number(word):
    # Map each lower-case name to its position in the desired order.
    order = {
        "one": 1,
        "two": 2,
        # ...
    }
    return order[word.lower()]

df = df[sorted(df.columns, key=word_to_number)]
See here for more information on sorted(): https://docs.python.org/3/howto/sorting.html
Instead of writing the map yourself (which may be difficult if your DF has too many columns, or if you are not sure which columns you will be working with), you can use a library that does that for you, such as: https://pypi.org/project/word2number/
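For the six names in the question, the hand-written map is short. A minimal sketch using plain Python lists; the table `WORD_ORDER` and the sample `columns` list are illustrative assumptions:

```python
# Hypothetical lookup table covering only the six names from the question.
WORD_ORDER = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}


def word_to_number(word):
    """Return the sort position of a column name, ignoring case."""
    return WORD_ORDER[word.lower()]


# Stand-in for list(df.columns) when 'Two' is missing.
columns = ["One", "Three", "Six", "Four", "Five"]
ordered = sorted(columns, key=word_to_number)

print(ordered)  # ['One', 'Three', 'Four', 'Five', 'Six']
```

With a DataFrame, the sorted list would then be used for selection, as in df[ordered]. A name that is not in the table raises a KeyError, so this only suits a known, closed set of column names.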
Use Index.intersection with sort=False, which keeps the order of cols and drops the names that are not in df:
cols = ["One", "Two", "Three", "Four", "Five", "Six"]
new_column = pd.Index(cols).intersection(df.columns, sort=False)
new_df = df[new_column]
ref post: Pandas select columns ordered at the beginning and the rest remain unchanged
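The intersection approach can be wrapped in a small helper. This is only a sketch: the function name `reorder_existing` and the `keep_rest` flag are hypothetical, and the option to append unlisted columns is an assumption based on the title of the referenced post, not code taken from it.

```python
import pandas as pd


def reorder_existing(df, order, keep_rest=False):
    """Return df with the columns named in `order` first, skipping absent ones."""
    # Names from `order` that exist in df, kept in the order given.
    present = list(pd.Index(order).intersection(df.columns, sort=False))
    if keep_rest:
        # Append unlisted columns in their original relative order.
        present += [c for c in df.columns if c not in present]
    return df[present]


# Made-up frame: 'Two' is missing and 'Extra' is not in the wanted order.
df = pd.DataFrame([[1, 3, 6, 4, 5, 0]],
                  columns=["One", "Three", "Six", "Four", "Five", "Extra"])
cols = ["One", "Two", "Three", "Four", "Five", "Six"]

only_known = reorder_existing(df, cols)
with_rest = reorder_existing(df, cols, keep_rest=True)

print(list(only_known.columns))
print(list(with_rest.columns))
```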