Optimal way to drop unknown elements and rearrange to a specific sequence

Question:

For context, I want to find the fastest way to remove unknown elements and rearrange the remaining ones when indexing numpy arrays (the elements can be in any order).

That is, I want to rearrange columns_2 to match the order of columns_1, and then use the rearranged indices to select columns of a numpy array.

columns_1 = ["one", "two", "three", "four"]
columns_2 = ["one", "three", "four", "two", "extra"]

One solution I came up with is:

column_idx = {col: idx for idx, col in enumerate(columns_2)}
array[:, [column_idx[col] for col in columns_1]]

Is there any better/faster alternative?
Note: the solution must fail if any element of columns_1 is missing from columns_2.
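A minimal runnable sketch of the dictionary approach above, assuming a hypothetical sample array whose columns are laid out in columns_2 order. The helper name reorder_columns is hypothetical, not part of numpy. The dictionary lookup raises KeyError for any name in columns_1 that columns_2 lacks, which satisfies the "must fail" requirement.

```python
import numpy as np

columns_1 = ["one", "two", "three", "four"]
columns_2 = ["one", "three", "four", "two", "extra"]

# Hypothetical sample data: one row, columns in columns_2 order.
array = np.array([[1, 3, 4, 2, 99]])


def reorder_columns(arr, source_cols, target_cols):
    """Hypothetical helper: return arr's columns in target_cols order.

    Columns not named in target_cols are dropped. Raises KeyError if a
    name in target_cols is not present in source_cols.
    """
    # Map each column name to its position in the source layout.
    column_idx = {col: idx for idx, col in enumerate(source_cols)}
    # Dict lookup raises KeyError for a missing name.
    return arr[:, [column_idx[col] for col in target_cols]]


result = reorder_columns(array, columns_2, columns_1)
print(result)  # [[1 2 3 4]]
```

The "extra" column is dropped simply because it is never looked up.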

Asked By: Kyo


Answers:

list.index(item) returns the index of the first element equal to item, and raises ValueError if item is not in the list, so a missing column still makes it fail. You can do this:

[columns_2.index(col) for col in columns_1]
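A small sketch of this answer's approach applied end to end, assuming a hypothetical two-row array whose columns follow columns_2 order. A name from columns_1 that is absent from columns_2 raises ValueError from list.index.

```python
import numpy as np

columns_1 = ["one", "two", "three", "four"]
columns_2 = ["one", "three", "four", "two", "extra"]

# Hypothetical sample data with columns in columns_2 order.
array = np.array([[1, 3, 4, 2, 99],
                  [10, 30, 40, 20, 990]])

# Position of each wanted column in columns_2; raises ValueError if one is missing.
indices = [columns_2.index(col) for col in columns_1]
# Integer-array indexing on axis 1 picks and reorders the columns.
reordered = array[:, indices]
print(indices)  # [0, 3, 1, 2]
print(reordered)
```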

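Then use those indices to index your array. Unlike the dictionary approach, this doesn't build a lookup table over every element of columns_2 (including the unneeded ones), so it can be faster when the lists are short; the size of the benefit depends on how much the lists overlap. Each index() call is a linear scan, though, so for long column lists the dictionary lookup scales better.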
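Since which approach wins depends on list sizes, one way to check on your own data is a quick timeit comparison. This is a rough sketch: the generated column names, the sizes 5 and 500, and the helpers make_columns, via_dict and via_index are all hypothetical, and no particular timing result is implied.

```python
import random
import timeit


def make_columns(n, seed=0):
    """Hypothetical test data: n wanted names plus n extra names, shuffled."""
    rng = random.Random(seed)
    wanted = [f"col{i}" for i in range(n)]
    source = wanted + [f"extra{i}" for i in range(n)]
    rng.shuffle(source)
    return wanted, source


def via_dict(wanted, source):
    # Question's approach: build a name -> position map once, then look up.
    column_idx = {col: idx for idx, col in enumerate(source)}
    return [column_idx[col] for col in wanted]


def via_index(wanted, source):
    # Answer's approach: one linear scan of source per wanted name.
    return [source.index(col) for col in wanted]


for n in (5, 500):
    wanted, source = make_columns(n)
    # Both approaches must agree before comparing their speed.
    assert via_dict(wanted, source) == via_index(wanted, source)
    t_dict = timeit.timeit(lambda: via_dict(wanted, source), number=100)
    t_index = timeit.timeit(lambda: via_index(wanted, source), number=100)
    print(f"n={n}: dict {t_dict:.4f}s, list.index {t_index:.4f}s")
```

If columns_2 could ever contain duplicate names, note that the two approaches differ: list.index finds the first occurrence, while the dict comprehension keeps the last.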

Answered By: philosofool