Faster solution for checking if value is already in another dataframe with numpy


This is my code right now:

for i, row in gdf_pot.iterrows():
    if row.ID not in historized.ID.to_list():
        gdf_pot.loc[i, "FLAG_NEW"] = 1
    else:
        gdf_pot.loc[i, "FLAG_NEW"] = 0

It’s very slow because the dataframe is very large.

I saw some solutions with np.where, but I couldn’t make them work.

Maybe you have some ideas?


Asked By: CodeoDE



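One way is to build a boolean mask with pandas.Series.isin, invert it with ~ (the loop sets FLAG_NEW to 1 only for IDs that are not in historized), and convert it to integers with pandas.Series.astype.

The core data structure in GeoPandas is the geopandas.GeoDataFrame, a subclass of pandas.DataFrame, so these pandas methods work on gdf_pot directly.

gdf_pot["FLAG_NEW"] = (~gdf_pot["ID"].isin(historized["ID"])).astype(int)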
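To make the idea concrete, here is a minimal sketch on small, hypothetical toy data: plain pandas DataFrames stand in for gdf_pot and historized, assuming a GeoDataFrame behaves the same for this column operation since it subclasses DataFrame. It follows the question's loop, where FLAG_NEW is 1 only for IDs missing from historized, so the isin mask is inverted with ~ before astype(int).

```python
import pandas as pd

# Hypothetical toy data standing in for gdf_pot and historized;
# only the "ID" column matters for this operation.
gdf_pot = pd.DataFrame({"ID": [101, 102, 103, 104]})
historized = pd.DataFrame({"ID": [102, 104, 105]})

# Boolean mask aligned with gdf_pot's rows: True where the ID is already historized
already_seen = gdf_pot["ID"].isin(historized["ID"])

# Invert the mask (new = not yet historized) and turn True/False into 1/0
gdf_pot["FLAG_NEW"] = (~already_seen).astype(int)

print(gdf_pot["FLAG_NEW"].tolist())  # [1, 0, 1, 0]
```

The whole column is computed in one vectorized call instead of one Python-level iteration per row, and historized["ID"] is only read once rather than being converted to a list on every iteration.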

NB: True behaves as 1 and False as 0 in Python (bool is a subclass of int). So when we call astype(int), the boolean Series is mapped to those two numbers.
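Since the question mentions np.where, here is a sketch of how the NumPy route could look, again on hypothetical toy data. np.where(condition, 1, 0) picks 1 where the condition is True and 0 elsewhere, which yields the same values as astype(int) on the mask. np.isin with invert=True is a pure-NumPy alternative to ~Series.isin. The snippet also checks the True == 1 behaviour from the note above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for gdf_pot and historized
gdf_pot = pd.DataFrame({"ID": [101, 102, 103, 104]})
historized = pd.DataFrame({"ID": [102, 104, 105]})

# bool is a subclass of int in Python: True == 1 and False == 0
assert isinstance(True, int) and True + True == 2

# Mask of IDs that are not yet historized (same condition as the question's loop)
is_new = ~gdf_pot["ID"].isin(historized["ID"])

# np.where picks 1 where the mask is True and 0 elsewhere
flag_where = np.where(is_new, 1, 0)

# astype(int) gives the same 1/0 values directly from the booleans
flag_astype = is_new.astype(int).to_numpy()

# Pure NumPy: invert=True returns True for elements NOT found in the second array
flag_np = np.isin(
    gdf_pot["ID"].to_numpy(), historized["ID"].to_numpy(), invert=True
).astype(int)

print(flag_where.tolist(), flag_astype.tolist(), flag_np.tolist())
# [1, 0, 1, 0] [1, 0, 1, 0] [1, 0, 1, 0]
```

All three variants are vectorized; which one to use is mostly a matter of taste, although np.where is handy when the two outcomes are not simply 1 and 0.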

Answered By: Timeless