How to optimally find if "dataframe cell value" contains "cell value from another dataframe" and fill cell with it?
Question:
I have a dataframe with 2 columns of unequal length:
One-word | Many-Words |
---|---|
Bird | Bird with no blood |
Stone | Stone that killed the bird |
Blood | Bird without brains |
<none> | stone and blood |
I am trying to fill a new third column with all of the Many-Words entries that contain the One-word (5 or fewer per cell).
So it would look like this:
One-word | Many-Words | Many-Words with One-word |
---|---|---|
Bird | Bird with no blood | Bird with no blood, Stone that killed the bird, Bird without brains |
Stone | Stone that killed the bird | Stone that killed the bird, stone and blood |
Blood | Bird without brains | Bird without brains, Bird with no blood, stone and blood |
<none> | stone and blood | |
I actually found a way, but it is very slow.
1. Loop over the "Many-Words" column.
   1.1 Within the loop, build a dictionary where each key is a cell from "Many-Words" and each value is the list of words produced by split.
2. Loop over the "One-word" column.
   2.1 Within that loop, run another loop over the keys and values of the dictionary from 1.1.
   2.2 Within these two loops, check whether the list from 1.1 contains the word from "One-word".
   2.3 If it does, concatenate the dictionary key onto the corresponding cell in the third column, on the condition that the number of concatenations is 5 or fewer.
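The steps above could look roughly like the sketch below. It is a reconstruction under assumptions, since the original code was not posted: plain lists stand in for the two dataframe columns, and the names `words_by_phrase`, `third_column` and `MAX_MATCHES`, as well as the lowercasing, are made up for illustration.

```python
# Plain lists stand in for the two dataframe columns (an assumption for this sketch).
one_words = ["Bird", "Stone", "Blood", "<none>"]
many_words = [
    "Bird with no blood",
    "Stone that killed the bird",
    "Bird without brains",
    "stone and blood",
]
MAX_MATCHES = 5  # the "5 or fewer" cap from step 2.3

# Step 1: dictionary of phrase -> list of its words (lowercasing is assumed).
words_by_phrase = {phrase: phrase.lower().split() for phrase in many_words}

# Step 2: for each one-word cell, scan every dictionary entry.
third_column = []
for word in one_words:
    matches = []
    for phrase, words in words_by_phrase.items():
        # Steps 2.2 and 2.3: whole-word check, then append while under the cap.
        if word.lower() in words and len(matches) < MAX_MATCHES:
            matches.append(phrase)
    third_column.append(", ".join(matches))
```

Every one-word cell is compared against every phrase, so the work grows with the product of the two column lengths, which would explain the slowness on large data.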
I am actually looping through the dataframe column cells and creating dicts and lists from them, which I have read is very bad practice.
I am a novice in Python, but I am pretty sure that my way is unholy.
There has got to be a better, faster, and cleaner way. Maybe something with vectorization?
Thank you!
Answers:
You can use `iterrows` to loop over your df rows and build, for each row, a list of the `Many-Words` entries containing that row's `One-word`:
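    import pandas as pd

    # For each row, list every Many-Words entry containing that row's One-word.
    # regex=False matches the word literally instead of as a regular expression.
    df["Many-Words with One-word"] = pd.Series([
        df[df["Many-Words"].str.lower().str.contains(row["One-word"].lower(), regex=False)]["Many-Words"].to_list()
        for _, row in df.iterrows()
    ], index=df.index)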
Note: `lower` is used to make the match case-insensitive.
Output:
      One-word                  Many-Words                           Many-Words with One-word
    0     Bird          Bird with no blood  [Bird with no blood, Stone that killed the bir...
    1    Stone  Stone that killed the bird       [Stone that killed the bird, stone and blood]
    2    Blood         Bird without brains               [Bird with no blood, stone and blood]
    3   <none>             stone and blood                                                  []
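The question also asks for at most 5 matches per cell, written as comma-separated text rather than a list. A minimal sketch of one way to get that is shown here, assuming every `One-word` cell is a string (a missing value would need handling first). The helper `matching_phrases` and the constant `MAX_MATCHES` are hypothetical names, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    "One-word": ["Bird", "Stone", "Blood", "<none>"],
    "Many-Words": [
        "Bird with no blood",
        "Stone that killed the bird",
        "Bird without brains",
        "stone and blood",
    ],
})

MAX_MATCHES = 5  # cap taken from the question


def matching_phrases(word):
    """Return up to MAX_MATCHES phrases containing `word`, joined by ', '."""
    # case=False ignores case; regex=False treats `word` as literal text.
    mask = df["Many-Words"].str.contains(word, case=False, regex=False)
    return ", ".join(df.loc[mask, "Many-Words"].head(MAX_MATCHES))


# One pass over the One-word column; each call scans Many-Words once.
df["Many-Words with One-word"] = df["One-word"].map(matching_phrases)
```

This is still one scan of `Many-Words` per row, so it is not fully vectorized. It is also a substring match, so a search word such as "bird" would match "birds" too.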