Pandas – Find all rows with a specific value and keep all rows with matching column value

Question:

I have a dataframe with two columns, Date and Name. I want to search for a specific name and get the different Dates on which that name occurs. Then, I only want to keep those rows that have those dates.
This is my data:

Meeting Dates Name
1746-06-02 Sa Majesté (#p32)
1746-06-02 Maupertuis (#p4)
1746-06-02 Jordan (#p31)
1746-06-09 Maupertuis (#p4)
1746-06-09 Voltaire (#p37)
1746-06-09 de la Condamine (#p38)
1746-06-09 Süssmilch (#p16)
1746-06-09 Sa Majesté (#p32)
1746-06-09 Formey (#p27)
1746-06-16 Marggraf (#p20)
1746-06-23 Dohna (#p39)
1746-06-23 Euler (#p10)

I have used the following code to find all occurrences of my name:

df["Name"].value_counts()["Sa Majesté (#p32)"]

I know what to store the corresponding dates but I am unsure how. Once I have those dates, I plan to store the values and filter through my original dataframe and keep only those rows with those dates.

I would really appreciate any help/pointers with this.

Asked By: noahjb

||

Answers:

IIUC, is this what you’re looking for? a function that takes a name and return the filtered dataset?


def find_name(name):
    return df.loc[df['Name'].eq(name)]
find_name('Sa Majesté (#p32)')

OR


def find_name(name):
    return df.loc[df['Name'].str.contains(name, regex=False)] 
find_name('(#p32)')  

Meeting Dates   Name
0   1746-06-02  Sa Majesté (#p32)
7   1746-06-09  Sa Majesté (#p32)

OR

if you just need the list of dates


def find_name(name):
    return df.loc[df['Name'].eq(name)]['Meeting Dates'].to_list()

find_name('Sa Majesté (#p32)')
['1746-06-02 ', '1746-06-09 ']
Answered By: Naveed

You can filter on the string value, then get the unique for Meeting Dates column.

>>> df[df['Name'].eq("Sa Majesté (#p32)")]['Meeting Dates'].unique()

array(['1746-06-02', '1746-06-09'], dtype=object)

And if you want to do it for all names, then you can use groupby and get the unique dates:

>>> df.groupby('Name')['Meeting Dates'].agg('unique')

Name
Dohna (#p39)                          [1746-06-23]
Euler (#p10)                          [1746-06-23]
Formey (#p27)                         [1746-06-09]
Jordan (#p31)                         [1746-06-02]
Marggraf (#p20)                       [1746-06-16]
Maupertuis (#p4)          [1746-06-02, 1746-06-09]
Sa Majesté (#p32)        [1746-06-02, 1746-06-09]
Süssmilch (#p16)                     [1746-06-09]
Voltaire (#p37)                       [1746-06-09]
de la Condamine (#p38)                [1746-06-09]
Name: Meeting Dates, dtype: object
Answered By: ThePyGuy

My provided solution is not based on your specific dataset, however is valid for your problem. My hope is you are able to see the logic and implement in your own use case.

id name
1 ..
2 ..
3 ..

According to this general dataset, we are trying to find all rows with a certain value under the columns name "name".. then we take the id of those rows and then find ALL rows with those values.

import pandas as pd

df = pd.read_csv("MOCK_DATA.csv")

# FIND ALL ROWS THAT THE SECOND COLUMNS HAS NAME "Two-toed tree sloth"
tempOne = df[(df == "Two-toed tree sloth").any(axis=1)]

# SUM UP ALL OF THE ID'S THAT THIS IS TRUE
tempTwo = df[(df == "Two-toed tree sloth").any(axis=1)]["id"].to_list()

# OUTPUT ALL ROWS WITH THESE LIST ITEMS
df.loc[df["id"].isin(tempTwo)]
Answered By: sigh
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.