Email Classifier to classify emails according to the time

Question:

I have to design a program that can classify emails as spam or nonspam using Python and Pandas.

I have already classified the emails as spam or nonspam according to the email's subject. For my second task, I have to classify the emails as spam or nonspam according to the time they were received: if an email is received on a Friday or Saturday, it should be classified as spam; otherwise, nonspam. I have no idea how to do that. I searched but found nothing.

[Screenshot of the Excel file: Email Table.xlsx]

import pandas as pd

# Load the spreadsheet into a DataFrame
ExcelFile = pd.read_excel(r'C:\Users\Documents\Email Table.xlsx')
Subject = pd.DataFrame(ExcelFile, columns=['Subject'])

def spam(Subject):
    # Emails with a missing subject are treated as spam
    A = len(ExcelFile[ExcelFile['Subject'].isnull()])
    print("Number of spam emails ", A)
    print(ExcelFile[ExcelFile['Subject'].isnull()])

spam(Subject)
Asked By: Rahaf


Answers:

There are many ways to do this, but this is how I would do it. I added comments and descriptive names for clarity, so you can take the code and modify it to fit your specific needs.

#All necessary imports
import pandas as pd
import numpy as np
import datetime
#Create some sample data (made up; nothing specific)
data = {
    'From' : ['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]'],
    'Subject' : ['Free Stuff', 'Buy Stuff', np.nan,'More Free Stuff', 'More Buy Stuff'],
    'Dates' : ['2022-05-18 01:00:00', '2022-05-18 03:00:00', '2022-05-19 08:00:00', '2022-05-20 01:00:00', '2022-05-21 10:00:00']
}

#Create a Dataframe with the data
df = pd.DataFrame(data)

#Set all nulls/nones/NaN to a blank string
df.fillna('', inplace = True)

#Set the Dates column to a date column with YYYY-MM-DD HH:MM:SS format
df['Dates'] = pd.to_datetime(df['Dates'], format = '%Y-%m-%d %H:%M:%S')

#Create a column that identifies which day of the week each value in the Dates column falls on
df['Day'] = df['Dates'].dt.day_name()

#Use np.select() to flag rows where the Subject column is blank or the Day column is Friday or Saturday

#This is where you specify which days are spam days
list_of_spam_days = ['Friday', 'Saturday']

#List of conditions to test as true or false (missing subjects were replaced with '' by fillna above)
condition_list = [df['Subject'] == '', df['Day'].isin(list_of_spam_days)]

#Mirroring condition_list: the value to assign when each condition is true
true_list = ['Spam', 'Spam']

#Make a new column that holds the results of the condition and true lists
#The final 'Not Spam' is the default when none of the conditions is satisfied
df['Spam or Not Spam'] = np.select(condition_list, true_list, 'Not Spam')
df
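
Since both conditions lead to the same label, the same rule can also be written as a single boolean mask with np.where. The following is only a minimal sketch of that variant, not a drop-in solution: the helper name classify_by_day and its parameters are made up for illustration, and the column names 'Subject' and 'Dates' are assumptions carried over from the sample data, so adjust them to match your Excel file.

```python
import numpy as np
import pandas as pd

# Hypothetical helper; the column names are assumptions, not taken from the asker's file.
def classify_by_day(df, date_col="Dates", subject_col="Subject",
                    spam_days=("Friday", "Saturday")):
    """Return a Series labelling each row 'Spam' or 'Not Spam'."""
    # errors="coerce" turns unparseable dates into NaT instead of raising
    dates = pd.to_datetime(df[date_col], errors="coerce")
    # A subject counts as missing if it is NaN or an empty/whitespace-only string
    missing_subject = df[subject_col].fillna("").astype(str).str.strip() == ""
    # day_name() returns English weekday names such as 'Friday' by default
    on_spam_day = dates.dt.day_name().isin(spam_days)
    labels = np.where(missing_subject | on_spam_day, "Spam", "Not Spam")
    return pd.Series(labels, index=df.index, name="Spam or Not Spam")

# Same made-up dates and subjects as in the sample data above
sample = pd.DataFrame({
    "Subject": ["Free Stuff", "Buy Stuff", np.nan, "More Free Stuff", "More Buy Stuff"],
    "Dates": ["2022-05-18 01:00:00", "2022-05-18 03:00:00", "2022-05-19 08:00:00",
              "2022-05-20 01:00:00", "2022-05-21 10:00:00"],
})
sample["Spam or Not Spam"] = classify_by_day(sample)
print(sample)
```

With the sample rows, the first two emails (both on a Wednesday, with a subject) come out as 'Not Spam', and the other three are 'Spam' because of a missing subject, a Friday date, and a Saturday date respectively. np.select is still the better fit if different conditions should eventually produce different labels.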
Answered By: ArchAngelPwn