data proccesing to make total payment python

Question:

Nama No.ID Tgl/Waktu No.PIN Kode Verifikasi
Alif 100061 17/12/2022 07:53:26 Sidik Jari
Alif 100061 17/12/2022 13:00:25 Sidik Jari
Alif 100061 19/12/2022 07:54:59 Sidik Jari
Alif 100061 19/12/2022 16:18:14 Sidik Jari
Alif 100061 20/12/2022 07:55:54 Sidik Jari
Alif 100061 20/12/2022 16:16:16 Sidik Jari
Alif 100061 21/12/2022 07:54:46 Sidik Jari
Alif 100061 21/12/2022 16:15:41 Sidik Jari
Alif 100061 22/12/2022 07:55:54 Sidik Jari
Alif 100061 22/12/2022 16:15:59 Sidik Jari
Alif 100061 23/12/2022 07:56:26 Sidik Jari
Alif 100061 23/12/2022 16:16:56 Sidik Jari
budi 100063 17/12/2022 07:45:28 Sidik Jari
budi 100063 17/12/2022 13:00:23 Sidik Jari
budi 100063 19/12/2022 07:39:29 Sidik Jari
budi 100063 19/12/2022 16:17:37 Sidik Jari
budi 100063 20/12/2022 13:13:06 Sidik Jari
budi 100063 20/12/2022 16:16:14 Sidik Jari
budi 100063 21/12/2022 07:39:54 Sidik Jari
budi 100063 21/12/2022 16:15:38 Sidik Jari
budi 100063 22/12/2022 07:39:02 Sidik Jari
budi 100063 22/12/2022 16:15:55 Sidik Jari
budi 100063 23/12/2022 07:41:13 Sidik Jari
budi 100063 23/12/2022 16:16:25 Sidik Jari

so from that raw excel file i want to make a app to output the waging system

!pip install xlrd
import pandas as pd
from datetime import time, timedelta
import openpyxl

from google.colab import drive
drive.mount('/content/drive')

# Read the Excel file
path = '/content/drive/MyDrive/Colab Notebooks/Book1.xlsx'
df = pd.read_excel(path)


# Convert the 'Tgl/Waktu' column to datetime format
df['Tgl/Waktu'] = pd.to_datetime(df['Tgl/Waktu'])

# Extract the date and time from the 'Tgl/Waktu' column
df['Date'] = df['Tgl/Waktu'].dt.date
df['Time'] = df['Tgl/Waktu'].dt.time

# Group the data by employee name and date
grouped_df = df.groupby(['Nama', 'Date'])

# Set the overtime threshold to 16:30:00
overtime_threshold = time(hour=16, minute=30)

# Iterate over the grouped data
for (name, date), group in grouped_df:
    # Calculate the total work hours and overtime hours for each employee on each day
    start_time = group['Time'].min()
    end_time = group['Time'].max()
    total_hours = (timedelta(hours=end_time.hour, minutes=end_time.minute, seconds=end_time.second) - 
                   timedelta(hours=start_time.hour, minutes=start_time.minute, seconds=start_time.second)).total_seconds() / 3600
    if total_hours > 8:
        hours_worked = 8
        if end_time > overtime_threshold:
          overtime_hours += (end_time - overtime_threshold).total_seconds() / 3600
    else:
        hours_worked = total_hours
        overtime_hours = 0
    if end_time > overtime_threshold:
        overtime_hours += (end_time - overtime_threshold).total_seconds() / 3600
    # Calculate the payment for each employee on each day
    payment_each_date = 75000 * hours_worked + 50000 * overtime_hours
    
       # Add the total work hours, overtime hours, and payment as new columns to the dataframe
    df.loc[(df['Nama'] == name) & (df['Date'] == date), 'Hours Worked'] = hours_worked
    df.loc[(df['Nama'] == name) & (df['Date'] == date), 'Overtime Hours'] = overtime_hours
    df.loc[(df['Nama'] == name) & (df['Date'] == date), 'Payment Each Date'] = payment_each_date

# Print the resulting dataframe
print(df)

# write DataFrame to excel
df.to_excel(excel_writer=r'/content/drive/MyDrive/Colab Notebooks/test.xlsx')

from that code i want to add another column that is total payment, i have an idea to add payment each date thats grouped by name and with different date

how do i do that?

i already try used groupby nama then sum the payment each day but its still error

Asked By: Akazadi

||

Answers:

Updated Answer

If you want to remove duplicate date rows prior to calculating the total payment for each Nama value, then first group by Nama and date, take the max value (min is the same as the two values are equal), group that result by Nama and sum the Payment Each Date values. This produces a series with the total payment for each Nama, s_total, which can then be joined to the original dataframe to obtain the desired Total Payment column:

s_total = df.groupby(['Nama', 'Date'])['Payment Each Date'].max().groupby('Nama').sum().rename('Total Payment')
df = df.merge(s_total, how='left', on='Nama')

Left join is safer as that will always keep all rows of df just in case.

Updated Output df:

enter image description here

Original Answer

If you just want to sum all the Payment Each Date values in a new column, then add the following line of code after the for loop code block and before the line that prints the df:

df['Total Payment'] = df.groupby('Nama')['Payment Each Date'].transform('sum')

The transform takes care of aligning the indices of the series containing the sum for each Nama value with the original df.

Original Output df:
enter image description here

This SO post has some similar answers: How do I create a new column from the output of pandas groupby().sum()?

Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.