How to pivot a pandas df where each column header is a date and column has multiple values
Question:
I used Python with BeautifulSoup and Selenium to extract data from Jira's timesheet in order to get the logged work by resource.
This is the result when I print my dataframe:
Resources Hours | We1/2 | Th2/2 |
---|---|---|
aaa | 8.0 | 8.0 |
bbb | 8.0 | 8.0 |
ccc | 8.0 | 8.0 |
but the result I want to obtain is:
date | Resources | value |
---|---|---|
We1/2 | aaa | 8.0 |
We1/2 | bbb | 8.0 |
We1/2 | ccc | 8.0 |
Th2/2 | aaa | 8.0 |
Th2/2 | bbb | 8.0 |
Th2/2 | ccc | 8.0 |
Is there a way to loop over the dataframe headers and append the cell elements?
Here is the Python script so far:
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
import pandas as pd
import time

chromedriver_path = r"C:\selinum drivers\chromedriver.exe"
driver = webdriver.Chrome(service=Service(chromedriver_path))

# Login credentials
username = "username"
password = "pwd"

# Log in to the website
driver.get("http://*******/login.jsp")
driver.find_element(By.ID, "login-form-username").send_keys(username)
driver.find_element(By.ID, "login-form-password").send_keys(password)
driver.find_element(By.ID, "login-form-submit").click()

# URL of the timesheet table
url = "http://********/secure/projecttimesheet!project.jspa"

# Navigate to the URL
driver.get(url)

# Open the dropdown menu
dropdown_menu_button = driver.find_element(By.XPATH, '//button[@ng-init="ts.getFilterProject();"]')
dropdown_menu_button.click()
checkbox_div = driver.find_element(By.CLASS_NAME, "toggleProject")
checkbox_div.click()

# Click on the body of the page to close the dropdown menu
body = driver.find_element(By.TAG_NAME, "body")
body.click()

# Wait for the table to load
time.sleep(2)
resources_button = driver.find_element(By.ID, "sp-group-by-resources")
resources_button.click()

# Wait for the table to reload
time.sleep(2)

# Parse the HTML content
soup = BeautifulSoup(driver.page_source, 'html.parser')

# Close the browser
driver.quit()

# Find the table element in the HTML
table = soup.find('table')

# Read the table data into a pandas dataframe, using the second row as the header
df = pd.read_html(str(table), decimal=',', thousands='.', header=1)[0]

# Remove the last 4 rows (summary rows)
df = df.iloc[:-4]

# Remove the "Unnamed: 1", "Unnamed: 22", "∑ Hours", and "∑ Days" columns
df = df.drop(columns=["Unnamed: 1", "Unnamed: 22", "∑ Hours", "∑ Days"])

# Replace NaN values with 0
df = df.fillna(0)
```
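For clarity, the cleanup steps at the end can be sketched on a toy stand-in for the scraped table (row and column contents are invented for illustration; the real table comes from Jira):

```python
import pandas as pd
import numpy as np

# Toy stand-in: 3 data rows, 4 trailing summary rows, one empty helper column.
toy = pd.DataFrame({
    'Resources Hours': ['aaa', 'bbb', 'ccc', 'sum', 'sum', 'sum', 'sum'],
    'We1/2': [8.0, np.nan, 8.0, 24.0, 3.0, 0.0, 0.0],
    'Unnamed: 1': [np.nan] * 7,
})

toy = toy.iloc[:-4]                      # drop the 4 summary rows at the bottom
toy = toy.drop(columns=['Unnamed: 1'])   # drop the empty helper column
toy = toy.fillna(0)                      # missing hours become 0
print(toy)
```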
Answers:
I converted your first DataFrame into the second one; I think this solves it.

Preparing the data (this reproduces your first dataframe in `d1`):

```python
import pandas as pd

header = 'Resources Hours,We1/2,Th2/2'.split(',')
d = (('aaa', 8.0, 8.0), ('bbb', 8.0, 8.0), ('ccc', 8.0, 8.0))
d1 = pd.DataFrame(columns=header, data=d)
```

Then I prepared the result, beginning with the header, and converted the data into the desired format:

```python
header2 = 'date Resources value'.split()
d = [(wd, r, x) for wd in header[1:3] for x, r in zip(d1[wd], d1['Resources Hours'])]
```

Put `d` into a dataframe:

```python
d2 = pd.DataFrame(columns=header2, data=d)
d2
```
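As a side note, pandas has a built-in reshape for exactly this wide-to-long operation, `DataFrame.melt`, so no explicit loop over the headers is needed. A minimal sketch on the same data (column names taken from the question):

```python
import pandas as pd

# The wide dataframe from the question
wide = pd.DataFrame({
    'Resources Hours': ['aaa', 'bbb', 'ccc'],
    'We1/2': [8.0, 8.0, 8.0],
    'Th2/2': [8.0, 8.0, 8.0],
})

# melt() stacks each date column into rows of (date, value),
# keeping 'Resources Hours' as the row identifier.
long = wide.melt(id_vars='Resources Hours', var_name='date', value_name='value')
long = long.rename(columns={'Resources Hours': 'Resources'})
long = long[['date', 'Resources', 'value']]
print(long)
```

This works unchanged for any number of date columns, since every column not listed in `id_vars` is melted.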