How to pivot a pandas df where each column header is a date and column has multiple values
Question:
I used Python with BeautifulSoup and Selenium to extract data from Jira's timesheet in order to get the logged work by resource.
This is the result when I print my dataframe:
Resources Hours | We1/2 | Th2/2 |
---|---|---|
aaa | 8.0 | 8.0 |
bbb | 8.0 | 8.0 |
ccc | 8.0 | 8.0 |
but the result I want to obtain is:
date | Resources | value |
---|---|---|
We1/2 | aaa | 8.0 |
We1/2 | bbb | 8.0 |
We1/2 | ccc | 8.0 |
Th2/2 | aaa | 8.0 |
Th2/2 | bbb | 8.0 |
Th2/2 | ccc | 8.0 |
Is there a way to loop over the dataframe headers and append the cell elements?
Here is the Python script so far:
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
import pandas as pd
import time

chromedriver_path = r"C:\selinum drivers\chromedriver.exe"
driver = webdriver.Chrome(service=Service(chromedriver_path))

# Login credentials
username = "username"
password = "pwd"

# Log in to the website
driver.get("http://*******/login.jsp")
driver.find_element(By.ID, "login-form-username").send_keys(username)
driver.find_element(By.ID, "login-form-password").send_keys(password)
driver.find_element(By.ID, "login-form-submit").click()

# URL of the timesheet table
url = "http://********/secure/projecttimesheet!project.jspa"

# Navigate to the URL
driver.get(url)

# Open the dropdown menu
dropdown_menu_button = driver.find_element(By.XPATH, '//button[@ng-init="ts.getFilterProject();"]')
dropdown_menu_button.click()
checkbox_div = driver.find_element(By.CLASS_NAME, "toggleProject")
checkbox_div.click()

# Click on the body of the page to close the dropdown menu
body = driver.find_element(By.TAG_NAME, "body")
body.click()

# Wait for the table to load
time.sleep(2)
resources_button = driver.find_element(By.ID, "sp-group-by-resources")
resources_button.click()

# Wait for the table to reload
time.sleep(2)

# Parse the HTML content
soup = BeautifulSoup(driver.page_source, 'html.parser')

# Close the browser
driver.quit()

# Find the table element in the HTML
table = soup.find('table')

# Read the table data into a pandas dataframe, using the second row as the header
df = pd.read_html(str(table), decimal=',', thousands='.', header=1)[0]

# Remove the last 4 rows (summary rows)
df = df.iloc[:-4]

# Remove the "Unnamed: 1", "Unnamed: 22", "∑ Hours", and "∑ Days" columns
df = df.drop(columns=["Unnamed: 1", "Unnamed: 22", "∑ Hours", "∑ Days"])

# Replace NaN values with 0
df = df.fillna(0)
```
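For clarity, the cleanup steps at the end can be sketched on a toy stand-in for the scraped table (row and column contents are invented for illustration; the real table comes from Jira):

```python
import pandas as pd
import numpy as np

# Toy stand-in: 3 data rows, 4 trailing summary rows, one empty helper column.
toy = pd.DataFrame({
    'Resources Hours': ['aaa', 'bbb', 'ccc', 'sum', 'sum', 'sum', 'sum'],
    'We1/2': [8.0, np.nan, 8.0, 24.0, 3.0, 0.0, 0.0],
    'Unnamed: 1': [np.nan] * 7,
})

toy = toy.iloc[:-4]                      # drop the 4 summary rows at the bottom
toy = toy.drop(columns=['Unnamed: 1'])   # drop the empty helper column
toy = toy.fillna(0)                      # missing hours become 0
print(toy)
```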
Answers:
I converted your first DataFrame into the second one; I think this solves it.

Preparing the data (this reproduces your first dataframe in `d1`):

```python
import pandas as pd

header = 'Resources Hours,We1/2,Th2/2'.split(',')
d = (('aaa', 8.0, 8.0), ('bbb', 8.0, 8.0), ('ccc', 8.0, 8.0))
d1 = pd.DataFrame(columns=header, data=d)
```

Then I prepared the result, beginning with the header, and converted the data into the desired format:

```python
header2 = 'date Resources value'.split()
d = [(wd, r, x) for wd in header[1:3] for x, r in zip(d1[wd], d1['Resources Hours'])]
```

Put `d` into a dataframe:

```python
d2 = pd.DataFrame(columns=header2, data=d)
d2
```
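As a side note, pandas has a built-in reshape for exactly this wide-to-long operation, `DataFrame.melt`, so no explicit loop over the headers is needed. A minimal sketch on the same data (column names taken from the question):

```python
import pandas as pd

# The wide dataframe from the question
wide = pd.DataFrame({
    'Resources Hours': ['aaa', 'bbb', 'ccc'],
    'We1/2': [8.0, 8.0, 8.0],
    'Th2/2': [8.0, 8.0, 8.0],
})

# melt() stacks each date column into rows of (date, value),
# keeping 'Resources Hours' as the row identifier.
long = wide.melt(id_vars='Resources Hours', var_name='date', value_name='value')
long = long.rename(columns={'Resources Hours': 'Resources'})
long = long[['date', 'Resources', 'value']]
print(long)
```

This works unchanged for any number of date columns, since every column not listed in `id_vars` is melted.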