Python – getting src from a table cell

Question:

I have the table below, and I want to export each cell's text or its image src to a CSV file.

<table class="GridView plm-table" id="pageLayout_projectTeamMembersGridView_gridView">

<tbody>

<tr id="pageLayout_projectTeamMembersGridView_gridView_headerRow" class="GridViewHeaderRow">
<th class="GridViewHeader" scope="col">A</th>
<th class="GridViewHeader" scope="col">B</th>
<th class="GridViewHeader" scope="col">C</th>
<th class="GridViewHeader" scope="col">D</th>
<th class="GridViewHeader" scope="col">E</th>
<th class="GridViewHeader" scope="col">F</th>
<th class="GridViewHeader" scope="col">G</th>
</tr>

<tr id="pageLayout_projectTeamMembersGridView_DataRow0" class="GridViewRow">
  <td class="GridViewCell" align="right"><input type="checkbox" name="ss" value="zz"></td>
  <td class="GridViewCell"><img class="Icon" src="../../Images/1.png" style="border-width:0px;"></td>
  <td class="GridViewCell">John</td>
  <td class="GridViewCell"><img id="Image0_IDcon" src="../../Images/0.png"></td>
  <td class="GridViewCell"><img id="Image1_IDcon" src="../../Images/1.png"></td>
  <td class="GridViewCell"><img id="Image1_IDcon" src="../../Images/1.png"></td>
  <td class="GridViewCell"><img id="Image0_IDcon" src="../../Images/0.png"></td>
</tr>
<tr id="pageLayout_projectTeamMembersGridView_DataRow1" class="GridViewRow">
  <td class="GridViewCell" align="right"><input type="checkbox" name="ss" value="zz"></td>
  <td class="GridViewCell"><img class="Icon" src="../../Images/1.png" style="border-width:0px;"></td>
  <td class="GridViewCell">Steve</td>
  <td class="GridViewCell"><img id="Image1_IDcon" src="../../Images/1.png"></td>
  <td class="GridViewCell"><img id="Image1_IDcon" src="../../Images/1.png"></td>
  <td class="GridViewCell"><img id="Image0_IDcon" src="../../Images/0.png"></td>
  <td class="GridViewCell"><img id="Image0_IDcon" src="../../Images/0.png"></td>
</tr>
<tr id="pageLayout_projectTeamMembersGridView_DataRow2" class="GridViewRow">
  <td class="GridViewCell" align="right"><input type="checkbox" name="ss" value="zz"></td>
  <td class="GridViewCell"><img class="Icon" src="../../Images/1.png" style="border-width:0px;"></td>
  <td class="GridViewCell">Mary</td>
  <td class="GridViewCell"><img id="Image0_IDcon" src="../../Images/0.png"></td>
  <td class="GridViewCell"><img id="Image1_IDcon" src="../../Images/1.png"></td>
  <td class="GridViewCell"><img id="Image1_IDcon" src="../../Images/1.png"></td>
  <td class="GridViewCell"><img id="Image0_IDcon" src="../../Images/0.png"></td>
</tr>
</tbody>
</table>

What I have done so far is:

table1 = soup.find('table', id='pageLayout_projectTeamMembersGridView_gridView')

headers = []
for i in table1.find_all('th'):
    title = i.text.strip()
    headers.append(title)

df = pd.DataFrame(columns = headers)

for row in table1.find_all('tr')[1:]:
    data = row.find_all('td')
    row_data = [td.text.strip() for td in data]
    length = len(df)
    df.loc[length] = row_data

df.to_csv('Export.csv', index=False)
print("CSV created!")

I am getting the text value in the third column (C), but how can I get the src value, as "0.png" or "1.png", in the image columns (B, D, E, F and G)?

Asked By: Herve

Answers:

The problem in the following code

data = row.find_all('td')
row_data = [td.text.strip() for td in data]
length = len(df)
df.loc[length] = row_data

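is that a td can contain text, an img, or some other element, and you're not checking which. For a cell that holds only an img, td.text is an empty string, which is why those columns come out blank.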
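
To see the difference between the two kinds of cell, here is a minimal sketch. It assumes BeautifulSoup 4 with the built-in "html.parser" backend, and the one-row table is a cut-down stand-in for the table in the question, not the real page:

```python
from bs4 import BeautifulSoup

# Cut-down stand-in for one data row: a text cell and an image cell.
html = '<table><tr><td>John</td><td><img src="../../Images/0.png"></td></tr></table>'
soup = BeautifulSoup(html, "html.parser")
text_cell, img_cell = soup.find_all("td")

print(repr(text_cell.text))   # 'John'
print(repr(img_cell.text))    # '' because an img tag carries no text
print(img_cell.img["src"])    # ../../Images/0.png
```

The src lives in the tag's attributes, so it has to be read from the img element rather than from the cell's text.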

You can do something like this:

for row in table1.find_all('tr')[1:]:
    row_data = []
    for td in row.find_all('td'):
        img = td.find('img')
        if img is not None:
            # Keep only the file name: "../../Images/1.png" -> "1.png"
            row_data.append(img.get('src', '').split('/')[-1])
        else:
            row_data.append(td.text.strip())
    df.loc[len(df)] = row_data

With this loop, Export.csv will contain

A,B,C,D,E,F,G
,1.png,John,0.png,1.png,1.png,0.png
,1.png,Steve,1.png,1.png,0.png,0.png
,1.png,Mary,0.png,1.png,1.png,0.png

Column A is empty, as expected, because its cells contain only an input element. You can handle that case in the same way if you need it.
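
One way to cover the input case is sketched below. The helper name cell_value is made up for illustration, and exporting the checkbox's value attribute is only an assumption about what you want in column A. The HTML is a shortened copy of one row from the question.

```python
from bs4 import BeautifulSoup

def cell_value(td):
    """Return an exportable string for one <td> (hypothetical helper)."""
    img = td.find("img")
    if img is not None:
        # "../../Images/1.png" -> "1.png"; empty string if src is missing
        return img.get("src", "").split("/")[-1]
    box = td.find("input")
    if box is not None:
        # Assumption: the input's value attribute is what should be exported
        return box.get("value", "")
    return td.get_text(strip=True)

# Shortened copy of the header row and one data row from the question.
html = """
<table><tbody>
<tr><th>A</th><th>B</th><th>C</th></tr>
<tr>
  <td><input type="checkbox" name="ss" value="zz"></td>
  <td><img class="Icon" src="../../Images/1.png"></td>
  <td>John</td>
</tr>
</tbody></table>
"""
soup = BeautifulSoup(html, "html.parser")
rows = [[cell_value(td) for td in tr.find_all("td")]
        for tr in soup.find_all("tr")[1:]]
print(rows)  # [['zz', '1.png', 'John']]
```

Each inner list can then be assigned to df.loc[len(df)] exactly as in the loop above.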

Answered By: d34n