Python: How to assign column titles to each iteration of a for loop from an existing array
Question:
I am a beginner. I have a static array from which I’d like to pull its variables successively and assign them as the column title to each iteration of a for loop. For example, after the first loop, assign the first variable in the col_titles as the column title. After the second loop, assign the second variable in the col_titles as the column title, and so on. Here’s what I have going so far:
data = []
col_titles = ['30024', '30033', '30038']
urls = [
    'https://www.example.com/page1',
    'https://www.example.com/page2',
    'https://www.example.com/page3'
]
counter = 1
for url in urls:
    driver.get(url)
    h2s = driver.find_elements(By.TAG_NAME, 'h2')
    try:
        for h2 in h2s:
            if counter <= 5:
                data.append(h2.get_attribute("innerText"))
                counter = counter + 1
    except (ElementNotVisibleException, NoSuchElementException):
        data.append("None")
driver.close()
print(data)
Currently, the output is an array containing all the variables from each loop like so (with each h2 reflecting unique h2 titles from each url):
[h2, h2, h2, h2, h2, h2, h2, h2, None, None, h2, h2, h2, h2, None]
This is fine, as all I’ve done is append each iteration to the "data" array.
This is where I get stuck.
I think I should be creating a DataFrame within the for loop to grab a column title from the "col_titles" array, assigning it as a column title following (or preceding) each iteration of the for loop, but I don’t know how to do this properly. What I’m hoping to achieve is an output like the following:
30024 30033 30038
h2 h2 h2
h2 h2 h2
h2 h2 h2
h2 None h2
h2 None None
Any insight is very appreciated!
Answers:
Use collections.defaultdict and the zip function.
A dictionary-like structure is more convenient in your case, because its keys and values can be passed straight to a pandas DataFrame as column names and column values.
Instead of data = [], initialize:
from collections import defaultdict
data = defaultdict(list)
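To see why defaultdict(list) helps here, the short sketch below appends to keys that do not exist yet. The heading strings are made-up sample values, not output from the real pages.

```python
from collections import defaultdict

data = defaultdict(list)

# Appending to a missing key first creates an empty list for that key,
# so no "if key not in data" check is needed.
data["30024"].append("First heading")
data["30024"].append("Second heading")
data["30033"].append("Another heading")

print(dict(data))
```

Each column title ends up with its own list, which is the shape pd.DataFrame expects for column data.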
Then iterate over your URLs and accumulate the values for each column separately:
for col, url in zip(col_titles, urls):
    driver.get(url)
    h2s = driver.find_elements(By.TAG_NAME, 'h2')
    counter = 1  # restart the count so every column can collect up to 5 headings
    try:
        for h2 in h2s:
            if counter <= 5:
                data[col].append(h2.get_attribute("innerText"))
                counter = counter + 1
    except (ElementNotVisibleException, NoSuchElementException):
        data[col].append("None")
driver.close()  # close the browser once, after the last URL
Finally, pd.DataFrame(data) gives you a structure similar to this, provided every column holds the same number of values (pandas raises a ValueError when the lists differ in length):
30024 30033 30038
0 h2 h2 h2
1 h2 h2 h2
2 h2 h2 h2
3 h2 h2 h2
4 None None None
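The desired output in the question fills shorter columns with None, and pandas needs all lists to be the same length. One way to get there is to pad every list before building the DataFrame. This is only a sketch under assumptions: pad_columns is a hypothetical helper (not part of pandas), and the sample values are made up.

```python
def pad_columns(columns, fill="None"):
    """Return a copy of `columns` with every list padded to the longest length."""
    target = max((len(values) for values in columns.values()), default=0)
    return {
        title: list(values) + [fill] * (target - len(values))
        for title, values in columns.items()
    }

# Made-up sample data with columns of unequal length.
scraped = {
    "30024": ["h2 a", "h2 b", "h2 c"],
    "30033": ["h2 d"],
    "30038": ["h2 e", "h2 f"],
}

padded = pad_columns(scraped)
print(padded)
```

The padded dictionary can then be handed to pd.DataFrame in the same way as data above.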
First create a dictionary. On each iteration, use the next entry of col_titles as the key and assign the list collected in that iteration as its value. Then pass the dictionary to pd.DataFrame.
The code will look something like this:
import pandas as pd

col_titles = ['30024', '30033', '30038']
urls = [
    'https://www.example.com/page1',
    'https://www.example.com/page2',
    'https://www.example.com/page3'
]
ctr = 0
my_dict = {}
for url in urls:
    driver.get(url)
    h2s = driver.find_elements(By.TAG_NAME, 'h2')
    data = []    # fresh list for this URL's column
    counter = 1  # restart the count for every URL
    try:
        for h2 in h2s:
            if counter <= 5:
                data.append(h2.get_attribute("innerText"))
                counter = counter + 1
    except (ElementNotVisibleException, NoSuchElementException):
        data.append("None")
    # use the current title first, then advance the index
    my_dict[col_titles[ctr]] = data
    ctr = ctr + 1
driver.close()
df = pd.DataFrame(my_dict)  # the lists must all have the same length
print(df)
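As a variation on the loop above, zip and list slicing can stand in for the manual ctr and counter variables. The sketch below is a minimal illustration under assumptions: fake_pages and collect_columns are hypothetical names, and the dictionary of headings replaces the Selenium calls so the example runs without a browser.

```python
# Hypothetical stand-in for the scraped pages: URL -> list of h2 texts.
fake_pages = {
    "https://www.example.com/page1": ["A1", "A2", "A3", "A4", "A5", "A6"],
    "https://www.example.com/page2": ["B1", "B2", "B3"],
    "https://www.example.com/page3": ["C1", "C2", "C3", "C4"],
}

def collect_columns(col_titles, urls, pages, limit=5):
    """Pair each column title with its URL and keep at most `limit` headings."""
    columns = {}
    for col, url in zip(col_titles, urls):
        headings = pages.get(url, [])    # stands in for driver.get + find_elements
        columns[col] = headings[:limit]  # slicing replaces the manual counter
    return columns

col_titles = ["30024", "30033", "30038"]
urls = list(fake_pages)
columns = collect_columns(col_titles, urls, fake_pages)
print(columns)
```

With Selenium, the pages lookup would be replaced by the driver.get and find_elements calls from the answer. The resulting lists can differ in length, so they may still need padding before going into pd.DataFrame.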