Pandas for-loop with a list of columns

Question

I’m trying to open links built from my dataframe using the Selenium webdriver. The dataframe ‘df1’ looks like this:

	user	repo1	repo2	repo3
0	breed	cs149-f22	kattis2canvas	grpc-maven-skeleton
1	GrahamDumpleton	mod_wsgi	wrapt	NaN

Each link I want to open is built from the ‘user’ column and one of the three ‘repo’ columns. I get an error when I iterate over the ‘repo’ columns.

Could anyone help me out? Thank you!

Here is my best try:

repo_cols = [col for col in df1.columns if 'repo' in col]

for index, row in df1.iterrows():
    user = row['user']
    for repo_name in repo_cols:
        try:
            repo = row['repo_name']
            current_url = f'https://github.com/{user}/{repo}/graphs/contributors'
            driver.get(current_url)
            time.sleep(0.5)
        except:
            pass

Here is the bug I encounter:

KeyError: 'repo_name' 

---------------------------------------------------------------------------

KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'repo_name'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-50-eb068230c3fd> in <module>
      4     user = row['user']
      5     for repo_name in repo_cols:
----> 6         repo = row['repo_name']
      7         current_url = f'https://github.com/{user}/{repo}/graphs/contributors'
      8         driver.get(current_url)

~\anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    851 
    852         elif key_is_scalar:
--> 853             return self._get_value(key)
    854 
    855         if is_hashable(key):

~\anaconda3\lib\site-packages\pandas\core\series.py in _get_value(self, label, takeable)
    959 
    960         # Similar to Index.get_value, but we do not fall back to positional
--> 961         loc = self.index.get_loc(label)
    962         return self.index._get_values_for_loc(self, loc, label)
    963 

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083 
   3084         if tolerance is not None:

KeyError: 'repo_name'

Asked By: Fred

Answer 1

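You’re getting the KeyError because there is no column named repo_name. The quotes turn it into a string literal, so pandas looks for a column with that exact label instead of using the loop variable.
You need to replace row['repo_name'] with row[repo_name].

Try this: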

import time

import pandas as pd
from selenium import webdriver

df1 = pd.DataFrame({'user': ['breed', 'GrahamDumpleton'],
                    'repo1': ['cs149-f22', 'mod_wsgi'],
                    'repo2': ['kattis2canvas', 'wrapt']})

repo_cols = [col for col in df1.columns if 'repo' in col]

# Start one browser up front instead of opening a new one per URL
browser = webdriver.Chrome()

for index, row in df1.iterrows():
    user = row['user']
    for repo_name in repo_cols:
        # repo_name is a variable holding the column label, so no quotes
        repo = row[repo_name]
        if pd.isna(repo):
            continue  # skip empty repo cells (NaN)
        current_url = f'https://github.com/{user}/{repo}/graphs/contributors'
        browser.get(current_url)
        time.sleep(0.5)
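
To check the URL logic without launching a browser, it may help to build the list of URLs first and inspect it. The following is only a sketch: build_urls is a hypothetical helper name, and the sample frame assumes the question's data, with None standing in for the NaN cell.

```python
import pandas as pd

# Sample data mirroring the question's df1 (None becomes a missing value).
df1 = pd.DataFrame({
    'user': ['breed', 'GrahamDumpleton'],
    'repo1': ['cs149-f22', 'mod_wsgi'],
    'repo2': ['kattis2canvas', 'wrapt'],
    'repo3': ['grpc-maven-skeleton', None],
})

repo_cols = [col for col in df1.columns if 'repo' in col]


def build_urls(df, cols):
    """Hypothetical helper: one contributors URL per non-missing (user, repo) pair."""
    urls = []
    for _, row in df.iterrows():
        for col in cols:
            repo = row[col]        # col is a variable, so no quotes around it
            if pd.isna(repo):      # skip missing repos instead of building .../nan/...
                continue
            urls.append(f"https://github.com/{row['user']}/{repo}/graphs/contributors")
    return urls


urls = build_urls(df1, repo_cols)
for url in urls:
    print(url)
```

Each URL in the resulting list can then be passed to the webdriver's get() call in a separate loop.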

Answered By: abokey

Answer 2

I think you should remove the quotation marks around repo_name in:

repo = row['repo_name']

It should be:

repo = row[repo_name]
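
A small sketch of the difference, using a made-up one-row Series rather than the real dataframe: the unquoted name is evaluated to the label it holds, while the quoted form is looked up literally.

```python
import pandas as pd

# Illustrative row; the values are taken from the question's first row.
row = pd.Series({'user': 'breed', 'repo1': 'cs149-f22'})
repo_name = 'repo1'

# Variable: pandas receives the label 'repo1' and finds it.
value = row[repo_name]

# String literal: pandas looks for a label spelled 'repo_name', which does not exist.
try:
    row['repo_name']
    literal_lookup_failed = False
except KeyError:
    literal_lookup_failed = True

print(value, literal_lookup_failed)
```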

Answered By: Evan
