Error on Getting Title for URL in Dataframe (Pandas / Python)

Question:

I’m trying to get the webpage titles for a column of URLs in a dataframe.

Using:

from urllib.request import urlopen
from bs4 import BeautifulSoup

def geturl(x):
    # Pass the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
    return BeautifulSoup(urlopen(x), 'html.parser').title.get_text()

geturl('https://msn.com')

Returns:
'MSN | Outlook, Office, Skype, Bing, Breaking News, and Latest Videos'
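When geturl is applied to a whole column, one unreachable site or one page without a <title> element raises an exception and aborts the entire apply call. Below is a minimal sketch of a more defensive variant, assuming you would rather get None for such rows. The names title_from_html and safe_geturl and the 10-second timeout are illustrative choices, not part of the original question.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def title_from_html(html):
    """Return the stripped text of the <title> element, or None if there is none."""
    soup = BeautifulSoup(html, 'html.parser')  # explicit parser, no warning
    if soup.title is None:
        return None
    return soup.title.get_text(strip=True)


def safe_geturl(url, timeout=10):
    """Hypothetical helper: fetch url and return its title, or None on failure."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return title_from_html(resp.read())
    except (OSError, ValueError):
        # URLError, HTTPError and timeouts are OSError subclasses;
        # ValueError is raised for malformed URLs such as 'not-a-url'.
        return None
```

It could then be applied to the column as df['title'] = df['URL'].apply(safe_geturl).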

However, when actually working with a dataframe:

import pandas as pd

data = [['1001', 'https://msn.com'], ['1002', 'https://google.com'], ['1003', 'https://yahoo.com']]
df = pd.DataFrame(data, columns=['ID', 'URL'])
df

     ID                 URL
0  1001     https://msn.com
1  1002  https://google.com
2  1003   https://yahoo.com

df['title'] = df['url'].apply(geturl())

Results in an error. Any help would be greatly appreciated.

Asked By: 120m256


Answers:

When I run your script, I get the following error:

  File "C:\Users\user\PycharmProjects\test\test.py", line 235, in <module>
    df['title'] = df['url'].apply(geturl())
  File "C:\Users\user\PycharmProjects\test\venv\lib\site-packages\pandas\core\frame.py", line 3505, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\Users\user\PycharmProjects\test\venv\lib\site-packages\pandas\core\indexes\base.py", line 3623, in get_loc
    raise KeyError(key) from err
KeyError: 'url'

In your DataFrame you created the column as 'URL', but the line below accesses it as df['url']:

df['title'] = df['url'].apply(geturl())

Column labels are case-sensitive, so 'url' does not match 'URL', and pandas raises a KeyError.
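Fixing the column name alone is probably not enough: in df['url'].apply(geturl()), the parentheses call geturl with no argument, so once the KeyError is gone Python raises a TypeError before apply even runs. apply expects the function object itself, i.e. apply(geturl). The sketch below reproduces both problems offline; fake_title is a hypothetical stand-in for geturl so the example runs without network access.

```python
import pandas as pd

data = [['1001', 'https://msn.com'], ['1002', 'https://google.com'], ['1003', 'https://yahoo.com']]
df = pd.DataFrame(data, columns=['ID', 'URL'])

# Column labels are case-sensitive: 'url' is not a column, 'URL' is.
print('url' in df.columns, 'URL' in df.columns)  # False True


def fake_title(url):
    """Hypothetical offline stand-in for geturl: returns the host part of the URL."""
    return url.split('//', 1)[1]


# Second bug: fake_title() is *called* here with no argument,
# so a TypeError is raised before apply() is reached.
try:
    df['URL'].apply(fake_title())
except TypeError as err:
    print('TypeError:', err)

# Fix: use the correct label and pass the function object itself;
# apply() then calls it once per value in the column.
df['title'] = df['URL'].apply(fake_title)
print(df['title'].tolist())  # ['msn.com', 'google.com', 'yahoo.com']
```

With the real geturl from the question, the corrected line would be df['title'] = df['URL'].apply(geturl).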

Answered By: Lanre