How do I create a list from a webpage?

Question:

I am trying to build a list of words from the text of a website, and then pick a random word from that list using the random module. I hope this makes sense.

import random as r
from bs4 import BeautifulSoup
import requests as rq


url = 'https://www.mit.edu/~ecprice/wordlist.10000'
page = rq.get(url)
soup = [BeautifulSoup(page.text, 'html.parser')]

print(r.choice(soup))

I tried this, but I get the full list back every time. I presume this is because the page I am scraping does not use line-break tags or any other markup, so I am unsure how to specify what to extract.

Asked By: Nyx

||

Answers:

There is no need for BeautifulSoup in this context; simply split() the response text into a list.

Example

import random as r
import requests as rq


url = 'https://www.mit.edu/~ecprice/wordlist.10000'
word_list = rq.get(url).text.split()
print(r.choice(word_list))
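
As a rough sketch of the same idea with the splitting step separated from the download, so it can be checked without network access. The helper names words_from_text and fetch_words, the sample string, and the 10-second timeout are my own assumptions, not part of the original code:

```python
import random as r


def words_from_text(text):
    """Split raw text on any whitespace (spaces, tabs, newlines) into words."""
    return text.split()


def fetch_words(url, timeout=10):
    """Download a plain-text word list and return it as a list of words."""
    import requests as rq  # third-party; imported lazily so the helper above works without it

    response = rq.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of splitting an error page
    return words_from_text(response.text)


# Offline demonstration with a made-up sample instead of the real page
sample = "the\nof\nand\nto\n"
word_list = words_from_text(sample)
print(r.choice(word_list))  # prints one of the four sample words
```

With network access, fetch_words('https://www.mit.edu/~ecprice/wordlist.10000') should give the full list in the same way.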

If you really need to use BeautifulSoup, you could combine get_text() and split():

from bs4 import BeautifulSoup
word_list = BeautifulSoup(rq.get(url).text, 'html.parser').get_text('\n', strip=True).split()

Answered By: HedgeHog

If you use [BeautifulSoup(page.text, 'html.parser')], the entire parsed document becomes the single element of the list, so r.choice() can only ever return the whole document. Instead, convert the document to a string and then use the string split() method to turn it into a list.

import random as r
from bs4 import BeautifulSoup
import requests as rq


url = 'https://www.mit.edu/~ecprice/wordlist.10000'
page = rq.get(url)
soup = str(BeautifulSoup(page.text, 'html.parser'))
soup = soup.split('\n')  # split on newlines: one word per line
print(r.choice(soup))

Note: I kept the same approach you used so that the difference is easy to see.
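
The difference can also be seen without any network access. This is only a small sketch, and the three-word string text is a made-up stand-in for the downloaded page:

```python
import random as r

text = "apple\nbanana\ncherry"  # hypothetical stand-in for the page text

wrapped = [text]              # one element: the whole document
split_words = text.split("\n")  # three elements: one per line

print(len(wrapped))      # 1
print(len(split_words))  # 3

# With a single-element list, the only possible pick is the whole text
print(r.choice(wrapped) == text)  # True

# With the split list, the pick is always one individual word
print(r.choice(split_words) in split_words)  # True
```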

Answered By: Suramuthu R