How do I separate text after using BeautifulSoup in order to plot?

Question:

I am trying to write a program that scrapes data from OpenInsider and plots it. OpenInsider shows which company insiders are buying or selling their company's stock. I want to show, in an easy-to-read format, the company, the insider type, and how much stock was purchased.
Here is my code so far:

from bs4 import BeautifulSoup
import requests

page = requests.get("http://openinsider.com/top-insider-purchases-of-the-month")

# print(page.status_code)
# checks to see if the page was downloaded successfully

soup = BeautifulSoup(page.content,'html.parser')
table = soup.find(class_="tinytable")
data = table.get_text()
#results = data.prettify
print(data, '\n')

Here is an example of some of the results:

X
Filing Date
Trade Date
Ticker
Company NameInsider NameTitle
Trade Type  
Price
Qty
Owned
ΔOwn
Value
1d
1w
1m
6m

2022-12-01 16:10:122022-11-30 AKUSAkouos, Inc.Kearny Acquisition Corp10%P – Purchase$12.50+29,992,668100-100%+$374,908,350
2022-11-30 20:57:192022-11-29 HHCHoward Hughes CorpPershing Square Capital Management, L.P.Dir, 10%P – Purchase$70.00+1,560,20515,180,369+11%+$109,214,243
2022-12-02 17:29:182022-12-02 IOVAIovance Biotherapeutics, Inc.Rothbaum Wayne P.DirP – Purchase$6.50+10,000,00018,067,333+124%+$65,000,000

However, for me each year starts a new line.

Is there a better way to use BeautifulSoup? Or is there an easy way to sort through this data and retrieve the specific information I am looking for? Thank you in advance; I have been stuck on this for a while.

Asked By: Daniel


Answers:

To extract the specific information you are looking for from the data using BeautifulSoup, you can use the find_all() method to find all the rows of the table, and then iterate over each row to extract the relevant data. Here is an example of how you can do this:

from bs4 import BeautifulSoup
import requests

page = requests.get("http://openinsider.com/top-insider-purchases-of-the-month")
soup = BeautifulSoup(page.content, 'html.parser')

# Find the table with the insider purchase data
table = soup.find(class_="tinytable")

# Find all rows of the table
rows = table.find_all('tr')

# Loop through each row
for row in rows:
    # Extract the company name, insider name, and trade type from the row
    data = row.find_all("td")
    company = data[4].text if len(data) > 4 else "No company name"
    insider = data[5].text if len(data) > 5 else "No insider"
    trade_type = data[7].text if len(data) > 7 else "No trade type"
    # Print the extracted data
    print(f'Company: {company}, Insider: {insider}, Trade Type: {trade_type}')

This code will loop through each row of the table and extract the company name, insider name, and trade type from the row. You can modify this code to extract any other information you are interested in from the table.
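
As one possible way to extend this, the sketch below collects each row into a dict and converts the price, quantity, and value strings into numbers so they can be plotted later. It is only an illustration: SAMPLE_HTML is a hypothetical stand-in for the downloaded page (built from two rows quoted in the question, assuming one cell per column), and the helper names to_number and parse_rows are made up for this example. The column positions are assumed from the header order shown in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for page.content; the real page may differ in detail.
SAMPLE_HTML = """
<table class="tinytable">
  <tr><th>X</th><th>Filing Date</th><th>Trade Date</th><th>Ticker</th>
      <th>Company Name</th><th>Insider Name</th><th>Title</th>
      <th>Trade Type</th><th>Price</th><th>Qty</th><th>Owned</th>
      <th>Own Change</th><th>Value</th></tr>
  <tr><td></td><td>2022-11-30 20:57:19</td><td>2022-11-29</td><td>HHC</td>
      <td>Howard Hughes Corp</td>
      <td>Pershing Square Capital Management, L.P.</td><td>Dir, 10%</td>
      <td>P - Purchase</td><td>$70.00</td><td>+1,560,205</td>
      <td>15,180,369</td><td>+11%</td><td>+$109,214,243</td></tr>
  <tr><td></td><td>2022-12-02 17:29:18</td><td>2022-12-02</td><td>IOVA</td>
      <td>Iovance Biotherapeutics, Inc.</td><td>Rothbaum Wayne P.</td>
      <td>Dir</td><td>P - Purchase</td><td>$6.50</td><td>+10,000,000</td>
      <td>18,067,333</td><td>+124%</td><td>+$65,000,000</td></tr>
</table>
"""


def to_number(text):
    """Convert strings such as '+$109,214,243' or '$70.00' to a float."""
    cleaned = text.strip().replace("$", "").replace(",", "").replace("+", "")
    return float(cleaned)


def parse_rows(html):
    """Return one dict per data row of the table with class 'tinytable'."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find(class_="tinytable")
    records = []
    for row in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) < 13:
            continue  # the header row uses <th>, so it has no <td> cells
        records.append({
            "ticker": cells[3],
            "company": cells[4],
            "insider": cells[5],
            "title": cells[6],
            "trade_type": cells[7],
            "price": to_number(cells[8]),
            "qty": to_number(cells[9]),
            "value": to_number(cells[12]),
        })
    return records


for record in parse_rows(SAMPLE_HTML):
    print(record["ticker"], record["title"], record["value"])
```

With the live page, passing page.content to parse_rows instead of SAMPLE_HTML should work the same way, provided the table layout matches these assumptions.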

Answered By: Julian Harkless

Do what Julian said, then store the values in a dict, load it into a pandas DataFrame, and visualize it with plotly.express.

Answered By: Ty Batten

The real credit goes to @JulianHarkless. When they come back to update their answer, I will take this down, but they deserve credit for this answer. They just didn't parse the end properly.

from bs4 import BeautifulSoup
import requests

page = requests.get("http://openinsider.com/top-insider-purchases-of-the-month")
soup = BeautifulSoup(page.content, 'html.parser')

# Find the table with the insider purchase data
table = soup.find(class_="tinytable")

# Find all rows of the table
rows = table.find_all('tr')

# Loop through each row
for row in rows:
    # Extract the company name, insider name, and trade type from the row
    data = row.find_all("td")
    company = data[4].text if len(data) > 4 else "No company name"
    insider = data[5].text if len(data) > 5 else "No insider"
    trade_type = data[7].text if len(data) > 7 else "No trade type"
    # Print the extracted data
    print(f'Company: {company}, Insider: {insider}, Trade Type: {trade_type}')

Answered By: Shmack