Tbody data showing "None" when scraped?
Question:
Last time I ran into this issue, adding the header info fixed it, but that doesn't seem to be the case here. I'm trying different methods, but ultimately my goal is to scrape the info from all of the tables on each of the links listed.
The data comes up inside a tbody, specifically under the class table-responsive.xs (I think).
I've tried grabbing all of the tbody data, and also just this class, but I'm getting nothing except a list of None values.
Is there another approach? I hoped adding the header was the solution, but apparently not.
from requests_html import HTMLSession
from bs4 import BeautifulSoup

profiles = []
session = HTMLSession()
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"
}
urls = [
    'https://magicseaweed.com/New-Jersey-Monmouth-County-Surfing/277/',
    'https://magicseaweed.com/New-Jersey-Ocean-City-Surfing/279/'
]

for url in urls:
    r = session.get(url, headers=headers)
    # wait 3 s for the page to fully load
    r.html.render(sleep=3, timeout=20)
    soup = BeautifulSoup(r.html.raw_html, "html.parser")
    for profile in soup.find_all('div', attrs={"class": "table-responsive.xs"}):
        profiles.append(profile)

for p in profiles:
    print(p)
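A likely reason that class lookup comes back empty (a hedged sketch with made-up HTML, since I can't be sure of the page's real markup): in CSS-selector notation, `table-responsive.xs` means an element carrying *two* classes, `table-responsive` and `xs` — the dot is selector syntax, not part of a class name. BeautifulSoup's `attrs={"class": ...}` compares against the actual class names, so the literal string with the dot never matches; `select()` understands the selector form:

```python
from bs4 import BeautifulSoup

# hypothetical markup with the two separate classes
html = '<div class="table-responsive xs"><table><tr><td>ok</td></tr></table></div>'
soup = BeautifulSoup(html, "html.parser")

# The dotted string matches no class name, so this finds nothing:
print(soup.find_all('div', attrs={"class": "table-responsive.xs"}))  # []

# A CSS selector treats the dot as "and also has class", so this matches:
print(soup.select('div.table-responsive.xs'))
```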
Also tried:
from requests_html import HTMLSession
from bs4 import BeautifulSoup

profiles = []
session = HTMLSession()
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"
}
urls = [
    'https://magicseaweed.com/New-Jersey-Monmouth-County-Surfing/277/',
    'https://magicseaweed.com/New-Jersey-Ocean-City-Surfing/279/'
]

for url in urls:
    r = session.get(url, headers=headers)
    # wait 3 s for the page to fully load
    r.html.render(sleep=3, timeout=20)
    soup = BeautifulSoup(r.html.raw_html, "html.parser")
    for profile in soup.find_all('a'):
        profile = profile.get('tbody')
        profiles.append(profile)

for p in profiles:
    print(p)
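The list of Nones from this second attempt has a mechanical explanation: `Tag.get()` looks up an HTML *attribute* on the tag itself, so `a_tag.get('tbody')` is None for every `<a>` — `tbody` is a child element, not an attribute. A minimal illustration with toy markup:

```python
from bs4 import BeautifulSoup

html = '<table><tbody><tr><td>1.2m</td></tr></tbody></table><a href="/spot/1">link</a>'
soup = BeautifulSoup(html, "html.parser")

a = soup.find('a')
print(a.get('tbody'))   # None -- 'tbody' is not an attribute of <a>
print(a.get('href'))    # /spot/1 -- href IS an attribute, so get() works

# To reach the table rows, search for the elements instead:
for row in soup.select('tbody tr'):
    print([td.get_text() for td in row.find_all('td')])
```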
Lastly: with someone's great guidance here, I am separately able to pull the full JSON data with the script below:
import requests
import pandas as pd

r = requests.get('https://magicseaweed.com/api/mdkey/spot?&limit=-1')
# keep the DataFrame in its own variable; to_csv() returns None
df = pd.DataFrame(r.json())
df.to_csv('out.csv', index=False)

pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)
print(df)
However, since I live in NJ, I only really care about the NJ waves. I used an href scrape to get the URLs I'd like to see data for. Ideally I could pull a week's worth of info, but if a single day is the only option, I'll survive.
I tried an if statement that focuses only on specific URLs (they are in the JSON data), but I'm not having any luck. Ultimately I want to add an OR to include all of the links listed, unless someone has a better idea?
I know I could easily match them once extracted, but I don't want to run 9,000 rows every time when I only need a select few.
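One way to avoid scanning all 9,000 rows by hand is to let pandas do the membership test. This is a hedged sketch on made-up rows: I'm assuming each spot record in the API payload has a `url` field holding the site-relative path (the real field names may differ), and `wanted` is a hypothetical set of the NJ paths:

```python
import pandas as pd

# Hypothetical rows mimicking the API's assumed shape
spots = [
    {"_id": 3683, "url": "/Belmar-Surf-Report/3683/"},
    {"_id": 386,  "url": "/Manasquan-Surf-Report/386/"},
    {"_id": 9999, "url": "/Some-Other-Spot-Surf-Report/9999/"},
]
wanted = {
    "/Belmar-Surf-Report/3683/",
    "/Manasquan-Surf-Report/386/",
}

df = pd.DataFrame(spots)
nj = df[df["url"].isin(wanted)]   # keep only the listed NJ spots
print(nj)
```

`Series.isin()` replaces the chain of ORs: one set lookup per row instead of a hand-written condition per link.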
import requests
import pandas as pd

r = requests.get('https://magicseaweed.com/api/mdkey/spot?&limit=-1')
df = pd.DataFrame(r.json()).to_csv('out.csv', index=False)
pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)

for d in df:
    if d and '/Belmar-Surf-Report/3683' in df:
        print(d)
# '/Belmar-Surf-Report/3683'
# '/Manasquan-Surf-Report/386/'
# '/Ocean-Grove-Surf-Report/7945/'
# '/Asbury-Park-Surf-Report/857/'
# '/Avon-Surf-Report/4050/'
# '/Bay-Head-Surf-Report/4951/'
# '/Belmar-Surf-Report/3683/'
# '/Boardwalk-Surf-Report/9183/'
# '/Bradley-Beach-Surf-Report/7944/'
# '/Casino-Surf-Report/9175/'
# '/Deal-Surf-Report/822/'
# '/Dog-Park-Surf-Report/9174/'
# '/Jenkinsons-Surf-Report/4053/'
# '/Long-Branch-Surf-Report/7946/'
# '/Long-Branch-Surf-Report/7947/'
# '/Manasquan-Surf-Report/386/'
# '/Monmouth-Beach-Surf-Report/4055/'
# '/Ocean-Grove-Surf-Report/7945/'
# '/Point-Pleasant-Surf-Report/7942/'
# '/Sea-Girt-Surf-Report/7943/'
# '/Spring-Lake-Surf-Report/7941/'
# '/The-Cove-Surf-Report/385/'
# '/Belmar-Surf-Report/3683/'
# '/Avon-Surf-Report/4050/'
# '/Deal-Surf-Report/822/'
# '/North-Street-Surf-Report/4946/'
# '/Margate-Pier-Surf-Report/4054/'
# '/Ocean-City-NJ-Surf-Report/391/'
# '/7th-St-Surf-Report/7918/'
# '/Brigantine-Surf-Report/4747/'
# '/Brigantine-Seawall-Surf-Report/4942/'
# '/Crystals-Surf-Report/4943/'
# '/Longport-32nd-St-Surf-Report/1158/'
# '/Margate-Pier-Surf-Report/4054/'
# '/North-Street-Surf-Report/4946/'
# '/Ocean-City-NJ-Surf-Report/391/'
# '/South-Carolina-Ave-Surf-Report/4944/'
# '/St-James-Surf-Report/7917/'
# '/States-Avenue-Surf-Report/390/'
# '/Ventnor-Pier-Surf-Report/4945/'
# '/14th-Street-Surf-Report/9055/'
# '/18th-St-Surf-Report/9056/'
# '/30th-St-Surf-Report/9057/'
# '/56th-St-Surf-Report/9059/'
# '/Diamond-Beach-Surf-Report/9061/'
# '/Strathmere-Surf-Report/7919/'
# '/The-Cove-Surf-Report/7921/'
# '/14th-Street-Surf-Report/9055/'
# '/18th-St-Surf-Report/9056/'
# '/30th-St-Surf-Report/9057/'
# '/56th-St-Surf-Report/9059/'
# '/Avalon-Surf-Report/821/'
# '/Diamond-Beach-Surf-Report/9061/'
# '/Nuns-Beach-Surf-Report/7948/'
# '/Poverty-Beach-Surf-Report/4056/'
# '/Sea-Isle-City-Surf-Report/1281/'
# '/Stockton-Surf-Report/393/'
# '/Stone-Harbor-Surf-Report/7920/'
# '/Strathmere-Surf-Report/7919/'
# '/The-Cove-Surf-Report/7921/'
# '/Wildwood-Surf-Report/392/'
Alternatively, I can use the surf IDs:
3683
386
7945
857
4050
4951
3683
9183
7944
9175
822
9174
4053
7946
7947
386
4055
7945
7942
7943
7941
385
3683
4050
822
4946
4054
391
7918
4747
4942
4943
1158
4054
4946
391
4944
7917
390
4945
9055
9056
9057
9059
9061
7919
7921
9055
9056
9057
9059
821
9061
7948
4056
1281
393
7920
7919
7921
392
Answers:
EDIT: Given you've confirmed your list of links (and that it stays static), you can check all of them daily like this:
import requests
import pandas as pd
from bs4 import BeautifulSoup
id_list = [
'/Belmar-Surf-Report/3683',
'/Manasquan-Surf-Report/386/',
'/Ocean-Grove-Surf-Report/7945/',
'/Asbury-Park-Surf-Report/857/',
'/Avon-Surf-Report/4050/',
'/Bay-Head-Surf-Report/4951/',
'/Belmar-Surf-Report/3683/',
'/Boardwalk-Surf-Report/9183/',
'/Bradley-Beach-Surf-Report/7944/',
'/Casino-Surf-Report/9175/',
'/Deal-Surf-Report/822/',
'/Dog-Park-Surf-Report/9174/',
'/Jenkinsons-Surf-Report/4053/',
'/Long-Branch-Surf-Report/7946/',
'/Long-Branch-Surf-Report/7947/',
'/Manasquan-Surf-Report/386/',
'/Monmouth-Beach-Surf-Report/4055/',
'/Ocean-Grove-Surf-Report/7945/',
'/Point-Pleasant-Surf-Report/7942/',
'/Sea-Girt-Surf-Report/7943/',
'/Spring-Lake-Surf-Report/7941/',
'/The-Cove-Surf-Report/385/',
'/Belmar-Surf-Report/3683/',
'/Avon-Surf-Report/4050/',
'/Deal-Surf-Report/822/',
'/North-Street-Surf-Report/4946/',
'/Margate-Pier-Surf-Report/4054/',
'/Ocean-City-NJ-Surf-Report/391/',
'/7th-St-Surf-Report/7918/',
'/Brigantine-Surf-Report/4747/',
'/Brigantine-Seawall-Surf-Report/4942/',
'/Crystals-Surf-Report/4943/',
'/Longport-32nd-St-Surf-Report/1158/',
'/Margate-Pier-Surf-Report/4054/',
'/North-Street-Surf-Report/4946/',
'/Ocean-City-NJ-Surf-Report/391/',
'/South-Carolina-Ave-Surf-Report/4944/',
'/St-James-Surf-Report/7917/',
'/States-Avenue-Surf-Report/390/',
'/Ventnor-Pier-Surf-Report/4945/',
'/14th-Street-Surf-Report/9055/',
'/18th-St-Surf-Report/9056/',
'/30th-St-Surf-Report/9057/',
'/56th-St-Surf-Report/9059/',
'/Diamond-Beach-Surf-Report/9061/',
'/Strathmere-Surf-Report/7919/',
'/The-Cove-Surf-Report/7921/',
'/14th-Street-Surf-Report/9055/',
'/18th-St-Surf-Report/9056/',
'/30th-St-Surf-Report/9057/',
'/56th-St-Surf-Report/9059/',
'/Avalon-Surf-Report/821/',
'/Diamond-Beach-Surf-Report/9061/',
'/Nuns-Beach-Surf-Report/7948/',
'/Poverty-Beach-Surf-Report/4056/',
'/Sea-Isle-City-Surf-Report/1281/',
'/Stockton-Surf-Report/393/',
'/Stone-Harbor-Surf-Report/7920/',
'/Strathmere-Surf-Report/7919/',
'/The-Cove-Surf-Report/7921/',
'/Wildwood-Surf-Report/392/'
]
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}

for x in id_list:
    url = 'https://magicseaweed.com' + x
    r = requests.get(url, headers=headers)
    try:
        soup = BeautifulSoup(r.text, 'html.parser')
        dfs = pd.read_html(str(soup))
        for df in dfs:
            print(df)
            if df.shape[0] > 50:
                df.to_csv(f"{x.replace('/', '_').replace('-', '_')}.csv")
            print('____________')
    except Exception as e:
        print(x, e)
This returns several DataFrames for each page, some with more rows and some with fewer, and saves the ones with more than 50 rows:
0 1 2
0 Low 12:24AM -0.05m
1 High 6:25AM 1.28m
2 Low 12:28PM -0.01m
3 High 6:49PM 1.66m
____________
0 1
0 First Light 5:36AM
1 Sunrise 6:05AM
2 Sunset 8:00PM
3 Last Light 8:30PM
____________
Unnamed: 0 Surf Swell Rating Primary Swell Primary Swell.1 Primary Swell.2 Secondary Swell Secondary Swell.1 Secondary Swell.2 Secondary Swell.3 ... Wind Wind.1 Weather Weather.1 Prob. Unnamed: 17 Unnamed: 18 Unnamed: 19 Unnamed: 20 Unnamed: 21
0 Wednesday 10/08 Wednesday 10/08 Wednesday 10/08 Wednesday 10/08 Wednesday 10/08 Wednesday 10/08 Wednesday 10/08 Wednesday 10/08 Wednesday 10/08 Wednesday 10/08 ... Wednesday 10/08 Wednesday 10/08 Wednesday 10/08 Wednesday 10/08 Wednesday 10/08 Wednesday 10/08 Wednesday 10/08 Wednesday 10/08 Wednesday 10/08 Wednesday 10/08
1 12am 0.5-0.8m NaN 0.9m 6s NaN 0.5m 9s NaN NaN ... 11 11 kph NaN NaN 26°c NaN NaN NaN NaN NaN NaN
2 3am 0.3-0.5m NaN 0.5m 9s NaN 0.8m 6s NaN NaN ... 13 17 kph NaN NaN 24°c NaN NaN NaN NaN NaN NaN
3 6am 0.2-0.3m NaN 0.5m 9s NaN 0.7m 6s NaN NaN ... 12 16 kph NaN NaN 24°c NaN NaN NaN NaN NaN NaN
4 9am 0.3-0.6m NaN 0.5m 9s NaN 0.7m 6s NaN NaN ... 13 16 kph NaN NaN 25°c NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
121 High 11:57PM 1.34m NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
122 First Light 5:42AM NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
123 Sunrise 6:10AM NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
124 Sunset 7:53PM NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
125 Last Light 8:21PM NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
126 rows × 22 columns
____________
0 1 2
0 Low 12:24AM -0.05m
1 High 6:25AM 1.28m
2 Low 12:28PM -0.01m
3 High 6:49PM 1.66m
____________
0 1
0 First Light 5:36AM
1 Sunrise 6:05AM
2 Sunset 8:00PM
3 Last Light 8:30PM
____________
0 1 2
0 Low 1:19AM -0.13m
1 High 7:21AM 1.37m
2 Low 1:26PM -0.06m
3 High 7:43PM 1.7m
____________
0 1
0 First Light 5:37AM
1 Sunrise 6:06AM
2 Sunset 7:59PM
3 Last Light 8:28PM
____________
0 1 2
0 Low 2:11AM -0.18m
1 High 8:14AM 1.43m
2 Low 2:21PM -0.09m
3 High 8:34PM 1.69m
____________
0 1
0 First Light 5:38AM
1 Sunrise 6:07AM
2 Sunset 7:58PM
3 Last Light 8:27PM
____________
0 1 2
0 Low 2:59AM -0.21m
1 High 9:05AM 1.47m
2 Low 3:13PM -0.09m
3 High 9:24PM 1.64m
____________
0 1
0 First Light 5:39AM
1 Sunrise 6:08AM
2 Sunset 7:57PM
3 Last Light 8:25PM
____________
0 1 2
0 Low 3:46AM -0.2m
1 High 9:57AM 1.47m
2 Low 4:03PM -0.06m
3 High 10:14PM 1.56m
____________
0 1
0 First Light 5:40AM
1 Sunrise 6:09AM
2 Sunset 7:55PM
3 Last Light 8:24PM
____________
0 1 2
0 Low 4:29AM -0.15m
1 High 10:48AM 1.46m
2 Low 4:52PM 0.01m
3 High 11:05PM 1.46m
____________
0 1
0 First Light 5:41AM
1 Sunrise 6:10AM
2 Sunset 7:54PM
3 Last Light 8:23PM
____________
0 1 2
0 Low 5:12AM -0.07m
1 High 11:39AM 1.43m
2 Low 5:42PM 0.1m
3 High 11:57PM 1.34m
____________
0 1
0 First Light 5:42AM
1 Sunrise 6:10AM
2 Sunset 7:53PM
3 Last Light 8:21PM
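One small cleanup worth considering: the confirmed list contains repeats (Belmar, Manasquan, Ocean Grove and others appear more than once, and the first Belmar entry lacks the trailing slash), so each duplicate gets fetched again. Normalizing and deduplicating before the request loop avoids that. A minimal sketch on a shortened list:

```python
id_list = [
    '/Belmar-Surf-Report/3683',      # note: missing the trailing slash
    '/Manasquan-Surf-Report/386/',
    '/Belmar-Surf-Report/3683/',     # repeat
    '/Manasquan-Surf-Report/386/',   # repeat
]

# normalize trailing slashes so variants of the same path compare equal
normalized = [x if x.endswith('/') else x + '/' for x in id_list]

# dict.fromkeys drops repeats while keeping first-seen order (Python 3.7+)
deduped = list(dict.fromkeys(normalized))
print(deduped)  # ['/Belmar-Surf-Report/3683/', '/Manasquan-Surf-Report/386/']
```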