Web scraping a page on chartink.com
Question:
Please help me scrape this page:
https://chartink.com/screener/time-pass-48
I am trying to scrape it, but the response does not contain the table I want.
I have tried the code below, but it does not give the desired result.
import requests
from bs4 import BeautifulSoup
URL = 'https://chartink.com/screener/time-pass-48'
page = requests.get(URL)
print(page)
soup = BeautifulSoup(page.content, 'html.parser')
print(soup)
Answers:
import requests
import bs4
page = requests.get("https://chartink.com/screener/time-pass-48")
soup = bs4.BeautifulSoup(page.text, 'lxml')
I think this should do it.
You can access the table data by making a POST request. Look at the Network tab in Chrome DevTools to see which resources are loaded from elsewhere.
The table data is loaded by a POST request to https://chartink.com/screener/process (look for the entry named ‘process’ in the Network tab). You can make that POST request with the requests library, as QHarr suggested.
Alternatively, you can keep things simple and render the page with the requests-html library, although getting the data directly from the source with a POST request will be much faster.
from requests_html import HTMLSession
session = HTMLSession()
response = session.get('https://chartink.com/screener/time-pass-48')
# renders JavaScript
response.html.render()
for result in response.html.xpath('//*[@id="DataTables_Table_0"]/tbody/tr'):
    print(f'{result.text}\n')
# part of the output:
'''
1
Kothari Products Limited
KOTHARIPRO
P&F | F.A
19.96%
106.7
262,997
'''
From there, all that remains is to split() the row text and pick the element you want by index, e.g.:
for result in response.html.xpath('//*[@id="DataTables_Table_0"]/tbody/tr'):
    # get the row text, split it on newlines and take index 1 (the stock name)
    # the process is the same for every other column
    stock_name = result.text.split('\n')[1]
    print(stock_name)
# part of the output:
'''
Kothari Products Limited
STEELXIND
Oswal Chemicals & Fertilizers Limited
Hbl Power Systems Limited
'''
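If you need more than one column, the split-and-index step can be wrapped in a small helper. This is only a sketch: parse_row and the column names are my own labels, guessed from the sample output shown earlier, not names used by the site, and it assumes every row has the same seven newline-separated cells.

```python
# Column names are assumptions based on the sample row printed above.
COLUMNS = ["sr", "stock_name", "symbol", "links", "pct_change", "price", "volume"]


def parse_row(row_text):
    """Split one rendered table row into a dict keyed by column name."""
    cells = [cell.strip() for cell in row_text.split("\n")]
    return dict(zip(COLUMNS, cells))


# The sample row from the output above, joined with newlines.
sample = "1\nKothari Products Limited\nKOTHARIPRO\nP&F | F.A\n19.96%\n106.7\n262,997"
row = parse_row(sample)
print(row["stock_name"], row["symbol"])
```

In the loop you would call parse_row(result.text) for each row instead of indexing the split list by hand.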
The data does indeed come from a POST request, and you don’t need to run JavaScript. You only need to pick up one cookie (ci_session), which a Session object handles by holding the cookies from the initial landing-page request and passing them on with the subsequent POST, and one token (X-CSRF-TOKEN), which can be pulled from a meta tag in the response to the initial request:
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
data = {
    'scan_clause': '( {cash} ( monthly rsi( 14 ) > 60 and weekly rsi( 14 ) > 60 and latest rsi( 14 ) > 60 and 1 day ago rsi( 14 ) <= 60 and latest volume > 100000 ) ) '
}
with requests.Session() as s:
    # the initial GET stores the ci_session cookie on the session
    r = s.get('https://chartink.com/screener/time-pass-48')
    soup = bs(r.content, 'lxml')
    # send the CSRF token from the page's meta tag with every later request
    s.headers['X-CSRF-TOKEN'] = soup.select_one('[name=csrf-token]')['content']
    r = s.post('https://chartink.com/screener/process', data=data).json()
    # print(r)
    df = pd.DataFrame(r['data'])
    print(df)
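If the page layout ever changes, select_one returns None and the token lookup fails with an unhelpful TypeError. As a rough sketch of the same token-extraction step with an explicit error, here is a version that uses only the standard library's html.parser. CsrfMetaParser and extract_csrf_token are hypothetical names of mine, and the HTML string is made up for illustration, not copied from chartink.com.

```python
from html.parser import HTMLParser


class CsrfMetaParser(HTMLParser):
    """Remembers the content attribute of <meta name="csrf-token">."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name") == "csrf-token":
            self.token = attributes.get("content")


def extract_csrf_token(html):
    """Return the CSRF token found in html, or raise ValueError."""
    parser = CsrfMetaParser()
    parser.feed(html)
    if parser.token is None:
        raise ValueError("no csrf-token meta tag found in the page")
    return parser.token


# Made-up page source, only to show the call.
sample_html = '<html><head><meta name="csrf-token" content="abc123"></head></html>'
print(extract_csrf_token(sample_html))
```

With a real session you would pass r.text from the initial GET and assign the result to s.headers['X-CSRF-TOKEN'].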
I changed the scan clause in data as shown below, but now I get an empty DataFrame. Which part of the code should be changed?
data = {
    'scan_clause': '( {cash} ( latest close > 10 and latest tema(latest close,10) > latest tema( latest close,20) and latest volume > 50000 and market cap > 500) ) '
}
with requests.Session() as s:
    r = s.get('https://chartink.com/screener/tema-swing-buy')
    soup = bs(r.content, 'lxml')
    s.headers['X-CSRF-TOKEN'] = soup.select_one('[name=csrf-token]')['content']
    r = s.post('https://chartink.com/screener/process', data=data).json()
    # print(r)
    df = pd.DataFrame(r['data'])
    print(df)
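I can't say from the code alone why this scan returns nothing, but a reasonable first step is to look at the whole JSON payload before building the DataFrame, since an empty 'data' list may come with other fields that explain it. A minimal sketch follows. rows_from_payload is a hypothetical helper, and the example payload is invented: apart from 'data', I don't know which keys the endpoint actually returns.

```python
def rows_from_payload(payload):
    """Return the rows under 'data', or raise with the rest of the payload."""
    rows = payload.get("data")
    if not rows:
        # Show every other field so a server-side message is not lost.
        rest = {key: value for key, value in payload.items() if key != "data"}
        raise ValueError(f"scan returned no rows; rest of payload: {rest}")
    return rows


# Invented payload for illustration; the 'note' key is not a real field name.
empty_payload = {"data": [], "note": "example only"}
try:
    rows_from_payload(empty_payload)
except ValueError as error:
    print(error)
```

In the snippet above you would call rows_from_payload(r) and pass the result to pd.DataFrame, which makes an empty result fail loudly instead of printing an empty frame.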