Unable to fetch timestamp correctly in beautiful soup
Question:
Please refer to the attached image. I am trying to fetch the timestamp and the 10 #content entries below it, as shown in the image and in the expected output below. However, I am not able to fetch relative text such as "40 minutes ago"; instead I get the timestamp in the format "08-04-2021 16:48:34".
from bs4 import BeautifulSoup
import requests

URL = "https://trends24.in/india/"
html_text = requests.get(URL)
soup = BeautifulSoup(html_text.content, 'lxml')
results = []
job_elem = soup.findAll(attrs={'class': 'trend-card'})
for j in job_elem:
    print(j.find('h5').get_text())
for i in soup.select('#trend-list li'):
    d = dict()
    d[i.a.text] = ''
    try:
        val = i.select_one('.tweet-count').text
    except:
        val = "NA"
    finally:
        d[i.a.text] = val
    results.append(d)
    print(d)
**Output:**
08-04-2021 16:48:34
08-04-2021 15:54:30
08-04-2021 15:01:07
...
{'#AskNivetha': 'NA'}
{'#TikaUtsav': 'NA'}
{'#VakeelSaabFestivalBegins': '62K'}
...
**Expected output:**
40 minutes ago
{'#AskNivetha': 'NA'}
{'#TikaUtsav': 'NA'}
{'#VakeelSaabFestivalBegins': '62K'}
{'ANMOL SUSHANT': '33K'}
{'#TheBigBull': 'NA'}
{'#IPL2021': '73K'}
{'nidra ley uv creations': '64K'}
{'Chief Ministers': 'NA'}
{'B. True 48MP Camera': 'NA'}
{'conan': '51K'}
1 hour ago
{'#AskNivetha': 'NA'}
{'#VakeelSaabFestivalBegins': '50K'}
{'NIDRA LEY UV CREATIONS': '59K'}
{'#SecretOfHappyLiving': 'NA'}
{'#MeditateToRaiseWillpower': 'NA'}
{'#HappinessMantra': 'NA'}
{'ANMOL SUSHANT': 'NA'}
{'Tika Utsav': 'NA'}
{'Chief Ministers': 'NA'}
{'conan': '46K'}
I also want each timestamp to be followed by its own 10 #content titles, as shown in the attached screenshot.
Answers:
That is the format in which the datetime info is stored in the HTML. Disable JavaScript and you will see the same value in the browser.
What you see on the webpage is the data-timestamp attribute value, which gets prettified when JavaScript runs in the page. More specifically, when the following is called:
T24.prettyDate = function(t) {
    var e = new Date(1e3 * t),
        n = ((new Date).getTime() - e.getTime()) / 1e3,
        a = Math.floor(n / 86400);
    return isNaN(a) || a < 0 ? "" : 0 === a && ((n < 900 ? "just now" : n < 1800 && "few minutes ago") || n < 3600 && Math.floor(n / 60) + " minutes ago" || n < 7200 && "1 hour ago" || n < 86400 && Math.floor(n / 3600) + " hours ago") || 1 === a && "Yesterday" || a < 7 && a + " days ago" || a < 31 && Math.ceil(a / 7) + " weeks ago" || 31 < a && Math.ceil(a / 30) + " months ago"
}
You could write your own function, using the above as a logic guide, or use Selenium to automate a browser.
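As a rough sketch of the first option, the prettyDate logic can be ported to Python and fed the data-timestamp value, which is in Unix seconds (the JavaScript multiplies it by 1000 before building a Date). The names pretty_date and print_trend_cards are hypothetical, and the selectors assume that each trend-card element contains both an element carrying data-timestamp and its own li entries; check that against the live markup. For exactly 31 days the JavaScript expression evaluates to false, so this sketch returns an empty string in that case.

```python
import math
import time


def pretty_date(timestamp, now=None):
    """Approximate Python port of T24.prettyDate.

    timestamp: Unix time in seconds (number or numeric string).
    now: optional Unix time to compare against; defaults to the current time.
    """
    try:
        then = float(timestamp)
    except (TypeError, ValueError):
        return ""
    if now is None:
        now = time.time()
    seconds = now - then
    if not math.isfinite(seconds):
        return ""
    days = math.floor(seconds / 86400)
    if days < 0:
        return ""
    if days == 0:
        if seconds < 900:
            return "just now"
        if seconds < 1800:
            return "few minutes ago"
        if seconds < 3600:
            return f"{math.floor(seconds / 60)} minutes ago"
        if seconds < 7200:
            return "1 hour ago"
        return f"{math.floor(seconds / 3600)} hours ago"
    if days == 1:
        return "Yesterday"
    if days < 7:
        return f"{days} days ago"
    if days < 31:
        return f"{math.ceil(days / 7)} weeks ago"
    if days > 31:
        return f"{math.ceil(days / 30)} months ago"
    return ""  # days == 31: the JavaScript version yields false here


def print_trend_cards(soup, now=None):
    """Print each card's relative time, then the trends inside that card.

    soup: a parsed BeautifulSoup document. The selectors are assumptions
    based on the question's code, not verified against the site.
    """
    for card in soup.find_all(class_="trend-card"):
        # Assumption: some element inside the card carries data-timestamp.
        stamp = card.find(attrs={"data-timestamp": True})
        print(pretty_date(stamp["data-timestamp"], now) if stamp is not None else "")
        for item in card.select("li"):
            if item.a is None:
                continue
            count = item.select_one(".tweet-count")
            value = count.get_text() if count is not None else "NA"
            print({item.a.get_text(): value})
```

With the soup object from the question, calling print_trend_cards(soup) should print each relative time followed by that card's trends, instead of printing all timestamps first and all trends afterwards.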