Script to check for keyword on website

Question:

I want to write a script that goes through a list of URLs, checking whether they are valid or not.

The page does not redirect to a 404 but rather displays the sentence ‘Sorry, not found!’ if the URL is invalid.

So if the script finds this sentence, the URL is invalid. If it does not, the URL is most likely valid.

Any idea on how to implement that in JS? Pointers to possible methods in other languages are welcome too!

Thanks!

Answers:

A simple Python way would be:

import requests

urls = ['https://www.google.com']  # Fill this however
for url in urls:
    resp = requests.get(url, timeout=10)  # the timeout keeps a dead host from hanging the loop
    if 'Sorry, not found!' in resp.text:
        print(url + ' had no page')  # or something
Answered By: dheiberg

I succeeded with jQuery. Plain JavaScript can do this as well (jQuery.get() is a wrapper around XMLHttpRequest), but jQuery keeps the code short, so that is what I use here.

First, try it out in the Chrome console:

1. Add this extension to get rid of the CORS policy error:
Chrome Extension. Make sure it is enabled under Chrome -> More Tools -> Extensions.

2. Next we need get(), but in the console $ is not jQuery unless the page itself has loaded it, so $.get() cannot be called the way it usually is in .js files. Load jQuery into the page first by running the lines below in the console:

var jq = document.createElement('script');
jq.src = "https://ajax.googleapis.com/ajax/libs/jquery/2.1.4/jquery.min.js";
document.getElementsByTagName('head')[0].appendChild(jq);

3. Fire the GET request:

var rsp = jQuery.get("https://www.google.com/");

Wait about 2 seconds (jQuery.get() is asynchronous, so rsp.responseText is only populated once the response has arrived), then run:

if (rsp.responseText && rsp.responseText.includes("was not found")) { // in your .js file, replace with "Sorry, not found!"
    console.log("The Url is Invalid");
}
else {
    console.log("could be a valid url"); // this must get printed
}

Try an invalid URL:

var rsp = jQuery.get("https://www.goesfsfsfsffogle.com/");

Wait about 2 seconds again, then run:

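if (rsp.responseText && rsp.responseText.includes("was not found")) { // in your .js file, replace with "Sorry, not found!"
    console.log("The Url is Invalid"); // printed when the returned page contains the marker text
}
else {
    console.log("could be a valid url"); // also reached when the request fails outright and responseText stays empty
}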
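
The fixed two-second wait can be avoided by running the check only once the response body has actually arrived. Below is a minimal sketch of that idea, assuming the standard fetch() API (browsers, Node 18+) instead of jQuery. The names MARKER, checkUrl and checkUrls are made up for this example, and the fetchFn parameter exists only so the functions can be tried without network access.

```javascript
// Hypothetical names; MARKER is the sentence quoted in the question.
const MARKER = "Sorry, not found!";

// Fetch one URL and report whether its body contains the marker.
// fetchFn defaults to the global fetch(); passing a stand-in makes
// the function usable without any network access.
async function checkUrl(url, marker = MARKER, fetchFn = fetch) {
    try {
        const response = await fetchFn(url);
        const body = await response.text(); // resolves once the body has arrived
        return { url, status: body.includes(marker) ? "invalid" : "probably valid" };
    } catch (err) {
        // Network errors (unknown host, request blocked by CORS, ...) end up here.
        return { url, status: "request failed" };
    }
}

// Check all URLs in parallel and collect one result object per URL.
function checkUrls(urls, marker = MARKER, fetchFn = fetch) {
    return Promise.all(urls.map((url) => checkUrl(url, marker, fetchFn)));
}
```

Usage would look like checkUrls(["https://www.google.com/"]).then(console.log). In a browser, cross-origin requests made this way are still subject to the CORS policy mentioned in step 1.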

Running inside your jQuery project file:

var urls = ["https://www.google.com/"];
// for...in would yield the array indices ("0", "1", ...), so use forEach to get the URLs
urls.forEach(function (url) {
    var rsp = $.get(url);
    // $.get() is asynchronous, so run the check only after the request has completed
    rsp.always(function () {
        //console.log("readyState=" + rsp.readyState);
        if (rsp.responseText && rsp.responseText.includes("Sorry, not found!")) {
            console.log(url + ": the URL is invalid");
        } else {
            console.log(url + ": it's a valid URL");
        }
    });
});

Again, if rsp.readyState is not 4, the asynchronous response has not been received yet. In that case the check must be delayed until the request has completed.
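
A readyState check by itself does not say whether a request is still running or has failed. As a hedged sketch, the decision can be folded into one helper that also covers those cases. It assumes the state() method that jqXHR objects expose through jQuery's Promise interface (returning "pending", "resolved" or "rejected"); classifyResponse is a made-up name, and the helper reads nothing else but responseText.

```javascript
// Hypothetical helper: classify a jqXHR-like object such as the one
// returned by $.get(). It relies only on rsp.state() and rsp.responseText.
function classifyResponse(rsp, marker) {
    const state = rsp.state();
    if (state === "pending") {
        return "pending"; // response not received yet, too early to decide
    }
    if (rsp.responseText && rsp.responseText.includes(marker)) {
        return "invalid"; // the page shows the "not found" sentence
    }
    if (state === "rejected") {
        return "failed"; // request did not succeed, so nothing can be concluded
    }
    return "probably valid";
}
```

Called as classifyResponse(rsp, "Sorry, not found!") from a completion callback, this keeps a failed request from being reported as a valid URL.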

Let me know if this doesn’t help you.

Answered By: user3206070