Crawling IMDB for movie trailers?

Question:

I want to crawl IMDB and download the trailers of movies (either from YouTube or IMDB) that fit some criteria (e.g.: released this year, with a rating above 2).

I want to do this in Python – I saw that there were packages for crawling IMDB and downloading YouTube videos. The thing is, my current plan is to crawl IMDB and then search youtube for ‘$movie_name’ + ‘trailer’ and hope that the top result is the trailer, and then download it.

Still, this seems a bit convoluted and I was wondering if there was perhaps an easier way.

Any help would be appreciated.

Asked By: Barry

||

Answers:

There is no easier way. I doubt IMDB allows people to scrap their website freely so your IP is probably gonna get blacklisted and to counter that you’ll need proxies. Good luck and scrape respectfully.

EDIT: Please take a look at @pds’s answer below. My answer is no longer valid.

The imdbpy API https://imdbpy.github.io/ will get you started, it’s very straightforward.

  import imdb # pip install IMDbPY
  ia = imdb.IMDb()
  list_of_movies = ia.search_movie("string text")
  [ia.(m, info=['main','votes']) for m in list_of_movies[:1]]
  for m in list_of_movies[:1]:
    yt_search_term = m.get("name") + "trailer"
    # connect to YT API to start that part of the search.

Then lookup how to connect to the YTv3 API with appropriate authentication and download the corresponding Google client API – Sample code here

Issues: One challenge is that movie titles are not unique, so searching YT by name+" trailer" will not necessarily return your intended trailer. So you’ll need to account for that somehow. For new hollywood blockbusters (and similar), you may be successful.

Legal: As indicated by the other answer, do verify your use case is in compliance with the terms and conditions and licenses of the technologies and information services that you are using. If in doubt seek the approval from those parties first or seek professional legal advice.

Answered By: pds

This will provide the video link for you.
like this

Video Link

The following code parses the HTML source file of this video page.
The mp4 link is here in the HTML source file. You can view the source file and search ".mp4"

The links are in <script type="application/json"> json file having links </script>

Each link expires in 1-2 hours, so you may download from the link instead of saving the links in a file or you can just run the script every time.

from bs4 import BeautifulSoup
import requests
video_id = "vi2766453273"
video_url = "https://www.imdb.com/video/"+video_id
print(video_url)

r = requests.get(url=video_url)
soup = BeautifulSoup(r.text, 'html.parser')
script =soup.find("script",{'type': 'application/json'})
json_object = json.loads(script.string)
print(json_object["props"]["pageProps"]["videoPlaybackData"]["video"]["playbackURLs"])

videos = json_object["props"]["pageProps"]["videoPlaybackData"]["video"]["playbackURLs"]
# links video quality order auto,1080,720
for video in videos[1:] :
    video_link = video["url"]
    print(video_link)  
    #break

Checkout the full code at GitHub

Answered By: Pavan Kalyan
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.