Making a GET request to a TikTok URL in order to get a canonical link

Question:

I want to make a GET request to a TikTok URL via Python, but it does not work.

Let’s say we have a tiktok link from a mobile app – https://vt.tiktok.com/ZS81uRSRR/ and I want to get its video_id which is available in a canonical link. This is the canonical link for the provided tiktok: https://www.tiktok.com/@notorious_foodie/video/7169643841316834566?_r=1&_t=8XdwIuoJjkX&is_from_webapp=v1&item_id=7169643841316834566

The video_id comes after /video/. For example, in the link above the video_id is 7169643841316834566.
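Once the canonical link is known, pulling the id out of it is a small parsing step. The following is a minimal sketch, assuming the id is always the run of digits right after /video/ in the URL path; the helper name extract_video_id is hypothetical and not part of the original question.

```python
import re
from typing import Optional
from urllib.parse import urlparse


def extract_video_id(canonical_url: str) -> Optional[str]:
    """Return the digits that follow /video/ in the URL path, or None."""
    # Only look at the path so query parameters such as item_id are ignored.
    path = urlparse(canonical_url).path
    match = re.search(r"/video/(\d+)", path)
    return match.group(1) if match else None


url = (
    "https://www.tiktok.com/@notorious_foodie/video/7169643841316834566"
    "?_r=1&_t=8XdwIuoJjkX&is_from_webapp=v1&item_id=7169643841316834566"
)
print(extract_video_id(url))  # 7169643841316834566
```

Returning None instead of raising keeps the caller in charge of what to do with links that do not point at a video.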

When I open a mobile link in a browser on my laptop, it redirects me to the canonical link. I wanted to achieve the same behavior via code and managed to do it like so:

import requests
def get_canonical_url(url):
    return requests.get(url, timeout=5).url

It worked for a while, but then it started raising timeout errors every time. I managed to fix it by providing cookies: I made the same request in Postman (where the GET request does work), copied the cookies, and modified my function to accept them, and it started working again. It kept working with cookies for ~6 months, but last week it stopped working again. I thought the cookies might have expired, but updating them didn’t help.

This is the error I keep getting:

requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.tiktok.com', port=443): Read timed out. (read timeout=5)

The weirdest thing is that I can make my desired request just fine via curl and via Postman.


Recap

So the problem is that my Python GET request no longer succeeds and I can’t understand why. I tried using a VPN in case TikTok has banned my IP, and I also ran the request from several of my servers to try different locations, but none of my attempts worked.

Could you give me some advice on how to debug this issue further, or any other ideas for getting the video_id out of a mobile TikTok link?

Asked By: qwerty qwerty


Answers:

Method 1 – Using subprocess

Run the curl command as a subprocess and capture its output. This takes roughly 0.5 seconds.

import re
import subprocess

# curl prints the body of the redirect response, which contains the target URL.
result = subprocess.run(
    ["curl", "-s", "https://vt.tiktok.com/ZS81uRSRR/"],
    capture_output=True, text=True, timeout=10,
)
# Stop at whitespace or a double quote so the closing quote of the href is not captured.
match = re.search(r'(?P<url>https?://[^\s"]+)', result.stdout)
canonical_link = match.group("url") if match else ""
print("Canonical link: ", canonical_link)
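The curl call above is made without -L, so it only fetches the first response and does not follow the redirect. The same information is normally carried in that response's Location header. Here is a minimal standard-library sketch of reading it, assuming the short link answers with a 3xx status; the function name first_redirect_target and the User-Agent value are hypothetical, and whether TikTok treats this client the way it treats curl has not been verified.

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit


def first_redirect_target(url: str, timeout: float = 5.0) -> Optional[str]:
    """Send one GET without following redirects and return the Location header."""
    parts = urlsplit(url)
    conn_cls = (
        http.client.HTTPSConnection
        if parts.scheme == "https"
        else http.client.HTTPConnection
    )
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # Assumption: a curl-like User-Agent; adjust or drop it as needed.
        conn.request("GET", path, headers={"User-Agent": "curl/8.0"})
        response = conn.getresponse()
        # Only 3xx responses are expected to carry a redirect target.
        if 300 <= response.status < 400:
            return response.getheader("Location")
        return None
    finally:
        conn.close()
```

Calling first_redirect_target("https://vt.tiktok.com/ZS81uRSRR/") would be expected to return the canonical link if the short link redirects in a single hop; a chain of redirects would need the call repeated on each returned URL.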

Method 2 – Using proxies

Alternatively, route the request through a proxy. The solution below scrapes a list of free proxies with BeautifulSoup and tries them one by one until a request succeeds.

First install BeautifulSoup using pip install beautifulsoup4

Solution:

from bs4 import BeautifulSoup
import requests


def scrap_now(url):
    print("<======================> Scraping Started <======================>")
    print("<======================> Getting proxy <======================>")
    source = requests.get('https://free-proxy-list.net/', timeout=10).text
    soup = BeautifulSoup(source, "html.parser")
    # The first matching table lists the proxies; its first row is the header.
    ips_container = soup.find_all("table", {"class": "table table-striped table-bordered"})
    ip_trs = ips_container[0].find_all('tr')
    for i in ip_trs[1:]:
        cells = i.find_all('td')
        proxy_ip = cells[0].text + ":" + cells[1].text
        try:
            proxy = {"https": proxy_ip}
            print(f"<======================> Trying with: {proxy_ip}<======================>")
            headers = {'User-Agent': 'Mozilla/5.0'}
            resp = requests.get(url, headers=headers, proxies=proxy, timeout=5)
            if resp.status_code == requests.codes.ok:
                print(f"<======================> Got Success with: {proxy_ip}<======================>")
                # resp.url is the final URL after all redirects were followed.
                return resp.url
        except Exception as e:
            # Dead or slow proxy: report the error and try the next one.
            print(e)
            continue
    # No proxy in the list produced a successful response.
    return ""


canonical_link = scrap_now("https://vt.tiktok.com/ZS81uRSRR/")
print("Canonical link: ", canonical_link)
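The loop above passes the bare ip:port string as the proxy value. The requests documentation writes proxy values as full URLs with a scheme, so spelling the scheme out may be less ambiguous. A minimal sketch of building such a mapping follows, assuming the scraped entries are plain HTTP proxies; the helper name build_proxies is hypothetical and the address is a documentation-only example.

```python
from typing import Dict


def build_proxies(ip: str, port: str) -> Dict[str, str]:
    """Return a requests-style proxies mapping for one HTTP proxy."""
    proxy_url = f"http://{ip}:{port}"
    # Both plain and TLS traffic are sent through the same HTTP proxy.
    return {"http": proxy_url, "https": proxy_url}


print(build_proxies("203.0.113.10", "8080"))
# {'http': 'http://203.0.113.10:8080', 'https': 'http://203.0.113.10:8080'}
```

The returned dictionary can be passed as the proxies argument of requests.get in place of the proxy variable used in the loop.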


Method 3 – Using Selenium

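We can do this with Selenium as well. It takes almost 5 seconds.
First, install Selenium using pip install selenium==3.141.0. The code also imports webdriver_manager, which is installed separately with pip install webdriver-manager.

Then execute the lines below: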

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
options = webdriver.ChromeOptions()
options.add_experimental_option("prefs", {
    "profile.default_content_setting_values.media_stream_mic": 1,
    "profile.default_content_setting_values.media_stream_camera": 1,
    "profile.default_content_setting_values.geolocation": 1,
    "profile.default_content_setting_values.notifications": 1,
    "credentials_enable_service": False,
    "profile.password_manager_enabled": False
})
options.add_argument('--headless')
options.add_experimental_option("excludeSwitches", ['enable-automation'])
browser = webdriver.Chrome(ChromeDriverManager(cache_valid_range=365).install(), options=options)
browser.get("https://vt.tiktok.com/ZS81uRSRR/")
print("Canonical link: ", browser.current_url)

Note: The first run takes a bit longer because the web driver is downloaded automatically; after that, the cached copy is used.

Answered By: Usman Arshad