How to extract the user account name and video id from a shortened TikTok URL?

Question:

I’m trying to get the full URL of a TikTok video from a shortened URL in order to extract the @username of the poster and the video id of the post. The shortened URLs I’ve come across appear to be links shared on Facebook/Twitter, in the form of "m.tiktok.com" or, more specifically, "https://vm.tiktok.com/pF6GGf/". This link ends up redirecting to "https://www.tiktok.com/@blessy2flex/video/6796374554391448838…". Is there any way I could get this URL with only the shortened URL?

I want to be able to get the username (@blessy2flex) and the video id (6796374554391448838) from the shortened URL, as they appear in the actual URL. I’ve tried tracking redirects, but the URL I end up with is "https://m.tiktok.com/v/6833793010149412101.html…", which evidently is not the same.

I’ve also tried things like Selenium, which does give me the HTML of the original video page, where I can find the username and the video id by searching through the HTML. However, this method doesn’t seem very scalable, as I’m sure TikTok would notice and slow down my processes.

Asked By: dhsong

Answers:

TikTok may not be redirecting you to the right URL because it is inspecting your User-Agent. If you set a browser-like User-Agent in your request headers, it should work.

Here’s how you can solve your problem.

import re
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

url = 'https://vm.tiktok.com/pF6GGf/'
response = requests.get(url, headers=headers)

print(response.url) # the correct url with the username

Finally, you can find the username and the video id using regex.

re.findall(r'(@[\w.]+)/video/(\d+)', response.url)

OUTPUT: [('@blessy2flex', '6796374554391448838')]
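
To make the extraction step reusable, here is a minimal sketch that wraps the pattern in a function and returns None when the URL does not have the expected shape (such as the m.tiktok.com/v/… form from the question). The function name `parse_tiktok_url` is made up for illustration, and the `/@username/video/<digits>` layout is an assumption based only on the example URL above.

```python
import re

# Assumed URL shape: .../@<username>/video/<digits>, optionally followed
# by a query string. Based only on the example URL in this thread.
_TIKTOK_VIDEO = re.compile(r"/(@[\w.]+)/video/(\d+)")


def parse_tiktok_url(url):
    """Return (username, video_id) from a resolved TikTok URL, or None."""
    match = _TIKTOK_VIDEO.search(url)
    if match is None:
        return None  # e.g. the m.tiktok.com/v/<id>.html form
    return match.group(1), match.group(2)


print(parse_tiktok_url("https://www.tiktok.com/@blessy2flex/video/6796374554391448838"))
print(parse_tiktok_url("https://m.tiktok.com/v/6833793010149412101.html"))
```

Checking for None lets the caller decide whether to retry with different headers instead of silently getting an empty result.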

Extra: Modern web services are usually quite smart and may use various mechanisms to thwart crawling. If you plan to do a lot of crawling (I assume valid/legal), you’ll also have to take into account the rate at which you request pages (among a lot of other things). If you need to manage more user agents, you might find the pip package fake-useragent helpful.
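
One simple way to control the request rate is to enforce a minimum interval between calls. The sketch below is only an illustration: the class name `Throttle` and the 2-second interval are arbitrary choices, not anything TikTok documents, and a real crawler would likely need more than this.

```python
import time


class Throttle:
    """Block so that consecutive wait() calls are at least min_interval seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested without real delays
        self._sleep = sleep
        self._last = None      # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(min_interval=2.0)  # hypothetical value, tune as needed
throttle.wait()  # returns immediately on the first call
```

Calling `throttle.wait()` right before each `requests.get(url, headers=headers)` keeps the requests spaced out.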

Answered By: DaveIdito

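In views.py you can get the id:

import os.path
from urllib.parse import urlparse

# Tiktoks is presumably the answerer's own Django model with a video_link field
tiktok = Tiktoks.objects.get(pk=pk)
parsed = urlparse(tiktok.video_link).path  # path only, query string dropped
path1 = os.path.split(parsed)  # (head, last path segment)
get_id = path1[1]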
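
The same idea can be tried without a Django model. Note that splitting a path that ends in a slash yields an empty last segment, so the sketch below strips trailing slashes first. The helper name `last_path_segment` is made up for illustration, and the example URL is the full one from the question.

```python
import posixpath
from urllib.parse import urlparse


def last_path_segment(url):
    """Return the final path segment of a URL, ignoring query string and trailing slash."""
    path = urlparse(url).path.rstrip("/")
    # URL paths always use "/", so posixpath is used regardless of the OS.
    return posixpath.split(path)[1]


print(last_path_segment("https://www.tiktok.com/@blessy2flex/video/6796374554391448838?lang=en"))
```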
Answered By: Asif Shahzad

I actually use Selenium to do this. It’s much more reliable than requests, in my opinion, and it also lets you standardize URLs by stripping UTM parameters and the like. For this you also need the webdriver_manager package.

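import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
# ChromeType has moved between webdriver_manager releases; this is the 4.x
# location (3.x releases used webdriver_manager.core.utils).
from webdriver_manager.core.os_manager import ChromeType


def tiktok_post_clean_up(self, url):
    if "tiktok" not in url:
        return url
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--window-size=1920,1080")
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--incognito")
    chrome_options.add_argument("--disable-dev-shm-usage")
    # Selenium 4 takes the driver path via Service; Selenium 3 took it positionally.
    driver = webdriver.Chrome(
        service=Service(ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install()),
        options=chrome_options,
    )
    try:
        driver.get(url)
        time.sleep(15)  # give the redirects time to finish
        url = driver.current_url
    finally:
        driver.quit()  # always release the browser process
    # Keep scheme://host/@user/video/<id> and drop the query string.
    parts = url.split("/")
    tt_id = parts[5].split("?")
    return "/".join(parts[:5] + [tt_id[0]])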
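
The final clean-up step (dropping UTM and other query parameters) does not depend on Selenium. As a rough sketch, the standard library can do the same thing on any already-resolved URL; the function name `strip_query` is made up for illustration.

```python
from urllib.parse import urlsplit


def strip_query(url):
    """Return the URL without its query string or fragment."""
    return urlsplit(url)._replace(query="", fragment="").geturl()


print(strip_query("https://www.tiktok.com/@blessy2flex/video/6796374554391448838?utm_source=copy"))
```

Unlike indexing into `url.split("/")`, this does not assume a fixed number of path segments.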
Answered By: Marley Rosario