Python regex works on Regex101, but it does not work in Python 2

Question:

I created a regex to match the Chinese and English names of TV shows.

My regex is at https://regex101.com/r/rBJHDG. It works perfectly on Regex101, but the same regex does not work in Python 2.

For example, take the string 亿万.Billions.S01E01.中英字幕.HDTVrip.1024X576.mp4.

The regex should match 亿万 as name_chs, but it does not. Instead, it matches 亿万.Billions as name_en.

In [68]: r = '^(?P<name_chs>(?:[\u3007\u4e00-\u9fff\u3400-\u4dbf\uf900-\ufaff]+)(?=\.))?(?P<name_en>\S+).S(?P<season>\d{2})E(?P<episode>\d{2})'

In [69]: re.match(r, u'亿万.Billions.S01E01.中英字幕.HDTVrip.1024X576.mp4').groupdict()
Out[69]:
{'episode': u'01',
 'name_chs': None,
 'name_en': u'\u4ebf\u4e07.Billions',
 'season': u'01'}

Second question:

How can I remove the . at the start of name_en, which sits between the Chinese name and the English name?

# 亿万.Billions.S01E01.中英字幕.HDTVrip.1024X576.mp4
Full match    0-18    `亿万.Billions.S01E01`
Group `name_chs`    0-2    `亿万`
Group `name_en`    2-11    `.Billions`   <---- This DOT!
Group `season`    13-15    `01`
Group `episode`    16-18    `01`
Asked By: tywtyw2002


Answers:

It looks like the problem is that the regex tester includes the global and multiline flags but your code does not. If you uncheck those two flags in the regex tester you’ll find that the tester matches your current results.

You could try compiling the pattern with the multiline flag:

r = re.compile('^(?P<name_chs>(?:[\u3007\u4e00-\u9fff\u3400-\u4dbf\uf900-\ufaff]+)(?=\.))?(?P<name_en>\S+).S(?P<season>\d{2})E(?P<episode>\d{2})', re.MULTILINE)

and

re.search(r, u'亿万.Billions.S01E01.中英字幕.HDTVrip.1024X576.mp4').groupdict()
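For reference, here is a minimal sketch of the compile-and-search suggestion. It makes one assumption that goes beyond the text above: the pattern is written as a unicode literal (u'...') with the regex escapes doubled. As far as I can tell, Python 2 does not turn \uXXXX into code points inside a plain '...' pattern, which would leave name_chs as None whatever flags are set. The names PATTERN, filename and groups are only illustrative.

```python
import re

# Assumption: a unicode literal, so \uXXXX becomes real code points on
# Python 2 as well as Python 3. Regex escapes are doubled because this
# is not a raw string.
PATTERN = re.compile(
    u'^(?P<name_chs>(?:[\u3007\u4e00-\u9fff\u3400-\u4dbf\uf900-\ufaff]+)(?=\\.))?'
    u'(?P<name_en>\\S+).S(?P<season>\\d{2})E(?P<episode>\\d{2})',
    re.MULTILINE,
)

# The sample file name, spelled with escapes so no source encoding line is needed.
filename = (u'\u4ebf\u4e07.Billions.S01E01.'
            u'\u4e2d\u82f1\u5b57\u5e55.HDTVrip.1024X576.mp4')

match = PATTERN.search(filename)
groups = match.groupdict()
```

With this pattern, groups['name_chs'] holds the two Chinese characters and groups['name_en'] is still '.Billions', leading dot included, which is the same result Regex101 shows.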

As for your second question:

I would just make that dot its own capture group by adding (.) in front of the English name, like so…

^(?P<name_chs>(?:[\u3007\u4e00-\u9fff\u3400-\u4dbf\uf900-\ufaff]+)(?=\.))?(.)(?P<name_en>\S+).S(?P<season>\d{2})E(?P<episode>\d{2})

Now when you print the English name it will only be the word because the dot is in its own capture group.
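The separate-dot idea can be sketched as below, again assuming a unicode pattern. The second pattern is a hypothetical variant, not part of the answer: it folds the Chinese name and its trailing dot into one optional non-capturing group, which may behave better when a file name has no Chinese part. The helper names (CJK, TAIL, names and the two pattern variables) are made up for illustration.

```python
import re

# Shared pieces of the pattern (names are illustrative only).
CJK = u'[\u3007\u4e00-\u9fff\u3400-\u4dbf\uf900-\ufaff]+'
TAIL = u'.S(?P<season>\\d{2})E(?P<episode>\\d{2})'

# The answer's approach: the separator is captured by its own unnamed group.
with_dot_group = re.compile(
    u'^(?P<name_chs>(?:' + CJK + u')(?=\\.))?(.)(?P<name_en>\\S+)' + TAIL)

# Hypothetical variant: the Chinese name plus its dot form one optional group.
with_optional_prefix = re.compile(
    u'^(?:(?P<name_chs>' + CJK + u')\\.)?(?P<name_en>\\S+)' + TAIL)

both = u'\u4ebf\u4e07.Billions.S01E01.HDTVrip.mp4'
english_only = u'Billions.S01E01.HDTVrip.mp4'


def names(pattern, text):
    """Return (name_chs, name_en) for a file name that matches."""
    m = pattern.match(text)
    return m.group('name_chs'), m.group('name_en')
```

Both patterns give 'Billions' as name_en for the file name that has a Chinese prefix. For english_only, the (.) group consumes the leading 'B', so name_en becomes 'illions', while the variant still returns 'Billions'.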

Answered By: S. Aldaris