Get Number Group of Regex Python

Question:

I have a string from which I want to extract a pattern. The string varies slightly and can look like either string1 or string2:

string1 = """
    Rak penyimpanan berbentuk high chest dengan gaya American Country.  Cocok digunakan untuk menyimpan 
segala keperluan hunian Anda! Dibuat dengan rangka kayu mahoni, papan mdf dan finishing cat duco berkualitas.  Kualitas ekspor akan menjamin kepuasan 
Anda.  Dikirim jadi, tanpa perakitan. Panjang 76 cm Kedalaman 40 cm Tinggi 120 cm
"""

string2 = """
    Rak penyimpanan berbentuk high chest dengan gaya American Country.  Cocok digunakan untuk menyimpan 
segala keperluan hunian Anda! Dibuat dengan rangka kayu mahoni, papan mdf dan finishing cat duco berkualitas.  Kualitas ekspor akan menjamin kepuasan 
Anda.  Dikirim jadi, tanpa perakitan. P 76 cm L 40 cm T 120 cm
"""

What I want to achieve is to capture the numbers in groups and get (51, 23, 47-89).
What I have done so far is create a pattern like this:

pattern = (\bP|Panjang\b).+(\d)+.+(\bL|Kedalaman\b).+(\d)+.+(\bT|Tinggi\b).+(\d)+.[cm]+

I have tried it at https://regexr.com/, but each group only captures the last digit, such as (1, 3, 9).
What am I missing? I already put + after the \d in every group.

Asked By: Michael Halim

||

Answers:

Regex

"(?:P|Panjang)\s(?P<P>\d+)\scm\s(?:L|Kedalaman)\s(?P<L>\d+)\scm\s(?:T|Tinggi)\s(?P<T>\d+)\scm"g

About Regex:

  • See Regex 101
  • captures three groups: P, L and T
  • each group contains the matched digits
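
As a rough sketch of how the named groups could be read in Python (the trailing g in the pattern above is a flag from the online tester, not Python syntax, so it is dropped here). The names long_form, short_form and dimensions are illustrative only, and the sample strings are shortened stand-ins for string1 and string2 from the question:

```python
import re

# Pattern from the answer above, using Python named groups P, L and T.
pattern = re.compile(
    r"(?:P|Panjang)\s(?P<P>\d+)\scm\s"
    r"(?:L|Kedalaman)\s(?P<L>\d+)\scm\s"
    r"(?:T|Tinggi)\s(?P<T>\d+)\scm"
)

# Shortened stand-ins for the last sentence of string1 and string2.
long_form = "Dikirim jadi, tanpa perakitan. Panjang 76 cm Kedalaman 40 cm Tinggi 120 cm"
short_form = "Dikirim jadi, tanpa perakitan. P 76 cm L 40 cm T 120 cm"


def dimensions(text):
    """Return the named groups as a dict, or None when nothing matches."""
    match = pattern.search(text)
    return match.groupdict() if match else None


print(dimensions(long_form))
print(dimensions(short_form))
```

Both calls should give the same dictionary of digit strings keyed by P, L and T; convert the values with int() if numbers are needed.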
Answered By: pKiran

You can:

  • change the .+ to be more specific like \scm\s or \s
  • You can just match cm instead of using a character class [cm]+ that might also match ccc
  • If you only want the digits, you can omit the capture groups around the names
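
To illustrate why the original pattern yields single digits, here is a small Python sketch. It assumes a simplified one-line input (the variable line) and a cut-down version of the question's pattern; both are illustrative, not taken from the original post. Two effects combine: a greedy .+ swallows leading digits, and a quantified group such as (\d)+ keeps only its last repetition.

```python
import re

# Simplified stand-in for the last sentence of string1.
line = "Panjang 76 cm Kedalaman 40 cm Tinggi 120 cm"

# Greedy .+ consumes " 7", leaving only "6" for the repeated one-digit group.
loose_digit = re.search(r"Panjang.+(\d)+.+Kedalaman", line).group(1)

# Even without .+, (\d)+ remembers only the last digit it matched.
last_repetition = re.search(r"(\d)+", line).group(1)

# Moving the quantifier inside the group captures the whole number.
whole_number = re.search(r"Panjang\s(\d+)\scm", line).group(1)

print(loose_digit, last_repetition, whole_number)
```

Putting + inside the parentheses and replacing .+ with explicit whitespace and cm is what lets each group hold the full number.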

For example

\bP(?:anjang)?\s(\d+)\scm\s(?:L|Kedalaman)\s(\d+)\scm\sT(?:inggi)?\s(\d+)\scm\b

Explanation

  • \b A word boundary to prevent a partial word match
  • P(?:anjang)?\s Match P and optionally anjang, then a whitespace character
  • (\d+)\scm\s Capture 1+ digits in group 1, and match cm
  • (?:L|Kedalaman)\s Match L or Kedalaman
  • (\d+)\scm\s Capture 1+ digits in group 2 and match cm
  • T(?:inggi)?\s Match T and optionally inggi
  • (\d+)\scm Capture 1+ digits in group 3 and match cm
  • \b A word boundary
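
The pattern explained above could be applied in Python roughly as follows. This is a minimal sketch: the helper name extract and the shortened sample strings are assumptions for illustration, standing in for string1 and string2 from the question.

```python
import re

# Pattern from this answer; the three capture groups hold the digits only.
pattern = re.compile(
    r"\bP(?:anjang)?\s(\d+)\scm\s"
    r"(?:L|Kedalaman)\s(\d+)\scm\s"
    r"T(?:inggi)?\s(\d+)\scm\b"
)

# Shortened stand-ins for the last sentence of string1 and string2.
samples = [
    "Dikirim jadi, tanpa perakitan. Panjang 76 cm Kedalaman 40 cm Tinggi 120 cm",
    "Dikirim jadi, tanpa perakitan. P 76 cm L 40 cm T 120 cm",
]


def extract(text):
    """Return (length, depth, height) as ints, or None if there is no match."""
    match = pattern.search(text)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


for sample in samples:
    print(extract(sample))
```

re.search is used rather than re.match because the measurements appear at the end of a longer description, not at the start of the string.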

Regex demo

Answered By: The fourth bird

\bP(?:anjang)?\s([\d-]+)\s(?:cm|m)?(?:\s)?(?:L|Kedalaman)?\s([\d-]+)\s(?:cm|m)?(?:\s)?T(?:inggi)?\s([\d-]+)\s(?:cm|m)?\b
  • (?:) non-capturing group
  • \b A word boundary
  • P(?:anjang)? Match P or Panjang
  • \s Match a whitespace character
  • ([\d-]+) Capture digits and hyphens, e.g. 123 or 123-456
  • (?:cm|m)? Match cm, m, or nothing
  • (?:\s)? Match a whitespace character or nothing
  • (?:L|Kedalaman)? Match L, Kedalaman, or nothing
  • T(?:inggi)? Match T or Tinggi
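
A short Python sketch of this variant, which also accepts ranges such as 47-89. The sample inputs and the helper name extract_ranges are made up for illustration; the values are kept as strings because a range like 47-89 is not a single integer.

```python
import re

# Pattern from this answer: [\d-]+ allows plain numbers and ranges.
pattern = re.compile(
    r"\bP(?:anjang)?\s([\d-]+)\s(?:cm|m)?(?:\s)?"
    r"(?:L|Kedalaman)?\s([\d-]+)\s(?:cm|m)?(?:\s)?"
    r"T(?:inggi)?\s([\d-]+)\s(?:cm|m)?\b"
)


def extract_ranges(text):
    """Return the three captured values as strings, or None if no match."""
    match = pattern.search(text)
    return match.groups() if match else None


# Hypothetical inputs: one with a range, one in the long-name form.
with_range = extract_ranges("P 51 cm L 23 cm T 47-89 cm")
long_names = extract_ranges("Panjang 76 cm Kedalaman 40 cm Tinggi 120 cm")

print(with_range)
print(long_names)
```

Note that [\d-]+ would also accept stray hyphens such as "-" or "5--6", so a stricter form like \d+(?:-\d+)? may be preferable if the input is less regular.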
Answered By: limitededition