Split string by special pattern

Question:

I have a long string that can consist of several sub-strings (not always; sometimes it’s a single string, sometimes there are 4 sub-strings stuck together). Each one starts with a length byte, for example 4D or 4E. Below is an example string that consists of 4 sub-strings:

4D44B9096268182113077A95C84005D55FCD9D79476DDA4346C7EF1F4F07D4B46693F51812C8B74E4E44B9097368182113077A340040058D55E7E8D3924C57182F6E07A4D3617E100D1652169668636CB54E44B9096868182113077A37004005705FE9461E85F69A4C8E1B00CE03E6337B8F3D853A51C447B9694E44B9096668182113077AA400400555C9FAADA21F1EC93DBD5B579E4E07DDAF75A45D095E72010DBB

After splitting by pattern, the output SHOULD BE:

4D44B9096268182113077A95C84005D55FCD9D79476DDA4346C7EF1F4F07D4B46693F51812C8B74E
4E44B9097368182113077A340040058D55E7E8D3924C57182F6E07A4D3617E100D1652169668636CB5
4E44B9096868182113077A37004005705FE9461E85F69A4C8E1B00CE03E6337B8F3D853A51C447B969
4E44B9096668182113077AA400400555C9FAADA21F1EC93DBD5B579E4E07DDAF75A45D095E72010DBB

Each long string has an ID (in this case 44B909), and each line has this ID right after the length byte. My original code took the first 6 characters (4D44B9) and split the string on them. It works in 95% of cases, namely when EACH line has the same length, for example 4D. The problem is that the lines do not always have the same length, as in the string above. Here is my code:

def repeat():
    string = input('Please paste string below:'+'\n')
    code = string[:6]

    print('\n')
    print('SPLITTED:')
    string = string.replace(code, '\n'+'\n'+code)

    print(string)


while True:
    repeat()

When you paste this long string, the code won’t split it, because the first line starts with 4D and the rest start with 4E. I’d like it to "ignore" (for a moment) the first 2 characters (4D or 4E) and use the next six characters as the split pattern. The output should be the 4 lines shown above! I tried changing the code a bit, but I got strange results like these:

44B9096268182113077A95C84005D55FCD9D79476DDA4346C7EF1F4F07D4B46693F51812C8B74E
44B9097368182113077A340040058D55E7E8D3924C57182F6E07A4D3617E100D1652169668636CB54E
44B9096868182113077A37004005705FE9461E85F69A4C8E1B00CE03E6337B8F3D853A51C447B9694E
44B9096668182113077AA400400555C9FAADA21F1EC93DBD5B579E4E07DDAF75A45D095E72010DBB

How can I make it work??

Asked By: michalb93

||

Answers:

If the first two characters encode the string’s length in hex, why not use that to decide how much of the string to consume?
However, the values in your example do not match the lengths directly: 4D is decimal 77 and 4E is decimal 78, while the 4D line is 80 characters long (78 after the prefix) and the 4E lines are 82 characters long (80 after the prefix).
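
As a rough sketch of consuming by length: in the sample above, the 4D line is 40 bytes (80 hex characters) and the 4E lines are 41 bytes, so the code below assumes that a record's total length in bytes is the prefix value minus 37. That offset is a guess inferred from this one sample, not a documented rule, and the function name `split_by_length_prefix` and its `offset` parameter are made up for illustration.

```python
def split_by_length_prefix(data, offset=37):
    """Walk the string, reading each record's length from its first byte.

    ASSUMPTION: record length in bytes == prefix value - offset.
    The default offset of 37 is inferred from the sample only.
    """
    records = []
    pos = 0
    while pos < len(data):
        prefix = int(data[pos:pos + 2], 16)   # e.g. "4D" -> 77
        n_chars = 2 * (prefix - offset)       # two hex characters per byte
        if n_chars <= 0:
            raise ValueError(f"implausible length prefix at position {pos}")
        records.append(data[pos:pos + n_chars])
        pos += n_chars
    return records


# The sample string from the question, written as four adjacent literals.
sample = (
    "4D44B9096268182113077A95C84005D55FCD9D79476DDA4346C7EF1F4F07D4B46693F51812C8B74E"
    "4E44B9097368182113077A340040058D55E7E8D3924C57182F6E07A4D3617E100D1652169668636CB5"
    "4E44B9096868182113077A37004005705FE9461E85F69A4C8E1B00CE03E6337B8F3D853A51C447B969"
    "4E44B9096668182113077AA400400555C9FAADA21F1EC93DBD5B579E4E07DDAF75A45D095E72010DBB"
)

for record in split_by_length_prefix(sample):
    print(record)
```

If the real relationship between the prefix and the length is known, it should replace the guessed offset; the loop itself stays the same.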

For the question about how to split on a slightly variable pattern, a regular expression seems like a good solution.

import re

splitted = re.split(r'4[DE](?=44B909)', string)

In so many words, this says "use 4D or 4E as the delimiter to split on, but only if it’s immediately followed by 44B909".
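
To see the lookahead at work, here is a small sketch on a shortened, made-up input (the payloads AA4E, BB and CC are placeholders, not real data). The first payload deliberately ends in 4E, to show that a 4E which is not followed by 44B909 is left alone.

```python
import re

# Shortened, made-up input: three records, each "length byte + ID + payload".
string = "4D44B909AA4E" + "4E44B909BB" + "4E44B909CC"

parts = re.split(r'4[DE](?=44B909)', string)
print(parts)    # ['', '44B909AA4E', '44B909BB', '44B909CC']

# Drop the empty string that precedes the first delimiter.
records = [p for p in parts if p]
print(records)  # ['44B909AA4E', '44B909BB', '44B909CC']
```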

(There will be an empty string before the first value, but that’s easy to shift off; or change the regex to r'(?<!^)4[DE](?=44B909)'.)

If you don’t want to discard anything, include everything in the lookahead:

splitted = re.split(r'(?<!^)(?=4[DE]44B909)', string)
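
If the ID differs from one input to the next, the pattern could be built from the input itself. The following is only a sketch: it assumes the ID is always characters 3 to 8 of the input, that the length byte is any two uppercase hex digits, and that the ID never occurs inside a payload (that would cause a false split). The name `split_records` is made up for illustration, and splitting on an empty match needs Python 3.7 or later.

```python
import re


def split_records(data):
    """Split before every 'length byte + ID', keeping the length bytes."""
    record_id = data[2:8]   # assumed position of the ID, e.g. '44B909'
    pattern = r'(?<!^)(?=[0-9A-F]{2}' + re.escape(record_id) + r')'
    return re.split(pattern, data)


# Shortened, made-up input; the payloads are placeholders.
print(split_records("4D44B909AA4E" + "4E44B909BB" + "4E44B909CC"))
# ['4D44B909AA4E', '4E44B909BB', '4E44B909CC']
```
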
Answered By: tripleee