Split string in Python into chunks with constant length, but right-aligned
Question:
Basically, I have a string like “12345678” and need a list containing this information, but split into substrings of length 3. The problem is that I need it to be right-aligned, so the output must be ['12', '345', '678'] and NOT ['123', '456', '78'].
How do I best achieve this in few lines of code, preferably without additional imports?
Answers:
It’s easy enough to adapt the top answer from How do I split a list into equally-sized chunks?:
def chunks_rightaligned(l, n):
    orphan = len(l) % n
    if orphan:
        yield l[:orphan]
    for i in range(orphan, len(l), n):
        yield l[i : i + n]
This yields a chunk of the remainder length first, then iterates over the indices in chunk-size steps starting from the orphan size rather than 0.
Demo:
>>> def chunks_rightaligned(l, n):
...     orphan = len(l) % n
...     if orphan:
...         yield l[:orphan]
...     for i in range(orphan, len(l), n):
...         yield l[i : i + n]
...
>>> list(chunks_rightaligned("12345678", 3))
['12', '345', '678']
>>> list(chunks_rightaligned("1234567", 3))
['1', '234', '567']
>>> list(chunks_rightaligned("123456", 3))
['123', '456']
If you want to try regular expressions, you can use the re.split() function:
>>> import re
>>> re.split(r"(...)(?=(?:\d\d\d)+$)", "12345678")
['12', '345', '678']
>>> re.split(r"(...)(?=(?:\d\d\d)+$)", "123")
['123']
EDIT
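A better solution would be to use re.findall(). It also avoids the empty leading string that the re.split() approach returns when the length is an exact multiple of 3:
>>> import re
>>> re.findall(r"\d{1,3}(?=(?:\d{3})*$)", "12345")
['12', '345']
>>> re.findall(r"\d{1,3}(?=(?:\d{3})*$)", "123456")
['123', '456']
>>> re.findall(r"\d{1,3}(?=(?:\d{3})*$)", "1234567")
['1', '234', '567']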
What does it do?
- \d{1,3} matches at least 1 and at most 3 digits.
- (?=(?:\d{3})*$) is a positive lookahead. It ensures that the matched digits are followed by a multiple of 3 digits (possibly none) up to the end of the string. Inside it, (?:\d{3}) matches exactly 3 digits.
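To see the breakdown above in action, the following sketch traces each match with re.finditer() and records how many characters remain after it. The variable names are only illustrative. The number of remaining characters should always be a multiple of 3, which is exactly what the lookahead enforces.

```python
import re

pattern = re.compile(r"\d{1,3}(?=(?:\d{3})*$)")
text = "1234567"

# For every match, keep the matched text, its (start, end) span,
# and the number of characters left between the match and the end.
trace = [(m.group(), m.span(), len(text) - m.end())
         for m in pattern.finditer(text)]

for chunk, span, remaining in trace:
    print(chunk, span, remaining)
```

At position 0 the greedy \d{1,3} first tries three digits, then two, and only succeeds with one digit, because only then is the rest of the string a multiple of 3 long.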
You can interpolate a variable into the regex string to make the chunk size configurable.
Example
>>> limit = 4
>>> regex = r"\d{1,%d}(?=(?:\d{%d})*$)" % (limit, limit)
>>> re.findall(regex, "1234567")
['123', '4567']
>>> limit = 3
>>> regex = r"\d{1,%d}(?=(?:\d{%d})*$)" % (limit, limit)
>>> re.findall(regex, "1234567")
['1', '234', '567']
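The configurable pattern could be wrapped in a small helper, roughly as sketched below. The function name regex_chunks_rightaligned is hypothetical, and the sketch assumes limit is a positive integer. Because the pattern uses \d, it only handles strings made up entirely of digits.

```python
import re


def regex_chunks_rightaligned(digits, limit):
    # Hypothetical helper: build the pattern for the requested chunk size,
    # then collect every match. Assumes `digits` contains only digits.
    pattern = r"\d{1,%d}(?=(?:\d{%d})*$)" % (limit, limit)
    return re.findall(pattern, digits)


print(regex_chunks_rightaligned("1234567", 4))
print(regex_chunks_rightaligned("12345678", 3))
```

For input that may contain other characters, the slicing-based generator from the first answer is the safer choice, since it does not depend on what the characters are.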