Why are slice and range upper-bound exclusive?
Question:
I know that when I use range([start], stop[, step]) or slice([start], stop[, step]), the stop value is not included in the range or slice.
But why does it work this way?
Is it so that e.g. a range(0, x) or range(x) will contain x many elements?
Is it for parallelism with the C for loop idiom, i.e. so that for i in range(start, stop): superficially resembles for (i = start; i < stop; i++) {?
See also Loop backwards using indices for a case study: setting the stop and step values properly can be a bit tricky when trying to get values in descending order.
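The descending case is worth spelling out, since the half-open convention applies in the direction of travel. A quick sketch with illustrative values:

```python
# With half-open semantics, stop must be one *past* the last value
# produced, in the direction of iteration -- so a descending range
# over 0..4 stops at -1, not 0.
n = 5
forward = list(range(0, n))            # ascending: 0, 1, 2, 3, 4
backward = list(range(n - 1, -1, -1))  # descending: 4, 3, 2, 1, 0
print(forward)
print(backward)
```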
Answers:
The documentation implies this has a few useful properties:
word[:2] # The first two characters
word[2:] # Everything except the first two characters
Here’s a useful invariant of slice operations: s[:i] + s[i:] equals s.
For non-negative indices, the length of a slice is the difference of the indices, if both are within bounds. For example, the length of word[1:3] is 2.
I think we can assume that the range functions act the same for consistency.
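Both properties are easy to verify interactively; a quick sketch using an arbitrary string:

```python
s = "Python"
# Invariant: s[:i] + s[i:] reconstructs s for every split point i.
for i in range(len(s) + 1):
    assert s[:i] + s[i:] == s
# For in-bounds non-negative indices, len(s[i:j]) == j - i.
assert len(s[1:3]) == 3 - 1
print(s[1:3])  # 'yt'
```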
A bit late to this question; nonetheless, here is an attempt to answer the why part of your question:
Part of the reason is that we use zero-based indexing/offsets when addressing memory.
The easiest example is an array. Think of an “array of 6 items” as a location to store 6 data items. If this array’s start location is at memory address 100, then data, let’s say the 6 characters ‘apple ’, are stored like this:
memory/
array contains
location data
100 -> 'a'
101 -> 'p'
102 -> 'p'
103 -> 'l'
104 -> 'e'
105 -> ' '
So for 6 items, our index goes from 100 to 105. Addresses are generated using base + offset, so the first item is at base memory location 100 + offset 0 (i.e., 100 + 0), the second at 100 + 1, the third at 100 + 2, …, until 100 + 5 is the last location.
This is the primary reason we use zero-based indexing, and it leads to language constructs such as for loops in C:
for (int i = 0; i < LIMIT; i++)
or in Python:
for i in range(LIMIT):
When you program in a language like C where you deal with pointers
more directly, or assembly even more so, this base+offset scheme
becomes much more obvious.
Because of the above, many language constructs automatically use ranges that run from 0 to length - 1.
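The base + offset arithmetic above can be sketched in a few lines of Python (the address 100 is just the illustrative value from the diagram):

```python
base = 100  # hypothetical start address from the example above
n = 6       # number of items in the array
# Offsets run 0..n-1, which is exactly what the half-open range(n)
# produces -- and range(n) contains exactly n elements.
addresses = [base + offset for offset in range(n)]
print(addresses)      # [100, 101, 102, 103, 104, 105]
print(len(range(n)))  # 6
```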
You might find this article on Zero-based numbering on Wikipedia interesting, and also this question from Software Engineering SE.
Example:
In C, for instance, if you have an array ar and you subscript it as ar[3], that really is equivalent to taking the (base) address of the array ar and adding 3 to it => *(ar + 3), which can lead to code like this printing the contents of an array, showing the simple base+offset approach:
for(i = 0; i < 5; i++)
    printf("%c\n", *(ar + i));
really equivalent to
for(i = 0; i < 5; i++)
    printf("%c\n", ar[i]);
Here’s the opinion of some Google+ user:
[…] I was swayed by the elegance of half-open intervals. Especially the
invariant that when two slices are adjacent, the first slice’s end
index is the second slice’s start index is just too beautiful to
ignore. For example, suppose you split a string into three parts at
indices i and j — the parts would be a[:i], a[i:j], and a[j:].
Google+ is closed, so the link doesn’t work anymore. Spoiler alert: that was Guido van Rossum.
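The adjacency invariant from the quote is straightforward to check; a quick sketch:

```python
a = "hello world"
i, j = 3, 7
# Adjacent half-open slices share a boundary index, so the three
# parts concatenate back to the original with no overlap or gap.
parts = (a[:i], a[i:j], a[j:])
print(parts)
assert "".join(parts) == a
```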
Here is another reason why an exclusive upper bound is a saner approach:
Suppose you wished to write a function that applies some transform to a subsequence of items in a list. If intervals were to use an inclusive upper bound as you suggest, you might naively try writing it as:
def apply_range_bad(lst, transform, start, end):
    """Applies a transform on the elements of a list in the range [start, end]"""
    left = lst[0 : start-1]
    middle = lst[start : end]
    right = lst[end+1 :]
    return left + [transform(i) for i in middle] + right
At first glance, this seems straightforward and correct, but unfortunately it is subtly wrong.
What would happen if:
start == 0
end == 0
end < 0
? In general, there might be even more boundary cases that you should consider. Who wants to waste time thinking about all of that? (These problems arise because, with inclusive lower and upper bounds, there is no inherent way to express an empty interval.)
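The start == 0 failure in particular is easy to demonstrate without the full function, because start - 1 becomes -1, which Python treats as an index counted from the end:

```python
lst = [1, 2, 3, 4]
start = 0
# The naive inclusive-bound "left" slice is lst[0 : start - 1].
# With start == 0 this is lst[0:-1], i.e. everything *except* the
# last element -- not the empty list the function needs here.
left = lst[0 : start - 1]
print(left)  # [1, 2, 3]
```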
Instead, by using a model where upper bounds are exclusive, dividing a list into separate slices is simpler, more elegant, and thus less error-prone:
def apply_range_good(lst, transform, start, end):
    """Applies a transform on the elements of a list in the range [start, end)"""
    left = lst[0:start]
    middle = lst[start:end]
    right = lst[end:]
    return left + [transform(i) for i in middle] + right
(Note that apply_range_good does not transform lst[end]; it too treats end as an exclusive upper bound. Trying to make it use an inclusive upper bound would still have some of the problems I mentioned earlier. The moral is that inclusive upper bounds are usually troublesome.)
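For completeness, here is the half-open version exercised at exactly the boundary cases that broke the inclusive one (the function is reproduced so the snippet is self-contained):

```python
def apply_range_good(lst, transform, start, end):
    """Applies a transform on the elements of a list in the range [start, end)"""
    left = lst[0:start]
    middle = lst[start:end]
    right = lst[end:]
    return left + [transform(i) for i in middle] + right

lst = [1, 2, 3, 4]
print(apply_range_good(lst, lambda x: x * 10, 0, 2))         # [10, 20, 3, 4]
print(apply_range_good(lst, lambda x: x * 10, 0, 0))         # [1, 2, 3, 4] (empty interval)
print(apply_range_good(lst, lambda x: x * 10, 2, len(lst)))  # [1, 2, 30, 40]
```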
(Mostly adapted from an old post of mine about inclusive upper-bounds in another scripting language.)
Elegance vs. obviousness
To be honest, I find the way slicing works in Python quite counter-intuitive: it trades so-called elegance for more mental processing. That is why this StackOverflow question has more than 2,000 upvotes; I think a lot of people don’t understand it initially.
Just as an example, the following code has already caused headaches for a lot of Python newbies.
x = [1,2,3,4]
print(x[0:1])
# Output is [1]
Not only is it hard to process, it is also hard to explain properly. For example, the explanation for the code above would be: take the zeroth element until the element before the first element.
Now look at Ruby, which uses an inclusive upper bound.
x = [1,2,3,4]
puts x[0..1]
# Output is [1,2]
To be frank, I really think the Ruby way of slicing is easier on the brain.
Of course, when you are splitting a list into two parts at an index, the exclusive-upper-bound approach does result in better-looking code.
# Python
x = [1,2,3,4]
pivot = 2
print(x[:pivot]) # [1,2]
print(x[pivot:]) # [3,4]
Now let’s look at the inclusive upper bound approach.
# Ruby
x = [1,2,3,4]
pivot = 2
puts x[0..(pivot-1)] # [1,2]
puts x[pivot..-1] # [3,4]
Obviously, the code is less elegant, but there’s not much mental processing to be done here.
Conclusion
In the end, it’s really a matter of elegance vs. obviousness, and the designers of Python preferred elegance. Why? Because the Zen of Python states that Beautiful is better than ugly.
This upper-bound exclusion greatly improves code comprehension. I hope other languages adopt it.