Why are slice and range upper-bound exclusive?

Question:

I know that when I use range([start], stop[, step]) or slice([start], stop[, step]), the stop value is not included in the range or slice.

But why does it work this way?

Is it so that e.g. a range(0, x) or range(x) will contain exactly x elements?

Is it for parallelism with the C for loop idiom, i.e. so that for i in range(start, stop): superficially resembles for (i = start ; i < stop; i++) {?


See also Loop backwards using indices for a case study: setting the stop and step values properly can be a bit tricky when trying to get values in descending order.

Asked By: wap26


Answers:

The documentation implies this has a few useful properties:

word[:2]    # The first two characters
word[2:]    # Everything except the first two characters

Here’s a useful invariant of slice operations: s[:i] + s[i:] equals s.

For non-negative indices, the length of a slice is the difference of the indices, if both are within bounds. For example, the length of word[1:3] is 2.

I think we can assume that the range functions act the same for consistency.
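
These properties are easy to check interactively; the word below is just an arbitrary example string:

word = 'Python'

assert word[:2] + word[2:] == word     # s[:i] + s[i:] equals s
assert len(word[1:3]) == 3 - 1         # slice length is the difference of the indices

# range is consistent with slicing: the stop value is excluded, so the length
# is also the difference of the endpoints, and range(x) contains exactly x elements.
assert len(range(1, 3)) == 3 - 1
assert list(range(4)) == [0, 1, 2, 3]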

Answered By: Toomai

A bit late to this question; nonetheless, this attempts to answer the why part of your question:

Part of the reason is because we use zero-based indexing/offsets when addressing memory.

The easiest example is an array. Think of an “array of 6 items” as a location to store 6 data items. If this array’s start location is at memory address 100, then the data, let’s say the characters of ‘apple’ plus a terminating '\0' (6 bytes in total), is stored like this:

memory/
array      contains
location   data
 100   ->   'a'
 101   ->   'p'
 102   ->   'p'
 103   ->   'l'
 104   ->   'e'
 105   ->   '\0'

So for 6 items, our index goes from 100 to 105. Addresses are
generated using base + offset, so the first item is at base memory location 100 + offset 0
(i.e., 100 + 0), the second at 100 + 1, third at 100 + 2, …, until 100
+ 5 is the last location.

This is the primary reason we use zero based indexing and leads to
language constructs such as for loops in C:

for (int i = 0; i < LIMIT; i++)

or in Python:

for i in range(LIMIT):

When you program in a language like C where you deal with pointers
more directly, or assembly even more so, this base+offset scheme
becomes much more obvious.

Because of the above, many language constructs automatically use this range from start to length-1.
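
The same base+offset picture can be sketched directly in Python; the value 100 below is just the made-up base address from the table above:

base = 100                       # pretend start address, as in the table above
word = 'apple'                   # the stored characters
for offset in range(len(word)):  # offsets 0 .. len(word)-1, i.e. 0 .. 4
    print(base + offset, '->', word[offset])
# prints: 100 -> a, 101 -> p, 102 -> p, 103 -> l, 104 -> e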

You might find this article on Zero-based numbering on Wikipedia interesting, and also this question from Software Engineering SE.

Example:

In C, for instance, if you have an array ar and you subscript it as ar[3], that is really equivalent to taking the (base) address of the array ar and adding 3 to it, i.e. *(ar+3). This can lead to code like the following for printing the contents of an array, showing the simple base+offset approach:

for(i = 0; i < 5; i++)
   printf("%c\n", *(ar + i));

really equivalent to

for(i = 0; i < 5; i++)
   printf("%c\n", ar[i]);

Answered By: Levon

Here’s the opinion of some Google+ user:

[…] I was swayed by the elegance of half-open intervals. Especially the
invariant that when two slices are adjacent, the first slice’s end
index is the second slice’s start index is just too beautiful to
ignore. For example, suppose you split a string into three parts at
indices i and j — the parts would be a[:i], a[i:j], and a[j:].

Google+ is closed, so the link doesn’t work anymore. Spoiler alert: that was Guido van Rossum.
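
A quick way to see that adjacency invariant in action (the string and the split points below are arbitrary):

a = 'hello world'
i, j = 3, 7

parts = [a[:i], a[i:j], a[j:]]   # adjacent slices share their boundary indices
print(parts)                     # ['hel', 'lo w', 'orld']
assert ''.join(parts) == a       # the three parts reassemble the original exactly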

Answered By: Nigel Tufnel

Here is another reason why an exclusive upper bound is a saner approach:

Suppose you wished to write a function that applies some transform to a subsequence of items in a list. If intervals were to use an inclusive upper bound as you suggest, you might naively try writing it as:

def apply_range_bad(lst, transform, start, end):
    """Applies a transform on the elements of a list in the range [start, end]"""
    left = lst[0 : start-1]
    middle = lst[start : end]
    right = lst[end+1 :]
    return left + [transform(i) for i in middle] + right

At first glance, this seems straightforward and correct, but unfortunately it is subtly wrong.

What would happen if:

  • start == 0
  • end == 0
  • end < 0

? In general, there might be even more boundary cases that you should consider. Who wants to waste time thinking about all of that? (These problems arise because, with inclusive lower and upper bounds, there is no inherent way to express an empty interval.)
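
To make that empty-interval point concrete, here is a small Python sketch (the list is just a placeholder): with exclusive upper bounds an empty slice at position i is simply lst[i:i], while with inclusive bounds it would have to be written lst[i:i-1], which for i == 0 collides with the meaning of negative indices:

lst = [10, 20, 30, 40]

# Exclusive upper bound: an empty interval at any position i is just lst[i:i].
assert lst[0:0] == []
assert lst[2:2] == []

# Inclusive upper bound: an empty interval starting at i would need to be
# lst[i:i-1]; for i == 0 that is lst[0:-1], which Python (like many languages)
# instead reads as "everything except the last element".
assert lst[0:-1] == [10, 20, 30]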

Instead, by using a model where upper bounds are exclusive, dividing a list into separate slices is simpler, more elegant, and thus less error-prone:

def apply_range_good(lst, transform, start, end):
    """Applies a transform on the elements of a list in the range [start, end)"""
    left = lst[0:start]
    middle = lst[start:end]
    right = lst[end:]
    return left + [transform(i) for i in middle] + right
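
For instance, using apply_range_good as defined above with a toy list and transform (both just placeholders), the boundary cases fall out naturally:

lst = [1, 2, 3, 4]
double = lambda x: x * 2

print(apply_range_good(lst, double, 0, 0))   # [1, 2, 3, 4] -- empty range, nothing transformed
print(apply_range_good(lst, double, 0, 2))   # [2, 4, 3, 4] -- transforms lst[0] and lst[1]
print(apply_range_good(lst, double, 2, 4))   # [1, 2, 6, 8] -- transforms the tail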

(Note that apply_range_good does not transform lst[end]; it too treats end as an exclusive upper-bound. Trying to make it use an inclusive upper-bound would still have some of the problems I mentioned earlier. The moral is that inclusive upper-bounds are usually troublesome.)

(Mostly adapted from an old post of mine about inclusive upper-bounds in another scripting language.)

Answered By: jamesdlin

Elegant-ness VS Obvious-ness

To be honest, I think the way slicing works in Python is quite counter-intuitive; it actually trades the so-called elegant-ness for more brain-processing. That is why you can see that this Stack Overflow question has more than 2,000 upvotes: I think it’s because a lot of people don’t understand it initially.

Just as an example, the following code has already caused headaches for a lot of Python newbies.

x = [1,2,3,4]
print(x[0:1])
# Output is [1]

Not only is it hard to process, it is also hard to explain properly. For example, the explanation for the code above would be “take the zeroth element until the element before the first element”.

Now look at Ruby, which uses inclusive upper bounds.

x = [1,2,3,4]
p x[0..1]
# Output is [1,2]

To be frank, I really thought the Ruby way of slicing is better for the brain.

Of course, when you are splitting a list into 2 parts based on an index, the exclusive upper bound approach would result in better-looking code.

# Python
x = [1,2,3,4]
pivot = 2
print(x[:pivot]) # [1,2]
print(x[pivot:]) # [3,4]

Now let’s look at the inclusive upper bound approach.

# Ruby
x = [1,2,3,4]
pivot = 2
p x[0..(pivot-1)] # [1,2]
p x[pivot..-1] # [3,4]

Obviously, the code is less elegant, but there’s not much brain-processing to be done here.

Conclusion

In the end, it’s really a matter of elegant-ness VS obvious-ness, and the designers of Python preferred elegant-ness over obvious-ness. Why? Because the Zen of Python states that “Beautiful is better than ugly.”

Answered By: Wong Jia Hau

This upper-bound exclusion greatly improves code comprehension. I hope it comes to other languages.

Answered By: telepinu