Why is True returned when checking if an empty string is in another?

Question:

My limited brain cannot understand why this happens:

>>> print '' in 'lolsome'
True

In PHP, an equivalent comparison returns false (and a warning):

var_dump(strpos('lolsome', ''));
Asked By: Marcelo Assis

||

Answers:

From the documentation:

For the Unicode and string types, x in y is true if and only if x is a substring of y. An equivalent test is y.find(x) != -1. Note, x and y need not be the same type; consequently, u'ab' in 'abc' will return True. Empty strings are always considered to be a substring of any other string, so "" in "abc" will return True.
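The equivalence quoted above is easy to check in the interpreter. A minimal sketch (the variable names are only illustrative, and print() is called with a single argument so it behaves the same on 2.x and 3.x):

```python
haystack = "lolsome"
needle = ""

# For strings, `x in y` is documented as equivalent to y.find(x) != -1
print(needle in haystack)           # True
print(haystack.find(needle))        # 0: the empty string matches at the very start
print(haystack.find(needle) != -1)  # True
```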

From looking at your print call, you’re using 2.x.

To go deeper, look at the bytecode:

>>> import dis
>>> def answer():
...   '' in 'lolsome'

>>> dis.dis(answer)
  2           0 LOAD_CONST               1 ('')
              3 LOAD_CONST               2 ('lolsome')
              6 COMPARE_OP               6 (in)
              9 POP_TOP
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE
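The listing above comes from a 2.x interpreter. As a hedged aside, on Python 3.4+ the same inspection can be scripted with dis.get_instructions; from 3.9 onwards the in test is compiled to a dedicated CONTAINS_OP instead of COMPARE_OP. The helper name contains is made up for this sketch:

```python
import dis

def contains(needle, haystack):
    # Parameters (not literals) so the compiler cannot pre-compute the result
    return needle in haystack

# Collect the opcode names the compiler emitted for the function body
opnames = [ins.opname for ins in dis.get_instructions(contains)]

# Keep only the opcode that performs the membership test
membership_ops = [name for name in opnames
                  if name in ("COMPARE_OP", "CONTAINS_OP")]

print(membership_ops)
print(contains("", "lolsome"))
```

Whichever opcode name shows up, the runtime ends up in the same containment machinery that the rest of this answer walks through.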

COMPARE_OP is where the boolean operation happens. Looking at the interpreter source for this opcode (ceval.c) reveals where the comparison is dispatched:

    TARGET(COMPARE_OP)
    {
        w = POP();
        v = TOP();
        if (PyInt_CheckExact(w) && PyInt_CheckExact(v)) {
            /* INLINE: cmp(int, int) */
            register long a, b;
            register int res;
            a = PyInt_AS_LONG(v);
            b = PyInt_AS_LONG(w);
            switch (oparg) {
            case PyCmp_LT: res = a <  b; break;
            case PyCmp_LE: res = a <= b; break;
            case PyCmp_EQ: res = a == b; break;
            case PyCmp_NE: res = a != b; break;
            case PyCmp_GT: res = a >  b; break;
            case PyCmp_GE: res = a >= b; break;
            case PyCmp_IS: res = v == w; break;
            case PyCmp_IS_NOT: res = v != w; break;
            default: goto slow_compare;
            }
            x = res ? Py_True : Py_False;
            Py_INCREF(x);
        }
        else {
          slow_compare:
            x = cmp_outcome(oparg, v, w);
        }
        Py_DECREF(v);
        Py_DECREF(w);
        SET_TOP(x);
        if (x == NULL) break;
        PREDICT(POP_JUMP_IF_FALSE);
        PREDICT(POP_JUMP_IF_TRUE);
        DISPATCH();
    }

cmp_outcome is defined in the same file, and there it’s easy to find our next clue:

res = PySequence_Contains(w, v);

which is in abstract.c:

int
PySequence_Contains(PyObject *seq, PyObject *ob)
{
    Py_ssize_t result;
    if (PyType_HasFeature(seq->ob_type, Py_TPFLAGS_HAVE_SEQUENCE_IN)) {
        PySequenceMethods *sqm = seq->ob_type->tp_as_sequence;
        if (sqm != NULL && sqm->sq_contains != NULL)
            return (*sqm->sq_contains)(seq, ob);
    }
    result = _PySequence_IterSearch(seq, ob, PY_ITERSEARCH_CONTAINS);
    return Py_SAFE_DOWNCAST(result, Py_ssize_t, int);
}

and to come up for air from the source, we find this next function in the documentation:

objobjproc PySequenceMethods.sq_contains

This function may be used by PySequence_Contains() and has the same signature. This slot may be left to NULL, in this case PySequence_Contains() simply traverses the sequence until it finds a match.

and further down in the same documentation:

int PySequence_Contains(PyObject *o, PyObject *value)

Determine if o contains value. If an item in o is equal to value, return 1, otherwise return 0. On error, return -1. This is equivalent to the Python expression value in o.

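Because '' is a real string object rather than NULL, the check goes through the string type’s sq_contains, which performs a substring search, so the sequence 'lolsome' is deemed to contain it.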
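To make the chain above concrete, here is a rough Python model of what PySequence_Contains does. It is only a sketch: sequence_contains and Letters are hypothetical names, and the real logic lives in C. It suggests that the True result comes from str's own containment slot (a substring search), not from comparing items one by one:

```python
def sequence_contains(seq, value):
    """Rough Python model of PySequence_Contains (hypothetical helper)."""
    contains = getattr(type(seq), "__contains__", None)
    if contains is not None:
        # The type fills the sq_contains slot: delegate to it
        return bool(contains(seq, value))
    # Otherwise traverse the sequence until an equal item is found
    for item in seq:
        if item is value or item == value:
            return True
    return False


class Letters(object):
    """Iterable without __contains__, so the fallback loop is used."""

    def __init__(self, text):
        self.text = text

    def __iter__(self):
        return iter(self.text)


# str defines __contains__, which does a substring search
print(sequence_contains("lolsome", ""))
# Letters only yields single characters, none of which equals ''
print(sequence_contains(Letters("lolsome"), ""))
```

The first call prints True and the second prints False, which is the difference between substring search and element-by-element membership.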

Answered By: user559633

I dug deeper and found the source code of PHP’s strpos function:

    if (!Z_STRLEN_P(needle)) {
        php_error_docref(NULL, E_WARNING, "Empty needle");
        RETURN_FALSE;
    }

They treat searching for an empty string as a problematic case, so they issue a warning and return false. Apart from this, I couldn’t find any documentation discussing why it is considered a problem.

As far as Python is concerned, this behaviour is well defined in the Comparisons section,

Empty strings are always considered to be a substring of any other string, so "" in "abc" will return True.

Answered By: thefourtheye

Suppose you have two piles of similar objects, say the best stanzas of your favorite poet, with 5 and 2 stanzas respectively. Does the bigger set contain the smaller set? How to check:
1) every stanza in the smaller pile can be found in the bigger one.
2) the smaller pile doesn’t contain anything that is absent from the bigger one.

So we may use this code to check:

def is_subset(smaller, bigger):
    for obj in smaller:
        if obj not in bigger:
            # we found an object from smaller that is absent in bigger
            return False
        # otherwise go on to the next object
    # all is ok: all objects from smaller are in bigger
    return True

If you never find such an object, you reach the end of the algorithm and conclude that smaller is a subset of bigger.

Now imagine the smaller pile has 0 stanzas.
Applying the same rules, we perform 0 checks and therefore never find an object from smaller that is absent in bigger.

So it’s correct and handy to deem the empty string a subset of any other string, even of itself. And this is what Python implements.

    >>> '' in 'adfsf'
    True
    >>> '' in ''
    True
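The "zero checks" argument is the same rule Python's built-in all() follows for an empty iterable. A small sketch, where all_found and the stanza labels are made up for illustration:

```python
def all_found(smaller, bigger):
    # True when no item of `smaller` is missing from `bigger`
    return all(item in bigger for item in smaller)


bigger = ["stanza 1", "stanza 2", "stanza 3", "stanza 4", "stanza 5"]

print(all_found(["stanza 2", "stanza 5"], bigger))  # True
print(all_found(["stanza 9"], bigger))              # False
print(all_found([], bigger))                        # True: zero checks, nothing missing
print(all_found("", "adfsf"))                       # True for the same reason
```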
Answered By: Eugene Lisitsky

Basically, from math:

The empty set is a subset of every set

The same logic works here. You can consider '' an empty set of characters. It is therefore a subset of the character set of every string, since both are of the same type.

>>> a = ""
>>> b = "Python"
>>> a in b
True
>>> set(a).issubset(b)
True
>>> a = set() #empty set
>>> b = set([1,2,3])
>>> a.issubset(b)
True

But be careful! Being a subset and being a member are different things.
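A short sketch of that difference, using arbitrary example values. The subset test ignores order and adjacency, while in on a string looks for a contiguous substring and in on a set looks for an element:

```python
word = "lolsome"

# Subset: every character of "em" occurs somewhere in the word
print(set("em").issubset(word))        # True
# Substring test: "em" never appears as a contiguous run
print("em" in word)                    # False

numbers = {1, 2, 3}
# The empty set is a subset of every set...
print(frozenset().issubset(numbers))   # True
# ...but it is not one of the set's elements
print(frozenset() in numbers)          # False
```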


Answered By: GLHF

The empty string is the unique string of length zero.
The empty string is the identity element of the concatenation operation.
The empty string precedes any other string under lexicographical order, because it is the shortest of all strings.
The empty string is a legitimate string, upon which most string operations should work.
Wikipedia

 > strlen("");
=> 0
 > "a" . "" == "a";
=> true
 > "" . "a" == "a";
=> true
 > "" < "a";
=> true

From above, it seems PHP treats the empty string as a valid string.

> strstr("lolsome", "");
strstr(): Empty needle :1

But it doesn’t seem to consider the empty string a fully legitimate one. Most probably PHP is the only language that doesn’t allow the substring being searched for to be an empty string.

Is it a defensive mechanism? Obviously, programmers then don’t have to guard the needle with an if. If so, why do other languages allow this test to pass? Language designers have to answer that.

What’s a Python string made up of?

>>> ''.count('')
1

Obviously, the empty string contains one empty string.

>>> 'a'.count('')
2

A one-element string contains two empty strings.

>>> 'ab'.count('')
3

So it seems a Python string is a concatenation of one-element strings, and each element in a string is sandwiched between two empty strings.
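The counts above follow a simple pattern: the empty string matches once at every index from 0 to len(s), which gives len(s) + 1 matches. A sketch that lists those positions (empty_match_positions is a made-up helper name):

```python
def empty_match_positions(s):
    # Every index i at which the empty string is found exactly at i
    return [i for i in range(len(s) + 1) if s.find("", i) == i]


for s in ["", "a", "ab"]:
    # Show the string, its count of empty strings, and where they sit
    print("%r %d %r" % (s, s.count(""), empty_match_positions(s)))
```

For 'ab' this reports positions 0, 1 and 2: before 'a', between 'a' and 'b', and after 'b'.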

>>> "lolsome".split('')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: empty separator

But here Python contradicts the validity of the empty string. Is it a bug?
Ruby and JavaScript pass the test here.

 > "lolsome".split("")
=> ["l", "o", "l", "s", "o", "m", "e"]
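For comparison, the usual way to get that list in Python is to iterate over the string rather than split it on an empty separator. A minimal sketch:

```python
word = "lolsome"

# split() rejects an empty separator instead of guessing what it means
try:
    word.split("")
except ValueError as exc:
    print("split refused: %s" % exc)

# A string is already a sequence of one-character strings
print(list(word))   # ['l', 'o', 'l', 's', 'o', 'm', 'e']
```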

I’ve compiled several language examples from Rosetta Code. It’s interesting to note that they all allow the empty string in a substring search and return true.

AWK

awk 'BEGIN { print index("lolsome", "") != 0 }'

C

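#include <stdio.h>
#include <string.h>
int main(void) {
    /* strstr returns the haystack itself for an empty needle */
    printf("%d\n", strstr("lolsome", "") != NULL);
    return 0;
}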

C++

#include <iostream>
#include <string>

int main() {
    std::string s = "lolsome";
    std::cout << (s.find("") != std::string::npos) << "\n";
    return 0;
}

C#

using System;
class MainClass {
  public static void Main (string[] args) {
    string s = "lolsome";
    Console.WriteLine(s.IndexOf("", 0, s.Length) != -1);
  }
}

Clojure

(println (.indexOf "lolsome" ""))

Go

package main

import (
    "fmt"
    "strings"
)
func main() {
    fmt.Println(strings.Index("lolsome", "") != -1)
}

Groovy

println 'lolsome'.indexOf('')

returns 0; returns -1 when the substring is not found

Java

class Main {
  public static void main(String[] args) {
    System.out.println("lolsome".indexOf("") != -1);
  }
}

JavaScript

"lolsome".indexOf("") != -1

Lua

s = "lolsome"
print(s:find "" ~= nil)

Perl

print index("lolsome", "") != -1;

Python

"lolsome".find("") != -1

Ruby

"lolsome".index("") != nil
Answered By: Nizam Mohamed