In Python, the operators in and not in test membership in lists, tuples, dictionaries, and so on.
This article covers the following topics.
- How to use the in operator
  - Basic usage
  - With if statement
  - in for the dictionary (dict)
  - in for the string (str)
- not in (negation of in)
- in for multiple elements
- Time complexity of in
  - Slow for the list: O(n)
  - Fast for the set: O(1)
  - For the dictionary
- in in for statements and list comprehensions
How to use the in operator in Python
Basic usage
x in y returns True if x is included in y, and False if it is not.
print(1 in [0, 1, 2])
# True

print(100 in [0, 1, 2])
# False
The right-hand side can be not only a list but also a tuple, set, range, or any other iterable object.
print(1 in (0, 1, 2))
# True

print(1 in {0, 1, 2})
# True

print(1 in range(3))
# True

The dictionary (dict) and the string (str) are described later.
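One caveat applies to iterables that are not containers: a generator or other one-shot iterator stores no elements, so in has to consume it while searching. A minimal sketch of that behavior (the name gen is used only for illustration):

```python
# A generator is iterable, but it can be traversed only once.
gen = (i for i in range(3))

first = 1 in gen   # consumes 0 and 1, then stops at the match
second = 1 in gen  # only 2 is left, so the search fails

print(first)
# True

print(second)
# False
```

If the same values need to be tested more than once, converting the iterator to a list or set first avoids this surprise.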
With if statement
in returns a bool value (True or False), so it can be used directly as the condition of an if statement.
l = [0, 1, 2]
i = 0

if i in l:
    print('{} is a member of {}.'.format(i, l))
else:
    print('{} is not a member of {}.'.format(i, l))
# 0 is a member of [0, 1, 2].

l = [0, 1, 2]
i = 100

if i in l:
    print('{} is a member of {}.'.format(i, l))
else:
    print('{} is not a member of {}.'.format(i, l))
# 100 is not a member of [0, 1, 2].

Note that lists, tuples, strings, etc. are evaluated as False if they are empty, and as True if they are not. To check whether an object is empty, you can use the object itself as the condition.
l = [0, 1, 2]

if l:
    print('not empty')
else:
    print('empty')
# not empty

l = []

if l:
    print('not empty')
else:
    print('empty')
# empty

"in" for the dictionary (dict)
For a dictionary (dict), the in operation tests against the keys.
d = {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}

print('key1' in d)
# True

print('value1' in d)
# False

Use values() or items() if you want to test against values or key-value pairs.
print('value1' in d.values())
# True

print(('key1', 'value1') in d.items())
# True

print(('key1', 'value2') in d.items())
# False

"in" for the string (str)
The in operation for the string (str) tests the existence of a substring.
print('a' in 'abc')
# True

print('x' in 'abc')
# False

print('ab' in 'abc')
# True

print('ac' in 'abc')
# False

not in (negation of "in")
x not in y returns the negation of x in y.
print(10 in [1, 2, 3])
# False

print(10 not in [1, 2, 3])
# True
The same result is returned by adding not to the entire in operation.
print(not 10 in [1, 2, 3])
# True

However, an expression with not placed in front of the in operation could be read in two ways, as shown below, so the more explicit not in is recommended.
print(not (10 in [1, 2, 3]))
# True

print((not 10) in [1, 2, 3])
# False
Since in has a higher precedence than not, it is treated as the former if there are no parentheses.
The latter case is evaluated as follows.
print(not 10)
# False

print(False in [1, 2, 3])
# False
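The difference between the two spellings can also be seen in how Python parses them. The following is a small sketch using the standard ast module; the helper name describe is hypothetical and exists only for this illustration.

```python
import ast


def describe(source):
    """Return (node type, operator type) of the top-level expression."""
    node = ast.parse(source, mode="eval").body
    if isinstance(node, ast.UnaryOp):
        return type(node).__name__, type(node.op).__name__
    if isinstance(node, ast.Compare):
        return type(node).__name__, type(node.ops[0]).__name__
    return type(node).__name__, None


# "not" wraps the whole comparison.
print(describe("not 10 in [1, 2, 3]"))
# ('UnaryOp', 'Not')

# "not in" is a single comparison operator.
print(describe("10 not in [1, 2, 3]"))
# ('Compare', 'NotIn')
```

Both expressions give the same result, but not in is parsed as one operator, which matches how it reads.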
"in" for multiple elements
If you want to check whether multiple elements are included, putting a list of those elements on the left-hand side does not work. It tests whether the list itself is an element of the right-hand side.
print([0, 1] in [0, 1, 2])
# False

print([0, 1] in [[0, 1], [1, 0]])
# True
Use and and or, or use sets.
Use and, or
Combine multiple in operations using and and or. This tests whether both elements (and) or at least one of them (or) are included.
l = [0, 1, 2]
v1 = 0
v2 = 100

print(v1 in l and v2 in l)
# False

print(v1 in l or v2 in l)
# True

print((v1 in l) or (v2 in l))
# True
Since in and not in have higher precedence than and and or, parentheses are not necessary. Of course, if it is difficult to read, you can enclose it in parentheses as in the last example.
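When the values to check are themselves stored in a list, the built-in functions all() and any() express the same and / or logic without writing each in by hand. A minimal sketch; the list name targets is an assumption made for this example.

```python
l = [0, 1, 2]
targets = [0, 1, 100]  # hypothetical values to look for

# Equivalent to: 0 in l and 1 in l and 100 in l
all_included = all(v in l for v in targets)
print(all_included)
# False

# Equivalent to: 0 in l or 1 in l or 100 in l
any_included = any(v in l for v in targets)
print(any_included)
# True
```

Like and and or, both functions short-circuit: all() stops at the first missing value and any() stops at the first match.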
Use sets
If you have many elements to check, it is easier to use sets than and and or.
For example, whether list A contains all the elements of list B is equivalent to whether list B is a subset of list A.
l1 = [0, 1, 2, 3, 4]
l2 = [0, 1, 2]
l3 = [0, 1, 5]
l4 = [5, 6, 7]

print(set(l2) <= set(l1))
# True

print(set(l3) <= set(l1))
# False
Whether list A contains none of the elements of list B is equivalent to whether list A and list B are disjoint (have no elements in common).
print(set(l1).isdisjoint(set(l4)))
# True

If list A and list B are not disjoint, it means that list A contains at least one element of list B.
print(not set(l1).isdisjoint(set(l3)))
# True
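The set methods issuperset(), isdisjoint(), and intersection() accept any iterable as their argument, so only one of the two lists has to be converted. The sketch below repeats the checks above in that form; the variable name s1 is introduced only for this example, and all elements are assumed to be hashable.

```python
l1 = [0, 1, 2, 3, 4]
l2 = [0, 1, 2]
l3 = [0, 1, 5]
l4 = [5, 6, 7]

s1 = set(l1)  # convert only the list that is searched

print(s1.issuperset(l2))  # l1 contains all elements of l2
# True

print(s1.issuperset(l3))  # 5 is missing from l1
# False

print(s1.isdisjoint(l4))  # no common elements
# True

print(s1.intersection(l3))  # the elements of l3 that l1 contains
# {0, 1}
```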
Time complexity of "in"
The execution speed of the in operator depends on the type of the target object.
The measurement results of the execution time of in for lists, sets, and dictionaries are shown below.
Note that the code below uses the Jupyter Notebook magic command %%timeit and does not work when run as a Python script.
Take lists of 10 elements and 10000 elements as examples.
n_small = 10
n_large = 10000

l_small = list(range(n_small))
l_large = list(range(n_large))

The sample code below was executed in CPython 3.7.4. The results may vary depending on the environment.
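Outside Jupyter Notebook, similar measurements can be taken with the standard timeit module. The following is a rough sketch, not a reproduction of the measurements in this article: the repetition count number is an arbitrary choice, and calling through a lambda adds a small fixed overhead to every measurement.

```python
import timeit

n_large = 10000
l_large = list(range(n_large))
s_large = set(l_large)

number = 1000  # repetitions per measurement (arbitrary choice)

# timeit.timeit() returns the total time in seconds for all repetitions.
t_list = timeit.timeit(lambda: -1 in l_large, number=number)
t_set = timeit.timeit(lambda: -1 in s_large, number=number)

print('list: {:.1f} ns per loop'.format(t_list / number * 1e9))
print('set:  {:.1f} ns per loop'.format(t_set / number * 1e9))
```

The absolute numbers depend on the machine and the Python version, so only the relative difference between the two lines is meaningful.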
Slow for the list: O(n)
The average time complexity of the in operator for lists is O(n). It becomes slower when there are many elements.
%%timeit
-1 in l_small
# 178 ns ± 4.78 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%%timeit
-1 in l_large
# 128 µs ± 11.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
The execution time varies greatly depending on the position of the value to look for. It takes the longest time when its value is at the end or when it does not exist.
%%timeit
0 in l_large
# 33.4 ns ± 0.397 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%%timeit
5000 in l_large
# 66.1 µs ± 4.38 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%%timeit
9999 in l_large
# 127 µs ± 2.17 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
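The position dependence can be made visible without timing anything by counting how many equality comparisons a list search performs. This is an illustrative sketch only: CountingProbe and count_comparisons are hypothetical helpers written for this example, not part of any library.

```python
class CountingProbe:
    """Hypothetical helper: compares equal to `target` and counts == calls."""

    def __init__(self, target):
        self.target = target
        self.comparisons = 0

    def __eq__(self, other):
        self.comparisons += 1
        return self.target == other


def count_comparisons(target, seq):
    """Return (found, number of == comparisons) for `target in seq`."""
    probe = CountingProbe(target)
    found = probe in seq
    return found, probe.comparisons


l_large = list(range(10000))

print(count_comparisons(0, l_large))
# (True, 1)

print(count_comparisons(5000, l_large))
# (True, 5001)

print(count_comparisons(9999, l_large))
# (True, 10000)

print(count_comparisons(-1, l_large))
# (False, 10000)
```

The list is scanned from the front and the search stops at the first match, so a value at the end costs as many comparisons as a value that is not present at all.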
Fast for the set: O(1)
The average time complexity of the in operator for sets is O(1). It does not depend on the number of elements.
s_small = set(l_small)
s_large = set(l_large)

%%timeit
-1 in s_small
# 40.4 ns ± 0.572 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%%timeit
-1 in s_large
# 39.4 ns ± 1.1 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
The execution time does not change depending on the value to look for.
%%timeit
0 in s_large
# 39.7 ns ± 1.27 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%%timeit
5000 in s_large
# 53.1 ns ± 0.974 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%%timeit
9999 in s_large
# 52.4 ns ± 0.403 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
If you want to repeat the in operation many times on a list with many elements, it is faster to convert the list to a set in advance.
%%timeit
for i in range(n_large):
    i in l_large
# 643 ms ± 29.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
s_large_ = set(l_large)
for i in range(n_large):
    i in s_large_
# 746 µs ± 6.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Note that it takes time to convert a list to a set, so it may be faster to keep it as a list if the number of in operations is small.
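A typical case where the conversion pays off is filtering many values against one large list. A minimal sketch, in which the names allowed, queries, and hits are assumptions made for this example:

```python
allowed = list(range(10000))        # hypothetical reference list
queries = [5, 10000, 42, -1, 9999]  # hypothetical values to check

# Build the set once (a single O(n) pass), then reuse it for every lookup.
allowed_set = set(allowed)

hits = [q for q in queries if q in allowed_set]
print(hits)
# [5, 42, 9999]
```

Building the set inside the loop or comprehension would repeat the conversion for every lookup and lose the benefit.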
For the dictionary
Take the following dictionary as an example.
d = dict(zip(l_large, l_large))
print(len(d))
# 10000

print(d[0])
# 0

print(d[9999])
# 9999
As mentioned above, the in operation for the dictionary tests on keys.
Dictionary keys are unique and are looked up by hash, just like the elements of a set, so the execution time is about the same as for sets.
%%timeit
for i in range(n_large):
    i in d
# 756 µs ± 24.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
On the other hand, dictionary values are allowed to be duplicated like a list. The execution time of in for values() is about the same as for lists.
dv = d.values()

%%timeit
for i in range(n_large):
    i in dv
# 990 ms ± 28.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Key-value pairs are unique. The execution time of in for items() is about the same as for sets, plus a small overhead.
di = d.items()

%%timeit
for i in range(n_large):
    (i, i) in di
# 1.18 ms ± 26.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
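If many lookups against the values are needed, one option is to copy the values into a set first. This sketch assumes that all values are hashable and that the dictionary does not change afterward, because the set is a snapshot rather than a live view; the names d and value_set are used only for this example.

```python
d = {'key1': 'value1', 'key2': 'value2', 'key3': 'value1'}

# Duplicated values collapse into one element.
value_set = set(d.values())

print(sorted(value_set))
# ['value1', 'value2']

print('value1' in value_set)
# True

print('key1' in value_set)
# False
```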
"in" in for statements and list comprehensions
The word in is also used in for statements and list comprehensions.
l = [0, 1, 2]

for i in l:
    print(i)
# 0
# 1
# 2

print([i * 10 for i in l])
# [0, 10, 20]
Note that the in operator may also be used as a conditional expression in list comprehensions, which can be confusing.
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

l_in = [s for s in l if 'XXX' in s]
print(l_in)
# ['oneXXXaaa', 'twoXXXbbb']
The first in is part of the list comprehension syntax, and the second in is the in operator.