
Dictionaries
Dictionaries are extremely versatile and extensively used in the Python language. Dictionaries are implemented as hash maps and are very good at element insertion, deletion, and access; all these operations have an average O(1) time complexity.
A hash map is a data structure that associates a set of key-value pairs. The principle behind hash maps is to assign a specific index to each key so that its associated value can be stored in an array. The index can be obtained through the use of a hash function; Python implements hash functions for several data types. As a demonstration, the generic function to obtain hash codes is hash. In the following example, we show you how to obtain the hash code given the "hello" string:
hash("hello")
# Result: -1182655621190490452
# To restrict the number to be a certain range you can use
# the modulo (%) operator
hash("hello") % 10
# Result: 8
Hash maps can be tricky to implement because they need to handle collisions that happen when two different objects have the same hash code. However, all the complexity is elegantly hidden behind the implementation and the default collision resolution works well in most real-world scenarios.
Access, insertion, and removal of an item in a dictionary scales as O(1) with the size of the dictionary. However, note that the computation of the hash function still needs to happen and, for strings, the computation scales with the length of the string. As string keys are usually relatively small, this doesn't constitute a problem in practice.
A dictionary can be used to efficiently count unique elements in a list. In this example, we define the counter_dict function that takes a list and returns a dictionary containing the number of occurrences of each value in the list:
def counter_dict(items):
counter = {}
for item in items:
if item not in counter:
counter[item] = 1
else:
counter[item] += 1
return counter
The code can be somewhat simplified using collections.defaultdict, which can be used to produce dictionaries where each new key is automatically assigned a default value. In the following code, the defaultdict(int) call produces a dictionary where every new element is automatically assigned a zero value, and can be used to streamline the counting:
from collections import defaultdict
def counter_defaultdict(items):
counter = defaultdict(int)
for item in items:
counter[item] += 1
return counter
The collections module also includes a Counter class that can be used for the same purpose with a single line of code:
from collections import Counter
counter = Counter(items)
Speed-wise, all these ways of counting have the same time complexity, but the Counter implementation is the most efficient, as shown in the following table:
