Advanced Python Programming
上QQ阅读APP看书,第一时间看更新

Dictionaries

Dictionaries are extremely versatile and extensively used in the Python language. Dictionaries are implemented as hash maps and are very good at element insertion, deletion, and access; all these operations have an average O(1) time complexity. 

In Python versions up to 3.5, dictionaries are unordered collections. Since Python 3.6, dictionaries are capable of maintaining their elements by order of insertion.

A hash map is a data structure that associates a set of key-value pairs. The principle behind hash maps is to assign a specific index to each key so that its associated value can be stored in an array. The index can be obtained through the use of a hash function; Python implements hash functions for several data types. As a demonstration, the generic function to obtain hash codes is hash. In the following example, we show you how to obtain the hash code given the "hello" string:

    hash("hello")
# Result: -1182655621190490452

# To restrict the number to be a certain range you can use
# the modulo (%) operator
hash("hello") % 10
# Result: 8

Hash maps can be tricky to implement because they need to handle collisions that happen when two different objects have the same hash code. However, all the complexity is elegantly hidden behind the implementation and the default collision resolution works well in most real-world scenarios.

Access, insertion, and removal of an item in a dictionary scales as O(1) with the size of the dictionary. However, note that the computation of the hash function still needs to happen and, for strings, the computation scales with the length of the string. As string keys are usually relatively small, this doesn't constitute a problem in practice.

A dictionary can be used to efficiently count unique elements in a list. In this example, we define the counter_dict function that takes a list and returns a dictionary containing the number of occurrences of each value in the list:

    def counter_dict(items): 
counter = {}
for item in items:
if item not in counter:
counter[item] = 1
else:
counter[item] += 1
return counter

The code can be somewhat simplified using collections.defaultdict, which can be used to produce dictionaries where each new key is automatically assigned a default value. In the following code, the defaultdict(int) call produces a dictionary where every new element is automatically assigned a zero value, and can be used to streamline the counting:

    from collections import defaultdict
def counter_defaultdict(items):
counter = defaultdict(int)
for item in items:
counter[item] += 1
return counter

The collections module also includes a Counter class that can be used for the same purpose with a single line of code:

    from collections import Counter
counter = Counter(items)

Speed-wise, all these ways of counting have the same time complexity, but the Counter implementation is the most efficient, as shown in the following table: