There are many options available when implementing basic data structures, knowing at least some of them allows us to write better and more efficient code. In this article, we will cover some potentially unknown aspects of tuples, dictionaries, and lists.
Tuples are immutable. This means that once the contents of a tuple are declared, you can’t modify the contents of that tuple. Also tuples are the recommended data structure when the position of the element is information by itself, i.e coordinates (x, y).
Empty tuples act as singletons, so there is only one empty tuple. This means that if you create an empty tuple Python will return an already allocated one. This is possible because of the immutability of the tuples.
Here we can see that all the empty tuples have the same memory address.
Reusing old tuples
This is another optimization to prevent memory fragmentation, it also helps to make allocation faster.
When a tuple is no longer needed:
1- Check if it has less than 20 items
>>> a=(1, 2, 3) >>> len(a) 3
2- Developer assigns items tuple to a “free” list using the command list:
>>> a_list = list(a)
3- If you need to create a new tuple, first search the “free list”
This “free” list is divided into 20 groups, each group represents a list of tuples of length between 0 and 20, with a max of 2000 each. The first group has an empty tuple.
Here we create a tuple with a length of three and then remove it, but Python does not really remove it but move it to the “free list”, so when we create another tuple of that same length we get the same memory address of the one we just deleted.
List in Python is a contiguous array of references to other objects. The pointer to the list and its length is stored in a “list head” structure. As the length is stored in this structure, everytime a item is added or removed the list needs to be resized (reallocated).
Reusing empty deleted lists
Python also uses a free list but it’s used only for empty objects because lists can be modified.
As this is an expensive process, Python uses a method called exponential over-allocation while creating lists, so not every single operation will require a reallocation.
Every single list has a number of hidden and empty slots which are used for new items. Python will only reallocate when there is a need for more space. The new size is chosen according to the size of the list. Python uses a growth pattern like: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, …
So if you have to append an item to a list of length 16 Python will resize it to 25 slots before adding the new item, the rest of the slots are hidden.
Dictionaries are implemented with hash tables using pseudo-random probing (https://research.cs.vt.edu/AVresearch/hashing/pseudo.php). For this reason the keys must be hashables (and thus immutable) and you can’t use mutable structures as keys.
Hash tables in dictionaries
A hash table is a sparse array (i.e., an array that always has empty cells). These cells are usually called “buckets.” In Python we have one bucket per item, which contains two fields: a reference to the key and a reference to the value of the item. All buckets have the same size, the access to buckets is done by offset.
Python always tries to keep at least 1/3 of the buckets empty; otherwise, the hash table is copied to a new location with room for more buckets.
To put an item in a hash table, the first step is to calculate the hash value of the item key, which is done with the hash() built-in function
Hash tables use a lot of space, so they should be sparse to work well. This is the main reason why it is not recommended to use a dictionary to store a large number of records.
If you need to store a large number of items, it is recommended it is recommended to use a database.
The hash table algorithm
Let’s say we want to access to my_dict[search_key], this is what Python does:
When a collision happens then Python does the following:
We use these data structures on a daily basis, but the inner workings of their implementation are not known by most Python developers. These data structures can help you optimize and reduce memory usage in order to improve code execution time.
This article was created with help from Andrés Escobar