In the indexed files seen in previous posts, the table of indexes is accessed sequentially with the drawbacks that I already mentioned this type of access has. However, there is a much more efficient indexing technique than sequential indexing and that is tree indexing, which significantly reduces search time. In this post, I will explain the aforementioned technique.
Using sequential access, when looking for data, it has to go through the index file sequentially until we find the searched one, obtaining the memory address of the beginning of the record in the data file. As index tables are much less heavy than files with all the data, we have a considerable improvement using indexes, even if the search in the table for these is sequential.
By changing the index table of sequential access to an organization of the indexes in a data tree, it will be able to access the indexes much more quickly. These types of structures are specially indicated to speed up searches, but let’s see how it does.
The tree structure
Actually, using indexing techniques, we are creating data structures that are going to help us working with data. Sequential indexing has key-direction pairs with a linear structure, one after the other, ordered according to some criteria. For example, in the case of the supplier file of our friend Antonio, I use the name of the provider as a key, and the structure is ordered alphabetically.
Then, the main characteristic of tree structures is that they are not linear. That is, after one element does not come another and then another and so on, but after an element can go several ones.
In the following figure you can see the difference between both data structures:
Some interesting terms
To explain how the tree structure works, it is convenient to introduce a series of commonly used terms. In addition, almost certainly all the documentation you can find on this topic, will use this terminology.
- Node: Each element of the data structure is called a node.
- Parent Node: As I explained before, and as can be seen in the figure above, one, several or no nodes can come out of a node. Then, a parent node is the one from which at least one node comes out, it would be the parent for that node.
- Root node: It is the first node in the structure, the only one that does not have a Parent node.
- Leaf node: That node that has no children
After this brief explanation, I think it is easy to understand why these data structures are called trees. Each node is like a knot of the tree, from which branches are born, until it reaches another knot, from which more branches are born, and so on.
Thus, for a given node, it would have a parent node that is the one that points to our node, and it could have several child nodes that would be those to which it points.
Mind you!!, a node can only have one parent node (except the root node that has no parent node), otherwise, we would enter into a loop structure that is no longer a tree structure.
Now, that we already know what a tree structure is, I will take the opportunity to introduce certain additional concepts that we will need to know, and that can be seen in the following figure:
- Level: The level of a node will be the number of parents it has above until it reaches the root node, including it. That is the jumps back that you would have to take until you reach the beginning of the tree.
- Tree depth: The maximum level of the tree (keep in mind the first level is 0)
- Order: The maximum number of children a node can have.
Indexing by tress
Now that we know the structure of data trees, we can understand how they help speed up searches. Every time we reach a node, we will make a decision about which child node to move to. In this way, we do not have to go through all the nodes of the tree until we find the one we are looking for.
The order of a tree is a critical factor. Normally, we want it to be a small number, to keep the structure simple, and grow the tree deep rather than wide. Low-order trees show better behaviour in searches, which is the main application of these data structures. Binary trees (order 2) are widely used in databases obtaining notable improvements in search processes.
This post is part of the collection “Data Access and Storage Systems”. You can see the index of this collection here