Binary search trees

A binary search tree is a data sorted structure where each record can be linked with a máximum of two other records. We are going to pay special attention to binary search trees, as they are very popular and widely used in databases. As I anticipated in the previous post, binary trees are of order 2, that is, each node can have a maximum of two children. Additionally, for being a search tree, all nodes have to meet the following conditions:

  • if the node has a left child, the value of it has to be smaller than the value of the father.
  • if the node has a right child, the value of it has to be higher than the value of the father.

This forces that the data we are saving are sortable, so we can determine if one is bigger or smaller than another. Given the conditions above that a binary search tree has to meet, duplicates are not allowed, they have to be unique data.

Let’s look at it with a simple example where data are integers. The binary search tree for our example is:

Binary search tree
Binary search tree

In the figure above, the root node records the value 55 and has two children: the left (53), smaller than the father, and the right (59) bigger than the father. Through the rest of the tree, all nodes meet the condition established above, that the left child is smaller and the right one is bigger than the father.

Binary search tree example

Let’s see how binary search trees would work in the case of our friend Antonio’s supplier file.

Step 1: The starting file

Antonio’s supplier file and the index structure we had created was as follows:

indexed structure
indexed structure

The key used to index was the provider name, from which we obtained the in-memory address of the entire data block of the corresponding provider. Now it’s time to organize that table as a binary search tree.

Step 2: Create the binary search tree

Our supplier file has 1,000 records, but in this exercise, we will use only the 7 that I have written, to keep the complexity under control and to be able to graphically present the binary tree. Surely it does not escape you that a binary tree of 1,000 nodes is a bit complicated to paint.

Each key-value pair would be a node. The key is the name of the provider and the value is the address where we can find all the data of the provider identified by the key.

Each node has two data: provider and address. The next thing to do is add to the nodes, two other data, two pointers, one that points to the left child node and another that points to the right child node. A pointer is a variable that points to a memory address. The address variable that points to the provider’s data block is a pointer.

This way, the structure of a node for our binary search tree has the following parameters:

  • key: Name of the provider.
  • Address: Pointer to the provider’s data block.
  • Left: Pointer to the left child of the node
  • Right: Pointer to the right child of the node.

The representation of our binary search tree would be:

Creating a binary search tree
Creating a binary search tree

We can observe that the tree meets the conditions of binary search trees: for any node, its left child is smaller than it, and its right child is bigger. We can build the tree because the values of the nodes (provider name) are sortable and do not support duplicates.

If a node has no left child, the left pointer of that node would be null. Similarly, if it does not have the right child, the right pointer on that node would be null.

The structure of a node could be expressed as follows:

Node{
    Provider name (text)
    Address (pointer)
    Left (pointer)
    Right (pointer)
}

With this pseudocode I simply intend to explain the structure, placing the type of data in parentheses.

Following this structure, we could define the tree recursively as follows:

Tree{
    Provider Name - Root Node (text)
    Provider Address - Root Node (pointer)
    Left child tree (pointer to tree)
    Right child tree (pointer to tree)
}

In other words, we can consider the two child nodes as the root nodes of two new binary search trees.

Step 3: Bringing all the data to the binary tree

The binary tree we have seen in the previous step is nothing more than a data structure that stores a value or key and three pointers. The key that I had chosen to sort the tree is the name of the provider and it will be the one that we use to find a provider when searching for data. It is a key that can be ordered and that does not admit duplicates. Then we had the Left and Right pointers, which are part of the tree structure, and the direction pointer, which points to the memory area where all the supplier data is.

The data of this structure are going to be in the memory of the System, in the RAM, so the search in the binary tree will be quite fast. However, when we find the provider sought in the tree structure, we just got a pointer to provider data. Then, if we have big data, likely we will have to go to the secondary memory, surely on the hard disk, to the position that points the pointer direction. This last access will be slower.

But, if the provider data does not need much memory to be stored, and we do not have many providers, it is possible that all the information can be managed in the memory of the System, so we could work directly with all the data in the RAM, without accessing the hard disk.

Our new binary search tree, for supplier data, would be:

Binary tree with all the data
Binary tree with all the data

And the structure of the tree expressed in pseudocode would be:

Tree{
    Provider Name-Root Node (text)
    Provider Contact-Root Node (text)
    Phone Provider-Root Node (text)
    Left child tree (pointer to tree)
    Right child tree (pointer to tree)
}

Step 4: Binary tree level

Antonio has a supplier file with 1,000 registered suppliers. And you want to bring it all into a binary search tree structure.

What minimum level would the binary tree has to reach?

Solution:

It would be needed up to level 9.

We have to consider that the tree is balanced since it asks for the minimum level, that is, the minimum depth.

Knowing that the root is considered level 0 and that if one level is complete, the nodes in that level are: 2 raised to the level number.

We build the following table:

LevelNodes in levelAcumulated nodes
011
123
247
3815
41631
53263
664127
7128255
8256511
95121023
Nodes per level

Where we see that we have to reach at least level 9 in order to have 1,000 nodes in our tree.

NOTE:

This post is part of the collection “Data Access and Storage Systems”. You can see the index of this collection here