Random access files

Random access files allow us to go directly to recover the desired record, without having to read all the previous ones before. In this way they solve the main limitation of sequential access files, which was precisely their method of accessing the data, the need to go through the entire file from the beginning until we reached the point that interested us. This limitation is high time consuming and became larger and larger as the size of our file increases. Working with sequential access files having a high number of records, the reading times could take too long, to the point of not being operative and making sense of a migration to a random access file.

It is important to note that our “database”, if it can be called that, is still a file.

I said before that in random access files, we could go directly to recover the desired registry, without having to read all the previous ones before. For this, it is necessary to know exactly in what position, within the file, is the record we are looking for. But.., How is it possible to know in what position a record is?

Basically, using two rules:

  • The registration length is strictly defined. That is, all the registers are the same length, occupied the same space in memory.
  • And the registration number is used (a kind of pointer) to position ourselves in the desired record.

By following these simple rules, it is very easy to know in which memory position a register began. Suppose that each record occupied 50 bytes, if we want to retrieve record 5 we had to go read to position 201 and ended up in 250 (record 1 would extend from position 1 to 50, record 2 from 51 to 100, 3 from 101 to 150,  4 from 151 to 200, and finally record 5, from position 201 to 250).

Features of random access files

After what was seen in the previous exercise, we already have to be able to point out the main characteristics of random access data files:

  • Fixed record length: We define a fixed data length for each field and by extension, for the record.
  • Instant positioning at the beginning of the record to be read: You do not have to go through the file from the beginning, as was the case with sequential access files.
  • They support mixed opening: We can open the file so that it supports the modes of writing and reading, at the same time.
  • Supports multi-user access: Although it requires that specific and limited zones be established so that the different accesses are controlled, and do not step on each other, this type of access allows several users to work on the file simultaneously.
  • Maximum dimensioning of the file: When working with a fixed length structure of records, it is usual to set the maximum number of records that the file can store, and thus define the maximum size that can be occupied in memory, and reserve that space in it.

Physical support

The physical support for this type of files is the hard disk, we can no longer store the information on magnetic tapes, since these do not physically support random access, and due to their own operation, they force you to sequential access.

Random access file example

After explaining how random access files work, to understand it better, I propose an example, a case study: we are going to help Antonio improve his productivity in his SuperGades store. The problem you have is that your provider file, which is a sequential access file, already has 1,000 records, and every time you look for a provider’s data it takes an average of 1 minute. Remember that we are still in the yester century and we do not have much storage or computing capacity, we will face another situation when SuperGades is grown and evolved for using Big Data.

Antonio knows that he would gain a lot of time if he got the provider data from the file in no more than 5 seconds, and someone has explained to him that he can get it if he migrates all his data to a random access file.

I propose the following exercise to walk the road with Antonio and understand how random access files work.

Step 1: The sequential file

Antonio’s supplier sequential file was:

Smith fruits
Peter Smith
607454545
<END>
Southern vegetables
Williams Doors
652854874
<END>
Green way Sugar
Elisabeth Roberts
622525885
<END>

As this file has grown to 1,000 suppliers, in this exercise we works with a “reduced” version to make it operative.

Suppose now our file is:

Smith fruits
Peter Smith
607454545
<END>
Southern vegetables
Williams Doors
652854874
<END>
Green way Sugar
Elisabeth Roberts
622525885
<END>
….
…..
…..
Wonderful eggs
Hirali Jansson
652879137
<END>
….
….
….
Gourgeous rice
Mary Cristofen
677889922
<END>

Where the dots indicate a space in which there would be several records, to complete the 1,000 records in total

Step 2: Defining the record structure

As we had seen, in order to implement random access, the length of the records had to be fixed. To define the length of a record, we have to define the lengths of its fields.

Working with our file, we use ANSI characters, so each of them will take a byte in memory.

We define the length in characters/bytes of the different fields:

  • Field nº1: 30 bytes (provider name)
  • Field nº2: 30 bytes (name of the contact in the provider)
  • Field nº3: 9 bytes (contact phone number)
  • Field nº4: 0 bytes . We eliminated the end-of-record mark, since we actually used it in a sequential access file for synchronism. Now, with random access, we are not going to use it, but we will place ourselves at the beginning of the record that we want to read, and we read as many characters as the length of the record, in our case: 69.

Visually, representing it only for the first record, it would look something like this:

Random access file structure
random access file structure

In those fields where I am not sure how long the data will be, I define it with a certain margin, so in most cases we will be underusing memory, reserving space for data that we do not really need. However, it must be done this way because we want to ensure that the longest data we can have can be recorded.

Step 3: From the sequential access file to the random access file

Once the field lengths were defined, Antonio would have to pass all the supplier data to the new data structure. This obviously would not be done manually but using a program that would perform this task.

Step 4: Accessing a random record

If we wanted to access the record nº 777, we would place ourselves in the 53,545 position and read 69 bytes.

How do we calculate the address? We simply multiply 69 bytes that each record occupies by 776, which gives us a value of 53,544 bytes. The record nº 777 would start right in the next byte.

Step 5: Comparison of accessing times

In a sequential file, to get to the record nº 777, we would have to read all the previous records. Those records would be shorter, since no extra space is reserved in memory that is not filled with data but we still have to read a lot of data before reaching the record nº 777

Suppose the average record length in the sequential access file is 50 bytes versus the 69 bytes record length of the random access file. In addition, the reading time of a character is 1 millisecond. By the time we got to record 777, we would have read: 50*776 = 38,800 characters, which would have taken us almost 39 seconds. An extremely high waiting time.

On the contrary, with random access, knowing where we have to position the reader, it is a matter of milliseconds, that is, practically perceived by the user as immediate.

This is where Antonio gets an improvement in productivity due to the considerable reduction of waiting times.

Step 6: Experimenting

We create a new database, we give it any name we want, for example: BBDD2

Step 7: Creating a new module

Go to the tab “DATABASE TOOLS”, click on “Visual Basic”.

Then, in the Insert/Module menu, we create a new module. In my case, I leave it with the default name: “Module1”, since we are only going to use it to practice.

Step 8: Programming a procedure to create the sequential starting file

The first thing we will do is create a file with the data. We will create the suppliers indicated in previous steps, and we will fill in the rest of the positions with generic supplier names, in the end we will have a file with 1,000 records.

The procedure that would perform this task would be:

Sub CreateSequentialFile()

Dim i As Integer
Dim supplier As String
Dim contact As String


On Error GoTo e

ChDrive ("C")
ChDir "C:\test"

Open "Suppliers.txt" For Output As #1

'Suppliers registration
Print #1, "Smith fruits"
Print #1, "Peter Smith"
Print #1, "607454545"
Print #1, "<END>"
Print #1, "Southern vegetables"
Print #1, "Williams Doors"
Print #1, "652854874"
Print #1, "<END>"
Print #1, "Green way Sugar"
Print #1, "Elisabeth Roberts"
Print #1, "622525885"
Print #1, "<END>"

For i = 4 To 766
supplier = "Supplier nº" & i
contact = "Contact in Supplier nº" & i
Print #1, supplier
Print #1, contact
Print #1, "xxxxxxxxx"
Print #1, "<END>"
Next

Print #1, "Wonderful eggs"
Print #1, "Hirali Jansson"
Print #1, "652879137"
Print #1, "<END>"

For i = 778 To 999
supplier = "Supplier nº" & i
contact = "Contact in Supplier nº" & i
Print #1, supplier
Print #1, contact
Print #1, "xxxxxxxxx"
Print #1, "<END>"
Next

Print #1, "Gourgeous rice"
Print #1, "Mary Cristofen"
Print #1, "677889922"
Print #1, "<END>"

'Close file:
Close #1
MsgBox ("Upload 1000 records")
Exit Sub
e:
MsgBox (Err.Description)
End Sub

Step 9: Transforming the supplier file to the new fixed structure of the random access file

Taking into account the structure of the random access file that we have defined in step 2. We have to go through the supplier file and extend the data in the different fields so that they occupy the length set in the new structure. For making it visible we will fill in the added spaces with a dot “.” character.

To do this we will use the following procedure:

Sub FromSequentialToRandom()

ChDrive ("C")
ChDir "C:\test"

Dim line As String
Dim writing As String
Dim MyChar As String
Dim registerType As Integer
Dim i As Integer
Dim j As Integer

Open "Suppliers.txt" For Input As #1
Open "SuppliersRandomAccess.txt" For Output As #2

line = ""
registerType = 1

Do While Not EOF(1)
    MyChar = Input(1, #1)
    If MyChar = Chr(13) Then
        If registerType = 4 Then
            registerType = 1
            line = ""
        Else
            Select Case (registerType)
                Case 1:
                    writing = line
                    j = Len(line)
                    For i = j + 1 To 30
                        writing = writing & "."
                    Next i
                    Debug.Print writing
                    Print #2, writing
                    line = ""
                Case 2:
                    writing = line
                    j = Len(line)
                    For i = j + 1 To 30
                        writing = writing & "."
                    Next i
                    Debug.Print writing
                    Print #2, writing
                    line = ""
                Case 3:
                    writing = line
                    j = Len(line)
                    For i = j + 1 To 9
                        writing = writing & "."
                    Next i
                    Debug.Print writing
                    Print #2, writing
                    line = ""
            End Select
            registerType = registerType + 1
        End If
    ElseIf (MyChar <> Chr(10)) Then
        line = line + MyChar
    Else
    End If

Loop


Close #1
Close #2

End Sub

Step 10: Reducing the file size

To work with the file in practice, we will reduce the number of suppliers to 20. The goal is to be able to work with integer variables, without having to handle large numbers, and not having overflow problems.

Step 11: Moving directly to a random record

We have to first calculate the position in which the record would begin, and then move to it and read a certain amount of data. All tied to a strict structure.

To do this, we use the following procedure:

Sub GoToRecord()


ChDrive ("C")
ChDir "C:\test"

Dim RecordNumber As Integer
Dim position As Integer
Dim counter As Integer
Dim MyChar As String
Dim i As Integer
Dim text As String


Open "SuppliersRandomAccess.txt" For Input As #1

RecordNumber = CInt(InputBox("Introduce the record number. Be sure it is between 1 and 20", "Record search"))
position = (30 + 30 + 9 + 6) * (RecordNumber - 1)
position = position + 1
counter = 0
text = ""

Do While Not EOF(1)
    MyChar = Input(1, #1)
    counter = counter + 1
    If position = counter Then
    For i = position To (position + 74)
        text = text + MyChar
        MyChar = Input(1, #1)
    Next i
    Debug.Print text
    Exit Do
    End If
Loop

Close #1
End Sub

NOTE:

This post is part of the collection “Data Access and Storage Systems”. You can see the index of this collection here