Random access files allow us to go directly to recover the desired record, without having to read all the previous ones before. In this way they solve the main limitation of sequential access files, which was precisely their method of accessing the data, the need to go through the entire file from the beginning until we reached the point that interested us. This limitation is high time consuming and became larger and larger as the size of our file increases. Working with sequential access files having a high number of records, the reading times could take too long, to the point of not being operative and making sense of a migration to a random access file.
It is important to note that our “database”, if it can be called that, is still a file.
I said before that in random access files, we could go directly to recover the desired registry, without having to read all the previous ones before. For this, it is necessary to know exactly in what position, within the file, is the record we are looking for. But.., How is it possible to know in what position a record is?
Basically, using two rules:
- The registration length is strictly defined. That is, all the registers are the same length, occupied the same space in memory.
- And the registration number is used (a kind of pointer) to position ourselves in the desired record.
By following these simple rules, it is very easy to know in which memory position a register began. Suppose that each record occupied 50 bytes, if we want to retrieve record 5 we had to go read to position 201 and ended up in 250 (record 1 would extend from position 1 to 50, record 2 from 51 to 100, 3 from 101 to 150, 4 from 151 to 200, and finally record 5, from position 201 to 250).
Features of random access files
After what was seen in the previous exercise, we already have to be able to point out the main characteristics of random access data files:
- Fixed record length: We define a fixed data length for each field and by extension, for the record.
- Instant positioning at the beginning of the record to be read: You do not have to go through the file from the beginning, as was the case with sequential access files.
- They support mixed opening: We can open the file so that it supports the modes of writing and reading, at the same time.
- Supports multi-user access: Although it requires that specific and limited zones be established so that the different accesses are controlled, and do not step on each other, this type of access allows several users to work on the file simultaneously.
- Maximum dimensioning of the file: When working with a fixed length structure of records, it is usual to set the maximum number of records that the file can store, and thus define the maximum size that can be occupied in memory, and reserve that space in it.
The physical support for this type of files is the hard disk, we can no longer store the information on magnetic tapes, since these do not physically support random access, and due to their own operation, they force you to sequential access.
Random access file example
After explaining how random access files work, to understand it better, I propose an example, a case study: we are going to help Antonio improve his productivity in his SuperGades store. The problem you have is that your provider file, which is a sequential access file, already has 1,000 records, and every time you look for a provider’s data it takes an average of 1 minute. Remember that we are still in the yester century and we do not have much storage or computing capacity, we will face another situation when SuperGades is grown and evolved for using Big Data.
Antonio knows that he would gain a lot of time if he got the provider data from the file in no more than 5 seconds, and someone has explained to him that he can get it if he migrates all his data to a random access file.
I propose the following exercise to walk the road with Antonio and understand how random access files work.
Step 1: The sequential file
Antonio’s supplier sequential file was:
Smith fruits Peter Smith 607454545 <END> Southern vegetables Williams Doors 652854874 <END> Green way Sugar Elisabeth Roberts 622525885 <END>
As this file has grown to 1,000 suppliers, in this exercise we works with a “reduced” version to make it operative.
Suppose now our file is:
Smith fruits Peter Smith 607454545 <END> Southern vegetables Williams Doors 652854874 <END> Green way Sugar Elisabeth Roberts 622525885 <END> …. ….. ….. Wonderful eggs Hirali Jansson 652879137 <END> …. …. …. Gourgeous rice Mary Cristofen 677889922 <END>
Where the dots indicate a space in which there would be several records, to complete the 1,000 records in total
Step 2: Defining the record structure
As we had seen, in order to implement random access, the length of the records had to be fixed. To define the length of a record, we have to define the lengths of its fields.
Working with our file, we use ANSI characters, so each of them will take a byte in memory.
We define the length in characters/bytes of the different fields:
- Field nº1: 30 bytes (provider name)
- Field nº2: 30 bytes (name of the contact in the provider)
- Field nº3: 9 bytes (contact phone number)
- Field nº4: 0 bytes . We eliminated the end-of-record mark, since we actually used it in a sequential access file for synchronism. Now, with random access, we are not going to use it, but we will place ourselves at the beginning of the record that we want to read, and we read as many characters as the length of the record, in our case: 69.
Visually, representing it only for the first record, it would look something like this:
In those fields where I am not sure how long the data will be, I define it with a certain margin, so in most cases we will be underusing memory, reserving space for data that we do not really need. However, it must be done this way because we want to ensure that the longest data we can have can be recorded.
Step 3: From the sequential access file to the random access file
Once the field lengths were defined, Antonio would have to pass all the supplier data to the new data structure. This obviously would not be done manually but using a program that would perform this task.
Step 4: Accessing a random record
If we wanted to access the record nº 777, we would place ourselves in the 53,545 position and read 69 bytes.
How do we calculate the address? We simply multiply 69 bytes that each record occupies by 776, which gives us a value of 53,544 bytes. The record nº 777 would start right in the next byte.
Step 5: Comparison of accessing times
In a sequential file, to get to the record nº 777, we would have to read all the previous records. Those records would be shorter, since no extra space is reserved in memory that is not filled with data but we still have to read a lot of data before reaching the record nº 777
Suppose the average record length in the sequential access file is 50 bytes versus the 69 bytes record length of the random access file. In addition, the reading time of a character is 1 millisecond. By the time we got to record 777, we would have read: 50*776 = 38,800 characters, which would have taken us almost 39 seconds. An extremely high waiting time.
On the contrary, with random access, knowing where we have to position the reader, it is a matter of milliseconds, that is, practically perceived by the user as immediate.
This is where Antonio gets an improvement in productivity due to the considerable reduction of waiting times.
Step 6: Experimenting
We create a new database, we give it any name we want, for example: BBDD2
Step 7: Creating a new module
Go to the tab “DATABASE TOOLS”, click on “Visual Basic”.
Then, in the Insert/Module menu, we create a new module. In my case, I leave it with the default name: “Module1”, since we are only going to use it to practice.
Step 8: Programming a procedure to create the sequential starting file
The first thing we will do is create a file with the data. We will create the suppliers indicated in previous steps, and we will fill in the rest of the positions with generic supplier names, in the end we will have a file with 1,000 records.
The procedure that would perform this task would be:
Sub CreateSequentialFile() Dim i As Integer Dim supplier As String Dim contact As String On Error GoTo e ChDrive ("C") ChDir "C:\test" Open "Suppliers.txt" For Output As #1 'Suppliers registration Print #1, "Smith fruits" Print #1, "Peter Smith" Print #1, "607454545" Print #1, "<END>" Print #1, "Southern vegetables" Print #1, "Williams Doors" Print #1, "652854874" Print #1, "<END>" Print #1, "Green way Sugar" Print #1, "Elisabeth Roberts" Print #1, "622525885" Print #1, "<END>" For i = 4 To 766 supplier = "Supplier nº" & i contact = "Contact in Supplier nº" & i Print #1, supplier Print #1, contact Print #1, "xxxxxxxxx" Print #1, "<END>" Next Print #1, "Wonderful eggs" Print #1, "Hirali Jansson" Print #1, "652879137" Print #1, "<END>" For i = 778 To 999 supplier = "Supplier nº" & i contact = "Contact in Supplier nº" & i Print #1, supplier Print #1, contact Print #1, "xxxxxxxxx" Print #1, "<END>" Next Print #1, "Gourgeous rice" Print #1, "Mary Cristofen" Print #1, "677889922" Print #1, "<END>" 'Close file: Close #1 MsgBox ("Upload 1000 records") Exit Sub e: MsgBox (Err.Description) End Sub
Step 9: Transforming the supplier file to the new fixed structure of the random access file
Taking into account the structure of the random access file that we have defined in step 2. We have to go through the supplier file and extend the data in the different fields so that they occupy the length set in the new structure. For making it visible we will fill in the added spaces with a dot “.” character.
To do this we will use the following procedure:
Sub FromSequentialToRandom() ChDrive ("C") ChDir "C:\test" Dim line As String Dim writing As String Dim MyChar As String Dim registerType As Integer Dim i As Integer Dim j As Integer Open "Suppliers.txt" For Input As #1 Open "SuppliersRandomAccess.txt" For Output As #2 line = "" registerType = 1 Do While Not EOF(1) MyChar = Input(1, #1) If MyChar = Chr(13) Then If registerType = 4 Then registerType = 1 line = "" Else Select Case (registerType) Case 1: writing = line j = Len(line) For i = j + 1 To 30 writing = writing & "." Next i Debug.Print writing Print #2, writing line = "" Case 2: writing = line j = Len(line) For i = j + 1 To 30 writing = writing & "." Next i Debug.Print writing Print #2, writing line = "" Case 3: writing = line j = Len(line) For i = j + 1 To 9 writing = writing & "." Next i Debug.Print writing Print #2, writing line = "" End Select registerType = registerType + 1 End If ElseIf (MyChar <> Chr(10)) Then line = line + MyChar Else End If Loop Close #1 Close #2 End Sub
Step 10: Reducing the file size
To work with the file in practice, we will reduce the number of suppliers to 20. The goal is to be able to work with integer variables, without having to handle large numbers, and not having overflow problems.
Step 11: Moving directly to a random record
We have to first calculate the position in which the record would begin, and then move to it and read a certain amount of data. All tied to a strict structure.
To do this, we use the following procedure:
Sub GoToRecord() ChDrive ("C") ChDir "C:\test" Dim RecordNumber As Integer Dim position As Integer Dim counter As Integer Dim MyChar As String Dim i As Integer Dim text As String Open "SuppliersRandomAccess.txt" For Input As #1 RecordNumber = CInt(InputBox("Introduce the record number. Be sure it is between 1 and 20", "Record search")) position = (30 + 30 + 9 + 6) * (RecordNumber - 1) position = position + 1 counter = 0 text = "" Do While Not EOF(1) MyChar = Input(1, #1) counter = counter + 1 If position = counter Then For i = position To (position + 74) text = text + MyChar MyChar = Input(1, #1) Next i Debug.Print text Exit Do End If Loop Close #1 End Sub
This post is part of the collection “Data Access and Storage Systems”. You can see the index of this collection here