
14/01/2022

Log-structured JSON database—by Nick Downing

Normally I would write a blog post to coincide with the release of a new version of the public website you are reading. This time there is some heavy refactoring going on, so I have decided to document the work as it progresses and let the result be viewed and interacted with later, once it is ready for release.

NoSQL discussion

One of the early decisions I made when teaching myself web development was that I would not be using SQL for the backend. I simply believe the disadvantages outweigh the advantages for what I am trying to do. And I’m probably not the only person with this view, because the NoSQL movement has been gaining traction.

SQL pros:

  • Complex queries are possible: range queries, joins, and the like.
  • SQL server can serve multiple webservers, increasing scalability.

SQL cons:

  • Deployment is a hassle: you need to make sure the SQL server process is configured and running before launching the webserver process.
  • SQL queries are submitted as code and need to be tokenized and parsed at the server for every request, which is not efficient.
  • Administration has a steep learning curve; for example, you cannot just view or edit the database file directly, and you need to craft queries for troubleshooting and for activities like migration or structure upgrades.
  • Relational database capabilities are overkill for many trivial applications, such as storing a username against a session key.

I want my application to be as self-contained and self-configuring as possible, so that I can basically just check out the website’s repository on the intended Virtual Private Server (VPS) and launch it. In a traditional configuration you have a long list of other services to configure first, such as PHP, PHP-FPM, MySQL, and so on, and setting up or migrating to a new server can be days of work with tedious troubleshooting. There are of course cases where these activities have a payoff (such as if you require a highly scalable website), but my case is not one of them.

First try—all JSON approach

Therefore, for a first cut I simply stored my website’s data as JSON files in a hidden folder of the site’s directory. The hidden folder and JSON files were created on demand if they did not already exist. And a few sophisticated features were built around this to make things workable (a minimal sketch of the scheme follows the list):

  • After first access, each JSON file was cached in RAM for the lifetime of the webserver process.
  • After modification, the JSON files were not written back immediately, but a timer was set to guarantee modified data was written in 5 seconds or less.
  • Writeback was done to a temporary file followed by a rename, so the site could recover if it crashed or lost power during the rename.
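
To illustrate, here is a minimal sketch of that scheme in node.js. It is not the actual site code; the function names and file layout are invented for the example, but it shows the in-RAM cache, the writeback timer, and the write-to-temporary-then-rename step.

// Sketch only: cache each JSON file in RAM, delay writeback by at most
// 5 seconds, and write via a temporary file plus a rename so a crash
// mid-write cannot corrupt the existing file.
const fs = require('fs').promises

const cache = new Map() // filename -> {data, dirty, timer}

async function load(filename) {
  let entry = cache.get(filename)
  if (entry === undefined) {
    let data
    try {
      data = JSON.parse(await fs.readFile(filename, 'utf-8'))
    } catch (error) {
      if (error.code !== 'ENOENT')
        throw error
      data = {} // file does not exist yet; it will be created on first writeback
    }
    entry = {data, dirty: false, timer: null}
    cache.set(filename, entry)
  }
  return entry.data
}

function markDirty(filename) {
  const entry = cache.get(filename) // load() is assumed to have been called first
  entry.dirty = true
  if (entry.timer === null)
    entry.timer = setTimeout(
      () => {
        entry.timer = null
        writeBack(filename).catch(console.error)
      },
      5000 // modified data reaches disk within 5 seconds
    )
}

async function writeBack(filename) {
  const entry = cache.get(filename)
  if (!entry.dirty)
    return
  entry.dirty = false
  await fs.writeFile(filename + '.tmp', JSON.stringify(entry.data, null, 2))
  await fs.rename(filename + '.tmp', filename) // replaces the old contents atomically
}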

Admittedly, I could have constructed a similar system using SQLite, which would have given me access to complex queries that I do not have with JSON, but I felt the added complexity was not worthwhile. If I need to do complex queries, I will simply write them in Javascript, including the maintenance of any indices needed to carry out the queries efficiently. This will give me complete control, rather than simply hoping the SQL server optimizes the indices and queries well.

My JSON database was sufficient for building the prototype blog and online store that you can interact with now, but it did suffer from a serious problem: upon a successful launch there could be thousands of customers and/or orders, and the database could grow to a huge size. It would then not be feasible to hold the entire file in memory or rewrite it every 5 seconds.

Second try—log-structured JSON approach

To solve the scalability issue, my plan has always been to allow the JSON tree to be read into memory lazily as required (and cached, with some kind of cache policy implemented), and for modifications to be written to the end of the file. Thus the file would be log-structured and simply keep growing forever, except for the intervention of a daily cleaning routine which would copy all the live (still reachable) parts of the log into a new log to reduce the size.

The database I created with this approach is called LogJSON. It is a database that appears to the calling code as one huge JSON tree, but one which can be accessed efficiently, and which has transaction support to avoid concurrent updates corrupting the tree. The log structuring is more or less hidden from the calling code, which is presented with the illusion of a mutable tree, even though data once written is immutable in the backend.

Why log-structured? Mainly for simplicity and efficiency. If you allow deletion in a data store, then you now have to track empty space and you have fragmentation to worry about. In a filesystem such as ext4, a good deal of the complexity arises from this. Another example of an allocator with free-space management is malloc() in C. But malloc() is slow, as it needs to scan the free list at each allocation (at least in the traditional implementation), and memory-inefficient, as it does not support fragmented allocations in the way that ext4 does.

A more efficient approach to dealing with fragmentation and dead objects, that was popularized with object-oriented languages like Java and Smalltalk, is to allocate new objects contiguously from a region of spare memory instead of scanning for a free spot every time, and perform a periodic garbage collection to reclaim free space. It was controversial at the time because of garbage collection pauses, but if you can perform garbage collection concurrently with normal processing, there should be no significant impact.

For my application, it seemed simpler to implement a daily copy from the old log to a new one than to deal with free-space management. And there are certain other benefits, such as having a series of snapshots of the database to aid in troubleshooting. I also liked the idea of the log being human-readable (traditional for web developers, who prefer ASCII formats such as JSON to binary formats), and free-space management would have complicated this considerably.

LogJSON implementation

Root object

The structure of the log is extraordinarily simple: each transaction basically just writes the new root JSON object to the end of the log, encoded by Buffer.from(JSON.stringify(...), 'utf-8') and surrounded by angle brackets ‘<’ and ‘>’. The purpose of the angle brackets is to let us resynchronize to the latest written root object after the system is rebooted (including after power loss) or the server is restarted. Here is a very simple example of a LogJSON file:

<"Hello, world">
<"Go away">

In the above example, two transactions were performed on an empty file. The first one initialized the root to "Hello, world" (a Javascript string, which is valid JSON), and the second one overwrote the root with "Go away" (also valid JSON). Now suppose the server had crashed during the writing of the second transaction, leaving the file looking something like:

<"Hello, world">
<"Go aw

Then, upon restarting the server, it would resynchronize to the last occurrence of ‘>’ that is followed by a newline, and the root object would appear to be "Hello, world". The requirement that ‘>’ be followed by a newline prevents it being tricked by occurrences of ‘>’ inside strings, where a newline has to be encoded as ‘\n’ rather than a literal newline character.
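
A sketch of that recovery scan in node.js follows. It is an illustration rather than the package source: it assumes the whole log fits in memory (a real implementation would scan backwards in chunks), and it also assumes the opening ‘<’ of a root always begins a line, which holds for the same reason, since a literal newline cannot appear inside a JSON string.

// Sketch only: find the last complete root element after a crash.
const fs = require('fs')

function readRoot(filename) {
  const text = fs.readFileSync(filename, 'utf-8')
  // the last '>' immediately followed by a newline ends the latest
  // complete root; '>' inside a string cannot be followed by a literal
  // newline because strings encode newlines as '\n'
  const end = text.lastIndexOf('>\n')
  if (end === -1)
    return undefined // empty or unrecoverable log
  // the matching '<' is at the start of a line, for the same reason
  const start = text.lastIndexOf('\n<', end) + 1 // 0 if the root starts the file
  return JSON.parse(text.slice(start + 1, end))
}

// readRoot() on the truncated example above returns "Hello, world"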

References to objects

Whilst the above method of writing the root object allows for a log-structured JSON file and transaction-based atomic updates to that file, we still need a way to access only the required parts of the tree lazily and to make updates without rewriting the entire tree. This is done by having a special way of writing the JSON Array and Object types (basically those JSON elements that can contain further JSON).

We will use Array as an example. Suppose a root element of ["Hello", "Goodbye"] needs to be written in an empty file. It will be done in two parts. Firstly, the element itself will be written, without angle brackets. And then a new root element will be written, with angle brackets, that contains a reference to the just-written element, rather than the full text of the element itself:

[
  "Hello",
  "Goodbye"
]
<[
  0,
  26
]>

A reference is always constructed as a 2-element list, here [0, 26], containing the file pointer and length of the previously written JSON.

So in the above example, it’s possible to open the file, synchronize to the root element using the angle brackets, and then read the root element as a reference to an unread portion of the file, without actually having to read the referenced contents until they are required.
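
In node.js, dereferencing such a pair might look like the sketch below. The deref() helper and the file name are assumptions for illustration, not the published package API, but they show that only the referenced byte range is read and parsed; the rest of the file stays on disk.

// Sketch only: read and parse just the referenced slice of the log.
const fs = require('fs').promises

async function deref(handle, [pointer, length]) {
  const buffer = Buffer.alloc(length)
  await handle.read(buffer, 0, length, pointer) // read 'length' bytes at file offset 'pointer'
  return JSON.parse(buffer.toString('utf-8'))
}

// usage with the example file above (hypothetical file name):
//   const handle = await fs.open('example.logjson', 'r')
//   await deref(handle, [0, 26]) // gives ["Hello", "Goodbye"]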

Nested objects

To present a more realistic example, we will consider a deeper tree. The root of the tree will be a JSON Object type which is keyed by a person's nickname. This in turn leads to a JSON Object type containing some details about the person. The plain JSON file to be encoded in LogJSON format is:

{
  "jane": {
    "name": "Jane Doe",
    "address": "123 Main St, Smallville",
    "age": 30
  },
  "mike": {
    "name": "Mike Rho",
    "address": "125 Main St, Smallville",
    "age": 32
  }
}

And the encoded version in LogJSON is:

{
  "name": "Jane Doe",
  "address": "123 Main St, Smallville",
  "age": 30
}
{
  "name": "Mike Rho",
  "address": "125 Main St, Smallville",
  "age": 32
}
{
  "jane": [
    0,
    77
  ],
  "mike": [
    78,
    77
  ]
}
<[
  156,
  65
]>

We can see that the process of looking up a user can now be more efficient than with plain JSON. First we read the root element, the reference [156, 65], and then the referred-to object containing the keys "jane" and "mike". Then we find the person we are looking for, let’s say Mike, and the reference [78, 77] to Mike’s record. We only need to read Mike’s record to proceed; we do not need Jane’s record at this stage. We do know there is a user jane, but her details are irrelevant to the current query.
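
Reusing the hypothetical readRoot() and deref() helpers from the earlier sketches, and assuming handle is an open file handle on the same log (the file name is again invented), the lookup just described would proceed like this:

// Sketch only, with the offsets of the example file above.
const rootRef = readRoot('people.logjson')       // [156, 65]
const people = await deref(handle, rootRef)      // { "jane": [0, 77], "mike": [78, 77] }
const mike = await deref(handle, people['mike']) // { "name": "Mike Rho", ... }
// Jane's record at [0, 77] is never read for this query.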

Modifying the tree

Now suppose Mike’s address changes. Recalling that the LogJSON file, once written, is immutable, we need to write a new record for Mike containing the changed address, as follows:

{
  "name": "Mike Rho",
  "address": "20 Another St, Smallville",
  "age": 32
}

And then we need to rewrite the parent object so that the nickname "jane" will still point to her original record, but the nickname "mike" will redirect to his newly written record:

{
  "jane": [
    0,
    77
  ],
  "mike": [
    240,
    79
  ]
}

Finally, we need to rewrite the root object to point to the newly written subtree:

<[
  320,
  66
]>

And of course the file will not be considered modified (the old root will be active) until the new root has been fully written.
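
A sketch of this copy-on-write update in node.js, using an assumed append() helper rather than the package API, could look like this (the offsets in the comments are those of the example):

// Sketch only: append a JSON value to the log and return its reference.
const fs = require('fs').promises

async function append(handle, value, isRoot) {
  const text = JSON.stringify(value, null, 2)
  const {size} = await handle.stat() // current end of the log
  await handle.write(isRoot ? '<' + text + '>\n' : text + '\n', size)
  return [size + (isRoot ? 1 : 0), text.length] // [pointer, length] of the written JSON
}

// the update described above:
//   const mikeRef = await append(handle, {name: 'Mike Rho', address: '20 Another St, Smallville', age: 32}, false) // [240, 79]
//   const keysRef = await append(handle, {jane: [0, 77], mike: mikeRef}, false)                                    // [320, 66]
//   await append(handle, keysRef, true) // new root; the old root stays valid until this write completes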

Modified tree—full example

After the LogJSON file creation and then modification described in the previous section, the entire file looks like this:

{
  "name": "Jane Doe",
  "address": "123 Main St, Smallville",
  "age": 30
}
{
  "name": "Mike Rho",
  "address": "125 Main St, Smallville",
  "age": 32
}
{
  "jane": [
    0,
    77
  ],
  "mike": [
    78,
    77
  ]
}
<[
  156,
  65
]>
{
  "name": "Mike Rho",
  "address": "20 Another St, Smallville",
  "age": 32
}
{
  "jane": [
    0,
    77
  ],
  "mike": [
    240,
    79
  ]
}
<[
  320,
  66
]>

We can see that it contains a snapshot of how the file looked after each modification: reading the previous root element [156, 65] leads to Mike’s old address, while reading the latest root element [320, 66] leads to Mike’s new address. In either case Jane’s address is the same. A disadvantage of this scheme is that the supporting elements, such as the Object containing the list of nicknames and the root, must be rewritten each time, which wastes space. But the garbage collector tidies this up periodically.

Database maintenance

Although the database is human-readable, it’s still somewhat difficult to follow, especially given that file viewers generally do not display file positions in decimal alongside the text. Therefore, for database maintenance, you can use the tools logjson_to_json and json_to_logjson provided within the proposed node.js LogJSON package. These tools give you a manual way to view and edit the tree, and to perform garbage collection if you need to. Note, however, that they cannot be run on a live database; you need to stop your webserver first.

For example, running logjson_to_json on the complete file of the previous example gives this JSON:

{
  "jane": {
    "name": "Jane Doe",
    "address": "123 Main St, Smallville",
    "age": 30
  },
  "mike": {
    "name": "Mike Rho",
    "address": "20 Another St, Smallville",
    "age": 32
  }
}

And running it back through json_to_logjson into a clean log file gives a compacted version of the log:

{
  "name": "Jane Doe",
  "address": "123 Main St, Smallville",
  "age": 30
}
{
  "name": "Mike Rho",
  "address": "20 Another St, Smallville",
  "age": 32
}
{
  "jane": [
    0,
    77
  ],
  "mike": [
    78,
    79
  ]
}
<[
  158,
  65
]>

Garbage collection

I have implemented the garbage collection in a rather sophisticated way, taking advantage of the fact that the log file, once written, is immutable, so that garbage collection can proceed in the background without disturbing the operation of the website.

A garbage collection is triggered every midnight in Coordinated Universal Time (UTC), that is, when the date rolls over. A new log file is constructed by copying all the live data from the old log file into it. Then transaction processing is briefly paused whilst we atomically rename the old log file out of the way to a dated name (containing the year, month, and day in UTC) and rename the new log file over the old one. If the system crashes during renaming, it will recover at the next restart.

The synchronization magic happens by atomically grabbing the current root object and copying everything reachable from it into the new log file, but allowing more transactions to be written into the old log file simultaneously. It then atomically grabs the updated root and copies that as well, and continues to do so until both files are up-to-date and it can perform the atomic replacement.
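
In outline, and with assumed helpers (the real code must also track which parts of the old log have already been copied), the compaction loop looks something like this:

// Sketch only: copy live data to a new log, catching up with any
// transactions that commit meanwhile, then swap the files.
const fs = require('fs').promises

async function compact(livePath, newPath, datedPath) {
  let lastPointer = -1
  for (;;) {
    const root = await getCurrentRoot() // latest committed [pointer, length] root (assumed helper)
    if (root[0] === lastPointer)
      break // nothing new committed since the previous pass
    await copyReachable(livePath, newPath, root) // copy everything reachable, skipping parts already copied (assumed helper)
    lastPointer = root[0]
  }
  // transaction processing is briefly paused around the renames (not shown)
  await fs.rename(livePath, datedPath) // keep the old log as a dated snapshot
  await fs.rename(newPath, livePath)   // the compacted log takes its place
}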

Concurrency

For the time being I have made transactions to the database mutually exclusive, by implementing a mutex in the running webserver. Much improvement would be possible here, for instance by allowing read transactions to be concurrent and only using the mutex for write transactions, or by an even more sophisticated concurrency management that can consider different elements or keys of a JSON Array or Object as not conflicting.

Currently there is a risk that if I do not implement good try/catch handling in my code, a webserver request could crash, leaving the webserver running but the mutex held. This would hang the webserver, requiring a manual restart. I could implement some sort of disaster recovery, but there is probably no point at this time; I will simply use try/catch and make sure transactions are rolled back if not completed.
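
A minimal sketch of the intended pattern follows; the Mutex class and the commit/rollback calls are placeholders for illustration, not the package API.

// Sketch only: serialize transactions with a promise-based mutex and
// guarantee release plus rollback if the callback throws.
class Mutex {
  constructor() {
    this.queue = Promise.resolve()
  }
  acquire() { // resolves to a release function once the lock is free
    let release
    const next = new Promise(resolve => {release = resolve})
    const ready = this.queue
    this.queue = this.queue.then(() => next)
    return ready.then(() => release)
  }
}

const mutex = new Mutex()

async function transaction(database, callback) {
  const release = await mutex.acquire()
  try {
    const result = await callback(database)
    await database.commit() // placeholder commit/rollback methods
    return result
  } catch (error) {
    await database.rollback()
    throw error
  } finally {
    release() // the mutex is never left held, even if the callback throws
  }
}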

Conclusion

The log-structured format provides a useful way of maintaining huge JSON files. It remains to complete the documentation and release the source code and NPM package. It also remains to implement a stress test of my website, once it is up and running, to simulate thousands of user accounts and multiple logins to the same account all performing transactions simultaneously. So things are still experimental at this stage, and I enjoy writing this blog to report on those experiments.