Fleshing out cloud-based backups

I find myself distracted by the need to get wallet backups working. I don’t want to be designing and implementing solutions for everything, and in an ideal world I wouldn’t have to. But in this one I quite often do.

This article is intended to take the little bits and pieces at the back of my mind, and from past discussions on the Metanet.ICU slack where all the best BSV development discussions happen, and write them up and flesh them out.

If you want a high-level overview that touches on why we are not pursuing some philosophically pure system of on-chain backups, refer to my last article:

Overview

The overall problem of cloud-based backups can be broken into several mostly distinct sub-problems.

  • Ensuring the wallet user is in control and has a clear picture of where things stand.
  • Backing up changes to the local wallet data incrementally to the remote cloud storage.
  • Storing all backup data, whether incremental or full, on the remote storage in a way that retains its integrity.
  • Performing some form of background management, so that the incremental remote storage is regularly consolidated and remains manageable in the event of data loss or restoration.

Fleshing out sub-problems

I do not want to commit to solutions for these things at this stage, but I can elaborate a little on my current thoughts.

Perhaps this is a solved problem somewhere else? If it is, let me know.

At the least, there should be an indicator that shows if there is data that has to be backed up. It should also be possible for the user to toggle between manual and automatic backups — even if just to allow the user control over when their bandwidth is used. These two things are not that complicated to do.

There is another aspect I find myself wondering about: whether there is a way to help the user keep a mental picture of what the data that has not yet been backed up relates to. After all, an angry icon in the user interface that highlights non-backed-up data doesn’t give the user much nuance about what has not been backed up. Adding highlights to the interface elements that have not been backed up might be a bit much, but perhaps there are other options, like the angry icon providing the ability to view a summary of what is queued for backup. Unless the user has a clear mental picture of what they did recently that is not in the backup, they are going to have to try and work it out themselves in the event of data loss.

Do you use other products with a similar approach to backup? Do they have some good ideas we should integrate into this? If so, please let us know.

Incremental backup is pretty much a protocol where each change made by the user is put onto the remote storage. These changes can then be played back and applied in order to recreate the full wallet state.

Defining this protocol sounds like a large amount of work. Additionally, it might be necessary to ensure it is done in a way where the outstanding changes can be mapped to UI elements in the wallet, whether for enhanced display of what has not been backed up or for generating a summarised list of some sort. Beyond that, this protocol may need to change as the wallet changes, which may mean supporting all past variations indefinitely in order to allow old backups to be restored.
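To make this a little more concrete, here is a rough sketch of what one entry in such a protocol might look like and how replay would work. The record fields, the JSON encoding and the version handling are all assumptions on my part, not an existing format:

import json
from dataclasses import dataclass

# Hypothetical format for one incremental backup entry. The field names, the
# JSON encoding and the version handling are illustrative, not a real format.
@dataclass
class ChangeRecord:
    version: int   # protocol version the record was written under
    sequence: int  # strictly increasing, defines the replay order
    sql: str       # the SQL statement that was applied locally
    params: list   # the bound parameters for that statement

    def to_line(self) -> str:
        return json.dumps({"v": self.version, "n": self.sequence,
                           "sql": self.sql, "params": self.params})

    @staticmethod
    def from_line(line: str) -> "ChangeRecord":
        d = json.loads(line)
        return ChangeRecord(d["v"], d["n"], d["sql"], d["params"])


def replay(records, connection) -> None:
    """Recreate the wallet state by applying records in sequence order."""
    for record in sorted(records, key=lambda r: r.sequence):
        if record.version != 1:
            # Old protocol variations would need their own handling here.
            raise ValueError(f"unsupported protocol version {record.version}")
        connection.execute(record.sql, record.params)
    connection.commit()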

A possible solution — SQL

However, we have a possible solution at hand where we do not need to do any extra work at all to get this incremental protocol. ElectrumSV uses the SQLite database software for wallet storage, and because of problems with database writes happening on different threads and conflicting with SQLite’s limitations, all database writes pass through a dispatcher on a single thread.

We should be able to just write these statements out to the cloud storage as the incremental backup for the wallet. One concern would be if the statements were recorded out of order, which would complicate replay when backups were restored. But the current nature of the SQLite writer dispatcher ensures that if the order works for the wallet itself, that is the order we would be storing the changes in.
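As a rough illustration only, and not the actual ElectrumSV code, the writer thread could double as the point where statements are journalled for backup. The queue-based dispatcher and the journal format below are both simplifying assumptions:

import queue
import sqlite3
import threading

class WriteDispatcher:
    """All wallet writes pass through one thread; each executed statement is
    also appended to a journal file that becomes the incremental backup."""

    def __init__(self, db_path: str, journal_path: str) -> None:
        self._queue: queue.Queue = queue.Queue()
        self._db_path = db_path
        self._journal_path = journal_path
        threading.Thread(target=self._run, daemon=True).start()

    def write(self, sql: str, params: tuple = ()) -> None:
        # Callers on any thread enqueue their statement; the order the writer
        # thread dequeues them in is the order the database and journal see.
        self._queue.put((sql, params))

    def _run(self) -> None:
        connection = sqlite3.connect(self._db_path)
        with open(self._journal_path, "a", encoding="utf-8") as journal:
            while True:
                sql, params = self._queue.get()
                connection.execute(sql, params)
                connection.commit()
                # The journal records exactly the order the database saw.
                journal.write(repr((sql, params)) + "\n")
                journal.flush()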

Batching

If I had to guess how we would put these into cloud storage, we would likely just serialise the SQL statements to a local file until a certain point is reached, at which point the file would be copied to the remote storage.
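A minimal sketch of that batching, with the size threshold and file naming purely placeholder values:

import os

BATCH_SIZE_LIMIT = 256 * 1024  # placeholder threshold: 256 KiB per batch

def append_to_batch(batch_dir: str, batch_index: int, entry: bytes) -> int:
    """Append one serialised change to the current batch file, moving on to a
    new batch file once the current one reaches the size limit. Returns the
    index of the batch the next entry should go into."""
    path = os.path.join(batch_dir, f"batch-{batch_index:08d}.log")
    with open(path, "ab") as f:
        f.write(entry + b"\n")
    if os.path.getsize(path) >= BATCH_SIZE_LIMIT:
        # This batch is now full; the caller would queue `path` for copying
        # to the remote storage and start writing the next numbered batch.
        return batch_index + 1
    return batch_index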

There are two properties I see as important for the data on the cloud storage device — or even the local batch file that is pending synchronisation: that we know the backup data is complete, and that we know it has not been tampered with.

With regard to tampering, the data can be encrypted sequentially based on some derivation from the wallet’s master private key. This can be done at the point it is batched, making the cloud copy just that — a dumb copy. It might be that there are some interesting variations on how this could be done.
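One way this might look, and I stress this is an assumed scheme rather than anything decided, is to derive a per-batch key from a backup key that is itself derived from the master private key, and to use an authenticated cipher such as AES-GCM (here via the Python cryptography package):

import hashlib
import hmac
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def derive_batch_key(backup_root_key: bytes, batch_index: int) -> bytes:
    # `backup_root_key` would itself be derived from the wallet's master
    # private key; this HMAC construction is illustrative only.
    return hmac.new(backup_root_key, f"batch:{batch_index}".encode(),
                    hashlib.sha256).digest()

def encrypt_batch(backup_root_key: bytes, batch_index: int,
                  plaintext: bytes) -> bytes:
    key = derive_batch_key(backup_root_key, batch_index)
    nonce = os.urandom(12)  # 96-bit nonce, as recommended for AES-GCM
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    # AES-GCM appends an authentication tag, which also gives us tamper
    # detection when the batch is read back and decrypted.
    return nonce + ciphertext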

With regard to completeness at the point of synchronisation, the simplest path forward is that ElectrumSV just ensures it successfully copies all batched data to the cloud storage. Beyond that, we can ensure some level of redundancy by allowing the user to specify multiple destination locations for backups to go to; a rough sketch of what such a destination might look like in code follows the list below.

  • Somewhere on another drive locally to them.
  • To their local Dropbox, Google Drive, OneDrive or whatever other mirroring solution they use. The mirroring itself is not our concern, so we’d consider this a local storage option. I doubt it gives us any way to know if and when the data had been copied completely to the mapped remote storage. This is perhaps the least appealing option.
  • Via an API with an access key, to one of the major cloud storage solutions. We wouldn’t want to be integrating support for multiple services — I am a big believer in limiting what we have to maintain to what we actually use ourselves, and not turning into some open source dumping ground for support for obscure personal preferences of whoever wants to contribute code. This is preferable in the sense that we can know for sure that batches have been “committed” to the remote storage.
  • We could always write the batches to the blockchain as well, as a form of on-chain storage that is ElectrumSV-specific. This would be something that would definitely not be in the initial implementation, and maybe we wouldn’t ever do it.
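For illustration, a destination abstraction along these lines might be enough to cover the local options; the interface and method names are my own placeholders, not an existing ElectrumSV API:

import os
from typing import Protocol

class BackupDestination(Protocol):
    """Hypothetical interface each destination option would implement; the
    method names are placeholders, not an existing ElectrumSV API."""

    def store_batch(self, name: str, data: bytes) -> None: ...
    def batch_exists(self, name: str) -> bool: ...

class LocalDriveDestination:
    """The simplest case: a directory on another local drive, or a folder
    mirrored by Dropbox, Google Drive or OneDrive."""

    def __init__(self, directory: str) -> None:
        self._directory = directory

    def store_batch(self, name: str, data: bytes) -> None:
        with open(os.path.join(self._directory, name), "wb") as f:
            f.write(data)

    def batch_exists(self, name: str) -> bool:
        return os.path.exists(os.path.join(self._directory, name))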

There are several things we might want to do for house-keeping of backup data.

Consistency

We should at the least offer to do periodic or ongoing consistency checking. This would check that all expected files were present in the backup and that there was no bit-rot or tampering. For API-accessed cloud storage, like Azure Blob Storage or S3, maybe it is enough to just ensure files are present and that the advertised hash from the API is the same.

If the storage is not on media that has hashes, then we may have to read it all and hash it ourselves. This may have bandwidth costs, depending on the cloud storage. Even if the user’s storage does have hashes in its queryable metadata, we may still wish to offer the option to explicitly have ElectrumSV hash and verify the data, to be sure there is at least no bit-rot.
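A rough sketch of that check, written against a hypothetical destination interface with placeholder method names:

import hashlib
from typing import Dict, Optional

def verify_backup(manifest: Dict[str, str], destination) -> Dict[str, str]:
    """Check a locally kept manifest of {batch name: expected SHA-256 hex
    digest} against the remote copies. `destination` is assumed to expose
    `advertised_hash(name)` (returning None if the storage has no queryable
    hash) and `read_batch(name)`; both are placeholder names."""
    problems = {}
    for name, expected in manifest.items():
        try:
            advertised: Optional[str] = destination.advertised_hash(name)
        except FileNotFoundError:
            problems[name] = "missing"
            continue
        if advertised is not None:
            if advertised != expected:
                problems[name] = "advertised hash mismatch"
            continue
        # No queryable hash, so read the data back and hash it ourselves.
        data = destination.read_batch(name)
        if hashlib.sha256(data).hexdigest() != expected:
            problems[name] = "content hash mismatch"
    return problems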

Consolidation

As the number of batches copied to the backup storage adds up, so does the time it takes to restore that data. At some point we may wish to consider how to consolidate the batches, reducing both the mass of the stored data and the work required to do a restoration. This however isn’t a priority, at least for the initial implementation.
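If we did get to it, consolidation could be as simple as replaying the recorded batches into a fresh snapshot database and pruning the batches that snapshot replaces. A purely illustrative sketch:

import sqlite3
from typing import Iterable, List, Tuple

def consolidate(batches: Iterable[List[Tuple[str, list]]],
                snapshot_path: str) -> str:
    """Replay every recorded batch of (sql, params) entries into a fresh
    database, producing a single snapshot that can replace the batches it
    covers on the remote storage. Purely illustrative."""
    connection = sqlite3.connect(snapshot_path)
    for batch in batches:          # batches in the order they were written
        for sql, params in batch:
            connection.execute(sql, params)
    connection.commit()
    connection.close()
    return snapshot_path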

Final thoughts

If you have any suggestions about the above, please let me know. This is a little further down in the backlog of tasks and I am not quite ready to commit to it, partly because there are other things that need to be done, and partly because I need to be sure exactly how it is going to be implemented.

Written by

ElectrumSV developer
