If you’re developing a Bitcoin application, and you need to build your own transactions and broadcast them, you should ideally find some kind of wallet library with good coin management. Unfortunately, we’re in the early days, and you’re likely on what software developers call the bleeding edge. This means you might very well need to suck it up and do it yourself.
This is not a set of best practices, but maybe it is better than what you are currently doing. Or it puts you on the right path. If you have any suggestions for what could be done better that might be added to this document, please let me know.
I do not know what you are working on. Maybe it is state-of-the-art IPv6-based P2P payments, or more likely you’re just spending from a wallet that has funds in it and broadcasting those transactions yourself. It does not matter. There is one thing you need to know: your UTXOs.
A UTXO (or unspent transaction output) is also referred to as a coin. This does not mean that it is a whole Bitcoin, but rather that it is a spendable value in a wallet and can be considered to be a single unit. Much like each coin in your pocket is a single unit, whether a five cent coin or fifty cent coin. Each transaction output that pays to the user’s wallet in some way, perhaps to a public key, a public key hash or some other way, is a new coin or UTXO in the wallet. It is now part of the available funds in a wallet, until the wallet spends it in a transaction input. And that’s basically what a wallet is, an application that manages keys and funds, and tracks available unspent transaction outputs and allows the user to spend them.
Let’s look at the case of a team that is building an application that is writing data on chain. Who knows what this data is, it doesn’t matter. It’s valuable to people and they’re paying the team to do it for some reason. So the team has to manage funds in a wallet effectively. But it’s not one that’s intended for everyday people to use to manage their funds; rather, it just manages user funds for the team and spends them to put the user’s data on chain.
A user’s funds
For each user they generate a master private key, from which the keys that hold the user’s funds are derived. Generally this is done with BIP32 derivation paths following the BIP44 standard, which gives a wallet a receiving derivation path and a change derivation path. Incoming funds from other people go to public keys generated from the receiving path, and the wallet’s own change goes to public keys generated from the change path.
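As a minimal sketch of what those two paths look like, the following builds BIP44-style path strings and converts them into the child indexes a BIP32 library would consume. The helper names are hypothetical, and the coin type is left as a parameter because it depends on which wallets you need to interoperate with (236 is used in the example on the assumption that it is the SLIP-0044 registration for Bitcoin SV).

```python
HARDENED = 0x80000000  # BIP32 marks hardened children by setting the top bit

RECEIVING = 0  # BIP44 "external" chain, for funds coming from other people
CHANGE = 1     # BIP44 "internal" chain, for the wallet's own change


def bip44_path(coin_type: int, account: int, change: int, index: int) -> str:
    """Return a path of the form m/44'/coin_type'/account'/change/index."""
    if change not in (RECEIVING, CHANGE):
        raise ValueError("change must be 0 (receiving) or 1 (change)")
    return f"m/44'/{coin_type}'/{account}'/{change}/{index}"


def path_to_indexes(path: str) -> list:
    """Convert a textual path into the list of numeric child indexes."""
    parts = path.split("/")
    if parts[0] != "m":
        raise ValueError("path must start at the master key 'm'")
    indexes = []
    for part in parts[1:]:
        if part.endswith("'"):
            indexes.append(int(part[:-1]) + HARDENED)
        else:
            indexes.append(int(part))
    return indexes


# Sixth change key of the first account, assuming coin type 236.
print(bip44_path(236, 0, CHANGE, 5))
```

The actual key derivation is left to whatever BIP32 implementation you already use; this only shows how the receiving and change sequences are addressed.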
As the team is not doing anything particularly challenging, all of their transactions will be P2PKH (pay to public key hash). An address is just an encoded form of a public key hash. But forget addresses; they are something you’ll use for display or debugging purposes, or when some limited library only takes addresses and you have to transform your public keys into addresses to use it.
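To illustrate why an address is only a display form, here is a small sketch of Base58Check encoding applied to a 20-byte public key hash. It assumes you already have the hash (the RIPEMD-160 of the SHA-256 of the public key, which is not computed here) and uses version byte 0x00, the mainnet P2PKH prefix. The function names are illustrative only.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check(version: int, payload: bytes) -> str:
    """Encode version byte + payload + 4-byte double-SHA256 checksum."""
    data = bytes([version]) + payload
    checksum = hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]
    raw = data + checksum
    number = int.from_bytes(raw, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = ALPHABET[remainder] + encoded
    # Each leading zero byte is represented by a leading '1'.
    leading_zeros = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def p2pkh_address(pubkey_hash: bytes) -> str:
    """Display form of a 20-byte public key hash on mainnet."""
    if len(pubkey_hash) != 20:
        raise ValueError("a public key hash is 20 bytes")
    return base58check(0x00, pubkey_hash)
```

Nothing in the wallet’s coin management needs this; the public key hash is what ends up in the output script.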
In this application there will be two types of transaction: the funding transaction, and the data transactions that spend those funds and return change. Either the user or the service will fund the wallet, and as the wallet creates data transactions their only cost will be the miner’s fees. But it is rarely the case that the wallet will have coins/UTXOs of just the right size to pay the miner’s fee, so most data transactions will have a change output back to the wallet.
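A rough sketch of the arithmetic for such a data transaction follows. It assumes the data output carries no value, that every input and the change output are P2PKH, and it uses commonly quoted approximate sizes (about 148 bytes per input, 34 bytes per output, 10 bytes of overhead). The function name and the default fee rate of one satoshi per byte are assumptions for illustration.

```python
import math

P2PKH_INPUT_SIZE = 148   # approximate bytes per P2PKH input
P2PKH_OUTPUT_SIZE = 34   # approximate bytes per P2PKH output
TX_OVERHEAD = 10         # approximate version, counts and locktime


def plan_data_tx(input_values, data_output_size, fee_rate=1.0):
    """Return (fee, change) in satoshis for a data transaction.

    input_values: satoshi values of the UTXOs being spent.
    data_output_size: serialized size in bytes of the zero-value data output.
    """
    size = (TX_OVERHEAD
            + len(input_values) * P2PKH_INPUT_SIZE
            + data_output_size
            + P2PKH_OUTPUT_SIZE)          # the change output
    fee = math.ceil(size * fee_rate)
    change = sum(input_values) - fee
    if change < 0:
        raise ValueError("inputs do not cover the fee")
    return fee, change


# One 1000 satoshi coin paying for a 100 byte data output.
fee, change = plan_data_tx([1000], 100)
```

Whatever is not returned as change is what the miner receives, which is why the change output matters even though no payment is being made to anyone.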
The team’s service knows about all these transactions. It can extract, store and index the UTXOs. When it spends a UTXO as an input to a transaction, it can mark that UTXO as spent and it can add the change UTXO to the index as a new unspent entry. There is no reason why it can’t have a readily accessible balance for the user’s wallet with almost no work, either as a real-time tally or through a simple database query that sums the unspent entries.
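One way this bookkeeping might look, sketched with SQLite from the Python standard library. The table layout, column names and function names are all hypothetical; the point is only that marking inputs as spent and adding the change output happen together, and that the balance is a single query.

```python
import sqlite3


def open_index():
    """Create an in-memory UTXO index (a real service would use a file)."""
    db = sqlite3.connect(":memory:")
    db.execute("""
        CREATE TABLE utxos (
            txid     TEXT    NOT NULL,
            vout     INTEGER NOT NULL,
            value    INTEGER NOT NULL,   -- satoshis
            spent_by TEXT,               -- NULL while unspent
            PRIMARY KEY (txid, vout)
        )""")
    return db


def add_utxo(db, txid, vout, value):
    with db:
        db.execute("INSERT INTO utxos (txid, vout, value) VALUES (?, ?, ?)",
                   (txid, vout, value))


def record_spend(db, spent, spending_txid, change_vout=None, change_value=0):
    """Mark the (txid, vout) pairs in `spent` as used and index the change."""
    with db:  # commits on success, rolls back if anything raises
        for txid, vout in spent:
            cursor = db.execute(
                "UPDATE utxos SET spent_by = ? "
                "WHERE txid = ? AND vout = ? AND spent_by IS NULL",
                (spending_txid, txid, vout))
            if cursor.rowcount != 1:
                raise ValueError(f"{txid}:{vout} is unknown or already spent")
        if change_value > 0:
            db.execute("INSERT INTO utxos (txid, vout, value) VALUES (?, ?, ?)",
                       (spending_txid, change_vout, change_value))


def balance(db):
    row = db.execute(
        "SELECT COALESCE(SUM(value), 0) FROM utxos WHERE spent_by IS NULL"
    ).fetchone()
    return row[0]
```

Because the update only matches rows where `spent_by` is still NULL, an attempt to spend the same coin twice fails and leaves the index unchanged.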
A user’s wallet may have thousands of UTXOs, and this is okay. If this is a problem, you’re probably doing it wrong. You can segregate them into buckets by size range, and pick from the buckets to put together the amount you need. Most databases can index over expressions, so you can even have an index per bucket and do some fancy coin selection in your queries. You probably even want to continually pick larger values and split them as you go, in order to maintain the number of available UTXOs in any given bucket.
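The bucket idea can be sketched in plain Python before committing to a database layout. The bucket boundaries below are arbitrary placeholders, and the selection rule (take the smallest single coin that covers the amount, otherwise combine coins largest first) is just one simple possibility, not a recommendation.

```python
import bisect
from collections import defaultdict

# Hypothetical bucket boundaries in satoshis; tune these to your usage.
BUCKET_BOUNDS = [1_000, 10_000, 100_000, 1_000_000]


def bucket_of(value):
    """Index of the size range a coin value falls into."""
    return bisect.bisect_right(BUCKET_BOUNDS, value)


def fill_buckets(values):
    buckets = defaultdict(list)
    for value in values:
        buckets[bucket_of(value)].append(value)
    return buckets


def pick(buckets, target):
    """Remove and return coin values that together cover `target`."""
    # First look for a single coin, starting in the bucket the target fits in.
    for b in range(bucket_of(target), len(BUCKET_BOUNDS) + 1):
        candidates = [v for v in buckets.get(b, []) if v >= target]
        if candidates:
            coin = min(candidates)
            buckets[b].remove(coin)
            return [coin]
    # No single coin is large enough: combine coins, largest first.
    everything = sorted((v for vs in buckets.values() for v in vs), reverse=True)
    chosen, total = [], 0
    for value in everything:
        chosen.append(value)
        total += value
        if total >= target:
            break
    else:
        raise ValueError("insufficient funds")
    for value in chosen:
        buckets[bucket_of(value)].remove(value)
    return chosen
```

In a database the same `bucket_of` calculation would be the expression you index on, so that a query for "any unspent coin in bucket 2" does not have to scan the whole table.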
You’re effectively making a wallet for your application. Here are some concepts that you’ll need to know about and work around, or that may be useful should you flesh out the wallet later on.
Unconfirmed transaction chain limit
As part of their rule as the central controlling party of Bitcoin Core, the Bitcoin Core developers added two different limits, both of which commonly affect developers building Bitcoin SV applications. The first is that there cannot be a chain of more than 25 unconfirmed transactions. The second is that the total size of those 25 unconfirmed transactions cannot be larger than 101 KB.
These limits are enforced by node settings, so they are not part of the protocol in the sense that they are not consensus rules. But even if you can detect whether a node has them disabled, you have to hope that the node’s path through the network to miners is through other nodes that also have them disabled, in addition to the miners themselves having them disabled! Like most such things that people seem to think are somehow better for “not being consensus rules”, in effect they may as well be, because normal developers have to deal with them all the time.
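A wallet can keep track of how deep into an unconfirmed chain each of its coins sits, and refuse to build a transaction that would cross either limit. The sketch below is one conservative way to do that: it assumes the count includes the new transaction itself, takes 101 KB to mean 101,000 bytes, and sums over inputs, which over-counts when inputs share ancestors. All names are hypothetical.

```python
from dataclasses import dataclass

MAX_CHAIN_COUNT = 25        # unconfirmed transactions in a chain
MAX_CHAIN_SIZE = 101_000    # bytes, assuming "101 KB" means 101,000


@dataclass(frozen=True)
class ChainState:
    """Unconfirmed history behind a coin; zero for a confirmed coin."""
    count: int = 0
    size: int = 0


CONFIRMED = ChainState()


def extend_chain(input_states, tx_size):
    """State of the outputs of a new transaction spending these inputs."""
    count = 1 + sum(state.count for state in input_states)
    size = tx_size + sum(state.size for state in input_states)
    return ChainState(count, size)


def within_limits(state):
    return state.count <= MAX_CHAIN_COUNT and state.size <= MAX_CHAIN_SIZE


# Repeatedly spending the change of the previous 300 byte transaction.
state = CONFIRMED
for _ in range(25):
    state = extend_chain([state], 300)
print(within_limits(state), within_limits(extend_chain([state], 300)))
```

When a block confirms the chain, the coins it produced go back to `CONFIRMED` and can be spent freely again.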
It’s not that bad though. It just means you have to do an initial split and fill those buckets up, and from then on your wallet’s normal coin management should ensure the buckets stay full and the UTXOs are maintained in appropriate sizes from confirmed transactions.
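The initial split is just a transaction with many outputs back to the wallet. As a sketch, assuming you have decided how many coins of each size you want and already know the fee, the output values could be planned like this (the function name and target sizes are made up for the example):

```python
def plan_split(funding, targets, fee):
    """Plan the output values of a split transaction.

    funding: satoshis available in the coin(s) being split.
    targets: list of (coin_value, count) pairs, largest first.
    fee: satoshis set aside for the miner.
    Returns (output_values, leftover); the leftover would go to one
    more change output or be carried into the next split.
    """
    if funding <= fee:
        raise ValueError("funding does not cover the fee")
    remaining = funding - fee
    outputs = []
    for coin_value, count in targets:
        for _ in range(count):
            if remaining < coin_value:
                break  # cannot afford more coins of this size
            outputs.append(coin_value)
            remaining -= coin_value
    return outputs, remaining


# Five 10,000 satoshi coins and twenty 1,000 satoshi coins.
outputs, leftover = plan_split(100_000, [(10_000, 5), (1_000, 20)], 1_000)
```

Each of those outputs should pay to a different key from the change path, so that the split does not itself become an instance of key reuse.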
There is a chance that every node between you and the miner happens to be configured with these limits changed. It might be that the best way to proceed is to still have a sufficient pool of spendable confirmed UTXOs on hand to begin with, and if you encounter one of these limits, fall back to a more reserved way of operating that works with them in place.
The miner’s fee for a transaction is what remains after the total value of the outputs is subtracted from the total value of the inputs. At this time, it is loosely expected to be something like one satoshi per byte of the final transaction. But once a UTXO goes below a certain value, it becomes more expensive to spend than it is worth in and of itself. This is not dust per se, but it is kind of useless. So you can take all the dust, and the UTXOs that cost money to use, and just ignore them. Put them in the bucket you never look at.
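To make that threshold concrete, here is a small sketch that sorts coins into usable ones and ones to ignore. It assumes P2PKH inputs of roughly 148 bytes and the loosely expected rate of one satoshi per byte; both numbers are approximations and the function names are hypothetical.

```python
import math

P2PKH_INPUT_SIZE = 148  # approximate bytes added by spending one P2PKH coin


def spend_cost(fee_rate=1.0, input_size=P2PKH_INPUT_SIZE):
    """Satoshis of fee it takes to include one coin as an input."""
    return math.ceil(input_size * fee_rate)


def partition(values, fee_rate=1.0):
    """Split coin values into (usable, ignored) at the given fee rate."""
    cost = spend_cost(fee_rate)
    usable = [v for v in values if v > cost]
    ignored = [v for v in values if v <= cost]
    return usable, ignored


usable, ignored = partition([100, 148, 149, 5000])
```

The ignored list is the bucket you never look at. If fee rates drop, rerunning the partition with the new rate will move some of those coins back into the usable set.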
In theory, come February 2020, lots of things will change. And it’s possible that there will be some recourse for wallet owners to redeem all these UTXOs, along the lines of it being in the miner’s interest to reduce the size of the UTXO set. I’m not sure what this will be, or how it will work, but it makes a lot of sense for this to happen.
Addresses are a concept that Bitcoin Core effectively created because they did not understand Bitcoin. So they decided that everything had to be an address, and people needed to be prevented from not using addresses by all the changes they made to the protocol. So the options were P2SH (pay to script hash), P2PKH (pay to public key hash), P2PK (pay to public key) and bare multisig. How they let bare multi-signature stay around, I don’t know, but no-one really used it and most wallets likely did multi-signature through P2SH. But P2SH is going to be disabled for new transactions sometime soon, so most multi-signature will be in a non-addressable form once again.
Of course, in the context of the theoretical data publishing service described above that maintains user wallets, chances are they’ll be only doing P2PKH transactions. But still, addresses should generally be considered a display convenience for debugging purposes. You should understand enough to get by without building around them as a fundamental building block.
BIP32, BIP44 and whitepaper 42
Most wallets, until recently at least, have used BIP32 derivation paths. I have mentioned BIP44 above. You can sum up BIP44 as a way to turn a master private key into a magical place where you can derive separate deterministic sequences of keys for any compatible alt-coin. It also subdivides the derivation path for any given alt-coin into those change and receiving sequences.
Deterministic address generation is useful, and having a standard like BIP32 that provides a common way to do it has been invaluable. But as people say, when all you have is a hammer, everything looks like a nail. Add to your tool belt by reading whitepaper 42, and think about some of the possibilities it opens up.
Let’s get into BIP32 some more. What BIP32 gave us was a way to recover all the transactions a wallet ever sent and received. The way it did this was that a wallet could process the whole blockchain and look at every transaction, and see if the inputs or outputs made use of a key from the wallet. This might be in the form of a public key either in a P2PK transaction output or a bare multi-signature transaction, in the form of the hash of that public key in a P2PKH script, or in the form of a hash of a script in a P2SH script.
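The P2PKH part of that scan can be sketched as follows. It assumes the wallet has already derived a set of public key hashes from its BIP32 paths, and that transactions arrive as simple tuples of a transaction id and a list of (value, script) outputs. That input shape and the function names are assumptions made for the example, and matching inputs, P2PK, bare multi-signature and P2SH are left out.

```python
def p2pkh_hash(script: bytes):
    """Return the 20-byte hash if `script` is a standard P2PKH output."""
    # OP_DUP OP_HASH160 <push 20 bytes> ... OP_EQUALVERIFY OP_CHECKSIG
    if (len(script) == 25
            and script[:3] == b"\x76\xa9\x14"
            and script[23:] == b"\x88\xac"):
        return script[3:23]
    return None


def scan_outputs(transactions, wallet_hashes):
    """Find outputs paying to any of the wallet's public key hashes.

    transactions: iterable of (txid, [(value, script_bytes), ...]).
    wallet_hashes: set of 20-byte public key hashes derived by the wallet.
    Returns a list of (txid, vout, value).
    """
    found = []
    for txid, outputs in transactions:
        for vout, (value, script) in enumerate(outputs):
            if p2pkh_hash(script) in wallet_hashes:
                found.append((txid, vout, value))
    return found
```

Notice that the result is nothing more than transaction ids, output indexes and values, which is exactly the limitation of recovering a wallet this way.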
Note that there is no context for any of those recovered transactions. A wallet should know who its owner’s transactions were with. It should quarantine, and even refund, transactions that it does not know about. But every transaction in that recovered wallet is effectively unknown. Any link to Paymail addresses, or identities, is lost. The metadata that is not included in those transactions is valuable information, and perhaps even essential if the IRS comes calling and wants to audit.
BIP32 recovery is a thing of the past. And BIP44 was only ever a way to build out an eco-system of dark or alt coins (otherwise colloquially known as shit coins). In the future it is very likely that wallet metadata will either be persisted on-chain in an easily accessible way, or be exportable as CSV, JSON, or other common formats.
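As a small illustration of the kind of record that cannot be recovered from the blockchain alone, here is a sketch of per-transaction metadata exported as JSON. The field names are invented for the example and do not follow any existing wallet’s export format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TxMetadata:
    txid: str          # which transaction this record describes
    counterparty: str  # who it was with, for example a Paymail handle
    description: str   # what it was for
    timestamp: str     # when the wallet created or received it (ISO 8601)


def export_metadata(records):
    """Serialize metadata records to a JSON document."""
    return json.dumps([asdict(record) for record in records],
                      indent=2, sort_keys=True)


def import_metadata(document):
    """Rebuild the records from a previously exported document."""
    return [TxMetadata(**entry) for entry in json.loads(document)]
```

A wallet restored from a seed plus a document like this knows what each transaction was, rather than just that it happened.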
One example of metadata might be from use of the techniques in whitepaper 42. This involves taking a key and making use of a shared secret to derive further key sequences. If your wallet does build on this, or has to interact with other services that do, it will need to associate that shared secret with the relevant parent key, in order to be able to identify any keys generated from it. This does not fit in the BIP32 restoration model, and requires wallet metadata management. Token access chains from whitepaper 591 are another tool that will also require metadata.
Coin management has privacy implications, and at best for the purposes of a data publishing service, this might just be not reusing addresses (which you should be doing anyway) and perhaps ensuring that when multiple transactions from the same UTXO pool, or wallet, are being broadcast, they are broadcast in a way that does not associate them. For a known service that publishes identifiable data, perhaps using a known protocol, it might not even matter if the transactions or spending can be linked. But it might matter if the person funding the wallet can be linked to the use of the service.
There are useful things to know about what people can infer from transactions where UTXOs are spent without taking this into account. One concept is called merge avoidance, and is worth looking into to help users avoid any of their UTXOs being associated with any of their other UTXOs, like what your boss pays you and funds received from a co-worker. Another is Benford’s law, which also relates to the same concern, but focuses on choosing sizes of spent coins so that they blend in as random usage.
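Benford’s law itself is simple to state: in many naturally occurring sets of numbers, the leading digit d appears with probability log10(1 + 1/d), so about 30% of values start with a 1 and under 5% start with a 9. The sketch below computes that expected distribution and the observed share of leading digits in a set of coin values, so the two can be compared. How a wallet should act on the comparison is covered in the referenced article and is not attempted here; the function names are hypothetical.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit_share(values):
    """Observed share of each leading digit among positive integer values."""
    digits = Counter(int(str(v)[0]) for v in values if v > 0)
    total = sum(digits.values())
    if total == 0:
        return {}
    return {d: digits.get(d, 0) / total for d in range(1, 10)}


expected = benford_expected()
observed = leading_digit_share([100, 150, 2000, 9])
```

A wallet whose coin values are all round numbers like 10,000 or 50,000 satoshis will show a leading digit share very far from the expected one.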
Some other stuff
Key reuse (formerly known as address reuse)
There is no reason to reuse keys (or addresses). If you are reusing them, then you should consider why you are reusing them. If I saw key reuse, I would assume it was from a bug or sloppy coding. Needless to say, ElectrumSV does not do a good enough job of preventing people from reusing addresses, but that’s something I hope to change as we refactor more and more of our code.
How Bitcoin Core does it
There is a paper on coin selection that was written with the assistance of Bitcoin Core developers. What you have to keep in mind is that Bitcoin Core does not want people to actually use the blockchain, so many of the decisions they make follow from that and don’t make a lot of sense or don’t quite align with what is best on the Bitcoin SV blockchain.
One of these things is that their coin management often centers around reducing the number of UTXOs, something that makes it harder to maintain privacy. It is one thing to help wallets cash in all the useless UTXOs, but it is another to have a general policy of reducing UTXOs. We actually want more usable UTXOs on hand in a Bitcoin SV wallet, because it means that when you want to spend them, you have them. And of course, it gives the wallet options for ensuring privacy for its users.
Wallets for users
Most of this article has focused on the simpler case of wallets used by a service that is publishing data. A user’s personal wallet should be a lot more complicated. It will likely have to engage in privacy best practices. It will likely have to support SPV. It will likely have to support Paymail. It will likely have to integrate with the metanet in some way. It will likely have to do many many things, and much like coin management there’s little usable information or documentation about how to do those things. This, again, is the bleeding edge.
The core implementation of coin management, outside of privacy enhancing techniques, probably requires a decent amount of work, but given a decent understanding of the Bitcoin protocol it shouldn’t be too much of a challenge. Add in privacy and other user-facing features beyond those required for a data publishing service, like SPV and P2P support, and it will get more complicated.
I don’t think there is any existing reference which brings together all these concepts, and for the most part people who already know about them likely linked any necessary concepts together themselves. Perhaps this article can be a basic starting reference point for people who might be interested in some pointers on the subject of coin management.
Maybe someone will write a book that covers these topics and more, and get O’Reilly or a similar publisher to sell it. Despite the bagging it receives for containing a lot of questionable Bitcoin Core related information, Andreas Antonopoulos’ Mastering Bitcoin is still the only real reference out there. I’ve seen newly employed Bitcoin SV developers walking around with copies, I think having been given them to use on beginning work. If I didn’t have two jobs already, I’d write “The Bitcoin Developer’s Reference Manual” myself. You can find out more about making a preorder here!
- BIP32, Hierarchical Deterministic Wallets, https://github.com/bitcoin-sv-specs/bips/blob/master/bip-0032.mediawiki.
- BIP44, Multi-Account Hierarchy for Deterministic Wallets, https://github.com/bitcoin/bips/blob/master/bip-0044.mediawiki.
- Indexes On Expressions, https://www.sqlite.org/expridx.html, https://www.postgresql.org/docs/current/indexes-expressional.html.
- WP #0042, Secret Value Distributions v2, https://electrumsv.io/download/papers/WP0042%20Secret%20Value%20Distribution%20V02.docx.
- WP #0591, Token access chains and linked keys, https://electrumsv.io/download/papers/591%20-%20Token%20access%20chains%20and%20linked%20keys%20v1.3.pdf.
- Merge avoidance, https://medium.com/@octskyward/merge-avoidance-7f95a386692f.
- Benford’s wallet, https://nchain.com/en/blog/benfords-wallet/.
- An Evaluation of Coin Selection Strategies, http://murch.one/wp-content/uploads/2016/11/erhardt2016coinselection.pdf.