Goal: massively reduce the overhead from transaction relay. Non-goal: get the absolute lowest transaction relay latency.

Each node elects one or two preferred relayouts: one if the node has fewer inbound connections than outbound, two otherwise. A preferred relayout is preferentially a long-uptime outbound peer that claims to be a full node, claims to relay transactions, and passes various heuristic health checks (e.g. it notifies us of current blocks). When a node receives a transaction, via whatever means, it immediately announces it to its preferred relayouts via the current INV process. This is the primary mechanism of transaction relay in the network. The low fanout conserves bandwidth, but broken nodes and cycles in the relayout graph will cause transactions to get stuck in cliques and not reach the whole network.

Every transaction in the mempool gets an 8-byte short-id, defined using the hash of the last block as a salt:

  short_id = siphash(blockhash || wtxid)

The keyed short-id allows very short identifiers while making collisions harder (though not impossible) to compute. All transactions in the top 12MB of the mempool get their short-ids added to one of 16 BCH sets; the set is selected by the first 4 bits of the short-id. We will assume we are using BCH sets of size t=1000. BCH codes are completely linear: adding a new entry X involves computing X^2, X^4, X^6, ..., X^(2t), and adding each term to its corresponding one of the t accumulators (in GF(2^m); GF(2^64) in our case).

At random times, a node will pick a peer and request a sync operation. It sends its current tip block hash and how many changes (adds/deletes) were made to the top 12MB of its mempool since the last sync with that peer, excluding deletes resulting from accepting a new block. The peer checks that the tips match, and rejects the sync otherwise. If they match, it computes the number of changes to the top 12MB of its own mempool, and then the target transaction count C = abs(them - us) + Q * min(them, us), where Q is a coefficient between 0 and 1 to be discussed in a moment.

If C is less than the number of terms in our BCH code (1000), we sum all 16 sets and send the first C terms to the peer, along with our estimate of C and which sets were used. If C is over the limit, we recursively split the set until C/parts + ceil(log2(parts)) < 1000, and then send parts set messages with C/parts + ceil(log2(parts)) terms each.

The far end then attempts to construct the set difference(s). If it succeeds, it INVs any transactions it has that you don't, and getdatas any that you have that it doesn't (this needs a short-id getdata/inv). If it fails, and you sent fewer than 1000 terms, it asks you for more terms (double the last amount). When it succeeds, it computes what Q should have been and asks you to update Q. Newly received transactions are relayed via the primary mechanism.

Questions:

The size 1000 in the BCH code was picked at random. I don't know how much data would normally be required. Making it too big just wastes CPU cycles computing higher powers of the ID. It's possible to incrementally compute higher terms just by remembering the highest power computed so far for each txn; this way nodes could adapt their BCH order on the fly (e.g. if they get peers with high Cs they could go and compute the higher-order terms).

The Q scheme above is, uh, guesswork. I have no idea how to predict the inconsistency in the network.
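For concreteness, below is a minimal, self-contained sketch (not the actual design and not PinSketch's API) of what the per-set accumulator update could look like. It uses the odd-power syndromes X, X^3, ..., X^(2t-1) that PinSketch stores (in characteristic 2 the even-power syndromes are redundant, since s_2k = s_k^2). The GF(2^64) reduction polynomial, the struct/function names, and the choice of the top nibble for bucketing are illustrative assumptions only:

// Illustrative sketch only: incremental update of one BCH/syndrome set over
// GF(2^64), assuming PinSketch-style odd syndromes X, X^3, ..., X^(2t-1).
#include <cstdint>
#include <vector>

// Carry-less ("Russian peasant") multiplication in GF(2^64). The reduction
// polynomial x^64 + x^4 + x^3 + x + 1 (low bits 0x1B) is a common low-weight
// irreducible choice; a real protocol would have to fix this explicitly.
uint64_t gf64_mul(uint64_t a, uint64_t b) {
    uint64_t r = 0;
    while (b) {
        if (b & 1) r ^= a;
        b >>= 1;
        uint64_t carry = a & 0x8000000000000000ULL;
        a <<= 1;
        if (carry) a ^= 0x1B;
    }
    return r;
}

struct SyndromeSet {
    std::vector<uint64_t> acc;                 // t accumulators
    explicit SyndromeSet(size_t t) : acc(t, 0) {}

    // Adding element x xors x^1, x^3, x^5, ..., x^(2t-1) into the accumulators.
    // Removal is the identical operation (xor is its own inverse), which keeps
    // the sets cheap to maintain across mempool adds/deletes.
    void Add(uint64_t x) {
        uint64_t xsq = gf64_mul(x, x);
        uint64_t pow = x;
        for (uint64_t& a : acc) {
            a ^= pow;
            pow = gf64_mul(pow, xsq);          // step from x^(2i-1) to x^(2i+1)
        }
    }
};

// Bucketing as described above: short_id = siphash(blockhash||wtxid), and the
// first 4 bits (taken here as the top nibble) pick which of the 16 sets it
// goes into.
void AddShortId(std::vector<SyndromeSet>& sets16, uint64_t short_id) {
    sets16[short_id >> 60].Add(short_id);
}

The linearity is the attractive part: tracking a mempool add or delete costs O(t) field multiplications per transaction, and summing sets for a sync is just xor of the accumulators.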
The addition of ceil(log2(parts)) is a rough guess at the efficiency loss from splitting the sets. When I am not feeling lazy I can compute the actual expected loss analytically.

A suitable BCH implementation in C++ using NTL is here: https://github.com/hbhdytf/PinSketch (it's GPL, since NTL is GPL). Presumably we can get sipa drunk and get him to implement the required stuff for a non-GPL version; whatever magic NTL is doing for polynomial factoring is beyond me. PinSketch seems reasonably fast: constructing a BCH set for 40,000 elements takes 20 seconds on my slow laptop, and decoding a set difference of size 1000 takes 5.3 seconds. The way PinSketch works, construction is no slower done incrementally. I believe there is a faster way to perform a batch update.

This all assumes the ID is based on the latest block, so nodes with differing latest blocks can't sync. This could be improved, at a bit of memory cost, by keeping BCH data for some older blocks and using the latest block in common.

As in my efficient block relay writeup, recoverable IDs could be used.

Some more complex stuff involving cycle detection could probably be done to help break cycles in the relayout graph. E.g. support carrying a small set of "rider pubkey hashes" on INVs, and with some low probability replace one of the riders with one of your own. When you see one of yours come back to you, prove knowledge of its discrete log to tell the peer not to use you as a relayout. This would help keep Q down, and make the reconcile data quite small.
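Going back to the reconciliation sizing above, here is a minimal sketch of the arithmetic: C = abs(them - us) + Q * min(them, us), split until C/parts + ceil(log2(parts)) < 1000. The doubling split, the rounding, and the names are assumptions for illustration, not part of the proposal:

// Illustrative only: pick the estimated difference C and how to split it into
// per-message BCH term counts, following the rough rule described above.
#include <algorithm>
#include <cmath>
#include <cstdint>

struct SyncPlan {
    uint32_t parts;           // number of set messages to send
    uint32_t terms_per_part;  // BCH terms carried by each message
};

SyncPlan PlanSync(uint32_t our_changes, uint32_t their_changes, double q,
                  uint32_t t = 1000) {
    int64_t diff = (int64_t)their_changes - (int64_t)our_changes;
    uint32_t c = (uint32_t)(diff < 0 ? -diff : diff) +
                 (uint32_t)std::ceil(q * std::min(our_changes, their_changes));
    // Split (here simply by doubling) until each part, padded by the guessed
    // ceil(log2(parts)) overhead, fits inside the t-term code.
    uint32_t parts = 1;
    auto terms = [&]() {
        return (c + parts - 1) / parts + (uint32_t)std::ceil(std::log2(parts));
    };
    while (terms() >= t) parts *= 2;
    return SyncPlan{parts, terms()};
}

Whether doubling is the right way to split, and how Q should be adapted per peer, are among the open questions noted above.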