Goal: get a block to a peer with minimum bandwidth, and without much (or any) more per-peer state. This would reduce bandwidth spikes around new blocks and make Bitcoin more usable on limited connections, as well as limit the worst-case fairness impacts of larger blocks.

Non-goal: minimization of latency; though reducing bandwidth will often reduce latency a lot.

Let's define a short txid: a short transaction ID is siphash(blockhash || wtxid) truncated to 64 bits. The purpose of the short ID is to be short but strongly collision resistant in an adversarial setting. Siphash() could be siphash or any other very fast (for small inputs) cryptographic hash.

Let's define a block sketch: a block sketch is the coinbase txn of a block, the count of txn, then N_txn 8-byte short-ids. Given a block, anyone would compute the same block sketch.

A block shows up, and every node that receives a whole block sends its peers a block header and a hash of the block sketch for the block. A client collects these advertisements, picks a peer, and asks it for the block sketch. The peer responds with the sketch (or times out); if the sketch is inconsistent with the advertisement, ban the peer.

The client uses the blockhash to generate short-IDs from its own mempool (potentially only the top N transactions' worth, if its mempool is big -- that's up to the client; it can start doing this as soon as the header shows up, hiding the work in the sketch-request round trip).

The client determines which transactions from the sketch it already has in its mempool, and which tx indexes (offsets in the block) it is missing. The client requests the missing indexes from peers that have offered the same block sketch, using differential coding and variable-length integers. E.g. say the client wants indexes 2,4,5,9,13: it would send a string of varints 2,2,1,4,4 (maybe bitcoin varints or a more efficient scheme like sipa's uint).

Server(s) return transactions; short-IDs are checked, and peers are banned if they don't match.
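The short-ID computation and the differential index coding above can be sketched as follows. This is illustrative only: blake2b with an 8-byte digest stands in for the keyed siphash (the text says any very fast cryptographic hash works), and the function names are mine.

```python
import hashlib

def short_id(blockhash: bytes, wtxid: bytes) -> bytes:
    # Stand-in for siphash(blockhash || wtxid) truncated to 64 bits;
    # blake2b with an 8-byte digest keeps this dependency-free.
    return hashlib.blake2b(blockhash + wtxid, digest_size=8).digest()

def encode_missing(indexes):
    """Differentially code sorted missing block indexes as deltas."""
    deltas, prev = [], 0
    for i in indexes:
        deltas.append(i - prev)
        prev = i
    return deltas

def decode_missing(deltas):
    """Recover absolute indexes from the delta string."""
    indexes, pos = [], 0
    for d in deltas:
        pos += d
        indexes.append(pos)
    return indexes
```

Each delta would then be serialized as a varint on the wire; for the small deltas typical here, that is usually a single byte per missing transaction.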
(This is why the sketch is hashed in the advertisement.) When all transactions are received, the block is reconstructed. If that fails, fall back to an ordinary block request.

Conclusion: block transfer with around 9-10 bytes of overhead per transaction using the mempool, in 2.5 round-trip times; supporting parallel fetch of the bulk of the block from multiple peers (reducing load spikes).

Potential rocket-science improvements:

(1) Use of "recoverable" short IDs.

Imagine AES-CBC-ing a transaction twice, once front to back, once back to front, with a static well-known key. Now all the bytes of the ciphertext depend strongly on the whole of the transaction. You can use the first N bytes of this data as an N-byte transaction ID: send someone the ID, then when they request the txn, send the rest. This avoids the N bytes of 'overhead' that arise when using a hash as an ID.

Use of very short IDs (under 20 bytes or so) requires a Fiat-Shamir trick to provide collision resistance; that's what the above short-ID scheme does. One could change the AES key for this, but recomputing IDs for the whole mempool when a block shows up would be computationally prohibitive. Instead we can compute a non-salted 256-bit recoverable ID for each txn, and at block time compute an 8-byte salted recoverable ID of that ID, which is fast. The responses to gather requests then send the len-8 remaining bytes of transaction data.

Now we have maybe only 2-3 bytes per transaction of overhead in the case where the transaction wasn't already in the mempool... but still 9-10 bytes for the common case of known ones.

Cost: code, cpu time, and additional memory to store txn in recoverable form.

(2) Use of set reconciliation in the block sketch (UPDATED)

Set reconciliation schemes allow you to transmit the difference between your set A and an unknown set B with data proportional to the size of the symmetric difference. There are several choices of scheme, such as IBLT or polynomial-code based.
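The two-pass "recoverable ID" construction can be illustrated with a toy byte-level cipher standing in for AES-CBC. This is emphatically not secure crypto, only a demonstration of the dependency structure: a forward chained pass makes each output byte depend on all earlier input, then a backward pass makes the first bytes depend on the entire transaction, while the whole transform stays invertible (so the full transaction is recoverable from the transformed bytes).

```python
def _sbox(x):
    return (x * 167 + 89) % 256            # toy invertible byte map

def _inv(x):
    return ((x - 89) * 23) % 256           # 167 * 23 = 3841 == 1 (mod 256)

def enc(data: bytes, iv1=0x5C, iv2=0x36) -> bytes:
    # Forward pass: each byte depends on all earlier plaintext bytes.
    c, prev = [], iv1
    for b in data:
        prev = _sbox(b ^ prev)
        c.append(prev)
    # Backward pass: now each byte also depends on all later bytes, so
    # the first N bytes of the result can serve as an N-byte ID.
    d, prev = [0] * len(c), iv2
    for i in range(len(c) - 1, -1, -1):
        prev = _sbox(c[i] ^ prev)
        d[i] = prev
    return bytes(d)

def dec(data: bytes, iv1=0x5C, iv2=0x36) -> bytes:
    # Undo the backward pass, then the forward pass.
    d = list(data)
    c = [_inv(d[i]) ^ (d[i + 1] if i + 1 < len(d) else iv2)
         for i in range(len(d))]
    p, prev = [], iv1
    for ci in c:
        p.append(_inv(ci) ^ prev)
        prev = ci
    return bytes(p)
```

With this shape, flipping even the last byte of the input changes every byte of the output, including the ID prefix; a real implementation would use an actual block cipher with a well-known key as the text describes.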
IBLT is inefficient in that it requires some small constant multiple more data than the size of the difference. IBLT also doesn't work well incrementally: if an IBLT fails to decode, you cannot just send a bit more data to make it succeed. But IBLT decodes faster. Since we don't care about the lowest possible latency, and because polynomial schemes are fast for the kind of small difference counts expected here, I propose we use polynomial and not IBLT. There is a lovely BCH-code-based implementation at https://github.com/hbhdytf/PinSketch so I'll assume it for now.

To use this, let's define a compressed sketch. When a new block comes in, after checking POW but before accepting the block, you do the same tx selection that CreateNewBlock would do, and record the list of tx that were in the block but weren't in the CNB result; call these the surprises.

If the surprises are more than X% of the block, or the block has under 100 transactions remaining after the surprises, just send a non-compressed sketch.

When sending a compressed sketch, send the header/coinbase/txn as in a normal sketch. Then the block size, and the sum of the tx sizes of the surprises. Then send a varint count of the surprises, and the surprises in the same order they were in the block, just like in the non-compressed sketch. Then send 100 terms of a BCH set reconciliation code.

The receiver will take the list of surprises and the size of the block, and will run CreateNewBlock to generate a list of transactions for a block of the actual size, but with the added constraint that the target size is reduced by the sum of the surprise sizes, and the sizes of known surprises are skipped when enforcing that limit. [This is a best guess at everything not in the surprise set.] The receiver then extracts a candidate set from the mempool, adding in all the signaled surprises, and attempts to recover the unknown values. Then take the recovered set, sort it, and remove all the signaled surprises.
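The surprise selection and the compressed-vs-plain decision above can be sketched like this. The txid sets here stand in for the real block and CreateNewBlock outputs, and the default threshold value for 'X%' is an assumption of mine, not something the text fixes.

```python
def plan_sketch(block_txids, cnb_txids, max_surprise_pct=5.0):
    """Pick a sketch type and compute the surprise list.

    block_txids: txids in block order (coinbase excluded).
    cnb_txids:   txids the local CreateNewBlock pass selected.
    max_surprise_pct: the 'X%' threshold from the text (value assumed).
    """
    cnb = set(cnb_txids)
    # Surprises: block txns the CNB prediction missed, kept in block order
    # (they are transmitted explicitly, in that order).
    surprises = [t for t in block_txids if t not in cnb]
    remaining = len(block_txids) - len(surprises)
    too_many = len(surprises) > len(block_txids) * max_surprise_pct / 100.0
    too_small = remaining < 100
    return ("plain" if too_many or too_small else "compressed", surprises)
```

The point of pulling the surprises out is that everything left over is well predicted by CNB, so the 100-term BCH reconciliation code only has to cover a small residual difference.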
Code the permutation of the block by efficiently coding an integer in the range [0, remaining_txn+1), where the last value is special-cased to mean "take the next entry from the surprises" (as they were sent in block order).

If it's successful, the hash of the full sketch will match and things can continue as normal. If it fails, the client could request additional sketch data, which would accumulate with the last... though I'm not sure it wouldn't just be better to fall back to an uncompressed sketch there.

The goal with the 'surprises' is to take the transactions which are very likely to be set differences out of the reconciliation process, in order to keep it small and fast. For rusty's collected mempool data, this looks like it would make the sketches <2048 bytes for basically all blocks, compared with about 8kb for uncompressed. Recoverable IDs can be combined with this for overall overheads of only a couple of bytes per transaction.

Cost: code, cpu time; performance depends more strongly on block/mempool agreement.

(3) Round-trip elimination.

Since the sketch is small (especially with compression in use), you could ask the peer that was first to offer and successfully deliver a sketch last time to just send you the sketch next time, instead of just the offer. This would save one round trip. Not part of the main proposal, because it optimizes latency; but it would likely 'pay' for the compressed-sketch decoding cpu time.

(4) ~Elimination of time-outs via FEC coding.

Taking the protocol above (with or without the (1)/(2) enhancements): servers can use forward error correction (e.g. an RS code) to take their sketch and code it into (say) 32 parts, such that any 16 of them are enough to recover the sketch. The sketch hash is merkleized over all 32 parts. Clients can then round-robin request one part at a time from offering peers. If the client finds itself waiting on a single peer, it requests one more part from a different peer; then whichever responds first lets the reconstruction complete.
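As a toy illustration of the FEC idea, and not the RS code the text actually suggests, here is the trivial (3,2) case: split the sketch into two halves plus an XOR parity part, so any two of the three parts reconstruct the whole. A real deployment would use Reed-Solomon with, say, 32 parts where any 16 suffice, carry the padded length explicitly, and merkleize the sketch hash over the parts.

```python
def split_3_2(sketch: bytes):
    """Split data into 3 parts such that any 2 reconstruct it."""
    if len(sketch) % 2:
        sketch += b"\x00"          # pad to an even length (length must
                                   # be signaled separately in practice)
    half = len(sketch) // 2
    a, b = sketch[:half], sketch[half:]
    parity = bytes(x ^ y for x, y in zip(a, b))
    return [a, b, parity]

def join_3_2(parts):
    """parts: list of (index, data) pairs, any two of indexes 0, 1, 2."""
    got = dict(parts)
    if 0 in got and 1 in got:
        a, b = got[0], got[1]
    elif 0 in got:
        a = got[0]
        b = bytes(x ^ y for x, y in zip(a, got[2]))   # b = a ^ parity
    else:
        b = got[1]
        a = bytes(x ^ y for x, y in zip(b, got[2]))   # a = b ^ parity
    return a + b
```

The client behavior then falls out naturally: fetch part 0 from one peer and part 1 from another, and if one of them stalls, ask a third peer for the parity part; whichever two arrive first complete the sketch.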
Something similar could be done for the gather requests. E.g. make a request for the same missing txn to all peers, and ask each to give you a 1/Nth share of it (along with a hash of the shares).

Cost: code (though sipa has a nice RS code that would be suitable for this); additional latency in the common case (but much less latency in the adversarial case); cpu time in the block-offering server to code up the sketch, though the sketches are small. FEC encoding for gather requests could be a CPU DoS for servers. This also has some mild incompatibilities with the above compressed-sketch scheme, since the surprise set would differ from peer to peer.
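One plausible (assumed, not specified) shape for the 1/Nth shares is plain chunking with a per-share hash list, so a share fetched from any peer can be verified on arrival without waiting for the others:

```python
import hashlib

def make_shares(tx: bytes, n: int):
    """Cut a transaction into n contiguous shares plus a hash list so
    a share from any peer can be checked independently."""
    size = -(-len(tx) // n)                  # ceiling division
    shares = [tx[i * size:(i + 1) * size] for i in range(n)]
    hashes = [hashlib.sha256(s).digest() for s in shares]
    return shares, hashes

def check_share(share: bytes, idx: int, hashes) -> bool:
    """Verify one received share against the advertised hash list."""
    return hashlib.sha256(share).digest() == hashes[idx]
```

Unlike the FEC variant, plain chunking has no redundancy, so a stalled peer still stalls its share; it does, however, avoid the server-side encoding cost noted above.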