Goals: Transfer a new block as fast as possible (as near as possible to the one-way delay) in real use almost all of the time, and tolerate adversarial peers. Non-goals: minimizing bandwidth, except insofar as not wasting bandwidth reduces latency.

This derives from some of the older work at https://en.bitcoin.it/wiki/User:Gmaxwell/block_network_coding

Over the normal p2p protocol, peers offer to stream blocks. Nodes pick some peers they want to stream blocks from based on whichever peers offer headers fastest, selecting, e.g., the 6 fastest peers to receive streamed blocks from. The rest of this assumes that negotiation has already happened.

A client wanting to receive streamed blocks listens on some UDP port. During negotiation a challenge-response verifies that UDP actually makes it through. While doing the setup the client assigns the server an index, and perhaps a requested level of redundancy.

Let's define a short txid: a short transaction ID is siphash(blockhash || wtxid) truncated to 64 bits. The purpose of the short ID is to be short but strongly collision resistant in an adversarial setting. Siphash() could be siphash or any other cryptographic hash that is very fast on small inputs.

Let's define an extended short txid: an extended short ID is a short ID prefixed by the transaction's length, efficiently encoded in a self-delimiting way.

Let's define an extended block sketch: an extended block sketch contains the block header, the coinbase txn, and extended short txids for every transaction in the block.
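To make the definitions concrete, here is a minimal Python sketch of the ID and sketch construction. It is illustrative only: the note allows siphash or any other fast hash, so a truncated blake2b from the standard library stands in for siphash, the varint used for the length prefix is just one possible self-delimiting encoding, and the framing of the header and coinbase is an assumption.

import hashlib

def short_txid(block_hash, wtxid):
    # siphash(blockhash || wtxid) truncated to 64 bits, per the definition
    # above.  blake2b with an 8-byte digest is a stand-in for siphash so
    # this runs with only the standard library.
    return hashlib.blake2b(block_hash + wtxid, digest_size=8).digest()

def encode_length(n):
    # Self-delimiting little-endian base-128 varint; one possible choice
    # for the "efficiently encoded" length prefix.
    out = bytearray()
    while True:
        out.append((n & 0x7f) | (0x80 if n > 0x7f else 0x00))
        n >>= 7
        if not n:
            return bytes(out)

def extended_short_id(block_hash, wtxid, tx_len):
    # Extended short ID: transaction length followed by the 64-bit short ID.
    return encode_length(tx_len) + short_txid(block_hash, wtxid)

def extended_block_sketch(header, coinbase_tx, block_hash, txs):
    # txs: (wtxid, serialized length) pairs for every non-coinbase
    # transaction, in block order.
    ids = b"".join(extended_short_id(block_hash, w, l) for w, l in txs)
    return header + encode_length(len(coinbase_tx)) + coinbase_tx + ids

Each extended short ID is 9 or 10 bytes for typical transaction sizes, so a block with a few thousand transactions yields a sketch of a few tens of kilobytes before redundancy is added.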
The rest of this protocol uses forward error correction. I recommend the LDPC-staircase scheme from OpenFEC [http://openfec.org/]. It is not as space efficient as an RS code, but it is very fast to encode and decode, which is critical to keep the CPU from becoming a bottleneck.

When a node successfully receives the data for a block, it immediately computes an extended sketch and codes it with OpenFEC to add a dozen packets of redundancy. The node immediately sends the original sketch packets (the systematic part of the code) in random order, along with a few randomly selected redundant packets (the amount of redundancy sent could be signaled by the peer), round robin to all peers that asked in advance for block data. (Note: if a peer starts streaming data to you, stop streaming data to it.)

Concurrently, the serving node takes the block data and FEC codes it to (say) 8 times its original size. The first 1/8th is the systematic part (the data unmodified) and is ignored. The remaining redundant data is split into 7 groups, and each client is sent its requested group (e.g. selected by the index it assigned during setup). If a client wants more than one group's worth of redundancy, additional random packets are picked from the 7th group.

Concurrently, the receiving node accepts sketch data and attempts to recover a sketch. When it recovers a sketch, it uses the lengths and its mempool to drop the txn it already knows into a template block and packetizes that; these packets act as a virtual additional source of FEC-coded block data. They are combined with the incoming FEC-coded block data, and as soon as enough data is available the block is recovered.

Transmission happens in the time of a one-way delay plus CPU overheads, which experimentally look like they would be no more than a couple of milliseconds.

Potential rocket-science improvements:

(1) As in efficient.block.xfer.txt, recoverable IDs could be used to reduce overheads.

(2) As in efficient.block.xfer.txt, an IBLT could be used for the sketch to reduce its size, assuming a large mempool match. The risk of decode failure suggests not; polynomial-based set reconciliation is right out due to the delay, and even IBLT decoding is likely too slow. I think this is much less attractive for latency mitigation.

(3) The mempool's usefulness is degraded by the fact that a packet may be 95% known from the mempool but have 5% of its data missing, and the FEC scheme can only deal with whole-packet erasures. If transaction boundaries are totally oblivious to packet boundaries, a missing transaction will often kill two packets instead of one. Since the sketch includes the lengths, the FEC-transferred block could have padding added so that fewer transactions cross packet boundaries. A simple scheme that takes the sequence of lengths and deterministically under-fills a packet whenever doing so would waste two dozen bytes or less would add about 1% overhead and probably make the mempool much more useful (a small sketch of this appears at the end of this note). Probably worth doing, but some experimentation would tell how much overhead is justified.

(4) If the miner generates the FEC-coded data for the block and includes a hash-tree commitment to it, the coded data could be forwarded before being decoded (or even before enough data to decode it has been received). This would avoid serializing receipt and decode across multiple hops, so the whole network could decode in roughly the time it takes the data to reach the first couple of peers.

Where my handwaving is worst
-----

I'm assuming you may combine data from multiple peers in the FEC decode. This is great unless there is a byzantine peer. To avoid this, the peers' sketches could commit to hash trees of the coded data (and data under different hash roots would not be intermixed), but this might be too slow. Otherwise you're stuck with trying, and if you fail, falling back to a single peer. Once you do have the block you can figure out which peer gave you bad data and ban them.

Interaction with weak-blocks
-------

If in the future weak blocks are used, a high-latency scheme would be used for the weak blocks themselves (e.g. efficient.block.xfer.txt with some modifications to differentially code against earlier weak blocks). The reason is that miners would not base their actual block on the latest weak block, but on one old enough to be well propagated, so there should be no rush in forwarding a weak block, and there minimizing bandwidth is critical. This streaming protocol would only be used to transmit the differences against a cited weak block: the sketch would carry a list of removed tx offsets, and the protocol above would be used for the additions, if any. If the block is the same as a weak block, the transfer reduces to saying so, providing the header and coinbase txn, plus some redundant packets to guard against packet loss.
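As a concrete illustration of the under-filling idea in (3), here is a small Python sketch. Both sides can run it over the length sequence carried by the block sketch and arrive at the same padding; the packet size and waste threshold below are illustrative values, not part of the proposal.

PACKET_SIZE = 1200   # FEC symbol / packet payload size (illustrative)
MAX_WASTE = 24       # "two dozen bytes or less" of padding per boundary

def padded_tx_offsets(tx_lengths):
    # Place each transaction in the FEC-coded block stream, padding the
    # current packet out to its boundary whenever a transaction would
    # otherwise straddle two packets and the padding needed is small.
    # Returns the per-transaction byte offsets and the total padded size.
    offsets = []
    pos = 0
    for length in tx_lengths:
        room = PACKET_SIZE - (pos % PACKET_SIZE)
        if length > room and room <= MAX_WASTE:
            pos += room   # pad; this tx starts on a fresh packet boundary
        offsets.append(pos)
        pos += length
    return offsets, pos

Because the receiver learns the same lengths from the sketch, it computes identical offsets, drops the transactions it already has into a template at those offsets, and can supply every fully-known packet itself instead of waiting for the network to deliver it.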