I2Speed Technical Overview
This is a high-level overview of our work; detailed descriptions
for interested developers will follow.
- Math libraries on the Mac and on Linux x86_64
were built with the latest GMP (6.2.0) and the latest compilers.
Support for the Skylake and Nehalem architectures was added on
the Mac. 32-bit support and debug information were stripped
from all Mac components.
- No more fallback to short encryption keys on
quad-core ARM64 or 8-core ARM32 systems.
- Pre-generation of random numbers and crypto
keys is now spread evenly over time. This ensures that buffers
do not run dry under high load, which would otherwise delay
packets while values are generated on the fly. Local traffic is
processed 25% faster under high load. There is even an
auto-turbo mode geared towards massively parallel ARM boards.
- Fast lock-free operation for many queues,
caches, and frequently used random number types.
- All message processing now runs in parallel
threads; it was mostly sequential before.
- Numerous tweaks to the NTCP transport shave
a few milliseconds off latency.
- Several improvements to the multi-threaded
design. Threads, along with their context switches, were
eliminated where easily possible. Thread count scales with
platform, processor count, and processor type. On Linux,
threads spend over 50% less time waiting on the run queue.
- The UDP send strategy uses parallel resends and
a send window tuned for speed, aimed at minimizing message
loss on failing connections. It tolerates high packet loss
(up to 5% on fast connections, more on slower ones), which
helps on WiFi and at high bandwidth. Top speeds average
over 200 KBps on a single connection, outperforming TCP.
- Excessive locking within the UDP transport was
resolved, noticeably cutting CPU usage.
- Far fewer iterations over UDP messages in
transit: over its lifetime, the average message is now visited
fewer than 10 times, versus more than 1,000 times before. This
clearly reduces CPU usage.
- Every message received over UDP is reliably
ACKed twice. This saves 1% of inbound bandwidth by bringing
received duplicates down to near zero. The real win, though, is
avoiding connection stalls on the remote end. If the network is
correctly configured, UDP accounts for around 80% of inbound
traffic.
- Frequent calls to the system clock were
eliminated, clearly reducing CPU usage.
- Tunnel building now chooses among twice as many
network peers. This is useful if you run hundreds of local
tunnels, as we do.
- Lossless compression for graphics cuts memory
usage by 28% on average.
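The even pre-generation of random numbers and keys described above can be sketched as a bounded pool that a periodic timer tops up a few values at a time. This is a minimal sketch, not I2Speed's code: the class `RandomPool`, its `capacity` and `per_tick` parameters, and the use of Python's `secrets.token_bytes` as the generator are all illustrative assumptions. The `misses` counter records the slow path the feature is meant to avoid, where a caller finds the buffer empty and has to generate a value on the fly.

```python
import secrets
from collections import deque


class RandomPool:
    """Hypothetical pool of pre-generated random values (e.g. key material).

    refill_tick() is meant to be called at a fixed interval by a timer.
    Each call adds at most `per_tick` values, so generation cost is spread
    evenly over time instead of arriving in bursts.
    """

    def __init__(self, capacity: int = 64, per_tick: int = 4, nbytes: int = 32):
        self.capacity = capacity
        self.per_tick = per_tick
        self.nbytes = nbytes
        self.pool = deque()  # pre-generated bytes values, oldest first
        self.misses = 0      # times a caller found the pool empty

    def refill_tick(self) -> int:
        """Generate up to per_tick values without exceeding capacity."""
        added = 0
        while added < self.per_tick and len(self.pool) < self.capacity:
            self.pool.append(secrets.token_bytes(self.nbytes))
            added += 1
        return added

    def take(self) -> bytes:
        """Return a pre-generated value, or generate one on the fly if empty."""
        if self.pool:
            return self.pool.popleft()
        self.misses += 1  # buffer ran dry: the slow path under high load
        return secrets.token_bytes(self.nbytes)
```

Tuning `per_tick` against the timer interval sets the steady-state generation rate; it only needs to exceed the average consumption rate for `misses` to stay at zero.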
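Scaling the thread count to the platform could look like the following sizing rule. It is a hypothetical sketch: `worker_count`, the per-core factor and the minimum/maximum bounds are assumptions chosen for illustration, and this version accounts only for processor count, not processor type.

```python
import os
from typing import Optional


def worker_count(cpus: Optional[int] = None, per_core: float = 1.0,
                 minimum: int = 2, maximum: int = 16) -> int:
    """Hypothetical sizing rule: scale worker threads with the core count.

    per_core, minimum and maximum are illustrative values, not I2Speed's.
    """
    if cpus is None:
        cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    # Clamp the scaled count into [minimum, maximum].
    return max(minimum, min(maximum, int(cpus * per_core)))
```

A pool sized this way could be created with `concurrent.futures.ThreadPoolExecutor(max_workers=worker_count())`. Processor type could be folded in by choosing a different `per_core` per architecture.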
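One common way to avoid rescanning every in-flight UDP message on each timer tick is to keep messages in a min-heap ordered by their next resend time, so a tick only touches messages that are actually due. The sketch below assumes that approach; the source does not say how I2Speed achieved its reduction in visits, and `ResendScheduler`, `OutboundMessage` and the fixed `timeout` are hypothetical names and simplifications (a real transport adapts its timeout).

```python
import heapq
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OutboundMessage:
    msg_id: int          # assumed unique; used as the heap tie-breaker
    visits: int = 0      # how often the scheduler touched this message
    acked: bool = False


class ResendScheduler:
    """Hypothetical resend scheduler keyed by the next resend deadline.

    Each tick pops only messages whose deadline has passed, so a message
    is visited once per resend instead of once per tick while in flight.
    """

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.heap: List[Tuple[float, int, OutboundMessage]] = []

    def add(self, msg: OutboundMessage, now: float) -> None:
        heapq.heappush(self.heap, (now + self.timeout, msg.msg_id, msg))

    def due(self, now: float) -> List[OutboundMessage]:
        """Return the messages to resend now and reschedule them."""
        resend = []
        while self.heap and self.heap[0][0] <= now:
            _, _, msg = heapq.heappop(self.heap)
            msg.visits += 1
            if msg.acked:
                continue  # ACKed in the meantime: dropped lazily, no rescan
            resend.append(msg)
        for msg in resend:
            self.add(msg, now)
        return resend
```

ACKed messages are never searched for and removed; they simply fall out the next time their deadline comes up, which keeps the ACK path cheap.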
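The double ACK described above can be sketched on the receiving side as a queue in which every received message ID stays until it has been included in two outgoing ACK batches. This is an assumed design for illustration only: `AckTracker`, `repeats` and `max_per_batch` are hypothetical, and the sketch says nothing about how I2P's UDP transport actually encodes ACKs on the wire.

```python
from collections import deque
from typing import List


class AckTracker:
    """Hypothetical receiver-side ACK bookkeeping.

    Every received message ID is included in the next `repeats` outgoing
    ACK batches (repeats=2 means each message is ACKed twice).
    """

    def __init__(self, repeats: int = 2, max_per_batch: int = 64):
        self.repeats = repeats
        self.max_per_batch = max_per_batch
        self.pending = deque()  # entries: (msg_id, ACK sends still owed)

    def received(self, msg_id: int) -> None:
        self.pending.append((msg_id, self.repeats))

    def next_batch(self) -> List[int]:
        """Return the message IDs to ACK in the next outgoing packet."""
        batch = []
        requeue = []
        while self.pending and len(batch) < self.max_per_batch:
            msg_id, left = self.pending.popleft()
            batch.append(msg_id)
            if left > 1:
                requeue.append((msg_id, left - 1))  # still owes another ACK
        self.pending.extend(requeue)
        return batch
```

Sending each ACK in two separate packets means a single lost ACK packet no longer forces the sender to retransmit, which is how duplicates on the receiving side fall toward zero.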
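Eliminating frequent system clock calls is often done with a coarse cached clock: one background thread reads the clock at a fixed resolution, and hot code paths read the cached value. The sketch below assumes that technique; `CachedClock`, its `resolution` default and the injectable `source` are hypothetical, and the trade-off is that readers may see a time up to `resolution` seconds stale.

```python
import threading
import time
from typing import Callable, Optional


class CachedClock:
    """Hypothetical coarse clock: a background thread refreshes `now`
    every `resolution` seconds, and readers use the cached attribute
    instead of calling the system clock themselves.
    """

    def __init__(self, resolution: float = 0.01,
                 source: Callable[[], float] = time.monotonic):
        self.resolution = resolution
        self.source = source
        self.now = source()  # cached value read by hot code paths
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def tick(self) -> None:
        """Refresh the cached value from the underlying clock."""
        self.now = self.source()

    def _run(self) -> None:
        # Event.wait() returns False on timeout and True once stop() is called.
        while not self._stop.wait(self.resolution):
            self.tick()

    def start(self) -> None:
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

Readers only load `clock.now`; under CPython's GIL a single attribute read or write is not torn, so this sketch needs no lock on the read path.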
updated 200316