I2Speed Development Overview


Here we describe our changes for curious developers, with hints on where to look for the enhancements. There are many changes to the multithreaded design, as well as functional enhancements. This page will take some days to complete.

All changes have been designed so that no bottleneck arises under high load on quad-core or octa-core ARMv7 systems.

Multi-threading: Every OS imposes penalties on an app like I2P that runs many threads for short periods each. Java thread pools make things worse by dispatching work round robin, which defeats any cache or processor reuse. macOS and Windows have power-saving features that stretch the sleep time between thread activations (we have seen a 1 ms sleep take more than a minute on a Mac). Linux imposes a waiting penalty on thread reactivations. So our design goal is simple: do not give up the CPU unless you must.
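The rule "do not give up the CPU unless you must" can be illustrated with a spin-then-block worker. This is a minimal sketch under our own assumptions, not I2Speed's code: the class name SpinThenBlockWorker and its spinLimit parameter are hypothetical. The worker busy-polls its queue for a bounded number of iterations, hinting the CPU with Thread.onSpinWait() (Java 9+). It falls back to a blocking take() only when nothing arrives.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical worker that spins briefly for new work before it lets the OS park it. */
final class SpinThenBlockWorker implements Runnable {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final int spinLimit; // hypothetical tuning knob, not an I2P setting
    private volatile boolean running = true;

    SpinThenBlockWorker(int spinLimit) {
        this.spinLimit = spinLimit;
    }

    void submit(Runnable task) {
        queue.add(task);
    }

    void stop() {
        running = false;
        queue.add(() -> { }); // wake the worker in case it is blocked in take()
    }

    @Override
    public void run() {
        while (running) {
            Runnable task = null;
            // Phase 1: busy-poll, keeping the CPU and a warm cache.
            for (int i = 0; i < spinLimit && task == null; i++) {
                task = queue.poll();
                if (task == null) {
                    Thread.onSpinWait();
                }
            }
            // Phase 2: nothing arrived while spinning, so block until work appears.
            if (task == null) {
                try {
                    task = queue.take();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
            task.run();
        }
    }
}
```

The spin limit trades a little CPU time for fewer OS-level wakeups. If work tends to arrive before the spin budget is used up, the thread never sleeps at all.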

| Code area | Description | Thread activations/sec saved |
|---|---|---|
| NTCP Reader | Eliminated. Transport-layer decryption happens directly in the Pumper, and the individual messages are then processed on the job queue. | 100s |
| NTCP Pumper, NTCP Reader, NTCP Writer, NTCP SendFinisher | Combine 13 threads into one. Optimize the execution order. Execute afterSend and delayed close in tight loops. Do not activate SimpleTimer. | 1000s |
| Tunnel GW Pumper | Eliminate the thread pool and run the code in parallel directly on the calling threads. Intelligently batch traffic for the same tunnel. Always flush a tunnel if no further traffic is queued concurrently while pumping. Keep delayed flushes through SimpleTimer, with its 100 ms wait for local traffic, to an absolute minimum. | 100s |
| Job Queue System, Client message pool | Run outbound local traffic as parallelized priority jobs; standard jobs run after them. A standard job activates a thread only if no outbound job is running, and it is raised to priority after 2 ms if no priority job comes in. Timed jobs piggyback only on threads that would otherwise go idle. | 0 |
| UDP inbound | Eliminate the PacketHandler thread: run its code directly from the UDP Receiver, forwarding into the central job queue. Eliminate the UDP Message Receiver. | 1000 |
| SystemVersion.java | Tune the thread count and other properties to the OS and hardware capabilities. | varies |
| Fragment Handler | Handle fragment expiration inline while the tunnel is live. Reuse expiration timers. Restrict SimpleTimer use to the end of the tunnel lifetime where possible. Message reception is completely lockless. Debugged. | 100s |
| SimpleTimer2.java | Use setRemoveOnCancelPolicy(true) so that canceled timer tasks are removed from the queue at once, instead of still firing when their delay expires. | see previous |
| UDP outbound | Bypass the UDP Sender when no bandwidth limits are currently in effect. | 100s |
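The effect of setRemoveOnCancelPolicy(true), mentioned above for SimpleTimer2.java, can be shown with the JDK's ScheduledThreadPoolExecutor alone. This demo is our own illustration and does not reproduce SimpleTimer2. The class and method names are hypothetical. It schedules one task far in the future, cancels it, and counts what is left in the work queue.

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo of the JDK remove-on-cancel policy. */
final class RemoveOnCancelDemo {
    /** Schedules one far-future task, cancels it and returns the number of tasks still queued. */
    static int queuedAfterCancel(boolean removeOnCancel) {
        ScheduledThreadPoolExecutor timer = new ScheduledThreadPoolExecutor(1);
        timer.setRemoveOnCancelPolicy(removeOnCancel);
        try {
            ScheduledFuture<?> future = timer.schedule(() -> { }, 10, TimeUnit.MINUTES);
            future.cancel(false);
            return timer.getQueue().size();
        } finally {
            timer.shutdownNow();
        }
    }
}
```

With the policy off, queuedAfterCancel(false) returns 1. The canceled task stays queued until its delay expires, and a pool thread then still wakes up just to discard it. With the policy on, queuedAfterCancel(true) returns 0, so that wakeup never happens.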


Functional enhancements

PRNG, DH / XDH / YK Precalc
Run all crypto generators often enough that their buffers are always full and no worker thread ever has to wait for precalculated values. The wait times between runs are calculated so that, on average, each run generates one item or slightly more.
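The refill rule above can be sketched as an adaptive pool. This is a hypothetical illustration, not I2Speed's generator code. PrecalcPool, refillOnce() and nextSleep() are our own names, and the generator is any Supplier. If a run produced n items during the last sleep, consumers took roughly n items per interval, so dividing the interval by n aims for about one item per run. A run that produced nothing doubles the interval. Both results are clamped to configured bounds.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.function.Supplier;

/** Hypothetical precalc pool whose refill thread aims to produce about one item per run. */
final class PrecalcPool<T> {
    private final ArrayBlockingQueue<T> buffer;
    private final Supplier<T> generator; // stands in for an expensive generator, e.g. a DH key pair
    private final long minSleepMs;
    private final long maxSleepMs;
    private long sleepMs; // touched only by the refill thread

    PrecalcPool(int capacity, Supplier<T> generator, long minSleepMs, long maxSleepMs) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
        this.generator = generator;
        this.minSleepMs = minSleepMs;
        this.maxSleepMs = maxSleepMs;
        this.sleepMs = minSleepMs;
    }

    /** Consumers take a precomputed item and only compute inline if the buffer has run dry. */
    T take() {
        T item = buffer.poll();
        return item != null ? item : generator.get();
    }

    /** Tops the buffer up; assumes only the refill thread adds items, so offer() cannot fail. */
    int refillOnce() {
        int produced = 0;
        while (buffer.remainingCapacity() > 0) {
            buffer.offer(generator.get());
            produced++;
        }
        return produced;
    }

    /** n items produced per interval means the next interval should be about interval / n. */
    static long nextSleep(long sleepMs, int produced, long minMs, long maxMs) {
        long next = (produced == 0) ? sleepMs * 2 : sleepMs / produced;
        return Math.max(minMs, Math.min(maxMs, next));
    }

    /** Body of the refill thread. */
    void runLoop() {
        while (!Thread.currentThread().isInterrupted()) {
            int produced = refillOnce();
            sleepMs = nextSleep(sleepMs, produced, minSleepMs, maxSleepMs);
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and exit
                return;
            }
        }
    }
}
```

For example, a run that produced 4 items after a 100 ms sleep sets the next sleep to 25 ms. A run that produced nothing doubles the sleep to 200 ms.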
UDP transport
Complete overhaul, with fewer retransmissions and higher connection speeds. Contact us with detailed questions.
NTCP Transport
Removing several bottlenecks, together with the changes above, results in clearly higher throughput.
Profile Organizer
Doubled the maximum number of fast and high-capacity peers.
Tunnel Pool
Replaced random tunnel selection with round-robin selection.
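Round-robin selection can be kept lock-free with one atomic counter per pool. This is a minimal sketch. It assumes the pool hands over its current tunnels as a java.util.List, and RoundRobinSelector is a hypothetical name, not I2P's TunnelPool API.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin selector shared by all callers of a tunnel pool. */
final class RoundRobinSelector<T> {
    private final AtomicInteger counter = new AtomicInteger();

    /** Returns the next tunnel in rotation, or null if the pool is empty. */
    T select(List<T> tunnels) {
        if (tunnels.isEmpty()) {
            return null;
        }
        // getAndIncrement is lock-free; floorMod keeps the index valid after the counter wraps negative.
        int index = Math.floorMod(counter.getAndIncrement(), tunnels.size());
        return tunnels.get(index);
    }
}
```

Compared with random selection, consecutive requests are spread evenly over all tunnels. If tunnels are added or removed between calls, the rotation just shifts by a position, which seems harmless for load spreading.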
Clock.java, RouterClock.java
Access the system clock only about twice per millisecond on average, instead of 50+ times as before. Precision timing gets too much attention: I2P survives a 1000 ms heap dump without logging any error, so it does no harm if the clock is off by a few microseconds.
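One way to cut clock reads to roughly two per millisecond is a coarse clock. This is a sketch under our own assumptions, not the actual Clock.java or RouterClock.java change. CoarseClock and its 500 µs refresh interval are hypothetical. A single thread refreshes a cached timestamp, and every other caller reads the cached value instead of the system clock.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

/**
 * Hypothetical coarse clock: one daemon thread reads the system clock at a fixed
 * interval and publishes the value, so readers only load a volatile field.
 */
final class CoarseClock {
    private volatile long nowMs = System.currentTimeMillis();
    private volatile boolean running = true;
    private final long refreshNanos;
    private final Thread updater;

    CoarseClock(long refreshMicros) {
        this.refreshNanos = TimeUnit.MICROSECONDS.toNanos(refreshMicros);
        this.updater = new Thread(this::updateLoop, "coarse-clock");
        this.updater.setDaemon(true);
    }

    void start() {
        updater.start();
    }

    void stop() {
        running = false;
        LockSupport.unpark(updater); // cut the current park short so the thread exits promptly
    }

    /** Cheap read; may lag real time by about one refresh interval plus scheduling jitter. */
    long now() {
        return nowMs;
    }

    private void updateLoop() {
        while (running) {
            nowMs = System.currentTimeMillis();
            // parkNanos is only a hint: actual granularity depends on the OS timer.
            LockSupport.parkNanos(refreshNanos);
        }
    }
}
```

With new CoarseClock(500), the system clock is read at most about 2,000 times per second, no matter how many callers ask for the time. As long as the OS honors the interval, each reading is off by at most about half a millisecond. The price is one dedicated thread that wakes at every refresh.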
AsyncFortuna
Do not lock the random number generator (previously a major bottleneck). Retrieve random numbers lock-free, and use spinlocks to resolve concurrency.
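A spinlock around the generator state keeps the critical section short and avoids parking threads in the OS. The sketch below is hypothetical and is not AsyncFortuna. SpinLockedRandom is our own name, and java.util.SplittableRandom stands in for the generator state only to keep the example small. It is not cryptographically secure and must not replace a CSPRNG.

```java
import java.util.SplittableRandom;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical spinlock-guarded generator; waiting threads spin instead of being parked. */
final class SpinLockedRandom {
    private final AtomicBoolean locked = new AtomicBoolean(false);
    private final SplittableRandom rng; // non-cryptographic stand-in, NOT a CSPRNG; guarded by 'locked'

    SpinLockedRandom(long seed) {
        this.rng = new SplittableRandom(seed);
    }

    private void lock() {
        // Test-and-test-and-set: spin on a plain read, attempt the CAS only when the lock looks free.
        while (true) {
            while (locked.get()) {
                Thread.onSpinWait();
            }
            if (locked.compareAndSet(false, true)) {
                return;
            }
        }
    }

    private void unlock() {
        locked.set(false); // volatile write publishes the updated generator state
    }

    long nextLong() {
        lock();
        try {
            return rng.nextLong();
        } finally {
            unlock();
        }
    }
}
```

Spinning only pays off because the critical section is a few nanoseconds long. Long or blocking work inside the lock would waste the CPU time that spinning is meant to save.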
Queues and caches, various places
Changed to lock-free ring buffers where they were identified as bottlenecks. Spinlocks resolve the remaining concurrency.
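In the common case of one producer and one consumer, a lock-free ring buffer needs only two counters. This is a hypothetical sketch. SpscRingBuffer is our own name, and the text does not say which queue variants I2Speed uses. Only the producer advances tail and only the consumer advances head. Volatile reads and writes of those counters order the slot accesses, so no lock is needed.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical single-producer/single-consumer ring buffer without locks. */
final class SpscRingBuffer<E> {
    private final Object[] slots;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // next slot to read; written only by the consumer
    private final AtomicLong tail = new AtomicLong(); // next slot to write; written only by the producer

    SpscRingBuffer(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        slots = new Object[capacity];
        mask = capacity - 1;
    }

    /** Producer side: returns false if the buffer is full. */
    boolean offer(E e) {
        long t = tail.get();
        if (t - head.get() == slots.length) {
            return false;
        }
        slots[(int) (t & mask)] = e;
        tail.set(t + 1); // volatile store publishes the slot write to the consumer
        return true;
    }

    /** Consumer side: returns null if the buffer is empty. */
    @SuppressWarnings("unchecked")
    E poll() {
        long h = head.get();
        if (h == tail.get()) {
            return null;
        }
        int i = (int) (h & mask);
        E e = (E) slots[i];
        slots[i] = null; // drop the reference so the element can be garbage collected
        head.set(h + 1); // hands the slot back to the producer
        return e;
    }
}
```

With more than one producer or consumer, this design is no longer safe. That case needs compare-and-set on the counters or a spinlock.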

Enhancements to build system and support files

t.b.c.





See our technical overview.

updated 200416.
