We are calling time_now() a huge amount, and it is a major consumer of
CPU cycles, but we don't need it: most of the time the current event
loop time is enough.
We have a few cases where we're making an extra shared_ptr which we copy
into a lambda, which then results in an extra unnecessary refcount
decrement in the parent; this changes them to give an rvalue reference
to the lambda to avoid the extra incr/decr instead.
The one in Session::Pump is particularly noticeable and shows up in
profiling.
- Replace m_FlushWakeup with a call to the router's god mode pump
method. m_FlushWakeup apparently isn't enough to get things out, and
we can end up with incoming packets that don't get properly handled
right away without it.
- The shared_ptr around the ihophandler queues isn't needed and is just
adding a layer of obfuscation; instead just exchange the list directly
into the lambda.
- Use std::exchange rather than swap
- A couple other small code cleanups.
The TriggerPump just below this is *already* going to trigger a flush,
so the extra flush call here can't do anything useful (and in
particular, it won't clear up the queue *immediately*, which is what
this code looks like it was aimed at doing).
- Make the main PumpLL also pump hidden services, rather than using
separate wakers in each TunEndpoint. It seems there is some
interactions that just one or the other is not enough.
- Eliminate TunEndpoint send queue -- it isn't needed as we can just
send directly.
If something needs to wake up the event loop it should be using an
async, as we are now with PumpLL(); but we had various code triggering a
wakeup, expecting that PumpLL gets called on every wakeup, which isn't
true anymore.
We trigger a pump immediately, but this is racey because we add to our
plaintext data in a worker thread; if the worker thread runs after the
pump then it ends up leaving plaintext to be handled, but there's no
wakeup until the next one.
This was the cause of seeing a random +1s and bunching added to ping
responses sometimes: it wasn't until the *next* ping goes through the
network that the plaintext queue gets processed, at which point it
flushes the old one and often the new one together.
The fix here gets rid of the map of sessions needing wakeups and instead
adds an atomic flag to all of them to let us figure out which ones
need to be flushed.