- libsodium calls streamlined and moved away from stupid typedefs
- buffer handling taken away from buffer_t and towards ustrings and strings
- lots of stuff deleted
- team is working well
- re-implementing message handling in proper link_manager methods
- bumped version to latest main branch commit
- wired up callbacks to set RPC request stream on creation
- methods for I/O of control and data messages through link_manager
- llarp/router/router.hpp, route_poker, and platform code moved to libquic Address types
- implementing required methods in link_manager for connection establishment
- coming along nicely
- `::handle_message` is transposed: rather than the message calling the method and taking a reference to the router, the router should have a handle_message method and take a reference to the message
- `::EncodeBuffer` takes a string reference, to which the result of `::bt_encode()` is assigned
- routing messages and surrounding code
- shim code in place for iteration and optimization after deciding what to do with buffer, string handling, and subsequent function calls
Note: this is compilation-fixing only. Behavior fixes will come later, when
the earlier efforts on liblokinet are combined with the new wire protocol changes.
TODO:
- set up all the callbacks for libquic
- define control message requests, responses, commands
- plug new control messages into lokinet (path creation, network state, etc)
- plug connection state changes (established, failed, closed, etc.) into lokinet
- lots of cleanup and miscellanea
- oxen-logging updated to bump fmt version
- version bump oxen-logging to fix fmt version
- version bump oxen-mq to solve uniform distribution error
- misc errors introduced by above version bumps
- clang-format 14 -> 15
when setting libunbound's upstream dns, we need to not pass in the square brackets of an ipv6 address.
we also need udp handles to be able to have an ipv6 address for the local ip.
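
A minimal sketch of the bracket stripping described above. The helper name `strip_ipv6_brackets` is hypothetical and not taken from the lokinet source; it only illustrates turning "[::1]" into "::1" before the address is handed to the resolver.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper (not the project's actual function): returns the
// address without enclosing square brackets, e.g. "[::1]" -> "::1".
// Anything that is not fully bracketed is returned unchanged.
inline std::string strip_ipv6_brackets(std::string_view addr)
{
    if (addr.size() >= 2 && addr.front() == '[' && addr.back() == ']')
        addr = addr.substr(1, addr.size() - 2);
    return std::string{addr};
}
```

IPv4 addresses and unbracketed IPv6 addresses pass through untouched, so the helper can be applied unconditionally.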
in rpc client, contention on a null lock happened.
fix this by making the sending of pings always done in the logic
thread. this is done by wrapping the lambda we made with EventLoop::make_caller()
refactored config endpoint
- added rpc call script (contrib/omq-rpc.py)
- added new fxns to .ini config stuff
- added delete .ini file functionality to config endpoint
- added edge case control for config endpoint
add commented out line in clang-format for header reorg later
* Updated RpcServer Initialization and Logic
-- Moved all RPCServer initialization logic to rpcserver constructor
-- Fixed config logic, fxn binding to rpc address, fxn adding rpc cats
-- router hive failed CI/CD resulting from outdated reference to rpcBindAddr
-- ipc socket as default hidden from windows (for now)
Previously oxen-logging was erroneously hard-coded to use the target
"lokinet" for system logs. Obviously this is wrong for anything else
which uses oxen-logging and the system log. This changes our call to
add_sink to pass "lokinet" as the target rather than the config
filename, and updates oxen-logging to use that argument correctly.
Default & Required makes no sense: if we have a default it makes no
sense to make it required. The previous behaviour when this was
specified was to force an (uncommented) value in the config with the
value, but this was only used in the test suite.
Required & Hidden makes no sense either: if it's required to be
specified we definitely don't want to hide it from the generated config
file.
These are now compile-time failures.
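
One way such a compile-time check could look, as a sketch only. The flag enum, `flags_are_sane`, and `OptionDef` are illustrative names and are not the real config definition API; the point is that a bad flag combination trips a `static_assert` instead of surfacing at runtime.

```cpp
#include <cassert>

// Hypothetical option flags; the names mirror the commit text only.
enum OptionFlag : unsigned
{
    Default = 1u << 0,
    Required = 1u << 1,
    Hidden = 1u << 2,
};

// Required may not be combined with Default or with Hidden.
constexpr bool flags_are_sane(unsigned flags)
{
    const bool def = flags & Default;
    const bool req = flags & Required;
    const bool hid = flags & Hidden;
    return !(req && def) && !(req && hid);
}

template <unsigned Flags>
struct OptionDef
{
    static_assert(flags_are_sane(Flags), "Required cannot be combined with Default or Hidden");
    static constexpr unsigned flags = Flags;
};

// Fine: a hidden option with a default.
using HiddenWithDefault = OptionDef<Default | Hidden>;
// Would fail to compile once instantiated:
// OptionDef<Default | Required> bad;
```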
When running as a service node we can't do anything without a lokid rpc
URL, and we don't necessarily have a good default for it.
This makes it required so that we fail with an appropriate error message
(rather than connect timeouts) if it is not specified.
At some point between 0.9.9 and 0.9.10 we removed the printing of option
names when a value doesn't have a default, but this means the config is
littered with things like:
    # This option sets the greater foo value.
with no actual option name printed out when there is no default.
This fixes it by always printing the option name in such a case, just
with an empty value, e.g.:
    # This option sets the greater foo value.
    #big-foo=
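
A rough sketch of the generation rule, assuming a hypothetical `render_option` helper (the real config generator is not shown in this log): the option name is always emitted, and the value is simply left empty when there is no default.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical generator for one option's lines in a generated config file.
inline std::string render_option(
    const std::string& name,
    const std::string& comment,
    const std::optional<std::string>& default_value)
{
    std::string out = "# " + comment + "\n";
    // Always emit the commented-out option name; the value is empty when
    // the option has no default.
    out += "#" + name + "=" + default_value.value_or("") + "\n";
    return out;
}
```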
certain files needed to include either fstream or our shim for std::filesystem.
this includes fstream into our shim and includes this shim in places
that require fstream. this is done because some toolchains (cough
cough broke af arch linux amalgams) can have weird subsets of the
requirements of C++17 that overlap, except when they don't, denoted by
unknowable undisclosed circumstances.
this issue was reported by a user in the wild, and this fixes it.
previously we had a checking style function that passes in an optional
defaulting to nullopt as a micro optimization; this makes the code
unnecessarily obtuse.
simplify this by splitting up into 2 functions,
one for getting the unique endpoints and one for checking if the
number of them is above the minimum.
add overload for ReadyToDoLookup() that checks against constant but
can do more in the future if desired to reduce the burden on future contributors.
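
The split might look roughly like the following. Endpoints are plain strings here, `MinUniqueEndpoints` is an assumed constant, and "above the minimum" is treated as "at least the minimum", which is also an assumption; none of these names are taken from the actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Assumed constant; the real minimum is not stated in this log.
constexpr std::size_t MinUniqueEndpoints = 2;

// Step 1: collect the unique endpoints.
inline std::set<std::string> unique_endpoints(const std::vector<std::string>& path_endpoints)
{
    return std::set<std::string>(path_endpoints.begin(), path_endpoints.end());
}

// Step 2: check a count against the constant.
inline bool ready_to_do_lookup(std::size_t num_unique_endpoints)
{
    return num_unique_endpoints >= MinUniqueEndpoints;
}

// Convenience overload that combines both steps.
inline bool ready_to_do_lookup(const std::vector<std::string>& path_endpoints)
{
    return ready_to_do_lookup(unique_endpoints(path_endpoints).size());
}
```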
Query->Cancel() will remove the Query, but that introduces a race
condition where unbound may still try to invoke the callback (with a
no-longer-valid pointer) if we do it before the ub_ctx_delete call.
Move it to afterwards so that we only cancel things that unbound didn't
Occasionally during shutdown windivert will crash because a thread tries
sending after we've called wd::shutdown, which isn't allowed. Add an
atomic bool to prevent this.
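
A sketch of the guard, with a hypothetical `GuardedSender` class standing in for the real windivert wrapper. It assumes the flag is set before the underlying handle is torn down, so late senders return early instead of touching a dead handle.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical wrapper; the counter stands in for the real send call.
class GuardedSender
{
    std::atomic<bool> _shut_down{false};
    std::atomic<std::size_t> _sent{0};

  public:
    // Returns false (and sends nothing) once shutdown() has been called.
    bool send(const char* /*pkt*/, std::size_t /*len*/)
    {
        if (_shut_down.load())
            return false;
        ++_sent;
        return true;
    }

    // Must be called before the underlying handle is shut down.
    void shutdown()
    {
        _shut_down.store(true);
    }

    std::size_t sent() const
    {
        return _sent.load();
    }
};
```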
we were calling llarp::Context::HandleSignal from a non mainloop
thread when running as a win32 service. this caused issues with a non
clean destruction.
call our signal handler instead of llarp::Context::HandleSignal
If wintun fails it seems to take about 15s, so extend the startup
timeout so that it can fail gracefully (and let us clean up before
exiting).
Also refactors the timeouts to chrono constants.
Fixes windows shutdown crashes:
- windivert wasn't handling an ERROR_NO_DATA, which it gets when
finished handling everything after a shutdown.
- wintun ReadPacket still gets invoked after end_session is called, but
shouldn't be. This adds an atomic<bool> to early return.
- fixes up some settings we send for windows service manager notify
- win32_platform.cpp is dead
- win32_platform.hpp is useless
Style changes from clang-tidy warnings:
- remove `virtual` from some definitions that already have `override`
- remove virtual destructor from NetworkInterface because it already has
a virtual destructor via the base type (and clang-tidy warns about it)
the win32 and sd_notify components provided a disjointed set of
similar high level functionality so we consolidate these duplicate
code paths into one that has the same lifecycle regardless of platform
to reduce complexity of this feature.
this new component is responsible for reporting state changes to the
system layer and optionally propagating state change to lokinet
requested by the system layer (used by windows service).
We're defining formats for std::chrono types, which feels wrong (because
fmt itself also has these), so just replace them with functions:
short_time_from_now(...) gives a short "in 14m12s" or "5.123s ago" time
span relative to now, given a time point. Precision gets reduced for
larger deviations from now (e.g. "4h12m ago").
ToString(Duration_t) gives a string such as "-3h22m02.123s" for a
duration.
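
For illustration, a duration formatter producing the "-3h22m02.123s" shape could be written as below. This is a sketch under assumptions: `duration_to_string` is a made-up name, it takes milliseconds rather than the project's Duration_t, and the exact padding rules of the real ToString are not known from this log.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <string>

// Sketch only: formats as [-][Hh][MMm]SS.mmms, dropping leading zero units.
inline std::string duration_to_string(std::chrono::milliseconds d)
{
    using namespace std::chrono;
    const bool negative = d < milliseconds::zero();
    if (negative)
        d = -d;
    const long long h = duration_cast<hours>(d).count();
    const long long m = duration_cast<minutes>(d % hours{1}).count();
    const long long s = duration_cast<seconds>(d % minutes{1}).count();
    const long long ms = (d % seconds{1}).count();
    const char* sign = negative ? "-" : "";
    char buf[64];
    if (h > 0)
        std::snprintf(buf, sizeof buf, "%s%lldh%02lldm%02lld.%03llds", sign, h, m, s, ms);
    else if (m > 0)
        std::snprintf(buf, sizeof buf, "%s%lldm%02lld.%03llds", sign, m, s, ms);
    else
        std::snprintf(buf, sizeof buf, "%s%lld.%03llds", sign, s, ms);
    return buf;
}
```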
The time_delta<T> was using the wrong duration type when formatting, so
was outputting millisecond precision in the systemd status string which
is pointless (and unintended).
The iterator here to skip an obsolete bootstrap wasn't properly
reassigning the iterator, so "didn't work" (though why it was hanging
for me is entirely non-obvious).
Also refactored it to simplify/clarify it a bit.
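
The usual shape of this fix is to assign the result of `erase()` back to the loop iterator. The sketch below uses a `std::set` of string ids as a stand-in for the bootstrap list; `remove_obsolete` is a hypothetical name.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical: drop every entry whose id appears on the obsolete list.
inline void remove_obsolete(std::set<std::string>& bootstraps, const std::set<std::string>& obsolete)
{
    for (auto it = bootstraps.begin(); it != bootstraps.end();)
    {
        if (obsolete.count(*it))
            it = bootstraps.erase(it);  // erase() returns the next valid iterator
        else
            ++it;
    }
}
```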
- Move logging initialization to early in Configure rather than at the
end of FromConfig so that we can add debug logging inside
Configure/FromConfig/etc.
- add said debug logging to Configure/FromConfig/etc.
Without this, old config (with now-irrelevant settings) won't work in
newer lokinet, making lokinet fatal error on startup if one of the
no-longer-used options is still present.
when reading/writing a .loki privkey file we don't rewind a llarp_buffer_t
after use. this is an argument in favor of just removing that type
from the code entirely.
fixes by using 2 distinct locally scoped llarp_buffer_t, one for read,
one for write.
Currently (from a recent PR) we aren't pinging oxend if not active, but
that behaviour ended up being quite wrong because lokinet needs to ping
even when decommissioned or deregistered (when decommissioned we need
the ping to get commissioned again, and if not registered we need the
ping to get past the "lokinet isn't pinging" nag screen to prepare a
registration).
This considerably revises the pinging behaviour:
- We ping oxend *unless* there is a specific error with our connections
(i.e. we *should* be establishing peer connections but don't have any)
- If we do have such an error, we send a new oxend "error" ping to
report the error to oxend and get oxend to hold off on sending uptime
proofs.
Along the way this also changes how we handle the current node state:
instead of just tracking deregistered/decommissioned, we now track three
states:
- LooksRegistered -- which means the SN is known to the network (but not
necessarily active or fully staked)
- LooksFunded -- which means it is known *and* is fully funded, but not
necessarily active
- LooksDecommissioned -- which means it is known, funded, and not
currently active (which implies decommissioned).
The funded (or more precisely, unfunded) state is now tracked in
rc_lookup_handler in a "greenlist" -- i.e. new SNs that are so new (i.e.
"green") that they aren't even fully staked or active yet.
This aligns service node updating logic a bit closer to what happens in
storage server, and should make it a bit more resilient, hopefully
tracking down the (off-Github) reported issue where lokinet sometimes
doesn't see itself as active.
- Initiate a service node list update in the 30s lokinet ping
timer (in case we miss a block notify for some reason); although this
is expensive, the next point mitigates it:
- Retrieve the block hash with the SN state update, and feed it back
into the next get_service_nodes call (as "poll_block_hash") so that
oxend just sends back a mostly-empty response when the block hasn't
changed, allowing both oxend and lokinet to skip nearly all of the
work of a service node list update when the block hasn't changed since
the last poll. (This was already partially implemented--we were
already looking for "unchanged"--but without a block hash to get from
and pass back to oxend we'd never actually get an "unchanged" result).
- Tighten up the service node list handling by moving the "unchanged"
handling into the get_service_nodes response handler: this way the
HandleNewServiceNodeList function is only handling the list but not
the logic as to whether there actually is a new list or not.
Lots and lots of places in the code had broken < operators because they
are returning something like:
    foo < other.foo or bar < other.bar;
but this breaks the strict weak ordering that the "Compare" requirement
demands for things like std::map/set/priority_queue.
For example:
    a = {.foo=1, .bar=3}
    b = {.foo=3, .bar=1}
does not have an ordering over a and b (both `a < b` and `b < a` are
satisfied at the same time).
This needs to be instead something like:
    foo < other.foo or (foo == other.foo and bar < other.bar)
but that's a bit clunkier, and it is easier to use std::tie for tuple's
built-in < comparison which does the right thing:
    std::tie(foo, bar) < std::tie(other.foo, other.bar)
(Initially I noticed this in SockAddr/sockaddr_in6, but upon further
investigation this extends to the majority of multi-field `operator<`'s.)
This fixes it by using std::tie (or something similar) everywhere we are
doing multi-field inequalities.
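
The difference can be shown with two toy structs (`Broken` and `Fixed` are illustrative names, not types from the codebase), using the same a/b values as the example above:

```cpp
#include <cassert>
#include <tuple>

// The broken pattern: not a strict weak ordering.
struct Broken
{
    int foo, bar;
    bool operator<(const Broken& other) const
    {
        return foo < other.foo or bar < other.bar;
    }
};

// The fix: lexicographic comparison via std::tie.
struct Fixed
{
    int foo, bar;
    bool operator<(const Fixed& other) const
    {
        return std::tie(foo, bar) < std::tie(other.foo, other.bar);
    }
};
```

With `a = {1, 3}` and `b = {3, 1}`, the broken operator reports both `a < b` and `b < a`, while the std::tie version reports only `a < b`.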
If we get back an IPv6 address as the first gateway then we won't have
the expected IPv4 gateway that the route poker needs to operate.
This iterates through them separately so that we treat the IPv4 and IPv6
sides of an address as separate interfaces which should allow the route
poker to find the one it wants (and just skip the IPv6 one).
DRY a chunk of repeated code for finding a free private range.
Also fix it so that it will consider 10.255.0.1/16 and 192.168.255.1/24
(previously it would only check up to octet 254).
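
A simplified sketch of the deduplicated search. It covers only the two families named above, `find_free_range` and the `in_use` predicate are hypothetical, and the real code may scan more ranges. The loop counter is an `int` so that octet 255 is included.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Hypothetical: `in_use` reports whether a candidate range overlaps an
// existing interface. Returns the first free candidate, if any.
inline std::optional<std::string> find_free_range(const std::function<bool(const std::string&)>& in_use)
{
    // One shared scan instead of a copy of the loop per address family.
    auto scan = [&](const std::string& prefix, const std::string& suffix) -> std::optional<std::string> {
        for (int oct = 0; oct <= 255; ++oct)  // inclusive upper bound
        {
            auto range = prefix + std::to_string(oct) + suffix;
            if (!in_use(range))
                return range;
        }
        return std::nullopt;
    };

    if (auto r = scan("10.", ".0.1/16"))
        return r;
    return scan("192.168.", ".1/24");
}
```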
If running as a service node, we ping core on a regular interval to
inform it we're running and in a good state. If we're an active
(not decommissioned or deregistered) service node and have too few
peers and thus we're not actually connected to lokinet, we should skip
that ping so core doesn't think we're ok.
Adds a fallback bootstrap file path parameter to CMake, specify
-DBOOTSTRAP_SYSTEM_PATH="/path/to/file" to use.
Adds a list of (currently 1) obsolete bootstrap RouterIDs to check
bootstrap RCs against. Will not use bootstrap RCs if they're on that
list.
Log an error periodically if we appear to be an active service node but
have fewer than a set number (5) known peers.
Bumps oxen-logging version for literal _format.
No more llarp_buffer_t here!
(I was tracking down a segfault which led me in here and it was easier
to rewrite this to use bt_dict_{consumer,producer} than to decipher all
the cursed llarp_buffer_t and bencode callback nest).
We have basically this same bit of code in tons of places; consolidate
it into llarp::util::slurp_file/llarp::util::dump_file.
Also renames all the extra junk that crept into llarp/util/fs.hpp out of
there into llarp/util/file.hpp instead.
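
Helpers of this kind typically look something like the sketch below. The signatures and error handling are assumptions, and the functions live in a made-up `sketch` namespace to make clear that they are not the actual llarp::util implementations.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <string_view>

namespace sketch
{
    // Read a whole file into a string (binary-safe).
    inline std::string slurp_file(const std::filesystem::path& p)
    {
        std::ifstream in{p, std::ios::binary};
        if (!in)
            throw std::runtime_error{"cannot open for reading: " + p.string()};
        return std::string(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
    }

    // Write a buffer to a file, replacing any existing contents.
    inline void dump_file(const std::filesystem::path& p, std::string_view contents)
    {
        std::ofstream out{p, std::ios::binary | std::ios::trunc};
        if (!out)
            throw std::runtime_error{"cannot open for writing: " + p.string()};
        out.write(contents.data(), static_cast<std::streamsize>(contents.size()));
        if (!out)
            throw std::runtime_error{"write failed: " + p.string()};
    }
}  // namespace sketch
```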
- Accept empty string or `null` for token to mean "no token."
- Accept `null` for range to mean "default range."
- Don't use a default range (::0/0) in lokinet-vpn because this will
fail if IPv6 ranges aren't supported on the platform (e.g. on
Windows), and isn't necessary: if we omit it then the rpc code already
uses ::0/0 or 0.0.0.0/0 by default, as needed.
- ReconfigureDNS wasn't returning the old servers; made it void instead
(the Apple code can just store a copy of the original upstream
servers instead).
- Reconfiguring DNS reset the unbound context but didn't replace it, so
a Down()/Up() would crash.
- Simplify Resolver() destructor to just call Down(), and make it final
just so that no one tries to inherit from us (so that calling a
virtual function from the destructor is safe).
- Rename CancelPendingQueries() to Down(); the former cancelled but also
shut down the object, so the name seemed a bit misleading.
- Rename SetInternalState in Resolver_Base to ResetResolver, so that we
aren't conflicting with ResetInternalState from Endpoint (which was a
problem because TunEndpoint inherited from both; it could be resolved
through the different argument type if we removed the default, but
that seems gross).
- Make Resolver use a bare unbound context pointer rather than a
shared_ptr; since Resolver (now) entirely manages it already we don't
need an extra management layer, and it saves a bunch of `.get()`s.
On Apple, the network extension is outside the tunnel routing, so we
cannot have libunbound talk directly to upstream (it would leak DNS when
exit mode is enabled). Instead unbound *always* talks to a localhost
port where we have a "dns trampoline" that takes UDP packets and shoves
them through the tunnel.
We were doing that already, but recent changes here were overwriting the
libunbound settings with.
This also moves the upstream DNS configuration part of `Up()` into its
own method.
We don't have a resolver on macos, so we were running through this loop
with fails == 0 == m_Impls.size() and throwing, crashing the process.
Early return to avoid the failure and fix macos crash.
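
The failure mode and the early return can be sketched as follows. `ResolverImpl` and `query_any` are hypothetical stand-ins for the real resolver list and loop; the point is that with an empty list, `fails == impls.size()` is trivially true (0 == 0), so the empty case has to be handled before the check.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical: each impl reports whether it succeeded.
using ResolverImpl = std::function<bool()>;

inline bool query_any(const std::vector<ResolverImpl>& impls)
{
    // Early return: with no resolvers there is nothing to fail, so this is
    // not treated as an error.
    if (impls.empty())
        return false;

    std::size_t fails = 0;
    for (const auto& impl : impls)
        if (!impl())
            ++fails;

    if (fails == impls.size())
        throw std::runtime_error{"every resolver failed"};
    return true;
}
```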
Apple supports anything here that Clang supports and should have them
set the same as everywhere else.
Most importantly this gives apple the -Wno-deprecated-declarations flag
which has been driving me nuts on macos.
This also version-gates the -Wno-deprecated-declarations so that it
will turn on again when we bump the version beyond .10.