-- Moved all RPCServer initialization logic to rpcserver constructor
-- Fixed config logic, fxn binding to rpc address, fxn adding rpc cats
-- router hive failed CI/CD resulting from outdated reference to rpcBindAddr
-- ipc socket as default hidden from windows (for now)
refactored config endpoint
- added rpc call script (contrib/omq-rpc.py)
- added new fxns to .ini config stuff
- added delete .ini file functionality to config endpoint
- added edge case control for config endpoint
add commented out line in clang-form for header reorg later
* Updated RpcServer Initialization and Logic
-- Moved all RPCServer initialization logic to rpcserver constructor
-- Fixed config logic, fxn binding to rpc address, fxn adding rpc cats
-- router hive failed CI/CD resulting from outdated reference to rpcBindAddr
-- ipc socket as default hidden from windows (for now)
Previously oxen-logging was erroneously hard-coded to use the target
"lokinet" for system logs. Obviously this is wrong for anything else
which uses oxen-logging and the system log. This changes our call to
add_sink to pass "lokinet" as the target rather than the config
filename, and updates oxen-logging to use that argument correctly.
Default & Required makes no sense: if we have a default it makes no
sense to make it required. The previous behaviour when this was
specified was to force an (uncommented) value in the config with the
value, but this was only used in the test suite.
Required & Hidden makes no sense either: if it's required to be
specified we definitely don't want to hide it from the generated config
file.
These are now compile-time failures.
When running as a service node we can't do anything without a lokid rpc
URL, and we don't necessarily have a good default for it.
This makes it required so that we fail with an appropriate error message
(rather than connect timeouts) if it is not specified.
At some point between 0.9.9 and 0.9.10 we removed the printing of option
names when a value doesn't have a default, but this means the config is
littered with things like:
# This option sets the greater foo value.
with no actual option name printed out when there is no default.
This fixes it by always printing the option name in such a case, just
with an empty value, e.g.:
# This option sets the greater foo value.
#big-foo=
certain files needed to include either fstream and our shim for std::filesystem.
this includes fstream into our shim and includes this shim in places
that require fstream. this is done because some toolchains (cough
cough broke af arch linux amalgums) can have weird subsets of the
requirements of C++17 that overlap, except when they dont, denoted by
unknowable undisclosed circumstances.
this issue was reported by a user in the wild, and this fixes it.
previously we had a checking style function that passes in an optional
defaulting to nullopt as a micro optimzation, this makes the code
unnessarily obtuse.
simplify this by splitting up into 2 functions,
one for getting the unique endpoints and one for checking if the
number of them is above the minimum.
add overload for ReadyToDoLookup() that checks against constant but
can do more in the future if desired to reduce the burden on future contributors.
Query->Cancel() will remove the Query, but that introduces a race
condition where unbound may still try to invoke the callback (with a
no-longer-valid pointer) if we do it before the ub_ctx_delete call.
Move to it afterwards so that we only cancel things that unbound didn't
Occasionally during shutdown windivert will crash because a thread tries
sending after we've called wd::shutdown, which isn't allowed. Add an
atomic bool to prevent this.
we were calling llarp::Context::HandleSignal from a non mainloop
thread when running as a win32 service. this caused issues with a non
clean destruction.
call our signal handler instead of llarp::Context::HandleSignal
If wintun fails it seems to take about 15s, so extend the startup
timeout so that it can fail gracefully (and let us clean up before
exiting).
Also refactors the timeouts to chrono constants.
Fixes windows shutdown crashes:
- windivert wasn't handling an ERROR_NO_DATA, which it gets when
finished handling everything after a shutdown.
- wintun ReadPacket still gets invoked after end_session is called, but
shouldn't be. This adds an atomic<bool> to early return.
- fixes up some settings we send for windows service manager notify
- win32_platform.cpp is dead
- win32_platform.hpp is useless
Style changes from clang-tidy warnings:
- remove `virtual` from some definitions that already have `override`
- remove virtual destructor from NetworkInterface because it already has
a virtual destructor via the base type (and clang-tiny warns about it)
the win32 and sd_notify components provided a disjointed set of
similar high level functionality so we consolidate these duplicate
code paths into one that has the same lifecycle regardless of platform
to reduce complexity of this feature.
this new component is responsible for reporting state changes to the
system layer and optionally propagating state change to lokinet
requested by the system layer (used by windows service).
We're defining formats for std::chrono types, which feels wrong (because
fmt itself also has these), so just replace them with functions:
short_time_from_now(...) gives a short "in 14m12s" or "5.123s ago" time
span relative to now, given a time point. Precision gets reduced for
larger deviations from now (e.g. "4h12m ago").
ToString(Duration_t) gives a string such as "-3h22m02.123s" for a
duration.
The time_delta<T> was using the wrong duration type when formatting, so
was outputting millisecond precision in the systemd status string which
is pointless (and unintended).
The iterator here to skip an obsolete bootstrap wasn't properly
reassigning the iterator, so "didn't work" (though why it was hanging
for me is entirely non-obvious).
Also refactored it to simplify/clarify it a bit.
- Move logging initialization to early in Configure rather than at the
end of FromConfig so that we can add debug logging inside
Configure/FromConfig/etc.
- add said debug logging to Configure/FromConfig/etc.
Without this, old config (with now-irrelevant settings) won't work in
newer lokinet, making lokinet fatal error on startup if one of the
no-longer-used options is still present.
when read/writing a .loki privkey file we dont rewind a llarp_buffer_t
after use. this is an argument in favor of just removing that type
from the code entirely.
fixes by using 2 distinct locally scoped llarp_buffer_t, one for read,
one for write.
Currently (from a recent PR) we aren't pinging oxend if not active, but
that behaviour ended up being quite wrong because lokinet needs to ping
even when decommissioned or deregistered (when decommissioned we need
the ping to get commissioned again, and if not registered we need the
ping to get past the "lokinet isn't pinging" nag screen to prepare a
registration).
This considerably revises the pinging behaviour:
- We ping oxend *unless* there is a specific error with our connections
(i.e. we *should* be establishing peer connections but don't have any)
- If we do have such an error, we send a new oxend "error" ping to
report the error to oxend and get oxend to hold off on sending uptime
proofs.
Along the way this also changes how we handle the current node state:
instead of just tracking deregistered/decommissioned, we now track three
states:
- LooksRegistered -- which means the SN is known to the network (but not
necessarily active or fully staked)
- LooksFunded -- which means it is known *and* is fully funded, but not
necessarily active
- LooksDecommissioned -- which means it is known, funded, and not
currently active (which implies decommissioned).
The funded (or more precisely, unfunded) state is now tracked in
rc_lookup_handler in a "greenlist" -- i.e. new SNs that are so new (i.e.
"green") that they aren't even fully staked or active yet.
This aligns service node updating logic a bit closer to what happens in
storage server, and should make it a bit more resilient, hopefully
tracking down the (off-Github) reported issue where lokinet sometimes
doesn't see itself as active.
- Initiate a service node list update in the 30s timer lokinet ping
timer (in case we miss a block notify for some reason); although this
is expensive, the next point mitigates it:
- Retrieve the block hash with the SN state update, and feed it back
into the next get_service_nodes call (as "poll_block_hash") so that
oxend just sends back a mostly-empty response when the block hasn't
changed, allowing both oxend and lokinet to skip nearly all of the
work of a service node list update when the block hasn't changed since
the last poll. (This was already partially implemenated--we were
already looking for "unchanged"--but without a block hash to get from
and pass back to oxend we'd never actually get an "unchanged" result).
- Tighten up the service node list handling by moving the "unchanged"
handling into the get_service_nodes response handler: this way the
HandleNewServiceNodeList function is only handling the list but not
the logic as to whether there actually is a new list or not.
Lots and lots of places in the code had broken < operators because they
are returning something like:
foo < other.foo or bar < other.bar;
but this breaks both the strict weak ordering requirements that are
required for the "Compare" requirement for things like
std::map/set/priority_queue.
For example:
a = {.foo=1, .bar=3}
b = {.foo=3, .bar=1}
does not have an ordering over a and b (both `a < b` and `b < a` are
satisfied at the same time).
This needs to be instead something like:
foo < other.foo or (foo == other.foo and bar < other.bar)
but that's a bit clunkier, and it is easier to use std::tie for tuple's
built-in < comparison which does the right thing:
std::tie(foo, bar) < std::tie(other.foo, other.bar)
(Initially I noticed this in SockAddr/sockaddr_in6, but upon further
investigation this extends to the major of multi-field `operator<`'s.)
This fixes it by using std::tie (or something similar) everywhere we are
doing multi-field inequalities.
If we get back an IPv6 address as the first gateway then we won't have
the expected IPv4 gateway that the route poker needs to operate.
This iterates through them separately so that we treat the IPv4 and IPv6
sides of an address as separate interfaces which should allow the route
poker to find the one it wants (and just skip the IPv6 one).
DRY a chunk of repeated code for finding a free private range.
Also fix it so that it will consider 10.255.0.1/16 and 192.168.255.1/24
(previously it would only check up to octet 254).
If running as a service node, we ping core on a regular interval to
inform it we're running and in a good state. If we're an active
(not decommissioned or deregistered) service node and have too few
peers and thus we're not actually connected to lokinet, we should skip
that ping so core doesn't think we're ok.
Adds a fallback bootstrap file path parameter to CMake, specify
-DBOOTSTRAP_SYSTEM_PATH="/path/to/file" to use.
Adds a list of (currently 1) obsolete bootstrap RouterIDs to check
bootstrap RCs against. Will not use bootstrap RCs if they're on that
list.
Log an error periodically if we appear to be an active service node but
have fewer than a set number (5) known peers.
Bumps oxen-logging version for literal _format.
No more llarp_buffer_t here!
(I was tracking down a segfault which led me in here and it was easier
to rewrite this to use bt_dict_{consumer,producer} than to decipher all
the cursed llarp_buffer_t and bencode callback nest).
We have basically this same bit of code in tons of places; consolidate
it into llarp::util::slurp_file/llarp::util::dump_file.
Also renames all the extra junk that crept into llarp/util/fs.hpp out of
there into llarp/util/file.hpp instead.
- Accept empty string or `null` for token to mean "no token."
- Accept `null` for range to mean "default range."
- Don't use a default range (::0/0) in lokinet-vpn because this will
fail if IPv6 ranges aren't supported on the platform (e.g. on
Windows), and isn't necessary: if we omit it then the rpc code already
uses ::0/0 or 0.0.0.0/0 by default, as needed.
- ReconfigureDNS wasn't returning the old servers; made it void instead
(the Apple code can just store a copy of the original upstream
servers instead).
- Reconfiguring DNS reset the unbound context but didn't replace it, so
a Down()/Up() would crash.
- Simplify Resolver() destructor to just call Down(), and make it final
just so that no one tries to inherit from us (so that calling a
virtual function from the destructor is safe).
- Rename CancelPendingQueries() to Down(); the former cancelled but also
shut down the object, so the name seemed a bit misleading.
- Rename SetInternalState in Resolver_Base to ResetResolver, so that we
aren't conflicting with ResetInternalState from Endpoint (which was a
problem because TunEndpoint inherited from both; it could be resolved
through the different argument type if we removed the default, but
that seems gross).
- Make Resolver use a bare unbound context pointer rather than a
shared_ptr; since Resolver (now) entirely manages it already we don't
need an extra management layer, and it saves a bunch of `.get()`s.
On Apple, the network extension is outside the tunnel routing, so we
cannot have libunbound talk directly to upstream (it would leak DNS when
exit mode is enabled). Instead unbound *always* talks to a localhost
port where we have a "dns trampoline" that takes UDP packets and shoves
them through the tunnel.
We were doing that already, but recent changes here were overwriting the
libunbound settings with.
This also moves the upstream DNS configuration part of `Up()` into its
own method.
We don't have a resolver on macos, so we were running through this loop
with fails == 0 == m_Impls.size() and throwing, crashing the process.
Early return to avoid the failure and fix macos crash.
Apple supports anything here that Clang supports and should have them
set the same as everywhere else.
Most importantly this gives apple the -Wno-deprecated-declarations flag
which has been driving me nuts on macos.
This also version-gates the -Wno-deprecated-declarations so that it
will turn on again when we bump the version beyond .10.
We were requiring `->Next` be true, which means we skipped the last (and
often only) entry of the linked lists and so never properly found the
gateway.
- We need to pass a flag to get Windows to include gateway info.
- Refactor it to use microsoft's recommended magic default 15000 buffer
size and repeat in a loop a few times until it works. Developers,
developers, developers, developers!
- a `static` is less verbose and otherwise identical to an empty
namespace for a single declaration like this.
- operator== on two optionals already does exactly what the `is_equal`
lambda here is doing.
- formatting
- windivert was being set up *before* DNS is set up, so the DNS port was
nullopt and thus we couldn't properly identify upstream DNS traffic.
- close() doesn't close a socket on Windows, so the socket-bind-close
approach to get a free UDP port wasn't actually closing, and thus
unbound upstream constrained to the given port were completely
failing.
- The unbound thread was accessing the same shared_ptr instance as the
outer code, which isn't thread-safe; changed it to copy a weak_ptr
into the lambda instead.
- Exclude upstream DNS traffic in the filter rather than capturing and
reinjecting it.
The inner lambda here wasn't keeping the `Query` (`this`) alive, so
`src` wasn't valid anymore. This changes it to copy the `src`
shared_ptr into the lambda instead of capturing `this`, and fixes it.
The current code isn't working and gives a 0 (which then fails unbound
initialization). This replaces it by doing a socket+bind to find a free
port then immediately closes (but passes the port we got into unbound).
- Replaces RAII handling of DLLs with global function pointers. (We
don't unload the dll this way, but that seems unnecessary anyway).
- Simplifies code by just needing to call an init function, but not
needing to pass around an object holding the function pointers.
- Adds a templated dll loader that takes the dll and a list of
name/pointer pairs to load the dll and set the pointers in one shot.
ip_header wasn't 20 bytes on windows compilations for some unholy
reason. This restructures it to avoid the template and just use two
different structs for le/be with a condition_t for the ifdef, which
resolves it (and *also* apparently avoids the need for the pack).
Also add a static_assert to check the size.
Also do the same for ipv6.
Cast via an ordinary function pointer rather than a function pointer
reference to avoid the warning.
Also make the pointer in `Func_t` explicit rather than implicit (deduced
into the `Func_t` type) to make it clearer what is going on here.
Lots of tools struggle with non-default DNS port, so keep a listener on
127.3.2.1:53 (by default).
This required various changes to the config handling to hold a vector
(instead of an optional) of defaults and values, and now allows passing
in an array of defaults instead of just a single default.
It didn't do equality, it did "does the remaining space start with the
argument" (and so the replacement in the previous commit was broken).
This renames it to avoid the confusion and restores to what it was doing
on dev.
errno is only set if read returns < 0 and won't be set to 0 if read
succeeds, so we were bailing here frequently on successful reads
(whenever errno happened to be non-0).
This class is cursed, but also broken under gcc-12. Apply some lipstick
to get it moving again (but we really need to refactor this because it
is a mess).