The race was detected in CI and locally when running with -race.
It happened between the following calls:
WARNING: DATA RACE
Write at 0x00c0003e6638 by goroutine 1374:
runtime.racewrite()
<autogenerated>:1 +0x1e
github.com/lightninglabs/loop/sweepbatcher.(*batch).Wait()
sweepbatcher/sweep_batch.go:463 +0x6e
github.com/lightninglabs/loop/sweepbatcher.(*Batcher).Run.func1()
sweepbatcher/sweep_batcher.go:272 +0x10e
Previous read at 0x00c0003e6638 by goroutine 1388:
runtime.raceread()
<autogenerated>:1 +0x1e
github.com/lightninglabs/loop/sweepbatcher.(*batch).monitorConfirmations()
sweepbatcher/sweep_batch.go:1144 +0x285
github.com/lightninglabs/loop/sweepbatcher.(*batch).handleSpend()
sweepbatcher/sweep_batch.go:1309 +0x10e4
github.com/lightninglabs/loop/sweepbatcher.(*batch).Run()
sweepbatcher/sweep_batch.go:526 +0xb04
github.com/lightninglabs/loop/sweepbatcher.(*Batcher).spinUpBatch.func1()
sweepbatcher/sweep_batcher.go:455 +0xbd
The race was caused because wg.Add(1) and wg.Wait() were running from different
goroutines (one goroutine was running batch.Run() and another - batcher.Run()).
To avoid this scenario, wg.Wait() call was moved into batch.Run() call, so it
waits itself for its children goroutines, after which the channel b.finished
is closed, and it serves a signal for external waiters (the batcher, calling
batch.Wait()).
Also the channel batch.stopped was renamed to batch.stopping to better reflect
its nature.
Added TestSweepBatcherCloseDuringAdding to make sure adding a sweep during
shutting down does not cause a crash. The test did not catch the original
race condition.
Changed argument of function NewBatcher from LoopOutFetcher to SweepFetcher
(returning new public type SweepInfo).
This change is backwards-incompatible on the package layer, but nobody seems
to use the package outside of Loop.
To use NewBatcher inside Loop, turn loopdb into SweepFetcher using
function NewSweepFetcherFromSwapStore.
Previously we may have stored negative costs for some loop out swaps
which this commit attempts to correct by fetching all completed swap,
calculating the correct costs and overriding them in the database.
This data is not used by Batcher since commit
"sweepbatcher: load swap from loopdb, not own store".
Now sweepbatcher.Store depends only on tables sweeps and sweep_batches
and does not depend on swaps, loopout_swaps and htlc_keys, making it
easier to reuse.
Method Store.GetBatchSweeps provides data from tables outside of sweepbatcher:
swaps, loopout_swaps, htlc_keys. It makes it harder to reuse. Batcher already
has a straightforward way to get swap data: LoopOutFetcher interface (loopdb).
In this commit I switch the source of data from the field returned by Store
(LoopOut) to loading independently by calling LoopOutFetcher.FetchLoopOutSwap.
This failure became normal recently:
=== RUN TestAutoLoopInEnabled
autoloop_testcontext_test.go:318:
Error Trace: /home/runner/work/loop/loop/liquidity/autoloop_testcontext_test.go:318
/home/runner/work/loop/loop/liquidity/autoloop_test.go:804
Error: Not equal:
expected: 80000
actual : 160000
Test: TestAutoLoopInEnabled
autoloop_testcontext_test.go:318:
Error Trace: /home/runner/work/loop/loop/liquidity/autoloop_testcontext_test.go:318
/home/runner/work/loop/loop/liquidity/autoloop_test.go:804
Error: Not equal:
expected: 160000
actual : 80000
Test: TestAutoLoopInEnabled
autoloop_testcontext_test.go:343:
Error Trace: /home/runner/work/loop/loop/liquidity/autoloop_testcontext_test.go:343
/home/runner/work/loop/loop/liquidity/autoloop_test.go:804
Error: Should be true
Test: TestAutoLoopInEnabled
The root cause is them the order of items in c.quoteRequestIn depends on the
order of loopInBuilder.buildSwap calls, which depends on the order of channel
handling in Manager.SuggestSwaps, which depends on the order of map traversal,
which is not determitistic.
Since in the test all the amounts are different, I used amount as a key and
put the expected calls into a map using amount as a key. When I extract an
item from c.quoteRequestIn channel, I find the corresponding item in the map
and remove it. All other logic is preserved.
It used to be set to default (defaultBatchConfTarget = 12) which
could in theory affect fee rate if updateRbfRate() and publish()
were not called before the batch was saved. (Unlikely scenario.)
Method sweepbatcher.Store.FetchBatchSweeps (implementation using real DB) runs
JOIN query to load LoopOut from swaps table. Now the mock does the same.
It is needed to test store and load scenarios in tests.
If the sweep was successfully updated in the batch, no need to
try to add it to all other batches.
Added a test reproducing adding a sweep to both batches without this change.