lnbook/path-finding.asciidoc

.Chapter overview:
  * How path finding works in the network

Relevant questions to answer:
  * What is packet switching? What is circuit switching? Which one does LN use today?
  * In the abstract what is path finding?
  * What is dijkstra's? What modifications need to be made to apply it to this domain?
  * Why must path finding happen backwards (receiver to sender)?
  * How is the information contained in a channel update used in path finding?
  * How can errors sent during payment routing help the sender to narrow their search space?
  * What is payment splitting? How does it work? What alternatives exist?
  * What information can be sent to intermediate and the final node aside from the critical routing data?
  * What are multi-hop locks? What addition privacy and security guarantees to they offer?
  * How can the flexible onion space be used to enabled packet switching in the network?

==== Finding a path

Payments on the Lightning Network are forwarded along a path of channels from one participant to another.
Thus, a path of payment channels has to be selected.
If we knew the exact channel balances of every channel we could easily compute a payment path using any of the standard path finding algorithms taught in any computer science program.
Actually when we consider multipath payments it is rather a flow problem than a path finding problem.
Since flows consist of several paths we conveniently talk about path finding.
With exact information about channel balances available we could solve those problems in a way to optimize the fees that would have to be paid by the payer to the nodes that kindly forward the payment.
However, as discussed the balance information of all channels is and cannot be available to all participants of the network.
Thus, we need to have one or more innovative path finding strategy.
These strategies must relate closely to the routing algorithm that is used.
As we will see in the next section, the Lightning Network uses a source based onion routing protocol for routing payments.
This means in particular that the sender of the payment has to find a path through the network.
With only partial information about the network topology available this is a real challenge and active research is still being conducted into optimizing this part of the Lightning Network implementations.
The fact that the path finding problem is not fully solved for the case of the Lightning Network is a major point of criticism towards the technology.
The path finding strategy currently implemented in Lightning nodes is to probe paths until one is found that has enough liquidity to forward the payment.
While this is not optimal and certainly can be improved, it should be noted that even this simplistic strategy works well.
This probing is done by the Lightning node or wallet and is not directly seen by the user of the software.
The user might only realize that probing is taking place if the payment is not going through instantly.
The algorithm currently also does not necessarily result in the path with the lowest fees.


=== What is "Source-Based" routing and why does the Lightning Network use it?

Source-based routing is a method of path-finding where the sender (i.e. the source) plans the path from itself, through the intermediary nodes, and finally to the destination.
Once a path has been selected, the sender sends the payment to the first intermediary node, who sends it to the second intermediary node and so on, until it reaches the destination.
While a payment is in-flight along a path, the path typically does not get changed by any of the intermediary nodes, even if a shorter path or a cheaper path (in terms of routing fees) exists.

The Lightning Network uses source-based routing at the time of writing in order to protect user privacy.
As discussed in the chapter on Onion Routing, the intermediary nodes transmitting the payment are not aware of the full path of the payment; they only know the node they received it from and the node they are sending it to.

We also cannot expect the destination to find a path.
Even if it specifies a path in the invoice, that path may no longer be viable by the time the invoice is paid, which could be several minutes or several days later.
The recipient can, however, specify "routing hints" in the invoice that may assist the sender in finding a possible path.

Furthermore, source-based routing comes with some inherent drawbacks.
The sender chooses the path based on their current understanding of the topological map of the Lightning network.
As discussed in previous chapters, this map is necessarily incomplete; the sender may not be aware of all the channels, and even if they are they will almost certainly not know the latest balances in each of the channels.
And even if the sender did have a complete topological map at one point in time, the balances of channels change with every payment, and so in a short space of time the map would become obsolete.
The standard path finding mechanism with source based onion routing that is implemented in all Lightning Network implementations is the following:

. Select an arbitrary path of payment channels which connects sender and receiver of the payment and for which all channels have a capacity of at least the payment amount and accept HTLCs of this amount.
. Construct the onion from destination to sender according to the meta data (basefee, feerate, CLTV delta) of the channels.
. Send out the onion and see if the payment settles by nodes sending back preimages or if the payment fails.
. If the payment fails use this knowledge to select a different path by starting at step 1.

This means that with every attempted payment nodes actually probe the network and will also learn some information about how balances are distributed.
Implementations will usually prioritise cheaper paths or exclude channels which recently have failed.
In that sense the selection is not completely arbitrary.
Still even with such heuristics in place it could still be considered as a random process or random walk through the channel graph.
There can be several reasons why the payment may fail along the way.
For example one of the nodes becomes unreachable, doesn't have the channel balance, can't accept new htlcs, has updated its fees to a higher amount, or the channel is closed in the interim.
Furthermore, there is no guarantee that the route chosen was the cheapest in terms of fees, or if a shorter path existed.
As at the time of writing, this is a design trade-off made to protect user privacy.

=== Paths are constructed from Destination to Source
Let us assume our standard example in which Alice wants to send a payment of 100k satoshi on a path via Bob and Wei to Gloria.
The Path obviously looks like (Alice)-->(Bob)-->(Wei)-->Gloria.
However Bob and Wei will charge routing fees to forward the onion.
As you already know nodes can charge two types of fees.
First the base fee that will be charged for any successfull forwarding and settlement of and htlc.
This fee is constant and does not depend on the amount that the node is supposed to forward.
Additionally nodes might charge a freerate.
The name rate alredy indicates that this fee depends on the amount that a node is supposed to forward.
Let us for the simplicity of assume that the feerate of Bob and Wei is very expensive with 1% for Bob and 2% for Wei.
However Bob and Wei will not take a base fee to keep things simple in our example.
If Alice constructs the Onion she has to include the routing fees as the differnce of the incoming htlc and the outgoing htlc.
Let us assume she falsely computes the following to construct the onions with the routing fees.
Alice knows that 1% of 100k satoshi is 1k satoshi which she belives she should include in Bob's onion.
Similarly she knows that 1% of 100k satoshi is 2k satoshi which she belives she should include in Wei's onion.
She would find out that Bob would not forward the onion, if she believed that she would have to pay a total of 3k satoshi and construct an onion that looks like this:

----
"route": [
      {
         "id": "Bob",
         "channel": "357",
         "direction": 1,
         "satoshi": 103000,
         "dealy": 187,
      },
      {
         "id": "Wei",
         "channel": "74",
         "direction": 1,
         "satoshi": 102000,
         "dealy": 183,
      },
      {
         "id": "Gloria",
         "channel": "452",
         "direction": 0,
         "satoshi": 100000,
         "dealy": 153,
      }
   ]
}
----

The reason for Bob to not forward the onion is that he expects the incoming amount to be 1% larger then the amount he is supposed to forward.
Thus he would like to have an incoming ammount of `103020` satoshi which is 20 satoshi more than Alice sent him.
According to his fee schedual Bob will have to reject the Onion.
If Alice constructed the onion from she would have computed 1% of the to forward amount correctly as 1% from 102k satoshi which is 1020 sat.
Adding 1020 to the 102000 satoshi that Wei needs to have on his incoming channel will result in the right value of 103020 satoshi that Bob requires.
As the routing fees can increase the amount that is being forwarded even beyond the capacity of small channels it makes sense to start the construction of the onion and the pathfinding from the destination to the sender.

[NOTE]
====
While onions are also constructed from inside to outside and thus start with the destination this is not the reason why pathfinding has to start with the destination node.
====

=== Fundamentals about path finding
Finding a path through a graph is a problem modern computers can solve rather efficiently.
Mostly developers choose bredth first search (if the edges are all of equal weight) or the Dijkstra Algorithm in cases where the edges are not of equal weight.
In our case the weights of the edges could be the routing fees and only edges will be included where the capacity is larger than the amount that is to be sent.
In this basic form pathfinding on the lightning network is very simple and straight forward.
However as we have already discussed in the introduction channel balances cannot be shared with every participant every time a payment takes place as the system would not scale.
Thus our easy computer science problem for which we know a solution turns into a rather complex problem.
We now have to solve a path finding problem with only partial knowledge.
For example we know which edges might be able to forward a payment because their capacity seems big enough but we can't know it for sure
unless we try it out or ask the channel owners directly.
Even if we where able to ask the channel owners directly their balance might change by the time we have asked others, computed a path, constructed an onion and send it along.
Thus we do not only have partial information but the information we have is highly dynamic and might change before we can use it.

One general observation that everyone can easily make is that if every node along a path is able to forward a certain amount of satoshis these nodes will also be able to forward a lower amount of satoshis.
This is why many people intuitively belive that multipath payments might be a good strategy.
Instead of finding one path where every node owns a large amount of liquidity the task is split into smaller ones.
Another reason is of course that the sender of a payment might just not have the amount they wish to send in one single channel but split over several channels.
We leave it to later sections of this chapter to discuss the strengths and weaknesses of multipath payments.
However we note that multi path payments are equivalent to finding a flow between the source and the destination.
While finding flows in a graph with full knowledge that is static is computationally a little bit more heavy than computing a shortest path it still seams to be a feasable problem.
Given the reality of the Lightning Network and the fact that we do not need to compute a max flow we currently do not know if the problem is more or less difficult as finding a path.
It seems to be about equally difficult and the problems are somewhat connect as we will see in the following sections.

=== Probing based path finding algorithm on the Lightning Network
As discussed in order to reliably find a path nodes would need to know the balance of remote payment channels and the balances would have to be static.
As both is not given nodes currently use a probing based algorithm.
In its most basic form the algorithm works as follows:

. Select a random path to the destination node
. Construct and send the onion
. wait for the response of the onion
. If response == preimage -> success
. If response == feilure -> start over from step one.

Nodes will use various sources of information to improve the selection of a random path.
The main source of information is the gossip protocol.
From the gossip protocol a node learns which other nodes exist and which channels have been opened.
This will basically provide a network view that can be used to run graph algorithms that generate plausible paths.
For example a breadth first seach traversal.
The graph algorithm will usually be constrained to channels that have at least the capacity of the amount to be sent.
In practice due to channel reserve and the assumption that the capacity in the channel will not be sitting completely on one side it is saver to prefer larger channels.

The second source of information is the blockchain itself.
If channels are closed this is not announced via the gossip protocol.
However as the funding transaction is encoded by the short channel id of the channel and as it will be spent if the channel is closed nodes have to use this information to update their knowledge about the network of channels.

Another source of information are the past payments themselves.
Onions can return with errors.
Knowing for example that the third hop along a path returns an error means that the first two channels had enough balance and that the third channel - depending on the error - did not have enough balance.
Such edges can be removed from the set of edges similarly to the edges that do not have enough capacity.
Similarly nodes could use such information from previous payment attempts.
It is important that nodes are carefull with this data.
As the capacity information of channels from the gossip protocol and blockchain data is verifiably correct the data from our third source of information can be incorrect.
Nodes might just send an error back because they do not want to reveal balance information.
Also the data might just change over time as the balances in the lightning network are not static and changing with every payment attempt that is being made.
Thus nodes should only use such data if it is not to far in the past or use it only with a certain confidence.

The fourth source of information that the node will use are the routing hints in the BOLT 11 invoice.
Remember that a regular payment process starts with the person who wants to receive money coming up with a random secret and hashing it as the payment hash.
This hash is usually transported to the sender via an invoice.
Invoices usually contain some meta data and in particular routing hints.
This is necessary if the person who wants to be paid does not have announced channels. In that case it will speciefie some unannounced channels within the invoice.
Otherwise the payer would not even be able to find a path to the "hidden" node.
Routing hints might also be used by the receiving node to indicate which public channels have enough inbound capacity for the payment and thus the ability to receive funds.
In general the further away from the originating peer the payment goes the more likely it becomes to select a channel with insufficient balance.
Thus indicating on which channels a node wishes to receive funds would actually be quite nice for the sender.


=== Improvements on Source based onion routing

The probing based approach that is used in the Lightning Network has several flaws.
Sending out an onion usually takes a certain amount of time.
The time depends on how many hops the onion is supposed to be forwarded and of course on the speed of nodes processing the onion and the topology on the web.
In the following diagram you can see how the time for onions to return in general increase with the amount of hops that the onion has encoded.

[[pathfinding-probing]]
.Research showing the times that onions take to return depending on the distance (CC-BY-SA Tikhomirov, Sergei & Pickhardt, Rene & Biryukov, Alex & Nowostawski, Mariusz. (2020). Probing Channel Balances in the Lightning Network.)
image:images/probingtimes.ppm[]
Of course this diagram was just a snapshot from an experiment in early 2020 and things might change.
We can learn from the Diagram that payments can take several seconds while the node tries to probe several paths.
This is due to the fact that the fact that single onions can easily take a few seconds to return and a sender might have to send several onions in a row while probing for a sucessfull path.
in generall this will still be much faster than waiting for confirmations on a bitcoin block but it is not sufficient in an environment where payments need to settle fast.
If people stand in a line at the cash register for their groceries this would be such a setting.
Thus lightnign developers


==== Probing based improvements
The last source of information that nodes could use is to probe the network themselves.
Instead of making the actual payment nodes could send out many fake payments which are onions to a random payment hash.
Given the properties of the hashfunction it is save to assume that noone would know the preimage.
In that sense the payment will only fail at the destination and nodes can learn a lot about the balances.
Of course this produces spam and heavy load on the network and it is not recommended that nodes do this.
However participants cannot really be stopped from doing this.
unless channel partners see a lot of traffic coming on a channel which always fails and never settles.
In this case channel partners could decide to close the channel.

[Note]
====
We want you to understand that Lightning Network by design does not have perfect privacy.
While a lot of information is not easily accessible every time a path is probed the node learns something about the state of the network at that point in time.
====

We note that one should not send two onions at the same time with the same payment hash for which the recipient knows the preimage.
As long as the onion is being processed and routed the payment is out of controll of the sender.
In case two onions are sent at the same time the recipient could very well release the preimage twice and get paid twice.
That was the reason why probing should be conducted with a fake payment hash.
in that case the sender can probe concurrently as long as the sender has enough funds to pay for all the HTLCs.
However there is a problem.
Assume an onion returns indicating that the payment hash was unknown to the recipient but otherwise the path would have been possible.
The sender would now use this exact path to send the payment with the corrent payment hash.
Meanwhile the balances of some channels along the path might have changed and the path does not exist anymore.
In this case the sender would have to start from the beginning all over again.
Admittedly the risk for this to happen is rather small but there is a chance.

A better way and potential improvement for the future of the Lightning network are stuckless payments.
There is a proposal for a system called stuckless payments that receives high appriciation by developers.
This proposal will probably not be implemented before the lightning network switches from Hashed Timelocked Contracts to Point time locked contracts which won't come before Schnorr Signatures are activated on the Bitcoin Network.
What stuckless payments can do is to give controll back to the sender of an Onion.
Without explaining the details here we just say that the sender can now cancle an onion.
This is great for redundant and concurrent pathfinding.
The sender could send out several real onions.
The first ones that arrives at the recipient will be settled.
All others will be cancled.
This increases the usuability of the Lightning Network on several levels.
One advantage is that the sender can try several paths at the same time.
The second advantage is that the path is locked after it is found and until it is settled.
This means that the sender can either cancle the onion or help to release the preimage (as senders have to do with the stuckless payment construction)
In particular the probed path cannot change or used by other routing requests between probing and setting up the htlcs that are used to fullfill the request.
The time for a a successfull payment will reduce drastically.
The distadvante is that the sender has to lock more bitcoin during the path finding process.
Due to timeouts these bitcoin can be locked for a couple of days before they can be used again.
This should not happen too often.
Also it utilizes more resources of other nodes.


==== Multipath payments
Everyone can easily make the following observation:

----
Let's say your node has discovered a path along which a certain amount of Satoshis for example 100k could be routed.
Then any onion along that path on the same time with an lower amount of Satoshis would also have been successfull.
One can easily conclude that lower amounts have a higher likelyhood to be routed successfully to the destination than larger amounts.
----

Researchers and developers have already tested and confirmed this emperically over and over again.

With this assumption in mind it seems natural to split a payment amount and send several smaller payments along various paths.
With if a small payment fails it will be retried and probed just as one would do with a single larger payment.
While the main idea is very easy to understand we want to discuss the details, advantages and disadvantages of this mechanism in the following.

Usually a receiving node will see an incoming HTLC for a certain payment hash.
If the onion signals that the node is the final recipient and that the amount of the HTLC is less than the one specified in the invoice the node would not accept the HTLC and send back an erring onion.
However with the TLV format of onions a sender can specify the total_amount of the payment which can be bigger than the HTLC.
The recipient can safely accept the HTLC and wait for more HTLCs to arrive.
In this way all parts of the payment will use the same payment hash.
The recipient will only release the preimage if the sum of all incoming HTLCs is at least the speciefied payment amount.

[Note]
====
**Multi path or multi part payments?** You might have realized that we named the chapter multipath payments but mentioned in the last paragraph that such a payment consists of several parts.
The protocol specification uses the abbrivation MPP for multi part payments.
This is in fact always correct as all parts could technically - though this would not make much sense - be delivered over the same path.
As we are introducing MPP in the pathfinding section of the book and as they are also used for path finding we take the liberty to also abbriviate multi path payments with MPP.
====

It is important to recognize that a node that forwards HTLCs via onions does not have to bother if the payment is a single payment or one of several multi part payments.
The only node who needs to be ready to accept multi part payments is the receiving node.
In the BOLT 11 invoice there is space for feature bits.
If ia node wishes to accept multipart payments it has to signal this by setting the corresponding feature bit (16 / 17).
If a node wishes to send a multi part payment it can also do so if the receiving node has signaled their willingess to accept such payments.
Currently there is no way for routing nodes to split the payment amount and onion into several parts or merge several incoming HTLCs into a single path.

Besides the potentially better chances to find smaller routes the sender might want to use a multipart payment because it does not have enough balance in a single payment channel.
If the channel had enough capacity this could be resolved with a circular rebalancing - which we will discuss in the next section.
However if the payment amount is bigger than the largest capacity of a channel that the sender has the sender can only pay the invoice if the recipient allows and supports multipart payments.
Similarly a recipient might not be able to receive a single payment of the requested amount and would have the interest of signaling multi part payments.
Luckily nodes will do this automatically and practially always signal the support for multi part payments if the implementation supports this feature.
The standard Lightning Network implementations which follow BOLT 1.1 all support this feature.

Multipart payments will almost always be more expensive than a single payment.
You will remember that the fees that routing nodes charge consist of a fee rate and of a base fee.
The total fee rate of a multipart payment stays roughly the same as a single payment.
However the base fee is added independent of the amount making multipart payments in most cases more expensive.
As the sender pays the fees the sender will not necessarily have the interest of splitting the payment in too many parts.
Thus implementations usually integrate multi part payments into the probing based approach.
For example after a single payment would not got through the node might split the amount into two payments and try a multipart payment with smaller amounts.
Those mulitpart payments could again be split down if they are not successfull along a route.

The advantages of multi part payments are quite obvious:

. bigger payment sizes
. higher success rates

On the other side we have a couple of downsides:

. Higher fees
. More HTLCs locked / more load on the network
. Potentially longer times. If only a single part gets stuck all the other HTLCs in flight have to wait locking liquidity of many nodes for a potentially longer time
. Leaks more information as the network is practically probed more heavily.


==== Rebalancing
* (proactive / reactive) Rebalancing
* Imbalance measures
* goals for rebalancing (low Gini coefficient and not 50 / 50)
* optimization problem / game theory
* JIT Routing

==== Optimizations for Multi path payments
The rebalancing goal with local channel balance coefficients could actually be integrated into multi path payments.
Thus if a node decides to send a payment along several paths it could very well use this opportunity to split the payment in a way that it improves the imbalance of its own channels.
So instead of splitting payments by 2 in a divide and conquorer strategy the node could use the following formula ...