mirror of
https://github.com/JGRennison/OpenTTD-patches.git
synced 2024-11-16 00:12:51 +00:00
269 lines
12 KiB
Plaintext
269 lines
12 KiB
Plaintext
Some explanations about Desyncs
|
|
Last updated: 2014-02-23
|
|
------------------------------------------------------------------------
|
|
|
|
|
|
Table of contents
|
|
-----------------
|
|
1.0) Desync theory
|
|
* 1.1) OpenTTD multiplayer architecture
|
|
* 1.2) What is a Desync and how is it detected
|
|
* 1.3) Typical causes of Desyncs
|
|
2.0) What to do in case of a Desync
|
|
* 2.1) Cache debugging
|
|
* 2.2) Desync recording
|
|
3.0) Evaluating the Desync records
|
|
* 3.1) Replaying
|
|
* 3.2) Evaluation the replay
|
|
* 3.3) Comparing savegames
|
|
|
|
|
|
1.1) OpenTTD multiplayer architecture
|
|
---- --------------------------------
|
|
OpenTTD has a huge gamestate, which changes all of the time.
|
|
The savegame contains the complete gamestate at a specific point
|
|
in time. But this state changes completely each tick: Vehicles move
|
|
and trees grow.
|
|
|
|
However, most of these changes in the gamestate are deterministic:
|
|
Without a player interfering a vehicle follows its orders always
|
|
in the same way, and trees always grow the same.
|
|
|
|
In OpenTTD multiplayer synchronisation works by creating a savegame
|
|
when clients join, and then transferring that savegame to the client,
|
|
so it has the complete gamestate at a fixed point in time.
|
|
|
|
Afterwards clients only receive 'commands', that is: Stuff which is
|
|
not predictable, like
|
|
- player actions
|
|
- AI actions
|
|
- GameScript actions
|
|
- Admin Port command
|
|
- rcon commands
|
|
- ...
|
|
|
|
These commands contain the information on how to execute the command,
|
|
and when to execute it. Time is measured in 'network frames'.
|
|
Mind that network frames to not match ingame time. Network frames
|
|
also run while the game is paused, to give a defined behaviour to
|
|
stuff that is executing while the game is paused.
|
|
|
|
The deterministic part of the gamestate is run by the clients on
|
|
their own. All they get from the server is the instruction to
|
|
run the gamestate up to a certain network time, which basically
|
|
says that there are no commands scheduled in that time.
|
|
|
|
When a client (which includes the server itself) wants to execute
|
|
a command (i.e. a non-predictable action), it does this by
|
|
- calling DoCommandP resp. DoCommandPInternal
|
|
- These functions first do a local test-run of the command to
|
|
check simple preconditions. (Just to give the client an
|
|
immediate response without bothering the server and waiting for
|
|
the response.) The test-run may not actually change the
|
|
gamestate, all changes must be discarded.
|
|
- If the local test-run succeeds the command is sent to the server.
|
|
- The server inserts the command into the command queue, which
|
|
assigns a network frame to the commands, i.e. when it shall be
|
|
executed on all clients.
|
|
- Enhanced with this specific timestamp, the command is send to all
|
|
clients, which execute the command simultaneously in the same
|
|
network frame in the same order.
|
|
|
|
1.2) What is a Desync and how is it detected
|
|
---- ---------------------------------------
|
|
In the ideal case all clients have the same gamestate as the server
|
|
and run in sync. That is, vehicle movement is the same on all
|
|
clients, and commands are executed the same everywhere and
|
|
have the same results.
|
|
|
|
When a Desync happens, it means that the gamestates on the clients
|
|
(including the server) are no longer the same. Just imagine
|
|
that a vehicle picks the left line instead of the right line at
|
|
a junction on one client.
|
|
|
|
The important thing here is, that no one notices when a Desync
|
|
occurs. The desync client will continue to simulate the gamestate
|
|
and execute commands from the server. Once the gamestate differs
|
|
it will increasingly spiral out of control: If a vehicle picks a
|
|
different route, it will arrive at a different time at a station,
|
|
which will load different cargo, which causes other vehicles to
|
|
load other stuff, which causes industries to notice different
|
|
servicing, which causes industries to change production, ...
|
|
the client could run all day in a different universe.
|
|
|
|
To limit how long a Desync can remain unnoticed, the server
|
|
transfers some checksums every now and then for the gamestate.
|
|
Currently this checksum is the state of the random number
|
|
generator of the game logic. A lot of things in OpenTTD depend
|
|
on the RNG, and if the gamestate differs, it is likely that the
|
|
RNG is called at different times, and the state differs when
|
|
checked.
|
|
|
|
The clients compare this 'checksum' with the checksum of their
|
|
own gamestate at the specific network frame. If they differ,
|
|
the client disconnects with a Desync error.
|
|
|
|
The important thing here is: The detection of the Desync is
|
|
only an ultimate failure detection. It does not give any
|
|
indication on when the Desync happened. The Desync may after
|
|
all have occurred long ago, and just did not affect the checksum
|
|
up to now. The checksum may have matched 10 times or more
|
|
since the Desync happened, and only now the Desync has spiraled
|
|
enough to finally affect the checksum. (There was once a desync
|
|
which was only noticed by the checksum after 20 game years.)
|
|
|
|
1.3) Typical causes of Desyncs
|
|
---- -------------------------
|
|
Desyncs can be caused by the following scenarios:
|
|
- The savegame does not describe the complete gamestate.
|
|
- Some information which affects the progression of the
|
|
gamestate is not saved in the savegame.
|
|
- Some information which affects the progression of the
|
|
gamestate is not loaded from the savegame.
|
|
This includes the case that something is not completely
|
|
reset before loading the savegame, so data from the
|
|
previous game is carried over to the new one.
|
|
- The gamestate does not behave deterministic.
|
|
- Cache mismatch: The game logic depends on some cached
|
|
values, which are not invalidated properly. This is
|
|
the usual case for NewGRF-specific Desyncs.
|
|
- Undefined behaviour: The game logic performs multiple
|
|
things in an undefined order or with an undefined
|
|
result. E.g. when sorting something with a key while
|
|
some keys are equal. Or some computation that depends
|
|
on the CPU architecture (32/64 bit, little/big endian).
|
|
- The gamestate is modified when it shall not be modified.
|
|
- The test-run of a command alters the gamestate.
|
|
- The gamestate is altered by a player or script without
|
|
using commands.
|
|
|
|
|
|
2.1) Cache debugging
|
|
---- ---------------
|
|
Desyncs which are caused by improper cache validation can
|
|
often be found by enabling cache validation:
|
|
- Start OpenTTD with '-d desync=2'.
|
|
- This will enable validation of caches every tick.
|
|
That is, cached values are recomputed every tick and compared
|
|
to the cached value.
|
|
- Differences are logged to 'commands-out.log' in the autosave
|
|
folder.
|
|
|
|
Mind that this type of debugging can also be done in singleplayer.
|
|
|
|
2.2) Desync recording
|
|
---- ----------------
|
|
If you have a server, which happens to encounter Desyncs often,
|
|
you can enable recording of the gamestate alterations. This
|
|
will later allow the replay the gamestate and locate the Desync
|
|
cause.
|
|
|
|
There are two levels of Desync recording, which are enabled
|
|
via '-d desync=2' resp. '-d desync=3'. Both will record all
|
|
commands to a file 'commands-out.log' in the autosave folder.
|
|
|
|
If you have the savegame from the start of the server, and
|
|
this command log you can replay the whole game. (see Section 3.1)
|
|
|
|
If you do not start the server from a savegame, there will
|
|
also be a savegame created just after a map has been generated.
|
|
The savegame will be named 'dmp_cmds_*.sav' and be put into
|
|
the autosave folder.
|
|
|
|
In addition to that '-d desync=3' also creates regular savegames
|
|
at defined spots in network time. (more defined than regular
|
|
autosaves). These will be created in the autosave folder
|
|
and will also be named 'dmp_cmds_*.sav'.
|
|
|
|
These saves allow comparing the gamestate with the original
|
|
gamestate during replaying, and thus greatly help debugging.
|
|
However, they also take a lot of disk space.
|
|
|
|
|
|
3.1) Replaying
|
|
---- ---------
|
|
To replay a Desync recording, you need these files:
|
|
- The savegame from when the server was started, resp.
|
|
the automatically created savegame from when the map
|
|
was generated.
|
|
- The 'commands-out.log' file.
|
|
- Optionally the 'dmp_cmds_*.sav'.
|
|
Put these files into a safe spot. (Not your autosave folder!)
|
|
|
|
Next, prepare your OpenTTD for replaying:
|
|
- Get the same version of OpenTTD as the original server was running.
|
|
- Uncomment/enable the define 'DEBUG_DUMP_COMMANDS' in
|
|
'src/network/network_func.h'.
|
|
(DEBUG_FAILED_DUMP_COMMANDS is explained later)
|
|
- Put the 'commands-out.log' into the root save folder, and rename
|
|
it to 'commands.log'.
|
|
- Run 'openttd -D -d desync=3 -g startsavegame.sav'.
|
|
This replays the server log and creates new 'commands-out.log'
|
|
and 'dmp_cmds_*.sav' in your autosave folder.
|
|
|
|
3.2) Evaluation the replay
|
|
---- ---------------------
|
|
The replaying will also compare the checksums which are part of
|
|
the 'commands-out.log' with the replayed gamestate.
|
|
If they differ, it will trigger a 'NOT_REACHED'.
|
|
|
|
If the replay succeeds without mismatch, that is the replay reproduces
|
|
the original server state:
|
|
- Repeat the replay starting from incrementally later 'dmp_cmds_*.sav'
|
|
while truncating the 'commands.log' at the beginning appropriately.
|
|
The 'dmp_cmds_*.sav' can be your own ones from the first reply, or
|
|
the ones from the original server (if you have them).
|
|
(This simulates the view of joining clients during the game.)
|
|
- If one of those replays fails, you have located the Desync between
|
|
the last dmp_cmds that reproduces the replay and the first one
|
|
that fails.
|
|
|
|
If the replay does not succeed without mismatch, you can check the logs
|
|
whether there were failed commands. Then you may try to replay with
|
|
DEBUG_FAILED_DUMP_COMMANDS enabled. If the replay then fails, the
|
|
command test-run of the failed command modified the game state.
|
|
|
|
If you have the original 'dmp_cmds_*.sav', you can also compare those
|
|
savegames with your own ones from the replay. You can also comment/disable
|
|
the 'NOT_REACHED' mentioned above, to get another 'dmp_cmds_*.sav' from
|
|
the replay after the mismatch has already been detected.
|
|
See Section 3.2 on how to compare savegames.
|
|
If the saves differ you have located the Desync between the last dmp_cmds
|
|
that match and the first one that does not. The difference of the saves
|
|
may point you in the direction of what causes it.
|
|
|
|
If the replay succeeds without mismatch, and you do not have any
|
|
'dmp_cmd_*.sav' from the original server, it is a lost case.
|
|
Enable creation of the 'dmp_cmd_*.sav' on the server, and wait for the
|
|
next Desync.
|
|
|
|
Finally, you can also compare the 'commands-out.log' from the original
|
|
server with the one from the replay. They will differ in stuff like
|
|
dates, and the original log will contain the chat, but otherwise they
|
|
should match.
|
|
|
|
3.2) Comparing savegames
|
|
---- -------------------
|
|
The binary form of the savegames from the original server and from
|
|
your replay will always differ:
|
|
- The savegame contains paths to used NewGRF files.
|
|
- The gamelog will log your loading of the savegame.
|
|
- The savegame data of AIs and the Gamescript will differ.
|
|
Scripts are not run during the replay, only their recorded commands
|
|
are replayed. Their internal state will thus not change in the
|
|
replay and will differ.
|
|
|
|
To compare savegame more semantically, there exist some ugly hackish
|
|
tools at:
|
|
http://devs.openttd.org/~frosch/texts/zpipe.c
|
|
http://devs.openttd.org/~frosch/texts/printhunk.c
|
|
|
|
The first one decompresses OpenTTD savegames. The second one creates
|
|
a textual representation of an uncompressed savegame, by parsing hunks
|
|
and arrays and such. With both tools you need to be a bit careful
|
|
since they work on stdin and stdout, which may not deal well with
|
|
binary data.
|
|
|
|
If you have the textual representation of the savegames, you can
|
|
compare them with regular diff tools.
|