Commit Graph

9 Commits (dd677d9e76bf9b21f48c9562e9267b6abc56d86a)

Author SHA1 Message Date
Alexander Borzunov 6e4ebb94d2
Fix deadlocks in MemoryCache (#396)
- Fix deadlocks in MemoryCache
- Set default --alloc_timeout to 1 until the MemoryCache update
11 months ago
Alexander Borzunov 2c8959e713
Share more info about a server in DHT (#355) 11 months ago
Max Ryabinin 3e7ae5116d
Remove unused imports and attributes (#324)
* Remove unused imports and attributes
12 months ago
Alexander Borzunov fee19e9b9b
Use get_logger(__name__) instead of get_logger(__file__) (#265) 1 year ago
justheuristic ae9e71fe8e
Add local tensor-parallel fwd/bwd (#143)
This pull request adds an option to run Petals server on multiple local GPUs. It uses https://github.com/BlackSamorez/tensor_parallel

- 8bit approximation error same as in main (mean~=2% q0.9~=5%)
    - TP=1, 2, 3 (see screenshots above)
- forward, grad w.r.t. input and inference exact match with main with TP=1
- `>=`80% GPU utilization with 3x 1080ti, batch = 8 tokens
- throughput measured with and without TP
- TP on 1080Tis has near-linear speedup comparable to the benchmarks (see first message)


Co-authored-by: Iaroslav Lisniak <yalisnyak@nes.ru>
Co-authored-by: Andrei Panferov <andrei@blacksamorez.ru>
Co-authored-by: Alexander Borzunov <borzunov.alexander@gmail.com>
1 year ago
Alexander Borzunov 9997ada3bb
Shield alloc & free from cancellation (#163)
A handler's RPC code may be cancelled due to a request timeout or a client closing the connection. Before this PR:

- If `.cancel()` happens while waiting for `hivemind.utils.enter_asynchronously()`, the lock will never be released.
- If `.cancel()` happens while doing that before freeing memory, the memory will never be freed.

This PR fixes it by deferring the cancellation with [asyncio.shield()](https://docs.python.org/3/library/asyncio-task.html#asyncio.shield). Now, the cancellation will happen only when all locks are released and alloc/free has completed.
1 year ago
Alexander Borzunov 668b736031
Fix logging: do not duplicate lines, enable colors in Colab (#156) 1 year ago
Alexander Borzunov 73df69a117
Reset MemoryCache during rebalancings (#154)
Before this PR, if there were open inference sessions right when rebalancing is triggered, their cache was never properly destroyed.
1 year ago
Alexander Borzunov e99bf36647
Use common folder for all caches, make it a volume in Dockerfile (#141) 2 years ago