nvidia-trt[patch]: Invoke callback prior to yielding token (#18446)

## PR title
nvidia-trt[patch]: Invoke callback prior to yielding

## PR message
- Description: Invoke the `on_llm_new_token` callback prior to yielding the token in the `_stream` method.
- Issue: https://github.com/langchain-ai/langchain/issues/16913
- Dependencies: None
pull/18466/head
William De Vena 7 months ago committed by GitHub
parent 275877980e
commit a63cee04ac
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

@@ -176,9 +176,9 @@ class TritonTensorRTLLM(BaseLLM):
         result_queue = self._invoke_triton(self.model_name, inputs, outputs, stop_words)
         for token in result_queue:
-            yield GenerationChunk(text=token)
             if run_manager:
                 run_manager.on_llm_new_token(token)
+            yield GenerationChunk(text=token)
         self.client.stop_stream()
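
The ordering matters because a Python generator suspends at `yield`: code placed after the `yield` runs only when the consumer asks for the next item, and not at all if the consumer stops early. The sketch below illustrates this with plain generators. It is not the `TritonTensorRTLLM` code; `stream_yield_first`, `stream_callback_first`, `run`, and `callbacks_after_first_token` are hypothetical names, and a plain function stands in for `run_manager.on_llm_new_token`.

```python
from typing import Callable, Iterable, Iterator, List


def stream_yield_first(
    tokens: Iterable[str], on_new_token: Callable[[str], None]
) -> Iterator[str]:
    # Old ordering: the callback runs only after the consumer resumes the generator.
    for token in tokens:
        yield token
        on_new_token(token)


def stream_callback_first(
    tokens: Iterable[str], on_new_token: Callable[[str], None]
) -> Iterator[str]:
    # New ordering: the callback has already run when the consumer sees the token.
    for token in tokens:
        on_new_token(token)
        yield token


def run(stream_fn, tokens: List[str]) -> List[str]:
    # Record the interleaving of callback invocations and consumer reads.
    events: List[str] = []

    def on_new_token(token: str) -> None:
        events.append(f"callback:{token}")

    for token in stream_fn(tokens, on_new_token):
        events.append(f"consumed:{token}")
    return events


def callbacks_after_first_token(stream_fn, tokens: List[str]) -> List[str]:
    # Simulate a consumer that reads one token and then abandons the stream.
    seen: List[str] = []
    gen = stream_fn(tokens, seen.append)
    next(gen)
    gen.close()
    return seen


if __name__ == "__main__":
    print(run(stream_yield_first, ["a", "b"]))
    print(run(stream_callback_first, ["a", "b"]))
    print(callbacks_after_first_token(stream_yield_first, ["a", "b"]))
    print(callbacks_after_first_token(stream_callback_first, ["a", "b"]))
```

With the yield-first ordering, each callback trails the token it describes, and a consumer that stops after the first token never triggers a callback for it. With the callback-first ordering, every token that reaches the consumer has already been reported.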
