nvidia-trt[patch]: Invoke callback prior to yielding token (#18446)

## PR title
nvidia-trt[patch]: Invoke callback prior to yielding

## PR message
- Description: Invoke the `on_llm_new_token` callback prior to yielding the token in the `_stream` method.
- Issue: https://github.com/langchain-ai/langchain/issues/16913
- Dependencies: None
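The change is purely about ordering inside a generator. The minimal sketch below shows the observable difference. It makes some assumptions: a hypothetical `RecordingRunManager` stands in for LangChain's real callback manager, and plain strings stand in for `GenerationChunk`. The sketch logs when the callback fires relative to when the consumer receives each token.

```python
from typing import Callable, Iterator, List, Optional


class RecordingRunManager:
    """Hypothetical stand-in for a LangChain run manager; it only logs tokens."""

    def __init__(self, log: List[str]) -> None:
        self.log = log

    def on_llm_new_token(self, token: str) -> None:
        self.log.append(f"callback:{token}")


def stream_yield_first(tokens: List[str], run_manager: Optional[RecordingRunManager]) -> Iterator[str]:
    # Old ordering: hand the token to the consumer first, notify the callback afterwards.
    for token in tokens:
        yield token
        if run_manager:
            run_manager.on_llm_new_token(token)


def stream_callback_first(tokens: List[str], run_manager: Optional[RecordingRunManager]) -> Iterator[str]:
    # New ordering (as in this patch): notify the callback, then yield.
    for token in tokens:
        if run_manager:
            run_manager.on_llm_new_token(token)
        yield token


def trace(stream_fn: Callable[..., Iterator[str]], tokens: List[str]) -> List[str]:
    # Record callback events and consumer events in a single shared log.
    log: List[str] = []
    for token in stream_fn(tokens, RecordingRunManager(log)):
        log.append(f"consumer:{token}")
    return log


old = trace(stream_yield_first, ["a", "b"])
new = trace(stream_callback_first, ["a", "b"])
print(old)  # → ['consumer:a', 'callback:a', 'consumer:b', 'callback:b']
print(new)  # → ['callback:a', 'consumer:a', 'callback:b', 'consumer:b']
```

With the old ordering, each callback runs only after the consumer has already handled the token and asked for the next one. With the new ordering, callback handlers see each token before the caller does.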
William De Vena 2024-03-03 23:15:11 +01:00 committed by GitHub
parent 275877980e
commit a63cee04ac


```diff
@@ -176,9 +176,9 @@ class TritonTensorRTLLM(BaseLLM):
         result_queue = self._invoke_triton(self.model_name, inputs, outputs, stop_words)
         for token in result_queue:
-            yield GenerationChunk(text=token)
             if run_manager:
                 run_manager.on_llm_new_token(token)
+            yield GenerationChunk(text=token)
         self.client.stop_stream()
```