Fast load conversationsummarymemory from existing summary (#7533)

- Description: Adds an optional buffer arg to the memory's
from_messages() method. If provided, the existing summary will be loaded
instead of regenerating a summary from the loaded messages.
 
Why? If we have past messages to load from, it is likely we also have an
existing summary. This is particularly helpful when the chat is ephemeral
and/or backed by a serverless deployment, where the chat history is not
stored but the updated history is passed back and forth between the
backend and the frontend.

E.g., take a stateless QA backend that loads messages on every request
and generates a response. Without this addition, each time the messages
are loaded via from_messages, the summary is recomputed even though it
may have just been computed during the previous response. With this, the
previously computed summary can be passed in, which avoids:
1) spending extra $$$ on tokens, and
2) increased response time from regenerating a summary that already
exists.
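
The request flow described above can be sketched in plain Python. This is only an illustration of the idea, not LangChain code: `CountingSummarizer` and `load_memory` are hypothetical stand-ins (the former replaces the LLM call so the example runs without an API key), and the summary text it produces is a simple concatenation rather than a real summary.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Message = Tuple[str, str]  # (role, text)


@dataclass
class CountingSummarizer:
    """Hypothetical stand-in for an LLM call; counts how often it is invoked."""

    calls: int = 0

    def summarize(self, messages: List[Message], previous: str = "") -> str:
        self.calls += 1
        joined = " ".join(f"{role}: {text}" for role, text in messages)
        return (previous + " " + joined).strip()


def load_memory(
    summarizer: CountingSummarizer,
    messages: List[Message],
    existing_summary: Optional[str] = None,
) -> str:
    """Return the summary buffer for one stateless request.

    If the client sent back the summary from the previous turn, reuse it;
    otherwise fall back to recomputing it from the loaded messages.
    """
    if existing_summary is not None:
        return existing_summary
    return summarizer.summarize(messages)


summarizer = CountingSummarizer()
history = [("human", "hi"), ("ai", "hi there!")]

first = load_memory(summarizer, history)          # no summary yet: one summarizer call
second = load_memory(summarizer, history, first)  # summary passed back: no extra call
print(summarizer.calls)
```

The second request reuses the summary it was handed, so the summarizer is only invoked once across both requests.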

Tag maintainer: @hwchase17
Twitter handle: https://twitter.com/ShantanuNair

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Shantanu Nair 2023-08-01 06:44:11 +05:30 committed by GitHub
parent ec40ead980
commit 53f3793504
GPG Key ID: 4AEE18F83AFDEB23


@ -60,7 +60,7 @@ memory.predict_new_summary(messages, previous_summary)
</CodeOutputBlock>
-## Initializing with messages
+## Initializing with messages/existing summary
If you have messages outside this class, you can easily initialize the class with ChatMessageHistory. During loading, a summary will be calculated.
@ -73,7 +73,11 @@ history.add_ai_message("hi there!")
```python
-memory = ConversationSummaryMemory.from_messages(llm=OpenAI(temperature=0), chat_memory=history, return_messages=True)
+memory = ConversationSummaryMemory.from_messages(
+    llm=OpenAI(temperature=0),
+    chat_memory=history,
+    return_messages=True
+)
```
@ -89,6 +93,17 @@ memory.buffer
</CodeOutputBlock>
+Optionally you can speed up initialization using a previously generated summary, and avoid regenerating the summary by just initializing directly.
+```python
+memory = ConversationSummaryMemory(
+    llm=OpenAI(temperature=0),
+    buffer="The human asks what the AI thinks of artificial intelligence. The AI thinks artificial intelligence is a force for good because it will help humans reach their full potential.",
+    chat_memory=history,
+    return_messages=True
+)
+```
## Using in a chain
Let's walk through an example of using this in a chain, again setting `verbose=True` so we can see the prompt.