mirror of
https://github.com/hwchase17/langchain
synced 2024-11-18 09:25:54 +00:00
040597e832
This PR improves on the `CassandraCache` and `CassandraSemanticCache` classes, mainly in the constructor signature, and also introduces several minor improvements around these classes.

### Init signature

A (sigh) breaking change is tentatively introduced to the constructor. To me, the advantages outweigh the possible discomfort: the new syntax places the DB-connection objects `session` and `keyspace` later in the param list, so that they can be given a default value. This is what enables the pattern of _not_ specifying them, provided one has previously initialized the Cassandra connection through the versatile utility method `cassio.init(...)`.

In this way, a much less unwieldy instantiation can be done, such as `CassandraCache()` and `CassandraSemanticCache(embedding=xyz)`, everything else falling back to defaults.

A downside is that, compared to the earlier signature, this might turn out to be breaking for those doing positional instantiation. As a way to mitigate this problem, this PR typechecks its first argument trying to detect the legacy usage. (And to make this point less tricky in the future, most arguments are left to be keyword-only).

If this is considered too harsh, I'd like guidance on how to further smoothen this transition. **Our plan is to make the pattern of optional session/keyspace a standard across all Cassandra classes**, so that a repeatable strategy would be ideal. A possibility would be to keep positional arguments for legacy reasons but issue a deprecation warning if any of them is actually used, to later remove them with 0.2 - please advise on this point.

### Other changes

- class docstrings: enriched, completely moved to class level, added note on `cassio.init(...)` pattern, added tiny sample usage code.
- semantic cache: revised terminology to never mention "distance" (it is in fact a similarity!). Kept the legacy constructor param with a deprecation warning if used.
- `llm_caching` notebook: uniform flow with the Cassandra and Astra DB separate cases; better and Cassandra-first description; all imports made explicit and from community where appropriate.
- cache integration tests moved to community (incl. the imported tools), env var bugfix for `CASSANDRA_CONTACT_POINTS`.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
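The constructor pattern described under "Init signature" can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: `SketchSemanticCache`, `FakeSession`, `init_defaults` and the `table_name` default are all hypothetical names, and `init_defaults` only imitates the role that `cassio.init(...)` plays. The message does not say whether detected legacy usage raises or warns; this sketch raises.

```python
from typing import Any, Optional

# Hypothetical stand-in for the process-wide defaults that a call such as
# cassio.init(...) would register. Not the real cassio mechanism.
_GLOBAL_DEFAULTS: dict = {"session": None, "keyspace": None}


def init_defaults(session: Any, keyspace: str) -> None:
    """Register a default session/keyspace (illustrative analogue only)."""
    _GLOBAL_DEFAULTS["session"] = session
    _GLOBAL_DEFAULTS["keyspace"] = keyspace


class FakeSession:
    """Placeholder for a Cassandra driver session; not a real driver class."""


class SketchSemanticCache:
    """Hypothetical cache: `embedding` first, connection objects optional."""

    def __init__(
        self,
        embedding: Any = None,
        session: Optional[FakeSession] = None,
        keyspace: Optional[str] = None,
        *,  # everything after this marker is keyword-only
        table_name: str = "sketch_cache",
    ) -> None:
        # Legacy callers passed the session first, so a session object
        # arriving in the `embedding` slot signals the old positional order.
        if isinstance(embedding, FakeSession):
            raise TypeError(
                "Legacy positional usage detected: pass session=... and "
                "keyspace=... by keyword, or register defaults first."
            )
        if embedding is None:
            raise ValueError("An embedding must be provided.")
        self.embedding = embedding
        # Fall back to the globally registered defaults when omitted.
        self.session = (
            session if session is not None else _GLOBAL_DEFAULTS["session"]
        )
        self.keyspace = (
            keyspace if keyspace is not None else _GLOBAL_DEFAULTS["keyspace"]
        )
        if self.session is None:
            raise ValueError("No session given and no default registered.")
        self.table_name = table_name


# Usage: register defaults once, then instantiate with minimal arguments.
init_defaults(FakeSession(), "demo_keyspace")
cache = SketchSemanticCache(embedding="dummy-embedding")
```

The alternative floated in the message, accepting the legacy positional order while emitting a deprecation warning, would replace the `raise TypeError` branch with a warning plus a remapping of the arguments.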
43 lines
1.2 KiB
Python
import os

import cassio
import langchain
from langchain_community.cache import CassandraCache
from langchain_community.chat_models import ChatOpenAI
from langchain_core.messages import BaseMessage
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda

use_cassandra = int(os.environ.get("USE_CASSANDRA_CLUSTER", "0"))
if use_cassandra:
    from .cassandra_cluster_init import get_cassandra_connection

    session, keyspace = get_cassandra_connection()
    cassio.init(
        session=session,
        keyspace=keyspace,
    )
else:
    cassio.init(
        token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
        database_id=os.environ["ASTRA_DB_ID"],
        keyspace=os.environ.get("ASTRA_DB_KEYSPACE"),
    )

# inits
langchain.llm_cache = CassandraCache(session=None, keyspace=None)
llm = ChatOpenAI()


# custom runnables
def msg_splitter(msg: BaseMessage):
    return [w.strip() for w in msg.content.split(",") if w.strip()]


# synonym-route preparation
synonym_prompt = ChatPromptTemplate.from_template(
    "List up to five comma-separated synonyms of this word: {word}"
)

chain = synonym_prompt | llm | RunnableLambda(msg_splitter)