Mirror of https://github.com/hwchase17/langchain
synced 2024-11-06 03:20:49 +00:00
a23c719c8b
3 Commits
Liang Zhang | 7306600e2f | community[patch]: Support SerDe transform functions in Databricks LLM (#16752)
**Description:** The Databricks LLM does not support serializing/deserializing (SerDe) `transform_input_fn` and `transform_output_fn`, so after saving and loading, the LLM is broken. This PR serializes these functions into a hex string using pickle and stores the hex string in the YAML file. Using pickle to serialize a function can be flaky, but it is a simple workaround that unblocks many use cases; if more sophisticated SerDe is needed, it can be improved later.

**Test:** Added a simple unit test. I also tested manually on Databricks and it works well. The saved YAML looks like:

```yaml
llm:
  _type: databricks
  cluster_driver_port: null
  cluster_id: null
  databricks_uri: databricks
  endpoint_name: databricks-mixtral-8x7b-instruct
  extra_params: {}
  host: e2-dogfood.staging.cloud.databricks.com
  max_tokens: null
  model_kwargs: null
  n: 1
  stop: null
  task: null
  temperature: 0.0
  transform_input_fn: 80049520000000000000008c085f5f6d61696e5f5f948c0f7472616e73666f726d5f696e7075749493942e
  transform_output_fn: null
```

@baskaryan

Manual test script:

```python
from langchain_community.embeddings import DatabricksEmbeddings
from langchain_community.llms import Databricks
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
import mlflow

embeddings = DatabricksEmbeddings(endpoint="databricks-bge-large-en")

def transform_input(**request):
    request["messages"] = [
        {
            "role": "user",
            "content": request["prompt"]
        }
    ]
    del request["prompt"]
    return request

llm = Databricks(endpoint_name="databricks-mixtral-8x7b-instruct", transform_input_fn=transform_input)

persist_dir = "faiss_databricks_embedding"

# Create the vector db, persist the db to a local fs folder
loader = TextLoader("state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
db = FAISS.from_documents(docs, embeddings)
db.save_local(persist_dir)

def load_retriever(persist_directory):
    embeddings = DatabricksEmbeddings(endpoint="databricks-bge-large-en")
    vectorstore = FAISS.load_local(persist_directory, embeddings)
    return vectorstore.as_retriever()

retriever = load_retriever(persist_dir)
retrievalQA = RetrievalQA.from_llm(llm=llm, retriever=retriever)

with mlflow.start_run() as run:
    logged_model = mlflow.langchain.log_model(
        retrievalQA,
        artifact_path="retrieval_qa",
        loader_fn=load_retriever,
        persist_dir=persist_dir,
    )

# Load the retrievalQA chain
loaded_model = mlflow.pyfunc.load_model(logged_model.model_uri)
print(loaded_model.predict([{"query": "What did the president say about Ketanji Brown Jackson"}]))
```
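For illustration, here is a minimal sketch of the pickle-to-hex round trip this PR relies on (the function name is just an example, not the PR's actual code). Pickle stores a module-level function by reference, i.e. by module and qualified name, which is why the approach can be flaky: the same function must be importable when the hex string is loaded back.

```python
import pickle

def transform_input(**request):
    # Example module-level function to serialize (illustrative only).
    return request

# Serialize the function to a hex string suitable for embedding in YAML.
# pickle records only the module + qualified name of the function, so
# deserialization requires the function to be importable at load time.
hex_str = pickle.dumps(transform_input).hex()
print(hex_str)  # e.g. 8004...7472616e73666f726d5f696e707574...2e

# Deserialize the hex string back into a callable on load.
restored_fn = pickle.loads(bytes.fromhex(hex_str))
assert restored_fn is transform_input
```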
Liang Zhang | 6479aab74f | community[patch]: Add param "task" to Databricks LLM to work around serialization of transform_output_fn (#14933)
**What is the code to reproduce the issue?**

```python
from langchain.chains import LLMChain, load_chain
from langchain.llms import Databricks
from langchain.prompts import PromptTemplate

def transform_output(response):
    # Extract the answer from the response.
    return str(response["candidates"][0]["text"])

def transform_input(**request):
    full_prompt = f"""{request["prompt"]}
    Be Concise.
    """
    request["prompt"] = full_prompt
    return request

chat_model = Databricks(
    endpoint_name="llama2-13B-chat-Brambles",
    transform_input_fn=transform_input,
    transform_output_fn=transform_output,
    verbose=True,
)
print(f"Test chat model: {chat_model('What is Apache Spark')}")  # This works

llm_chain = LLMChain(llm=chat_model, prompt=PromptTemplate.from_template("{chat_input}"))
llm_chain("colorful socks")  # This works

# transform_input_fn and transform_output_fn are not serialized into the model yaml file
llm_chain.save("databricks_llm_chain.yaml")
# The Databricks LLM is recreated with transform_input_fn=None, transform_output_fn=None.
loaded_chain = load_chain("databricks_llm_chain.yaml")
# Thus this errors: the transform_output_fn is needed to produce the correct output.
loaded_chain("colorful socks")
```

Error:

```
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-6c34afab-3473-421d-877f-1ef18930ef4d/lib/python3.10/site-packages/pydantic/v1/main.py", line 341, in __init__
    raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for Generation
text
  str type expected (type=type_error.str)

request payload: {'query': 'What is a databricks notebook?'}
```

**What does the error mean?**

When the LLM generates an answer, the answer is represented by a `Generation` data object, which takes a str field called `text`, e.g. `Generation(text="blah")`. The Databricks LLM tried to put a non-str value into `text`, e.g. `Generation(text={"candidates": [{"text": "blah"}]})`, so pydantic raised a validation error.

**Why does the output format become incorrect after saving and loading the Databricks LLM?**

The Databricks LLM does not support serializing `transform_input_fn` and `transform_output_fn`, so they are not written to the model YAML file. When the Databricks LLM is loaded, it is recreated with `transform_input_fn=None, transform_output_fn=None`. Without `transform_output_fn`, the output text is never unwrapped, which causes the error above. Missing `transform_input_fn` additionally causes the appended prompt "Be Concise." to be lost after saving and loading.

Co-authored-by: Bagatur <baskaryan@gmail.com>
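As a hedged illustration of the failure mode described above (the `Generation` import path is what current langchain-core exposes; the raw response shape is taken from the reproduce code):

```python
from langchain_core.outputs import Generation

# Raw endpoint response shape from the reproduce code above.
raw_response = {"candidates": [{"text": "blah"}]}

# Without transform_output_fn, the raw dict would reach Generation:
#   Generation(text=raw_response)  # -> ValidationError: str type expected

# transform_output_fn unwraps the text first, so validation passes.
text = str(raw_response["candidates"][0]["text"])
generation = Generation(text=text)
print(generation.text)  # blah
```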
Bagatur | ed58eeb9c5 | community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463)
Moved the following modules to the new package langchain-community in a backwards-compatible fashion:

```
mv langchain/langchain/adapters community/langchain_community
mv langchain/langchain/callbacks community/langchain_community/callbacks
mv langchain/langchain/chat_loaders community/langchain_community
mv langchain/langchain/chat_models community/langchain_community
mv langchain/langchain/document_loaders community/langchain_community
mv langchain/langchain/docstore community/langchain_community
mv langchain/langchain/document_transformers community/langchain_community
mv langchain/langchain/embeddings community/langchain_community
mv langchain/langchain/graphs community/langchain_community
mv langchain/langchain/llms community/langchain_community
mv langchain/langchain/memory/chat_message_histories community/langchain_community
mv langchain/langchain/retrievers community/langchain_community
mv langchain/langchain/storage community/langchain_community
mv langchain/langchain/tools community/langchain_community
mv langchain/langchain/utilities community/langchain_community
mv langchain/langchain/vectorstores community/langchain_community
mv langchain/langchain/agents/agent_toolkits community/langchain_community
mv langchain/langchain/cache.py community/langchain_community
```

Moved the following to core:

```
mv langchain/langchain/utils/json_schema.py core/langchain_core/utils
mv langchain/langchain/utils/html.py core/langchain_core/utils
mv langchain/langchain/utils/strings.py core/langchain_core/utils
cat langchain/langchain/utils/env.py >> core/langchain_core/utils/env.py
rm langchain/langchain/utils/env.py
```

See .scripts/community_split/script_integrations.sh for all changes
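To illustrate what "backwards compatible" means here: after the split, the old `langchain` import paths keep working and resolve to the classes that now live in `langchain_community`. A sketch, using `Databricks` as one example of a moved class:

```python
# Old path still works via a compatibility shim in langchain...
from langchain.llms import Databricks as OldDatabricks

# ...and resolves to the class that now lives in langchain-community.
from langchain_community.llms import Databricks as NewDatabricks

assert OldDatabricks is NewDatabricks
```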