langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-10 01:10:59 +00:00

Liang Zhang 7306600e2f community[patch]: Support SerDe transform functions in Databricks LLM (#16752)

Description: The Databricks LLM does not support SerDe of `transform_input_fn` and `transform_output_fn`. After saving and loading, the LLM is broken. This PR serializes these functions into a hex string using pickle and saves the hex string in the yaml file. Using pickle to serialize a function can be flaky, but this is a simple workaround that unblocks many use cases. If more sophisticated SerDe is needed, we can improve it later.

Test: Added a simple unit test. I also ran a manual test on Databricks and it works well. The saved yaml looks like:

```
llm:
  _type: databricks
  cluster_driver_port: null
  cluster_id: null
  databricks_uri: databricks
  endpoint_name: databricks-mixtral-8x7b-instruct
  extra_params: {}
  host: e2-dogfood.staging.cloud.databricks.com
  max_tokens: null
  model_kwargs: null
  n: 1
  stop: null
  task: null
  temperature: 0.0
  transform_input_fn: 80049520000000000000008c085f5f6d61696e5f5f948c0f7472616e73666f726d5f696e7075749493942e
  transform_output_fn: null
```

@baskaryan

```python
from langchain_community.embeddings import DatabricksEmbeddings
from langchain_community.llms import Databricks
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
import mlflow

embeddings = DatabricksEmbeddings(endpoint="databricks-bge-large-en")

def transform_input(**request):
    request["messages"] = [
        {
            "role": "user",
            "content": request["prompt"]
        }
    ]
    del request["prompt"]
    return request

llm = Databricks(endpoint_name="databricks-mixtral-8x7b-instruct", transform_input_fn=transform_input)

persist_dir = "faiss_databricks_embedding"

# Create the vector db, persist the db to a local fs folder
loader = TextLoader("state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
db = FAISS.from_documents(docs, embeddings)
db.save_local(persist_dir)

def load_retriever(persist_directory):
    embeddings = DatabricksEmbeddings(endpoint="databricks-bge-large-en")
    vectorstore = FAISS.load_local(persist_directory, embeddings)
    return vectorstore.as_retriever()

retriever = load_retriever(persist_dir)
retrievalQA = RetrievalQA.from_llm(llm=llm, retriever=retriever)

with mlflow.start_run() as run:
    logged_model = mlflow.langchain.log_model(
        retrievalQA,
        artifact_path="retrieval_qa",
        loader_fn=load_retriever,
        persist_dir=persist_dir,
    )

# Load the retrievalQA chain
loaded_model = mlflow.pyfunc.load_model(logged_model.model_uri)
print(loaded_model.predict([{"query": "What did the president say about Ketanji Brown Jackson"}]))
```

2024-02-08 13:09:50 -08:00
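The pickle-to-hex round trip described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper names `function_to_hex` and `function_from_hex` are hypothetical, and `html.escape` from the standard library stands in for a user-defined transform function so the snippet runs anywhere.

```python
import html
import pickle


def function_to_hex(fn):
    """Pickle a callable and return the bytes as a hex string (hypothetical helper)."""
    return pickle.dumps(fn).hex()


def function_from_hex(hex_string):
    """Rebuild a callable from a hex string produced by function_to_hex.

    Only call this on data you trust: unpickling can run arbitrary code.
    """
    return pickle.loads(bytes.fromhex(hex_string))


# html.escape is a stand-in for a transform function (an assumption made for
# illustration). pickle records it by module and qualified name, not by code.
encoded = function_to_hex(html.escape)
restored = function_from_hex(encoded)
```

Because pickle stores a module-level function as a reference to its module and name rather than its body, the function must be importable under the same path when the hex string is loaded. The sample `transform_input_fn` value in the yaml above decodes to such a reference (`__main__` and `transform_input`), which is one reason this approach can be flaky across processes.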
adapters	community[patch]: Add safe lookup to OpenAI response adapter (#14765)	2023-12-20 01:17:23 -05:00
agent_toolkits	community[patch]: fix agent_toolkits mypy (#17050)	2024-02-05 11:56:24 -08:00
callbacks	community[patch]: MLflow callback update (#16687)	2024-02-05 15:46:46 -08:00
chat_loaders	Do not issue beta or deprecation warnings on internal calls (#15641)	2024-01-07 20:54:45 -08:00
chat_message_histories	community[patch]: chat message history mypy fixes (#17059)	2024-02-05 13:13:25 -08:00
chat_models	community[patch]: chat model mypy fixes (#17061)	2024-02-05 13:42:59 -08:00
docstore	community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463)	2023-12-11 13:53:30 -08:00
document_loaders	core[patch], community[patch]: link extraction continue on failure (#17200)	2024-02-07 14:15:30 -08:00
document_transformers	community[minor]: Adding asynchronous function implementation for Doctran (#15941)	2024-01-15 10:39:25 -08:00
embeddings	community[patch]: octoai embeddings bug fix (#17216)	2024-02-07 22:25:52 -05:00
example_selectors	refactor `langchain.prompts.example_selector` (#15369)	2024-02-01 12:05:57 -08:00
graphs	langchain[patch], community[patch]: Fixes in the Ontotext GraphDB Graph and QA Chain (#17239)	2024-02-08 12:05:43 -08:00
indexes	community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463)	2023-12-11 13:53:30 -08:00
llms	community[patch]: Support SerDe transform functions in Databricks LLM (#16752)	2024-02-08 13:09:50 -08:00
output_parsers	langchain[patch], community[minor]: move `output_parsers.ernie_functions` (#16057)	2024-01-17 10:06:18 -08:00
retrievers	community[minor]: Breebs docs retriever (#16578)	2024-02-05 15:51:08 -08:00
storage	community: revert SQL Stores (#16912)	2024-02-01 16:37:40 -08:00
tools	community[minor]: SQLDatabase Add fetch mode `cursor`, query parameters, query by selectable, expose execution options, and documentation (#17191)	2024-02-07 22:23:43 -05:00
utilities	community[minor]: SQLDatabase Add fetch mode `cursor`, query parameters, query by selectable, expose execution options, and documentation (#17191)	2024-02-07 22:23:43 -05:00
utils	openai[minor]: implement langchain-openai package (#15503)	2024-01-05 15:03:28 -08:00
vectorstores	community[patch]: Fix KeyError 'embedding' (MongoDBAtlasVectorSearch) (#17178)	2024-02-08 12:06:42 -08:00
__init__.py	community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463)	2023-12-11 13:53:30 -08:00
cache.py	langchain[minor], community[minor], core[minor]: Async Cache support and AsyncRedisCache (#15817)	2024-02-07 22:06:09 -05:00
py.typed	community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463)	2023-12-11 13:53:30 -08:00