mirror of https://github.com/hwchase17/langchain synced 2024-11-08 07:10:35 +00:00

RUO 2b87e330b0 community: fix issue with nested field extraction in MongodbLoader (#22801)	2024-06-24 19:29:11 +00:00

Description: This PR addresses an issue in the `MongodbLoader` where nested fields were not being correctly extracted. The loader now correctly handles nested fields specified in the `field_names` parameter.

Issue: Fixes an issue where attempting to extract nested fields from MongoDB documents resulted in `KeyError`.

Dependencies: No new dependencies are required for this change.

Twitter handle: (Optional, your Twitter handle if you'd like a mention when the PR is announced)

### Changes

1. Field Name Parsing:
   - Added logic to parse nested field names and safely extract their values from the MongoDB documents.
2. Projection Construction:
   - Updated the projection dictionary to include nested fields correctly.
3. Field Extraction:
   - Updated the `aload` method to handle nested field extraction using a recursive approach to traverse the nested dictionaries.

### Example Usage

Updated usage example to demonstrate how to specify nested fields in the `field_names` parameter:

```python
loader = MongodbLoader(
    connection_string=MONGO_URI,
    db_name=MONGO_DB,
    collection_name=MONGO_COLLECTION,
    filter_criteria={
        "data.job.company.industry_name": "IT",
        "data.job.detail": {"$exists": True},
    },
    field_names=[
        "data.job.detail.id",
        "data.job.detail.position",
        "data.job.detail.intro",
        "data.job.detail.main_tasks",
        "data.job.detail.requirements",
        "data.job.detail.preferred_points",
        "data.job.detail.benefits",
    ],
)

docs = loader.load()
print(len(docs))
for doc in docs:
    print(doc.page_content)
```

### Testing

Tested with a MongoDB collection containing nested documents to ensure that the nested fields are correctly extracted and concatenated into a single page_content string.

### Note

This change ensures backward compatibility for non-nested fields and improves functionality for nested field extraction.

### Output Sample

```python
print(docs[:3])
```

```shell
# output sample:
[
    Document(
        # Here in this example, page_content is the combined text from the fields below
        # "position", "intro", "main_tasks", "requirements", "preferred_points", "benefits"
        page_content='all combined contents from the requested fields in the document',
        metadata={'database': 'Your Database name', 'collection': 'Your Collection name'}
    ),
    ...
]
```

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
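The nested field handling described in this commit message can be sketched in plain Python. This is an illustration only, not the code merged in the PR: the helper names `get_nested`, `build_projection`, and `build_page_content` are hypothetical, the traversal is iterative rather than recursive, and skipping missing fields is an assumption made to keep the example short.

```python
from typing import Any, Dict, List, Optional


def get_nested(document: Dict[str, Any], dotted_name: str) -> Optional[Any]:
    """Follow a dotted path such as "data.job.detail.id" through nested dicts.

    Returns None instead of raising KeyError when any part of the path is missing.
    """
    current: Any = document
    for key in dotted_name.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def build_projection(field_names: List[str]) -> Dict[str, int]:
    """Build a MongoDB-style projection; dotted names select nested fields."""
    return {name: 1 for name in field_names}


def build_page_content(document: Dict[str, Any], field_names: List[str]) -> str:
    """Join the values of the requested fields, skipping fields that are absent."""
    values = [get_nested(document, name) for name in field_names]
    return " ".join(str(value) for value in values if value is not None)


# Hypothetical sample document; the "intro" field is deliberately missing.
sample = {"data": {"job": {"detail": {"id": 7, "position": "Backend engineer"}}}}
fields = [
    "data.job.detail.id",
    "data.job.detail.position",
    "data.job.detail.intro",
]

print(build_projection(fields))
print(build_page_content(sample, fields))
```

Returning `None` for a missing path is one way to avoid the `KeyError` mentioned above; a loader could equally substitute an empty string or raise a more descriptive error.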
.devcontainer	docs: typo in dev container documentation (#22630)	2024-06-06 16:04:48 -04:00
.github	infra: run CI on large diffs (#23192)	2024-06-19 19:30:56 +00:00
cookbook	docs: update azure_container_apps_dynamic_sessions_data_analyst.ipynb (#22718)	2024-06-10 13:33:40 -07:00
docker	community[minor]: Add VDMS vectorstore (#19551)	2024-03-28 03:12:11 +00:00
docs	docs[patch]: Adds callout in LLM concept docs, remove deprecated code (#23361)	2024-06-24 12:03:18 -07:00
libs	community: fix issue with nested field extraction in MongodbLoader (#22801)	2024-06-24 19:29:11 +00:00
templates	templates: remove lockfiles (#22920)	2024-06-14 21:42:30 +00:00
.gitattributes	Update dev container (#6189)	2023-06-16 15:42:14 -07:00
.gitignore	community[minor]: Add support for metadata indexing policy in Cassandra vector store (#22548)	2024-06-05 11:23:26 -04:00
.readthedocs.yaml	infra: update rtd yaml (#17502)	2024-02-13 18:16:44 -08:00
CITATION.cff	rename repo namespace to langchain-ai (#11259)	2023-10-01 15:30:58 -04:00
LICENSE	Library Licenses (#13300)	2023-11-28 17:34:27 -08:00
Makefile	docs: revamp ChatOpenAI (#22253)	2024-05-29 10:20:14 -07:00
MIGRATE.md	Update main readme (#13298)	2023-11-13 17:37:54 -08:00
poetry.lock	ci: add testing with Python 3.12 (#22813)	2024-06-12 16:31:36 -04:00
poetry.toml	Unbreak devcontainer (#8154)	2023-07-23 19:33:47 -07:00
pyproject.toml	ci: add testing with Python 3.12 (#22813)	2024-06-12 16:31:36 -04:00
README.md	fix: typo in Agents section of README (#22599)	2024-06-06 07:44:36 -04:00
SECURITY.md	Updated security policy (#19089)	2024-03-14 20:58:47 +00:00

README.md

🦜️🔗 LangChain

⚡ Build context-aware reasoning applications ⚡

Looking for the JS/TS library? Check out LangChain.js.

To help you ship LangChain apps to production faster, check out LangSmith. LangSmith is a unified developer platform for building, testing, and monitoring LLM applications. Fill out this form to speak with our sales team.

Quick Install

With pip:

pip install langchain

With conda:

conda install langchain -c conda-forge

🤔 What is LangChain?

LangChain is a framework for developing applications powered by large language models (LLMs).

For these applications, LangChain simplifies the entire application lifecycle:

Open-source libraries: Build your applications using LangChain's modular building blocks and components. Integrate with hundreds of third-party providers.
Productionization: Inspect, monitor, and evaluate your apps with LangSmith so that you can constantly optimize and deploy with confidence.
Deployment: Turn any chain into a REST API with LangServe.

Open-source libraries

langchain-core: Base abstractions and LangChain Expression Language.
langchain-community: Third party integrations.
- Some integrations have been further split into partner packages that only rely on langchain-core. Examples include langchain_openai and langchain_anthropic.
langchain: Chains, agents, and retrieval strategies that make up an application's cognitive architecture.
LangGraph: A library for building robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph.

Productionization:

LangSmith: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.

Deployment:

LangServe: A library for deploying LangChain chains as REST APIs.

🧱 What can you build with LangChain?

❓ Question answering with RAG

Documentation
End-to-end Example: Chat LangChain and repo

🧱 Extracting structured output

Documentation
End-to-end Example: SQL Llama2 Template

🤖 Chatbots

Documentation
End-to-end Example: Web LangChain (web researcher chatbot) and repo

And much more! Head to the Tutorials section of the docs for more.

🚀 How does LangChain help?

The main value props of the LangChain libraries are:

Components: composable building blocks, tools, and integrations for working with language models. Components are modular and easy to use, whether or not you are using the rest of the LangChain framework.
Off-the-shelf chains: built-in assemblages of components for accomplishing higher-level tasks

Off-the-shelf chains make it easy to get started. Components make it easy to customize existing chains and build new ones.

LangChain Expression Language (LCEL)

LCEL is the foundation of many of LangChain's components, and is a declarative way to compose chains. LCEL was designed from day 1 to support putting prototypes in production, with no code changes, from the simplest “prompt + LLM” chain to the most complex chains.

Overview: LCEL and its benefits
Interface: The standard Runnable interface for LCEL objects
Primitives: More on the primitives LCEL includes
Cheatsheet: Quick overview of the most common usage patterns
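The idea of composing a chain declaratively can be illustrated without installing anything. The sketch below is a toy, not LangChain's implementation: the `Step` class, `fake_llm`, and the other names are hypothetical stand-ins that only mimic the general shape of piping a prompt into a model and then into an output parser.

```python
from typing import Any, Callable


class Step:
    """Hypothetical composable unit; a toy stand-in, not LangChain's Runnable."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        """Run this step on a single input."""
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Three toy steps: format a prompt, fake a model call, clean up the reply.
prompt = Step(lambda inputs: f"Tell me a fact about {inputs['topic']}.")
fake_llm = Step(lambda text: f"[model reply to: {text}]")
parser = Step(lambda text: text.strip("[]"))

# The chain is declared once and can then be invoked with any input.
chain = prompt | fake_llm | parser
print(chain.invoke({"topic": "parrots"}))
```

Because the chain is just a value built from smaller values, swapping `fake_llm` for a different step changes behavior without touching the rest of the pipeline, which is the property the "prompt + LLM" description relies on.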

Components

Components fall into the following modules:

📃 Model I/O

This includes prompt management, prompt optimization, a generic interface for chat models and LLMs, and common utilities for working with model outputs.

📚 Retrieval

Retrieval Augmented Generation involves loading data from a variety of sources, preparing it, then searching over (a.k.a. retrieving from) it for use in the generation step.
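The load, prepare, and search steps can be sketched with the standard library alone. Everything here is an assumption made for illustration: the sample documents, the `tokenize` and `retrieve` helpers, and the word-overlap scoring, which stands in for the embedding-based similarity search a real retrieval pipeline would typically use.

```python
import re
from typing import List, Set, Tuple

# "Loading": a hypothetical in-memory corpus instead of files or a database.
documents = [
    "Parrots can mimic human speech.",
    "A chain links a prompt to a model.",
    "A graph is made of nodes and edges.",
]


def tokenize(text: str) -> Set[str]:
    """"Preparing": lowercase the text and keep the set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


# Precompute the token set for every document once.
index: List[Tuple[str, Set[str]]] = [(doc, tokenize(doc)) for doc in documents]


def retrieve(query: str, k: int = 1) -> List[str]:
    """"Searching": rank documents by how many words they share with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(index, key=lambda item: len(query_tokens & item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


question = "What is a graph made of?"
context = retrieve(question)

# The retrieved text would then be placed into the prompt for the generation step.
prompt = f"Context: {context[0]}\nQuestion: {question}"
print(prompt)
```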

🤖 Agents

Agents give an LLM autonomy over how a task is accomplished. An agent decides which Action to take, takes that Action, observes the result, and repeats until the task is complete. LangChain provides a standard interface for agents along with the LangGraph extension for building custom agents.
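The decide, act, observe, repeat cycle can be sketched as a small loop. This is a toy under stated assumptions, not LangChain's agent interface: `choose_action` is a hand-written rule standing in for the LLM's decision, and the `TOOLS` table, `run_agent`, and the number-reaching task are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools the agent may call; each maps the current state to a new one.
TOOLS: Dict[str, Callable[[int], int]] = {
    "double": lambda n: n * 2,
    "increment": lambda n: n + 1,
}


def choose_action(current: int, target: int) -> str:
    """Stand-in for the LLM: pick the next action from the current observation."""
    if current == target:
        return "finish"
    if current > 0 and current * 2 <= target:
        return "double"
    return "increment"


def run_agent(start: int, target: int, max_steps: int = 20) -> Tuple[int, List[Tuple[str, int]]]:
    """Loop: decide on an action, take it, observe the result, repeat."""
    current = start
    trace: List[Tuple[str, int]] = []
    for _ in range(max_steps):
        action = choose_action(current, target)  # decide
        if action == "finish":
            break
        current = TOOLS[action](current)  # act
        trace.append((action, current))  # observe and record
    return current, trace


final_value, steps = run_agent(start=1, target=10)
print(final_value)
for action, observed in steps:
    print(action, observed)
```

The `max_steps` cap is a design choice worth keeping in any such loop: it bounds the run when the decision policy never reaches the goal.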

📖 Documentation

Please see here for full documentation, which includes:

Introduction: Overview of the framework and the structure of the docs.
Tutorials: If you're looking to build something specific or are more of a hands-on learner, check out our tutorials. This is the best place to get started.
How-to guides: Answers to “How do I...?” type questions. These guides are goal-oriented and concrete; they're meant to help you complete a specific task.
Conceptual guide: Conceptual explanations of the key parts of the framework.
API Reference: Thorough documentation of every class and method.

🌐 Ecosystem

🦜🛠️ LangSmith: Tracing and evaluating your language model applications and intelligent agents to help you move from prototype to production.
🦜🕸️ LangGraph: Creating stateful, multi-actor applications with LLMs, built on top of (and intended to be used with) LangChain primitives.
🦜🏓 LangServe: Deploying LangChain runnables and chains as REST APIs.
- LangChain Templates: Example applications hosted with LangServe.

💁 Contributing

As an open-source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.

For detailed information on how to contribute, see here.
