Commit Graph

1553 Commits

Author SHA1 Message Date
Nuno Campos
5fc44989bf
core[patch]: Fix "argument of type 'NoneType' is not iterable" error in LangChainTracer (#26576)
Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-09-17 10:29:46 -07:00
RUO
0a177ec2cc
community: Enhance MongoDBLoader with flexible metadata and optimized field extraction (#23376)
### Description:
This pull request significantly enhances the MongodbLoader class in the
LangChain community package by adding robust metadata customization and
improved field extraction capabilities. The updated class now allows
users to specify additional metadata fields through the metadata_names
parameter, enabling the extraction of both top-level and deeply nested
document attributes as metadata. This flexibility is crucial for users
who need to include detailed contextual information without altering the
database schema.

Moreover, the include_db_collection_in_metadata flag offers optional
inclusion of database and collection names in the metadata, allowing for
even greater customization depending on the user's needs.

The loader's field extraction logic has been refined to handle missing
or nested fields more gracefully. It now employs a safe access mechanism
that avoids the KeyError previously encountered when a specified nested
field was absent in a document. This update ensures that the loader can
handle diverse and complex data structures without failure, making it
more resilient and user-friendly.

### Issue:
This pull request addresses a critical issue where the MongodbLoader
class in the LangChain community package could throw a KeyError when
attempting to access nested fields that may not exist in some documents.
The previous implementation did not handle the absence of specified
nested fields gracefully, leading to runtime errors and interruptions in
data processing workflows.

This enhancement ensures robust error handling by safely accessing
nested document fields, using default values for missing data, thus
preventing KeyError and ensuring smoother operation across various data
structures in MongoDB. This improvement is crucial for users working
with diverse and complex data sets, ensuring the loader can adapt to
documents with varying structures without failing.

### Dependencies: 
Requires motor for asynchronous MongoDB interaction.

### Twitter handle: 
N/A

### Add tests and docs
Tests: Unit tests have been added to verify that the metadata inclusion
toggle works as expected and that the field extraction correctly handles
nested fields.
Docs: An example notebook demonstrating the use of the enhanced
MongodbLoader is included in the docs/docs/integrations directory. This
notebook includes setup instructions, example usage, and outputs.
(Here is the notebook link : [colab
link](https://colab.research.google.com/drive/1tp7nyUnzZa3dxEFF4Kc3KS7ACuNF6jzH?usp=sharing))
Lint and test
Before submitting, I ran make format, make lint, and make test as per
the contribution guidelines. All tests pass, and the code style adheres
to the LangChain standards.

```python
import unittest
from unittest.mock import patch, MagicMock
import asyncio
from langchain_community.document_loaders.mongodb import MongodbLoader

class TestMongodbLoader(unittest.TestCase):
    def setUp(self):
        """Setup the MongodbLoader test environment by mocking the motor client 
        and database collection interactions."""
        # Mocking the AsyncIOMotorClient
        self.mock_client = MagicMock()
        self.mock_db = MagicMock()
        self.mock_collection = MagicMock()

        self.mock_client.get_database.return_value = self.mock_db
        self.mock_db.get_collection.return_value = self.mock_collection

        # Initialize the MongodbLoader with test data
        self.loader = MongodbLoader(
            connection_string="mongodb://localhost:27017",
            db_name="testdb",
            collection_name="testcol"
        )

    @patch('langchain_community.document_loaders.mongodb.AsyncIOMotorClient', return_value=MagicMock())
    def test_constructor(self, mock_motor_client):
        """Test if the constructor properly initializes with the correct database and collection names."""
        loader = MongodbLoader(
            connection_string="mongodb://localhost:27017",
            db_name="testdb",
            collection_name="testcol"
        )
        self.assertEqual(loader.db_name, "testdb")
        self.assertEqual(loader.collection_name, "testcol")

    def test_aload(self):
        """Test the aload method to ensure it correctly queries and processes documents."""
        # Setup mock data and responses for the database operations
        self.mock_collection.count_documents.return_value = asyncio.Future()
        self.mock_collection.count_documents.return_value.set_result(1)
        self.mock_collection.find.return_value = [
            {"_id": "1", "content": "Test document content"}
        ]

        # Run the aload method and check responses
        loop = asyncio.get_event_loop()
        results = loop.run_until_complete(self.loader.aload())
        self.assertEqual(len(results), 1)
        self.assertEqual(results[0].page_content, "Test document content")

    def test_construct_projection(self):
        """Verify that the projection dictionary is constructed correctly based on field names."""
        self.loader.field_names = ['content', 'author']
        self.loader.metadata_names = ['timestamp']
        expected_projection = {'content': 1, 'author': 1, 'timestamp': 1}
        projection = self.loader._construct_projection()
        self.assertEqual(projection, expected_projection)

if __name__ == '__main__':
    unittest.main()
```


### Additional Example for Documentation
Sample Data:

```json
[
    {
        "_id": "1",
        "title": "Artificial Intelligence in Medicine",
        "content": "AI is transforming the medical industry by providing personalized medicine solutions.",
        "author": {
            "name": "John Doe",
            "email": "john.doe@example.com"
        },
        "tags": ["AI", "Healthcare", "Innovation"]
    },
    {
        "_id": "2",
        "title": "Data Science in Sports",
        "content": "Data science provides insights into player performance and strategic planning in sports.",
        "author": {
            "name": "Jane Smith",
            "email": "jane.smith@example.com"
        },
        "tags": ["Data Science", "Sports", "Analytics"]
    }
]
```
Example Code:

```python
loader = MongodbLoader(
    connection_string="mongodb://localhost:27017",
    db_name="example_db",
    collection_name="articles",
    filter_criteria={"tags": "AI"},
    field_names=["title", "content"],
    metadata_names=["author.name", "author.email"],
    include_db_collection_in_metadata=True
)

documents = loader.load()

for doc in documents:
    print("Page Content:", doc.page_content)
    print("Metadata:", doc.metadata)
```
Expected Output:

```
Page Content: Artificial Intelligence in Medicine AI is transforming the medical industry by providing personalized medicine solutions.
Metadata: {'author_name': 'John Doe', 'author_email': 'john.doe@example.com', 'database': 'example_db', 'collection': 'articles'}
```

Thank you.

---

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-09-17 10:23:17 -04:00
ccurme
900115a568
community: release 0.3 (#26472) 2024-09-13 22:55:56 +00:00
Erick Friis
c2a3021bb0
multiple: pydantic 2 compatibility, v0.3 (#26443)
Signed-off-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Dan O'Donovan <dan.odonovan@gmail.com>
Co-authored-by: Tom Daniel Grande <tomdgrande@gmail.com>
Co-authored-by: Grande <Tom.Daniel.Grande@statsbygg.no>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com>
Co-authored-by: ZhangShenao <15201440436@163.com>
Co-authored-by: Friso H. Kingma <fhkingma@gmail.com>
Co-authored-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Nuno Campos <nuno@langchain.dev>
Co-authored-by: Morgante Pell <morgantep@google.com>
2024-09-13 14:38:45 -07:00
Bagatur
e32adad17a
community[patch]: Release 0.2.17 (#26432) 2024-09-13 09:56:39 -07:00
Harrison Chase
28ad244e77
community, openai: support nested dicts (#26414)
needed for thinking tokens

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-09-12 21:47:47 -07:00
Nuno Campos
212c688ee0
core[minor]: Remove serialized manifest from tracing requests for non-llm runs (#26270)
- This takes a long time to compute, isn't used, and currently called on
every invocation of every chain/retriever/etc
2024-09-10 12:58:24 -07:00
Christophe Bornet
9cf7ae0a52
community: Add docstring for HtmlLinkExtractor (#26213)
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-09-10 00:27:37 +00:00
Christophe Bornet
56580b5fff
community: Add docstring for GLiNERLinkExtractor (#26218)
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-09-10 00:27:23 +00:00
Christophe Bornet
e235a572a0
community: Add docstring for KeybertLinkExtractor (#26210)
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-09-10 00:26:29 +00:00
Tomaz Bratanic
181e4fc0e0
Add session expired retry to neo4j graph (#26182)
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-09-08 11:40:43 -07:00
Sebastian Cherny
b3c7ed4913
Adding bind_tools in ChatOctoAI (#26168)
The object extends from
langchain_community.chat_models.openai.ChatOpenAI which doesn't have
`bind_tools` defined. I tried extending from
`langchain_openai.ChatOpenAI` in
https://github.com/langchain-ai/langchain/pull/25975 but that PR got
closed because this is not correct.
So adding our own `bind_tools` (which for now copying from ChatOpenAI is
good enough) will solve the tool calling issue we are having now.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-09-08 18:38:43 +00:00
William FH
262e19b15d
infra: Clear cache for env-var checks (#26073) 2024-09-06 21:29:29 +00:00
Bagatur
1241a004cb fmt 2024-09-04 11:44:59 -07:00
Bagatur
4ba14ae9e5 fmt 2024-09-04 11:34:59 -07:00
Bagatur
dba308447d fmt 2024-09-04 11:28:04 -07:00
ZhangShenao
c812237217
Improvement[Community] Improve args description in api doc of DocArrayInMemorySearch (#26024)
- Add missing arg
- Remove redundant arg
2024-09-04 09:26:26 -04:00
Tom Daniel Grande
0207dc1431
community: delta in openai choice can be None, creates handler for that (#25954)
Thank you for contributing to LangChain!

- [X ] **PR title**

- [X ] **PR message**: 

     **Description:** adds a handler for when delta choice is None

     **Issue:** Fixes #25951
     **Dependencies:** Not applicable


- [ X] **Add tests and docs**: Not applicable

- [X ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

Co-authored-by: Grande <Tom.Daniel.Grande@statsbygg.no>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-09-03 20:30:03 +00:00
Bagatur
0af447c90b
community[patch]: Release 0.2.16 (#25982) 2024-09-03 18:34:18 +00:00
Dan O'Donovan
f49da71e87
community[patch]: change default Neo4j username/password (#25226)
**Description:**

Change the default Neo4j username/password (when not supplied as
environment variable or in code) from `None` to `""`.

Neo4j has an option to [disable
auth](https://neo4j.com/docs/operations-manual/current/configuration/configuration-settings/#config_dbms.security.auth_enabled)
which is helpful when developing. When auth is disabled, the username /
password through the `neo4j` module should be `""` (ie an empty string).

Empty strings get marked as false in
`langchain_core.utils.env.get_from_dict_or_env` -- changing this code /
behaviour would have a wide impact and is undesirable.

In order to both _allow_ access to Neo4j with auth disabled and _not_
impact `langchain_core` this patch is presented. The downside would be
that if a user forgets to set NEO4J_USERNAME or NEO4J_PASSWORD they
would see an invalid credentials error rather than missing credentials
error. This could be mitigated but would result in a less elegant patch!

**Issue:**
Fix issue where langchain cannot communicate with Neo4j if Neo4j auth is
disabled.
2024-09-03 11:24:18 -07:00
Jorge Piedrahita Ortiz
c7154a4045
community: sambastudio llms api v2 support (#25063)
- **Description:** SambaStudio GenericV2 API support
2024-09-03 10:18:15 -04:00
ZhangShenao
8d784db107
docs: Add missing args in api doc of WebResearchRetriever (#25949)
Add missing args in api doc of `WebResearchRetriever`
2024-09-03 01:24:23 -07:00
Isaac Francisco
4833375200
community[patch]: added option to change how duckduckgosearchresults tool converts api outputs into string (#22580)
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-09-02 22:42:19 +00:00
JonZeolla
78ff51ce83
community[patch]: update the default hf bge embeddings (#22627)
**Description:** This updates the langchain_community > huggingface >
default bge embeddings ([the current default recommends this
change](https://huggingface.co/BAAI/bge-large-en))
**Issue:** None
**Dependencies:** None
**Twitter handle:** @jonzeolla

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-09-02 22:10:21 +00:00
Leonid Ganeline
150251fd49
docs: integrations reference updates 13 (#25711)
Added missed provider pages and links. Fixed inconsistent formatting.
Added arxiv references to docstirngs.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-09-02 22:08:50 +00:00
Yash Parmar
51dae57357
community[minor]: jina search tools integrating (jina reader) (#23339)
- **PR title**: "community: add Jina Search tool"
- **Description:** Added the Jina Search tool for querying the Jina
search API. This includes the implementation of the JinaSearchAPIWrapper
and the JinaSearch tool, along with a Jupyter notebook example
demonstrating its usage.
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** [Twitter
handle](https://x.com/yashp3020?t=7wM0gQ7XjGciFoh9xaBtqA&s=09)


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-09-02 14:52:14 -07:00
Qingchuan Hao
3145995ed9
community[patch]: BingSearchResults returns raw snippets as artifact(#23304)
Returns an array of results which is more specific and easier for later
use.

Tested locally:
```
resp = tool.invoke("what's the weather like in Shanghai?")
for item in resp:
    print(item)
```
returns
```
{'snippet': '<b>Shanghai</b>, <b>Shanghai</b>, China <b>Weather</b> Forecast, with current conditions, wind, air quality, and what to expect for the next 3 days.', 'title': 'Shanghai, Shanghai, China Weather Forecast | AccuWeather', 'link': 'https://www.accuweather.com/en/cn/shanghai/106577/weather-forecast/106577'}
{'snippet': '5. 99 / 87 °F. 6. 99 / 86 °F. 7. Detailed forecast for 14 days. Need some help? Current <b>weather</b> <b>in Shanghai</b> and forecast for today, tomorrow, and next 14 days.', 'title': 'Weather for Shanghai, Shanghai Municipality, China - timeanddate.com', 'link': 'https://www.timeanddate.com/weather/china/shanghai'}
{'snippet': '<b>Shanghai</b> - <b>Weather</b> warnings issued 14-day forecast. <b>Weather</b> warnings issued. Forecast - <b>Shanghai</b>. Day by day forecast. Last updated Friday at 01:05. Tonight, ... Temperature feels <b>like</b> 34 ...', 'title': 'Shanghai - BBC Weather', 'link': 'https://www.bbc.com/weather/1796236'}
{'snippet': 'Current <b>weather</b> <b>in Shanghai</b>, <b>Shanghai</b>, China. Check current conditions <b>in Shanghai</b>, <b>Shanghai</b>, China with radar, hourly, and more.', 'title': 'Shanghai, Shanghai, China Current Weather | AccuWeather', 'link': 'https://www.accuweather.com/en/cn/shanghai/106577/current-weather/106577'}
13-Day Beijing, Xi&#39;an, Chengdu, <b>Shanghai</b> Chinese Language and Culture Immersion Tour. <b>Shanghai</b> in September. Average daily temperature range: 23–29°C (73–84°F) Average rainy days: 10. Average sunny days: 20. September ushers in pleasant autumn <b>weather</b>, making it one of the best months to visit <b>Shanghai</b>. <b>Weather</b> in <b>Shanghai</b>: Climate, Seasons, and Average Monthly Temperature. <b>Shanghai</b> has a subtropical maritime monsoon climate, meaning high humidity and lots of rain. Hot muggy summers, cool falls, cold winters with little snow, and warm springs are the norm. Midsummer through early fall is the best time to visit <b>Shanghai</b>. <b>Shanghai</b>, <b>Shanghai</b>, China <b>Weather</b> Forecast, with current conditions, wind, air quality, and what to expect for the next 3 days. 1165. 45.9. 121. Winter, from December to February, is quite cold: the average January temperature is 5 °C (41 °F). There may be cold periods, with highs around 5 °C (41 °F) or below, and occasionally, even snow can fall. The temperature dropped to -10 °C (14 °F) in January 1977 and to -7 °C (19.5 °F) in January 2016. 5. 99 / 87 °F. 6. 99 / 86 °F. 7. Detailed forecast for 14 days. Need some help? Current <b>weather</b> in <b>Shanghai</b> and forecast for today, tomorrow, and next 14 days. Everything you need to know about today&#39;s <b>weather</b> in <b>Shanghai</b>, <b>Shanghai</b>, China. High/Low, Precipitation Chances, Sunrise/Sunset, and today&#39;s Temperature History. <b>Shanghai</b> - <b>Weather</b> warnings issued 14-day forecast. <b>Weather</b> warnings issued. Forecast - <b>Shanghai</b>. Day by day forecast. Last updated Friday at 01:05. Tonight, ... Temperature feels <b>like</b> 34 ... <b>Shanghai</b> 14 Day Extended Forecast. <b>Weather</b> Today <b>Weather</b> Hourly 14 Day Forecast Yesterday/Past <b>Weather</b> Climate (Averages) Currently: 84 °F. Passing clouds. (<b>Weather</b> station: <b>Shanghai</b> Hongqiao Airport, China). See more current <b>weather</b>. Current <b>weather</b> in <b>Shanghai</b>, <b>Shanghai</b>, China. Check current conditions in <b>Shanghai</b>, <b>Shanghai</b>, China with radar, hourly, and more. <b>Shanghai</b> <b>Weather</b> Forecasts. <b>Weather Underground</b> provides local &amp; long-range <b>weather</b> forecasts, weatherreports, maps &amp; tropical <b>weather</b> conditions for the <b>Shanghai</b> area.
```

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-09-02 21:11:32 +00:00
Alexander KIRILOV
6a8f8a56ac
community[patch]: added content_columns option to CSVLoader (#23809)
**Description:** 
Adding a new option to the CSVLoader that allows us to implicitly
specify the columns that are used for generating the Document content.
Currently these are implicitly set as "all fields not part of the
metadata_columns".

In some cases however it is useful to have a field both as a metadata
and as part of the document content.
2024-09-02 20:25:53 +00:00
Bruno Alvisio
ab527027ac
community: Resolve refs recursively when generating openai_fn from OpenAPI spec (#19002)
- **Description:** This PR is intended to improve the generation of
payloads for OpenAI functions when converting from an OpenAPI spec file.
The solution is to recursively resolve `$refs`.
Currently when converting OpenAPI specs into OpenAI functions using
`openapi_spec_to_openai_fn`, if the schemas have nested references, the
generated functions contain `$ref` that causes the LLM to generate
payloads with an incorrect schema.

For example, for the for OpenAPI spec:

```
text = """
{
  "openapi": "3.0.3",
  "info": {
    "title": "Swagger Petstore - OpenAPI 3.0",
    "termsOfService": "http://swagger.io/terms/",
    "contact": {
      "email": "apiteam@swagger.io"
    },
    "license": {
      "name": "Apache 2.0",
      "url": "http://www.apache.org/licenses/LICENSE-2.0.html"
    },
    "version": "1.0.11"
  },
  "externalDocs": {
    "description": "Find out more about Swagger",
    "url": "http://swagger.io"
  },
  "servers": [
    {
      "url": "https://petstore3.swagger.io/api/v3"
    }
  ],
  "tags": [
    {
      "name": "pet",
      "description": "Everything about your Pets",
      "externalDocs": {
        "description": "Find out more",
        "url": "http://swagger.io"
      }
    },
    {
      "name": "store",
      "description": "Access to Petstore orders",
      "externalDocs": {
        "description": "Find out more about our store",
        "url": "http://swagger.io"
      }
    },
    {
      "name": "user",
      "description": "Operations about user"
    }
  ],
  "paths": {
    "/pet": {
      "post": {
        "tags": [
          "pet"
        ],
        "summary": "Add a new pet to the store",
        "description": "Add a new pet to the store",
        "operationId": "addPet",
        "requestBody": {
          "description": "Create a new pet in the store",
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/Pet"
              }
            }
          },
          "required": true
        },
        "responses": {
          "200": {
            "description": "Successful operation",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/Pet"
                }
              }
            }
          }
        }
      }
    }
  },
  "components": {
    "schemas": {
      "Tag": {
        "type": "object",
        "properties": {
          "id": {
            "type": "integer",
            "format": "int64"
          },
          "model_type": {
            "type": "number"
          }
        }
      },
      "Category": {
        "type": "object",
        "required": [
          "model",
          "year",
          "age"
        ],
        "properties": {
          "year": {
            "type": "integer",
            "format": "int64",
            "example": 1
          },
          "model": {
            "type": "string",
            "example": "Ford"
          },
          "age": {
            "type": "integer",
            "example": 42
          }
        }
      },
      "Pet": {
        "required": [
          "name"
        ],
        "type": "object",
        "properties": {
          "id": {
            "type": "integer",
            "format": "int64",
            "example": 10
          },
          "name": {
            "type": "string",
            "example": "doggie"
          },
          "category": {
            "$ref": "#/components/schemas/Category"
          },
          "tags": {
            "type": "array",
            "items": {
              "$ref": "#/components/schemas/Tag"
            }
          },
          "status": {
            "type": "string",
            "description": "pet status in the store",
            "enum": [
              "available",
              "pending",
              "sold"
            ]
          }
        }
      }
    }
  }
}
"""
```

Executing:
```
spec = OpenAPISpec.from_text(text)
pet_openai_functions, pet_callables = openapi_spec_to_openai_fn(spec)
response = model.invoke("Create a pet named Scott", functions=pet_openai_functions)
```

`pet_open_functions` contains unresolved `$refs`:

```
[
  {
    "name": "addPet",
    "description": "Add a new pet to the store",
    "parameters": {
      "type": "object",
      "properties": {
        "json": {
          "properties": {
            "id": {
              "type": "integer",
              "schema_format": "int64",
              "example": 10
            },
            "name": {
              "type": "string",
              "example": "doggie"
            },
            "category": {
              "ref": "#/components/schemas/Category"
            },
            "tags": {
              "items": {
                "ref": "#/components/schemas/Tag"
              },
              "type": "array"
            },
            "status": {
              "type": "string",
              "enum": [
                "available",
                "pending",
                "sold"
              ],
              "description": "pet status in the store"
            }
          },
          "type": "object",
          "required": [
            "name",
            "photoUrls"
          ]
        }
      }
    }
  }
]
```

and the generated JSON has an incorrect schema (e.g. category is filled
with `id` and `name` instead of `model`, `year` and `age`:

```
{
  "id": 1,
  "name": "Scott",
  "category": {
    "id": 1,
    "name": "Dogs"
  },
  "tags": [
    {
      "id": 1,
      "name": "tag1"
    }
  ],
  "status": "available"
}
```

With this change, the generated JSON by the LLM becomes,
`pet_openai_functions` becomes:

```
[
  {
    "name": "addPet",
    "description": "Add a new pet to the store",
    "parameters": {
      "type": "object",
      "properties": {
        "json": {
          "properties": {
            "id": {
              "type": "integer",
              "schema_format": "int64",
              "example": 10
            },
            "name": {
              "type": "string",
              "example": "doggie"
            },
            "category": {
              "properties": {
                "year": {
                  "type": "integer",
                  "schema_format": "int64",
                  "example": 1
                },
                "model": {
                  "type": "string",
                  "example": "Ford"
                },
                "age": {
                  "type": "integer",
                  "example": 42
                }
              },
              "type": "object",
              "required": [
                "model",
                "year",
                "age"
              ]
            },
            "tags": {
              "items": {
                "properties": {
                  "id": {
                    "type": "integer",
                    "schema_format": "int64"
                  },
                  "model_type": {
                    "type": "number"
                  }
                },
                "type": "object"
              },
              "type": "array"
            },
            "status": {
              "type": "string",
              "enum": [
                "available",
                "pending",
                "sold"
              ],
              "description": "pet status in the store"
            }
          },
          "type": "object",
          "required": [
            "name"
          ]
        }
      }
    }
  }
]
```

and the JSON generated by the LLM is:
```
{
  "id": 1,
  "name": "Scott",
  "category": {
    "year": 2022,
    "model": "Dog",
    "age": 42
  },
  "tags": [
    {
      "id": 1,
      "model_type": 1
    }
  ],
  "status": "available"
}
```

which has the intended schema.

    - **Twitter handle:**: @brunoalvisio

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-09-02 13:17:39 -07:00
Luiz F. G. dos Santos
36bbdc776e
community: fix bug to support for file_search tool from OpenAI (#25927)
- **Description:** The function `_is_assistants_builtin_tool` didn't had
support for `file_search` from OpenAI. This was creating conflict and
blocking the usage of such. OpenAI Assistant changed from`retrieval` to
`file_search`.
  
  The following code
  
  ```
              agent = OpenAIAssistantV2Runnable.create_assistant(
                name="Data Analysis Assistant",
                instructions=prompt[0].content,
                tools={'type': 'file_search'},
                model=self.chat_config.connection.deployment_name,
                client=llm,
                as_agent=True,
                tool_resources={
                    "file_search": {
                        "vector_store_ids": vector_store_id
                        }
                    }
                )
```

Was throwing the following error

```
Traceback (most recent call last):
File
"/Users/l.guedesdossantos/Documents/codes/shellai-nlp-backend/app/chat/chat_decorators.py",
line 500, in get_response
    return await super().get_response(post, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/Documents/codes/shellai-nlp-backend/app/chat/chat_decorators.py",
line 96, in get_response
    response = await self.inner_chat.get_response(post, context)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/Documents/codes/shellai-nlp-backend/app/chat/chat_decorators.py",
line 96, in get_response
    response = await self.inner_chat.get_response(post, context)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/Documents/codes/shellai-nlp-backend/app/chat/chat_decorators.py",
line 96, in get_response
    response = await self.inner_chat.get_response(post, context)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 4 more times]
File
"/Users/l.guedesdossantos/Documents/codes/shellai-nlp-backend/app/chat/azure_open_ai_chat.py",
line 147, in get_response
chain = chain_factory.get_chain(prompts, post.conversation.id,
overrides, context)

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/Documents/codes/shellai-nlp-backend/app/llm_connections/chains.py",
line 1324, in get_chain
    agent = OpenAIAssistantV2Runnable.create_assistant(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/anaconda3/envs/shell-e/lib/python3.11/site-packages/langchain_community/agents/openai_assistant/base.py",
line 256, in create_assistant
tools=[_get_assistants_tool(tool) for tool in tools], # type: ignore
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/anaconda3/envs/shell-e/lib/python3.11/site-packages/langchain_community/agents/openai_assistant/base.py",
line 256, in <listcomp>
tools=[_get_assistants_tool(tool) for tool in tools], # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/anaconda3/envs/shell-e/lib/python3.11/site-packages/langchain_community/agents/openai_assistant/base.py",
line 119, in _get_assistants_tool
    return convert_to_openai_tool(tool)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/anaconda3/envs/shell-e/lib/python3.11/site-packages/langchain_core/utils/function_calling.py",
line 255, in convert_to_openai_tool
    function = convert_to_openai_function(tool)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/anaconda3/envs/shell-e/lib/python3.11/site-packages/langchain_core/utils/function_calling.py",
line 230, in convert_to_openai_function
    raise ValueError(
ValueError: Unsupported function

{'type': 'file_search'}

Functions must be passed in as Dict, pydantic.BaseModel, or Callable. If
they're a dict they must either be in OpenAI function format or valid
JSON schema with top-level 'title' and 'description' keys.
```

With the proposed changes, this is fixed and the function will have support for `file_search`.
  This was the only place missing the support for `file_search`.
  
  Reference doc
  https://platform.openai.com/docs/assistants/tools/file-search
  
  
  - **Twitter handle:** luizf0992

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-09-02 18:21:51 +00:00
xander-art
6cd452d985
Feature/update hunyuan (#25779)
Description: 
    - Add system templates and user templates in integration testing
    - initialize the response id field value to request_id
    - Adjust the default model to hunyuan-pro
    - Remove the default values of Temperature and TopP
    - Add SystemMessage

all the integration tests have passed.
1、Execute integration tests for the first time
<img width="1359" alt="71ca77a2-e9be-4af6-acdc-4d665002bd9b"
src="https://github.com/user-attachments/assets/9298dc3a-aa26-4bfa-968b-c011a4e699c9">

2、Run the integration test a second time
<img width="1501" alt="image"
src="https://github.com/user-attachments/assets/61335416-4a67-4840-bb89-090ba668e237">

Issue: None
Dependencies: None
Twitter handle: None

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-09-02 12:55:08 +00:00
Yuwen Hu
566e9ba164
community: add Intel GPU support to ipex-llm llm integration (#22458)
**Description:** [IPEX-LLM](https://github.com/intel-analytics/ipex-llm)
is a PyTorch library for running LLM on Intel CPU and GPU (e.g., local
PC with iGPU, discrete GPU such as Arc, Flex and Max) with very low
latency. This PR adds Intel GPU support to `ipex-llm` llm integration.
**Dependencies:** `ipex-llm`
**Contribution maintainer**: @ivy-lv11 @Oscilloscope98
**tests and docs**: 
- Add: langchain/docs/docs/integrations/llms/ipex_llm_gpu.ipynb
- Update: langchain/docs/docs/integrations/llms/ipex_llm_gpu.ipynb
- Update: langchain/libs/community/tests/llms/test_ipex_llm.py

---------

Co-authored-by: ivy-lv11 <zhicunlv@gmail.com>
2024-09-02 08:49:08 -04:00
Emmanuel Leroy
654da27255
improve llamacpp embeddings (#12972)
- **Description:**
Improve llamacpp embedding class by adding the `device` parameter so it
can be passed to the model and used with `gpu`, `cpu` or Apple metal
(`mps`).
Improve performance by making use of the bulk client api to compute
embeddings in batches.
  
  - **Dependencies:** none
  - **Tag maintainer:** 
@hwchase17

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-31 18:27:59 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
64b62f6ae4
community[neo4j_vector]: make embedding dimension check optional (#25737)
**Description:**

Starting from Neo4j 5.23 (22 August 2024), with vector-2.0 indexes,
`vector.dimensions` is not required to be set, which will cause it the
key not exist error in index config if it's not set.

Since the existence of vector.dimensions will only ensure additional
checks, this commit turns embedding dimension check optional, and only
do checks when it exists (not None).

https://neo4j.com/release-notes/database/neo4j-5/

**Twitter handle:** @HollowM186

Signed-off-by: Hollow Man <hollowman@opensuse.org>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-08-31 12:36:20 +00:00
Christophe Bornet
0a752a74cc
community[patch], docs: Add API reference doc for GraphVectorStore (#25751) 2024-08-30 17:42:00 -07:00
Bagatur
ca1c3bd9c0
community[patch]: bump + fix core dep (#25901) 2024-08-30 15:54:07 -07:00
mehdiosa
c6f00e6bdc
community: Fix branch not being considered when using GithubFileLoader (#20075)
- **Description:** Added `ref` query parameter so data is not loaded
only from the default branch but any branch passed

---------

Co-authored-by: Osama Mehdi <mehdi@hm.edu>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-08-30 21:47:11 +00:00
Alex Sherstinsky
617a4e617b
community: Fix a bug in handling kwargs overwrites in Predibase integration, and update the documentation. (#25893)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [x] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-08-30 12:41:42 -07:00
Anush
ade4bfdff1
qdrant: Updated class check in Self-Query Retriever factory (#25877)
## Description

- Updates the self-query retriever factory to check for the new Qdrant
vector store class. i.e. `langchain_qdrant.QdrantVectorstore`.
- Deprecates `QdrantSparseVectorRetriever`, since the vector store
implementation natively supports it now.

Resolves #25798
2024-08-30 12:11:55 -04:00
Djordje
862ef32fdc
community: Fixed infinity embeddings async request (#25882)
**Description:** Fix async infinity embeddings
**Issue:** #24942  

@baskaryan, @ccurme
2024-08-30 12:10:34 -04:00
rainsubtime
f75d5621e2
community:Fix a bug of LLM in moonshot (#25878)
- **Description:** When useing LLM integration moonshot,it's occurring
error "'Moonshot' object has no attribute '_client'",it's because of the
"_client" that is private in pydantic v1.0 so that we can't use it.I
turn "_client" into "client" , the error to be resolved!
- **Issue:** the issue #24390 
- **Dependencies:** none
- **Twitter handle:** @Rainsubtime




- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Co-authored-by: Cyue <Cyue_work2001@163.com>
2024-08-30 16:09:39 +00:00
ZhangShenao
fd0f147df3
Improvement[Community] Add tool-calling test case for ChatZhipuAI (#25884)
- Add tool-calling test case for `ChatZhipuAI`
2024-08-30 12:05:43 -04:00
默奕
6377185291
add neo4j query constructor for self query (#25288)
- [x] **PR title - community: add neo4j query constructor for self
query**

- [x] **PR message**
- **Description:** adding a Neo4jTranslator so that the Neo4j vector
database can use SelfQueryRetriever
    - **Issue:** this issue had been raised before in #19748
    - **Dependencies:** none. 
    - **Twitter handle:** @moyi_dang
- p.s. I have not added the query constructor in BUILTIN_TRANSLATORS in
this PR, I want to make changes to only one package at a time.

- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-30 14:54:33 +00:00
Erick Friis
09b04c7e3b
"community: release 0.2.15" (#25867) 2024-08-30 02:18:48 +00:00
Erick Friis
f7e62754a1
community: undo azure_ad_access_token breaking change (#25818) 2024-08-30 02:06:14 +00:00
Kyle Winkelman
201bdf7148
community: Cap AzureOpenAIEmbeddings chunk_size at 2048 instead of 16. (#25852)
**Description:** Within AzureOpenAIEmbeddings there is a validation to
cap `chunk_size` at 16. The value of 16 is either an old limitation or
was erroneously chosen. I have checked all of the `preview` and `stable`
releases to ensure that the `embeddings` endpoint can handle 2048
entries
[Azure/azure-rest-api-specs](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference).
I have also found many locations that confirm this limit should be 2048:
-
https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#embeddings
-
https://learn.microsoft.com/en-us/azure/ai-services/openai/quotas-limits

**Issue:** fixes #25462
2024-08-29 16:48:04 +00:00
Allan Ascencio
a8af396a82
added octoai test (#21793)
- [ ] **PR title**: community: add tests for ChatOctoAI

- [ ] **PR message**: 
Description: Added unit tests for the ChatOctoAI class in the community
package to ensure proper validation and default values. These tests
verify the correct initialization of fields, the handling of missing
required parameters, and the proper setting of aliases.
Issue: N/A
Dependencies: None

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
2024-08-29 15:07:27 +00:00
Param Singh
69f9acb60f
premai[patch]: Standardize premai params (#21513)
Thank you for contributing to LangChain!

community:premai[patch]: standardize init args

- updated `temperature` with Pydantic Field, updated the unit test.
- updated `max_tokens` with Pydantic Field, updated the unit test.
- updated `max_retries` with Pydantic Field, updated the unit test.

Related to #20085

---------

Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-08-29 11:01:28 -04:00
Guangdong Liu
fcf9230257
community(sparkllm): Add function call support in Sparkllm chat model. (#20607)
- **Description:** Add function call support in Sparkllm chat model.
Related documents
https://www.xfyun.cn/doc/spark/Web.html#_2-function-call%E8%AF%B4%E6%98%8E
- @baskaryan

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-08-29 14:38:39 +00:00
Jorge Piedrahita Ortiz
9ac953a948
Community: sambastudio embeddings GenericV2 API support (#25064)
- **Description:** 
        SambaStudio GenericV2 API support 
        Minor changes for requests error handling
2024-08-29 09:52:49 -04:00