Commit Graph

8721 Commits (38faa74c238e3fc20ff98c5c219a765eb3c0a0ee)
 

Author SHA1 Message Date
Brace Sproul ce0a588ae6
docs[minor]: Add chat model tabs to docs pages (#19589) 3 months ago
BeatrixCohere bd02b83acd
cohere[patch]: Allow overriding of the base URL in Cohere Client (#19766)
This PR adds the ability for a user to override the base API url for the
Cohere client for embeddings and chat llm.
3 months ago
Nisarg Trivedi 1252ccce6f
text-splitters[minor]: Added Haskell support in langchain.text_splitter module (#16191)
- **Description:** Haskell language support added in text_splitter
module
  - **Dependencies:** No
  - **Twitter handle:** @nisargtr

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
3 months ago
Hrvoje Milković b7344e3347
community[minor]: Infobip tool integration (#16805)
**Description:** Adds a tool that wraps the Infobip API for sending SMS or
emails and for email validation.
**Dependencies:** None
**Twitter handle:** @hmilkovic

Implementation:
```
libs/community/langchain_community/utilities/infobip.py
```

Integration tests:
```
libs/community/tests/integration_tests/utilities/test_infobip.py
```

Example notebook:
```
docs/docs/integrations/tools/infobip.ipynb
```

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
3 months ago
Luka Krapic 727a2ea9f1
community[patch]: history size support for DynamoDBChatMessageHistory (#16794)
**Description:** This PR adds support for limiting the number of messages
preserved in a session history for DynamoDBChatMessageHistory.
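
As a rough illustration of the idea (not the code from this PR), capping a session history can amount to keeping only the most recent entries. The function name `trim_history` and the `history_size` parameter below are hypothetical.

```python
from typing import List, Optional


def trim_history(messages: List[dict], history_size: Optional[int] = None) -> List[dict]:
    """Return at most `history_size` of the most recent messages.

    `history_size=None` means "keep everything" (an assumption of this sketch).
    """
    if history_size is None:
        return list(messages)
    if history_size <= 0:
        # Guard: messages[-0:] would return the whole list.
        return []
    return messages[-history_size:]


history = [{"type": "human", "content": f"msg {i}"} for i in range(5)]
recent = trim_history(history, history_size=2)
print([m["content"] for m in recent])  # ['msg 3', 'msg 4']
```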

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
3 months ago
Dt22 6dbf1a2de0
community[patch]: fix redis input type for index_schema field (#16874)
### Subject: Fix Type Misdeclaration for index_schema in redis/base.py

I noticed a type misdeclaration for the index_schema column in the
redis/base.py file.

Following the instructions in [Redis Custom Metadata
Indexing](https://python.langchain.com/docs/integrations/vectorstores/redis)
to create a custom index_schema leads to a Pylance type error.
**The error message indicates that Dict[str, list[Dict[str, str]]] is
incompatible with the type Optional[Union[Dict[str, str], str,
os.PathLike]].**

```
index_schema = {
    "tag": [{"name": "credit_score"}],
    "text": [{"name": "user"}, {"name": "job"}],
    "numeric": [{"name": "age"}],
}

rds, keys = Redis.from_texts_return_keys(
    texts,
    embeddings,
    metadatas=metadata,
    redis_url="redis://localhost:6379",
    index_name="users_modified",
    index_schema=index_schema,  
)
```
Therefore, I have created this pull request to rectify the type
declaration problem.
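
A minimal sketch of what a wider annotation could look like. The alias name `IndexSchema` and the helper `describe_schema` are hypothetical, and the actual change in redis/base.py may differ.

```python
import os
from typing import Dict, List, Optional, Union

# Hypothetical alias: a schema is either a mapping of field type to a list of
# field definitions, or a path (str / os.PathLike) to a schema file.
IndexSchema = Optional[Union[Dict[str, List[Dict[str, str]]], str, os.PathLike]]


def describe_schema(index_schema: IndexSchema = None) -> str:
    """Report which form of schema was passed (illustrative helper only)."""
    if index_schema is None:
        return "default"
    if isinstance(index_schema, dict):
        return "dict with fields: " + ", ".join(sorted(index_schema))
    return "path: " + os.fspath(index_schema)


schema = {"tag": [{"name": "credit_score"}], "numeric": [{"name": "age"}]}
print(describe_schema(schema))  # dict with fields: numeric, tag
```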

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
3 months ago
morgana 074ad5095f
community[patch]: mmr search for Rockset vectorstore integration (#16908)
- **Description:** Adding support for mmr search in the Rockset
vectorstore integration.
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** `@_morgan_adams_`
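
For readers unfamiliar with mmr (maximal marginal relevance) search, here is a generic, dependency-free sketch of the selection step. It is not the Rockset integration's code; the function names and the `lambda_mult` weighting are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query, candidates, k=2, lambda_mult=0.5) -> List[int]:
    """Return indices of up to k candidates, trading relevance for diversity."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Similarity to the closest already-selected candidate.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


docs = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
print(mmr([1.0, 0.2], docs, k=2, lambda_mult=0.3))  # [1, 2]
```

With `lambda_mult=1.0` the redundancy term vanishes and the result is a plain similarity ranking.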

---------

Co-authored-by: Rockset API Bot <admin@rockset.io>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
3 months ago
shahrin014 f51e6a35ba
community[patch]: OllamaEmbeddings - Pass headers to post request (#16880)
## Feature
- Set additional headers in constructor
- Headers will be sent in post request

This feature is useful when deploying Ollama on a cloud service such as
Hugging Face, which requires authentication tokens to be passed in the
request header.
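
A minimal sketch of the idea using only the standard library. It builds, but does not send, a POST request that carries caller-supplied headers. The helper name, the endpoint path, and the payload shape are assumptions for illustration, not the implementation in this PR.

```python
import json
import urllib.request
from typing import Dict, Optional


def build_embedding_request(
    base_url: str, model: str, prompt: str, headers: Optional[Dict[str, str]] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST request with optional extra headers."""
    merged = {"Content-Type": "application/json"}
    merged.update(headers or {})  # caller-supplied headers win on conflict
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    # The "/api/embeddings" path is an assumption of this sketch.
    return urllib.request.Request(
        base_url + "/api/embeddings", data=body, headers=merged, method="POST"
    )


req = build_embedding_request(
    "http://localhost:11434",
    "llama2",
    "hello",
    headers={"Authorization": "Bearer <token>"},
)
print(req.get_header("Authorization"))  # Bearer <token>
```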

## Tests
- Test if header is passed
- Test if header is not passed

Similar to https://github.com/langchain-ai/langchain/pull/15881

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
3 months ago
Lance Martin e0f137dbe0
docs: Agentic and Self-RAG w/ LangGraph (#16910)
To do:
[ ] Add streaming
[ ] Move to LangGraph
3 months ago
Jan Chorowski b8b42ccbc5
community[minor]: Pathway vectorstore(#14859)
- **Description:** Integration with pathway.com data processing pipeline
acting as an always updated vectorstore
  - **Issue:** not applicable
- **Dependencies:** optional dependency on
[`pathway`](https://pypi.org/project/pathway/)
  - **Twitter handle:** pathway_com

The PR provides an integration with `pathway` that offers an easy-to-use,
always up-to-date vector store:

```python
import os

import pathway as pw
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import PathwayVectorClient, PathwayVectorServer

data_sources = []
data_sources.append(
    pw.io.gdrive.read(object_id="17H4YpBOAKQzEJ93xmC2z170l0bP2npMy", service_user_credentials_file="credentials.json", with_metadata=True))

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
embeddings_model = OpenAIEmbeddings(openai_api_key=os.environ["OPENAI_API_KEY"])
vector_server = PathwayVectorServer(
    *data_sources,
    embedder=embeddings_model,
    splitter=text_splitter,
)
vector_server.run_server(host="127.0.0.1", port="8765", threaded=True, with_cache=False)
client = PathwayVectorClient(
    host="127.0.0.1",
    port="8765",
)
query = "What is Pathway?"
docs = client.similarity_search(query)
```

The `PathwayVectorServer` builds a data processing pipeline which
continuously scans documents in a given source connector (Google Drive,
S3, ...) and builds a vector store. The `PathwayVectorClient` implements
LangChain's `VectorStore` interface and connects to the server to
retrieve documents.

---------

Co-authored-by: Mateusz Lewandowski <lewymati@users.noreply.github.com>
Co-authored-by: mlewandowski <mlewandowski@MacBook-Pro-mlewandowski.local>
Co-authored-by: Berke <berkecanrizai1@gmail.com>
Co-authored-by: Adrian Kosowski <adrian@pathway.com>
Co-authored-by: mlewandowski <mlewandowski@macbook-pro-mlewandowski.home>
Co-authored-by: berkecanrizai <63911408+berkecanrizai@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: mlewandowski <mlewandowski@MBPmlewandowski.ht.home>
Co-authored-by: Szymon Dudycz <szymond@pathway.com>
Co-authored-by: Szymon Dudycz <szymon.dudycz@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
3 months ago
ccurme 0dbd5f5012
add script to check imports (#19611) 3 months ago
Arturs Konfino 2319212d54
community[patch]: avoid executing `toolkit.get_context()` when not necessary (#19762)
If `prompt` is passed into `create_sql_agent()`, then
`toolkit.get_context()` shouldn't be executed against the database
unless relevant prompt variables (`table_info` or `table_names`) are
present.
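
The behaviour described here can be sketched with a stand-in toolkit. `FakeToolkit` and `resolve_context` are hypothetical names used only for this illustration, not code from `create_sql_agent()`.

```python
from typing import Dict, List


class FakeToolkit:
    """Stand-in toolkit that counts how often the database would be queried."""

    def __init__(self) -> None:
        self.calls = 0

    def get_context(self) -> Dict[str, str]:
        self.calls += 1
        return {"table_info": "CREATE TABLE users (...)", "table_names": "users"}


def resolve_context(prompt_variables: List[str], toolkit: FakeToolkit) -> Dict[str, str]:
    """Fetch database context only if the prompt actually uses it."""
    needed = {"table_info", "table_names"} & set(prompt_variables)
    if not needed:
        return {}  # no database round trip
    context = toolkit.get_context()
    return {name: context[name] for name in needed}


toolkit = FakeToolkit()
resolve_context(["input", "agent_scratchpad"], toolkit)
print(toolkit.calls)  # 0
resolve_context(["input", "table_names"], toolkit)
print(toolkit.calls)  # 1
```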
3 months ago
高璟琦 ec7a59c96c
community[minor]: Add solar embedding (#19761)
Solar is a large language model developed by
[Upstage](https://upstage.ai/). It's a powerful and purpose-trained LLM.
This PR adds access to the embedding service provided by Solar.

You can get a **SOLAR_API_KEY** from
https://console.upstage.ai/services/embedding
For more details about the already accepted LLM integration, see
https://python.langchain.com/docs/integrations/llms/solar.
3 months ago
Tomaz Bratanic dec00d3050
community[patch]: Add the ability to pass maps to neo4j retrieval query (#19758)
Makes it easier to flatten complex values to text, so you don't have to
use a lot of Cypher to do it.
3 months ago
Robby f7e8a382cc
community[minor]: add hugging face text-to-speech inference API (#18880)
Description: I implemented a tool to use Hugging Face text-to-speech
inference API.

Issue: n/a

Dependencies: n/a

Twitter handle: No Twitter, but do have
[LinkedIn](https://www.linkedin.com/in/robby-horvath/) lol.

---------

Co-authored-by: Robby <h0rv@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
3 months ago
DasDingoCodes 73eb3f8fd9
community[minor]: Implement DirectoryLoader lazy_load function (#19537)
- [x] **PR title**: "community: Implement DirectoryLoader lazy_load
function"

- [x] **Description**: The `lazy_load` function of the `DirectoryLoader`
yields each document separately. If the given `loader_cls` of the
`DirectoryLoader` also implemented `lazy_load`, it will be used to yield
subdocuments of the file.

- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access:
`libs/community/tests/unit_tests/document_loaders/test_directory_loader.py`
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory:
`docs/docs/integrations/document_loaders/directory.ipynb`


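
The lazy-loading behaviour described for `DirectoryLoader` can be sketched as a generator that delegates to a per-file loader. The classes and the `lazy_load_directory` function below are hypothetical stand-ins, not the code added in this PR.

```python
from typing import Iterator, List


class EagerLoader:
    """Hypothetical loader that only implements load()."""

    def __init__(self, path: str) -> None:
        self.path = path

    def load(self) -> List[str]:
        return [f"{self.path}#all"]


class LazyLoader:
    """Hypothetical loader that implements lazy_load()."""

    def __init__(self, path: str) -> None:
        self.path = path

    def lazy_load(self) -> Iterator[str]:
        for part in ("p1", "p2"):
            yield f"{self.path}#{part}"


def lazy_load_directory(paths: List[str], loader_cls: type) -> Iterator[str]:
    """Yield documents one at a time, using lazy_load() when available."""
    for path in paths:
        loader = loader_cls(path)
        if hasattr(loader, "lazy_load"):
            yield from loader.lazy_load()
        else:
            yield from loader.load()


print(list(lazy_load_directory(["a.txt", "b.txt"], LazyLoader)))
# ['a.txt#p1', 'a.txt#p2', 'b.txt#p1', 'b.txt#p2']
```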

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
3 months ago
Christophe Bornet 6b2b511f68
core[minor]: Add aformat_messages to FewShotChatMessagePromptTemplate and ChatPromptTemplate (#19648)
Needed since the example selector may use a vector store.
3 months ago
Leonid Ganeline 5f814820f6
docs: providers pinecone fix (#19737)
The current providers page links to the old package.
- Fixed installation instructions
- Added a reference to the Pinecone retriever
3 months ago
Bob Lin 53a74ad12b
docs: use markdown cell instead of code block (#19740)
I found that the code of async and async batch was divided into two
blocks:

<img width="823" alt="Screenshot 2024-03-29 at 7 45 59 AM"
src="https://github.com/langchain-ai/langchain/assets/10000925/0fa59d29-a692-4309-afb8-2260f03242ec">


so I merged them into a single block.
3 months ago
Ekaterina Aidova 4ce36af335
docs: fix link in openvino integration doc (#19749)
- **Description:** fix incorrect link in docs
 - **Dependencies:** None
3 months ago
Jialei f7c903e24a
community[minor]: add support for Moonshot llm and chat model (#17100) 3 months ago
Gustavo Isturiz 824dccf5e2
docs: fixed xml URL on sitemap docs example, issue #17236 (#17304) 3 months ago
Ethan Yang 7164015135
community[minor]: Add Openvino embedding support (#19632)
This PR is used to support both HF and BGE embeddings with openvino

---------

Co-authored-by: Alexander Kozlov <alexander.kozlov@intel.com>
3 months ago
Guangdong Liu cd55d587c2
langchain[patch]: Upgrade openai's sdk and solve some interface adaptation problems. (#19548)
- **Issue:** close #19534
3 months ago
Kirushikesh DB 12861273e1
experimental[patch]: Removed 'SQLResults:' from the LLMResponse in SQLDatabaseChain (#17104)
**Description:** 
When using the SQLDatabaseChain with the Llama2-70b LLM and a SQLite
database, I was getting `Warning: You can only execute one statement at
a time.`.

```
from langchain.sql_database import SQLDatabase
from langchain_experimental.sql import SQLDatabaseChain

sql_database_path = '/dccstor/mmdataretrieval/mm_dataset/swimming_record/rag_data/swimmingdataset.db'
sql_db = get_database(sql_database_path)
db_chain = SQLDatabaseChain.from_llm(mistral, sql_db, verbose=True, callbacks = [callback_obj])
db_chain.invoke({
    "query": "What is the best time of Lance Larson in men's 100 meter butterfly competition?"
})
```
Error:
```
Warning                                   Traceback (most recent call last)
Cell In[31], line 3
      1 import langchain
      2 langchain.debug=False
----> 3 db_chain.invoke({
      4     "query": "What is the best time of Lance Larson in men's 100 meter butterfly competition?"
      5 })

File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/langchain/chains/base.py:162, in Chain.invoke(self, input, config, **kwargs)
    160 except BaseException as e:
    161     run_manager.on_chain_error(e)
--> 162     raise e
    163 run_manager.on_chain_end(outputs)
    164 final_outputs: Dict[str, Any] = self.prep_outputs(
    165     inputs, outputs, return_only_outputs
    166 )

File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/langchain/chains/base.py:156, in Chain.invoke(self, input, config, **kwargs)
    149 run_manager = callback_manager.on_chain_start(
    150     dumpd(self),
    151     inputs,
    152     name=run_name,
    153 )
    154 try:
    155     outputs = (
--> 156         self._call(inputs, run_manager=run_manager)
    157         if new_arg_supported
    158         else self._call(inputs)
    159     )
    160 except BaseException as e:
    161     run_manager.on_chain_error(e)

File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/langchain_experimental/sql/base.py:198, in SQLDatabaseChain._call(self, inputs, run_manager)
    194 except Exception as exc:
    195     # Append intermediate steps to exception, to aid in logging and later
    196     # improvement of few shot prompt seeds
    197     exc.intermediate_steps = intermediate_steps  # type: ignore
--> 198     raise exc

File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/langchain_experimental/sql/base.py:143, in SQLDatabaseChain._call(self, inputs, run_manager)
    139     intermediate_steps.append(
    140         sql_cmd
    141     )  # output: sql generation (no checker)
    142     intermediate_steps.append({"sql_cmd": sql_cmd})  # input: sql exec
--> 143     result = self.database.run(sql_cmd)
    144     intermediate_steps.append(str(result))  # output: sql exec
    145 else:

File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/langchain_community/utilities/sql_database.py:436, in SQLDatabase.run(self, command, fetch, include_columns)
    425 def run(
    426     self,
    427     command: str,
    428     fetch: Literal["all", "one"] = "all",
    429     include_columns: bool = False,
    430 ) -> str:
    431     """Execute a SQL command and return a string representing the results.
    432 
    433     If the statement returns rows, a string of the results is returned.
    434     If the statement returns no rows, an empty string is returned.
    435     """
--> 436     result = self._execute(command, fetch)
    438     res = [
    439         {
    440             column: truncate_word(value, length=self._max_string_length)
   (...)
    443         for r in result
    444     ]
    446     if not include_columns:

File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/langchain_community/utilities/sql_database.py:413, in SQLDatabase._execute(self, command, fetch)
    410     elif self.dialect == "postgresql":  # postgresql
    411         connection.exec_driver_sql("SET search_path TO %s", (self._schema,))
--> 413 cursor = connection.execute(text(command))
    414 if cursor.returns_rows:
    415     if fetch == "all":

File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1416, in Connection.execute(self, statement, parameters, execution_options)
   1414     raise exc.ObjectNotExecutableError(statement) from err
   1415 else:
-> 1416     return meth(
   1417         self,
   1418         distilled_parameters,
   1419         execution_options or NO_OPTIONS,
   1420     )

File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/sqlalchemy/sql/elements.py:516, in ClauseElement._execute_on_connection(self, connection, distilled_params, execution_options)
    514     if TYPE_CHECKING:
    515         assert isinstance(self, Executable)
--> 516     return connection._execute_clauseelement(
    517         self, distilled_params, execution_options
    518     )
    519 else:
    520     raise exc.ObjectNotExecutableError(self)

File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1639, in Connection._execute_clauseelement(self, elem, distilled_parameters, execution_options)
   1627 compiled_cache: Optional[CompiledCacheType] = execution_options.get(
   1628     "compiled_cache", self.engine._compiled_cache
   1629 )
   1631 compiled_sql, extracted_params, cache_hit = elem._compile_w_cache(
   1632     dialect=dialect,
   1633     compiled_cache=compiled_cache,
   (...)
   1637     linting=self.dialect.compiler_linting | compiler.WARN_LINTING,
   1638 )
-> 1639 ret = self._execute_context(
   1640     dialect,
   1641     dialect.execution_ctx_cls._init_compiled,
   1642     compiled_sql,
   1643     distilled_parameters,
   1644     execution_options,
   1645     compiled_sql,
   1646     distilled_parameters,
   1647     elem,
   1648     extracted_params,
   1649     cache_hit=cache_hit,
   1650 )
   1651 if has_events:
   1652     self.dispatch.after_execute(
   1653         self,
   1654         elem,
   (...)
   1658         ret,
   1659     )

File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1848, in Connection._execute_context(self, dialect, constructor, statement, parameters, execution_options, *args, **kw)
   1843     return self._exec_insertmany_context(
   1844         dialect,
   1845         context,
   1846     )
   1847 else:
-> 1848     return self._exec_single_context(
   1849         dialect, context, statement, parameters
   1850     )

File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1988, in Connection._exec_single_context(self, dialect, context, statement, parameters)
   1985     result = context._setup_result_proxy()
   1987 except BaseException as e:
-> 1988     self._handle_dbapi_exception(
   1989         e, str_statement, effective_parameters, cursor, context
   1990     )
   1992 return result

File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/sqlalchemy/engine/base.py:2346, in Connection._handle_dbapi_exception(self, e, statement, parameters, cursor, context, is_sub_exec)
   2344     else:
   2345         assert exc_info[1] is not None
-> 2346         raise exc_info[1].with_traceback(exc_info[2])
   2347 finally:
   2348     del self._reentrant_error

File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1969, in Connection._exec_single_context(self, dialect, context, statement, parameters)
   1967                 break
   1968     if not evt_handled:
-> 1969         self.dialect.do_execute(
   1970             cursor, str_statement, effective_parameters, context
   1971         )
   1973 if self._has_events or self.engine._has_events:
   1974     self.dispatch.after_cursor_execute(
   1975         self,
   1976         cursor,
   (...)
   1980         context.executemany,
   1981     )

File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/sqlalchemy/engine/default.py:922, in DefaultDialect.do_execute(self, cursor, statement, parameters, context)
    921 def do_execute(self, cursor, statement, parameters, context=None):
--> 922     cursor.execute(statement, parameters)

Warning: You can only execute one statement at a time.
```
**Issue:** 
The error occurs because, when generating the SQL query, the llm_input
includes the stop sequence "\nSQLResult:". For this user query the
LLM-generated response is **SELECT Time FROM men_butterfly_100m
WHERE Swimmer = 'Lance Larson';\nSQLResult:**, so the SQLResult suffix
must be removed from the LLM response before executing it on the
database.

```
llm_inputs = {
    "input": input_text,
    "top_k": str(self.top_k),
    "dialect": self.database.dialect,
    "table_info": table_info,
    "stop": ["\nSQLResult:"],
}

sql_cmd = self.llm_chain.predict(
    callbacks=_run_manager.get_child(),
    **llm_inputs,
).strip()

# Drop the trailing SQLResult marker before running the query.
if SQL_RESULT in sql_cmd:
    sql_cmd = sql_cmd.split(SQL_RESULT)[0].strip()
result = self.database.run(sql_cmd)
```
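
The snippet above depends on chain internals. A self-contained version of just the clean-up step might look like the sketch below, where the value of `SQL_RESULT` and the helper name `clean_sql_command` are assumptions.

```python
SQL_RESULT = "\nSQLResult:"  # assumed value of the marker constant


def clean_sql_command(llm_output: str) -> str:
    """Drop a trailing SQLResult marker (and anything after it)."""
    sql_cmd = llm_output.strip()
    if SQL_RESULT in sql_cmd:
        sql_cmd = sql_cmd.split(SQL_RESULT)[0].strip()
    return sql_cmd


raw = "SELECT Time FROM men_butterfly_100m WHERE Swimmer = 'Lance Larson';\nSQLResult:"
print(clean_sql_command(raw))
# SELECT Time FROM men_butterfly_100m WHERE Swimmer = 'Lance Larson';
```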


---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
3 months ago
T Cramer 540ebf35a9
community[patch]: Add explicit error message to Bedrock error output. (#17328)
- **Description:** Propagate Bedrock errors into Langchain explicitly.
Use-case: unset region error is hidden behind 'Could not load
credentials...' message
- **Issue:**
[17654](https://github.com/langchain-ai/langchain/issues/17654)
  - **Dependencies:** None
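
One way to surface a hidden cause is to include the original error text in the wrapping exception and chain it with `from`. The sketch below only simulates a failing client loader. `CredentialsError`, `load_client`, and the message wording are hypothetical and do not reproduce the Bedrock integration.

```python
class CredentialsError(ValueError):
    """Hypothetical error type used only in this sketch."""


def load_client(region: str = "") -> dict:
    """Simulate a client loader that fails when no region is configured."""
    try:
        if not region:
            raise RuntimeError("You must specify a region.")
        return {"region": region}
    except Exception as e:
        # Include the underlying error text so the real cause is visible.
        raise CredentialsError(
            "Could not load credentials to authenticate with AWS client. "
            f"Underlying error: {e}"
        ) from e


message = ""
try:
    load_client()
except CredentialsError as err:
    message = str(err)
print(message)
```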

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
3 months ago
Marcus Virginia 69bb96c80f
community[patch]: surrealdb handle for empty metadata and allow collection names with complex characters (#17374)
- **Description:** Handle for empty metadata and allow collection names
with complex characters
  - **Issue:** #17057
  - **Dependencies:** `surrealdb`

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
3 months ago
ale-delfino 0df76bee37
core[patch]: XML parser to cover the case when the xml only contains the root level tag (#17456)
Description: Fix xml parser to handle strings that only contain the root
tag
Issue: N/A
Dependencies: None
Twitter handle: N/A

A valid xml text can contain only the root level tag. Example: <body>
  Some text here
</body>
The example above is a valid xml string. If parsed with the current
implementation the result is {"body": []}. This fix checks if the root
level text contains any non-whitespace character and if that's the case
it returns {root.tag: root.text}. The result is that the above text is
correctly parsed as {"body": "Some text here"}
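
A small sketch of the described check using `xml.etree.ElementTree`. It is not the parser's actual code: the function name is hypothetical, and this version also requires that the root has no child elements and strips surrounding whitespace from the text.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union


def parse_xml(text: str) -> Dict[str, Union[str, List[dict]]]:
    """Parse XML into a dict, returning the root text when there are no children."""
    root = ET.fromstring(text)
    if len(root) == 0 and root.text and root.text.strip():
        # Root-only document: return its text (stripped in this sketch).
        return {root.tag: root.text.strip()}
    return {root.tag: [{child.tag: child.text} for child in root]}


print(parse_xml("<body>\n  Some text here\n</body>"))  # {'body': 'Some text here'}
print(parse_xml("<body><a>1</a></body>"))  # {'body': [{'a': '1'}]}
```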

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
3 months ago
kYLe 124ab79c23
community[minor]: Add Anyscale embedding support (#17605)
**Description:** Add embedding model support for Anyscale Endpoint
**Dependencies:** openai

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
3 months ago
Lance Martin 12843f292f
community[patch]: llama cpp embeddings reset default n_batch (#17594)
When testing Nomic embeddings --
```
from langchain_community.embeddings import LlamaCppEmbeddings
embd_model_path = "/Users/rlm/Desktop/Code/llama.cpp/models/nomic-embd/nomic-embed-text-v1.Q4_K_S.gguf"
embd_lc = LlamaCppEmbeddings(model_path=embd_model_path)
embedding_lc = embd_lc.embed_query(query)
```

We were seeing this error for strings > a certain size -- 
```
File ~/miniforge3/envs/llama2/lib/python3.9/site-packages/llama_cpp/llama.py:827, in Llama.embed(self, input, normalize, truncate, return_count)
    824     s_sizes = []
    826 # add to batch
--> 827 self._batch.add_sequence(tokens, len(s_sizes), False)
    828 t_batch += n_tokens
    829 s_sizes.append(n_tokens)

File ~/miniforge3/envs/llama2/lib/python3.9/site-packages/llama_cpp/_internals.py:542, in _LlamaBatch.add_sequence(self, batch, seq_id, logits_all)
    540 self.batch.token[j] = batch[i]
    541 self.batch.pos[j] = i
--> 542 self.batch.seq_id[j][0] = seq_id
    543 self.batch.n_seq_id[j] = 1
    544 self.batch.logits[j] = logits_all

ValueError: NULL pointer access
```

The default `n_batch` of llama-cpp-python's Llama is `512` but we were
explicitly setting it to `8`.
 
These need to be set to equal for embedding models. 
* The embedding.cpp example has an assertion to make sure these are
always equal.
* Apparently this is not being done properly in llama-cpp-python.

With `n_batch` set to 8, if more than 8 tokens are passed the batch runs
out of space and it crashes.

This also explains why the CPU compute buffer size was small:

raw client with default `n_batch=512`
```
llama_new_context_with_model:        CPU input buffer size   =     3.51 MiB
llama_new_context_with_model:        CPU compute buffer size =    21.00 MiB
```
langchain with `n_batch=8`
```
llama_new_context_with_model:        CPU input buffer size   =     0.04 MiB
llama_new_context_with_model:        CPU compute buffer size =     0.33 MiB
```

We can work around this by passing `n_batch=512`, but this will not be
obvious to some users:
```
    embedding = LlamaCppEmbeddings(model_path=embd_model_path,
                                   n_batch=512)
```

From discussion w/ @cebtenzzre. Related:

https://github.com/abetlen/llama-cpp-python/issues/1189

Co-authored-by: Bagatur <baskaryan@gmail.com>
3 months ago
Zijian Han 8e976545f3
community[patch]: support OpenAI whisper base url (#17695)
**Description:** The base URL for OpenAI is retrieved from the
environment variable "OPENAI_BASE_URL", whereas for langchain it is
obtained from "OPENAI_API_BASE". By adding `base_url =
os.environ.get("OPENAI_API_BASE")`, the OpenAI proxy can execute
correctly.
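
A minimal sketch of the lookup, assuming an explicitly passed value should take precedence over the environment. The helper name `resolve_base_url` and its `env` parameter are hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_base_url(
    explicit: Optional[str] = None, env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Pick a base URL: explicit argument first, then OPENAI_API_BASE."""
    return explicit or env.get("OPENAI_API_BASE")


print(resolve_base_url(env={"OPENAI_API_BASE": "https://proxy.example/v1"}))
# https://proxy.example/v1
```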

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
3 months ago
Paulo Nascimento 44a3484503
community[patch]: add NotebookLoader unit test (#17721)
- **Description:** added unit tests for NotebookLoader. Linked PR:
https://github.com/langchain-ai/langchain/pull/17614
- **Issue:**
[#17614](https://github.com/langchain-ai/langchain/pull/17614)
    - **Twitter handle:** @paulodoestech

---------

Co-authored-by: lachiewalker <lachiewalker1@hotmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
3 months ago
Paulo Nascimento 4c3a67122f
community[patch]: add Integration for OpenAI image gen with v1 sdk (#17771)
**Description:** Created a Langchain Tool for OpenAI DALLE Image
Generation.
**Issue:**
[#15901](https://github.com/langchain-ai/langchain/issues/15901)
**Dependencies:** n/a
**Twitter handle:** @paulodoestech


---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
3 months ago
Kaixin Yang a8104ea8e9
openai[patch]: add checking codes for calling AI model get error (#17909)
**Description:** Adds checks for errors returned when calling the AI model
in chat_models/base.py and llms/base.py.
**Issue:** A call to the AI model sometimes returns an error, which should
be raised. Otherwise the next line, 'choices.extend(response["choices"])',
throws "TypeError: 'NoneType' object is not iterable" because
'response["choices"]' is None, which masks the true error.
**Dependencies:** None
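
The check can be illustrated with plain dictionaries standing in for API responses. The function name `collect_choices` and the response shapes are assumptions, not the code in chat_models/base.py or llms/base.py.

```python
from typing import Any, Dict, List


def collect_choices(responses: List[Dict[str, Any]]) -> List[Any]:
    """Gather choices, surfacing API errors instead of masking them."""
    choices: List[Any] = []
    for response in responses:
        if response.get("error"):
            # Raise the real error before touching response["choices"].
            raise ValueError(response["error"])
        choices.extend(response["choices"])
    return choices


ok = {"choices": [{"text": "hi"}]}
bad = {"error": {"message": "rate limited"}, "choices": None}
print(collect_choices([ok]))  # [{'text': 'hi'}]

error_text = ""
try:
    collect_choices([ok, bad])
except ValueError as exc:
    error_text = str(exc)
print(error_text)
```

Without the `error` check, the second response would fail inside `choices.extend(None)` with the unrelated `TypeError`.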

---------

Co-authored-by: yangkx <yangkx@asiainfo-int.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
3 months ago
Vincent Chen 833d61adb3
docs: update Together README.md (#18004)
## PR message
**Description:** This PR adds a README file for the Together API in the
`libs/partners` folder of this repository. The README includes:
 - A brief description of the package
 - Installation instructions and class introductions
 - Simple usage examples

**Issue:** #17545 

This PR only contains document changes.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
3 months ago
Jiaming 3d3cc71287
community[patch]: fix bugs for bilibili Loader (#18036)
- **Description:**
  1. Fix the BiliBiliLoader so that it can receive cookie parameters; it
     requires 3 other parameters to run. The change is backward compatible.
  2. Add a test.
  3. Add an example in the docs.

- **Issue:** [#14213]

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
3 months ago
Ethan Knights 1ef3fa0411
docs: improve readability of Langchain Expression Language get_started.ipynb (#18157)
**Description:** A few grammatical changes to improve readability of the
LCEL .ipynb and tidy some null characters.
**Issue:** N/A

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
3 months ago
Sachin Paryani 25c9f3d1d1
community[patch]: Support Streaming in Azure Machine Learning (#18246)
- [x] **PR title**: "community: Support streaming in Azure ML and few
naming changes"

- [x] **PR message**:
- **Description:** Added support for streaming for azureml_endpoint.
Also, renamed AzureMLEndpointApiType.realtime to
AzureMLEndpointApiType.dedicated. Also, added new classes
CustomOpenAIChatContentFormatter and CustomOpenAIContentFormatter and
updated the classes LlamaChatContentFormatter and LlamaContentFormatter
to now show a deprecation warning message when instantiated.

---------

Co-authored-by: Sachin Paryani <saparan@microsoft.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
3 months ago
xiaohuanshu ecb11a4a32
langchain[patch]: fix BaseChatMemory get output data error with extra key (#18117)
**Description:** At times, BaseChatMemory._get_input_output may acquire
some extra keys such as 'intermediate_steps' (agent_executor with
return_intermediate_steps set to True) and 'messages'
(agent_executor.iter with memory). In these instances, _get_input_output
can raise an error due to the presence of multiple keys. The 'output'
field should be used as the default field in these cases.
**Issue:** #16791
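
A minimal sketch of the fallback described above, assuming a standalone helper rather than the actual `BaseChatMemory._get_input_output` implementation. The function name `pick_output_key` and its signature are hypothetical.

```python
from typing import Any, Dict, Optional


def pick_output_key(outputs: Dict[str, Any], output_key: Optional[str] = None) -> str:
    """Choose which key of `outputs` to store in memory (hypothetical helper)."""
    if output_key is not None:
        # An explicitly configured key always wins.
        return output_key
    if len(outputs) == 1:
        # A single key is unambiguous.
        return next(iter(outputs))
    if "output" in outputs:
        # Extra keys such as 'intermediate_steps' or 'messages' are present:
        # fall back to 'output' instead of raising.
        return "output"
    raise ValueError(f"Got multiple output keys: {list(outputs)}")


chosen = pick_output_key({"output": "done", "intermediate_steps": []})
```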
3 months ago
Isaac Francisco f5e84c8858
docs: fixing markdown for tips (#18199)
The previous markdown was not working as intended; the new code should add
a green box around the tip so that it is highlighted.

Co-authored-by: Hershenson, Isaac (Extern) <isaac.hershenson.extern@bayer04.de>
Co-authored-by: Bagatur <baskaryan@gmail.com>
3 months ago
Hayden Wolff 85deee521a
docs: Nvidia Riva Runnables Documentation (#18237)
- **Description:** Documents how to use the Riva runnables to add
streamed automatic-speech-recognition (ASR) and text-to-speech (TTS) to
chains.
  - **Issue:** None
  - **Dependencies:** None
  - **Twitter handle:** @HaydenWolff1

---------

Co-authored-by: Hayden Wolff <hwolff@Haydens-Laptop.local>
Co-authored-by: Hayden Wolff <hwolff@MacBook-Pro.local>
Co-authored-by: Bagatur <baskaryan@gmail.com>
3 months ago
Victor Adan afa2d85405
community[patch]: Added missing from_documents method to KNNRetriever. (#18411)
- Description: Added missing `from_documents` method to `KNNRetriever`,
providing the ability to supply metadata to LangChain `Document`s, and
to give it parity to the other retrievers, which do have
`from_documents`.
- Issue: None
- Dependencies: None
- Twitter handle: None

Co-authored-by: Victor Adan <vadan@netroadshow.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
3 months ago
Smit Parmar dfc4177b50
community[patch]: mypy ignore fix (#18483)
Relates to #17048 
Description: Applied fix to the dynamodb and elasticsearch files.

Error was: `Cannot override writeable attribute with read-only
property`
Suggestion:
instead of adding 
```
@messages.setter
def messages(self, messages: List[BaseMessage]) -> None:
    raise NotImplementedError("Use add_messages instead")
```

we can change base class property
`messages: List[BaseMessage]`
to
```
@property
def messages(self) -> List[BaseMessage]:...
```

then we don't need to add `@messages.setter` in all child classes.
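
A minimal sketch of the suggestion, assuming simplified stand-in classes. `BaseHistory` and `StoredHistory` are hypothetical, `str` stands in for `BaseMessage`, and making the base property abstract is an added assumption.

```python
from abc import ABC, abstractmethod
from typing import List


class BaseHistory(ABC):
    """Hypothetical base class: `messages` is declared as a read-only property."""

    @property
    @abstractmethod
    def messages(self) -> List[str]:
        ...


class StoredHistory(BaseHistory):
    """Hypothetical child class: overrides the property without a setter."""

    def __init__(self) -> None:
        self._items: List[str] = []

    @property
    def messages(self) -> List[str]:
        # Return a copy so callers cannot mutate the stored list directly.
        return list(self._items)

    def add_messages(self, new: List[str]) -> None:
        self._items.extend(new)


history = StoredHistory()
history.add_messages(["hi", "there"])
```

Because the base class already declares a property, the child overrides a read-only property with another read-only property, which is the situation the quoted mypy error is about.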
3 months ago
aditya thomas dc9e9a66db
docs: update docstring of the ChatAnthropic and AnthropicLLM classes (#18649)
**Description:** Update docstring of the ChatAnthropic and AnthropicLLM
classes
**Issue:** Not applicable
**Dependencies:** None
3 months ago
Luca Dorigo f19229c564
core[patch]: fix beta, deprecated typing (#18877)
**Description:** 

While not technically incorrect, the TypeVar used for the `@beta`
decorator prevented pyright (and thus most vscode users) from correctly
seeing the types of functions/classes decorated with `@beta`.

This is in part due to a small bug in pyright
(https://github.com/microsoft/pyright/issues/7448 ) - however, the
`Type` bound in the typevar `C = TypeVar("C", Type, Callable)` is not
doing anything - classes are `Callables` by default, so by my
understanding binding to `Type` does not actually provide any more
safety - the modified annotation still works correctly for functions,
properties, and classes.
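
A minimal sketch of the point about classes being callable, assuming a single bound TypeVar. `mark_beta` is a hypothetical stand-in and not the `@beta` decorator from the library, and the exact annotation used in the PR is not shown here. Properties are not covered by this sketch.

```python
from typing import Any, Callable, TypeVar

# Assumption: one bound covers both functions and classes, because a class is
# itself callable (calling it constructs an instance).
T = TypeVar("T", bound=Callable[..., Any])


def mark_beta(obj: T) -> T:
    """Hypothetical decorator: tags `obj` and returns it with its type unchanged."""
    setattr(obj, "__beta__", True)
    return obj


@mark_beta
def double(x: int) -> int:
    return 2 * x


@mark_beta
class Greeter:
    def greet(self, name: str) -> str:
        return f"hello {name}"


class_is_callable = callable(Greeter)
```

Since `mark_beta` returns the same `T` it receives, a type checker can keep the original signature of `double` and the class type of `Greeter`.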

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
3 months ago
aditya thomas 263ee78886
core[runnables]: docstring for class RunnableSerializable, method configurable_fields (#19722)
**Description:** Update to the docstring for class RunnableSerializable,
method configurable_fields
**Issue:** [Add in code documentation to core Runnable methods
#18804](https://github.com/langchain-ai/langchain/issues/18804)
**Dependencies:** None

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
3 months ago
HuangZiy e1f10a697e
openai[patch]: perform judgment processing on chat model streaming delta (#18983)
**PR title:** partners: openai chat model
**PR message:** perform judgment processing on chat model streaming
delta
Closes #18977

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
3 months ago
wulixuan b7c8bc8268
community[patch]: fix yuan2 errors in LLMs (#19004)
1. Fix errors when invoking Yuan2.
2. Update tests.
3 months ago
Bob Lin aba4bd0d13
docs: Add async batch case (#19686) 3 months ago
aditya thomas ec4dcfca7f
core[runnables]: docstring of class RunnableSerializable, method configurable_alternatives (#19724)
**Description:** Update to the docstring for class RunnableSerializable,
method configurable_alternatives
**Issue:** [Add in code documentation to core Runnable methods
#18804](https://github.com/langchain-ai/langchain/issues/18804)
**Dependencies:** None

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
3 months ago