**Description**: This PR adds support for using the [LLMLingua project
](https://github.com/microsoft/LLMLingua) especially the LongLLMLingua
(Enhancing Large Language Model Inference via Prompt Compression) as a
document compressor / transformer.
The LLMLingua project is an interesting project that can greatly improve
RAG system by compressing prompts and contexts while keeping their
semantic relevance.
**Issue**: https://github.com/microsoft/LLMLingua/issues/31
**Dependencies**: [llmlingua](https://pypi.org/project/llmlingua/)
@baskaryan
---------
Co-authored-by: Ayodeji Ayibiowu <ayodeji.ayibiowu@getinge.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
## Description
I am submitting this for a school project as part of a team of 5. Other
team members are @LeilaChr, @maazh10, @Megabear137, @jelalalamy. This PR
also has contributions from community members @Harrolee and @Mario928.
Initial context is in the issue we opened (#11229).
This pull request adds:
- Generic framework for expanding the languages that `LanguageParser`
can handle, using the
[tree-sitter](https://github.com/tree-sitter/py-tree-sitter#py-tree-sitter)
parsing library and existing language-specific parsers written for it
- Support for the following additional languages in `LanguageParser`:
- C
- C++
- C#
- Go
- Java (contributed by @Mario928
https://github.com/ThatsJustCheesy/langchain/pull/2)
- Kotlin
- Lua
- Perl
- Ruby
- Rust
- Scala
- TypeScript (contributed by @Harrolee
https://github.com/ThatsJustCheesy/langchain/pull/1)
Here is the [design
document](https://docs.google.com/document/d/17dB14cKCWAaiTeSeBtxHpoVPGKrsPye8W0o_WClz2kk)
if curious, but no need to read it.
## Issues
- Closes#11229
- Closes#10996
- Closes#8405
## Dependencies
`tree_sitter` and `tree_sitter_languages` on PyPI. We have tried to add
these as optional dependencies.
## Documentation
We have updated the list of supported languages, and also added a
section to `source_code.ipynb` detailing how to add support for
additional languages using our framework.
## Maintainer
- @hwchase17 (previously reviewed
https://github.com/langchain-ai/langchain/pull/6486)
Thanks!!
## Git commits
We will gladly squash any/all of our commits (esp merge commits) if
necessary. Let us know if this is desirable, or if you will be
squash-merging anyway.
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Maaz Hashmi <mhashmi373@gmail.com>
Co-authored-by: LeilaChr <87657694+LeilaChr@users.noreply.github.com>
Co-authored-by: Jeremy La <jeremylai511@gmail.com>
Co-authored-by: Megabear137 <zubair.alnoor27@gmail.com>
Co-authored-by: Lee Harrold <lhharrold@sep.com>
Co-authored-by: Mario928 <88029051+Mario928@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
**Please tag this issue with `nvidia_genai`**
- **Description:** Added new Runnables for integration NVIDIA Riva into
LCEL chains for Automatic Speech Recognition (ASR) and Text To Speech
(TTS).
- **Issue:** N/A
- **Dependencies:** To use these runnables, the NVIDIA Riva client
libraries are required. It they are not installed, an error will be
raised instructing how to install them. The Runnables can be safely
imported without the riva client libraries.
- **Twitter handle:** N/A
All of the Riva Runnables are inside a single folder in the Utilities
module. In this folder are four files:
- common.py - Contains all code that is common to both TTS and ASR
- stream.py - Contains a class representing an audio stream that allows
the end user to put data into the stream like a queue.
- asr.py - Contains the RivaASR runnable
- tts.py - Contains the RivaTTS runnable
The following Python function is an example of creating a chain that
makes use of both of these Runnables:
```python
def create(
config: Configuration,
audio_encoding: RivaAudioEncoding,
sample_rate: int,
audio_channels: int = 1,
) -> Runnable[ASRInputType, TTSOutputType]:
"""Create a new instance of the chain."""
_LOGGER.info("Instantiating the chain.")
# create the riva asr client
riva_asr = RivaASR(
url=str(config.riva_asr.service.url),
ssl_cert=config.riva_asr.service.ssl_cert,
encoding=audio_encoding,
audio_channel_count=audio_channels,
sample_rate_hertz=sample_rate,
profanity_filter=config.riva_asr.profanity_filter,
enable_automatic_punctuation=config.riva_asr.enable_automatic_punctuation,
language_code=config.riva_asr.language_code,
)
# create the prompt template
prompt = PromptTemplate.from_template("{user_input}")
# model = ChatOpenAI()
model = ChatNVIDIA(model="mixtral_8x7b") # type: ignore
# create the riva tts client
riva_tts = RivaTTS(
url=str(config.riva_asr.service.url),
ssl_cert=config.riva_asr.service.ssl_cert,
output_directory=config.riva_tts.output_directory,
language_code=config.riva_tts.language_code,
voice_name=config.riva_tts.voice_name,
)
# construct and return the chain
return {"user_input": riva_asr} | prompt | model | riva_tts # type: ignore
```
The following code is an example of creating a new audio stream for
Riva:
```python
input_stream = AudioStream(maxsize=1000)
# Send bytes into the stream
for chunk in audio_chunks:
await input_stream.aput(chunk)
input_stream.close()
```
The following code is an example of how to execute the chain with
RivaASR and RivaTTS
```python
output_stream = asyncio.Queue()
while not input_stream.complete:
async for chunk in chain.astream(input_stream):
output_stream.put(chunk)
```
Everything should be async safe and thread safe. Audio data can be put
into the input stream while the chain is running without interruptions.
---------
Co-authored-by: Hayden Wolff <hwolff@nvidia.com>
Co-authored-by: Hayden Wolff <hwolff@Haydens-Laptop.local>
Co-authored-by: Hayden Wolff <haydenwolff99@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Previously, if this did not find a mypy cache then it wouldnt run
this makes it always run
adding mypy ignore comments with existing uncaught issues to unblock other prs
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** Adding Oracle Cloud Infrastructure Generative AI
integration. Oracle Cloud Infrastructure (OCI) Generative AI is a fully
managed service that provides a set of state-of-the-art, customizable
large language models (LLMs) that cover a wide range of use cases, and
which is available through a single API. Using the OCI Generative AI
service you can access ready-to-use pretrained models, or create and
host your own fine-tuned custom models based on your own data on
dedicated AI clusters.
https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm
- **Issue:** None,
- **Dependencies:** OCI Python SDK,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
Passed
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
we provide unit tests. However, we cannot provide integration tests due
to Oracle policies that prohibit public sharing of api keys.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Arthur Cheng <arthur.cheng@oracle.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:**
This PR adds a VectorStore integration for SAP HANA Cloud Vector Engine,
which is an upcoming feature in the SAP HANA Cloud database
(https://blogs.sap.com/2023/11/02/sap-hana-clouds-vector-engine-announcement/).
- **Issue:** N/A
- **Dependencies:** [SAP HANA Python
Client](https://pypi.org/project/hdbcli/)
- **Twitter handle:** @sapopensource
Implementation of the integration:
`libs/community/langchain_community/vectorstores/hanavector.py`
Unit tests:
`libs/community/tests/unit_tests/vectorstores/test_hanavector.py`
Integration tests:
`libs/community/tests/integration_tests/vectorstores/test_hanavector.py`
Example notebook:
`docs/docs/integrations/vectorstores/hanavector.ipynb`
Access credentials for execution of the integration tests can be
provided to the maintainers.
---------
Co-authored-by: sascha <sascha.stoll@sap.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Implement similarity function selector for ElasticsearchStore. The
scores coming back from Elasticsearch are already similarities (not
distances) and they are already normalized (see
[docs](https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html#dense-vector-params)).
Hence we leave the scores untouched and just forward them.
This fixes#11539.
However, in hybrid mode (when keyword search and vector search are
involved) Elasticsearch currently returns no scores. This PR adds an
error message around this fact. We need to think a bit more to come up
with a solution for this case.
This PR also corrects a small error in the Elasticsearch integration
test.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Todo
- [x] copy over integration tests
- [x] update docs with new instructions in #15513
- [x] add linear ticket to bump core -> community, community->langchain,
and core->openai deps
- [ ] (optional): add `pip install langchain-openai` command to each
notebook using it
- [x] Update docstrings to not need `openai` install
- [x] Add serialization
- [x] deprecate old models
Contributor steps:
- [x] Add secret names to manual integrations workflow in
.github/workflows/_integration_test.yml
- [x] Add secrets to release workflow (for pre-release testing) in
.github/workflows/_release.yml
Maintainer steps (Contributors should not do these):
- [x] set up pypi and test pypi projects
- [x] add credential secrets to Github Actions
- [ ] add package to conda-forge
Functional changes to existing classes:
- now relies on openai client v1 (1.6.1) via concrete dep in
langchain-openai package
Codebase organization
- some function calling stuff moved to
`langchain_core.utils.function_calling` in order to be used in both
community and langchain-openai
- **Description:**
- This PR introduces a significant enhancement to the LangChain project
by integrating a new chat model powered by the third-generation base
large model, ChatGLM3, via the zhipuai API.
- This advanced model supports functionalities like function calls, code
interpretation, and intelligent Agent capabilities.
- The additions include the chat model itself, comprehensive
documentation in the form of Python notebook docs, and thorough testing
with both unit and integrated tests.
- **Dependencies:** This update relies on the ZhipuAI package as a key
dependency.
- **Twitter handle:** If this PR receives spotlight attention, we would
be honored to receive a mention for our integration of the advanced
ChatGLM3 model via the ZhipuAI API. Kindly tag us at @kaiwu.
To ensure quality and standards, we have performed extensive linting and
testing. Commands such as make format, make lint, and make test have
been run from the root of the modified package to ensure compliance with
LangChain's coding standards.
TO DO: Continue refining and enhancing both the unit tests and
integrated tests.
---------
Co-authored-by: jing <jingguo92@gmail.com>
Co-authored-by: hyy1987 <779003812@qq.com>
Co-authored-by: jianchuanqi <qijianchuan@hotmail.com>
Co-authored-by: lirq <whuclarence@gmail.com>
Co-authored-by: whucalrence <81530213+whucalrence@users.noreply.github.com>
Co-authored-by: Jing Guo <48378126+JaneCrystall@users.noreply.github.com>
- **Description:** Going forward, we have a own API `pip install
gradientai`. Therefore gradually removing the self-build packages in
llamaindex, haystack and langchain.
- **Issue:** None.
- **Dependencies:** `pip install gradientai`
- **Tag maintainer:** @michaelfeil