Commit Graph

7750 Commits

Author SHA1 Message Date
Bagatur
af74301ab9
core[patch], community[patch]: link extraction continue on failure (#17200) 2024-02-07 14:15:30 -08:00
Henry
2281f00198
langchain: Standardize output_parser.py across all agent types for custom FORMAT_INSTRUCTIONS (#17168)
- **Description:** 
This PR standardizes the `output_parser.py` file across all agent types
to ensure a uniform parsing mechanism is implemented. It introduces a
cohesive structure and common interface for output parsing, facilitating
easier modifications and extensions by users. The standardized approach
enhances maintainability and scalability of the codebase by providing a
consistent pattern for output parsing, which can be easily understood
and utilized across different agent types.

This PR builds upon the foundation set by a previously merged PR, which
focused exclusively on standardizing the `output_parser.py` for the
`conversational_agent` ([PR
#16945](https://github.com/langchain-ai/langchain/pull/16945)). With
this new update, I extend the standardization efforts to encompass
`output_parser.py` files across all agent types. This enhancement not
only unifies the parsing mechanism across the board but also introduces
the flexibility for users to incorporate custom `FORMAT_INSTRUCTIONS`.

  - **Issue:** 
https://github.com/langchain-ai/langchain/issues/10721
https://github.com/langchain-ai/langchain/issues/4044

  - **Dependencies:**
No new dependencies required for this change

  - **Twitter handle:**
With my github user is enough. Thanks

I hope you accept my PR.
2024-02-07 13:46:17 -08:00
Erick Friis
1cf5a5858f
remove pg_essay.txt (#17198)
Added in #16159
2024-02-07 12:58:01 -08:00
Tomaz Bratanic
ecf8042a10
templates: Add neo4j semantic layer with ollama template (#17192)
A template with JSON-based agent using Mixtral via Ollama.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-02-07 12:50:54 -08:00
Erick Friis
f87acf0340
infra: better conditional (#17197) 2024-02-07 12:49:02 -08:00
Erick Friis
4ae91733aa
infra: fix core release (#17195)
core doesn't have any min deps to test
2024-02-07 12:35:27 -08:00
Bagatur
78409634fe
core[patch]: Release 0.1.20 (#17194) 2024-02-07 12:28:05 -08:00
Nuno Campos
65798289a4
core[minor]: Use batched tracing in sdk (#16305)
Remove threadpool executor usage in langchain tracer, this is now
handled by sdk
2024-02-07 12:10:58 -08:00
chyroc
f87b38a559
google-genai[minor]: support functions call (#15146)
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-02-07 12:09:30 -08:00
Tomaz Bratanic
302989a2b1
allow optional newline in the action responses of JSON Agent parser (#17186)
Based on my experiments, the newline isn't always there, so we can make
the regex slightly more robust by allowing an optional newline after the
bacticks
2024-02-07 10:26:14 -08:00
William FH
9fa07076da
Add trace_as_chain_group metadata (#17187) 2024-02-07 09:42:44 -08:00
Leonid Ganeline
5ceaf784f3
docs Integraions/Components menu reordered (#17151)
This PR is opinionated.
- Moved `Embedding models` item to place after `LLMs` and `Chat model`,
so all items with models are together.
- Renamed `Text embedding models` to `Embedding models`. Now, it is
shorter and easier to read. `Text` is obvious from context. The same as
the `Text LLMs` vs. `LLMs` (we also have multi-modal LLMs).
2024-02-06 20:33:41 -08:00
Leonid Ganeline
0af0fc5d25
docs integraions/providers nav fix (#17148)
Issue: `Provides` page is presented as the index page (on the
`Providers` item) and as the `Providers/Providers` item. The latter
should not be in the menu. See the picture.

![image](https://github.com/langchain-ai/langchain/assets/2256422/6894023f-f13a-4f0d-8fe2-ed5b0ae2bdd2)
This PR fixes this.
2024-02-06 20:33:14 -08:00
Leonid Ganeline
bf55279d39
docs: tutorials update (#17132)
Added the course and the one-pager links
2024-02-06 20:30:30 -08:00
Erick Friis
f499a222de
infra: release min version debugging 2 (#17152) 2024-02-06 18:20:19 -08:00
Erick Friis
deb02de051
infra: release min version debugging (#17150) 2024-02-06 18:10:37 -08:00
Erick Friis
9710346095
infra: poetry run min versions 2 (#17149) 2024-02-06 17:57:43 -08:00
Erick Friis
181a033226
infra: poetry run min versions (#17146)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2024-02-06 17:37:36 -08:00
Erick Friis
d397721a34
docs: format (#17143) 2024-02-06 16:32:53 -08:00
Erick Friis
2187268208
infra: fix release (#17142) 2024-02-06 16:22:20 -08:00
Erick Friis
3e58df43c2
mistralai[patch]: release 0.0.4 (#17139) 2024-02-06 16:05:20 -08:00
Erick Friis
22b6a03a28
infra: read min versions (#17135) 2024-02-06 16:05:11 -08:00
Erick Friis
f881a3330c
mistralai[patch]: 16k token batching logic embed (#17136) 2024-02-06 15:59:08 -08:00
Arno Schutijzer
863f96b2e0
docs: fix typo in ollama notebook (#17127)
- **Description:** typo fix in ollama notebook
2024-02-06 16:54:40 -05:00
Leonid Ganeline
42c812a549
API References sorted Partner libs menu (#17130)
The `Partner libs` menu is not sorted. Now it is long enough, and items
should be sorted to simplify a package search.
- Sorted items in the `Partner libs` menu
2024-02-06 16:49:23 -05:00
Bagatur
226f376d59
community[patch]: Release 0.0.18 (#17129)
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-02-06 13:40:00 -08:00
Erick Friis
37062549f9
infra: update to cache v4 (#17126)
stop using nodejs 16. Use 20 (stop deprecation annotation on all ci)

Changelog: https://github.com/actions/cache?tab=readme-ov-file#whats-new
2024-02-06 12:55:01 -08:00
Erick Friis
980e30c361
nvidia-ai-endpoints[patch]: release 0.0.2 (#17125) 2024-02-06 12:48:25 -08:00
Erick Friis
15bd1154a7
pinecone[patch]: integration test new namespace (#17121) 2024-02-06 11:56:00 -08:00
Erick Friis
3ccffa5dcc
infra: add integration deps to partner lint (#17122) 2024-02-06 11:51:04 -08:00
Mikhail Khludnev
14ff1438e6
nvidia-trt[patch]: propagate InferenceClientException to the caller. (#16936)
- **Description:**  
 
before the change I've got

1. propagate InferenceClientException to the caller.
2. stop grpc receiver thread on exception 

```
        for token in result_queue:
>           result_str += token
E           TypeError: can only concatenate str (not "InferenceServerException") to str

../../langchain_nvidia_trt/llms.py:207: TypeError
```
And stream thread keeps running. 

after the change request thread stops correctly and caller got a root
cause exception:

```
E                   tritonclient.utils.InferenceServerException: [request id: 4529729] expected number of inputs between 2 and 3 but got 10 inputs for model 'vllm_model'

../../langchain_nvidia_trt/llms.py:205: InferenceServerException
```

  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
  - **Twitter handle:** [t.me/mkhl_spb](https://t.me/mkhl_spb)
 
I'm not sure about test coverage. Should I setup deep mocks or there's a
kind of triton stub via testcontainers or so.
2024-02-06 11:47:07 -08:00
Erick Friis
6af912d7e0
infra: add pinecone secret (#17120) 2024-02-06 11:27:04 -08:00
Junyoung Park
1ed73f1992
community[minor]: Add SelfQueryRetriever support to PGVector (#16991)
- **Description:** Add SelfQueryRetriever support to PGVector
  - **Issue:** -
  - **Dependencies:** -
  - **Twitter handle:** -

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-06 10:50:50 -08:00
Bagatur
cd945e3a5b
core[patch]: Release 0.1.19 (#17117) 2024-02-06 09:54:22 -08:00
Frank
ef082c77b1
community[minor]: add github file loader to load any github file content b… (#15305)
### Description
support load any github file content based on file extension.  

Why not use [git
loader](https://python.langchain.com/docs/integrations/document_loaders/git#load-existing-repository-from-disk)
?
git loader clones the whole repo even only interested part of files,
that's too heavy. This GithubFileLoader only downloads that you are
interested files.

### Twitter handle
my twitter: @shufanhaotop

---------

Co-authored-by: Hao Fan <h_fan@apple.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-06 09:42:33 -08:00
老阿張
ac662b3698
docs: Fix typo in amadeus.ipynb (#16916)
Description: "enviornment should be  environment"? 🤔
Issue: Typo
Dependencies: Nope
Twitter handle: laoazhang
2024-02-06 09:42:05 -08:00
Henry
eaeb8a5f71
langchain[patch]: output_parser.py in conversation_chat is customizable (#16945)
**Description:**
With this modification, users can customize the `FORMAT_INSTRUCTIONS`
template, allowing them to create their own prompts

As it is happening in
[this](https://github.com/langchain-ai/langchain/issues/10721) issue,
the `FORMAT_INSTRUCTIONS` is not customizable for the output parser,
unless you create your own class `ConvoOutputParser`. To avoid this, a
modification was done, creating a `format_instruction` variable that
users can customize with ease after initialize the agent.

For example:
```
agent = initialize_agent(
    agent = AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
    tools = tools,
    llm = llm_agent,
    verbose = True,
    max_iterations = 3,
    early_stopping_method = 'generate',
    memory = b_w_memory,
    handle_parsing_errors = True,
    agent_kwargs={
        'system_message':PREFIX,
        'human_message':SUFFIX,
        'template_tool_response':TEMPLATE_TOOL_RESPONSE,
        }
)
agent.agent.output_parser.format_instructions = "MY CUSTOM FORMAT INSTRUCTIONS"
print(agent.agent.output_parser.get_format_instructions())
MY CUSTOM FORMAT INSTRUCTIONS
```

Other parameters like `system_message`, `human_message`, or
`template_tool_response` are already customizable and with this PR, the
last parameter `FORMAT_INSTRUCTIONS` in
`langchain.agents.conversational_chat.prompt` can be modified.


**Issue:**
https://github.com/langchain-ai/langchain/issues/10721

**Dependencies:**
No new dependencies required for this change

**Twitter handle:**
With my github user is enough. Thanks

I hope you accept my PR.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-06 09:41:53 -08:00
Ryan Kraus
f027696b5f
community: Added new Utility runnables for NVIDIA Riva. (#15966)
**Please tag this issue with `nvidia_genai`**

- **Description:** Added new Runnables for integration NVIDIA Riva into
LCEL chains for Automatic Speech Recognition (ASR) and Text To Speech
(TTS).
- **Issue:** N/A
- **Dependencies:** To use these runnables, the NVIDIA Riva client
libraries are required. It they are not installed, an error will be
raised instructing how to install them. The Runnables can be safely
imported without the riva client libraries.
- **Twitter handle:** N/A

All of the Riva Runnables are inside a single folder in the Utilities
module. In this folder are four files:
- common.py - Contains all code that is common to both TTS and ASR
- stream.py - Contains a class representing an audio stream that allows
the end user to put data into the stream like a queue.
- asr.py - Contains the RivaASR runnable
- tts.py - Contains the RivaTTS runnable

The following Python function is an example of creating a chain that
makes use of both of these Runnables:

```python
def create(
    config: Configuration,
    audio_encoding: RivaAudioEncoding,
    sample_rate: int,
    audio_channels: int = 1,
) -> Runnable[ASRInputType, TTSOutputType]:
    """Create a new instance of the chain."""
    _LOGGER.info("Instantiating the chain.")

    # create the riva asr client
    riva_asr = RivaASR(
        url=str(config.riva_asr.service.url),
        ssl_cert=config.riva_asr.service.ssl_cert,
        encoding=audio_encoding,
        audio_channel_count=audio_channels,
        sample_rate_hertz=sample_rate,
        profanity_filter=config.riva_asr.profanity_filter,
        enable_automatic_punctuation=config.riva_asr.enable_automatic_punctuation,
        language_code=config.riva_asr.language_code,
    )

    # create the prompt template
    prompt = PromptTemplate.from_template("{user_input}")

    # model = ChatOpenAI()
    model = ChatNVIDIA(model="mixtral_8x7b")  # type: ignore

    # create the riva tts client
    riva_tts = RivaTTS(
        url=str(config.riva_asr.service.url),
        ssl_cert=config.riva_asr.service.ssl_cert,
        output_directory=config.riva_tts.output_directory,
        language_code=config.riva_tts.language_code,
        voice_name=config.riva_tts.voice_name,
    )

    # construct and return the chain
    return {"user_input": riva_asr} | prompt | model | riva_tts  # type: ignore
```

The following code is an example of creating a new audio stream for
Riva:

```python
input_stream = AudioStream(maxsize=1000)
# Send bytes into the stream
for chunk in audio_chunks:
    await input_stream.aput(chunk)
input_stream.close()
```

The following code is an example of how to execute the chain with
RivaASR and RivaTTS

```python
output_stream = asyncio.Queue()
while not input_stream.complete:
    async for chunk in chain.astream(input_stream):
        output_stream.put(chunk)    
```

Everything should be async safe and thread safe. Audio data can be put
into the input stream while the chain is running without interruptions.

---------

Co-authored-by: Hayden Wolff <hwolff@nvidia.com>
Co-authored-by: Hayden Wolff <hwolff@Haydens-Laptop.local>
Co-authored-by: Hayden Wolff <haydenwolff99@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-02-05 19:50:50 -08:00
Jan de Boer
2d8015554c
docs: Link to Brave Website added (#16958)
**Description:** Link to the Brave Website added to the
`brave-search.ipynb` notebook.
This notebook is shown in the docs as an example for the brave tool.

**Issue:** There was to reference on where / how to get an api key
 
**Dependencies:** none
 
**Twitter handle:** not for this one :)
2024-02-05 18:29:16 -08:00
os1ma
fd88e0f800
docs: update StreamlitCallbackHandler example (#16970)
- **Description:** docs: update StreamlitCallbackHandler example.
  - **Issue:** None
  - **Dependencies:** None

I have updated the example for StreamlitCallbackHandler in the
documentation bellow.
https://python.langchain.com/docs/integrations/callbacks/streamlit

Previously, the example used `initialize_agent`, which has been
deprecated, so I've updated it to use `create_react_agent` instead. Many
langchain users are likely searching examples of combining
`create_react_agent` or `openai_tools_agent_chain` with
StreamlitCallbackHandler. I'm sure this update will be really helpful
for them!

Unfortunately, writing unit tests for this example is difficult, so I
have not written any tests. I have run this code in a standalone Python
script file and ensured it runs correctly.
2024-02-05 18:20:59 -08:00
Marc Mahe
f08a9139d2
docs: update mistral docs for version 0.1+ (#17011)
**Description:**
Updated integration page for mistralai.
2024-02-05 18:03:12 -08:00
François Paupier
929f071513
community[patch]: Fix error in LlamaCpp community LLM with Configurable Fields, 'grammar' custom type not available (#16995)
- **Description:** Ensure the `LlamaGrammar` custom type is always
available when instantiating a `LlamaCpp` LLM
  - **Issue:** #16994 
  - **Dependencies:** None
  - **Twitter handle:** @fpaupier

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-05 17:56:58 -08:00
Leonid Ganeline
563f325034
experimental[patch]: fixed import in experimental (#17078) 2024-02-05 17:47:13 -08:00
Ikko Eltociear Ashimine
5f5f5acbc5
docs: fix typo in dspy.ipynb (#16996)
langugage -> language
2024-02-05 17:31:06 -08:00
Eugene Yurtsev
fbab8baac5
core[patch]: Add astream events config test (#17055)
Verify that astream events propagates config correctly

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-05 17:24:58 -08:00
Eugene Yurtsev
609ea019b2
docs: Update streaming documentation (#17066)
Updating streaming documentation following fix of JSON parser for
streaming json.
2024-02-05 17:24:46 -08:00
Erick Friis
64785822dc
templates: bump (#17074) 2024-02-05 17:12:12 -08:00
Scott Nath
10bd901139
infra: add integration_tests and coverage to MAKEFILE (#17053)
- **Description: update community MAKE file** 
    - adds `integration_tests`
    - adds `coverage`

- **Issue:** the issue # it fixes if applicable,
    - moving out of https://github.com/langchain-ai/langchain/pull/17014
- **Dependencies:** n/a
- **Twitter handle:** @scottnath
- **Mastodon handle:** scottnath@mastodon.social

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-05 16:39:55 -08:00
Giulio Zani
9f0b63dba0
experimental[patch]: Fixes issue #17060 (#17062)
As described in issue #17060, in the case in which text has only one
sentence the following function fails. Checking for that and adding a
return case fixed the issue.

```python
    def split_text(self, text: str) -> List[str]:
        """Split text into multiple components."""
        # Splitting the essay on '.', '?', and '!'
        single_sentences_list = re.split(r"(?<=[.?!])\s+", text)
        sentences = [
            {"sentence": x, "index": i} for i, x in enumerate(single_sentences_list)
        ]
        sentences = combine_sentences(sentences)
        embeddings = self.embeddings.embed_documents(
            [x["combined_sentence"] for x in sentences]
        )
        for i, sentence in enumerate(sentences):
            sentence["combined_sentence_embedding"] = embeddings[i]
        distances, sentences = calculate_cosine_distances(sentences)
        start_index = 0

        # Create a list to hold the grouped sentences
        chunks = []
        breakpoint_percentile_threshold = 95
        breakpoint_distance_threshold = np.percentile(
            distances, breakpoint_percentile_threshold
        )  # If you want more chunks, lower the percentile cutoff

        indices_above_thresh = [
            i for i, x in enumerate(distances) if x > breakpoint_distance_threshold
        ]  # The indices of those breakpoints on your list

        # Iterate through the breakpoints to slice the sentences
        for index in indices_above_thresh:
            # The end index is the current breakpoint
            end_index = index

            # Slice the sentence_dicts from the current start index to the end index
            group = sentences[start_index : end_index + 1]
            combined_text = " ".join([d["sentence"] for d in group])
            chunks.append(combined_text)

            # Update the start index for the next group
            start_index = index + 1

        # The last group, if any sentences remain
        if start_index < len(sentences):
            combined_text = " ".join([d["sentence"] for d in sentences[start_index:]])
            chunks.append(combined_text)
        return chunks
```

Co-authored-by: Giulio Zani <salamanderxing@Giulios-MBP.homenet.telecomitalia.it>
2024-02-05 16:18:57 -08:00
Jimmy Moore
912210ac19
core[patch]: fix _sql_record_manager mypy for #17048 (#17073)
- **Description:** Add relevant type annotations for relevant session
and query objects to resolve mypy errors when `# type: ignore` comments
are removed.
  - **Issue:** #17048
  - **Dependencies:** None,
  - **Twitter handle:** [clesiemo3](https://twitter.com/clesiemo3)
 
I attempted to solve the `UpsertionRecord` ignore but it would require
added a deprecated plugin or moving completely to sqlalchemy 2.0+ from
my understanding. I'm assuming this is not something desired at this
point in time.
2024-02-05 16:18:40 -08:00