5642 Commits (3405dbbc6446ffccba4392ee6773ea6af496abda)
 

Author SHA1 Message Date
Erick Friis 3405dbbc64
dash not underscore (#12716)
template names are auto-populating with the wrong convention (with
underscores)
11 months ago
123-fake-st 8bd3ce59cd
PyPDFLoader use url in metadata source if file is a web path (#12092)
**Description:** Update `langchain.document_loaders.pdf.PyPDFLoader` to
store url in metadata (instead of a temporary file path) if user
provides a web path to a pdf

- **Issue:** Related to #7034; the reporter on that issue submitted a PR
(#7077) updating `PyMuPDFParser` for this behavior, but it has unresolved
merge issues as of 20 Oct 2023
- In addition to `PyPDFLoader` and `PyMuPDFParser`, these other classes
in `langchain.document_loaders.pdf` exhibit similar behavior and could
benefit from an update: `PyPDFium2Loader`, `PDFMinerLoader`,
`PDFMinerPDFasHTMLLoader`, `PDFPlumberLoader` (I'm happy to contribute
to some/all of that, including assisting with `PyMuPDFParser`, if my
work is agreeable)
- The root cause is that the underlying pdf parser classes, e.g.
`langchain.document_loaders.parsers.pdf.PyPDFParser`, never receive
information about the url; the parsers receive a
`langchain.document_loaders.blob_loaders.blob`, which contains the pdf
contents and local file path, but not the url
- This update passes the web path directly to the parser since it's
minimally invasive and doesn't require further changes to maintain
existing behavior for local files... bigger picture, I'd consider
extending `blob` so that extra information like this can be
communicated, but that has much bigger implications on the codebase
which I think warrants maintainer input

  - **Dependencies:** None

```python
# old behavior
>>> from langchain.document_loaders import PyPDFLoader
>>> loader = PyPDFLoader('https://arxiv.org/pdf/1706.03762.pdf')
>>> docs = loader.load()
>>> docs[0].metadata
{'source': '/var/folders/w2/zx77z1cs01s1thx5dhshkd58h3jtrv/T/tmpfgrorsi5/tmp.pdf', 'page': 0}

# new behavior
>>> from langchain.document_loaders import PyPDFLoader
>>> loader = PyPDFLoader('https://arxiv.org/pdf/1706.03762.pdf')
>>> docs = loader.load()
>>> docs[0].metadata
{'source': 'https://arxiv.org/pdf/1706.03762.pdf', 'page': 0}
```
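
The choice described above (keep the URL as the source when the user passed a web path, otherwise keep the local path) can be sketched without LangChain. This is only an illustration, not the loader's actual code: `is_web_path` and `resolve_source` are hypothetical helpers, and `/tmp/tmp.pdf` is a placeholder for the temporary download location.

```python
from urllib.parse import urlparse


def is_web_path(path: str) -> bool:
    """Return True if the path looks like an http(s) URL."""
    parsed = urlparse(path)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def resolve_source(user_path: str, local_path: str) -> str:
    """Pick the value to store under metadata['source'] (hypothetical helper)."""
    return user_path if is_web_path(user_path) else local_path


url = "https://arxiv.org/pdf/1706.03762.pdf"
# The PDF would be downloaded to a temp file, but the URL is what gets recorded.
metadata = {"source": resolve_source(url, "/tmp/tmp.pdf"), "page": 0}
```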
11 months ago
Dave Kwon b1954aab13
feat: Add page metadata on PDFMinerLoader (#12277)
- **Description:** Implements the suggestion from #12273.
Like the other PDF loaders, this loads the PDF page by page and attaches
page metadata.
  - **Issue:** #12273 
  - **Twitter handle:** @blue0_0hope
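
A rough sketch of per-page loading with page metadata, assuming the page texts have already been extracted. `Page` and `split_pages` are hypothetical names rather than LangChain classes, and zero-based page numbers are an assumption borrowed from the `PyPDFLoader` output shown earlier in this log.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Page:
    """Hypothetical stand-in for a loaded document."""

    text: str
    metadata: Dict[str, object] = field(default_factory=dict)


def split_pages(page_texts: List[str], source: str) -> List[Page]:
    """Wrap each page's text in its own document, tagged with its page index."""
    return [
        Page(text=text, metadata={"source": source, "page": index})
        for index, text in enumerate(page_texts)
    ]


docs = split_pages(["first page", "second page"], "example.pdf")
```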

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
11 months ago
Duda Nogueira 7148f3e1fe
Weaviate - Fix schema existence check (#12711)
This allows you to create the schema beforehand. The existence check was
failing and preventing imports into existing classes.

11 months ago
Sayandip 8dbbcf0b6c
Adding a template for Solo Performance Prompting Agent (#12627)
**Description:** This template creates an agent that transforms a single
LLM into a cognitive synergist by engaging in multi-turn
self-collaboration with multiple personas.
**Tag maintainer:** @hwchase17

---------

Co-authored-by: Sayandip Sarkar <sayandip.sarkar@skypointcloud.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
11 months ago
Aidos Kanapyanov ae63c186af
Mask API key for Anyscale LLM (#12406)
Description: Add masking of API Key for Anyscale LLM when printed.
Issue: #12165 
Dependencies: None
Tag maintainer: @eyurtsev
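
Masking a key when it is printed can be sketched with a small wrapper class. This is an illustration only and not the code from the PR: `MaskedSecret` is a hypothetical class, similar in spirit to pydantic's `SecretStr`, and the key value is made up.

```python
class MaskedSecret:
    """Hypothetical wrapper that hides a secret when it is printed."""

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        """Return the real value; call this only where the key is needed."""
        return self._value

    def __repr__(self) -> str:
        return "MaskedSecret('**********')"

    def __str__(self) -> str:
        return "**********"


api_key = MaskedSecret("my-anyscale-key")
printed = f"api_key={api_key}"  # formatting uses __str__, so the key stays hidden
```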

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
11 months ago
Predrag Gruevski 5ae51a8a85
Fix typo highlighted by `ruff` autoformatter. (#12691)
H/t @MichaReiser for spotting it:
https://github.com/langchain-ai/langchain/pull/12585/files#r1378253045
11 months ago
Predrag Gruevski 724b92231d
Remove `black` caching config from CI lint workflow. (#12594)
To merge after #12585 is merged.
11 months ago
Predrag Gruevski 0ea837404a
Only publish to test PyPI from the `_test_release.yml` workflow. (#12668)
PyPI trusted publishing wants to know which workflow is expected to do
the publish. We always want to publish from the same workflow, so we're
making `_test_release.yml` the only workflow that publishes to Test
PyPI.
11 months ago
Predrag Gruevski 321cd44f13
Use separate jobs for building and publishing test releases. (#12671)
This follows the principle of least privilege. Our `poetry build` step
doesn't need, and shouldn't get, access to our GitHub OIDC capability.

This is the same structure as I used in the already-merged PR for
refactoring the regular PyPI release workflow: #12578.
11 months ago
Erick Friis 44c8b159b9
properly increment version in cli (#12685)
Went from 0.0.9 -> 0.0.11 without releasing. Back to 0.0.10, then release.
11 months ago
Erick Friis b825dddf95
fix elastic rag template in playground (#12682)
- a few instructions in the readme (load_documents -> ingest.py)
- added docker run command for local elastic
- adds input type definition to render playground properly
11 months ago
Lance Martin f0eba1ac63
Add RAG input types (#12684)
Co-authored-by: Erick Friis <erick@langchain.dev>
11 months ago
Erick Friis 392cfbee24
link to templates (#12680)
11 months ago
Leonid Ganeline ddcec005bc
fix for `YahooFinanceNewsTool` (#12665)
Added `YahooFinanceNewsTool` to `__init__.py`.
It was missing there.
11 months ago
Predrag Gruevski 09711ad5a1
Both lint and format `templates` with ruff v0.1.3. (#12676)
- Both lint and format code in `templates`.
- Upgrade to ruff v0.1.3.
11 months ago
Predrag Gruevski 01a3c9b94e
Use an in-project virtualenv in the CLI package. (#12678)
Keeping it in sync with how our other packages are configured.
11 months ago
Predrag Gruevski f7f35a9102
Use black to lint notebooks and docs for now. (#12679)
Due to #12677 having lots of errors for the time being.
11 months ago
Jacob Lee bd668fcea1
Adds version CLI command (#12619)
Will be automatically bumped with `poetry version patch`.

@efriis @hwchase17

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
11 months ago
Frank bf5805bb32
Add quip loader (#12259)
- **Description:** implement [quip](https://quip.com) loader
  - **Issue:** https://github.com/langchain-ai/langchain/issues/10352
  - **Dependencies:** No
  - Passes `make format`, `make lint`, and `make test`

---------

Co-authored-by: Hao Fan <h_fan@apple.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
11 months ago
Roman Vasilyev c9a6940d58
PGVector fix (#12592)
latest release broken, this fixes it

---------

Co-authored-by: Roman Vasilyev <rvasilyev@mozilla.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
11 months ago
Lance Martin 9e17d1a225
Update Vertex template (#12644)
Co-authored-by: Erick Friis <erick@langchain.dev>
11 months ago
Predrag Gruevski aa3f4a9bc8
Remove the CLI package's pydantic compatibility tests. (#12675)
They aren't necessary, since the CLI package doesn't have a direct
dependency on pydantic.
11 months ago
Predrag Gruevski e8b99364b3
Use `ruff` for both linting and formatting in `langchain-cli`. (#12672)
Prior to this PR, `ruff` was used only for linting and not for
formatting, despite the names of the commands. This PR makes it be used
for both linting code and autoformatting it.
11 months ago
Harrison Chase 9a10b2b047
fix plate chain (#12673)
11 months ago
Margaret Qian acfc485808
Update MosaicML Embedding Input Key (#12657)
This input key was missed in the last update PR:
https://github.com/langchain-ai/langchain/pull/7391

The input/output formats are intended to be like this:

```
{"inputs": [<prompt>]} 

{"outputs": [<output_text>]}
```
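
A minimal sketch of building and parsing payloads in the format above. The function names and sample strings are illustrative assumptions; the actual request handling in the MosaicML integration is not shown in this log.

```python
import json
from typing import List


def build_request(prompts: List[str]) -> str:
    """Serialize prompts under the "inputs" key."""
    return json.dumps({"inputs": prompts})


def parse_response(body: str) -> List[str]:
    """Read the generated texts from the "outputs" key."""
    return json.loads(body)["outputs"]


payload = build_request(["What is LangChain?"])
texts = parse_response('{"outputs": ["An LLM framework."]}')
```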
11 months ago
Erika Cardenas d26ac5f999
Update README for Hybrid Search Weaviate (#12661)
- **Description:** Updated the README for Hybrid Search Weaviate
11 months ago
Predrag Gruevski c871cc5055
Remove `print()` statements which seemed leftover from debugging. (#12648)
Added in #12159 presumably during debugging. Right now they cause a bit of visual noise.
11 months ago
Erick Friis 2a7e0a27cb
update lc version (#12655)
also updated py version in `csv-agent` and `rag-codellama-fireworks`
because they have stricter python requirements
11 months ago
Predrag Gruevski 360cff81a3
Overwrite existing distributions when uploading to test PyPI. (#12658)
11 months ago
Lance Martin da94c750c5
Add RAG template for Timescale Vector (#12651)

---------

Co-authored-by: Matvey Arye <mat@timescale.com>
11 months ago
Noam Gat 14e8c74736
LM Format Enforcer Integration + Sample Notebook (#12625)
## Description

This PR adds support for
[lm-format-enforcer](https://github.com/noamgat/lm-format-enforcer) to
LangChain.

![image](https://raw.githubusercontent.com/noamgat/lm-format-enforcer/main/docs/Intro.webp)

The library is similar to jsonformer / RELLM, which are supported in
LangChain, but has several advantages, such as:
- Batching and Beam search support
- More complete JSON Schema support
- LLM has control over whitespace, improving quality
- Better runtime performance due to only calling the LLM's generate()
function once per generate() call.

The integration is loosely based on the jsonformer integration in terms
of project structure.

## Dependencies

No compile-time dependency was added, but if `lm-format-enforcer` is not
installed, a runtime error will occur when the integration is used.
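
This optional-dependency pattern is commonly written as a deferred import that raises a clear error at the point of use. The sketch below assumes nothing about the PR's actual code: `import_optional` is a hypothetical helper, and the stdlib `json` module stands in for the optional package.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, pip_name: str) -> ModuleType:
    """Import a module on first use, with an actionable error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Could not import {module_name}. "
            f"Install it with `pip install {pip_name}`."
        ) from exc


# Stand-in usage: the stdlib json module plays the role of the optional package.
json_module = import_optional("json", "json")
```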

## Tests

Due to the integration modifying the internal parameters of the
underlying huggingface transformer LLM, it is not possible to test
without building a real LM, which requires internet access. So, similar
to the jsonformer and RELLM integrations, the testing is via the
notebook.

## Twitter Handle

[@noamgat](https://twitter.com/noamgat)


Looking forward to hearing feedback!

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
11 months ago
Stefano Lottini a4e4b5a86f
Relax python version and remove need for explicit setup step (#12637)
This PR addresses what seems like an unnecessary Python version
restriction in the pyproject.toml specs within both Cassandra (/Astra DB)
templates. With "^3.11" I got some version incompatibilities with the
latest "langchain add [...]" commands, so these are now relaxed in line
with the other templates I could inspect.

Incidentally, in the "entomology" template, the need for an explicit
"setup" step for the user to carry on has been removed, replaced by a
check-and-execute-if-necessary instruction on app startup.
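
One way to picture a check-and-execute-if-necessary step is a guard that runs setup at most once. This is a generic sketch, not the template's code: `ensure_setup` and the `state` dictionary are hypothetical.

```python
from typing import Callable, Dict, List


def ensure_setup(state: Dict[str, bool], setup: Callable[[], None]) -> bool:
    """Run setup only if it has not run yet; return True if it ran now."""
    if state.get("initialized"):
        return False
    setup()
    state["initialized"] = True
    return True


calls: List[str] = []
state: Dict[str, bool] = {}
first = ensure_setup(state, lambda: calls.append("setup"))   # runs setup
second = ensure_setup(state, lambda: calls.append("setup"))  # skipped
```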

Thank you for your attention!
11 months ago
Predrag Gruevski 5308b836c7
Upgrade to `actions/checkout@v4` in the docs lint job. (#12581)
11 months ago
Predrag Gruevski 94f018f1ba
Support release-testing packages with dashes in their names. (#12654)
11 months ago
Erick Friis 912ace18e9
fix template py versions (#12650)
11 months ago
Brian McBrayer b74468f399
Fix small typo on Founcational -> Router notebook (#12634)
- **Description:** Fix small typo on Founcational -> Router notebook
11 months ago
Predrag Gruevski 72fa5a463d
Show ruff output inline in GitHub PRs. (#12647)
11 months ago
William FH 17c2e3b87e
Rename Template (#12649)
To chatbot feedback. Update import
11 months ago
Erick Friis 7f6e751a3d
template updates (#12646)
11 months ago
Leonid Kuligin a53cac4508
added template to use Vertex Vector Search for q&a (#12622)
11 months ago
Lance Martin 944cb552bb
Minor updates to READMEs (#12642)
11 months ago
William FH 88f0f1e73b
Conversational Feedback (#12590)
Context in the README.

Show how to score chat responses based on a follow-up from the user and then
log that as feedback in LangSmith
11 months ago
Predrag Gruevski f94e24dfd7
Install and use `ruff format` instead of black for code formatting. (#12585)
Best to review one commit at a time, since two of the commits are 100%
autogenerated changes from running `ruff format`:
- Install and use `ruff format` instead of black for code formatting.
- Output of `ruff format .` in the `langchain` package.
- Use `ruff format` in experimental package.
- Format changes in experimental package by `ruff format`.
- Manual formatting fixes to make `ruff .` pass.
11 months ago
William FH bfd719f9d8
bind_functions convenience method (#12518)
I always take 20-30 seconds to re-discover where the
`convert_to_openai_function` wrapper lives in our codebase. Chat
langchain [has no
clue](https://smith.langchain.com/public/3989d687-18c7-4108-958e-96e88803da86/r)
what to do either. There's the older `create_openai_fn_chain` , but we
haven't been recommending it in LCEL. The example we show in the
[cookbook](https://python.langchain.com/docs/expression_language/how_to/binding#attaching-openai-functions)
is really verbose.


General function calling should be as simple as possible to do, so this
seems a bit more ergonomic to me (feel free to disagree). Another option
would be to coerce directly in the class's init (or when
calling invoke), if provided. I'm not 100% set against that. That
approach may be too easy but not simple. This PR feels like a decent
compromise between simple and easy.

```python
from enum import Enum
from typing import Optional

from pydantic import BaseModel, Field


class Category(str, Enum):
    """The category of the issue."""

    bug = "bug"
    nit = "nit"
    improvement = "improvement"
    other = "other"


class IssueClassification(BaseModel):
    """Classify an issue."""

    category: Category
    other_description: Optional[str] = Field(
        description="If classified as 'other', the suggested other category"
    )
    

from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI().bind_functions([IssueClassification])
llm.invoke("This PR adds a convenience wrapper to the bind argument")

# AIMessage(content='', additional_kwargs={'function_call': {'name': 'IssueClassification', 'arguments': '{\n  "category": "improvement"\n}'}})
```
11 months ago
Nuno Campos 3143324984
Improve Runnable type inference for input_schemas (#12630)
- Prefer lambda type annotations over inferred dict schema
- For sequences that start with RunnableAssign infer seq input type as
"input type of 2nd item in sequence - output type of runnable assign"
11 months ago
Nuno Campos 2f563cee20
Add Runnable.with_listeners() (#12549)
- This binds start/end/error listeners to a runnable, which will be
called with the Run object
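
The listener idea can be sketched in plain Python as a wrapper that calls hooks around a function. This is not the `Runnable` implementation: the free function `with_listeners` below and its `run` dictionary are hypothetical stand-ins for the method and the Run object named in the commit.

```python
from typing import Any, Callable, Dict, List, Optional

Listener = Optional[Callable[[Dict[str, Any]], None]]


def with_listeners(
    func: Callable[[Any], Any],
    on_start: Listener = None,
    on_end: Listener = None,
    on_error: Listener = None,
) -> Callable[[Any], Any]:
    """Wrap func so the listeners receive a run record at each stage."""

    def wrapped(value: Any) -> Any:
        run: Dict[str, Any] = {"input": value, "output": None, "error": None}
        if on_start:
            on_start(run)
        try:
            run["output"] = func(value)
        except Exception as exc:
            run["error"] = exc
            if on_error:
                on_error(run)
            raise
        if on_end:
            on_end(run)
        return run["output"]

    return wrapped


events: List[Any] = []
double = with_listeners(
    lambda x: x * 2,
    on_start=lambda run: events.append("start"),
    on_end=lambda run: events.append(("end", run["output"])),
)
result = double(21)
```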
11 months ago
Bagatur bcc62d63be
bump 327 (#12623)
11 months ago
Erick Friis a1fae1fddd
Readme rewrite (#12615)
Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
11 months ago
Ankur Singh 00766c9f31
Improves the description of the installation command (#12354)
- **Description:**

 Before: 
`
To install modules needed for the common LLM providers, run:
`

After:
`
To install modules needed for the common LLM providers, run the
following command. Please bear in mind that this command is exclusively
compatible with the `bash` shell:
`


> This is required for the user so that the user will know if this
command is compatible with `zsh` or not.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
11 months ago