Commit Graph

5697 Commits

Author SHA1 Message Date
Eugene Yurtsev
0e1aedb9f4
Use jinja2 sandboxing by default (#12733)
* This is an opt-in feature, so users should be aware of risks if using
jinja2.
* Regardless we'll add sandboxing by default to jinja2 templates -- this
  sandboxing is a best effort basis.
* Best strategy is still to make sure that jinja2 templates are only
loaded from trusted sources.
2023-11-01 14:54:01 -07:00
Erick Friis
ab5309f6f2
template updates (#12736)
- langchain license
- add timescale vector dep to that template
2023-11-01 13:53:26 -07:00
Lance Martin
6406c53089
Update template index w/ Timescale (#12729) 2023-11-01 12:04:54 -07:00
Erick Friis
14340ee7cd
use http.client instead of urllib3 (#12660)
dep problems with requests

cloudflare debugging not worth it with urllib
2023-11-01 11:15:05 -07:00
Bagatur
eee5181b7a
bump 328, exp 37 (#12722) 2023-11-01 10:27:39 -07:00
Erick Friis
3405dbbc64
dash not underscore (#12716)
template names are auto-populating with the wrong convention (with
underscores)
2023-11-01 09:48:37 -07:00
123-fake-st
8bd3ce59cd
PyPDFLoader use url in metadata source if file is a web path (#12092)
**Description:** Update `langchain.document_loaders.pdf.PyPDFLoader` to
store url in metadata (instead of a temporary file path) if user
provides a web path to a pdf

- **Issue:** Related to #7034; the reporter on that issue submitted a PR
updating `PyMuPDFParser` for this behavior, but it has unresolved merge
issues as of 20 Oct 2023 #7077
- In addition to `PyPDFLoader` and `PyMuPDFParser`, these other classes
in `langchain.document_loaders.pdf` exhibit similar behavior and could
benefit from an update: `PyPDFium2Loader`, `PDFMinerLoader`,
`PDFMinerPDFasHTMLLoader`, `PDFPlumberLoader` (I'm happy to contribute
to some/all of that, including assisting with `PyMuPDFParser`, if my
work is agreeable)
- The root cause is that the underlying pdf parser classes, e.g.
`langchain.document_loaders.parsers.pdf.PyPDFParser`, never receive
information about the url; the parsers receive a
`langchain.document_loaders.blob_loaders.blob`, which contains the pdf
contents and local file path, but not the url
- This update passes the web path directly to the parser since it's
minimally invasive and doesn't require further changes to maintain
existing behavior for local files... bigger picture, I'd consider
extending `blob` so that extra information like this can be
communicated, but that has much bigger implications on the codebase
which I think warrants maintainer input

  - **Dependencies:** None

```python
# old behavior
>>> from langchain.document_loaders import PyPDFLoader
>>> loader = PyPDFLoader('https://arxiv.org/pdf/1706.03762.pdf')
>>> docs = loader.load()
>>> docs[0].metadata
{'source': '/var/folders/w2/zx77z1cs01s1thx5dhshkd58h3jtrv/T/tmpfgrorsi5/tmp.pdf', 'page': 0}

# new behavior
>>> from langchain.document_loaders import PyPDFLoader
>>> loader = PyPDFLoader('https://arxiv.org/pdf/1706.03762.pdf')
>>> docs = loader.load()
>>> docs[0].metadata
{'source': 'https://arxiv.org/pdf/1706.03762.pdf', 'page': 0}
```
2023-11-01 11:27:00 -04:00
Dave Kwon
b1954aab13
feat: Add page metadata on PDFMinerLoader (#12277)
- **Description:** #12273 's suggestion PR
Like other PDFLoader, loading pdf per each page and giving page
metadata.
  - **Issue:** #12273 
  - **Twitter handle:** @blue0_0hope

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-11-01 11:25:37 -04:00
Duda Nogueira
7148f3e1fe
Weaviate - Fix schema existence check (#12711)
This will allow you create the schema beforehand. The check was failing
and preventing importing into existing classes.

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-11-01 08:22:15 -07:00
Sayandip
8dbbcf0b6c
Adding a template for Solo Performance Prompting Agent (#12627)
**Description:** This template creates an agent that transforms a single
LLM into a cognitive synergist by engaging in multi-turn
self-collaboration with multiple personas.
**Tag maintainer:** @hwchase17

---------

Co-authored-by: Sayandip Sarkar <sayandip.sarkar@skypointcloud.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-01 08:10:07 -07:00
Aidos Kanapyanov
ae63c186af
Mask API key for Anyscale LLM (#12406)
Description: Add masking of API Key for Anyscale LLM when printed.
Issue: #12165 
Dependencies: None
Tag maintainer: @eyurtsev

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-11-01 10:22:26 -04:00
Predrag Gruevski
5ae51a8a85
Fix typo highlighted by ruff autoformatter. (#12691)
H/t @MichaReiser for spotting it:
https://github.com/langchain-ai/langchain/pull/12585/files#r1378253045
2023-10-31 22:16:06 -04:00
Predrag Gruevski
724b92231d
Remove black caching config from CI lint workflow. (#12594)
To merge after #12585 is merged.
2023-10-31 21:39:05 -04:00
Predrag Gruevski
0ea837404a
Only publish to test PyPI from the _test_release.yml workflow. (#12668)
PyPI trusted publishing wants to know which workflow is expected to do
the publish. We always want to publish from the same workflow, so we're
making `_test_release.yml` the only workflow that publishes to Test
PyPI.
2023-10-31 21:36:38 -04:00
Predrag Gruevski
321cd44f13
Use separate jobs for building and publishing test releases. (#12671)
This follows the principle of least privilege. Our `poetry build` step
doesn't need, and shouldn't get, access to our GitHub OIDC capability.

This is the same structure as I used in the already-merged PR for
refactoring the regular PyPI release workflow: #12578.
2023-10-31 21:36:26 -04:00
Erick Friis
44c8b159b9
properly increment version in cli (#12685)
Went from 0.0.9 -> 0.0.11 without releasing. Back to 10, then release.
2023-10-31 17:27:43 -07:00
Erick Friis
b825dddf95
fix elastic rag template in playground (#12682)
- a few instructions in the readme (load_documents -> ingest.py)
- added docker run command for local elastic
- adds input type definition to render playground properly
2023-10-31 17:18:35 -07:00
Lance Martin
f0eba1ac63
Add RAG input types (#12684)
Co-authored-by: Erick Friis <erick@langchain.dev>
2023-10-31 17:13:44 -07:00
Erick Friis
392cfbee24
link to templates (#12680) 2023-10-31 16:19:22 -07:00
Leonid Ganeline
ddcec005bc
fix for YahooFinanceNewsTool (#12665)
Added YahooFinanceNewsTool to the __init__.py 
It was missed here.
2023-10-31 14:58:09 -07:00
Predrag Gruevski
09711ad5a1
Both lint and format templates with ruff v0.1.3. (#12676)
- Both lint and format code in `templates`.
- Upgrade to ruff v0.1.3.
2023-10-31 14:52:00 -07:00
Predrag Gruevski
01a3c9b94e
Use an in-project virtualenv in the CLI package. (#12678)
Keeping it in sync with how our other packages are configured.
2023-10-31 14:51:24 -07:00
Predrag Gruevski
f7f35a9102
Use black to lint notebooks and docs for now. (#12679)
Due to #12677 having lots of errors for the time being.
2023-10-31 14:51:05 -07:00
Jacob Lee
bd668fcea1
Adds version CLI command (#12619)
Will be automatically bumped with `poetry version patch`.

@efriis @hwchase17

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-10-31 14:50:04 -07:00
Frank
bf5805bb32
Add quip loader (#12259)
- **Description:** implement [quip](https://quip.com) loader
  - **Issue:** https://github.com/langchain-ai/langchain/issues/10352
  - **Dependencies:** No
  -  pass make format, make lint, make test

---------

Co-authored-by: Hao Fan <h_fan@apple.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-31 14:11:24 -07:00
Roman Vasilyev
c9a6940d58
PGVector fix (#12592)
latest release broken, this fixes it

---------

Co-authored-by: Roman Vasilyev <rvasilyev@mozilla.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-10-31 17:01:15 -04:00
Lance Martin
9e17d1a225
Update Vertex template (#12644)
Co-authored-by: Erick Friis <erick@langchain.dev>
2023-10-31 14:00:22 -07:00
Predrag Gruevski
aa3f4a9bc8
Remove the CLI package's pydantic compatibility tests. (#12675)
They aren't necessary, since the CLI package doesn't have a direct
dependency on pydantic.
2023-10-31 16:57:38 -04:00
Predrag Gruevski
e8b99364b3
Use ruff for both linting and formatting in langchain-cli. (#12672)
Prior to this PR, `ruff` was used only for linting and not for
formatting, despite the names of the commands. This PR makes it be used
for both linting code and autoformatting it.
2023-10-31 13:52:25 -07:00
Harrison Chase
9a10b2b047
fix plate chain (#12673) 2023-10-31 13:45:09 -07:00
Margaret Qian
acfc485808
Update MosaicML Embedding Input Key (#12657)
This input key was missed in the last update PR:
https://github.com/langchain-ai/langchain/pull/7391

The input/output formats are intended to be like this:

```
{"inputs": [<prompt>]} 

{"outputs": [<output_text>]}
```
2023-10-31 14:43:30 -04:00
Erika Cardenas
d26ac5f999
Update README for Hybrid Search Weaviate (#12661)
- **Description:** Updated the README for Hybrid Search Weaviate
2023-10-31 11:02:34 -07:00
Predrag Gruevski
c871cc5055
Remove print() statements which seemed leftover from debugging. (#12648)
Added in #12159 presumably during debugging. Right now they cause a bit of visual noise.
2023-10-31 13:45:48 -04:00
Erick Friis
2a7e0a27cb
update lc version (#12655)
also updated py version in `csv-agent` and `rag-codellama-fireworks`
because they have stricter python requirements
2023-10-31 10:19:15 -07:00
Predrag Gruevski
360cff81a3
Overwrite existing distributions when uploading to test PyPI. (#12658) 2023-10-31 10:02:50 -07:00
Lance Martin
da94c750c5
Add RAG template for Timescale Vector (#12651)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Matvey Arye <mat@timescale.com>
2023-10-31 09:56:29 -07:00
Noam Gat
14e8c74736
LM Format Enforcer Integration + Sample Notebook (#12625)
## Description

This PR adds support for
[lm-format-enforcer](https://github.com/noamgat/lm-format-enforcer) to
LangChain.

![image](https://raw.githubusercontent.com/noamgat/lm-format-enforcer/main/docs/Intro.webp)

The library is similar to jsonformer / RELLM which are supported in
Langchain, but has several advantages such as
- Batching and Beam search support
- More complete JSON Schema support
- LLM has control over whitespace, improving quality
- Better runtime performance due to only calling the LLM's generate()
function once per generate() call.

The integration is loosely based on the jsonformer integration in terms
of project structure.

## Dependencies

No compile-time dependency was added, but if `lm-format-enforcer` is not
installed, a runtime error will occur if it is trying to be used.

## Tests

Due to the integration modifying the internal parameters of the
underlying huggingface transformer LLM, it is not possible to test
without building a real LM, which requires internet access. So, similar
to the jsonformer and RELLM integrations, the testing is via the
notebook.

## Twitter Handle

[@noamgat](https://twitter.com/noamgat)


Looking forward to hearing feedback!

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-31 09:49:01 -07:00
Stefano Lottini
a4e4b5a86f
Relax python version and remove need for explicit setup step (#12637)
This PR addresses what seems like a unnecessary Python version
restriction in the pyroject.toml specs within both Cassandra (/Astra DB)
templates. With "^3.11" I got some version incompatibilities with the
latest "langchain add [...]" commands, so these are now relaxed in line
with the other templates I could inspect.

Incidentally, in the "entomology" template, the need for an explicit
"setup" step for the user to carry on has been removed, replaced by a
check-and-execute-if-necessary instruction on app startup.

Thank you for your attention!
2023-10-31 09:42:27 -07:00
Predrag Gruevski
5308b836c7
Upgrade to actions/checkout@v4 in the docs lint job. (#12581) 2023-10-31 12:41:18 -04:00
Predrag Gruevski
94f018f1ba
Support release-testing packages with dashes in their names. (#12654) 2023-10-31 12:40:34 -04:00
Erick Friis
912ace18e9
fix template py verisons (#12650) 2023-10-31 09:20:29 -07:00
Brian McBrayer
b74468f399
Fix small typo on Founcational -> Router notebook (#12634)
- **Description:** Fix small typo on Founcational -> Router notebook
2023-10-31 09:16:29 -07:00
Predrag Gruevski
72fa5a463d
Show ruff output inline in GitHub PRs. (#12647) 2023-10-31 12:16:01 -04:00
William FH
17c2e3b87e
Rename Template (#12649)
To chatbot feedback. Update import
2023-10-31 09:15:30 -07:00
Erick Friis
7f6e751a3d
template updates (#12646) 2023-10-31 09:13:58 -07:00
Leonid Kuligin
a53cac4508
added template to use Vertex Vector Search for q&a (#12622)
added template to use Vertex Vector Search for q&a
2023-10-31 08:49:24 -07:00
Lance Martin
944cb552bb
Minor updates to READMEs (#12642) 2023-10-31 08:34:46 -07:00
William FH
88f0f1e73b
Conversational Feedback (#12590)
Context in the README.

Show how score chat responses based on a followup from the user and then
log that as feedback in LangSmith
2023-10-31 08:34:17 -07:00
Predrag Gruevski
f94e24dfd7
Install and use ruff format instead of black for code formatting. (#12585)
Best to review one commit at a time, since two of the commits are 100%
autogenerated changes from running `ruff format`:
- Install and use `ruff format` instead of black for code formatting.
- Output of `ruff format .` in the `langchain` package.
- Use `ruff format` in experimental package.
- Format changes in experimental package by `ruff format`.
- Manual formatting fixes to make `ruff .` pass.
2023-10-31 10:53:12 -04:00
William FH
bfd719f9d8
bind_functions convenience method (#12518)
I always take 20-30 seconds to re-discover where the
`convert_to_openai_function` wrapper lives in our codebase. Chat
langchain [has no
clue](https://smith.langchain.com/public/3989d687-18c7-4108-958e-96e88803da86/r)
what to do either. There's the older `create_openai_fn_chain` , but we
haven't been recommending it in LCEL. The example we show in the
[cookbook](https://python.langchain.com/docs/expression_language/how_to/binding#attaching-openai-functions)
is really verbose.


General function calling should be as simple as possible to do, so this
seems a bit more ergonomic to me (feel free to disagree). Another option
would be to directly coerce directly in the class's init (or when
calling invoke), if provided. I'm not 100% set against that. That
approach may be too easy but not simple. This PR feels like a decent
compromise between simple and easy.

```
from enum import Enum
from typing import Optional

from pydantic import BaseModel, Field


class Category(str, Enum):
    """The category of the issue."""

    bug = "bug"
    nit = "nit"
    improvement = "improvement"
    other = "other"


class IssueClassification(BaseModel):
    """Classify an issue."""

    category: Category
    other_description: Optional[str] = Field(
        description="If classified as 'other', the suggested other category"
    )
    

from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI().bind_functions([IssueClassification])
llm.invoke("This PR adds a convenience wrapper to the bind argument")

# AIMessage(content='', additional_kwargs={'function_call': {'name': 'IssueClassification', 'arguments': '{\n  "category": "improvement"\n}'}})
```
2023-10-31 07:15:37 -07:00