Commit Graph

1921 Commits (textloader_autodetect_encodings)
 

Author SHA1 Message Date
blob42 a880679052 verbose catch of open() errors on TextLoader (#4481) 1 year ago
blob42 3805aef263 timeout on file encoding detection (#4481) 1 year ago
blob42 5837d18cbe move encoding detection to own file (#4481)
- encoding detection returns meaningful encodings schema
1 year ago
blob42 672fec896f add docstring and move encoding detection to own func (#4481) 1 year ago
blob42 21ec50009b move tests to unit tests (#4481) 1 year ago
blob42 b5d3e99f9b TextLoader autodetect encodings + better exception handling (#4481) 1 year ago
Harrison Chase 243886be93
Harrison/virtual time (#4658)
Co-authored-by: ifsheldon <39153080+ifsheldon@users.noreply.github.com>
Co-authored-by: maple.liang <maple.liang@gempoll.com>
1 year ago
Harrison Chase f2f2aced6d
allow partials in from_template (#4638) 1 year ago
Harrison Chase fbfa49f2c1
agent serialization (#4642) 1 year ago
Harrison Chase ef49c659f6
add embedding router (#4644) 1 year ago
Harrison Chase 5020094e3b
Harrison/azure content filter (#4645)
Co-authored-by: Rob Kopel <R0bk@users.noreply.github.com>
1 year ago
Harrison Chase f5e2f70115
Harrison/json new line (#4646)
Co-authored-by: David Chen <davidchen@gliacloud.com>
1 year ago
Harrison Chase 87d8d221fb
Harrison/headers for openai (#4648)
Co-authored-by: aakash.shah <aakash.shah@quintiles.com>
1 year ago
Harrison Chase c09bb00959
Harrison/summary memory history (#4649)
Co-authored-by: engkheng <60956360+outday29@users.noreply.github.com>
1 year ago
Harrison Chase 44ae673388
Harrison/multithreading directory loader (#4650)
Co-authored-by: PawelFaron <42373772+PawelFaron@users.noreply.github.com>
Co-authored-by: Pawel Faron <ext-pawel.faron@vaisala.com>
1 year ago
Harrison Chase b0c733e327
list of messages (#4651) 1 year ago
Harrison Chase 873b0c7eb6
Harrison/structured chat mem (#4652)
Co-authored-by: d 3 n 7 <29033313+d3n7@users.noreply.github.com>
1 year ago
Harrison Chase 9ba3a798c4
Harrison/from keys redis (#4653)
Co-authored-by: Christoph Kahl <christoph@zauberware.com>
1 year ago
Harrison Chase e781ff9256
Harrison/chatopenaibase path (#4656)
Co-authored-by: Dave <dave@gray101.com>
1 year ago
Harrison Chase 279605b4d3
Harrison/metaphor search (#4657)
Co-authored-by: Jeffrey Wang <jeffreyzhiyuanwang@gmail.com>
1 year ago
Harrison Chase 9aa9fe7021
Harrison/spark connect example (#4659)
Co-authored-by: Mike Wang <62768671+skcoirz@users.noreply.github.com>
1 year ago
Prerit Das 2747ccbcf1
Allow custom base Zapier prompt (#4213)
Currently, all Zapier tools are built using the pre-written base Zapier
prompt. These small changes (that retain default behavior) will allow a
user to create a Zapier tool using the ZapierNLARunTool while providing
their own base prompt.

Their prompt must contain input fields for zapier_description and
params, checked and enforced in the tool's root validator.

An example of when this may be useful: user has several, say 10, Zapier
tools enabled. Currently, the long generic default Zapier base prompt is
attached to every single tool, using an extreme number of tokens for no
real added benefit (repeated). User prompts LLM on how to use Zapier
tools once, then overrides the base prompt.

Or: user has a few specific Zapier tools and wants to maximize their
success rate. So, user writes prompts/descriptions for those tools
specific to their use case, and provides those to the ZapierNLARunTool.

A consideration - this is the simplest way to implement this I could
think of... though ideally custom prompting would be possible at the
Toolkit level as well. For now, this should be sufficient in solving the
concerns outlined above.
1 year ago
Paresh Mathur e2bc836571
Fix #4087 by setting the correct csv dialect (#4103)
The error in #4087 was happening because of the use of csv.Dialect.*
which is just an empty base class. we need to make a choice on what is
our base dialect. I usually use excel so I put it as excel, if
maintainers have other preferences do let me know.

Open Questions:
1. What should be the default dialect?
2. Should we rework all tests to mock the open function rather than the
csv.DictReader?
3. Should we make a separate input for `dialect` like we have for
`encoding`?

---------

Co-authored-by: = <=>
1 year ago
Leonid Ganeline 3ce78ef6c4
docs: document_loaders classification (#4069)
**Problem statement:** the
[document_loaders](https://python.langchain.com/en/latest/modules/indexes/document_loaders.html#)
section is too long and hard to comprehend.
**Proposal:** group document_loaders by 3 classes: (see `Files changed`
tab)

UPDATE: I've completely reworked the document_loader classification.
Now this PR changes only one file! 

FYI @eyurtsev @hwchase17
1 year ago
Zander Chase 928cdd57a4
[Breaking] Refactor Base Tracer(#4549)
### Refactor the BaseTracer
- Remove the 'session' abstraction from the BaseTracer
- Rename 'RunV2' object(s) to be called 'Run' objects (Rename previous
Run objects to be RunV1 objects)
- Ditto for sessions: TracerSession*V2 -> TracerSession*
- Remove now deprecated conversion from v1 run objects to v2 run objects
in LangChainTracerV2
- Add conversion from v2 run objects to v1 run objects in V1 tracer
1 year ago
Harrison Chase 1e322ffc1c change heading 1 year ago
Harrison Chase 86c1f090fd
bump version to 168 (#4632) 1 year ago
Davis Chase 9ab7101182
WIP: FLARE-inspired chain (#4612)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago
Harrison Chase daa3e6dedb
Harrison/prompt constructor methods (#4616) 1 year ago
Harrison Chase 6265cbfb11
Harrison/standard llm interface (#4615) 1 year ago
Harrison Chase 485ecc3580
option for csv agent to not include df in prompt (#4610) 1 year ago
Harrison Chase 7d425cbf38
improve sql prompt (#4611)
Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>
Co-authored-by: Taqi Jaffri <tjaffri@gmail.com>
1 year ago
Hans van Dam 01531cb16d
remove quotes from sql database prompts (caused syntax error) (#4101)
fixes a syntax error mentioned in
#2027 and #3305
another PR to remedy is in #3385, but I believe that is not tacking the
core problem.
Also #2027 mentions a solution that works:
add to the prompt:
'The SQL query should be outputted plainly, do not surround it in quotes
or anything else.'

To me it seems strange to first ask for:

SQLQuery: "SQL Query to run"

and then to tell the LLM not to put the quotes around it. Other
templates (than the sql one) do not use quotes in their steps.
This PR changes that to:

SQLQuery: SQL Query to run
1 year ago
Zander Chase 0c6ed657ef
Convert Chain to a Chain Factory (#4605)
## Change Chain argument in client to accept a chain factory

The `run_over_dataset` functionality seeks to treat each iteration of an
example as an independent trial.
Chains have memory, so it's easier to permit this type of behavior if we
accept a factory method rather than the chain object directly.

There's still corner cases / UX pains people will likely run into, like:
- Caching may cause issues
- if memory is persisted to a shared object (e.g., same redis queue) ,
this could impact what is retrieved
- If we're running the async methods with concurrency using local
models, if someone naively instantiates the chain and loads each time,
it could lead to tons of disk I/O or OOM
1 year ago
Tim Asp ed0d557ede
docs: fix pdf docs hierarchy and formatting (#4593)
# Fix pdf loader docs page


![image](https://github.com/hwchase17/langchain/assets/707699/4a11f379-00ed-4f7a-9870-71f74e0cadc6)

Using h1's messes with hierarchy, this fixes that, and moves the
PyPDFium2 loader out of the middle of PDFMiner docs
1 year ago
Davis Chase 36f9e9a0ba
Skip flaky unit test (#4591) 1 year ago
Eugene Yurtsev 08ed927c32
Turn on extended tests (#4588)
# Turn on strict extended tests

This PR turns on strict testing for extended tests.
1 year ago
Zander Chase d96f6a106b
Add Steamship Image Generation Tool (#4580)
Co-authored-by: Enias Cailliau <enias@steamship.com>
1 year ago
Davis Chase 739c297c94
Release 167 (#4589) 1 year ago
Davis Chase a4a9d1f403
Improve vespa interface (#4546)
![Screenshot 2023-05-11 at 7 50 31
PM](https://github.com/hwchase17/langchain/assets/130488702/bc8ab4bb-8006-44fc-ba07-df54e84ee2c1)
1 year ago
vinoyang 72f18fd08b
Provide get current date function dialect for other DBs (#4576)
# Provide get current date function dialect for other DBs

<!--
Thank you for contributing to LangChain! Your PR will appear in our next
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)

## Before submitting

<!-- If you're adding a new integration, include an integration test and
an example notebook showing its use! -->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

@eyurtsev

<!-- For a quicker response, figure out the right person to tag with @

        @hwchase17 - project lead

        Tracing / Callbacks
        - @agola11

        Async
        - @agola11

        DataLoaders
        - @eyurtsev

        Models
        - @hwchase17
        - @agola11

        Agents / Tools / Toolkits
        - @vowelparrot
        
        VectorStores / Retrievers / Memory
        - @dev2049
        
 -->
1 year ago
Neil Ruaro 3a2855945b
added documentation on retrieving a PG vectorstore (#4578)
This PR adds in documentation on querying an existing vectorstore in PG 

Fixes 3191 (issue)
1 year ago
Andrea Pinto 1e5d25b93c
Improve error messages formatting in doc loaders (#4586)
# Cosmetic in errors formatting

Added appropriate spacing to the `ImportError` message in a bunch of
document loaders to enhance trace readability (including Google Drive,
Youtube, Confluence and others). This change ensures that the error
messages are not displayed as a single line block, and that the `pip
install xyz` commands can be copied to clipboard from terminal easily.

## Who can review?

@eyurtsev
1 year ago
kYLe 570d057db4
Expose AnyScale LLM in langchain.llms (#4585)
# Expose AnyScale LLM in  langchain.llms

Fixes # update init.py so we can from langchain.llms import Anyscale
1 year ago
Eugene Yurtsev a5371a0fa2
Add pytest --only-extended and --only-core options (#4494)
# Adds testing options to pytest

This PR adds the following options: 

* `--only-core` will skip all extended tests, running all core tests.
* `--only-extended` will skip all core tests. Forcing alll extended
tests to be run.

Running `py.test` without specifying either option will remain
unaffected. Run
all tests that can be run within the unit_tests direction. Extended
tests will
run if required packages are installed.

## Before submitting

## Who can review?
1 year ago
Harrison Chase 5ad151ed44
Add constitutional principles from paper (#4554)
Add constitutional principles from https://arxiv.org/pdf/2212.08073.pdf

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Sai Vinay G cf4c1394a2
feat: Added class to support huggingface text generation inference server (#4447)
[Text Generation
Inference](https://github.com/huggingface/text-generation-inference) is
a Rust, Python and gRPC server for generating text using LLMs.

This pull request add support for self hosted Text Generation Inference
servers.

feature: #4280

---------

Co-authored-by: Your Name <you@example.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Zander Chase 258c319855
Dereference Messages (#4557)
Update how we parse the messages now that the server splits prompts /
messages up
1 year ago
Leonid Ganeline e17d0319d5
Add `arxiv` retriever (#4538) 1 year ago
vinoyang 25cd6e060a
Enhance the prompt to make the LLM generate right date for real today (#4505)
# Enhance the prompt to make the LLM generate right date for real today

Fixes # (issue)

Currently, if the user's question contains `today`, the clickhouse
always points to an old date. This may be related to the fact that the
GPT training data is relatively old.
1 year ago