Commit Graph

1654 Commits (52e4fba897b24361112fceebf87b3d7b81c4a18c)
 

Author SHA1 Message Date
Davis Chase 52e4fba897
Fix self query pinecone translation (#3892)
Enum to string conversion handled differently between python 3.9 and
3.11, currently breaking in 3.11 (see #3788). Thanks @peter-brady for
catching this!
1 year ago
Jef Packer 47a685adcf
count tokens instead of chars in autogpt prompt (#3841)
This looks like a bug. 

Overall by using len instead of token_counter the prompt thinks it has
less context window than it actually does. Because of this it adds fewer
messages. The reduced previous message context makes the agent
repetitive when selecting tasks.
1 year ago
Nikolas Garske c4d3d74148
Fix typos in arxiv.ipynb (#3887)
Several minor typos in the doc for the arxiv document loaders were
fixed.
1 year ago
Zander Chase f7cb2af5f4
Export StructuredTool at `/tools` (#3858) 1 year ago
Ankush Gola e87f81b3ec
add more color to callbacks docs (#3856) 1 year ago
Zander Chase 19912d755e
Vwp/arxiv (#3855)
Co-authored-by: Mike Wang <62768671+skcoirz@users.noreply.github.com>
1 year ago
Zander Chase e17858470c
Vwp/multi line input (#3854)
Co-authored-by: Paolo Rechia <paolorechia@gmail.com>
1 year ago
Harrison Chase c896657d28
bump version to 154 (#3846) 1 year ago
Zander Chase d7e17fc8fe
Deprecate StdInquireTool (#3850)
- Deprecate StdInInquire tool (dup of HumanInputRun)
- Expose missing tools from `langchain.tools`
1 year ago
Zander Chase b1d69d3e7a
Vwp/fix vectorstore typing (#3851)
Co-authored-by: Jay Stakelon <stakes@users.noreply.github.com>
1 year ago
Zander Chase fbbdf161cd
Lambda Tool (#3842)
Co-authored-by: Jason Holtkamp <holtkam2@gmail.com>
1 year ago
Ankush Gola d3ec00b566
Callbacks Refactor [base] (#3256)
Co-authored-by: Nuno Campos <nuno@boringbits.io>
Co-authored-by: Davis Chase <130488702+dev2049@users.noreply.github.com>
Co-authored-by: Zander Chase <130414180+vowelparrot@users.noreply.github.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago
Zander Chase 18ec22fe56
Remove multi-input tool section (#3810)
Moving to new notebook. Will re-intro w/ new agent
1 year ago
mbchang adcad98bee
fix: fix filepath error in agent simulations docs (#3795) 1 year ago
Harrison Chase 20aad0bed1 stripe docs 1 year ago
Harrison Chase 378f0889eb
bump version to 153 (#3774) 1 year ago
Sheldon 399065e858
update zilliz example (#3578)
1. Now the Zilliz example can't connect to Zilliz Cloud, fixed

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago
Harrison Chase bd7e0a534c
Harrison/csv loader (#3771)
Co-authored-by: mrT23 <tal.r@codium.ai>
1 year ago
Harrison Chase c494ca3ad2
Harrison/doc2txt (#3772)
Co-authored-by: rishni ratnam <rishniratnam@gmail.com>
1 year ago
Mike Wang ce4fea983b
[simple] added test case and improve self class return type annotation (#3773)
a simple follow up of https://github.com/hwchase17/langchain/pull/3748
- added test case
- improve annotation when function return type is class itself.
1 year ago
Harrison Chase 0c0f14407c
Harrison/tair (#3770)
Co-authored-by: Seth Huang <848849+seth-hg@users.noreply.github.com>
1 year ago
Aurélien SCHILTZ 502ba6a0be
Fix type annotation for SQLDatabaseToolkit.llm (#3581)
Currently `langchain.agents.agent_toolkits.SQLDatabaseToolkit` has a
field `llm` with type `BaseLLM`. This breaks initialization for some
LLMs. For example, trying to use it with GPT4:
```

from langchain.sql_database import SQLDatabase
from langchain.chat_models import ChatOpenAI
from langchain.agents.agent_toolkits import SQLDatabaseToolkit


db = SQLDatabase.from_uri("some_db_uri")
llm = ChatOpenAI(model_name="gpt-4")
toolkit = SQLDatabaseToolkit(db=db, llm=llm)

# pydantic.error_wrappers.ValidationError: 1 validation error for SQLDatabaseToolkit
# llm
#  Can't instantiate abstract class BaseLLM with abstract methods _agenerate, _generate, _llm_type (type=type_error)
```
Seems like much of the rest of the codebase has switched from BaseLLM to
BaseLanguageModel. This PR makes the change for SQLDatabaseToolkit as
well
1 year ago
uyhcire 0a7a2b99b5
Fix Chroma integration failing when there are less than 4 items in the collection (#3674)
The code was failing to decrement the `n_results` kwarg passed to
`query(...)`
1 year ago
Rafal Wojdyla 57e028549a
Expose kwargs in `LLMChainExtractor.from_llm` (#3748)
Re: https://github.com/hwchase17/langchain/issues/3747
1 year ago
Mike Wang 512c24fc9c
[annotation improvement] Make AgentType->Class Conversion More Scalable (#3749)
In the current solution, AgentType and AGENT_TO_CLASS are placed in two
separate files and both manually maintained. This might cause
inconsistency when we update either of them.

— latest —
based on the discussion with hwchase17, we don’t know how to further use
the newly introduced AgentTypeConfig type, so it doesn’t make sense yet
to add it. Instead, it’s better to move the dictionary to another file
to keep the loading.py file clear. The consistency is a good point.
Instead of asserting the consistency during linting, we added a unittest
for consistency check. I think it works as auto unittest is triggered
every time with clear failure notice. (well, force push is possible, but
we all know what we are doing, so let’s show trust. :>)

~~This PR includes~~
- ~~Introduced AgentTypeConfig as the source of truth of all AgentType
related meta data.~~
- ~~Each AgentTypeConfig is a annotated class type which can be used for
annotation in other places.~~
- ~~Each AgentTypeConfig can be easily extended when we have more meta
data needs.~~
- ~~Strong assertion to ensure AgentType and AGENT_TO_CLASS are always
consistent.~~
- ~~Made AGENT_TO_CLASS automatically generated.~~

~~Test Plan:~~
- ~~since this change is focusing on annotation, lint is the major test
focus.~~
- ~~lint, format and test passed on local.~~
1 year ago
Harrison Chase b7ae9f715d
Langchain with reddit (#3661) (#3768)
I have added a reddit document loader which fetches the text from the
Posts of Subreddits or Reddit users, using the `praw` Python package. I
have also added an example notebook reddit.ipynb in order to guide users
to use this dataloader.
This code was made in format similar to twiiter document loader. I have
run code formating, linting and also checked the code myself for
different scenarios.

This is my first contribution to an open source project and I am really
excited about this. If you want to suggest some improvements in my code,
I will be happy to do it. :)

Co-authored-by: Taaha Bajwa <taaha.s.bajwa@gmail.com>
1 year ago
Kohei Kumazaki fa4c35e9e5
Fix encoding issue in WebBaseLoader (#3602)
The character code mismatches occurred when character information was
not included in the response header (In my case, a Japanese web page).
I solved this issue by changing the encoding setting to
apparent_encoding.
1 year ago
Harrison Chase be7a8e0824
Harrison/redis cache (#3766)
Co-authored-by: Tyler Hutcherson <tyler.hutcherson@redis.com>
1 year ago
Mike Wang b588446bf9
[simple][test] Added test case for schema.py (#3692)
- added unittest for schema.py covering utility functions and token
counting.
- fixed a nit. based on huggingface doc, the tokenizer model is gpt-2.
[link](https://huggingface.co/transformers/v4.8.2/_modules/transformers/models/gpt2/tokenization_gpt2_fast.html)
- make lint && make format, passed on local
- screenshot of new test running result

<img width="1283" alt="Screenshot 2023-04-27 at 9 51 55 PM"
src="https://user-images.githubusercontent.com/62768671/235057441-c0ac3406-9541-453f-ba14-3ebb08656114.png">
1 year ago
Harrison Chase 15b92d361d
Harrison/confluence stuff (#3765)
Co-authored-by: Jelmer Borst <japborst@gmail.com>
1 year ago
SimFG 5998b53596
Use the GPTCache api interface (#3693)
Use the GPTCache api interface to reduce the possibility of
compatibility issues
1 year ago
engkheng f37a932b24
Improve chat prompt template docs (#3719)
Add a few more explanations and examples.
1 year ago
Robert Perrotta 22770f5202
Make StuffDocumentsChain doc separator configurable (#3718)
This PR makes the `"\n\n"` string with which `StuffDocumentsChain` joins
formatted documents a property so it can be configured. The new
`document_separator` property defaults to `"\n\n"` so the change is
backwards compatible.
1 year ago
Akhil Vempali 64ba24292d
fix: 🐛 SQLAlchemy import error (#3716)
During the import of langchain, SQLAlchemy was throeing an errror
`ImportError: cannot import name 'Mapped' from 'sqlalchemy.orm'`. This
is becaue the Mapped name was introduced in v1.4
1 year ago
Jon Saginaw f8d69e4e52
Enhancement: Blockchain Document Loader with better Metadata support (#3710)
This PR includes some minor alignment updates, including:

- metadata object extended to support contractAddress, blockchainType,
and tokenId
- notebook doc better aligned to standard langchain format
- startToken changed from int to str to support multiple hex value types
on the Alchemy API

The updated metadata will look like the below. It's possible for a
single contractAddress to exist across multiple blockchains (e.g.
Ethereum, Polygon, etc.) so it's important to include the
blockchainType.

```
 metadata = {"source": self.contract_address, 
                      "blockchain": self.blockchainType,
                      "tokenId": tokenId}
```
1 year ago
Davis Chase 220a7076ac
Add Mathpix pdf loader (#3727)
Inspo
https://twitter.com/danielgross/status/1651695062307274754?s=46&t=1zHLap5WG4I_kQPPjfW9fA

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago
Rafal Wojdyla 37ed6f2177
Handle length safe embedding only if needed (#3723)
Re: https://github.com/hwchase17/langchain/issues/3722

Copy pasting context from the issue:


1bf1c37c0c/langchain/embeddings/openai.py (L210-L211)

Means that the length safe embedding method is "always" used, initial
implementation https://github.com/hwchase17/langchain/pull/991 has the
`embedding_ctx_length` set to -1 (meaning you had to opt-in for the
length safe method), https://github.com/hwchase17/langchain/pull/2330
changed that to max length of OpenAI embeddings v2, meaning the length
safe method is used at all times.

How about changing that if branch to use length safe method only when
needed, meaning when the text is longer than the max context length?
1 year ago
Harrison Chase 40f6e60e68
Harrison/stripe (#3762)
Co-authored-by: Ismail Pelaseyed <homanp@gmail.com>
1 year ago
Jelmer Borst 8cf2ff0be0
Confluence: Add page status filter for spaces (#3732)
At the moment all content in Confluence is retrieved by default,
including archived content.

Often, this is undesired as the content is not relevant anymore.

**Notes**
Fetching pages by label does not support excluding archived content.
This may lead to unexpected results.
1 year ago
Harrison Chase 7a129ac043
Harrison/pypdf loader (#3764)
Co-authored-by: Felipe Meres <felipe@felipemeres.com>
1 year ago
mbchang 4eefea0fe8
new example: single agent, simulated environment (openai gym) (#3758)
For many applications of LLM agents, the environment is real (internet,
database, REPL, etc). However, we can also define agents to interact in
simulated environments like text-based games. This is an example of how
to create a simple agent-environment interaction loop with
[Gymnasium](https://github.com/Farama-Foundation/Gymnasium) (formerly
[OpenAI Gym](https://github.com/openai/gym)).
1 year ago
0xDTE 6ce34bb4fe
Fixing broken document links (#3756)
simple document url fixes. nothing fancy.
1 year ago
Rafal Wojdyla 160bfae93f
Add `DocstoreFn` - lookup doc via arbitrary function (#3760)
This **partially** addresses
https://github.com/hwchase17/langchain/issues/1524, but it's also useful
for some of our use cases.

This `DocstoreFn` allows to lookup a document given a function that
accepts the `search` string without the need to implement a custom
`Docstore`.

This could be useful when:
* you don't want to implement a `Docstore` just to provide a custom
`search`
 * it's expensive to construct an `InMemoryDocstore`/dict
 * you retrieve documents from remote sources
 * you just want to reuse existing objects
1 year ago
Harrison Chase c55ba43093
Harrison/vespa (#3761)
Co-authored-by: Lester Solbakken <lesters@users.noreply.github.com>
1 year ago
mbchang ee20b3e0d0
bug fix: initialize the arxivAPIWrapper object (#3733) 1 year ago
leo-gan e510732ad2
docs: improved `vectorstore` notebooks (#3724)
- Added links to the vectorstore providers
- Added installation code (it is not clear that we have to go to the
`LangChan Ecosystem` page to get installation instructions.)
1 year ago
BioErrorLog ad4eae7ef0
Fix linting on the Quickstart Guide sample codes (#3701)
When copying and pasting the sample code from the Quickstart Guide, lint
errors ("missing whitespace around operator") occur."
1 year ago
Zander Chase a46f1d830e
Synchronous Browser (#3745)
Split out sync methods in playwright
1 year ago
Zander Chase 6c2b16e465
Add SceneXplain Tool (#3752) 1 year ago
erwanlc 72c5c15f7f
Fix: Updated links for in depth explanation of chain types in the Question Answering notebooks (#3714)
In the notebook question_answering.ipynb
([link](https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/index_examples/question_answering.ipynb)),
and the notebook qa_with_sources.ipynb
([link](https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/index_examples/qa_with_sources.ipynb)),
the first paragraph contains a dead link:

> This notebook walks through how to use LangChain for question
answering over a list of documents. It covers four different types of
chains: stuff, map_reduce, refine, map_rerank. For a more in depth
explanation of what these chain types are, see
[here](32793f94fd/docs/modules/chains/combine_docs.md).

The file combine_docs.md doesn't exist anymore and thus provide 404 -
Page not found.

I updated the links so it redirect to
https://docs.langchain.com/docs/components/chains/index_related_chains
as in the summarize notebook
([link](https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/index_examples/summarize.ipynb))
present in the same folder.
1 year ago