1945 Commits (49ce5ce1ca70657e34b63c2f239222e9557be115)
 

Author SHA1 Message Date
Zander Chase b1d69d3e7a
Vwp/fix vectorstore typing (#3851)
Co-authored-by: Jay Stakelon <stakes@users.noreply.github.com>
1 year ago
Zander Chase fbbdf161cd
Lambda Tool (#3842)
Co-authored-by: Jason Holtkamp <holtkam2@gmail.com>
1 year ago
Ankush Gola d3ec00b566
Callbacks Refactor [base] (#3256)
Co-authored-by: Nuno Campos <nuno@boringbits.io>
Co-authored-by: Davis Chase <130488702+dev2049@users.noreply.github.com>
Co-authored-by: Zander Chase <130414180+vowelparrot@users.noreply.github.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago
Zander Chase 18ec22fe56
Remove multi-input tool section (#3810)
Moving to new notebook. Will re-intro w/ new agent
1 year ago
mbchang adcad98bee
fix: fix filepath error in agent simulations docs (#3795) 1 year ago
Harrison Chase 20aad0bed1 stripe docs 1 year ago
Harrison Chase 378f0889eb
bump version to 153 (#3774) 1 year ago
Sheldon 399065e858
update zilliz example (#3578)
1. The Zilliz example could not connect to Zilliz Cloud; this is now fixed.

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago
Harrison Chase bd7e0a534c
Harrison/csv loader (#3771)
Co-authored-by: mrT23 <tal.r@codium.ai>
1 year ago
Harrison Chase c494ca3ad2
Harrison/doc2txt (#3772)
Co-authored-by: rishni ratnam <rishniratnam@gmail.com>
1 year ago
Mike Wang ce4fea983b
[simple] added test case and improve self class return type annotation (#3773)
a simple follow up of https://github.com/hwchase17/langchain/pull/3748
- added test case
- improved the annotation when a function's return type is the class itself.
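
A minimal sketch of the annotation pattern this message describes, assuming a
bound `TypeVar` is used so that a classmethod returns the type of the class it
was called on. `BaseThing`, `SpecialThing`, and `from_name` are hypothetical
names for illustration, not code from the PR.

```python
from typing import Type, TypeVar

# Hypothetical classes for illustration only; not taken from the LangChain codebase.
T = TypeVar("T", bound="BaseThing")


class BaseThing:
    def __init__(self, name: str) -> None:
        self.name = name

    @classmethod
    def from_name(cls: Type[T], name: str) -> T:
        # Annotating the return as T (rather than "BaseThing") lets a type
        # checker infer the subclass when this is called on a subclass.
        return cls(name)


class SpecialThing(BaseThing):
    pass


thing = SpecialThing.from_name("demo")
```

With this annotation a type checker should treat `thing` as `SpecialThing`
rather than `BaseThing`.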
1 year ago
Harrison Chase 0c0f14407c
Harrison/tair (#3770)
Co-authored-by: Seth Huang <848849+seth-hg@users.noreply.github.com>
1 year ago
Aurélien SCHILTZ 502ba6a0be
Fix type annotation for SQLDatabaseToolkit.llm (#3581)
Currently `langchain.agents.agent_toolkits.SQLDatabaseToolkit` has a
field `llm` with type `BaseLLM`. This breaks initialization for some
LLMs. For example, trying to use it with GPT4:
```

from langchain.sql_database import SQLDatabase
from langchain.chat_models import ChatOpenAI
from langchain.agents.agent_toolkits import SQLDatabaseToolkit


db = SQLDatabase.from_uri("some_db_uri")
llm = ChatOpenAI(model_name="gpt-4")
toolkit = SQLDatabaseToolkit(db=db, llm=llm)

# pydantic.error_wrappers.ValidationError: 1 validation error for SQLDatabaseToolkit
# llm
#  Can't instantiate abstract class BaseLLM with abstract methods _agenerate, _generate, _llm_type (type=type_error)
```
Seems like much of the rest of the codebase has switched from BaseLLM to
BaseLanguageModel. This PR makes the change for SQLDatabaseToolkit as
well
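
A simplified sketch of why the narrower annotation rejects chat models. The
class names mirror the ones mentioned above, but the bodies are empty
stand-ins, and `accepts` is a hypothetical helper that replaces pydantic's
field validation with a plain `isinstance` check.

```python
# Stand-in class hierarchy (assumption: chat models and LLMs share only the
# BaseLanguageModel ancestor). These are not the real LangChain classes.
class BaseLanguageModel:
    pass


class BaseLLM(BaseLanguageModel):
    pass


class BaseChatModel(BaseLanguageModel):
    pass


class FakeChatModel(BaseChatModel):
    pass


def accepts(value: object, annotated_type: type) -> bool:
    # Hypothetical helper: a field typed as `annotated_type` accepts `value`
    # only if it is an instance of that type.
    return isinstance(value, annotated_type)


chat = FakeChatModel()
narrow_ok = accepts(chat, BaseLLM)            # field typed as BaseLLM
broad_ok = accepts(chat, BaseLanguageModel)   # field typed as BaseLanguageModel
```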
1 year ago
uyhcire 0a7a2b99b5
Fix Chroma integration failing when there are less than 4 items in the collection (#3674)
The code was failing to decrement the `n_results` kwarg passed to
`query(...)`
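
A rough sketch of the idea, assuming the fix amounts to never requesting more
results than the collection holds. `clamp_n_results` is a hypothetical helper,
not the function changed in this PR.

```python
def clamp_n_results(requested: int, collection_size: int) -> int:
    # Hypothetical helper: never ask for more results than there are items,
    # and never ask for a negative number of results.
    return max(0, min(requested, collection_size))


# A collection with 2 items and a default request for 4 results.
n_results = clamp_n_results(4, 2)
```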
1 year ago
Rafal Wojdyla 57e028549a
Expose kwargs in `LLMChainExtractor.from_llm` (#3748)
Re: https://github.com/hwchase17/langchain/issues/3747
1 year ago
Mike Wang 512c24fc9c
[annotation improvement] Make AgentType->Class Conversion More Scalable (#3749)
In the current solution, AgentType and AGENT_TO_CLASS are placed in two
separate files and both manually maintained. This might cause
inconsistency when we update either of them.

— latest —
based on the discussion with hwchase17, we don’t know how to further use
the newly introduced AgentTypeConfig type, so it doesn’t make sense yet
to add it. Instead, it’s better to move the dictionary to another file
to keep the loading.py file clear. The consistency is a good point.
Instead of asserting the consistency during linting, we added a unittest
for consistency check. I think it works as auto unittest is triggered
every time with clear failure notice. (well, force push is possible, but
we all know what we are doing, so let’s show trust. :>)
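
A minimal sketch of such a consistency check, assuming `AgentType` is an enum
and `AGENT_TO_CLASS` is a dict keyed by its members. The enum members and
agent classes below are hypothetical placeholders, not the real definitions.

```python
from enum import Enum


class AgentType(str, Enum):
    # Placeholder members for illustration.
    ZERO_SHOT = "zero-shot"
    CONVERSATIONAL = "conversational"


class ZeroShotAgent:
    pass


class ConversationalAgent:
    pass


AGENT_TO_CLASS = {
    AgentType.ZERO_SHOT: ZeroShotAgent,
    AgentType.CONVERSATIONAL: ConversationalAgent,
}


def missing_agent_types() -> set:
    # Any AgentType member without an entry in AGENT_TO_CLASS is reported here.
    return set(AgentType) - set(AGENT_TO_CLASS)
```

A unit test can then assert that `missing_agent_types()` is empty, so adding an
enum member without a matching class fails the test run.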

~~This PR includes~~
- ~~Introduced AgentTypeConfig as the source of truth of all AgentType
related meta data.~~
- ~~Each AgentTypeConfig is an annotated class type which can be used for
annotation in other places.~~
- ~~Each AgentTypeConfig can be easily extended when we have more meta
data needs.~~
- ~~Strong assertion to ensure AgentType and AGENT_TO_CLASS are always
consistent.~~
- ~~Made AGENT_TO_CLASS automatically generated.~~

~~Test Plan:~~
- ~~since this change is focusing on annotation, lint is the major test
focus.~~
- ~~lint, format and test passed on local.~~
1 year ago
Harrison Chase b7ae9f715d
Langchain with reddit (#3661) (#3768)
I have added a reddit document loader which fetches the text from the
Posts of Subreddits or Reddit users, using the `praw` Python package. I
have also added an example notebook reddit.ipynb in order to guide users
to use this dataloader.
This code follows a format similar to the Twitter document loader. I have
run code formatting and linting, and also checked the code myself for
different scenarios.

This is my first contribution to an open source project and I am really
excited about this. If you want to suggest some improvements in my code,
I will be happy to do it. :)

Co-authored-by: Taaha Bajwa <taaha.s.bajwa@gmail.com>
1 year ago
Kohei Kumazaki fa4c35e9e5
Fix encoding issue in WebBaseLoader (#3602)
Character encoding mismatches occurred when charset information was
missing from the response header (in my case, on a Japanese web page).
I solved this issue by changing the encoding setting to
`apparent_encoding`.
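
A minimal sketch of the approach, using a stand-in object with the same
`headers`, `encoding`, and `apparent_encoding` attributes that a
`requests.Response` exposes. `ensure_encoding` is a hypothetical helper, and
checking the header for `charset=` first is an assumption; the PR may set the
encoding unconditionally.

```python
from types import SimpleNamespace


def ensure_encoding(response):
    # Only override the encoding when the server did not declare a charset.
    content_type = response.headers.get("Content-Type", "")
    if "charset=" not in content_type.lower():
        response.encoding = response.apparent_encoding
    return response


# Stand-in for a response whose header lacks charset information.
page = SimpleNamespace(
    headers={"Content-Type": "text/html"},
    encoding="ISO-8859-1",
    apparent_encoding="shift_jis",
)
ensure_encoding(page)
```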
1 year ago
Harrison Chase be7a8e0824
Harrison/redis cache (#3766)
Co-authored-by: Tyler Hutcherson <tyler.hutcherson@redis.com>
1 year ago
Mike Wang b588446bf9
[simple][test] Added test case for schema.py (#3692)
- added unittest for schema.py covering utility functions and token
counting.
- fixed a nit. based on huggingface doc, the tokenizer model is gpt-2.
[link](https://huggingface.co/transformers/v4.8.2/_modules/transformers/models/gpt2/tokenization_gpt2_fast.html)
- make lint && make format, passed on local
- screenshot of new test running result

<img width="1283" alt="Screenshot 2023-04-27 at 9 51 55 PM"
src="https://user-images.githubusercontent.com/62768671/235057441-c0ac3406-9541-453f-ba14-3ebb08656114.png">
1 year ago
Harrison Chase 15b92d361d
Harrison/confluence stuff (#3765)
Co-authored-by: Jelmer Borst <japborst@gmail.com>
1 year ago
SimFG 5998b53596
Use the GPTCache api interface (#3693)
Use the GPTCache api interface to reduce the possibility of
compatibility issues
1 year ago
engkheng f37a932b24
Improve chat prompt template docs (#3719)
Add a few more explanations and examples.
1 year ago
Robert Perrotta 22770f5202
Make StuffDocumentsChain doc separator configurable (#3718)
This PR makes the `"\n\n"` string with which `StuffDocumentsChain` joins
formatted documents a property so it can be configured. The new
`document_separator` property defaults to `"\n\n"` so the change is
backwards compatible.
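
A small sketch of the behaviour described, using a hypothetical `StuffJoiner`
class rather than the real `StuffDocumentsChain`. Only the
`document_separator` name and its `"\n\n"` default come from the message above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StuffJoiner:
    # Defaults to the previously hard-coded separator, so existing callers
    # see no change in output.
    document_separator: str = "\n\n"

    def join(self, formatted_docs: List[str]) -> str:
        return self.document_separator.join(formatted_docs)


default_text = StuffJoiner().join(["doc one", "doc two"])
custom_text = StuffJoiner(document_separator="\n---\n").join(["doc one", "doc two"])
```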
1 year ago
Akhil Vempali 64ba24292d
fix: 🐛 SQLAlchemy import error (#3716)
During the import of langchain, SQLAlchemy was throwing an error
`ImportError: cannot import name 'Mapped' from 'sqlalchemy.orm'`. This
is because the `Mapped` name was introduced in v1.4
1 year ago
Jon Saginaw f8d69e4e52
Enhancement: Blockchain Document Loader with better Metadata support (#3710)
This PR includes some minor alignment updates, including:

- metadata object extended to support contractAddress, blockchainType,
and tokenId
- notebook doc better aligned to standard langchain format
- startToken changed from int to str to support multiple hex value types
on the Alchemy API

The updated metadata will look like the below. It's possible for a
single contractAddress to exist across multiple blockchains (e.g.
Ethereum, Polygon, etc.) so it's important to include the
blockchainType.

```
metadata = {"source": self.contract_address,
            "blockchain": self.blockchainType,
            "tokenId": tokenId}
```
1 year ago
Davis Chase 220a7076ac
Add Mathpix pdf loader (#3727)
Inspo
https://twitter.com/danielgross/status/1651695062307274754?s=46&t=1zHLap5WG4I_kQPPjfW9fA

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago
Rafal Wojdyla 37ed6f2177
Handle length safe embedding only if needed (#3723)
Re: https://github.com/hwchase17/langchain/issues/3722

Copy pasting context from the issue:


1bf1c37c0c/langchain/embeddings/openai.py (L210-L211)

Means that the length safe embedding method is "always" used, initial
implementation https://github.com/hwchase17/langchain/pull/991 has the
`embedding_ctx_length` set to -1 (meaning you had to opt-in for the
length safe method), https://github.com/hwchase17/langchain/pull/2330
changed that to max length of OpenAI embeddings v2, meaning the length
safe method is used at all times.

How about changing that if branch to use length safe method only when
needed, meaning when the text is longer than the max context length?
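
A minimal sketch of the proposed branch. `embed`, `embed_direct`,
`embed_length_safe`, and the whitespace-based token count are hypothetical
stand-ins; the real code would count tokens with the model's tokenizer.

```python
from typing import Callable, List


def embed(
    text: str,
    ctx_length: int,
    embed_direct: Callable[[str], List[float]],
    embed_length_safe: Callable[[str], List[float]],
    count_tokens: Callable[[str], int] = lambda t: len(t.split()),
) -> List[float]:
    # Use the chunking (length-safe) path only when the text does not fit
    # into the model's context window; otherwise embed it in one call.
    if count_tokens(text) > ctx_length:
        return embed_length_safe(text)
    return embed_direct(text)


# Dummy embedders that return a marker value so the chosen path is visible.
short_result = embed("a b c", 5, lambda t: [0.0], lambda t: [1.0])
long_result = embed("a b c d e f", 5, lambda t: [0.0], lambda t: [1.0])
```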
1 year ago
Harrison Chase 40f6e60e68
Harrison/stripe (#3762)
Co-authored-by: Ismail Pelaseyed <homanp@gmail.com>
1 year ago
Jelmer Borst 8cf2ff0be0
Confluence: Add page status filter for spaces (#3732)
At the moment all content in Confluence is retrieved by default,
including archived content.

Often, this is undesired as the content is not relevant anymore.

**Notes**
Fetching pages by label does not support excluding archived content.
This may lead to unexpected results.
1 year ago
Harrison Chase 7a129ac043
Harrison/pypdf loader (#3764)
Co-authored-by: Felipe Meres <felipe@felipemeres.com>
1 year ago
mbchang 4eefea0fe8
new example: single agent, simulated environment (openai gym) (#3758)
For many applications of LLM agents, the environment is real (internet,
database, REPL, etc). However, we can also define agents to interact in
simulated environments like text-based games. This is an example of how
to create a simple agent-environment interaction loop with
[Gymnasium](https://github.com/Farama-Foundation/Gymnasium) (formerly
[OpenAI Gym](https://github.com/openai/gym)).
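
A minimal sketch of an agent-environment interaction loop. To stay
self-contained it uses a toy `CountdownEnv` that follows the Gymnasium
`reset`/`step` return convention instead of importing Gymnasium, and a trivial
`DecrementAgent` in place of an LLM-backed agent. All names here are
hypothetical.

```python
from typing import Tuple


class CountdownEnv:
    """Toy environment: the episode ends once the counter reaches zero."""

    def __init__(self, start: int = 3) -> None:
        self.start = start
        self.remaining = start

    def reset(self) -> Tuple[int, dict]:
        self.remaining = self.start
        return self.remaining, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, dict]:
        self.remaining -= action
        terminated = self.remaining <= 0
        reward = 1.0 if terminated else 0.0
        # Returns (observation, reward, terminated, truncated, info).
        return self.remaining, reward, terminated, False, {}


class DecrementAgent:
    def act(self, observation: int) -> int:
        # A real agent would prompt an LLM with the observation here.
        return 1


def run_episode(env: CountdownEnv, agent: DecrementAgent, max_steps: int = 10) -> float:
    observation, _info = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(observation)
        observation, reward, terminated, truncated, _info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward


total = run_episode(CountdownEnv(start=3), DecrementAgent())
```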
1 year ago
0xDTE 6ce34bb4fe
Fixing broken document links (#3756)
simple document url fixes. nothing fancy.
1 year ago
Rafal Wojdyla 160bfae93f
Add `DocstoreFn` - lookup doc via arbitrary function (#3760)
This **partially** addresses
https://github.com/hwchase17/langchain/issues/1524, but it's also useful
for some of our use cases.

This `DocstoreFn` makes it possible to look up a document given a function
that accepts the `search` string, without the need to implement a custom
`Docstore`.

This could be useful when:
* you don't want to implement a `Docstore` just to provide a custom
`search`
* it's expensive to construct an `InMemoryDocstore`/dict
* you retrieve documents from remote sources
* you just want to reuse existing objects
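
A rough sketch of the idea, assuming a docstore only needs a `search` method.
`FunctionDocstore` is a hypothetical stand-in, not the `DocstoreFn` class from
this PR, and it returns plain strings instead of document objects.

```python
from typing import Callable, Optional


class FunctionDocstore:
    """Looks documents up by delegating to an arbitrary function."""

    def __init__(self, lookup_fn: Callable[[str], Optional[str]]) -> None:
        self._lookup_fn = lookup_fn

    def search(self, search: str) -> str:
        result = self._lookup_fn(search)
        if result is None:
            return f"ID {search} not found."
        return result


# Reuse an existing mapping without building a dedicated docstore around it.
existing = {"doc-1": "first document"}
store = FunctionDocstore(existing.get)
found = store.search("doc-1")
missing = store.search("doc-2")
```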
1 year ago
Harrison Chase c55ba43093
Harrison/vespa (#3761)
Co-authored-by: Lester Solbakken <lesters@users.noreply.github.com>
1 year ago
mbchang ee20b3e0d0
bug fix: initialize the arxivAPIWrapper object (#3733) 1 year ago
leo-gan e510732ad2
docs: improved `vectorstore` notebooks (#3724)
- Added links to the vectorstore providers
- Added installation code (it is not clear that we have to go to the
`LangChain Ecosystem` page to get installation instructions.)
1 year ago
BioErrorLog ad4eae7ef0
Fix linting on the Quickstart Guide sample codes (#3701)
When copying and pasting the sample code from the Quickstart Guide, lint
errors ("missing whitespace around operator") occur.
1 year ago
Zander Chase a46f1d830e
Synchronous Browser (#3745)
Split out sync methods in playwright
1 year ago
Zander Chase 6c2b16e465
Add SceneXplain Tool (#3752) 1 year ago
erwanlc 72c5c15f7f
Fix: Updated links for in depth explanation of chain types in the Question Answering notebooks (#3714)
In the notebook question_answering.ipynb
([link](https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/index_examples/question_answering.ipynb)),
and the notebook qa_with_sources.ipynb
([link](https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/index_examples/qa_with_sources.ipynb)),
the first paragraph contains a dead link:

> This notebook walks through how to use LangChain for question
answering over a list of documents. It covers four different types of
chains: stuff, map_reduce, refine, map_rerank. For a more in depth
explanation of what these chain types are, see
[here](32793f94fd/docs/modules/chains/combine_docs.md).

The file combine_docs.md no longer exists, so the link returns a 404
(Page not found).

I updated the links so they redirect to
https://docs.langchain.com/docs/components/chains/index_related_chains
as in the summarize notebook
([link](https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/index_examples/summarize.ipynb))
present in the same folder.
1 year ago
Alan Cha e3b7a20454
Fix typo (#3728) 1 year ago
Zander Chase 5042bd40d3
Add Shell Tool (#3335)
Create an official bash shell tool to replace the dynamically generated one
1 year ago
Zander Chase 334c162f16
Add Other File Utilities (#3209)
Add other File Utilities, include
- List Directory
- Search for file
- Move
- Copy
- Remove file

Bundle as toolkit
Add a notebook that connects to the Chat Agent, which somewhat supports
multi-arg input tools
Update original read/write files to return the original dir paths and
better handle unsupported file paths.
Add unit tests
1 year ago
Zander Chase 491c27f861
PlayWright Web Browser Toolkit (#3262)
Adds a PlayWright web browser toolkit with the following tools:

- NavigateTool (navigate_browser) - navigate to a URL
- NavigateBackTool (previous_page) - navigate back to the previous page
- ClickTool (click_element) - click on an element (specified by
selector)
- ExtractTextTool (extract_text) - use beautiful soup to extract text
from the current web page
- ExtractHyperlinksTool (extract_hyperlinks) - use beautiful soup to
extract hyperlinks from the current web page
- GetElementsTool (get_elements) - select elements by CSS selector
- CurrentPageTool (current_page) - get the current page URL
1 year ago
Zander Chase da7b51455c
Dynamic tool -> single purpose (#3697)
I think the logic of
https://github.com/hwchase17/langchain/pull/3684#pullrequestreview-1405358565
is too confusing.

I prefer this alternative because:
- All `Tool()` implementations by default will be treated the same as
before. No breaking changes.
- Less reliance on pydantic magic
- The decorator (which only is typed as returning a callable) can infer
schema and generate a structured tool
- Either way, the recommended way to create a custom tool is through
inheriting from the base tool
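
A minimal sketch of how a decorator could infer an argument schema from a
function signature, using only the standard library. `infer_schema` and
`simple_tool` are hypothetical names, and the schema is a plain dict here
rather than a pydantic model.

```python
import inspect
from typing import Any, Callable, Dict


def infer_schema(func: Callable[..., Any]) -> Dict[str, Any]:
    # Map each parameter name to its annotation, falling back to Any
    # when the parameter is not annotated.
    signature = inspect.signature(func)
    return {
        name: (param.annotation if param.annotation is not inspect.Parameter.empty else Any)
        for name, param in signature.parameters.items()
    }


def simple_tool(func: Callable[..., Any]) -> Callable[..., Any]:
    # Attach the inferred schema to the function and return it unchanged.
    func.args_schema = infer_schema(func)
    return func


@simple_tool
def multiply(a: int, b: int) -> int:
    return a * b
```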
1 year ago
Zach Schillaci 1bf1c37c0c
Update VectorDBQA to RetrievalQA in tools (#3698)
Because `VectorDBQA` and `VectorDBQAWithSourcesChain` are deprecated
1 year ago
Harrison Chase 32793f94fd
bump version to 152 (#3695) 1 year ago
mbchang 1da3ee1386
Multiagent authoritarian (#3686)
This notebook showcases how to implement a multi-agent simulation where
a privileged agent decides who speaks.
This is the polar opposite of the selection scheme in [multi-agent
decentralized speaker
selection](https://python.langchain.com/en/latest/use_cases/agent_simulations/multiagent_bidding.html).

We show an example of this approach in the context of a fictitious
simulation of a news network. This example will showcase how we can
implement agents that
- think before speaking
- terminate the conversation
1 year ago
Zander Chase 4654c58f72
Add validation on agent instantiation for multi-input tools (#3681)
Tradeoffs here:
- No lint-time checking for compatibility
- Differs from JS package
- The signature inference, etc. in the base tool isn't simple
- The `args_schema` is optional 

Pros:
- Forwards compatibility retained
- Doesn't break backwards compatibility
- User doesn't have to think about which class to subclass (single base
tool or dynamic `Tool` interface regardless of input)
- No need to change the load_tools, etc. interfaces
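
A simplified sketch of validating tools when an agent is instantiated,
assuming tools are plain callables and the agent supports single-input tools
only. `SingleInputAgent` and `count_inputs` are hypothetical names, not the
classes touched by this PR.

```python
import inspect
from typing import Callable, Sequence


def count_inputs(func: Callable) -> int:
    return len(inspect.signature(func).parameters)


class SingleInputAgent:
    def __init__(self, tools: Sequence[Callable]) -> None:
        # Fail at construction time rather than when the tool is first called.
        for tool in tools:
            if count_inputs(tool) != 1:
                raise ValueError(
                    f"{tool.__name__} takes {count_inputs(tool)} inputs; "
                    "this agent supports single-input tools only."
                )
        self.tools = list(tools)


def search(query: str) -> str:
    return f"results for {query}"


def move_file(source: str, destination: str) -> str:
    return f"moved {source} to {destination}"


agent = SingleInputAgent([search])
```

The check runs at runtime, which matches the tradeoff listed above: there is no
lint-time checking for compatibility.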

Co-authored-by: Hasan Patel <mangafield@gmail.com>
1 year ago