Commit Graph

557 Commits

Author SHA1 Message Date
Harrison Chase
2ec25ddd4c
add unstructured examples (#913) 2023-02-06 18:13:46 -08:00
Kevin Huo
31b054f69d
Add pinecone integration test (#911)
Basic integration test for pinecone
2023-02-06 18:13:35 -08:00
Harrison Chase
93a091cfb8
Optionally return shell output on incorrect command (#894) (#899)
This allows the LLM to correct its previous command by looking at the
error message output to the shell.

Additionally, this uses subprocess.run because that is now recommended
over subprocess.check_output:

https://docs.python.org/3/library/subprocess.html#using-the-subprocess-module

Co-authored-by: Amos Ng <me@amos.ng>
2023-02-06 12:46:16 -08:00
James Briggs
3aa53b44dd
added i_end in batch extraction (#907)
Fix for issue #906 

Switches `[i : i + batch_size]` to `[i : i_end]` in Pinecone
`from_texts` method
2023-02-06 12:45:56 -08:00
Harrison Chase
82c080c6e6
bump version to 0078 (#908) 2023-02-06 00:32:44 -08:00
Harrison Chase
71e662e88d
update docs (#905) 2023-02-06 00:26:20 -08:00
Harrison Chase
53d56d7650
Harrison/unstructured support (#903) 2023-02-05 23:02:07 -08:00
Harrison Chase
2a68be3e8d
chat vector db chain (#902) 2023-02-05 21:38:47 -08:00
James Briggs
8217a2f26c
Update pinecone init details in docs (#898)
PR to fix outdated environment details in the docs, see issue #897 

I added code comments as pointers to where users go to get API keys, and
where they can find the relevant environment variable.
2023-02-05 15:21:56 -08:00
Bagatur
7658263bfb
Check type of LLM.generate prompts arg (#886)
Was passing prompt in directly as string and getting nonsense outputs.
Had to inspect source code to realize that first arg should be a list.
Could be nice if there was an explicit error or warning, seems like this
could be a common mistake.
2023-02-04 22:49:17 -08:00
Samantha Whitmore
32b11101d3
Get elements of ActionInput on newlines (#889)
The re.DOTALL flag in Python's re (regular expression) module makes the
. (dot) metacharacter match newline characters as well as any other
character.

Without re.DOTALL, the . metacharacter only matches any character except
for a newline character. With re.DOTALL, the . metacharacter matches any
character, including newline characters.
2023-02-04 20:42:25 -08:00
Harrison Chase
1614c5f5fd
fix flaky tests (#892) 2023-02-04 20:41:33 -08:00
Harrison Chase
a2b699dcd2
prompt template from string (#884) 2023-02-04 17:04:58 -08:00
Alex
7cc44b3bdb
Add to gallery (#882) 2023-02-04 09:45:20 -08:00
Harrison Chase
0b9f086d36
Harrison/docs splitter (#879) 2023-02-03 15:09:13 -08:00
Harrison Chase
bcfbc7a818
version 0077 (#878) 2023-02-03 14:49:52 -08:00
Ryan Walker
1dd0733515
Fix small typo in getting started docs (#876)
Just noticed this little typo while reading the docs, thought I'd open a
PR!
2023-02-03 14:22:12 -08:00
Zach Schillaci
4c79100b15
Correct prompt typo + update example for SQLDatabaseChain (#868)
See https://github.com/hwchase17/langchain/issues/821
2023-02-03 08:34:41 -08:00
Harrison Chase
777aaff841
fix routing to tiktoken encoder (#866) 2023-02-02 22:08:14 -08:00
Harrison Chase
e9ef08862d
validate template (#865) 2023-02-02 22:08:01 -08:00
Harrison Chase
364b771743
sql return direct (#864) 2023-02-02 22:07:41 -08:00
Harrison Chase
483441d305
pass kwargs through to loading (#863) 2023-02-02 22:07:26 -08:00
Harrison Chase
8df6b68093
fix length based example selector (#862) 2023-02-02 22:06:56 -08:00
Harrison Chase
3f48eed5bd
Harrison/milvus (#856)
Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com>
Signed-off-by: Frank Liu <frank.liu@zilliz.com>
Co-authored-by: Filip Haltmayer <81822489+filip-halt@users.noreply.github.com>
Co-authored-by: Frank Liu <frank@frankzliu.com>
2023-02-02 22:05:47 -08:00
Ankush Gola
933441cc52
Add retry to OpenAI llm (#849)
add ability to retry when certain exceptions are raised by
`openai.Completions.create`

Test plan: ran all OpenAI integration tests.
2023-02-02 19:56:26 -08:00
kahkeng
4a8f5cdf4b
Add alternative token-based text splitter (#816)
This does not involve a separator, and will naively chunk input text at
the appropriate boundaries in token space.

This is helpful if we have strict token length limits that we need to
strictly follow the specified chunk size, and we can't use aggressive
separators like spaces to guarantee the absence of long strings.

CharacterTextSplitter will let these strings through without splitting
them, which could cause overflow errors downstream.

Splitting at arbitrary token boundaries is not ideal but is hopefully
mitigated by having a decent overlap quantity. Also this results in
chunks which has exact number of tokens desired, instead of sometimes
overcounting if we concatenate shorter strings.

Potentially also helps with #528.
2023-02-02 19:55:13 -08:00
Harrison Chase
523ad2e6bd
vercel deployments (#850) 2023-02-02 19:54:09 -08:00
Harrison Chase
fc0cfd7d1f
docs (#848) 2023-02-02 11:35:36 -08:00
Harrison Chase
4d32441b86
bump version to 0076 (#847) 2023-02-02 10:05:39 -08:00
Harrison Chase
23d5f64bda
Harrison/ngram example (#846)
Co-authored-by: Sean Spriggens <ssprigge@syr.edu>
2023-02-02 09:44:42 -08:00
Harrison Chase
0de55048b7
return code for pal (#844) 2023-02-02 08:47:20 -08:00
Harrison Chase
d564308e0f
rfc: instruct embeddings (#811)
Co-authored-by: seanaedmiston <seane999@gmail.com>
2023-02-02 08:44:02 -08:00
Nick Furlotte
576609e665
Update PAL to allow passing local and global context to PythonREPL (#774)
Passing additional variables to the python environment can be useful for
example if you want to generate code to analyze a dataset.

I also added a tracker for the executed code - `code_history`.
2023-02-02 08:34:23 -08:00
Harrison Chase
3f952eb597
add from string method (#820) 2023-02-02 08:23:54 -08:00
Ikko Eltociear Ashimine
ba26a879e0
Fix typo in crawler.py (#842)
seperator -> separator
2023-02-02 08:23:38 -08:00
Eli Mernit
bfabd1d5c0
Added new deployment template (#835)
This PR introduces a new template for deploying LangChain apps as web
endpoints. It includes template code, and links to a detailed
code-walkthrough.
2023-02-01 23:38:36 -08:00
Jonas Ehrenstein
f3508228df
Minor fix for google search util: it's uncertain if "snippet" in results exists (#830)
The results from Google search may not always contain a "snippet". 

Example:
`{'kind': 'customsearch#result', 'title': 'FEMA Flood Map', 'htmlTitle':
'FEMA Flood Map', 'link': 'https://msc.fema.gov/portal/home',
'displayLink': 'msc.fema.gov', 'formattedUrl':
'https://msc.fema.gov/portal/home', 'htmlFormattedUrl':
'https://<b>msc</b>.fema.gov/portal/home'}`

This will cause a KeyError at line 99
`snippets.append(result["snippet"])`.
2023-02-01 23:37:52 -08:00
Zach Schillaci
b4eb043b81
Minor fix to SQLDatabaseChain doc (#826) 2023-02-01 23:37:38 -08:00
Istora Mandiri
06438794e1
Fix typo in textsplitter docs (#825) 2023-02-01 23:32:35 -08:00
Raza Habib
9f8e05ffd4
Update __init__.py (#827)
Remove duplicate APIChain
2023-02-01 23:31:38 -08:00
Harrison Chase
b0d560be56
add to gallery (#824) 2023-02-01 07:10:15 -08:00
Johanna Appel
ebea40ce86
Add 'truncate' parameter for CohereEmbeddings (#798)
Currently, the 'truncate' parameter of the cohere API is not supported.

This means that by default, if trying to generate and embedding that is
too big, the call will just fail with an error (which is frustrating if
using this embedding source e.g. with GPT-Index, because it's hard to
handle it properly when generating a lot of embeddings).
With the parameter, one can decide to either truncate the START or END
of the text to fit the max token length and still generate an embedding
without throwing the error.

In this PR, I added this parameter to the class.

_Arguably, there should be a better way to handle this error, e.g. by
optionally calling a function or so that gets triggered when the token
limit is reached and can split the document or some such. Especially in
the use case with GPT-Index, its often hard to estimate the token counts
for each document and I'd rather sort out the troublemakers or simply
split them than interrupting the whole execution.
Thoughts?_

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-02-01 07:09:03 -08:00
Harrison Chase
b9045f7e0d
bump version to 0075 (#819) 2023-01-31 00:18:32 -08:00
Harrison Chase
7b4882a2f4
Harrison/tf embeddings (#817)
Co-authored-by: Ryohei Kuroki <10434946+yakigac@users.noreply.github.com>
2023-01-31 00:00:08 -08:00
Harrison Chase
5d4b6e4d4e
conversational agent fix (#818) 2023-01-30 23:59:55 -08:00
Harrison Chase
94ae126747
return sql intermediate steps (#792) 2023-01-30 15:10:48 -08:00
bair82
ae5695ad32
Update cohere.py (#795)
When stop tokens are set in Cohere LLM constructor, they are currently
not stripped from the response, and they should be stripped
2023-01-30 14:55:44 -08:00
Johanna Appel
cacf4091c0
Fix documentation for 'model' parameter in CohereEmbeddings (#797)
Currently, the class parameter 'model_name' of the CohereEmbeddings
class is not supported, but 'model' is. The class documentation is
inconsistent with this, though, so I propose to either fix the
documentation (this PR right now) or fix the parameter.

It will create the following error:
```
ValidationError: 1 validation error for CohereEmbeddings
model_name
  extra fields not permitted (type=value_error.extra)
```
2023-01-30 14:55:08 -08:00
Jason Liu
54f9e4287f
Pass kwargs from initialize_agent into agent classmethod (#799)
# Problem
I noticed that in order to change the prefix of the prompt in the
`zero-shot-react-description` agent
we had to dig around to subset strings deep into the agent's attributes.
It requires the user to inspect a long chain of attributes and classes.

`initialize_agent -> AgentExecutor -> Agent -> LLMChain -> Prompt from
Agent.create_prompt`

``` python
agent = initialize_agent(
    tools=tools,
    llm=fake_llm,
    agent="zero-shot-react-description"
)
prompt_str = agent.agent.llm_chain.prompt.template
new_prompt_str = change_prefix(prompt_str)
agent.agent.llm_chain.prompt.template = new_prompt_str
```

# Implemented Solution

`initialize_agent` accepts `**kwargs` but passes it to `AgentExecutor`
but not `ZeroShotAgent`, by simply giving the kwargs to the agent class
methods we can support changing the prefix and suffix for one agent
while allowing future agents to take advantage of `initialize_agent`.


```
agent = initialize_agent(
    tools=tools,
    llm=fake_llm,
    agent="zero-shot-react-description",
    agent_kwargs={"prefix": prefix, "suffix": suffix}
)
```

To be fair, this was before finding docs around custom agents here:
https://langchain.readthedocs.io/en/latest/modules/agents/examples/custom_agent.html?highlight=custom%20#custom-llmchain
but i find that my use case just needed to change the prefix a little.


# Changes

* Pass kwargs to Agent class method
* Added a test to check suffix and prefix

---------

Co-authored-by: Jason Liu <jason@jxnl.coA>
2023-01-30 14:54:09 -08:00
Roger Zurawicki
c331009440
docs: Update langchain link to PyPI (#800)
Simple one-line fix

CONTRIBUTING used a link that pointed to the `ruff` project.
2023-01-30 14:53:16 -08:00