
Vectara

What is Vectara?

Vectara Overview:

  • Vectara is a developer-first API platform for building GenAI applications.
  • To use Vectara, first sign up and create an account. Then create a corpus and an API key for indexing and searching.
  • You can use Vectara's indexing API to add documents to Vectara's index.
  • You can use Vectara's Search API to query Vectara's index (which also supports hybrid search implicitly).
  • You can use Vectara's integration with LangChain as a vector store or through the Retriever abstraction.

Installation and Setup

To use Vectara with LangChain, no special installation steps are required. You just have to provide your customer ID, corpus ID, and an API key (created within the Vectara console) to enable indexing and searching.

Alternatively, these can be provided as environment variables:

  • export VECTARA_CUSTOMER_ID="your_customer_id"
  • export VECTARA_CORPUS_ID="your_corpus_id"
  • export VECTARA_API_KEY="your-vectara-api-key"

Usage

VectorStore

There exists a wrapper around the Vectara platform, allowing you to use it as a vectorstore, whether for semantic search or example selection.

To import this vectorstore:

from langchain.vectorstores import Vectara

To create an instance of the Vectara vectorstore:

vectara = Vectara(
    vectara_customer_id=customer_id, 
    vectara_corpus_id=corpus_id, 
    vectara_api_key=api_key
)

The customer_id, corpus_id, and api_key are optional; if they are not supplied, they will be read from the environment variables VECTARA_CUSTOMER_ID, VECTARA_CORPUS_ID, and VECTARA_API_KEY, respectively.
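As a sketch of that fallback order (explicit arguments win over the environment), here is a hypothetical helper, resolve_vectara_config, that mirrors — rather than calls — the vectorstore's own resolution logic:

```python
import os

def resolve_vectara_config(customer_id=None, corpus_id=None, api_key=None):
    # Hypothetical helper for illustration: explicit arguments take
    # precedence; otherwise fall back to the environment variables,
    # mirroring how the Vectara vectorstore resolves its credentials.
    return {
        "customer_id": customer_id or os.environ.get("VECTARA_CUSTOMER_ID"),
        "corpus_id": corpus_id or os.environ.get("VECTARA_CORPUS_ID"),
        "api_key": api_key or os.environ.get("VECTARA_API_KEY"),
    }

os.environ["VECTARA_CUSTOMER_ID"] = "my_customer_id"
cfg = resolve_vectara_config(corpus_id="42")
print(cfg["customer_id"])  # my_customer_id (from the environment)
print(cfg["corpus_id"])    # 42 (explicit argument wins)
```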

To query the vectorstore, you can use the similarity_search method (or similarity_search_with_score), which takes a query string and returns a list of results:

results = vectara.similarity_search("what is LangChain?")

similarity_search_with_score also supports the following additional arguments:

  • k: number of results to return (defaults to 5)
  • lambda_val: the lexical matching factor for hybrid search (defaults to 0.025)
  • filter: a filter to apply to the results (default None)
  • n_sentence_context: number of sentences to include before/after the actual matching segment when returning results. This defaults to 0 so as to return the exact text segment that matches, but can be used with other values e.g. 2 or 3 to return adjacent text segments.

The results are returned as a list of relevant documents, along with a relevance score for each document.
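Since similarity_search_with_score returns (document, score) pairs, a common pattern is to filter hits by score. A minimal, self-contained sketch using a stand-in Document class so it runs without live Vectara credentials (the 0.5 threshold is an arbitrary choice):

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    # Stand-in for langchain.schema.Document: only the fields used here.
    page_content: str
    metadata: dict = field(default_factory=dict)

# Shaped like similarity_search_with_score output: (Document, score)
# pairs, with a higher score meaning a more relevant match.
results = [
    (Document("LangChain is a framework for LLM apps.", {"source": "docs"}), 0.82),
    (Document("An unrelated paragraph.", {"source": "blog"}), 0.11),
]

# Keep only hits above a relevance threshold.
relevant = [doc for doc, score in results if score >= 0.5]
```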

For more detailed examples of using the Vectara wrapper, see one of these two sample notebooks: