Compare commits

...

9 Commits

Author SHA1 Message Date
William FH 9fa9f05e5d
Catch System Error in ast parse (#20961)
I can't seem to reproduce, but I got this:

```
SystemError: AST constructor recursion depth mismatch (before=102, after=37)
```

And the operation isn't critical for the actual forward pass, so it seems
preferable to expand our caught exceptions.
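
For context, the change just widens the caught-exception tuple around LangChain's best-effort source introspection; a minimal sketch of the pattern (the `try_parse` helper name is hypothetical):

```python
import ast
import inspect
import textwrap
from typing import Callable, Optional


def try_parse(func: Callable) -> Optional[ast.Module]:
    """Best-effort parse of a function's source; introspection only,
    so any failure falls back to None instead of raising."""
    try:
        source = textwrap.dedent(inspect.getsource(func))
        return ast.parse(source)
    except (SyntaxError, TypeError, OSError, SystemError):
        # SystemError covers "AST constructor recursion depth mismatch"
        return None
```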
2 weeks ago
YH 2aca7fcdcf
core[patch]: Enhance link extraction with query parameters (#20259)
**Description**: This update enhances the `extract_sub_links` function
within the `langchain_core/utils/html.py` module to include query
parameters in the extracted URLs.
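
A minimal sketch of the new behavior (the HTML snippet and URLs are illustrative):

```python
from langchain_core.utils.html import extract_sub_links

html = '<a href="/docs/page?lang=en">docs</a>'
links = extract_sub_links(html, "https://example.com/")
# Query parameters are now kept on extracted links:
# ['https://example.com/docs/page?lang=en']
```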

**Issue**: N/A

**Dependencies**: No additional dependencies required for this change.

**Twitter handle**: N/A

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2 weeks ago
CT 0e917e319b
docs: Add langchainhub to pip install (#20185)
Added the langchainhub package to the pip install statement, which is
required for "from langchain import hub" to work.

Added sample code for setting the OpenAI API key.

Co-authored-by: Chi Yan Tang <100466443+poochiekittie@users.noreply.github.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2 weeks ago
Pamela Fox 45092a36a2
docs: Fix langgraph link (#20244)
Just a simple PR to fix a broken link. Apparently having backticks
outside a link makes it render as code.

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2 weeks ago
Chip Davis e818c75f8a
infra: test directory loader multithreaded (#20281)
This is a unit test for #20230, which was a fix for using multithreaded
mode with the directory loader. @eyurtsev
2 weeks ago
Guilherme Zanotelli f931a9ce60
community[patch]: Pass kwargs to SPARQLStore from RdfGraph (#20385)
This introduces `store_kwargs`, which behaves similarly to `graph_kwargs`
on the `RdfGraph` object and enables users to pass `headers` and
other arguments to the underlying `SPARQLStore` object. I have also made
a [PR in `rdflib` to support passing
`default_graph`](https://github.com/RDFLib/rdflib/pull/2761).

Example usage:
```python
from langchain_community.graphs import RdfGraph

graph = RdfGraph(
    query_endpoint="http://localhost/sparql",
    standard="rdf",
    store_kwargs=dict(
        default_graph="http://example.com/mygraph"
    )
)
```


---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2 weeks ago
Chandre Van Der Westhuizen e57cf73cf5
docs: Added MindsDB provider (#20322)
MindsDB integrates with LangChain, enabling users to deploy, serve, and
fine-tune models available via LangChain within MindsDB, making them
accessible to numerous data sources.

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2 weeks ago
Jorge Piedrahita Ortiz 40b2e2916b
community[minor]: Sambanova llm integration (#20955)
- **Description:** Added the [SambaNova Systems](https://sambanova.ai/)
integration, including the Sambaverse and SambaStudio LLMs
- **Dependencies:** sseclient-py (optional)

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2 weeks ago
Rahul Triptahi 955cf186d2
community[patch]: Ingest source, owner and full_path if present in Document's metadata. (#20949)
Description: The PebbloSafeLoader should first check for owner,
full_path and size in the metadata before falling back to its own logic.
Dependencies: None
Documentation: NA.
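
A sketch of the intended precedence, using the names from the diff below; each value prefers what is already in the document's metadata and otherwise falls back to the loader's own computation:

```python
doc_metadata = doc.get("metadata", {})
doc_source_path = get_full_path(
    doc_metadata.get("full_path", doc_metadata.get("source", self.source_path))
)
doc_source_owner = doc_metadata.get(
    "owner", PebbloSafeLoader.get_file_owner_from_path(doc_source_path)
)
doc_source_size = doc_metadata.get("size", self.get_source_size(doc_source_path))
```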

Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
2 weeks ago

@ -47,7 +47,7 @@ For these applications, LangChain simplifies the entire application lifecycle:
- **`langchain-community`**: Third party integrations.
- Some integrations have been further split into **partner packages** that only rely on **`langchain-core`**. Examples include **`langchain_openai`** and **`langchain_anthropic`**.
- **`langchain`**: Chains, agents, and retrieval strategies that make up an application's cognitive architecture.
- **[LangGraph](https://python.langchain.com/docs/langgraph)**: A library for building robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph.
- **[`LangGraph`](https://python.langchain.com/docs/langgraph)**: A library for building robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph.
### Productionization:
- **[LangSmith](https://python.langchain.com/docs/langsmith)**: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.

@ -0,0 +1,212 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Sambanova\n",
"\n",
"**[Sambanova](https://sambanova.ai/)'s** [Sambaverse](https://sambaverse.sambanova.ai/) and [Sambastudio](https://sambanova.ai/technology/full-stack-ai-platform) are platforms for running your own open source models\n",
"\n",
"This example goes over how to use LangChain to interact with Sambanova models"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sambaverse"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Sambaverse** allows you to interact with multiple Open source models you can se the list of available models an interact with then in the [playground](https://sambaverse.sambanova.ai/playground)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An API key is required to access to Sambaverse models get one creating an account in [sambaverse.sambanova.ai](https://sambaverse.sambanova.ai/)\n",
"\n",
"The [sseclient-py](https://pypi.org/project/sseclient-py/) package is required to run streaming predictions "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install --quiet sseclient-py==1.8.0"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Register your API Key environment variable:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"sambaverse_api_key = \"<Your sambaverse API key>\"\n",
"\n",
"# Set the environment variables\n",
"os.environ[\"SAMBAVERSE_API_KEY\"] = sambaverse_api_key"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Call Sambaverse models directly from langchain!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.llms.sambanova import Sambaverse\n",
"\n",
"llm = Sambaverse(\n",
" sambaverse_model_name=\"Meta/llama-2-7b-chat-hf\",\n",
" streaming=False,\n",
" model_kwargs={\n",
" \"do_sample\": True,\n",
" \"max_tokens_to_generate\": 1000,\n",
" \"temperature\": 0.01,\n",
" \"process_prompt\": True,\n",
" \"select_expert\": \"llama-2-7b-chat-hf\",\n",
" # \"repetition_penalty\": {\"type\": \"float\", \"value\": \"1\"},\n",
" # \"top_k\": {\"type\": \"int\", \"value\": \"50\"},\n",
" # \"top_p\": {\"type\": \"float\", \"value\": \"1\"}\n",
" },\n",
")\n",
"\n",
"print(llm.invoke(\"Why should I use open source models?\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## SambaStudio"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**SambaStudio** allows you to Train, run batch inference jous, and deploy online inference endpoints to run your own fine tunned open source models"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A SambaStudio environment is required to deploy a model. Get more information in [sambanova.ai/products/enterprise-ai-platform-sambanova-suite](https://sambanova.ai/products/enterprise-ai-platform-sambanova-suite)\n",
"\n",
"The [sseclient-py](https://pypi.org/project/sseclient-py/) package is required to run streaming predictions "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install --quiet sseclient-py==1.8.0"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Register your environment variables:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"sambastudio_base_url = \"<Your SambaStudio environment URL>\"\n",
"sambastudio_project_id = \"<Your SambaStudio project id>\"\n",
"sambastudio_endpoint_id = \"<Your SambaStudio endpoint id>\"\n",
"sambastudio_api_key = \"<Your SambaStudio endpoint API key>\"\n",
"\n",
"# Set the environment variables\n",
"os.environ[\"SAMBASTUDIO_BASE_URL\"] = sambastudio_base_url\n",
"os.environ[\"SAMBASTUDIO_PROJECT_ID\"] = sambastudio_project_id\n",
"os.environ[\"SAMBASTUDIO_ENDPOINT_ID\"] = sambastudio_endpoint_id\n",
"os.environ[\"SAMBASTUDIO_API_KEY\"] = sambastudio_api_key"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Call SambaStudio models directly from langchain!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.llms.sambanova import SambaStudio\n",
"\n",
"llm = SambaStudio(\n",
" streaming=False,\n",
" model_kwargs={\n",
" \"do_sample\": True,\n",
" \"max_tokens_to_generate\": 1000,\n",
" \"temperature\": 0.01,\n",
" # \"repetition_penalty\": {\"type\": \"float\", \"value\": \"1\"},\n",
" # \"top_k\": {\"type\": \"int\", \"value\": \"50\"},\n",
" # \"top_logprobs\": {\"type\": \"int\", \"value\": \"0\"},\n",
" # \"top_p\": {\"type\": \"float\", \"value\": \"1\"}\n",
" },\n",
")\n",
"\n",
"print(llm.invoke(\"Why should I use open source models?\"))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

@ -0,0 +1,14 @@
# MindsDB
MindsDB is the platform for customizing AI from enterprise data. With MindsDB and its nearly 200 integrations to [data sources](https://docs.mindsdb.com/integrations/data-overview) and [AI/ML frameworks](https://docs.mindsdb.com/integrations/ai-overview), any developer can use their enterprise data to customize AI for their purpose, faster and more securely.

With MindsDB, you can connect any data source to any AI/ML model to implement and automate AI-powered applications. Deploy, serve, and fine-tune models in real time, utilizing data from databases, vector stores, or applications. Do all that using universal tools developers already know.

MindsDB integrates with LangChain, enabling users to:

- Deploy models available via LangChain within MindsDB, making them accessible to numerous data sources.
- Fine-tune models available via LangChain within MindsDB using real-time and dynamic data.
- Automate AI workflows with LangChain and MindsDB.

Follow [our docs](https://docs.mindsdb.com/integrations/ai-engines/langchain) to learn more about MindsDB's integration with LangChain and see examples.

@ -75,102 +75,19 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"id": "578d6a90",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: openai in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (0.27.8)\n",
"Requirement already satisfied: tiktoken in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (0.4.0)\n",
"Requirement already satisfied: chromadb in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (0.4.4)\n",
"Requirement already satisfied: langchain in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (0.0.299)\n",
"Requirement already satisfied: requests>=2.20 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from openai) (2.31.0)\n",
"Requirement already satisfied: tqdm in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from openai) (4.64.1)\n",
"Requirement already satisfied: aiohttp in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from openai) (3.8.5)\n",
"Requirement already satisfied: regex>=2022.1.18 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from tiktoken) (2023.6.3)\n",
"Requirement already satisfied: pydantic<2.0,>=1.9 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from chromadb) (1.10.12)\n",
"Requirement already satisfied: chroma-hnswlib==0.7.2 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from chromadb) (0.7.2)\n",
"Requirement already satisfied: fastapi<0.100.0,>=0.95.2 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from chromadb) (0.99.1)\n",
"Requirement already satisfied: uvicorn[standard]>=0.18.3 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from chromadb) (0.23.2)\n",
"Requirement already satisfied: numpy>=1.21.6 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from chromadb) (1.24.4)\n",
"Requirement already satisfied: posthog>=2.4.0 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from chromadb) (3.0.1)\n",
"Requirement already satisfied: typing-extensions>=4.5.0 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from chromadb) (4.7.1)\n",
"Requirement already satisfied: pulsar-client>=3.1.0 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from chromadb) (3.2.0)\n",
"Requirement already satisfied: onnxruntime>=1.14.1 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from chromadb) (1.15.1)\n",
"Requirement already satisfied: tokenizers>=0.13.2 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from chromadb) (0.13.3)\n",
"Requirement already satisfied: pypika>=0.48.9 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from chromadb) (0.48.9)\n",
"Collecting tqdm (from openai)\n",
" Obtaining dependency information for tqdm from https://files.pythonhosted.org/packages/00/e5/f12a80907d0884e6dff9c16d0c0114d81b8cd07dc3ae54c5e962cc83037e/tqdm-4.66.1-py3-none-any.whl.metadata\n",
" Downloading tqdm-4.66.1-py3-none-any.whl.metadata (57 kB)\n",
"\u001b[2K \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m57.6/57.6 kB\u001b[0m \u001b[31m2.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: overrides>=7.3.1 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from chromadb) (7.4.0)\n",
"Requirement already satisfied: importlib-resources in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from chromadb) (6.0.0)\n",
"Requirement already satisfied: PyYAML>=5.3 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from langchain) (6.0.1)\n",
"Requirement already satisfied: SQLAlchemy<3,>=1.4 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from langchain) (2.0.20)\n",
"Requirement already satisfied: anyio<4.0 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from langchain) (3.7.1)\n",
"Requirement already satisfied: async-timeout<5.0.0,>=4.0.0 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from langchain) (4.0.3)\n",
"Requirement already satisfied: dataclasses-json<0.7,>=0.5.7 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from langchain) (0.5.9)\n",
"Requirement already satisfied: jsonpatch<2.0,>=1.33 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from langchain) (1.33)\n",
"Requirement already satisfied: langsmith<0.1.0,>=0.0.38 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from langchain) (0.0.42)\n",
"Requirement already satisfied: numexpr<3.0.0,>=2.8.4 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from langchain) (2.8.5)\n",
"Requirement already satisfied: tenacity<9.0.0,>=8.1.0 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from langchain) (8.2.3)\n",
"Requirement already satisfied: attrs>=17.3.0 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from aiohttp->openai) (23.1.0)\n",
"Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from aiohttp->openai) (3.2.0)\n",
"Requirement already satisfied: multidict<7.0,>=4.5 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from aiohttp->openai) (6.0.4)\n",
"Requirement already satisfied: yarl<2.0,>=1.0 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from aiohttp->openai) (1.9.2)\n",
"Requirement already satisfied: frozenlist>=1.1.1 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from aiohttp->openai) (1.4.0)\n",
"Requirement already satisfied: aiosignal>=1.1.2 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from aiohttp->openai) (1.3.1)\n",
"Requirement already satisfied: idna>=2.8 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from anyio<4.0->langchain) (3.4)\n",
"Requirement already satisfied: sniffio>=1.1 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from anyio<4.0->langchain) (1.3.0)\n",
"Requirement already satisfied: exceptiongroup in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from anyio<4.0->langchain) (1.1.3)\n",
"Requirement already satisfied: marshmallow<4.0.0,>=3.3.0 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from dataclasses-json<0.7,>=0.5.7->langchain) (3.20.1)\n",
"Requirement already satisfied: marshmallow-enum<2.0.0,>=1.5.1 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from dataclasses-json<0.7,>=0.5.7->langchain) (1.5.1)\n",
"Requirement already satisfied: typing-inspect>=0.4.0 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from dataclasses-json<0.7,>=0.5.7->langchain) (0.9.0)\n",
"Requirement already satisfied: starlette<0.28.0,>=0.27.0 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from fastapi<0.100.0,>=0.95.2->chromadb) (0.27.0)\n",
"Requirement already satisfied: jsonpointer>=1.9 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from jsonpatch<2.0,>=1.33->langchain) (2.4)\n",
"Requirement already satisfied: coloredlogs in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from onnxruntime>=1.14.1->chromadb) (15.0.1)\n",
"Requirement already satisfied: flatbuffers in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from onnxruntime>=1.14.1->chromadb) (23.5.26)\n",
"Requirement already satisfied: packaging in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from onnxruntime>=1.14.1->chromadb) (23.1)\n",
"Requirement already satisfied: protobuf in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from onnxruntime>=1.14.1->chromadb) (4.23.4)\n",
"Requirement already satisfied: sympy in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from onnxruntime>=1.14.1->chromadb) (1.12)\n",
"Requirement already satisfied: six>=1.5 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from posthog>=2.4.0->chromadb) (1.16.0)\n",
"Requirement already satisfied: monotonic>=1.5 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from posthog>=2.4.0->chromadb) (1.6)\n",
"Requirement already satisfied: backoff>=1.10.0 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from posthog>=2.4.0->chromadb) (2.2.1)\n",
"Requirement already satisfied: python-dateutil>2.1 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from posthog>=2.4.0->chromadb) (2.8.2)\n",
"Requirement already satisfied: certifi in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from pulsar-client>=3.1.0->chromadb) (2023.7.22)\n",
"Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from requests>=2.20->openai) (1.26.16)\n",
"Requirement already satisfied: click>=7.0 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from uvicorn[standard]>=0.18.3->chromadb) (8.1.7)\n",
"Requirement already satisfied: h11>=0.8 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from uvicorn[standard]>=0.18.3->chromadb) (0.14.0)\n",
"Requirement already satisfied: httptools>=0.5.0 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from uvicorn[standard]>=0.18.3->chromadb) (0.6.0)\n",
"Requirement already satisfied: python-dotenv>=0.13 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from uvicorn[standard]>=0.18.3->chromadb) (1.0.0)\n",
"Requirement already satisfied: uvloop!=0.15.0,!=0.15.1,>=0.14.0 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from uvicorn[standard]>=0.18.3->chromadb) (0.17.0)\n",
"Requirement already satisfied: watchfiles>=0.13 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from uvicorn[standard]>=0.18.3->chromadb) (0.19.0)\n",
"Requirement already satisfied: websockets>=10.4 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from uvicorn[standard]>=0.18.3->chromadb) (11.0.3)\n",
"Requirement already satisfied: zipp>=3.1.0 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from importlib-resources->chromadb) (3.16.2)\n",
"Requirement already satisfied: mypy-extensions>=0.3.0 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from typing-inspect>=0.4.0->dataclasses-json<0.7,>=0.5.7->langchain) (1.0.0)\n",
"Requirement already satisfied: humanfriendly>=9.1 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from coloredlogs->onnxruntime>=1.14.1->chromadb) (10.0)\n",
"Requirement already satisfied: mpmath>=0.19 in /Users/bagatur/langchain/.venv/lib/python3.9/site-packages (from sympy->onnxruntime>=1.14.1->chromadb) (1.3.0)\n",
"Using cached tqdm-4.66.1-py3-none-any.whl (78 kB)\n",
"Installing collected packages: tqdm\n",
" Attempting uninstall: tqdm\n",
" Found existing installation: tqdm 4.64.1\n",
" Uninstalling tqdm-4.64.1:\n",
" Successfully uninstalled tqdm-4.64.1\n",
"\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
"clarifai 9.8.1 requires tqdm==4.64.1, but you have tqdm 4.66.1 which is incompatible.\u001b[0m\u001b[31m\n",
"\u001b[0mSuccessfully installed tqdm-4.66.1\n"
]
}
],
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain-openai tiktoken chromadb langchain\n",
"%pip install --upgrade --quiet langchain-openai tiktoken chromadb langchain langchainhub\n",
"\n",
"# Set env var OPENAI_API_KEY or load from a .env file\n",
"#\n",
"# import os\n",
"# os.environ['OPENAI_API_KEY'] = 'sk...'\n",
"#\n",
"# import dotenv\n",
"\n",
"# dotenv.load_dotenv()"
]
},

@ -157,16 +157,19 @@ class PebbloSafeLoader(BaseLoader):
doc_content = [doc.dict() for doc in loaded_docs]
docs = []
for doc in doc_content:
doc_authorized_identities = doc.get("metadata", {}).get(
"authorized_identities", []
)
doc_metadata = doc.get("metadata", {})
doc_authorized_identities = doc_metadata.get("authorized_identities", [])
doc_source_path = get_full_path(
doc.get("metadata", {}).get("source", self.source_path)
doc_metadata.get(
"full_path", doc_metadata.get("source", self.source_path)
)
)
doc_source_owner = doc_metadata.get(
"owner", PebbloSafeLoader.get_file_owner_from_path(doc_source_path)
)
doc_source_owner = PebbloSafeLoader.get_file_owner_from_path(
doc_source_path
doc_source_size = doc_metadata.get(
"size", self.get_source_size(doc_source_path)
)
doc_source_size = self.get_source_size(doc_source_path)
page_content = str(doc.get("page_content"))
page_content_size = self.calculate_content_size(page_content)
self.source_aggregate_size += page_content_size

@ -117,6 +117,7 @@ class RdfGraph:
standard: Optional[str] = "rdf",
local_copy: Optional[str] = None,
graph_kwargs: Optional[Dict] = None,
store_kwargs: Optional[Dict] = None,
) -> None:
"""
Set up the RDFlib graph
@ -130,6 +131,9 @@ class RdfGraph:
:param graph_kwargs: Additional rdflib.Graph specific kwargs
that will be used to initialize it,
if query_endpoint is provided.
:param store_kwargs: Additional sparqlstore.SPARQLStore specific kwargs
that will be used to initialize it,
if query_endpoint is provided.
"""
self.source_file = source_file
self.serialization = serialization
@ -174,12 +178,13 @@ class RdfGraph:
self.graph.parse(source_file, format=self.serialization)
if query_endpoint:
store_kwargs = store_kwargs or {}
self.mode = "store"
if not update_endpoint:
self._store = sparqlstore.SPARQLStore()
self._store = sparqlstore.SPARQLStore(**store_kwargs)
self._store.open(query_endpoint)
else:
self._store = sparqlstore.SPARQLUpdateStore()
self._store = sparqlstore.SPARQLUpdateStore(**store_kwargs)
self._store.open((query_endpoint, update_endpoint))
graph_kwargs = graph_kwargs or {}
self.graph = rdflib.Graph(self._store, **graph_kwargs)

@ -0,0 +1,865 @@
import json
from typing import Any, Dict, Generator, Iterator, List, Optional, Union
import requests
from langchain_core.callbacks.manager import CallbackManagerForLLMRun
from langchain_core.language_models.llms import LLM
from langchain_core.outputs import GenerationChunk
from langchain_core.pydantic_v1 import Extra, root_validator
from langchain_core.utils import get_from_dict_or_env
class SVEndpointHandler:
"""
SambaNova Systems Interface for Sambaverse endpoint.
:param str host_url: Base URL of the DaaS API service
"""
API_BASE_PATH = "/api/predict"
def __init__(self, host_url: str):
"""
Initialize the SVEndpointHandler.
:param str host_url: Base URL of the DaaS API service
"""
self.host_url = host_url
self.http_session = requests.Session()
@staticmethod
def _process_response(response: requests.Response) -> Dict:
"""
Processes the API response and returns the resulting dict.
All resulting dicts, regardless of success or failure, will contain the
`status_code` key with the API response status code.
If the API returned an error, the resulting dict will contain the key
`detail` with the error message.
If the API call was successful, the resulting dict will contain the key
`data` with the response data.
:param requests.Response response: the response object to process
:return: the response dict
:rtype: dict
"""
result: Dict[str, Any] = {}
try:
text_result = response.text.strip().split("\n")[-1]
result = {"data": json.loads("".join(text_result.split("data: ")[1:]))}
except Exception as e:
result["detail"] = str(e)
if "status_code" not in result:
result["status_code"] = response.status_code
return result
@staticmethod
def _process_streaming_response(
response: requests.Response,
) -> Generator[GenerationChunk, None, None]:
"""Process the streaming response"""
try:
import sseclient
except ImportError:
raise ValueError(
"could not import sseclient library"
"Please install it with `pip install sseclient-py`."
)
client = sseclient.SSEClient(response)
close_conn = False
for event in client.events():
if event.event == "error_event":
close_conn = True
text = json.dumps({"event": event.event, "data": event.data})
chunk = GenerationChunk(text=text)
yield chunk
if close_conn:
client.close()
def _get_full_url(self) -> str:
"""
Return the full API URL for a given path.
:returns: the full API URL for the sub-path
:rtype: str
"""
return f"{self.host_url}{self.API_BASE_PATH}"
def nlp_predict(
self,
key: str,
sambaverse_model_name: Optional[str],
input: Union[List[str], str],
params: Optional[str] = "",
stream: bool = False,
) -> Dict:
"""
NLP predict using inline input string.
:param str key: API Key
:param str sambaverse_model_name: Sambaverse model name
:param input: Input string or list of input strings
:param str params: Input params string
:returns: Prediction results
:rtype: dict
"""
if isinstance(input, str):
input = [input]
parsed_input = []
for element in input:
parsed_element = {
"conversation_id": "sambaverse-conversation-id",
"messages": [
{
"message_id": 0,
"role": "user",
"content": element,
}
],
}
parsed_input.append(json.dumps(parsed_element))
if params:
data = {"inputs": parsed_input, "params": json.loads(params)}
else:
data = {"inputs": parsed_input}
response = self.http_session.post(
self._get_full_url(),
headers={
"key": key,
"Content-Type": "application/json",
"modelName": sambaverse_model_name,
},
json=data,
)
return SVEndpointHandler._process_response(response)
def nlp_predict_stream(
self,
key: str,
sambaverse_model_name: Optional[str],
input: Union[List[str], str],
params: Optional[str] = "",
) -> Iterator[GenerationChunk]:
"""
NLP predict using inline input string.
:param str key: API Key
:param str sambaverse_model_name: Sambaverse model name
:param input: Input string or list of input strings
:param str params: Input params string
:returns: Prediction result chunks
:rtype: Iterator[GenerationChunk]
"""
if isinstance(input, str):
input = [input]
parsed_input = []
for element in input:
parsed_element = {
"conversation_id": "sambaverse-conversation-id",
"messages": [
{
"message_id": 0,
"role": "user",
"content": element,
}
],
}
parsed_input.append(json.dumps(parsed_element))
if params:
data = {"inputs": parsed_input, "params": json.loads(params)}
else:
data = {"inputs": parsed_input}
# Streaming output
response = self.http_session.post(
self._get_full_url(),
headers={
"key": key,
"Content-Type": "application/json",
"modelName": sambaverse_model_name,
},
json=data,
stream=True,
)
for chunk in SVEndpointHandler._process_streaming_response(response):
yield chunk
class Sambaverse(LLM):
"""
Sambaverse large language models.
To use, you should have the environment variable ``SAMBAVERSE_API_KEY``
set with your API key.
Get one at https://sambaverse.sambanova.ai
Read extra documentation at https://docs.sambanova.ai/sambaverse/latest/index.html
Example:
.. code-block:: python
from langchain_community.llms.sambanova import Sambaverse
Sambaverse(
sambaverse_url="https://sambaverse.sambanova.ai",
sambaverse_api_key="your sambaverse api key",
sambaverse_model_name="Meta/llama-2-7b-chat-hf",
streaming=False,
model_kwargs={
"do_sample": False,
"max_tokens_to_generate": 100,
"temperature": 0.7,
"top_p": 1.0,
"repetition_penalty": 1,
"top_k": 50,
},
)
"""
sambaverse_url: str = "https://sambaverse.sambanova.ai"
"""Sambaverse url to use"""
sambaverse_api_key: str = ""
"""sambaverse api key"""
sambaverse_model_name: Optional[str] = None
"""sambaverse expert model to use"""
model_kwargs: Optional[dict] = None
"""Key word arguments to pass to the model."""
streaming: Optional[bool] = False
"""Streaming flag to get streamed response."""
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
@classmethod
def is_lc_serializable(cls) -> bool:
return True
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that api key exists in environment."""
values["sambaverse_url"] = get_from_dict_or_env(
values, "sambaverse_url", "SAMBAVERSE_URL"
)
values["sambaverse_api_key"] = get_from_dict_or_env(
values, "sambaverse_api_key", "SAMBAVERSE_API_KEY"
)
values["sambaverse_model_name"] = get_from_dict_or_env(
values, "sambaverse_model_name", "SAMBAVERSE_MODEL_NAME"
)
return values
@property
def _identifying_params(self) -> Dict[str, Any]:
"""Get the identifying parameters."""
return {**{"model_kwargs": self.model_kwargs}}
@property
def _llm_type(self) -> str:
"""Return type of llm."""
return "Sambaverse LLM"
def _get_tuning_params(self, stop: Optional[List[str]]) -> str:
"""
Get the tuning parameters to use when calling the LLM.
Args:
stop: Stop words to use when generating. Model output is cut off at the
first occurrence of any of the stop substrings.
Returns:
The tuning parameters as a JSON string.
"""
_model_kwargs = self.model_kwargs or {}
_stop_sequences = _model_kwargs.get("stop_sequences", [])
_stop_sequences = stop or _stop_sequences
_model_kwargs["stop_sequences"] = ",".join(f'"{x}"' for x in _stop_sequences)
tuning_params_dict = {
k: {"type": type(v).__name__, "value": str(v)}
for k, v in (_model_kwargs.items())
}
tuning_params = json.dumps(tuning_params_dict)
return tuning_params
def _handle_nlp_predict(
self,
sdk: SVEndpointHandler,
prompt: Union[List[str], str],
tuning_params: str,
) -> str:
"""
Perform an NLP prediction using the Sambaverse endpoint handler.
Args:
sdk: The SVEndpointHandler to use for the prediction.
prompt: The prompt to use for the prediction.
tuning_params: The tuning parameters to use for the prediction.
Returns:
The prediction result.
Raises:
ValueError: If the prediction fails.
"""
response = sdk.nlp_predict(
self.sambaverse_api_key, self.sambaverse_model_name, prompt, tuning_params
)
if response["status_code"] != 200:
optional_details = response["details"]
optional_message = response["message"]
raise ValueError(
f"Sambanova /complete call failed with status code "
f"{response['status_code']}. Details: {optional_details}"
f"{response['status_code']}. Message: {optional_message}"
)
return response["data"]["completion"]
def _handle_completion_requests(
self, prompt: Union[List[str], str], stop: Optional[List[str]]
) -> str:
"""
Perform a prediction using the Sambaverse endpoint handler.
Args:
prompt: The prompt to use for the prediction.
stop: stop sequences.
Returns:
The prediction result.
Raises:
ValueError: If the prediction fails.
"""
ss_endpoint = SVEndpointHandler(self.sambaverse_url)
tuning_params = self._get_tuning_params(stop)
return self._handle_nlp_predict(ss_endpoint, prompt, tuning_params)
def _handle_nlp_predict_stream(
self, sdk: SVEndpointHandler, prompt: Union[List[str], str], tuning_params: str
) -> Iterator[GenerationChunk]:
"""
Perform a streaming request to the LLM.
Args:
sdk: The SVEndpointHandler to use for the prediction.
prompt: The prompt to use for the prediction.
tuning_params: The tuning parameters to use for the prediction.
Returns:
An iterator of GenerationChunks.
"""
for chunk in sdk.nlp_predict_stream(
self.sambaverse_api_key, self.sambaverse_model_name, prompt, tuning_params
):
yield chunk
def _stream(
self,
prompt: Union[List[str], str],
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> Iterator[GenerationChunk]:
"""Stream the Sambaverse's LLM on the given prompt.
Args:
prompt: The prompt to pass into the model.
stop: Optional list of stop words to use when generating.
run_manager: Callback manager for the run.
**kwargs: Additional keyword arguments, passed directly
to the Sambaverse model in the API call.
Returns:
An iterator of GenerationChunks.
"""
ss_endpoint = SVEndpointHandler(self.sambaverse_url)
tuning_params = self._get_tuning_params(stop)
try:
if self.streaming:
for chunk in self._handle_nlp_predict_stream(
ss_endpoint, prompt, tuning_params
):
if run_manager:
run_manager.on_llm_new_token(chunk.text)
yield chunk
else:
return
except Exception as e:
# Handle any errors raised by the inference endpoint
raise ValueError(f"Error raised by the inference endpoint: {e}") from e
def _handle_stream_request(
self,
prompt: Union[List[str], str],
stop: Optional[List[str]],
run_manager: Optional[CallbackManagerForLLMRun],
kwargs: Dict[str, Any],
) -> str:
"""
Perform a streaming request to the LLM.
Args:
prompt: The prompt to generate from.
stop: Stop words to use when generating. Model output is cut off at the
first occurrence of any of the stop substrings.
run_manager: Callback manager for the run.
**kwargs: Additional keyword arguments, passed directly
to the Sambaverse model in the API call.
Returns:
The model output as a string.
"""
completion = ""
for chunk in self._stream(
prompt=prompt, stop=stop, run_manager=run_manager, **kwargs
):
completion += chunk.text
return completion
def _call(
self,
prompt: Union[List[str], str],
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> str:
"""Run the LLM on the given input.
Args:
prompt: The prompt to generate from.
stop: Stop words to use when generating. Model output is cut off at the
first occurrence of any of the stop substrings.
run_manager: Callback manager for the run.
**kwargs: Additional keyword arguments, passed directly
to the Sambaverse model in the API call.
Returns:
The model output as a string.
"""
try:
if self.streaming:
return self._handle_stream_request(prompt, stop, run_manager, kwargs)
return self._handle_completion_requests(prompt, stop)
except Exception as e:
# Handle any errors raised by the inference endpoint
raise ValueError(f"Error raised by the inference endpoint: {e}") from e
class SSEndpointHandler:
"""
SambaNova Systems Interface for SambaStudio model endpoints.
:param str host_url: Base URL of the DaaS API service
"""
API_BASE_PATH = "/api"
def __init__(self, host_url: str):
"""
Initialize the SSEndpointHandler.
:param str host_url: Base URL of the DaaS API service
"""
self.host_url = host_url
self.http_session = requests.Session()
@staticmethod
def _process_response(response: requests.Response) -> Dict:
"""
Processes the API response and returns the resulting dict.
All resulting dicts, regardless of success or failure, will contain the
`status_code` key with the API response status code.
If the API returned an error, the resulting dict will contain the key
`detail` with the error message.
If the API call was successful, the resulting dict will contain the key
`data` with the response data.
:param requests.Response response: the response object to process
:return: the response dict
:rtype: dict
"""
result: Dict[str, Any] = {}
try:
result = response.json()
except Exception as e:
result["detail"] = str(e)
if "status_code" not in result:
result["status_code"] = response.status_code
return result
@staticmethod
def _process_streaming_response(
response: requests.Response,
) -> Generator[GenerationChunk, None, None]:
"""Process the streaming response"""
try:
import sseclient
except ImportError:
raise ValueError(
"could not import sseclient library"
"Please install it with `pip install sseclient-py`."
)
client = sseclient.SSEClient(response)
close_conn = False
for event in client.events():
if event.event == "error_event":
close_conn = True
text = json.dumps({"event": event.event, "data": event.data})
chunk = GenerationChunk(text=text)
yield chunk
if close_conn:
client.close()
def _get_full_url(self, path: str) -> str:
"""
Return the full API URL for a given path.
:param str path: the sub-path
:returns: the full API URL for the sub-path
:rtype: str
"""
return f"{self.host_url}{self.API_BASE_PATH}{path}"
def nlp_predict(
self,
project: str,
endpoint: str,
key: str,
input: Union[List[str], str],
params: Optional[str] = "",
stream: bool = False,
) -> Dict:
"""
NLP predict using inline input string.
:param str project: Project ID in which the endpoint exists
:param str endpoint: Endpoint ID
:param str key: API Key
:param input: Input string or list of input strings
:param str params: Input params string
:returns: Prediction results
:rtype: dict
"""
if isinstance(input, str):
input = [input]
if params:
data = {"inputs": input, "params": json.loads(params)}
else:
data = {"inputs": input}
response = self.http_session.post(
self._get_full_url(f"/predict/nlp/{project}/{endpoint}"),
headers={"key": key},
json=data,
)
return SSEndpointHandler._process_response(response)
def nlp_predict_stream(
self,
project: str,
endpoint: str,
key: str,
input: Union[List[str], str],
params: Optional[str] = "",
) -> Iterator[GenerationChunk]:
"""
NLP predict using inline input string.
:param str project: Project ID in which the endpoint exists
:param str endpoint: Endpoint ID
:param str key: API Key
:param input: Input string or list of input strings
:param str params: Input params string
:returns: Prediction result chunks
:rtype: Iterator[GenerationChunk]
"""
if isinstance(input, str):
input = [input]
if params:
data = {"inputs": input, "params": json.loads(params)}
else:
data = {"inputs": input}
# Streaming output
response = self.http_session.post(
self._get_full_url(f"/predict/nlp/stream/{project}/{endpoint}"),
headers={"key": key},
json=data,
stream=True,
)
for chunk in SSEndpointHandler._process_streaming_response(response):
yield chunk
class SambaStudio(LLM):
"""
SambaStudio large language models.
To use, you should have the environment variables
``SAMBASTUDIO_BASE_URL`` set with your SambaStudio environment URL.
``SAMBASTUDIO_PROJECT_ID`` set with your SambaStudio project ID.
``SAMBASTUDIO_ENDPOINT_ID`` set with your SambaStudio endpoint ID.
``SAMBASTUDIO_API_KEY`` set with your SambaStudio endpoint API key.
https://sambanova.ai/products/enterprise-ai-platform-sambanova-suite
Read extra documentation at https://docs.sambanova.ai/sambastudio/latest/index.html
Example:
.. code-block:: python
from langchain_community.llms.sambanova import SambaStudio
SambaStudio(
base_url="your SambaStudio environment URL",
project_id="your SambaStudio project ID",
endpoint_id="your SambaStudio endpoint ID",
api_key="your SambaStudio endpoint API key",
streaming=False,
model_kwargs={
"do_sample": False,
"max_tokens_to_generate": 1000,
"temperature": 0.7,
"top_p": 1.0,
"repetition_penalty": 1,
"top_k": 50,
},
)
"""
base_url: str = ""
"""Base url to use"""
project_id: str = ""
"""Project id on sambastudio for model"""
endpoint_id: str = ""
"""endpoint id on sambastudio for model"""
api_key: str = ""
"""sambastudio api key"""
model_kwargs: Optional[dict] = None
"""Key word arguments to pass to the model."""
streaming: Optional[bool] = False
"""Streaming flag to get streamed response."""
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
@classmethod
def is_lc_serializable(cls) -> bool:
return True
@property
def _identifying_params(self) -> Dict[str, Any]:
"""Get the identifying parameters."""
return {**{"model_kwargs": self.model_kwargs}}
@property
def _llm_type(self) -> str:
"""Return type of llm."""
return "Sambastudio LLM"
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that api key and python package exists in environment."""
values["base_url"] = get_from_dict_or_env(
values, "sambastudio_base_url", "SAMBASTUDIO_BASE_URL"
)
values["project_id"] = get_from_dict_or_env(
values, "sambastudio_project_id", "SAMBASTUDIO_PROJECT_ID"
)
values["endpoint_id"] = get_from_dict_or_env(
values, "sambastudio_endpoint_id", "SAMBASTUDIO_ENDPOINT_ID"
)
values["api_key"] = get_from_dict_or_env(
values, "sambastudio_api_key", "SAMBASTUDIO_API_KEY"
)
return values
def _get_tuning_params(self, stop: Optional[List[str]]) -> str:
"""
Get the tuning parameters to use when calling the LLM.
Args:
stop: Stop words to use when generating. Model output is cut off at the
first occurrence of any of the stop substrings.
Returns:
The tuning parameters as a JSON string.
"""
_model_kwargs = self.model_kwargs or {}
_stop_sequences = _model_kwargs.get("stop_sequences", [])
_stop_sequences = stop or _stop_sequences
# _model_kwargs['stop_sequences'] = ','.join(
# f"'{x}'" for x in _stop_sequences)
tuning_params_dict = {
k: {"type": type(v).__name__, "value": str(v)}
for k, v in (_model_kwargs.items())
}
tuning_params = json.dumps(tuning_params_dict)
return tuning_params
def _handle_nlp_predict(
self, sdk: SSEndpointHandler, prompt: Union[List[str], str], tuning_params: str
) -> str:
"""
Perform an NLP prediction using the SambaStudio endpoint handler.
Args:
sdk: The SSEndpointHandler to use for the prediction.
prompt: The prompt to use for the prediction.
tuning_params: The tuning parameters to use for the prediction.
Returns:
The prediction result.
Raises:
ValueError: If the prediction fails.
"""
response = sdk.nlp_predict(
self.project_id, self.endpoint_id, self.api_key, prompt, tuning_params
)
if response["status_code"] != 200:
optional_detail = response["detail"]
raise ValueError(
f"Sambanova /complete call failed with status code "
f"{response['status_code']}. Details: {optional_detail}"
)
return response["data"][0]["completion"]
def _handle_completion_requests(
self, prompt: Union[List[str], str], stop: Optional[List[str]]
) -> str:
"""
Perform a prediction using the SambaStudio endpoint handler.
Args:
prompt: The prompt to use for the prediction.
stop: stop sequences.
Returns:
The prediction result.
Raises:
ValueError: If the prediction fails.
"""
ss_endpoint = SSEndpointHandler(self.base_url)
tuning_params = self._get_tuning_params(stop)
return self._handle_nlp_predict(ss_endpoint, prompt, tuning_params)
def _handle_nlp_predict_stream(
self, sdk: SSEndpointHandler, prompt: Union[List[str], str], tuning_params: str
) -> Iterator[GenerationChunk]:
"""
Perform a streaming request to the LLM.
Args:
sdk: The SSEndpointHandler to use for the prediction.
prompt: The prompt to use for the prediction.
tuning_params: The tuning parameters to use for the prediction.
Returns:
An iterator of GenerationChunks.
"""
for chunk in sdk.nlp_predict_stream(
self.project_id, self.endpoint_id, self.api_key, prompt, tuning_params
):
yield chunk
def _stream(
self,
prompt: Union[List[str], str],
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> Iterator[GenerationChunk]:
"""Call out to Sambanova's complete endpoint.
Args:
prompt: The prompt to pass into the model.
stop: Optional list of stop words to use when generating.
Returns:
An iterator of GenerationChunks.
"""
ss_endpoint = SSEndpointHandler(self.base_url)
tuning_params = self._get_tuning_params(stop)
try:
if self.streaming:
for chunk in self._handle_nlp_predict_stream(
ss_endpoint, prompt, tuning_params
):
if run_manager:
run_manager.on_llm_new_token(chunk.text)
yield chunk
else:
return
except Exception as e:
# Handle any errors raised by the inference endpoint
raise ValueError(f"Error raised by the inference endpoint: {e}") from e
def _handle_stream_request(
self,
prompt: Union[List[str], str],
stop: Optional[List[str]],
run_manager: Optional[CallbackManagerForLLMRun],
kwargs: Dict[str, Any],
) -> str:
"""
Perform a streaming request to the LLM.
Args:
prompt: The prompt to generate from.
stop: Stop words to use when generating. Model output is cut off at the
first occurrence of any of the stop substrings.
run_manager: Callback manager for the run.
**kwargs: Additional keyword arguments, passed directly
to the SambaStudio model in the API call.
Returns:
The model output as a string.
"""
completion = ""
for chunk in self._stream(
prompt=prompt, stop=stop, run_manager=run_manager, **kwargs
):
completion += chunk.text
return completion
def _call(
self,
prompt: Union[List[str], str],
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> str:
"""Call out to Sambanova's complete endpoint.
Args:
prompt: The prompt to pass into the model.
stop: Optional list of stop words to use when generating.
Returns:
The string generated by the model.
"""
if stop is not None:
raise Exception("stop not implemented")
try:
if self.streaming:
return self._handle_stream_request(prompt, stop, run_manager, kwargs)
return self._handle_completion_requests(prompt, stop)
except Exception as e:
# Handle any errors raised by the inference endpoint
raise ValueError(f"Error raised by the inference endpoint: {e}") from e

@ -169,7 +169,9 @@ def get_full_path(path: str) -> str:
or (path in ["unknown", "-", "in-memory"])
):
return path
full_path = pathlib.Path(path).resolve()
full_path = pathlib.Path(path)
if full_path.exists():
full_path = full_path.resolve()
return str(full_path)

@ -0,0 +1,28 @@
"""Test sambanova API wrapper.
In order to run this test, you need a Sambaverse API key,
and a SambaStudio base URL, project ID, endpoint ID, and API key.
You'll then need to set SAMBAVERSE_API_KEY, SAMBASTUDIO_BASE_URL,
SAMBASTUDIO_PROJECT_ID, SAMBASTUDIO_ENDPOINT_ID, and SAMBASTUDIO_API_KEY
environment variables.
"""
from langchain_community.llms.sambanova import SambaStudio, Sambaverse
def test_sambaverse_call() -> None:
"""Test simple non-streaming call to sambaverse."""
llm = Sambaverse(
sambaverse_model_name="Meta/llama-2-7b-chat-hf",
model_kwargs={"select_expert": "llama-2-7b-chat-hf"},
)
output = llm.invoke("What is LangChain")
assert output
assert isinstance(output, str)
def test_sambastudio_call() -> None:
"""Test simple non-streaming call to sambaverse."""
llm = SambaStudio()
output = llm.invoke("What is LangChain")
assert output
assert isinstance(output, str)

@ -8,6 +8,64 @@ from langchain_community.document_loaders.directory import DirectoryLoader
class TestDirectoryLoader:
# Tests that when multithreading is enabled, multiple documents are read successfully.
def test_directory_loader_with_multithreading_enabled(self) -> None:
dir_path = self._get_csv_dir_path()
loader = DirectoryLoader(
dir_path, glob="**/*.csv", loader_cls=CSVLoader, use_multithreading=True
)
expected_docs = [
Document(
page_content="column1: value1",
metadata={
"source": self._get_csv_file_path("test_one_col.csv"),
"row": 0,
},
),
Document(
page_content="column1: value2",
metadata={
"source": self._get_csv_file_path("test_one_col.csv"),
"row": 1,
},
),
Document(
page_content="column1: value3",
metadata={
"source": self._get_csv_file_path("test_one_col.csv"),
"row": 2,
},
),
Document(
page_content="column1: value1\ncolumn2: value2\ncolumn3: value3",
metadata={
"source": self._get_csv_file_path("test_one_row.csv"),
"row": 0,
},
),
Document(
page_content="column1: value1\ncolumn2: value2\ncolumn3: value3",
metadata={
"source": self._get_csv_file_path("test_nominal.csv"),
"row": 0,
},
),
Document(
page_content="column1: value4\ncolumn2: value5\ncolumn3: value6",
metadata={
"source": self._get_csv_file_path("test_nominal.csv"),
"row": 1,
},
),
]
loaded_docs = sorted(loader.load(), key=lambda doc: doc.metadata["source"])
expected_docs = sorted(expected_docs, key=lambda doc: doc.metadata["source"])
for i, doc in enumerate(loaded_docs):
assert doc == expected_docs[i]
# Tests that lazy loading a CSV file with multiple documents is successful.
def test_directory_loader_lazy_load_single_file_multiple_docs(self) -> None:
# Setup

@ -218,7 +218,7 @@ def get_function_first_arg_dict_keys(func: Callable) -> Optional[List[str]]:
visitor = IsFunctionArgDict()
visitor.visit(tree)
return list(visitor.keys) if visitor.keys else None
except (SyntaxError, TypeError, OSError):
except (SyntaxError, TypeError, OSError, SystemError):
return None
@ -241,7 +241,7 @@ def get_lambda_source(func: Callable) -> Optional[str]:
visitor = GetLambdaSource()
visitor.visit(tree)
return visitor.source if visitor.count == 1 else name
except (SyntaxError, TypeError, OSError):
except (SyntaxError, TypeError, OSError, SystemError):
return name
@ -270,7 +270,7 @@ def get_function_nonlocals(func: Callable) -> List[Any]:
else:
values.append(vv)
return values
except (SyntaxError, TypeError, OSError):
except (SyntaxError, TypeError, OSError, SystemError):
return []

@ -88,6 +88,8 @@ def extract_sub_links(
absolute_path = f"{parsed_url.scheme}:{link}"
else:
absolute_path = urljoin(url, parsed_link.path)
if parsed_link.query:
absolute_path += f"?{parsed_link.query}"
absolute_paths.add(absolute_path)
except Exception as e:
if continue_on_failure:

@ -183,3 +183,27 @@ def test_prevent_outside() -> None:
)
)
assert actual == expected
def test_extract_sub_links_with_query() -> None:
html = (
'<a href="https://foobar.com?query=123">one</a>'
'<a href="/hello?query=456">two</a>'
'<a href="//foobar.com/how/are/you?query=789">three</a>'
'<a href="doing?query=101112"></a>'
)
expected = sorted(
[
"https://foobar.com?query=123",
"https://foobar.com/hello?query=456",
"https://foobar.com/how/are/you?query=789",
"https://foobar.com/hello/doing?query=101112",
]
)
actual = sorted(
extract_sub_links(
html, "https://foobar.com/hello/bill.html", base_url="https://foobar.com"
)
)
assert actual == expected, f"Expected {expected}, but got {actual}"
