From 34284c25d4de4352bede97724fc1ef0bf10460bb Mon Sep 17 00:00:00 2001
From: Bagatur <22008038+baskaryan@users.noreply.github.com>
Date: Mon, 11 Mar 2024 10:50:39 -0700
Subject: [PATCH] docs: turn on link check (#18924)

---
 .../llms/huggingface_pipelines.ipynb          |  2 +-
 .../integrations/platforms/huggingface.mdx    | 55 +++----------------
 docs/docs/integrations/providers/arangodb.mdx |  4 +-
 docs/docs/integrations/providers/neo4j.mdx    |  4 +-
 .../providers/ontotext_graphdb.mdx            |  4 +-
 docs/docs/integrations/providers/sparkllm.mdx |  2 +-
 .../tools/passio_nutrition_ai.ipynb           | 10 ++--
 .../extraction/how_to/handle_files.ipynb      |  4 +-
 docs/docs/use_cases/extraction/index.ipynb    |  2 +-
 docs/docs/use_cases/summarization.ipynb       |  2 +-
 docs/docusaurus.config.js                     |  2 +-
 docs/scripts/copy_templates.py                |  2 +
 docs/scripts/resolve_local_links.py           | 21 +++++++
 docs/vercel_build.sh                          | 12 ++++
 templates/sql-pgvector/README.md              |  4 +-
 15 files changed, 64 insertions(+), 66 deletions(-)
 create mode 100644 docs/scripts/resolve_local_links.py

diff --git a/docs/docs/integrations/llms/huggingface_pipelines.ipynb b/docs/docs/integrations/llms/huggingface_pipelines.ipynb
index 6b2f48c9d7..4c07c06e03 100644
--- a/docs/docs/integrations/llms/huggingface_pipelines.ipynb
+++ b/docs/docs/integrations/llms/huggingface_pipelines.ipynb
@@ -11,7 +11,7 @@
     "\n",
     "The [Hugging Face Model Hub](https://huggingface.co/models) hosts over 120k models, 20k datasets, and 50k demo apps (Spaces), all open source and publicly available, in an online platform where people can easily collaborate and build ML together.\n",
     "\n",
-    "These can be called from LangChain either through this local pipeline wrapper or by calling their hosted inference endpoints through the HuggingFaceHub class. For more information on the hosted pipelines, see the [HuggingFaceHub](./huggingface_hub) notebook."
+    "These can be called from LangChain either through this local pipeline wrapper or by calling their hosted inference endpoints through the HuggingFaceHub class."
    ]
   },
   {
diff --git a/docs/docs/integrations/platforms/huggingface.mdx b/docs/docs/integrations/platforms/huggingface.mdx
index b985a3b956..cf6ae39811 100644
--- a/docs/docs/integrations/platforms/huggingface.mdx
+++ b/docs/docs/integrations/platforms/huggingface.mdx
@@ -2,28 +2,26 @@
 
 All functionality related to the [Hugging Face Platform](https://huggingface.co/).
 
-## LLMs
+## Chat models
 
-### Hugging Face Hub
+### Models from Hugging Face
 
->The [Hugging Face Hub](https://huggingface.co/docs/hub/index) is a platform
-> with over 350k models, 75k datasets, and 150k demo apps (Spaces), all open source
-> and publicly available, in an online platform where people can easily
-> collaborate and build ML together. The Hub works as a central place where anyone
-> can explore, experiment, collaborate, and build technology with Machine Learning.
+We can use the `Hugging Face` LLM classes or directly use the `ChatHuggingFace` class.
 
-To use, we should have the `huggingface_hub` python [package installed](https://huggingface.co/docs/huggingface_hub/installation).
+We need to install several python packages.
 
 ```bash
 pip install huggingface_hub
+pip install transformers
 ```
-
-See a [usage example](/docs/integrations/llms/huggingface_hub).
+See a [usage example](/docs/integrations/chat/huggingface).
 
 ```python
-from langchain_community.llms import HuggingFaceHub
+from langchain_community.chat_models.huggingface import ChatHuggingFace
 ```
 
+## LLMs
+
 ### Hugging Face Local Pipelines
 
 Hugging Face models can be run locally through the `HuggingFacePipeline` class.
@@ -56,41 +54,6 @@ optimum-cli export openvino --model gpt2 ov_model
 
 To apply [weight-only quantization](https://github.com/huggingface/optimum-intel?tab=readme-ov-file#export) when exporting your model.
 
-### Hugging Face TextGen Inference
-
->[Text Generation Inference](https://github.com/huggingface/text-generation-inference) is
-> a Rust, Python and gRPC server for text generation inference. Used in production at
-> [HuggingFace](https://huggingface.co/) to power LLMs api-inference widgets.
-
-We need to install `text_generation` python package.
-
-```bash
-pip install text_generation
-```
-
-See a [usage example](/docs/integrations/llms/huggingface_textgen_inference).
-
-```python
-from langchain_community.llms import HuggingFaceTextGenInference
-```
-
-## Chat models
-
-### Models from Hugging Face
-
-We can use the `Hugging Face` LLM classes or directly use the `ChatHuggingFace` class.
-
-We need to install several python packages.
-
-```bash
-pip install huggingface_hub
-pip install transformers
-```
-See a [usage example](/docs/integrations/chat/huggingface).
-
-```python
-from langchain_community.chat_models.huggingface import ChatHuggingFace
-```
 
 ## Embedding Models
 
diff --git a/docs/docs/integrations/providers/arangodb.mdx b/docs/docs/integrations/providers/arangodb.mdx
index 2ce6d235df..6720685965 100644
--- a/docs/docs/integrations/providers/arangodb.mdx
+++ b/docs/docs/integrations/providers/arangodb.mdx
@@ -15,11 +15,11 @@ pip install python-arango
 
 Connect your `ArangoDB` Database with a chat model to get insights on your data.
 
-See the notebook example [here](/docs/use_cases/graph/graph_arangodb_qa).
+See the notebook example [here](/docs/use_cases/graph/integrations/graph_arangodb_qa).
 
 ```python
 from arango import ArangoClient
 
 from langchain_community.graphs import ArangoGraph
 from langchain.chains import ArangoGraphQAChain
-```
\ No newline at end of file
+```
diff --git a/docs/docs/integrations/providers/neo4j.mdx b/docs/docs/integrations/providers/neo4j.mdx
index 71dc22622e..cbc747c512 100644
--- a/docs/docs/integrations/providers/neo4j.mdx
+++ b/docs/docs/integrations/providers/neo4j.mdx
@@ -35,7 +35,7 @@ from langchain_community.graphs import Neo4jGraph
 from langchain.chains import GraphCypherQAChain
 ```
 
-See a [usage example](/docs/use_cases/graph/graph_cypher_qa)
+See a [usage example](/docs/use_cases/graph/integrations/graph_cypher_qa)
 
 ## Constructing a knowledge graph from text
 
@@ -49,7 +49,7 @@ from langchain_community.graphs import Neo4jGraph
 from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer
 ```
 
-See a [usage example](/docs/use_cases/graph/diffbot_graphtransformer)
+See a [usage example](/docs/use_cases/graph/integrations/diffbot_graphtransformer)
 
 ## Memory
 
diff --git a/docs/docs/integrations/providers/ontotext_graphdb.mdx b/docs/docs/integrations/providers/ontotext_graphdb.mdx
index 1b941e72d9..3502c68d96 100644
--- a/docs/docs/integrations/providers/ontotext_graphdb.mdx
+++ b/docs/docs/integrations/providers/ontotext_graphdb.mdx
@@ -13,9 +13,9 @@ pip install rdflib==7.0.0
 
 Connect your GraphDB Database with a chat model to get insights on your data.
 
-See the notebook example [here](/docs/use_cases/graph/graph_ontotext_graphdb_qa).
+See the notebook example [here](/docs/use_cases/graph/integrations/graph_ontotext_graphdb_qa).
 
 ```python
 from langchain_community.graphs import OntotextGraphDBGraph
 from langchain.chains import OntotextGraphDBQAChain
-```
\ No newline at end of file
+```
diff --git a/docs/docs/integrations/providers/sparkllm.mdx b/docs/docs/integrations/providers/sparkllm.mdx
index aab53960cb..e9d7f94b18 100644
--- a/docs/docs/integrations/providers/sparkllm.mdx
+++ b/docs/docs/integrations/providers/sparkllm.mdx
@@ -5,7 +5,7 @@ It has cross-domain knowledge and language understanding ability by learning a l
 It can understand and perform tasks based on natural dialogue.
 
 ## SparkLLM LLM Model
-An example is available at [example](/docs/integrations/llm/sparkllm).
+An example is available at [example](/docs/integrations/llms/sparkllm).
 
 ## SparkLLM Chat Model
 An example is available at [example](/docs/integrations/chat/sparkllm).
diff --git a/docs/docs/integrations/tools/passio_nutrition_ai.ipynb b/docs/docs/integrations/tools/passio_nutrition_ai.ipynb
index 81f241b247..1c655bcfac 100644
--- a/docs/docs/integrations/tools/passio_nutrition_ai.ipynb
+++ b/docs/docs/integrations/tools/passio_nutrition_ai.ipynb
@@ -11,7 +11,7 @@
     "\n",
     "## Define tools\n",
     "\n",
-    "We first need to create [the Passio NutritionAI tool](/docs/integrations/tools/passio_nutritionai)."
+    "We first need to create [the Passio NutritionAI tool](/docs/integrations/tools/passio_nutrition_ai)."
    ]
   },
   {
@@ -19,7 +19,7 @@
    "id": "c335d1bf",
    "metadata": {},
    "source": [
-    "### [Passio Nutrition AI](/docs/integrations/tools/passio_nutritionai-agent)\n",
+    "### [Passio Nutrition AI](/docs/integrations/tools/passio_nutrition_ai)\n",
     "\n",
     "We have a built-in tool in LangChain to easily use Passio NutritionAI to find food nutrition facts.\n",
     "Note that this requires an API key - they have a free tier.\n",
@@ -2098,7 +2098,7 @@
    "source": [
     "## Create the agent\n",
     "\n",
-    "Now that we have defined the tools, we can create the agent. We will be using an OpenAI Functions agent - for more information on this type of agent, as well as other options, see [this guide](./agent_types)\n",
+    "Now that we have defined the tools, we can create the agent. We will be using an OpenAI Functions agent - for more information on this type of agent, as well as other options, see [this guide](/docs/modules/agents/agent_types/)\n",
     "\n",
     "First, we choose the LLM we want to be guiding the agent."
    ]
@@ -2156,7 +2156,7 @@
    "id": "f8014c9d",
    "metadata": {},
    "source": [
-    "Now, we can initalize the agent with the LLM, the prompt, and the tools. The agent is responsible for taking in input and deciding what actions to take. Crucially, the Agent does not execute those actions - that is done by the AgentExecutor (next step). For more information about how to think about these components, see our [conceptual guide](./concepts)"
+    "Now, we can initialize the agent with the LLM, the prompt, and the tools. The agent is responsible for taking in input and deciding what actions to take. Crucially, the Agent does not execute those actions - that is done by the AgentExecutor (next step). For more information about how to think about these components, see our [conceptual guide](/docs/modules/agents/concepts)"
    ]
   },
   {
@@ -2176,7 +2176,7 @@
    "id": "1a58c9f8",
    "metadata": {},
    "source": [
-    "Finally, we combine the agent (the brains) with the tools inside the AgentExecutor (which will repeatedly call the agent and execute tools). For more information about how to think about these components, see our [conceptual guide](./concepts)"
+    "Finally, we combine the agent (the brains) with the tools inside the AgentExecutor (which will repeatedly call the agent and execute tools). For more information about how to think about these components, see our [conceptual guide](/docs/modules/agents/concepts)"
    ]
   },
   {
diff --git a/docs/docs/use_cases/extraction/how_to/handle_files.ipynb b/docs/docs/use_cases/extraction/how_to/handle_files.ipynb
index eed1eb16ac..b94c7e6030 100644
--- a/docs/docs/use_cases/extraction/how_to/handle_files.ipynb
+++ b/docs/docs/use_cases/extraction/how_to/handle_files.ipynb
@@ -19,13 +19,13 @@
    "source": [
     "Besides raw text data, you may wish to extract information from other file types such as PowerPoint presentations or PDFs.\n",
     "\n",
-    "You can use LangChain [document loaders](/modules/data_connection/document_loaders/) to parse files into a text format that can be fed into LLMs.\n",
+    "You can use LangChain [document loaders](/docs/modules/data_connection/document_loaders/) to parse files into a text format that can be fed into LLMs.\n",
     "\n",
     "LangChain features a large number of [document loader integrations](/docs/integrations/document_loaders).\n",
     "\n",
     "## MIME type based parsing\n",
     "\n",
-    "For basic parsing exmaples take a look [at document loaders](/modules/data_connection/document_loaders/).\n",
+    "For basic parsing examples take a look [at document loaders](/docs/modules/data_connection/document_loaders/).\n",
     "\n",
     "Here, we'll be looking at mime-type based parsing which is often useful for extraction based applications if you're writing server code that accepts user uploaded files.\n",
     "\n",
diff --git a/docs/docs/use_cases/extraction/index.ipynb b/docs/docs/use_cases/extraction/index.ipynb
index 48f141e144..42a66aac76 100644
--- a/docs/docs/use_cases/extraction/index.ipynb
+++ b/docs/docs/use_cases/extraction/index.ipynb
@@ -63,7 +63,7 @@
     "## Other Resources\n",
     "\n",
     "* The [output parser](/docs/modules/model_io/output_parsers/) documentation includes various parser examples for specific types (e.g., lists, datetime, enum, etc).\n",
-    "* LangChain [document loaders](/modules/data_connection/document_loaders/) to load content from files. Please see list of [integrations](/docs/integrations/document_loaders).\n",
+    "* LangChain [document loaders](/docs/modules/data_connection/document_loaders/) to load content from files. Please see list of [integrations](/docs/integrations/document_loaders).\n",
     "* The experimental [Anthropic function calling](https://python.langchain.com/docs/integrations/chat/anthropic_functions) support provides similar functionality to Anthropic chat models.\n",
     "* [LlamaCPP](https://python.langchain.com/docs/integrations/llms/llamacpp#grammars) natively supports constrained decoding using custom grammars, making it easy to output structured content using local LLMs \n",
     "* [JSONFormer](/docs/integrations/llms/jsonformer_experimental) offers another way for structured decoding of a subset of the JSON Schema.\n",
diff --git a/docs/docs/use_cases/summarization.ipynb b/docs/docs/use_cases/summarization.ipynb
index f22e56cfb7..bf45545875 100644
--- a/docs/docs/use_cases/summarization.ipynb
+++ b/docs/docs/use_cases/summarization.ipynb
@@ -183,7 +183,7 @@
     "* 16k token OpenAI `gpt-3.5-turbo-1106` \n",
     "* 100k token Anthropic [Claude-2](https://www.anthropic.com/index/claude-2)\n",
     "\n",
-    "We can also supply `chain_type=\"map_reduce\"` or `chain_type=\"refine\"` (read more [here](/docs/modules/chains/document/refine))."
+    "We can also supply `chain_type=\"map_reduce\"` or `chain_type=\"refine\"`."
    ]
   },
   {
diff --git a/docs/docusaurus.config.js b/docs/docusaurus.config.js
index 7e66f4a193..c26bef5b4b 100644
--- a/docs/docusaurus.config.js
+++ b/docs/docusaurus.config.js
@@ -20,7 +20,7 @@ const config = {
   // For GitHub pages deployment, it is often '/<projectName>/'
   baseUrl: "/",
 
-  onBrokenLinks: "warn",
+  onBrokenLinks: "throw",
   onBrokenMarkdownLinks: "throw",
 
   themes: ["@docusaurus/theme-mermaid"],
diff --git a/docs/scripts/copy_templates.py b/docs/scripts/copy_templates.py
index 21b0c7a4f3..b397c6d1d7 100644
--- a/docs/scripts/copy_templates.py
+++ b/docs/scripts/copy_templates.py
@@ -28,7 +28,9 @@ sidebar_class_name: hidden
 TEMPLATES_INDEX_DESTINATION = DOCS_TEMPLATES_DIR / "index.md"
 with open(TEMPLATES_INDEX_DESTINATION, "r") as f:
     content = f.read()
+    # replace relative links
     content = re.sub("\]\(\.\.\/", "](/docs/templates/", content)
+
 with open(TEMPLATES_INDEX_DESTINATION, "w") as f:
     f.write(sidebar_hidden + content)
 
diff --git a/docs/scripts/resolve_local_links.py b/docs/scripts/resolve_local_links.py
new file mode 100644
index 0000000000..1a329cdc66
--- /dev/null
+++ b/docs/scripts/resolve_local_links.py
@@ -0,0 +1,21 @@
+import os
+import re
+import sys
+from pathlib import Path
+
+DOCS_DIR = Path(os.path.abspath(__file__)).parents[1]
+
+
+def update_links(doc_path, docs_link):
+    with open(DOCS_DIR / doc_path, "r") as f:
+        content = f.read()
+
+    # replace relative links
+    content = re.sub(r"\]\(\./", f"]({docs_link}", content)
+
+    with open(DOCS_DIR / doc_path, "w") as f:
+        f.write(content)
+
+
+if __name__ == "__main__":
+    update_links(sys.argv[1], sys.argv[2])
diff --git a/docs/vercel_build.sh b/docs/vercel_build.sh
index 830de694e7..8f1f075836 100755
--- a/docs/vercel_build.sh
+++ b/docs/vercel_build.sh
@@ -10,15 +10,27 @@ tar -xzf quarto-1.3.450-linux-amd64.tar.gz
 export PATH=$PATH:$(pwd)/quarto-1.3.450/bin/
 
+# setup python env
 python3.8 -m venv .venv
 source .venv/bin/activate
 python3.8 -m pip install --upgrade pip
 python3.8 -m pip install -r vercel_requirements.txt
+
+# autogenerate integrations tables
 python3.8 scripts/model_feat_table.py
+
+# copy in external files
 mkdir docs/templates
 cp ../templates/docs/INDEX.md docs/templates/index.md
 python3.8 scripts/copy_templates.py
+
 cp ../cookbook/README.md src/pages/cookbook.mdx
+
 wget -q https://raw.githubusercontent.com/langchain-ai/langserve/main/README.md -O docs/langserve.md
+python3.8 scripts/resolve_local_links.py docs/langserve.md https://github.com/langchain-ai/langserve/tree/main/
+
 wget -q https://raw.githubusercontent.com/langchain-ai/langgraph/main/README.md -O docs/langgraph.md
+python3.8 scripts/resolve_local_links.py docs/langgraph.md https://github.com/langchain-ai/langgraph/tree/main/
+
+# render
 quarto render docs/
 
diff --git a/templates/sql-pgvector/README.md b/templates/sql-pgvector/README.md
index ac5eef762d..d454e6c14d 100644
--- a/templates/sql-pgvector/README.md
+++ b/templates/sql-pgvector/README.md
@@ -40,7 +40,7 @@ Apart from having `pgvector` extension enabled, you will need to do some setup b
 
 In order to run RAG over your postgreSQL database you will need to generate the embeddings for the specific columns you want.
 
-This process is covered in the [RAG empowered SQL cookbook](cookbook/retrieval_in_sql.ipynb), but the overall approach consist of:
+This process is covered in the [RAG empowered SQL cookbook](https://github.com/langchain-ai/langchain/blob/master/cookbook/retrieval_in_sql.ipynb), but the overall approach consists of:
 1. Querying for unique values in the column
 2. Generating embeddings for those values
 3. Store the embeddings in a separate column or in an auxiliary table.
@@ -102,4 +102,4 @@ We can access the template from code with:
 from langserve.client import RemoteRunnable
 
 runnable = RemoteRunnable("http://localhost:8000/sql-pgvector")
-```
\ No newline at end of file
+```
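
For reference, a minimal sketch of the link rewriting the new `docs/scripts/resolve_local_links.py` performs (the substitution pattern and the langserve base URL come from the patch above; the sample Markdown string is purely illustrative):

```python
import re

def resolve(content: str, docs_link: str) -> str:
    # Same substitution as update_links(): rewrite "](./" relative Markdown
    # links into absolute links rooted at the given base URL.
    return re.sub(r"\]\(\./", f"]({docs_link}", content)

sample = "See the [examples](./examples) directory."  # illustrative input
print(resolve(sample, "https://github.com/langchain-ai/langserve/tree/main/"))
# -> See the [examples](https://github.com/langchain-ai/langserve/tree/main/examples) directory.
```

This is why `vercel_build.sh` passes a repository `tree/main/` URL as the script's second argument: once a README is copied out of its home repo, its `./`-style links would otherwise dangle, and with `onBrokenLinks: "throw"` the Docusaurus build now fails on broken links instead of merely warning.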