From 34284c25d4de4352bede97724fc1ef0bf10460bb Mon Sep 17 00:00:00 2001
From: Bagatur <22008038+baskaryan@users.noreply.github.com>
Date: Mon, 11 Mar 2024 10:50:39 -0700
Subject: [PATCH] docs: turn on link check (#18924)

---
 .../llms/huggingface_pipelines.ipynb          |  2 +-
 .../integrations/platforms/huggingface.mdx    | 55 +++----------------
 docs/docs/integrations/providers/arangodb.mdx |  4 +-
 docs/docs/integrations/providers/neo4j.mdx    |  4 +-
 .../providers/ontotext_graphdb.mdx            |  4 +-
 docs/docs/integrations/providers/sparkllm.mdx |  2 +-
 .../tools/passio_nutrition_ai.ipynb           | 10 ++--
 .../extraction/how_to/handle_files.ipynb      |  4 +-
 docs/docs/use_cases/extraction/index.ipynb    |  2 +-
 docs/docs/use_cases/summarization.ipynb       |  2 +-
 docs/docusaurus.config.js                     |  2 +-
 docs/scripts/copy_templates.py                |  2 +
 docs/scripts/resolve_local_links.py           | 21 +++++++
 docs/vercel_build.sh                          | 12 ++++
 templates/sql-pgvector/README.md              |  4 +-
 15 files changed, 64 insertions(+), 66 deletions(-)
 create mode 100644 docs/scripts/resolve_local_links.py

diff --git a/docs/docs/integrations/llms/huggingface_pipelines.ipynb b/docs/docs/integrations/llms/huggingface_pipelines.ipynb
index 6b2f48c9d7..4c07c06e03 100644
--- a/docs/docs/integrations/llms/huggingface_pipelines.ipynb
+++ b/docs/docs/integrations/llms/huggingface_pipelines.ipynb
@@ -11,7 +11,7 @@
     "\n",
     "The [Hugging Face Model Hub](https://huggingface.co/models) hosts over 120k models, 20k datasets, and 50k demo apps (Spaces), all open source and publicly available, in an online platform where people can easily collaborate and build ML together.\n",
     "\n",
-    "These can be called from LangChain either through this local pipeline wrapper or by calling their hosted inference endpoints through the HuggingFaceHub class. For more information on the hosted pipelines, see the [HuggingFaceHub](./huggingface_hub) notebook."
+    "These can be called from LangChain either through this local pipeline wrapper or by calling their hosted inference endpoints through the HuggingFaceHub class."
    ]
   },
   {
diff --git a/docs/docs/integrations/platforms/huggingface.mdx b/docs/docs/integrations/platforms/huggingface.mdx
index b985a3b956..cf6ae39811 100644
--- a/docs/docs/integrations/platforms/huggingface.mdx
+++ b/docs/docs/integrations/platforms/huggingface.mdx
@@ -2,28 +2,26 @@
 
 All functionality related to the [Hugging Face Platform](https://huggingface.co/).
 
-## LLMs
+## Chat models
 
-### Hugging Face Hub
+### Models from Hugging Face
 
->The [Hugging Face Hub](https://huggingface.co/docs/hub/index) is a platform
-> with over 350k models, 75k datasets, and 150k demo apps (Spaces), all open source
-> and publicly available, in an online platform where people can easily
-> collaborate and build ML together. The Hub works as a central place where anyone
-> can explore, experiment, collaborate, and build technology with Machine Learning.
+We can use the `Hugging Face` LLM classes or directly use the `ChatHuggingFace` class.
 
-To use, we should have the `huggingface_hub` python [package installed](https://huggingface.co/docs/huggingface_hub/installation).
+We need to install several python packages.
 
 ```bash
 pip install huggingface_hub
+pip install transformers
 ```
-
-See a [usage example](/docs/integrations/llms/huggingface_hub).
+See a [usage example](/docs/integrations/chat/huggingface).
 
 ```python
-from langchain_community.llms import HuggingFaceHub
+from langchain_community.chat_models.huggingface import ChatHuggingFace
 ```
 
+## LLMs
+
 ### Hugging Face Local Pipelines
 
 Hugging Face models can be run locally through the `HuggingFacePipeline` class.
@@ -56,41 +54,6 @@ optimum-cli export openvino --model gpt2 ov_model
 
 To apply [weight-only quantization](https://github.com/huggingface/optimum-intel?tab=readme-ov-file#export) when exporting your model.
 
-### Hugging Face TextGen Inference
-
->[Text Generation Inference](https://github.com/huggingface/text-generation-inference) is
-> a Rust, Python and gRPC server for text generation inference. Used in production at
-> [HuggingFace](https://huggingface.co/) to power LLMs api-inference widgets.
-
-We need to install `text_generation` python package.
-
-```bash
-pip install text_generation
-```
-
-See a [usage example](/docs/integrations/llms/huggingface_textgen_inference).
-
-```python
-from langchain_community.llms import HuggingFaceTextGenInference
-```
-
-## Chat models
-
-### Models from Hugging Face
-
-We can use the `Hugging Face` LLM classes or directly use the `ChatHuggingFace` class.
-
-We need to install several python packages.
-
-```bash
-pip install huggingface_hub
-pip install transformers
-```
-See a [usage example](/docs/integrations/chat/huggingface).
-
-```python
-from langchain_community.chat_models.huggingface import ChatHuggingFace
-```
 
 ## Embedding Models
 
diff --git a/docs/docs/integrations/providers/arangodb.mdx b/docs/docs/integrations/providers/arangodb.mdx
index 2ce6d235df..6720685965 100644
--- a/docs/docs/integrations/providers/arangodb.mdx
+++ b/docs/docs/integrations/providers/arangodb.mdx
@@ -15,11 +15,11 @@ pip install python-arango
 
 Connect your `ArangoDB` Database with a chat model to get insights on your data.
 
-See the notebook example [here](/docs/use_cases/graph/graph_arangodb_qa).
+See the notebook example [here](/docs/use_cases/graph/integrations/graph_arangodb_qa).
 
 ```python
 from arango import ArangoClient
 
 from langchain_community.graphs import ArangoGraph
 from langchain.chains import ArangoGraphQAChain
-```
\ No newline at end of file
+```
diff --git a/docs/docs/integrations/providers/neo4j.mdx b/docs/docs/integrations/providers/neo4j.mdx
index 71dc22622e..cbc747c512 100644
--- a/docs/docs/integrations/providers/neo4j.mdx
+++ b/docs/docs/integrations/providers/neo4j.mdx
@@ -35,7 +35,7 @@ from langchain_community.graphs import Neo4jGraph
 from langchain.chains import GraphCypherQAChain
 ```
 
-See a [usage example](/docs/use_cases/graph/graph_cypher_qa)
+See a [usage example](/docs/use_cases/graph/integrations/graph_cypher_qa)
 
 ## Constructing a knowledge graph from text
 
@@ -49,7 +49,7 @@ from langchain_community.graphs import Neo4jGraph
 from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer
 ```
 
-See a [usage example](/docs/use_cases/graph/diffbot_graphtransformer)
+See a [usage example](/docs/use_cases/graph/integrations/diffbot_graphtransformer)
 
 ## Memory
 
diff --git a/docs/docs/integrations/providers/ontotext_graphdb.mdx b/docs/docs/integrations/providers/ontotext_graphdb.mdx
index 1b941e72d9..3502c68d96 100644
--- a/docs/docs/integrations/providers/ontotext_graphdb.mdx
+++ b/docs/docs/integrations/providers/ontotext_graphdb.mdx
@@ -13,9 +13,9 @@ pip install rdflib==7.0.0
 
 Connect your GraphDB Database with a chat model to get insights on your data.
 
-See the notebook example [here](/docs/use_cases/graph/graph_ontotext_graphdb_qa).
+See the notebook example [here](/docs/use_cases/graph/integrations/graph_ontotext_graphdb_qa).
 
 ```python
 from langchain_community.graphs import OntotextGraphDBGraph
 from langchain.chains import OntotextGraphDBQAChain
-```
\ No newline at end of file
+```
diff --git a/docs/docs/integrations/providers/sparkllm.mdx b/docs/docs/integrations/providers/sparkllm.mdx
index aab53960cb..e9d7f94b18 100644
--- a/docs/docs/integrations/providers/sparkllm.mdx
+++ b/docs/docs/integrations/providers/sparkllm.mdx
@@ -5,7 +5,7 @@ It has cross-domain knowledge and language understanding ability by learning a l
 It can understand and perform tasks based on natural dialogue.
 
 ## SparkLLM LLM Model
-An example is available at [example](/docs/integrations/llm/sparkllm).
+An example is available at [example](/docs/integrations/llms/sparkllm).
 
 ## SparkLLM Chat Model
 An example is available at [example](/docs/integrations/chat/sparkllm).
diff --git a/docs/docs/integrations/tools/passio_nutrition_ai.ipynb b/docs/docs/integrations/tools/passio_nutrition_ai.ipynb
index 81f241b247..1c655bcfac 100644
--- a/docs/docs/integrations/tools/passio_nutrition_ai.ipynb
+++ b/docs/docs/integrations/tools/passio_nutrition_ai.ipynb
@@ -11,7 +11,7 @@
     "\n",
     "## Define tools\n",
     "\n",
-    "We first need to create [the Passio NutritionAI tool](/docs/integrations/tools/passio_nutritionai)."
+    "We first need to create [the Passio NutritionAI tool](/docs/integrations/tools/passio_nutrition_ai)."
    ]
   },
   {
@@ -19,7 +19,7 @@
    "id": "c335d1bf",
    "metadata": {},
    "source": [
-    "### [Passio Nutrition AI](/docs/integrations/tools/passio_nutritionai-agent)\n",
+    "### [Passio Nutrition AI](/docs/integrations/tools/passio_nutrition_ai)\n",
     "\n",
     "We have a built-in tool in LangChain to easily use Passio NutritionAI to find food nutrition facts.\n",
     "Note that this requires an API key - they have a free tier.\n",
@@ -2098,7 +2098,7 @@
    "source": [
     "## Create the agent\n",
     "\n",
-    "Now that we have defined the tools, we can create the agent. We will be using an OpenAI Functions agent - for more information on this type of agent, as well as other options, see [this guide](./agent_types)\n",
+    "Now that we have defined the tools, we can create the agent. We will be using an OpenAI Functions agent - for more information on this type of agent, as well as other options, see [this guide](/docs/modules/agents/agent_types/)\n",
     "\n",
     "First, we choose the LLM we want to be guiding the agent."
    ]
@@ -2156,7 +2156,7 @@
    "id": "f8014c9d",
    "metadata": {},
    "source": [
-    "Now, we can initalize the agent with the LLM, the prompt, and the tools. The agent is responsible for taking in input and deciding what actions to take. Crucially, the Agent does not execute those actions - that is done by the AgentExecutor (next step). For more information about how to think about these components, see our [conceptual guide](./concepts)"
+    "Now, we can initialize the agent with the LLM, the prompt, and the tools. The agent is responsible for taking in input and deciding what actions to take. Crucially, the Agent does not execute those actions - that is done by the AgentExecutor (next step). For more information about how to think about these components, see our [conceptual guide](/docs/modules/agents/concepts)"
    ]
   },
   {
@@ -2176,7 +2176,7 @@
    "id": "1a58c9f8",
    "metadata": {},
    "source": [
-    "Finally, we combine the agent (the brains) with the tools inside the AgentExecutor (which will repeatedly call the agent and execute tools). For more information about how to think about these components, see our [conceptual guide](./concepts)"
+    "Finally, we combine the agent (the brains) with the tools inside the AgentExecutor (which will repeatedly call the agent and execute tools). For more information about how to think about these components, see our [conceptual guide](/docs/modules/agents/concepts)"
    ]
   },
   {
diff --git a/docs/docs/use_cases/extraction/how_to/handle_files.ipynb b/docs/docs/use_cases/extraction/how_to/handle_files.ipynb
index eed1eb16ac..b94c7e6030 100644
--- a/docs/docs/use_cases/extraction/how_to/handle_files.ipynb
+++ b/docs/docs/use_cases/extraction/how_to/handle_files.ipynb
@@ -19,13 +19,13 @@
    "source": [
     "Besides raw text data, you may wish to extract information from other file types such as PowerPoint presentations or PDFs.\n",
     "\n",
-    "You can use LangChain [document loaders](/modules/data_connection/document_loaders/) to parse files into a text format that can be fed into LLMs.\n",
+    "You can use LangChain [document loaders](/docs/modules/data_connection/document_loaders/) to parse files into a text format that can be fed into LLMs.\n",
     "\n",
     "LangChain features a large number of [document loader integrations](/docs/integrations/document_loaders).\n",
     "\n",
     "## MIME type based parsing\n",
     "\n",
-    "For basic parsing exmaples take a look [at document loaders](/modules/data_connection/document_loaders/).\n",
+    "For basic parsing examples take a look [at document loaders](/docs/modules/data_connection/document_loaders/).\n",
     "\n",
     "Here, we'll be looking at mime-type based parsing which is often useful for extraction based applications if you're writing server code that accepts user uploaded files.\n",
     "\n",
diff --git a/docs/docs/use_cases/extraction/index.ipynb b/docs/docs/use_cases/extraction/index.ipynb
index 48f141e144..42a66aac76 100644
--- a/docs/docs/use_cases/extraction/index.ipynb
+++ b/docs/docs/use_cases/extraction/index.ipynb
@@ -63,7 +63,7 @@
     "## Other Resources\n",
     "\n",
     "* The [output parser](/docs/modules/model_io/output_parsers/) documentation includes various parser examples for specific types (e.g., lists, datetime, enum, etc).\n",
-    "* LangChain [document loaders](/modules/data_connection/document_loaders/) to load content from files. Please see list of [integrations](/docs/integrations/document_loaders).\n",
+    "* LangChain [document loaders](/docs/modules/data_connection/document_loaders/) to load content from files. Please see list of [integrations](/docs/integrations/document_loaders).\n",
     "* The experimental [Anthropic function calling](https://python.langchain.com/docs/integrations/chat/anthropic_functions) support provides similar functionality to Anthropic chat models.\n",
     "* [LlamaCPP](https://python.langchain.com/docs/integrations/llms/llamacpp#grammars) natively supports constrained decoding using custom grammars, making it easy to output structured content using local LLMs \n",
     "* [JSONFormer](/docs/integrations/llms/jsonformer_experimental) offers another way for structured decoding of a subset of the JSON Schema.\n",
diff --git a/docs/docs/use_cases/summarization.ipynb b/docs/docs/use_cases/summarization.ipynb
index f22e56cfb7..bf45545875 100644
--- a/docs/docs/use_cases/summarization.ipynb
+++ b/docs/docs/use_cases/summarization.ipynb
@@ -183,7 +183,7 @@
     "* 16k token OpenAI `gpt-3.5-turbo-1106` \n",
     "* 100k token Anthropic [Claude-2](https://www.anthropic.com/index/claude-2)\n",
     "\n",
-    "We can also supply `chain_type=\"map_reduce\"` or `chain_type=\"refine\"` (read more [here](/docs/modules/chains/document/refine))."
+    "We can also supply `chain_type=\"map_reduce\"` or `chain_type=\"refine\"`."
    ]
   },
   {
diff --git a/docs/docusaurus.config.js b/docs/docusaurus.config.js
index 7e66f4a193..c26bef5b4b 100644
--- a/docs/docusaurus.config.js
+++ b/docs/docusaurus.config.js
@@ -20,7 +20,7 @@ const config = {
   // For GitHub pages deployment, it is often '/<projectName>/'
   baseUrl: "/",
 
-  onBrokenLinks: "warn",
+  onBrokenLinks: "throw",
   onBrokenMarkdownLinks: "throw",
 
   themes: ["@docusaurus/theme-mermaid"],
diff --git a/docs/scripts/copy_templates.py b/docs/scripts/copy_templates.py
index 21b0c7a4f3..b397c6d1d7 100644
--- a/docs/scripts/copy_templates.py
+++ b/docs/scripts/copy_templates.py
@@ -28,7 +28,9 @@ sidebar_class_name: hidden
 TEMPLATES_INDEX_DESTINATION = DOCS_TEMPLATES_DIR / "index.md"
 with open(TEMPLATES_INDEX_DESTINATION, "r") as f:
     content = f.read()
+    # replace relative links
     content = re.sub("\]\(\.\.\/", "](/docs/templates/", content)
+
 with open(TEMPLATES_INDEX_DESTINATION, "w") as f:
     f.write(sidebar_hidden + content)
 
diff --git a/docs/scripts/resolve_local_links.py b/docs/scripts/resolve_local_links.py
new file mode 100644
index 0000000000..1a329cdc66
--- /dev/null
+++ b/docs/scripts/resolve_local_links.py
@@ -0,0 +1,21 @@
+import os
+import re
+import sys
+from pathlib import Path
+
+DOCS_DIR = Path(os.path.abspath(__file__)).parents[1]
+
+
+def update_links(doc_path, docs_link):
+    with open(DOCS_DIR / doc_path, "r") as f:
+        content = f.read()
+
+    # replace relative links
+    content = re.sub(r"\]\(\./", f"]({docs_link}", content)
+
+    with open(DOCS_DIR / doc_path, "w") as f:
+        f.write(content)
+
+
+if __name__ == "__main__":
+    update_links(sys.argv[1], sys.argv[2])
diff --git a/docs/vercel_build.sh b/docs/vercel_build.sh
index 830de694e7..8f1f075836 100755
--- a/docs/vercel_build.sh
+++ b/docs/vercel_build.sh
@@ -10,15 +10,27 @@ tar -xzf quarto-1.3.450-linux-amd64.tar.gz
 export PATH=$PATH:$(pwd)/quarto-1.3.450/bin/
 
+# setup python env
 python3.8 -m venv .venv
 source .venv/bin/activate
 python3.8 -m pip install --upgrade pip
 python3.8 -m pip install -r vercel_requirements.txt
+
+# autogenerate integrations tables
 python3.8 scripts/model_feat_table.py
+
+# copy in external files
 mkdir docs/templates
 cp ../templates/docs/INDEX.md docs/templates/index.md
 python3.8 scripts/copy_templates.py
+
 cp ../cookbook/README.md src/pages/cookbook.mdx
+
 wget -q https://raw.githubusercontent.com/langchain-ai/langserve/main/README.md -O docs/langserve.md
+python3.8 scripts/resolve_local_links.py docs/langserve.md https://github.com/langchain-ai/langserve/tree/main/
+
 wget -q https://raw.githubusercontent.com/langchain-ai/langgraph/main/README.md -O docs/langgraph.md
+python3.8 scripts/resolve_local_links.py docs/langgraph.md https://github.com/langchain-ai/langgraph/tree/main/
+
+# render
 quarto render docs/
 
diff --git a/templates/sql-pgvector/README.md b/templates/sql-pgvector/README.md
index ac5eef762d..d454e6c14d 100644
--- a/templates/sql-pgvector/README.md
+++ b/templates/sql-pgvector/README.md
@@ -40,7 +40,7 @@ Apart from having `pgvector` extension enabled, you will need to do some setup b
 
 In order to run RAG over your postgreSQL database you will need to generate the embeddings for the specific columns you want.
 
-This process is covered in the [RAG empowered SQL cookbook](cookbook/retrieval_in_sql.ipynb), but the overall approach consist of:
+This process is covered in the [RAG empowered SQL cookbook](https://github.com/langchain-ai/langchain/blob/master/cookbook/retrieval_in_sql.ipynb), but the overall approach consists of:
 1. Querying for unique values in the column
 2. Generating embeddings for those values
 3. Store the embeddings in a separate column or in an auxiliary table.
@@ -102,4 +102,4 @@ We can access the template from code with:
 from langserve.client import RemoteRunnable
 
 runnable = RemoteRunnable("http://localhost:8000/sql-pgvector")
-```
\ No newline at end of file
+```
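
For reference, a minimal sketch of the link rewriting the new `docs/scripts/resolve_local_links.py` performs (the substitution pattern and the langserve base URL come from the patch above; the sample Markdown string is purely illustrative):

```python
import re

def resolve(content: str, docs_link: str) -> str:
    # Same substitution as update_links(): rewrite "](./" relative Markdown
    # links into absolute links rooted at the given base URL.
    return re.sub(r"\]\(\./", f"]({docs_link}", content)

sample = "See the [examples](./examples) directory."  # illustrative input
print(resolve(sample, "https://github.com/langchain-ai/langserve/tree/main/"))
# -> See the [examples](https://github.com/langchain-ai/langserve/tree/main/examples) directory.
```

This is why `vercel_build.sh` passes a repository `tree/main/` URL as the script's second argument: once a README is copied out of its home repo, its `./`-style links would otherwise dangle, and with `onBrokenLinks: "throw"` the Docusaurus build now fails on broken links instead of merely warning.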