langchain/docs/extras/integrations/llms/huggingface_hub.ipynb

430 lines
11 KiB
Plaintext
Raw Normal View History

{
"cells": [
{
"cell_type": "markdown",
"id": "959300d4",
"metadata": {},
"source": [
"# Hugging Face Hub\n",
"\n",
">The [Hugging Face Hub](https://huggingface.co/docs/hub/index) is a platform with over 120k models, 20k datasets, and 50k demo apps (Spaces), all open source and publicly available, in an online platform where people can easily collaborate and build ML together.\n",
"\n",
"This example showcases how to connect to the `Hugging Face Hub` and use different models."
]
},
{
"cell_type": "markdown",
"id": "1ddafc6d-7d7c-48fa-838f-0e7f50895ce3",
"metadata": {},
"source": [
"## Installation and Setup"
]
},
{
"cell_type": "markdown",
"id": "4c1b8450-5eaf-4d34-8341-2d785448a1ff",
"metadata": {
"tags": []
},
"source": [
"To use, you should have the ``huggingface_hub`` python [package installed](https://huggingface.co/docs/huggingface_hub/installation)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d772b637-de00-4663-bd77-9bc96d798db2",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!pip install huggingface_hub"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "d597a792-354c-4ca5-b483-5965eec5d63d",
"metadata": {},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
" ········\n"
]
}
],
"source": [
"# get a token: https://huggingface.co/docs/api-inference/quicktour#get-your-api-token\n",
"\n",
"from getpass import getpass\n",
"\n",
"HUGGINGFACEHUB_API_TOKEN = getpass()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "b8c5b88c-e4b8-4d0d-9a35-6e8f106452c2",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"HUGGINGFACEHUB_API_TOKEN\"] = HUGGINGFACEHUB_API_TOKEN"
]
},
{
"cell_type": "markdown",
"id": "84dd44c1-c428-41f3-a911-520281386c94",
"metadata": {},
"source": [
"## Prepare Examples"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3fe7d1d1-241d-426a-acff-e208f1088871",
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import HuggingFaceHub"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "6620f39b-3d32-4840-8931-ff7d2c3e47e8",
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts import PromptTemplate\nfrom langchain.chains import LLMChain"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "44adc1a0-9c0a-4f1e-af5a-fe04222e78d7",
"metadata": {},
"outputs": [],
"source": [
"question = \"Who won the FIFA World Cup in the year 1994? \"\n",
"\n",
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
]
},
{
"cell_type": "markdown",
"id": "ddaa06cf-95ec-48ce-b0ab-d892a7909693",
"metadata": {},
"source": [
"## Examples\n",
"\n",
"Below are some examples of models you can access through the `Hugging Face Hub` integration."
]
},
{
"cell_type": "markdown",
"id": "4c16fded-70d1-42af-8bfa-6ddda9f0bc63",
"metadata": {},
"source": [
"### `Flan`, by `Google`"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "39c7eeac-01c4-486b-9480-e828a9e73e78",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"repo_id = \"google/flan-t5-xxl\" # See https://huggingface.co/models?pipeline_tag=text-generation&sort=downloads for some other options"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "3acf0069",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The FIFA World Cup was held in the year 1994. West Germany won the FIFA World Cup in 1994\n"
]
}
],
"source": [
Fix `make docs_build` and related scripts (#7276) **Description: a description of the change** Fixed `make docs_build` and related scripts which caused errors. There are several changes. First, I made the build of the documentation and the API Reference into two separate commands. This is because it takes less time to build. The commands for documents are `make docs_build`, `make docs_clean`, and `make docs_linkcheck`. The commands for API Reference are `make api_docs_build`, `api_docs_clean`, and `api_docs_linkcheck`. It looked like `docs/.local_build.sh` could be used to build the documentation, so I used that. Since `.local_build.sh` was also building API Rerefence internally, I removed that process. `.local_build.sh` also added some Bash options to stop in error or so. Futher more added `cd "${SCRIPT_DIR}"` at the beginning so that the script will work no matter which directory it is executed in. `docs/api_reference/api_reference.rst` is removed, because which is generated by `docs/api_reference/create_api_rst.py`, and added it to .gitignore. Finally, the description of CONTRIBUTING.md was modified. **Issue: the issue # it fixes (if applicable)** https://github.com/hwchase17/langchain/issues/6413 **Dependencies: any dependencies required for this change** `nbdoc` was missing in group docs so it was added. I installed it with the `poetry add --group docs nbdoc` command. I am concerned if any modifications are needed to poetry.lock. I would greatly appreciate it if you could pay close attention to this file during the review. **Tag maintainer** - General / Misc / if you don't know who to tag: @baskaryan If this PR needs any additional changes, I'll be happy to make them! --------- Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-12 02:05:14 +00:00
"llm = HuggingFaceHub(\n",
" repo_id=repo_id, model_kwargs={\"temperature\": 0.5, \"max_length\": 64}\n",
")\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"\n",
"print(llm_chain.run(question))"
]
},
{
"cell_type": "markdown",
"id": "1a5c97af-89bc-4e59-95c1-223742a9160b",
"metadata": {},
"source": [
"### `Dolly`, by `Databricks`\n",
"\n",
Refactor and update databricks integration page (#5575) # Your PR Title (What it does) <!-- Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. Finally, we'd love to show appreciation for your contribution - if you'd like us to shout you out on Twitter, please also include your handle! --> <!-- Remove if not applicable --> Fixes # (issue) ## Before submitting <!-- If you're adding a new integration, please include: 1. a test for the integration - favor unit tests that does not rely on network access. 2. an example notebook showing its use See contribution guidelines for more information on how to write tests, lint etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: <!-- For a quicker response, figure out the right person to tag with @ @hwchase17 - project lead Tracing / Callbacks - @agola11 Async - @agola11 DataLoaders - @eyurtsev Models - @hwchase17 - @agola11 Agents / Tools / Toolkits - @vowelparrot VectorStores / Retrievers / Memory - @dev2049 -->
2023-06-08 03:45:47 +00:00
"See [Databricks](https://huggingface.co/databricks) organization page for a list of available models."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "521fcd2b-8e38-4920-b407-5c7d330411c9",
"metadata": {},
"outputs": [],
"source": [
"repo_id = \"databricks/dolly-v2-3b\""
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "9907ec3a-fe0c-4543-81c4-d42f9453f16c",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" First of all, the world cup was won by the Germany. Then the Argentina won the world cup in 2022. So, the Argentina won the world cup in 1994.\n",
"\n",
"\n",
"Question: Who\n"
]
}
],
"source": [
Fix `make docs_build` and related scripts (#7276) **Description: a description of the change** Fixed `make docs_build` and related scripts which caused errors. There are several changes. First, I made the build of the documentation and the API Reference into two separate commands. This is because it takes less time to build. The commands for documents are `make docs_build`, `make docs_clean`, and `make docs_linkcheck`. The commands for API Reference are `make api_docs_build`, `api_docs_clean`, and `api_docs_linkcheck`. It looked like `docs/.local_build.sh` could be used to build the documentation, so I used that. Since `.local_build.sh` was also building API Rerefence internally, I removed that process. `.local_build.sh` also added some Bash options to stop in error or so. Futher more added `cd "${SCRIPT_DIR}"` at the beginning so that the script will work no matter which directory it is executed in. `docs/api_reference/api_reference.rst` is removed, because which is generated by `docs/api_reference/create_api_rst.py`, and added it to .gitignore. Finally, the description of CONTRIBUTING.md was modified. **Issue: the issue # it fixes (if applicable)** https://github.com/hwchase17/langchain/issues/6413 **Dependencies: any dependencies required for this change** `nbdoc` was missing in group docs so it was added. I installed it with the `poetry add --group docs nbdoc` command. I am concerned if any modifications are needed to poetry.lock. I would greatly appreciate it if you could pay close attention to this file during the review. **Tag maintainer** - General / Misc / if you don't know who to tag: @baskaryan If this PR needs any additional changes, I'll be happy to make them! --------- Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-12 02:05:14 +00:00
"llm = HuggingFaceHub(\n",
" repo_id=repo_id, model_kwargs={\"temperature\": 0.5, \"max_length\": 64}\n",
")\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"print(llm_chain.run(question))"
]
},
{
"cell_type": "markdown",
"id": "03f6ae52-b5f9-4de6-832c-551cb3fa11ae",
"metadata": {},
"source": [
"### `Camel`, by `Writer`\n",
"\n",
"See [Writer's](https://huggingface.co/Writer) organization page for a list of available models."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "257a091d-750b-4910-ac08-fe1c7b3fd98b",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"repo_id = \"Writer/camel-5b-hf\" # See https://huggingface.co/Writer for other options"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b06f6838-a11a-4d6a-88e3-91fa1747a2b3",
"metadata": {},
"outputs": [],
"source": [
Fix `make docs_build` and related scripts (#7276) **Description: a description of the change** Fixed `make docs_build` and related scripts which caused errors. There are several changes. First, I made the build of the documentation and the API Reference into two separate commands. This is because it takes less time to build. The commands for documents are `make docs_build`, `make docs_clean`, and `make docs_linkcheck`. The commands for API Reference are `make api_docs_build`, `api_docs_clean`, and `api_docs_linkcheck`. It looked like `docs/.local_build.sh` could be used to build the documentation, so I used that. Since `.local_build.sh` was also building API Rerefence internally, I removed that process. `.local_build.sh` also added some Bash options to stop in error or so. Futher more added `cd "${SCRIPT_DIR}"` at the beginning so that the script will work no matter which directory it is executed in. `docs/api_reference/api_reference.rst` is removed, because which is generated by `docs/api_reference/create_api_rst.py`, and added it to .gitignore. Finally, the description of CONTRIBUTING.md was modified. **Issue: the issue # it fixes (if applicable)** https://github.com/hwchase17/langchain/issues/6413 **Dependencies: any dependencies required for this change** `nbdoc` was missing in group docs so it was added. I installed it with the `poetry add --group docs nbdoc` command. I am concerned if any modifications are needed to poetry.lock. I would greatly appreciate it if you could pay close attention to this file during the review. **Tag maintainer** - General / Misc / if you don't know who to tag: @baskaryan If this PR needs any additional changes, I'll be happy to make them! --------- Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-12 02:05:14 +00:00
"llm = HuggingFaceHub(\n",
" repo_id=repo_id, model_kwargs={\"temperature\": 0.5, \"max_length\": 64}\n",
")\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"print(llm_chain.run(question))"
]
},
{
"cell_type": "markdown",
"id": "2bf838eb-1083-402f-b099-b07c452418c8",
"metadata": {},
"source": [
"### `XGen`, by `Salesforce`\n",
"\n",
"See [more information](https://github.com/salesforce/xgen)."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "18c78880-65d7-41d0-9722-18090efb60e9",
"metadata": {},
"outputs": [],
"source": [
"repo_id = \"Salesforce/xgen-7b-8k-base\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1b1150b4-ec30-4674-849e-6a41b085aa2b",
"metadata": {},
"outputs": [],
"source": [
Fix `make docs_build` and related scripts (#7276) **Description: a description of the change** Fixed `make docs_build` and related scripts which caused errors. There are several changes. First, I made the build of the documentation and the API Reference into two separate commands. This is because it takes less time to build. The commands for documents are `make docs_build`, `make docs_clean`, and `make docs_linkcheck`. The commands for API Reference are `make api_docs_build`, `api_docs_clean`, and `api_docs_linkcheck`. It looked like `docs/.local_build.sh` could be used to build the documentation, so I used that. Since `.local_build.sh` was also building API Rerefence internally, I removed that process. `.local_build.sh` also added some Bash options to stop in error or so. Futher more added `cd "${SCRIPT_DIR}"` at the beginning so that the script will work no matter which directory it is executed in. `docs/api_reference/api_reference.rst` is removed, because which is generated by `docs/api_reference/create_api_rst.py`, and added it to .gitignore. Finally, the description of CONTRIBUTING.md was modified. **Issue: the issue # it fixes (if applicable)** https://github.com/hwchase17/langchain/issues/6413 **Dependencies: any dependencies required for this change** `nbdoc` was missing in group docs so it was added. I installed it with the `poetry add --group docs nbdoc` command. I am concerned if any modifications are needed to poetry.lock. I would greatly appreciate it if you could pay close attention to this file during the review. **Tag maintainer** - General / Misc / if you don't know who to tag: @baskaryan If this PR needs any additional changes, I'll be happy to make them! --------- Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-12 02:05:14 +00:00
"llm = HuggingFaceHub(\n",
" repo_id=repo_id, model_kwargs={\"temperature\": 0.5, \"max_length\": 64}\n",
")\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"print(llm_chain.run(question))"
]
},
{
"cell_type": "markdown",
"id": "0aca9f9e-f333-449c-97b2-10d1dbf17e75",
"metadata": {},
"source": [
"### `Falcon`, by `Technology Innovation Institute (TII)`\n",
"\n",
"See [more information](https://huggingface.co/tiiuae/falcon-40b)."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "496b35ac-5ee2-4b68-a6ce-232608f56c03",
"metadata": {},
"outputs": [],
"source": [
"repo_id = \"tiiuae/falcon-40b\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ff2541ad-e394-4179-93c2-7ae9c4ca2a25",
"metadata": {},
"outputs": [],
"source": [
Fix `make docs_build` and related scripts (#7276) **Description: a description of the change** Fixed `make docs_build` and related scripts which caused errors. There are several changes. First, I made the build of the documentation and the API Reference into two separate commands. This is because it takes less time to build. The commands for documents are `make docs_build`, `make docs_clean`, and `make docs_linkcheck`. The commands for API Reference are `make api_docs_build`, `api_docs_clean`, and `api_docs_linkcheck`. It looked like `docs/.local_build.sh` could be used to build the documentation, so I used that. Since `.local_build.sh` was also building API Rerefence internally, I removed that process. `.local_build.sh` also added some Bash options to stop in error or so. Futher more added `cd "${SCRIPT_DIR}"` at the beginning so that the script will work no matter which directory it is executed in. `docs/api_reference/api_reference.rst` is removed, because which is generated by `docs/api_reference/create_api_rst.py`, and added it to .gitignore. Finally, the description of CONTRIBUTING.md was modified. **Issue: the issue # it fixes (if applicable)** https://github.com/hwchase17/langchain/issues/6413 **Dependencies: any dependencies required for this change** `nbdoc` was missing in group docs so it was added. I installed it with the `poetry add --group docs nbdoc` command. I am concerned if any modifications are needed to poetry.lock. I would greatly appreciate it if you could pay close attention to this file during the review. **Tag maintainer** - General / Misc / if you don't know who to tag: @baskaryan If this PR needs any additional changes, I'll be happy to make them! --------- Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-12 02:05:14 +00:00
"llm = HuggingFaceHub(\n",
" repo_id=repo_id, model_kwargs={\"temperature\": 0.5, \"max_length\": 64}\n",
")\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"print(llm_chain.run(question))"
]
},
{
"cell_type": "markdown",
"id": "7e15849b-5561-4bb9-86ec-6412ca10196a",
"metadata": {},
"source": [
"### `InternLM-Chat`, by `Shanghai AI Laboratory`\n",
"\n",
"See [more information](https://huggingface.co/internlm/internlm-7b)."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "3b533461-59f8-406e-907b-000841fa60a7",
"metadata": {},
"outputs": [],
"source": [
"repo_id = \"internlm/internlm-chat-7b\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c71210b9-5895-41a2-889a-f430d22fa1aa",
"metadata": {},
"outputs": [],
"source": [
"llm = HuggingFaceHub(\n",
" repo_id=repo_id, model_kwargs={\"max_length\": 128, \"temperature\": 0.8}\n",
")\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"print(llm_chain.run(question))"
]
},
{
"cell_type": "markdown",
"id": "4f2e5132-1713-42d7-919a-8c313744ce95",
"metadata": {},
"source": [
"### `Qwen`, by `Alibaba Cloud`\n",
"\n",
">`Tongyi Qianwen-7B` (`Qwen-7B`) is a model with a scale of 7 billion parameters in the `Tongyi Qianwen` large model series developed by `Alibaba Cloud`. `Qwen-7B` is a large language model based on Transformer, which is trained on ultra-large-scale pre-training data.\n",
"\n",
"See [more information on HuggingFace](https://huggingface.co/Qwen/Qwen-7B) of on [GitHub](https://github.com/QwenLM/Qwen-7B).\n",
"\n",
"See here a [big example for LangChain integration and Qwen](https://github.com/QwenLM/Qwen-7B/blob/main/examples/langchain_tooluse.ipynb)."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "f598b1ca-77c7-40f1-a83f-c21ea9910c88",
"metadata": {},
"outputs": [],
"source": [
"repo_id = \"Qwen/Qwen-7B\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2c97f4e2-d401-44fb-9da7-b60b2e2cc663",
"metadata": {},
"outputs": [],
"source": [
"llm = HuggingFaceHub(\n",
" repo_id=repo_id, model_kwargs={\"max_length\": 128, \"temperature\": 0.5}\n",
")\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"print(llm_chain.run(question))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1dd67c1e-1efc-4def-bde4-2e5265725303",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}