langchain/docs/extras/integrations/text_embedding/nlp_cloud.ipynb
2023-07-23 23:23:16 -07:00

{
"cells": [
{
"cell_type": "markdown",
"id": "6802946f",
"metadata": {},
"source": [
"# NLP Cloud\n",
"\n",
"NLP Cloud is an artificial intelligence platform that lets you use advanced AI engines and train your own engines on your own data.\n",
"\n",
"The [embeddings](https://docs.nlpcloud.com/#embeddings) endpoint offers several models:\n",
"\n",
"* `paraphrase-multilingual-mpnet-base-v2`: Paraphrase Multilingual MPNet Base V2 is a fast model based on Sentence Transformers that is well suited to extracting embeddings in more than 50 languages (see the NLP Cloud documentation for the full list).\n",
"\n",
"* `gpt-j`: GPT-J returns advanced embeddings. It may return better results than models based on Sentence Transformers (see above), but it is also much slower.\n",
"\n",
"* `dolphin`: Dolphin returns advanced embeddings. It may return better results than models based on Sentence Transformers (see above), but it is also much slower. It natively understands the following languages: Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, French, German, Hungarian, Italian, Japanese, Polish, Portuguese, Romanian, Russian, Serbian, Slovenian, Spanish, Swedish, and Ukrainian."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "490d7923",
"metadata": {},
"outputs": [],
"source": [
"! pip install nlpcloud"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "6a39ed4b",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import NLPCloudEmbeddings"
]
},
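{
"cell_type": "code",
"execution_count": 2,
"id": "c105d8cd",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Replace the placeholder with your own NLP Cloud API key.\n",
"os.environ[\"NLPCLOUD_API_KEY\"] = \"xxx\"\n",
"\n",
"# The API key is read from the NLPCLOUD_API_KEY environment variable.\n",
"nlpcloud_embd = NLPCloudEmbeddings()"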
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "cca84023",
"metadata": {},
"outputs": [],
"source": [
"text = \"This is a test document.\""
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "26868d0f",
"metadata": {},
"outputs": [],
"source": [
"# Embed a single query string; the result is one embedding vector.\n",
"query_result = nlpcloud_embd.embed_query(text)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "0c171c2f",
"metadata": {},
"outputs": [],
"source": [
"# Embed a list of documents; the result holds one embedding vector per document.\n",
"doc_result = nlpcloud_embd.embed_documents([text])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}