fix execution_count

pull/892/head
jhills20 6 months ago
parent 6b7062b848
commit 4c54717344

@@ -167,7 +167,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Cluster 0 Theme: All these transactions are related to utilities, IT equipment, and property rates. They are all expenses related to the operation of buildings and offices.\n",
"Cluster 0 Theme: All these transactions are related to operational expenses such as energy (electricity), property rates, IT and computer equipment, and transportation (purchase of an electric van).\n",
"EDF ENERGY, Electricity Oct 2019 3 buildings\n",
" City Of Edinburgh Council, Non Domestic Rates \n",
" EDF, Electricity\n",
@@ -179,7 +179,7 @@
" Computer Centre UK Ltd, Computer equipment\n",
" ARNOLD CLARK, Purchase of an electric van\n",
" ----------------------------------------------------------------------------------------------------\n",
"Cluster 1 Theme: All the transactions are related to educational, cultural, or professional services and resources. They include payments for student bursaries, collection of papers, architectural works, legal deposit services, digital resources, online/print subscriptions, and literary & archival items. The transaction values are also identical across all the transactions.\n",
"Cluster 1 Theme: All the transactions are related to educational, cultural, or professional services and resources. These include student bursaries, collection of papers, architectural works, legal deposit services, digital resources, and online/print subscriptions. The transaction values are also identical across all transactions.\n",
"Institute of Conservation, This payment covers 2 invoices for student bursary costs\n",
" PRIVATE SALE, Collection of papers of an individual\n",
" LEE BOYD LIMITED, Architectural Works\n",
@@ -191,7 +191,7 @@
" ALDL, ALDL Charges\n",
" Private Sale, Literary & Archival Items\n",
" ----------------------------------------------------------------------------------------------------\n",
"Cluster 2 Theme: All transactions are related to \"Kelvin Hall\" and have the same transaction values.\n",
"Cluster 2 Theme: All transactions are related to the same location, \"Kelvin Hall\", and involve similar transaction values.\n",
"CBRE, Kelvin Hall\n",
" GLASGOW CITY COUNCIL, Kelvin Hall\n",
" University Of Glasgow, Kelvin Hall\n",
@@ -203,7 +203,7 @@
" Glasgow City Council, Kelvin Hall\n",
" GLASGOW LIFE, Quarterly service charge KH\n",
" ----------------------------------------------------------------------------------------------------\n",
"Cluster 3 Theme: All these transactions are related to facility management services provided by ECG Facilities Service. They include various types of services such as maintenance, inspection, electrical and mechanical works, and other specific tasks like gutter cleaning and pigeon fouling cleaning.\n",
"Cluster 3 Theme: All the transactions are related to facility management services provided by ECG Facilities Service. They include various types of services such as maintenance, inspection, electrical and mechanical works, and other specific tasks like gutter works and cleaning of pigeon fouling.\n",
"ECG FACILITIES SERVICE, This payment covers multiple invoices for facility management fees\n",
" ECG FACILITIES SERVICE, Facilities Management Charge\n",
" ECG FACILITIES SERVICE, Inspection and Maintenance of all Library properties\n",
@@ -215,7 +215,7 @@
" ECG Facilities Service, Facilities Management Charge\n",
" ECG Facilities Service, Facilities Management Charge\n",
" ----------------------------------------------------------------------------------------------------\n",
"Cluster 4 Theme: All transactions are related to construction or refurbishment work, specifically at George IV Bridge and Causewayside locations. The companies involved are all construction or building services firms. The transaction values are also identical across all transactions.\n",
"Cluster 4 Theme: All these transactions are related to construction or refurbishment work done by various companies at different locations. The commonality is the nature of the work (construction/refurbishment) and the transaction values.\n",
"M & J Ballantyne Ltd, George IV Bridge Work\n",
" John Graham Construction Ltd, Causewayside Refurbishment\n",
" John Graham Construction Ltd, Causewayside Refurbishment\n",

@@ -27,7 +27,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 1,
"id": "80e71f33",
"metadata": {
"pycharm": {
@@ -44,28 +44,17 @@
"\u001b[33mDEPRECATION: textract 1.6.5 has a non-standard dependency specifier extract-msg<=0.29.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of textract or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
"\u001b[0mRequirement already satisfied: tenacity in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (8.2.3)\n",
"\u001b[33mDEPRECATION: textract 1.6.5 has a non-standard dependency specifier extract-msg<=0.29.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of textract or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
"\u001b[0mCollecting tiktoken==0.3.3\n",
" Downloading tiktoken-0.3.3-cp39-cp39-macosx_11_0_arm64.whl (706 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m706.8/706.8 kB\u001b[0m \u001b[31m14.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0ma \u001b[36m0:00:01\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: regex>=2022.1.18 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from tiktoken==0.3.3) (2023.8.8)\n",
"\u001b[0mRequirement already satisfied: tiktoken==0.3.3 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (0.3.3)\n",
"Requirement already satisfied: regex>=2022.1.18 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from tiktoken==0.3.3) (2023.8.8)\n",
"Requirement already satisfied: requests>=2.26.0 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from tiktoken==0.3.3) (2.31.0)\n",
"Requirement already satisfied: charset-normalizer<4,>=2 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from requests>=2.26.0->tiktoken==0.3.3) (3.2.0)\n",
"Requirement already satisfied: idna<4,>=2.5 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from requests>=2.26.0->tiktoken==0.3.3) (3.4)\n",
"Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from requests>=2.26.0->tiktoken==0.3.3) (1.26.16)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from requests>=2.26.0->tiktoken==0.3.3) (2023.7.22)\n",
"\u001b[33mDEPRECATION: textract 1.6.5 has a non-standard dependency specifier extract-msg<=0.29.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of textract or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
"\u001b[0mInstalling collected packages: tiktoken\n",
" Attempting uninstall: tiktoken\n",
" Found existing installation: tiktoken 0.5.1\n",
" Uninstalling tiktoken-0.5.1:\n",
" Successfully uninstalled tiktoken-0.5.1\n",
"Successfully installed tiktoken-0.3.3\n",
"Collecting termcolor\n",
" Downloading termcolor-2.3.0-py3-none-any.whl (6.9 kB)\n",
"\u001b[0mRequirement already satisfied: termcolor in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (2.3.0)\n",
"\u001b[33mDEPRECATION: textract 1.6.5 has a non-standard dependency specifier extract-msg<=0.29.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of textract or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
"\u001b[0mInstalling collected packages: termcolor\n",
"Successfully installed termcolor-2.3.0\n",
"Requirement already satisfied: openai in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (1.3.5)\n",
"\u001b[0mRequirement already satisfied: openai in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (1.3.5)\n",
"Requirement already satisfied: anyio<4,>=3.5.0 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from openai) (3.7.1)\n",
"Requirement already satisfied: distro<2,>=1.7.0 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from openai) (1.8.0)\n",
"Requirement already satisfied: httpx<1,>=0.23.0 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from openai) (0.25.1)\n",
@@ -87,45 +76,26 @@
"Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from requests) (1.26.16)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from requests) (2023.7.22)\n",
"\u001b[33mDEPRECATION: textract 1.6.5 has a non-standard dependency specifier extract-msg<=0.29.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of textract or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
"\u001b[0mCollecting arxiv\n",
" Downloading arxiv-2.0.0-py3-none-any.whl.metadata (8.4 kB)\n",
"Collecting feedparser==6.0.10 (from arxiv)\n",
" Downloading feedparser-6.0.10-py3-none-any.whl (81 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m81.1/81.1 kB\u001b[0m \u001b[31m5.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: requests==2.31.0 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from arxiv) (2.31.0)\n",
"Collecting sgmllib3k (from feedparser==6.0.10->arxiv)\n",
" Downloading sgmllib3k-1.0.0.tar.gz (5.8 kB)\n",
" Installing build dependencies ... \u001b[?25ldone\n",
"\u001b[?25h Getting requirements to build wheel ... \u001b[?25ldone\n",
"\u001b[?25h Preparing metadata (pyproject.toml) ... \u001b[?25ldone\n",
"\u001b[?25hRequirement already satisfied: charset-normalizer<4,>=2 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from requests==2.31.0->arxiv) (3.2.0)\n",
"\u001b[0mRequirement already satisfied: arxiv in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (2.0.0)\n",
"Requirement already satisfied: feedparser==6.0.10 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from arxiv) (6.0.10)\n",
"Requirement already satisfied: requests==2.31.0 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from arxiv) (2.31.0)\n",
"Requirement already satisfied: sgmllib3k in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from feedparser==6.0.10->arxiv) (1.0.0)\n",
"Requirement already satisfied: charset-normalizer<4,>=2 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from requests==2.31.0->arxiv) (3.2.0)\n",
"Requirement already satisfied: idna<4,>=2.5 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from requests==2.31.0->arxiv) (3.4)\n",
"Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from requests==2.31.0->arxiv) (1.26.16)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from requests==2.31.0->arxiv) (2023.7.22)\n",
"Downloading arxiv-2.0.0-py3-none-any.whl (11 kB)\n",
"Building wheels for collected packages: sgmllib3k\n",
" Building wheel for sgmllib3k (pyproject.toml) ... \u001b[?25ldone\n",
"\u001b[?25h Created wheel for sgmllib3k: filename=sgmllib3k-1.0.0-py3-none-any.whl size=6048 sha256=34bcfd8eb3301d2f3ec55a2773accc60e641c93b32b2ca131dae0ddcd60c7bc9\n",
" Stored in directory: /Users/james.hills/Library/Caches/pip/wheels/65/7a/a7/78c287f64e401255dff4c13fdbc672fed5efbfd21c530114e1\n",
"Successfully built sgmllib3k\n",
"\u001b[33mDEPRECATION: textract 1.6.5 has a non-standard dependency specifier extract-msg<=0.29.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of textract or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
"\u001b[0mInstalling collected packages: sgmllib3k, feedparser, arxiv\n",
"Successfully installed arxiv-2.0.0 feedparser-6.0.10 sgmllib3k-1.0.0\n",
"Requirement already satisfied: pandas in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (2.1.1)\n",
"\u001b[0mRequirement already satisfied: pandas in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (2.1.1)\n",
"Requirement already satisfied: numpy>=1.22.4 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from pandas) (1.26.0)\n",
"Requirement already satisfied: python-dateutil>=2.8.2 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from pandas) (2.8.2)\n",
"Requirement already satisfied: pytz>=2020.1 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from pandas) (2023.3.post1)\n",
"Requirement already satisfied: tzdata>=2022.1 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from pandas) (2023.3)\n",
"Requirement already satisfied: six>=1.5 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from python-dateutil>=2.8.2->pandas) (1.12.0)\n",
"\u001b[33mDEPRECATION: textract 1.6.5 has a non-standard dependency specifier extract-msg<=0.29.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of textract or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
"\u001b[0mCollecting PyPDF2\n",
" Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m232.6/232.6 kB\u001b[0m \u001b[31m8.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: typing_extensions>=3.10.0.0 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from PyPDF2) (4.8.0)\n",
"\u001b[0mRequirement already satisfied: PyPDF2 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (3.0.1)\n",
"Requirement already satisfied: typing_extensions>=3.10.0.0 in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (from PyPDF2) (4.8.0)\n",
"\u001b[33mDEPRECATION: textract 1.6.5 has a non-standard dependency specifier extract-msg<=0.29.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of textract or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
"\u001b[0mInstalling collected packages: PyPDF2\n",
"Successfully installed PyPDF2-3.0.1\n",
"Requirement already satisfied: tqdm in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (4.66.1)\n",
"\u001b[0mRequirement already satisfied: tqdm in /Users/james.hills/.pyenv/versions/3.9.16/envs/openai-cookbook/lib/python3.9/site-packages (4.66.1)\n",
"\u001b[33mDEPRECATION: textract 1.6.5 has a non-standard dependency specifier extract-msg<=0.29.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of textract or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
"\u001b[0m"
]
@@ -146,7 +116,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 2,
"id": "dab872c5",
"metadata": {},
"outputs": [],
@@ -189,7 +159,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 3,
"id": "2de5d32d",
"metadata": {},
"outputs": [
@@ -216,7 +186,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 4,
"id": "ae5cb7a1",
"metadata": {},
"outputs": [],
@@ -232,7 +202,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 5,
"id": "57217b9d",
"metadata": {},
"outputs": [],
@@ -278,7 +248,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 6,
"id": "dda02bdb",
"metadata": {},
"outputs": [
@@ -286,7 +256,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"/var/folders/1g/fk6xzzvj0c1ggf_6nwknlfy00000gq/T/ipykernel_84898/3977585067.py:14: DeprecationWarning: The '(Search).results' method is deprecated, use 'Client.results' instead\n",
"/var/folders/1g/fk6xzzvj0c1ggf_6nwknlfy00000gq/T/ipykernel_87832/3977585067.py:14: DeprecationWarning: The '(Search).results' method is deprecated, use 'Client.results' instead\n",
" for result in search.results():\n"
]
},
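Note on the output above: the captured DeprecationWarning points at the arxiv package's newer interface, where '(Search).results' is replaced by 'Client.results'. A minimal sketch of that suggested pattern (not something this commit changes; the query string here is a placeholder):

import arxiv

client = arxiv.Client()
search = arxiv.Search(query="proximal policy optimization", max_results=1)

# Client.results replaces the deprecated Search.results generator
for result in client.results(search):
    print(result.title, result.pdf_url)
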
@@ -299,7 +269,7 @@
" 'pdf_url': 'http://arxiv.org/pdf/1808.07982v1'}"
]
},
"execution_count": 12,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@@ -312,7 +282,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 7,
"id": "11675627",
"metadata": {},
"outputs": [],
@@ -325,7 +295,7 @@
") -> list[str]:\n",
" \"\"\"Returns a list of strings and relatednesses, sorted from most related to least.\"\"\"\n",
" query_embedding_response = embedding_request(query)\n",
" query_embedding = query_embedding_response[\"data\"][0][\"embedding\"]\n",
" query_embedding = query_embedding_response.data[0].embedding\n",
" strings_and_relatednesses = [\n",
" (row[\"filepath\"], relatedness_fn(query_embedding, row[\"embedding\"]))\n",
" for i, row in df.iterrows()\n",
@@ -337,7 +307,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 8,
"id": "7211df2c",
"metadata": {},
"outputs": [],
@@ -434,7 +404,7 @@
"\n",
" # Final summary\n",
" print(\"Summarizing into overall summary\")\n",
" response = openai.ChatCompletion.create(\n",
" response = client.chat.completions.create(\n",
" model=GPT_MODEL,\n",
" messages=[\n",
" {\n",
@@ -453,7 +423,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 9,
"id": "898b94d4",
"metadata": {},
"outputs": [
@@ -469,7 +439,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:04<00:00, 1.19s/it]\n"
"100%|██████████| 4/4 [00:01<00:00, 2.19it/s]\n"
]
},
{
@@ -487,7 +457,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 10,
"id": "c715f60d",
"metadata": {},
"outputs": [
@@ -496,25 +466,22 @@
"output_type": "stream",
"text": [
"Core Argument:\n",
"- The paper discusses the use of Proximal Policy Optimization (PPO) in sequence generation tasks, specifically in the context of chit-chat chatbots.\n",
"- The authors argue that PPO is a more efficient reinforcement learning algorithm compared to policy gradient, commonly used in text generation tasks.\n",
"- They propose a dynamic approach for PPO (PPO-dynamic) and demonstrate its efficacy in synthetic experiments and chit-chat chatbot tasks.\n",
"- The paper introduces the use of proximal policy optimization (PPO) as a more efficient reinforcement learning algorithm for sequence generation tasks, particularly in the context of chit-chat chatbot.\n",
"- The authors propose a dynamic approach for PPO (PPO-dynamic) to improve stability and performance compared to traditional policy gradient methods.\n",
"\n",
"Evidence:\n",
"- PPO-dynamic achieves high precision scores comparable to other algorithms in a synthetic counting task.\n",
"- PPO-dynamic shows faster progress and more stable learning curves compared to PPO in the synthetic counting task.\n",
"- In the chit-chat chatbot task, PPO-dynamic achieves a slightly higher BLEU-2 score than other algorithms.\n",
"- PPO and PPO-dynamic have more stable learning curves and converge faster than policy gradient.\n",
"- The paper presents experiments on synthetic tasks and chit-chat chatbot, demonstrating that both PPO and PPO-dynamic can stabilize the training and lead the model to learn to generate more diverse outputs.\n",
"- The results suggest that PPO is a better way for sequence learning, and GAN-based sequence learning can use PPO as the new optimization method for better performance.\n",
"- The supplementary material contains the derivation of the proposed PPO-dynamic, experimental settings, and results, including the distribution of the first output with different input sentence lengths.\n",
"\n",
"Conclusions:\n",
"- PPO is a better optimization method for sequence learning compared to policy gradient.\n",
"- PPO-dynamic further improves the optimization process by dynamically adjusting hyperparameters.\n",
"- PPO can be used as a new optimization method for GAN-based sequence learning for better performance.\n"
"- The paper concludes that PPO and PPO-dynamic are effective for sequence generation, with PPO-dynamic achieving comparable or higher precision and BLEU-2 scores in experiments compared to other algorithms.\n",
"- The results indicate that PPO-dynamic has a different distribution compared to the traditional REINFORCE method, showing the potential for improved stability and performance in sequence generation tasks.\n"
]
}
],
"source": [
"print(chat_test_response[\"choices\"][0][\"message\"][\"content\"])\n"
"print(chat_test_response.choices[0].message.content)\n"
]
},
{
@@ -530,16 +497,15 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 11,
"id": "77a6fb4f",
"metadata": {},
"outputs": [],
"source": [
"@retry(wait=wait_random_exponential(min=1, max=40), stop=stop_after_attempt(3))\n",
"def chat_completion_request(messages, functions=None, model=GPT_MODEL):\n",
" headers = {\n",
" \"Content-Type\": \"application/json\",\n",
" \"Authorization\": \"Bearer \" + openai.api_key,\n",
" \"Authorization\": \"Bearer \" + client.api_key,\n",
" }\n",
" json_data = {\"model\": model, \"messages\": messages}\n",
" if functions is not None:\n",
@@ -559,7 +525,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 12,
"id": "73f7672d",
"metadata": {},
"outputs": [],
@@ -590,7 +556,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 13,
"id": "978b7877",
"metadata": {},
"outputs": [],
@@ -635,7 +601,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 14,
"id": "0c88ae15",
"metadata": {},
"outputs": [],
@@ -709,7 +675,7 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 15,
"id": "c39a1d80",
"metadata": {},
"outputs": [],
@@ -725,7 +691,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 16,
"id": "253fd0f7",
"metadata": {},
"outputs": [
@@ -734,43 +700,40 @@
"output_type": "stream",
"text": [
"Function generation requested, calling function\n",
"Finding and reading paper\n",
"Chunking text from paper\n",
"Summarizing each chunk of text\n"
"Getting search results\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████████████████████████████████████████████████████████████████████████████| 17/17 [00:06<00:00, 2.65it/s]\n"
"/var/folders/1g/fk6xzzvj0c1ggf_6nwknlfy00000gq/T/ipykernel_87832/3977585067.py:14: DeprecationWarning: The '(Search).results' method is deprecated, use 'Client.results' instead\n",
" for result in search.results():\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Summarizing into overall summary\n"
"Got search results, summarizing content\n"
]
},
{
"data": {
"text/markdown": [
"Core Argument:\n",
"- The paper focuses on the theoretical analysis of the PPO-Clip algorithm in the context of deep reinforcement learning.\n",
"- The authors propose two core ideas: reinterpreting PPO-Clip from the perspective of hinge loss and introducing a two-step policy improvement scheme.\n",
"- The paper establishes the global convergence of PPO-Clip and characterizes its convergence rate.\n",
"Proximal Policy Optimization (PPO) is a reinforcement learning algorithm known for its efficiency and stability. It is used for optimizing models in tasks involving non-differentiable evaluation metrics and adversarial learning. PPO replaces the policy gradient with its own optimization approach, and it has been shown to outperform policy gradient in stability and performance. PPO is particularly effective in conditional sequence generation tasks and continuous control tasks.\n",
"\n",
"Evidence:\n",
"- The paper addresses the challenges posed by the clipping mechanism and neural function approximation.\n",
"- The authors provide theoretical proofs, lemmas, and mathematical analysis to support their arguments.\n",
"- The paper presents empirical experiments on various reinforcement learning benchmark tasks to validate the effectiveness of PPO-Clip.\n",
"Here are a few papers that provide more details about how PPO works and its variations:\n",
"1. Title: [\"Proximal Policy Optimization and its Dynamic Version for Sequence Generation\"](http://arxiv.org/abs/1808.07982v1)\n",
" - Summary: This paper demonstrates the efficacy of PPO and its dynamic approach in conditional sequence generation tasks.\n",
"\n",
"Conclusions:\n",
"- The paper offers theoretical insights into the performance of PPO-Clip and provides a framework for analyzing its convergence properties.\n",
"- PPO-Clip is shown to have a global convergence rate of O(1/sqrt(T)), where T is the number of iterations.\n",
"- The hinge loss reinterpretation of PPO-Clip allows for variants with comparable empirical performance.\n",
"- The paper contributes to a better understanding of PPO-Clip in the reinforcement learning community."
"2. Title: [\"CIM-PPO: Proximal Policy Optimization with Liu-Correntropy Induced Metric\"](http://arxiv.org/abs/2110.10522v2)\n",
" - Summary: Analyzes the asymmetry effect of KL divergence on PPO's objective function and proposes a new algorithm called CIM-PPO, which applies correntropy induced metric in PPO.\n",
"\n",
"3. Title: [\"A2C is a special case of PPO\"](http://arxiv.org/abs/2205.09123v1)\n",
" - Summary: Demonstrates that Advantage Actor-critic (A2C) is a special case of PPO through theoretical justifications and empirical experiments.\n",
"\n",
"These papers provide insights into the workings and variations of PPO in reinforcement learning."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
@@ -793,7 +756,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 17,
"id": "3ca3e18a",
"metadata": {},
"outputs": [
@@ -811,7 +774,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:04<00:00, 1.08s/it]\n"
"100%|██████████| 4/4 [00:01<00:00, 2.42it/s]\n"
]
},
{
@@ -825,20 +788,17 @@
"data": {
"text/markdown": [
"Core Argument:\n",
"- The paper discusses the use of proximal policy optimization (PPO) in sequence generation tasks, specifically in the context of chit-chat chatbots.\n",
"- The authors argue that PPO is a more efficient reinforcement learning algorithm compared to policy gradient, which is commonly used in text generation tasks.\n",
"- They propose a dynamic approach for PPO (PPO-dynamic) and demonstrate its efficacy in synthetic experiments and chit-chat chatbot tasks.\n",
"- The paper introduces the use of proximal policy optimization (PPO) as a more efficient reinforcement learning algorithm for sequence generation tasks, particularly in the context of chit-chat chatbots.\n",
"- It proposes a dynamic approach for PPO (PPO-dynamic) to improve stability and performance compared to traditional policy gradient methods.\n",
"\n",
"Evidence:\n",
"- The authors derive the constraints for PPO-dynamic and provide the pseudo code for both PPO and PPO-dynamic.\n",
"- They compare the performance of PPO-dynamic with other algorithms, including REINFORCE, MIXER, and SeqGAN, on a synthetic counting task and a chit-chat chatbot task using the OpenSubtitles dataset.\n",
"- In the synthetic counting task, PPO-dynamic achieves a high precision score comparable to REINFORCE and MIXER, with a faster learning curve compared to PPO.\n",
"- In the chit-chat chatbot task, PPO-dynamic achieves a slightly higher BLEU-2 score than REINFORCE and PPO, with a more stable and faster learning curve than policy gradient.\n",
"- The paper presents experiments on synthetic tasks and chit-chat chatbot, demonstrating that both PPO and PPO-dynamic can stabilize the training and lead the model to learn to generate more diverse outputs.\n",
"- It compares the performance of PPO and PPO-dynamic with other algorithms such as REINFORCE, MIXER, and SeqGAN, showing that PPO-dynamic achieves comparable or higher precision and BLEU-2 scores in the experiments.\n",
"- The supplementary material contains the derivation of the proposed PPO-dynamic, experimental settings, and results, including the distribution of the first output with different input sentence lengths.\n",
"\n",
"Conclusions:\n",
"- The results suggest that PPO is a better optimization method for sequence learning compared to policy gradient.\n",
"- PPO-dynamic further improves the optimization process by dynamically adjusting the hyperparameters.\n",
"- The authors conclude that PPO can be used as a new optimization method for GAN-based sequence learning for better performance."
"- The paper suggests that replacing policy gradient with PPO can lead to improved performance and stability in conditional text generation tasks.\n",
"- PPO-dynamic is shown to generate a more scattered distribution compared to using the REINFORCE method, and it achieves comparable or higher precision and BLEU-2 scores compared to other algorithms in the experiments."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
@@ -857,7 +817,7 @@
"updated_response = chat_completion_with_function_execution(\n",
" paper_conversation.conversation_history, functions=arxiv_functions\n",
")\n",
"display(Markdown(updated_response[\"choices\"][0][\"message\"][\"content\"]))\n"
"display(Markdown(updated_response.choices[0].message.content))\n"
]
}
],
