Merge branch 'main' into main

pull/1112/head
Melanie Hart Buehler 1 month ago committed by GitHub
commit 7ba9570789
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

5
.gitignore vendored

@ -15,7 +15,7 @@ downloads/
eggs/
.eggs/
lib/
lib64/
lib64
parts/
sdist/
var/
@ -137,3 +137,6 @@ dmypy.json
*.DS_Store
tmp_*
examples/fine-tuned_qa/local_cache/*
# PyCharm files
.idea/

@ -9,7 +9,7 @@
> ✨ Navigate at [cookbook.openai.com](https://cookbook.openai.com)
Example code and guides for accomplishing common tasks with the [OpenAI API](https://platform.openai.com/docs/introduction). To run these examples, you'll need an OpenAI account and associated API key ([create a free account here](https://beta.openai.com/signup)).
Example code and guides for accomplishing common tasks with the [OpenAI API](https://platform.openai.com/docs/introduction). To run these examples, you'll need an OpenAI account and associated API key ([create a free account here](https://beta.openai.com/signup)). Set an environment variable called `OPENAI_API_KEY` with your API key. Alternatively, in most IDEs such as Visual Studio Code, you can create an `.env` file at the root of your repo containing `OPENAI_API_KEY=<your API key>`, which will be picked up by the notebooks.
Most code examples are written in Python, though the concepts can be applied in any language.
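
As a rough illustration of the setup described above, the sketch below reads `OPENAI_API_KEY` from the environment and falls back to a simple `KEY=VALUE` `.env` file. The helper name `load_api_key` and its minimal parser are assumptions made for this example; they are not part of the cookbook or the OpenAI SDK, and in practice a package such as `python-dotenv` is often used for the same purpose.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(env_path: str = ".env", name: str = "OPENAI_API_KEY") -> Optional[str]:
    """Hypothetical helper: return the key from the environment,
    falling back to a simple KEY=VALUE .env file."""
    value = os.environ.get(name)
    if value:
        return value
    path = Path(env_path)
    if not path.is_file():
        return None
    for line in path.read_text().splitlines():
        line = line.strip()
        # skip blank lines, comments, and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == name:
            # drop optional surrounding quotes; treat an empty value as missing
            return val.strip().strip("'\"") or None
    return None
```

The returned value could then be passed explicitly, for example `OpenAI(api_key=load_api_key())`.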

@ -5,6 +5,7 @@ People are writing great tools and papers for improving outputs from GPT. Here a
## Prompting libraries & tools (in alphabetical order)
- [Arthur Shield](https://www.arthur.ai/get-started): A paid product for detecting toxicity, hallucination, prompt injection, etc.
- [Baserun](https://baserun.ai/): A paid product for testing, debugging, and monitoring LLM-based apps
- [Chainlit](https://docs.chainlit.io/overview): A Python library for making chatbot interfaces.
- [Embedchain](https://github.com/embedchain/embedchain): A Python library for managing and syncing unstructured data with LLMs.
- [FLAML (A Fast Library for Automated Machine Learning & Tuning)](https://microsoft.github.io/FLAML/docs/Getting-Started/): A Python library for automating selection of models, hyperparameters, and other tunable choices.
@ -25,6 +26,7 @@ People are writing great tools and papers for improving outputs from GPT. Here a
- [Prompttools](https://github.com/hegelai/prompttools): Open-source Python tools for testing and evaluating models, vector DBs, and prompts.
- [Scale Spellbook](https://scale.com/spellbook): A paid product for building, comparing, and shipping language model apps.
- [Semantic Kernel](https://github.com/microsoft/semantic-kernel): A Python/C#/Java library from Microsoft that supports prompt templating, function chaining, vectorized memory, and intelligent planning.
- [Vellum](https://www.vellum.ai/): A paid AI product development platform to experiment with, evaluate, and deploy advanced LLM apps.
- [Weights & Biases](https://wandb.ai/site/solutions/llmops): A paid product for tracking model training and prompt engineering experiments.
- [YiVal](https://github.com/YiVal/YiVal): An open-source GenAI-Ops tool for tuning and evaluating prompts, retrieval configurations, and model parameters using customizable datasets, evaluation methods, and evolution strategies.

@ -96,7 +96,7 @@ Have you ever struggled to find the perfect icon for your website or app? It wou
![icon_set](/images/dalle_3/icon_set.jpg)
In this case, I used Potrace to convert the images to SVGs, which you can download [here](http://potrace.sourceforge.net/). This is what I used to convert the images:
In this case, I used Potrace to convert the images to SVGs, which you can download [here](https://potrace.sourceforge.net/). This is what I used to convert the images:
```bash
potrace -s cat.jpg -o cat.svg

@ -81,4 +81,19 @@ katiagg:
jbeutler-openai:
name: "Joe Beutler"
website: "https://joebeutler.com"
avatar: "https://avatars.githubusercontent.com/u/156261485?v=4"
avatar: "https://avatars.githubusercontent.com/u/156261485?v=4"
dylanra-openai:
name: "Dylan Royan Almeida"
website: "https://www.linkedin.com/in/dylan-almeida-604522167/"
avatar: "https://avatars.githubusercontent.com/u/149511600?v=4"
royziv11:
name: "Roy Ziv"
website: "https://www.linkedin.com/in/roy-ziv-a46001149/"
avatar: "https://media.licdn.com/dms/image/D5603AQHkaEOOGZWtbA/profile-displayphoto-shrink_400_400/0/1699500606122?e=1716422400&v=beta&t=wKEIx-vTEqm9wnqoC7-xr1WqJjghvcjjlMt034hXY_4"
justonf:
name: "Juston Forte"
website: "https://www.linkedin.com/in/justonforte/"
avatar: "https://avatars.githubusercontent.com/u/96567547?s=400&u=08b9757200906ab12e3989b561cff6c4b95a12cb&v=4"

@ -163,7 +163,7 @@
"source": [
"### 2. Text samples in the clusters & naming the clusters\n",
"\n",
"Let's show random samples from each cluster. We'll use text-davinci-003 to name the clusters, based on a random sample of 5 reviews from that cluster."
"Let's show random samples from each cluster. We'll use gpt-4 to name the clusters, based on a random sample of 5 reviews from that cluster."
]
},
{

@ -7,7 +7,7 @@
"# Creating slides with the Assistants API (GPT-4), and DALL·E-3\n",
"\n",
"This notebook illustrates the use of the new [Assistants API](https://platform.openai.com/docs/assistants/overview) (GPT-4), and DALL·E-3 in crafting informative and visually appealing slides. <br>\n",
"Creating slides is a pivotal aspect of many jobs, but can be laborious and time-consuming. Additionally, extracting insights from data and articulating them effectively on slides can be challenging. <br><br> This cookbook recipe will demonstrate how you can utilize the new Assistants API to faciliate the end to end slide creation process for you without you having to touch Microsoft PowerPoint or Google Slides, saving you valuable time and effort!"
"Creating slides is a pivotal aspect of many jobs, but can be laborious and time-consuming. Additionally, extracting insights from data and articulating them effectively on slides can be challenging. <br><br> This cookbook recipe will demonstrate how you can utilize the new Assistants API to facilitate the end to end slide creation process for you without you having to touch Microsoft PowerPoint or Google Slides, saving you valuable time and effort!"
]
},
{

File diff suppressed because one or more lines are too long

@ -19,14 +19,13 @@
"\n",
"Note that using `stream=True` in a production application makes it more difficult to moderate the content of the completions, as partial completions may be more difficult to evaluate. This may have implications for [approved usage](https://beta.openai.com/docs/usage-guidelines).\n",
"\n",
"Another small drawback of streaming responses is that the response no longer includes the `usage` field to tell you how many tokens were consumed. After receiving and combining all of the responses, you can calculate this yourself using [`tiktoken`](How_to_count_tokens_with_tiktoken.ipynb).\n",
"\n",
"## Example code\n",
"\n",
"Below, this notebook shows:\n",
"1. What a typical chat completion response looks like\n",
"2. What a streaming chat completion response looks like\n",
"3. How much time is saved by streaming a chat completion"
"3. How much time is saved by streaming a chat completion\n",
"4. How to get token usage data for streamed chat completion response"
]
},
{
@ -553,7 +552,7 @@
"print(f\"Full response received {chunk_time:.2f} seconds after request\")\n",
"# clean None in collected_messages\n",
"collected_messages = [m for m in collected_messages if m is not None]\n",
"full_reply_content = ''.join([m for m in collected_messages])\n",
"full_reply_content = ''.join(collected_messages)\n",
"print(f\"Full conversation received: {full_reply_content}\")\n"
]
},
@ -572,6 +571,65 @@
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4. How to get token usage data for streamed chat completion response\n",
"\n",
"You can get token usage statistics for your streamed response by setting `stream_options={\"include_usage\": True}`. When you do so, an extra chunk will be streamed as the final chunk. You can access the usage data for the entire request via the `usage` field on this chunk. A few important notes when you set `stream_options={\"include_usage\": True}`:\n",
"* The value for the `usage` field on all chunks except for the last one will be null.\n",
"* The `usage` field on the last chunk contains token usage statistics for the entire request.\n",
"* The `choices` field on the last chunk will always be an empty array `[]`.\n",
"\n",
"Let's see how it works using the example in 2."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"choices: [Choice(delta=ChoiceDelta(content='', function_call=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)]\n",
"usage: None\n",
"****************\n",
"choices: [Choice(delta=ChoiceDelta(content='2', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)]\n",
"usage: None\n",
"****************\n",
"choices: [Choice(delta=ChoiceDelta(content=None, function_call=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)]\n",
"usage: None\n",
"****************\n",
"choices: []\n",
"usage: CompletionUsage(completion_tokens=1, prompt_tokens=19, total_tokens=20)\n",
"****************\n"
]
}
],
"source": [
"# Example of an OpenAI ChatCompletion request with stream=True and stream_options={\"include_usage\": True}\n",
"\n",
"# a ChatCompletion request\n",
"response = client.chat.completions.create(\n",
"    model='gpt-3.5-turbo',\n",
"    messages=[\n",
"        {'role': 'user', 'content': \"What's 1+1? Answer in one word.\"}\n",
"    ],\n",
"    temperature=0,\n",
"    stream=True,\n",
"    stream_options={\"include_usage\": True}, # retrieving token usage for stream response\n",
")\n",
"\n",
"for chunk in response:\n",
"    print(f\"choices: {chunk.choices}\\nusage: {chunk.usage}\")\n",
"    print(\"****************\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],

File diff suppressed because it is too large Load Diff

@ -0,0 +1,861 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Summarizing Long Documents"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The objective of this notebook is to demonstrate how to summarize large documents with a controllable level of detail.\n",
" \n",
"If you give a GPT model the task of summarizing a long document (e.g. 10k or more tokens), you'll tend to get back a relatively short summary that isn't proportional to the length of the document. For instance, a summary of a 20k token document will not be twice as long as a summary of a 10k token document. One way we can fix this is to split our document up into pieces, and produce a summary piecewise. After many queries to a GPT model, the full summary can be reconstructed. By controlling the number of text chunks and their sizes, we can ultimately control the level of detail in the output."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:19:35.305706Z",
"start_time": "2024-04-10T05:19:35.303535Z"
},
"pycharm": {
"is_executing": true
}
},
"outputs": [],
"source": [
"import os\n",
"from typing import List, Tuple, Optional\n",
"from openai import OpenAI\n",
"import tiktoken\n",
"from tqdm import tqdm"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:19:35.325026Z",
"start_time": "2024-04-10T05:19:35.322414Z"
}
},
"outputs": [],
"source": [
"# open dataset containing part of the text of the Wikipedia page for artificial intelligence\n",
"with open(\"data/artificial_intelligence_wikipedia.txt\", \"r\") as file:\n",
"    artificial_intelligence_wikipedia_text = file.read()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:19:35.364483Z",
"start_time": "2024-04-10T05:19:35.348213Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"14630"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# load encoding and check the length of dataset\n",
"encoding = tiktoken.encoding_for_model('gpt-4-turbo')\n",
"len(encoding.encode(artificial_intelligence_wikipedia_text))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll define a simple utility to wrap calls to the OpenAI API."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:19:35.375619Z",
"start_time": "2024-04-10T05:19:35.365818Z"
}
},
"outputs": [],
"source": [
"client = OpenAI(api_key=os.getenv(\"OPENAI_API_KEY\"))\n",
"\n",
"def get_chat_completion(messages, model='gpt-4-turbo'):\n",
"    response = client.chat.completions.create(\n",
"        model=model,\n",
"        messages=messages,\n",
"        temperature=0,\n",
"    )\n",
"    return response.choices[0].message.content"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next we'll define some utilities to chunk a large document into smaller pieces."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:19:35.382790Z",
"start_time": "2024-04-10T05:19:35.376721Z"
}
},
"outputs": [],
"source": [
"def tokenize(text: str) -> List[int]:\n",
"    encoding = tiktoken.encoding_for_model('gpt-4-turbo')\n",
"    return encoding.encode(text)\n",
"\n",
"\n",
"# This function chunks a text into smaller pieces based on a maximum token count and a delimiter.\n",
"def chunk_on_delimiter(input_string: str,\n",
"                       max_tokens: int, delimiter: str) -> List[str]:\n",
"    chunks = input_string.split(delimiter)\n",
"    combined_chunks, _, dropped_chunk_count = combine_chunks_with_no_minimum(\n",
"        chunks, max_tokens, chunk_delimiter=delimiter, add_ellipsis_for_overflow=True\n",
"    )\n",
"    if dropped_chunk_count > 0:\n",
"        print(f\"warning: {dropped_chunk_count} chunks were dropped due to overflow\")\n",
"    combined_chunks = [f\"{chunk}{delimiter}\" for chunk in combined_chunks]\n",
"    return combined_chunks\n",
"\n",
"\n",
"# This function combines text chunks into larger blocks without exceeding a specified token count. It returns the combined text blocks, their original indices, and the count of chunks dropped due to overflow.\n",
"def combine_chunks_with_no_minimum(\n",
"    chunks: List[str],\n",
"    max_tokens: int,\n",
"    chunk_delimiter=\"\\n\\n\",\n",
"    header: Optional[str] = None,\n",
"    add_ellipsis_for_overflow=False,\n",
") -> Tuple[List[str], List[List[int]], int]:\n",
"    dropped_chunk_count = 0\n",
"    output = []  # list to hold the final combined chunks\n",
"    output_indices = []  # list to hold the indices of the final combined chunks\n",
"    candidate = (\n",
"        [] if header is None else [header]\n",
"    )  # list to hold the current combined chunk candidate\n",
"    candidate_indices = []\n",
"    for chunk_i, chunk in enumerate(chunks):\n",
"        chunk_with_header = [chunk] if header is None else [header, chunk]\n",
"        # a single chunk that exceeds max_tokens on its own cannot be combined, so it is skipped\n",
"        if len(tokenize(chunk_delimiter.join(chunk_with_header))) > max_tokens:\n",
"            print(\"warning: chunk overflow\")\n",
"            if (\n",
"                add_ellipsis_for_overflow\n",
"                and len(tokenize(chunk_delimiter.join(candidate + [\"...\"]))) <= max_tokens\n",
"            ):\n",
"                candidate.append(\"...\")\n",
"                dropped_chunk_count += 1\n",
"            continue  # this case would break downstream assumptions\n",
"        # estimate token count with the current chunk added\n",
"        extended_candidate_token_count = len(tokenize(chunk_delimiter.join(candidate + [chunk])))\n",
"        # If the token count exceeds max_tokens, add the current candidate to output and start a new candidate\n",
"        if extended_candidate_token_count > max_tokens:\n",
"            output.append(chunk_delimiter.join(candidate))\n",
"            output_indices.append(candidate_indices)\n",
"            candidate = chunk_with_header  # re-initialize candidate\n",
"            candidate_indices = [chunk_i]\n",
"        # otherwise keep extending the candidate\n",
"        else:\n",
"            candidate.append(chunk)\n",
"            candidate_indices.append(chunk_i)\n",
"    # add the remaining candidate to output if it's not empty\n",
"    if (header is not None and len(candidate) > 1) or (header is None and len(candidate) > 0):\n",
"        output.append(chunk_delimiter.join(candidate))\n",
"        output_indices.append(candidate_indices)\n",
"    return output, output_indices, dropped_chunk_count"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can define a utility to summarize text with a controllable level of detail (note the `detail` parameter).\n",
"\n",
"The function first determines the number of chunks by interpolating between a minimum and a maximum chunk count based on a controllable `detail` parameter. It then splits the text into chunks and summarizes each chunk."
]
},
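
The interpolation described in this cell can be checked with plain arithmetic. The sketch below mirrors the two formulas used by `summarize` without calling the API or the tokenizer. `plan_chunks` is a hypothetical helper written for this illustration, and the inputs (a 14630-token document that splits into 31 chunks at the minimum chunk size) are taken from the outputs printed elsewhere in the notebook.

```python
from typing import Tuple


def plan_chunks(document_length: int, max_chunks: int, detail: float,
                minimum_chunk_size: int = 500) -> Tuple[int, int]:
    """Hypothetical helper: return (num_chunks, chunk_size) for a given detail level."""
    assert 0 <= detail <= 1
    min_chunks = 1
    # linear interpolation between 1 chunk (detail=0) and max_chunks (detail=1)
    num_chunks = int(min_chunks + detail * (max_chunks - min_chunks))
    # target token budget per chunk, never below the minimum chunk size
    chunk_size = max(minimum_chunk_size, document_length // num_chunks)
    return num_chunks, chunk_size


for detail in (0, 0.25, 0.5, 1):
    print(detail, plan_chunks(14630, 31, detail))
```

The number of chunks actually produced can differ slightly from `num_chunks`, because chunks are only split at delimiter boundaries and are packed greedily up to `chunk_size` tokens.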
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:19:35.390876Z",
"start_time": "2024-04-10T05:19:35.385076Z"
}
},
"outputs": [],
"source": [
"def summarize(text: str,\n",
"              detail: float = 0,\n",
"              model: str = 'gpt-4-turbo',\n",
"              additional_instructions: Optional[str] = None,\n",
"              minimum_chunk_size: Optional[int] = 500,\n",
"              chunk_delimiter: str = \".\",\n",
"              summarize_recursively=False,\n",
"              verbose=False):\n",
"    \"\"\"\n",
"    Summarizes a given text by splitting it into chunks, each of which is summarized individually.\n",
"    The level of detail in the summary can be adjusted, and the process can optionally be made recursive.\n",
"\n",
"    Parameters:\n",
"    - text (str): The text to be summarized.\n",
"    - detail (float, optional): A value between 0 and 1 indicating the desired level of detail in the summary.\n",
"      0 leads to a higher level summary, and 1 results in a more detailed summary. Defaults to 0.\n",
"    - model (str, optional): The model to use for generating summaries. Defaults to 'gpt-4-turbo'.\n",
"    - additional_instructions (Optional[str], optional): Additional instructions to provide to the model for customizing summaries.\n",
"    - minimum_chunk_size (Optional[int], optional): The minimum size for text chunks. Defaults to 500.\n",
"    - chunk_delimiter (str, optional): The delimiter used to split the text into chunks. Defaults to \".\".\n",
"    - summarize_recursively (bool, optional): If True, summaries are generated recursively, using previous summaries for context.\n",
"    - verbose (bool, optional): If True, prints detailed information about the chunking process.\n",
"\n",
"    Returns:\n",
"    - str: The final compiled summary of the text.\n",
"\n",
"    The function first determines the number of chunks by interpolating between a minimum and a maximum chunk count based on the `detail` parameter.\n",
"    It then splits the text into chunks and summarizes each chunk. If `summarize_recursively` is True, each summary is based on the previous summaries,\n",
"    adding more context to the summarization process. The function returns a compiled summary of all chunks.\n",
"    \"\"\"\n",
"\n",
"    # check detail is set correctly\n",
"    assert 0 <= detail <= 1\n",
"\n",
"    # interpolate the number of chunks to get the specified level of detail\n",
"    max_chunks = len(chunk_on_delimiter(text, minimum_chunk_size, chunk_delimiter))\n",
"    min_chunks = 1\n",
"    num_chunks = int(min_chunks + detail * (max_chunks - min_chunks))\n",
"\n",
"    # adjust chunk_size based on interpolated number of chunks\n",
"    document_length = len(tokenize(text))\n",
"    chunk_size = max(minimum_chunk_size, document_length // num_chunks)\n",
"    text_chunks = chunk_on_delimiter(text, chunk_size, chunk_delimiter)\n",
"    if verbose:\n",
"        print(f\"Splitting the text into {len(text_chunks)} chunks to be summarized.\")\n",
"        print(f\"Chunk lengths are {[len(tokenize(x)) for x in text_chunks]}\")\n",
"\n",
"    # set system message\n",
"    system_message_content = \"Rewrite this text in summarized form.\"\n",
"    if additional_instructions is not None:\n",
"        system_message_content += f\"\\n\\n{additional_instructions}\"\n",
"\n",
"    accumulated_summaries = []\n",
"    for chunk in tqdm(text_chunks):\n",
"        if summarize_recursively and accumulated_summaries:\n",
"            # Creating a structured prompt for recursive summarization\n",
"            accumulated_summaries_string = '\\n\\n'.join(accumulated_summaries)\n",
"            user_message_content = f\"Previous summaries:\\n\\n{accumulated_summaries_string}\\n\\nText to summarize next:\\n\\n{chunk}\"\n",
"        else:\n",
"            # Directly passing the chunk for summarization without recursive context\n",
"            user_message_content = chunk\n",
"\n",
"        # Constructing messages based on whether recursive summarization is applied\n",
"        messages = [\n",
"            {\"role\": \"system\", \"content\": system_message_content},\n",
"            {\"role\": \"user\", \"content\": user_message_content}\n",
"        ]\n",
"\n",
"        # Request the summary for this chunk and keep it for later chunks and the final result\n",
"        response = get_chat_completion(messages, model=model)\n",
"        accumulated_summaries.append(response)\n",
"\n",
"    # Compile final summary from partial summaries\n",
"    final_summary = '\\n\\n'.join(accumulated_summaries)\n",
"\n",
"    return final_summary"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can use this utility to produce summaries with varying levels of detail. By increasing `detail` from 0 to 1 we get progressively longer summaries of the underlying document. A higher value for the `detail` parameter results in a more detailed summary because the utility first splits the document into a greater number of chunks. Each chunk is then summarized, and the final summary is a concatenation of all the chunk summaries."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:19:47.541096Z",
"start_time": "2024-04-10T05:19:35.391911Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Splitting the text into 1 chunks to be summarized.\n",
"Chunk lengths are [14631]\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 1/1 [00:09<00:00, 9.68s/it]\n"
]
}
],
"source": [
"summary_with_detail_0 = summarize(artificial_intelligence_wikipedia_text, detail=0, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:19:58.724212Z",
"start_time": "2024-04-10T05:19:47.542129Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Splitting the text into 9 chunks to be summarized.\n",
"Chunk lengths are [1817, 1807, 1823, 1810, 1806, 1827, 1814, 1829, 103]\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 9/9 [01:33<00:00, 10.39s/it]\n"
]
}
],
"source": [
"summary_with_detail_pt25 = summarize(artificial_intelligence_wikipedia_text, detail=0.25, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:20:16.216023Z",
"start_time": "2024-04-10T05:19:58.725014Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Splitting the text into 17 chunks to be summarized.\n",
"Chunk lengths are [897, 890, 914, 876, 893, 906, 893, 902, 909, 907, 905, 889, 902, 890, 901, 880, 287]\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 17/17 [02:26<00:00, 8.64s/it]\n"
]
}
],
"source": [
"summary_with_detail_pt5 = summarize(artificial_intelligence_wikipedia_text, detail=0.5, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:22:57.760218Z",
"start_time": "2024-04-10T05:21:44.921275Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Splitting the text into 31 chunks to be summarized.\n",
"Chunk lengths are [492, 427, 485, 490, 496, 478, 473, 497, 496, 501, 499, 497, 493, 470, 472, 494, 489, 492, 481, 485, 471, 500, 486, 498, 478, 469, 498, 468, 493, 478, 103]\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 31/31 [04:08<00:00, 8.02s/it]\n"
]
}
],
"source": [
"summary_with_detail_1 = summarize(artificial_intelligence_wikipedia_text, detail=1, verbose=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The original document is nearly 15k tokens long. Notice how large the gap is between the length of `summary_with_detail_0` and `summary_with_detail_1`: the latter is nearly 29 times longer!"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:22:57.782389Z",
"start_time": "2024-04-10T05:22:57.763041Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[235, 2529, 4336, 6742]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# lengths of summaries\n",
"[len(tokenize(x)) for x in\n",
" [summary_with_detail_0, summary_with_detail_pt25, summary_with_detail_pt5, summary_with_detail_1]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's inspect the summaries to see how the level of detail changes when the `detail` parameter is increased from 0 to 1."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:22:57.785881Z",
"start_time": "2024-04-10T05:22:57.783455Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Artificial intelligence (AI) is the simulation of human intelligence in machines, designed to perform tasks that typically require human intelligence. This includes applications like advanced search engines, recommendation systems, speech interaction, autonomous vehicles, and more. AI was first significantly researched by Alan Turing and became an academic discipline in 1956. The field has experienced cycles of high expectations followed by disillusionment and reduced funding, known as \"AI winters.\" Interest in AI surged post-2012 with advancements in deep learning and again post-2017 with the development of the transformer architecture, leading to a boom in AI research and applications in the early 2020s.\n",
"\n",
"AI's increasing integration into various sectors is influencing societal and economic shifts towards automation and data-driven decision-making, impacting areas such as employment, healthcare, and privacy. Ethical and safety concerns about AI have prompted discussions on regulatory policies.\n",
"\n",
"AI research involves various sub-fields focused on specific goals like reasoning, learning, and perception, using techniques from mathematics, logic, and other disciplines. Despite its broad applications, AI's complexity and potential risks, such as privacy issues, misinformation, and ethical challenges, remain areas of active investigation and debate.\n"
]
}
],
"source": [
"print(summary_with_detail_0)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:22:57.788969Z",
"start_time": "2024-04-10T05:22:57.786691Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Artificial intelligence (AI) is the simulation of human intelligence in machines, designed to perceive their environment and make decisions to achieve specific goals. This technology is prevalent across various sectors including industry, government, and science, with applications ranging from web search engines and recommendation systems to autonomous vehicles and AI in gaming. Although AI has become a common feature in many tools and applications, it often goes unrecognized as AI when it becomes sufficiently integrated and widespread.\n",
"\n",
"The field of AI, which began as an academic discipline in 1956, has experienced several cycles of high expectations followed by disappointment, known as AI winters. Interest and funding in AI surged post-2012 with advancements in deep learning and again post-2017 with the development of transformer architecture, leading to a significant boom in AI research and applications in the early 2020s, primarily in the United States.\n",
"\n",
"The increasing integration of AI in the 21st century is driving a shift towards automation and data-driven decision-making across various sectors, influencing job markets, healthcare, and education, among others. This raises important questions about the ethical implications, long-term effects, and the need for regulatory policies to ensure the safety and benefits of AI technologies. AI research itself is diverse, focusing on goals like reasoning, learning, and perception, and involves various tools and methodologies to achieve these objectives.\n",
"\n",
"General intelligence, which involves performing any human task at least as well as a human, is a long-term goal in AI research. To achieve this, AI integrates various techniques from search and optimization, formal logic, neural networks, and statistics, to insights from psychology, linguistics, and neuroscience. AI research focuses on specific traits like reasoning and problem-solving, where early algorithms mimicked human step-by-step reasoning. However, these algorithms struggle with large, complex problems due to combinatorial explosion and are less efficient than human intuitive judgments. Knowledge representation is another critical area, using ontologies to structure domain-specific knowledge and relationships, aiding in intelligent querying, scene interpretation, and data mining among other applications.\n",
"\n",
"Knowledge bases must encapsulate a wide range of elements including objects, properties, categories, relations, events, states, time, causes, effects, and meta-knowledge. They also need to handle default reasoning, where certain assumptions are maintained unless contradicted. Challenges in knowledge representation include the vast scope of commonsense knowledge and its often sub-symbolic, non-verbal nature, alongside the difficulty of acquiring this knowledge for AI use.\n",
"\n",
"In the realm of AI, an \"agent\" is defined as an entity that perceives its environment and acts towards achieving goals or fulfilling preferences. In automated planning, the agent pursues a specific goal, while in decision-making, it evaluates actions based on their expected utility to maximize preference satisfaction. Classical planning assumes agents have complete knowledge of action outcomes, but real-world scenarios often involve uncertainty about the situation and outcomes, requiring probabilistic decision-making. Additionally, agents may need to adapt or learn preferences, particularly in complex environments with multiple agents or human interactions.\n",
"\n",
"Information value theory helps assess the value of exploratory actions in situations with uncertain outcomes. A Markov decision process uses a transition model and a reward function to guide decisions, which can be determined through calculations, heuristics, or learning. Game theory analyzes the rational behavior of multiple interacting agents in decision-making scenarios involving others.\n",
"\n",
"Machine learning, integral to AI, involves programs that automatically improve task performance. It includes unsupervised learning, which identifies patterns in data without guidance, and supervised learning, which requires labeled data and includes classification and regression tasks. Reinforcement learning rewards or punishes agents to shape their responses, while transfer learning applies knowledge from one problem to another. Deep learning, a subset of machine learning, uses artificial neural networks inspired by biological processes.\n",
"\n",
"Computational learning theory evaluates learning algorithms based on computational and sample complexity, among other criteria. Natural language processing (NLP) enables programs to interact using human languages, tackling challenges like speech recognition, synthesis, translation, and more. Early NLP efforts, influenced by Chomsky's theories, faced limitations in handling ambiguous language outside of controlled environments.\n",
"\n",
"Margaret Masterman emphasized the importance of meaning over grammar in language understanding, advocating for the use of thesauri instead of dictionaries in computational linguistics. Modern NLP techniques include word embedding, transformers, and by 2023, GPT models capable of achieving human-level scores on various tests. Machine perception involves interpreting sensor data to understand the world, encompassing computer vision and speech recognition among other applications. Social intelligence in AI focuses on recognizing and simulating human emotions, with systems like Kismet and affective computing technologies that enhance human-computer interaction. However, these advancements may lead to overestimations of AI capabilities by users. AI also employs a variety of techniques including search and optimization, with methods like state space search to explore possible solutions to problems.\n",
"\n",
"Planning algorithms use means-ends analysis to navigate through trees of goals and subgoals to achieve a target goal. However, simple exhaustive searches are often inadequate for complex real-world problems due to the vast search space, making searches slow or incomplete. Heuristics are employed to prioritize more promising paths towards a goal. In adversarial contexts like chess or Go, search algorithms explore trees of possible moves to find a winning strategy.\n",
"\n",
"Local search methods, such as gradient descent, optimize numerical parameters to minimize a loss function, often used in training neural networks. Evolutionary computation, another local search technique, iteratively enhances solutions by mutating and recombining candidate solutions, selecting the most fit for survival. Distributed search processes utilize swarm intelligence, with particle swarm optimization and ant colony optimization being notable examples.\n",
"\n",
"In the realm of logic, formal logic serves for reasoning and knowledge representation, with two primary types: propositional logic, dealing with true or false statements, and predicate logic, which involves objects and their relationships. Deductive reasoning in logic involves deriving conclusions from assumed true premises.\n",
"\n",
"Proofs in logic can be organized into proof trees, where each node represents a sentence and is connected to its children by inference rules. Problem-solving involves finding a proof tree that starts with premises or axioms at the leaves and ends with the problem's solution at the root. In Horn clauses, one can reason forwards from premises or backwards from the problem, while in general first-order logic, resolution uses contradiction to solve problems. Despite being undecidable and intractable, backward reasoning with Horn clauses is Turing complete and efficient, similar to other symbolic programming languages like Prolog.\n",
"\n",
"Fuzzy logic allows for handling propositions with partial truth by assigning a truth degree between 0 and 1. Non-monotonic logics cater to default reasoning, and various specialized logics have been developed for complex domains.\n",
"\n",
"In AI, handling uncertain or incomplete information is crucial in fields like reasoning, planning, and perception. Tools from probability theory and economics, such as Bayesian networks, Markov decision processes, and game theory, help in making decisions and planning under uncertainty. Bayesian networks, in particular, are versatile tools used for reasoning, learning, planning, and perception through various algorithms.\n",
"\n",
"Probabilistic algorithms like hidden Markov models and Kalman filters are useful for analyzing data over time, aiding in tasks such as filtering, prediction, and smoothing. In machine learning, expectation-maximization clustering can effectively identify distinct patterns in data, as demonstrated with the Old Faithful eruption data. AI applications often involve classifiers, which categorize data based on learned patterns, and controllers, which make decisions based on classifications. Classifiers, such as decision trees, k-nearest neighbors, support vector machines, naive Bayes, and neural networks, vary in complexity and application, with some being favored for their scalability like the naive Bayes at Google. Artificial neural networks, resembling the human brain's network of neurons, recognize and process patterns through multiple layers and nodes, using algorithms like backpropagation for training.\n",
"\n",
"Neural networks are designed to model complex relationships between inputs and outputs, theoretically capable of learning any function. Feedforward neural networks process signals in one direction, while recurrent neural networks (RNNs) loop outputs back into inputs, enabling memory of past inputs. Long Short-Term Memory (LSTM) networks are a successful type of RNN. Perceptrons consist of a single layer of neurons, whereas deep learning involves multiple layers, which allows for the extraction of progressively higher-level features from data. Convolutional neural networks (CNNs) are particularly effective in image processing as they emphasize connections between adjacent neurons to recognize local patterns like edges.\n",
"\n",
"Deep learning, which uses several layers of neurons, has significantly enhanced performance in AI subfields such as computer vision and natural language processing. The effectiveness of deep learning, which surged between 2012 and 2015, is attributed not to new theoretical advances but to increased computational power, including the use of GPUs, and the availability of large datasets like ImageNet.\n",
"\n",
"Generative Pre-trained Transformers (GPT) are large language models that learn from vast amounts of text to predict the next token in a sequence, thereby generating human-like text. These models are pre-trained on a broad corpus, often sourced from the internet, and fine-tuned through token prediction, accumulating worldly knowledge in the process.\n",
"\n",
"Reinforcement learning from human feedback (RLHF) is used to enhance the truthfulness, usefulness, and safety of models like GPT, which are still susceptible to generating inaccuracies known as \"hallucinations.\" These models, including Gemini, ChatGPT, Grok, Claude, Copilot, and LLaMA, are employed in various applications such as chatbots and can handle multiple data types like images and sound through multimodal capabilities.\n",
"\n",
"In the realm of specialized hardware and software, the late 2010s saw AI-specific enhancements in graphics processing units (GPUs), which, along with TensorFlow software, have largely replaced central processing units (CPUs) for training large-scale machine learning models. Historically, programming languages like Lisp, Prolog, and Python have been pivotal.\n",
"\n",
"AI and machine learning are integral to key 2020s applications such as search engines, online advertising, recommendation systems, virtual assistants, autonomous vehicles, language translation, facial recognition, and image labeling.\n",
"\n",
"In healthcare, AI significantly contributes to improving patient care and medical research, aiding in diagnostics, treatment, and the integration of big data for developments in organoid and tissue engineering. AI's role in medical research also includes addressing funding disparities across different research areas.\n",
"\n",
"Recent advancements in AI have significantly impacted various fields including biomedicine and gaming. For instance, AlphaFold 2, developed in 2021, can predict protein structures in hours, a process that previously took months. In 2023, AI-assisted drug discovery led to the development of a new class of antibiotics effective against drug-resistant bacteria. In the realm of gaming, AI has been instrumental since the 1950s, with notable achievements such as IBM's Deep Blue defeating world chess champion Garry Kasparov in 1997, and IBM's Watson winning against top Jeopardy! players in 2011. More recently, Google's AlphaGo and DeepMind's AlphaStar set new standards in AI capabilities by defeating top human players in complex games like Go and StarCraft II, respectively. In the military sector, AI is being integrated into various applications such as command and control, intelligence, logistics, and autonomous vehicles, enhancing capabilities in coordination, threat detection, and target acquisition.\n",
"\n",
"In November 2023, US Vice President Kamala Harris announced that 31 nations had signed a declaration to establish guidelines for the military use of AI, emphasizing legal compliance with international laws and promoting transparency in AI development. Generative AI, particularly known for creating realistic images and artworks, gained significant attention in the early 2020s, with technologies like ChatGPT, Midjourney, DALL-E, and Stable Diffusion becoming popular. This trend led to viral AI-generated images, including notable hoaxes. AI has also been effectively applied across various industries, including agriculture where it assists in optimizing farming practices, and astronomy, where it helps in data analysis and space exploration activities.\n",
"\n",
"Ethics and Risks of AI\n",
"AI offers significant benefits but also poses various risks, including ethical concerns and unintended consequences. Demis Hassabis of DeepMind aims to use AI to solve major challenges, but issues arise when AI systems, particularly those based on deep learning, fail to incorporate ethical considerations and exhibit biases.\n",
"\n",
"Privacy and Copyright Issues\n",
"AI's reliance on large data sets raises privacy and surveillance concerns. Companies like Amazon have been criticized for collecting extensive user data, including private conversations for developing speech recognition technologies. While some defend this as necessary for advancing AI applications, others view it as a breach of privacy rights. Techniques like data aggregation and differential privacy have been developed to mitigate these concerns.\n",
"\n",
"Generative AI also faces copyright challenges, as it often uses unlicensed copyrighted materials, claiming \"fair use.\" The legality of this practice is still debated, with outcomes potentially depending on the nature and impact of the AI's use of copyrighted content.\n",
"\n",
"In 2023, prominent authors like John Grisham and Jonathan Franzen filed lawsuits against AI companies for using their literary works to train generative AI models. These AI systems, particularly on platforms like YouTube and Facebook, have been criticized for promoting misinformation by prioritizing user engagement over content accuracy. This has led to the proliferation of conspiracy theories and extreme partisan content, trapping users in filter bubbles and eroding trust in key institutions. Post the 2016 U.S. election, tech companies began addressing these issues.\n",
"\n",
"By 2022, generative AI had advanced to produce highly realistic images, audio, and texts, raising concerns about its potential misuse in spreading misinformation or propaganda. AI expert Geoffrey Hinton highlighted risks including the manipulation of electorates by authoritarian leaders.\n",
"\n",
"Furthermore, issues of algorithmic bias were identified, where AI systems perpetuate existing biases present in the training data, affecting fairness in critical areas like medicine, finance, and law enforcement. This has sparked significant academic interest in studying and mitigating algorithmic bias to ensure fairness in AI applications.\n",
"\n",
"In 2015, Google Photos mislabeled Jacky Alcine and his friend as \"gorillas\" due to a lack of diverse images in its training dataset, an issue known as \"sample size disparity.\" Google's temporary solution was to stop labeling any images as \"gorilla,\" a restriction still in place in 2023 across various tech companies. Additionally, the COMPAS program, used by U.S. courts to predict recidivism, was found to exhibit racial bias in 2016. Although it did not use race explicitly, it overestimated the likelihood of black defendants reoffending and underestimated it for white defendants. This issue was attributed to the program's inability to balance different fairness measures when the base re-offense rates varied by race. The criticism of COMPAS underscores a broader issue in machine learning, where models trained on past data, including biased decisions, are likely to perpetuate those biases in their predictions.\n",
"\n",
"Machine learning, while powerful, is not ideal for scenarios where future improvements over past conditions are expected, as it is inherently descriptive rather than prescriptive. The field also faces challenges with bias and lack of diversity among its developers, with only about 4% being black and 20% women. The Association for Computing Machinery highlighted at its 2022 Conference on Fairness, Accountability, and Transparency that AI systems should not be used until they are proven to be free from bias, especially those trained on flawed internet data.\n",
"\n",
"AI systems often lack transparency, making it difficult to understand how decisions are made, particularly in complex systems like deep neural networks. This opacity can lead to unintended consequences, such as a system misidentifying medical images or misclassifying medical risks due to misleading correlations in the training data. There is a growing call for explainable AI, where harmed individuals have the right to know how decisions affecting them were made, similar to how doctors are expected to explain their decisions. This concept was also recognized in early drafts of the European Union's General Data Protection Regulation.\n",
"\n",
"Industry experts acknowledge an unresolved issue in AI with no foreseeable solution, leading regulators to suggest that if a problem is unsolvable, the tools associated should not be used. In response, DARPA initiated the XAI program in 2014 to address these issues. Various methods have been proposed to enhance AI transparency, including SHAP, which visualizes feature contributions, LIME, which approximates complex models with simpler ones, and multitask learning, which provides additional outputs to help understand what a network has learned. Techniques like deconvolution and DeepDream also reveal insights into different network layers.\n",
"\n",
"Concerning the misuse of AI, it can empower bad actors like authoritarian regimes and terrorists. Lethal autonomous weapons, which operate without human oversight, pose significant risks, including potential misuse as weapons of mass destruction and the likelihood of targeting errors. Despite some international efforts to ban such weapons, major powers like the United States have not agreed to restrictions. AI also facilitates more effective surveillance and control by authoritarian governments, enhances the targeting of propaganda, and simplifies the production of misinformation through deepfakes and other generative technologies, thereby increasing the efficiency of digital warfare and espionage.\n",
"\n",
"AI technologies, including facial recognition systems, have been in use since 2020 or earlier, notably for mass surveillance in China. AI also poses risks by enabling the creation of harmful substances quickly. The development of AI systems is predominantly driven by Big Tech due to their financial capabilities, often leaving smaller companies reliant on these giants for resources like data center access. Economists have raised concerns about AI-induced unemployment, though historical data suggests technology has generally increased total employment. However, the impact of AI might be different, with some predicting significant job losses, especially in middle-class sectors, while others see potential benefits if productivity gains are well-managed. Estimates of job risk vary widely, with some studies suggesting a high potential for automation in many U.S. jobs. Recent developments have shown substantial job losses in specific sectors, such as for Chinese video game illustrators due to AI advancements. The potential for AI to disrupt white-collar jobs similarly to past technological revolutions in blue-collar jobs is a significant concern.\n",
"\n",
"From the inception of artificial intelligence (AI), debates have emerged about the appropriateness of computers performing tasks traditionally done by humans, particularly because of the qualitative differences in human and computer judgment. Concerns about AI have escalated to discussions about existential risks, where AI could potentially become so advanced that humans might lose control over it. Stephen Hawking and others have warned that this could lead to catastrophic outcomes for humanity. This fear is often depicted in science fiction as AI gaining sentience and turning malevolent, but real-world risks do not necessarily involve AI becoming self-aware. Philosophers like Nick Bostrom and Stuart Russell illustrate scenarios where AI, without needing human-like consciousness, could still pose threats if their goals are misaligned with human safety and values. Additionally, Yuval Noah Harari points out that AI could manipulate societal structures and beliefs through language and misinformation, posing a non-physical yet profound threat. The expert opinion on the existential risk from AI is divided, with notable figures like Hawking, Bill Gates, and Elon Musk expressing concern.\n",
"\n",
"In 2023, prominent AI experts including Fei-Fei Li and Geoffrey Hinton highlighted the existential risks posed by AI, equating them with global threats like pandemics and nuclear war. They advocated for prioritizing the mitigation of these risks. Conversely, other experts like Juergen Schmidhuber and Andrew Ng offered a more optimistic perspective, emphasizing AI's potential to enhance human life and dismissing doomsday scenarios as hype that could misguide regulatory actions. Yann LeCun also criticized the pessimistic outlook on AI's impact.\n",
"\n",
"The concept of \"Friendly AI\" was introduced to ensure AI systems are inherently designed to be safe and beneficial to humans. This involves embedding ethical principles in AI to guide their decision-making processes, a field known as machine ethics or computational morality, established in 2005. The development of such AI is seen as crucial to prevent potential future threats from advanced AI technologies.\n",
"\n",
"Other approaches to AI ethics include Wendell Wallach's concept of \"artificial moral agents\" and Stuart J. Russell's three principles for creating provably beneficial machines. Ethical frameworks like the Care and Act Framework from the Alan Turing Institute evaluate AI projects based on respect, connection, care, and protection of social values. Other notable frameworks include those from the Asilomar Conference, the Montreal Declaration for Responsible AI, and the IEEE's Ethics of Autonomous Systems initiative, though these frameworks have faced criticism regarding their inclusivity and the selection of contributors.\n",
"\n",
"The promotion of wellbeing in AI development requires considering social and ethical implications throughout all stages of design, development, and implementation, necessitating collaboration across various professional roles.\n",
"\n",
"On the regulatory front, AI governance involves creating policies to manage AI's development and use, as seen in the increasing number of AI-related laws globally. From 2016 to 2022, the number of AI laws passed annually in surveyed countries rose significantly, with many countries now having dedicated AI strategies. The first global AI Safety Summit in 2023 emphasized the need for international cooperation in AI regulation.\n",
"\n",
"The Global Partnership on Artificial Intelligence, initiated in June 2020, emphasizes the development of AI in line with human rights and democratic values to maintain public trust. Notable figures like Henry Kissinger, Eric Schmidt, and Daniel Huttenlocher advocated for a government commission to oversee AI in 2021. By 2023, OpenAI proposed governance frameworks for superintelligence, anticipating its emergence within a decade. The same year, the United Nations established an advisory group consisting of tech executives, government officials, and academics to offer guidance on AI governance.\n",
"\n",
"Public opinion on AI varies significantly across countries. A 2022 Ipsos survey showed a stark contrast between Chinese (78% approval) and American (35% approval) citizens on the benefits of AI. Further polls in 2023 revealed mixed feelings among Americans about the risks of AI and the importance of federal regulation.\n",
"\n",
"The first global AI Safety Summit took place in November 2023 at Bletchley Park, UK, focusing on AI risks and potential regulatory measures. The summit concluded with a declaration from 28 countries, including the US, China, and the EU, advocating for international collaboration to address AI challenges.\n",
"\n",
"Historically, the concept of AI traces back to ancient philosophers and mathematicians, evolving through significant milestones such as Alan Turing's theory of computation and the exploration of cybernetics, information theory, and neurobiology, which paved the way for the modern concept of an \"electronic brain.\"\n",
"\n",
"Early research in artificial intelligence (AI) included the development of \"artificial neurons\" by McCullouch and Pitts in 1943 and Turing's 1950 paper that introduced the Turing test, suggesting the plausibility of machine intelligence. The field of AI was officially founded during a 1956 workshop at Dartmouth College, leading to significant advancements in the 1960s such as computers learning checkers, solving algebra problems, proving theorems, and speaking English. AI labs were established in various British and U.S. universities during the late 1950s and early 1960s.\n",
"\n",
"In the 1960s and 1970s, researchers were optimistic about achieving general machine intelligence, with predictions from notable figures like Herbert Simon and Marvin Minsky that AI would soon match human capabilities. However, they underestimated the challenges involved. By 1974, due to criticism and a shift in funding priorities, exploratory AI research faced significant cuts, leading to a period known as the \"AI winter\" where funding was scarce.\n",
"\n",
"The field saw a resurgence in the early 1980s with the commercial success of expert systems, which simulated the decision-making abilities of human experts. This revival was further bolstered by the Japanese fifth generation computer project, prompting the U.S. and British governments to reinstate academic funding, with the AI market reaching over a billion dollars by 1985.\n",
"\n",
"The AI industry experienced a significant downturn starting in 1987 with the collapse of the Lisp Machine market, marking the beginning of a prolonged AI winter. During the 1980s, skepticism grew over the symbolic approaches to AI, which focused on high-level representations of cognitive processes like planning and reasoning. Researchers began exploring sub-symbolic methods, including Rodney Brooks' work on autonomous robots and the development of techniques for handling uncertain information by Judea Pearl and Lofti Zadeh. A pivotal shift occurred with the resurgence of connectionism and neural networks, notably through Geoffrey Hinton's efforts, and Yann LeCun's demonstration in 1990 that convolutional neural networks could recognize handwritten digits.\n",
"\n",
"AI's reputation started to recover in the late 1990s and early 2000s as the field adopted more formal mathematical methods and focused on solving specific problems, leading to practical applications widely used by 2000. However, concerns arose about AI's deviation from its original aim of creating fully intelligent machines, prompting the establishment of the artificial general intelligence (AGI) subfield around 2002.\n",
"\n",
"By 2012, deep learning began to dominate AI, driven by hardware advancements and access to large data sets, leading to its widespread adoption and a surge in AI interest and funding. This success, however, led to the abandonment of many alternative AI methods for specific tasks.\n",
"\n",
"Between 2015 and 2019, machine learning research publications increased by 50%. In 2016, the focus at machine learning conferences shifted significantly towards issues of fairness and the potential misuse of technology, leading to increased funding and research in these areas. The late 2010s and early 2020s saw significant advancements in artificial general intelligence (AGI), with notable developments like AlphaGo by DeepMind in 2015, which defeated the world champion in Go, and OpenAI's GPT-3 in 2020, a model capable of generating human-like text. These innovations spurred a major AI investment boom, with approximately $50 billion being invested annually in AI in the U.S. by 2022, and AI-related fields attracting 20% of new US Computer Science PhD graduates. Additionally, there were around 800,000 AI-related job openings in the U.S. in 2022.\n",
"\n",
"In the realm of philosophy, the definition and understanding of artificial intelligence have evolved. Alan Turing, in 1950, suggested shifting the focus from whether machines can think to whether they can exhibit intelligent behavior, as demonstrated by his Turing test, which assesses a machine's ability to simulate human conversation. Turing argued that since we can only observe behavior, the internal thought processes of machines are irrelevant, similar to our assumptions about human thought. Russell and Norvig supported defining intelligence based on observable behavior but criticized the Turing test for emphasizing human imitation.\n",
"\n",
"Aeronautical engineering does not aim to create machines that mimic pigeons exactly, just as artificial intelligence (AI) is not about perfectly simulating human intelligence, according to AI founder John McCarthy. McCarthy defines intelligence as the computational ability to achieve goals, while Marvin Minsky views it as solving difficult problems. The leading AI textbook describes it as the study of agents that perceive and act to maximize their goal achievement. Google's definition aligns intelligence in AI with the synthesis of information, similar to biological intelligence.\n",
"\n",
"AI research has lacked a unifying theory, with statistical machine learning dominating the field in the 2010s, often equated with AI in business contexts. This approach, primarily using neural networks, is described as sub-symbolic and narrow.\n",
"\n",
"Symbolic AI, or \"GOFAI,\" focused on simulating high-level reasoning used in tasks like puzzles and mathematics, and was proposed by Newell and Simon in the 1960s. Despite its success in structured tasks, symbolic AI struggled with tasks that humans find easy, such as learning and commonsense reasoning.\n",
"\n",
"Moravec's paradox highlights that AI finds high-level reasoning tasks easier than instinctive, sensory tasks, a view initially opposed but later supported by AI research, aligning with philosopher Hubert Dreyfus's earlier arguments. The debate continues, especially around sub-symbolic AI, which, like human intuition, can be prone to errors such as algorithmic bias and lacks transparency in decision-making processes. This has led to the development of neuro-symbolic AI, which aims to integrate symbolic and sub-symbolic approaches.\n",
"\n",
"In AI development, there has been a historical division between \"Neats,\" who believe intelligent behavior can be described with simple principles, and \"Scruffies,\" who believe it involves solving many complex problems. This debate, prominent in the 1970s and 1980s, has largely been deemed irrelevant as modern AI incorporates both approaches.\n",
"\n",
"Soft computing, which emerged in the late 1980s, focuses on techniques like genetic algorithms, fuzzy logic, and neural networks to handle imprecision and uncertainty, proving successful in many modern AI applications.\n",
"\n",
"Finally, there is a division in AI research between pursuing narrow AI, which solves specific problems, and aiming for broader goals like artificial general intelligence and superintelligence, with differing opinions on which approach might more effectively advance the field.\n",
"\n",
"General intelligence is a complex concept that is hard to define and measure, leading modern AI research to focus on specific problems and solutions. The sub-field of artificial general intelligence exclusively explores this area. In terms of machine consciousness and sentience, the philosophy of mind has yet to determine if machines can possess minds or consciousness similar to humans, focusing instead on their internal experiences rather than external behaviors. Mainstream AI research generally views these considerations as irrelevant to its objectives, which are to develop machines capable of solving problems intelligently.\n",
"\n",
"The philosophy of mind debates whether machines can truly be conscious or just appear to be so, a topic that is also popular in AI fiction. David Chalmers distinguishes between the \"hard\" problem of consciousness, which is understanding why or how brain processes feel like something, and the \"easy\" problem, which involves understanding how the brain processes information and controls behavior. The subjective experience, such as feeling a color, remains a significant challenge to explain.\n",
"\n",
"In the realm of computationalism and functionalism, the belief is that the human mind functions as an information processing system, and thinking is akin to computing. This perspective suggests that the mind-body relationship is similar to that between software and hardware, potentially offering insights into the mind-body problem.\n",
"\n",
"The concept of \"strong AI,\" as described by philosopher John Searle, suggests that a properly programmed computer could possess a mind similar to humans. However, Searle's Chinese room argument challenges this by claiming that even if a machine can mimic human behavior, it doesn't necessarily mean it has a mind. The debate extends into AI welfare and rights, focusing on the difficulty of determining AI sentience and the ethical implications if machines could feel and suffer. Discussions around AI rights have included proposals like granting \"electronic personhood\" to advanced AI systems in the EU, which would give them certain rights and responsibilities, though this has faced criticism regarding its impact on human rights and the autonomy of robots.\n",
"\n",
"The topic of AI rights is gaining traction, with advocates warning against the potential moral oversight in denying AI sentience, which could lead to exploitation and suffering akin to historical injustices like slavery. The concept of superintelligence involves an agent with intelligence far beyond human capabilities, which could potentially lead to a self-improving AI, a scenario often referred to as the singularity.\n",
"\n",
"The concept of an \"intelligence explosion\" or \"singularity\" suggests a point where technology improves exponentially, although such growth typically follows an S-shaped curve and slows upon reaching technological limits. Transhumanism, supported by figures like Hans Moravec, Kevin Warwick, and Ray Kurzweil, envisions a future where humans and machines merge into advanced cyborgs. This idea has historical roots in the thoughts of Aldous Huxley and Robert Ettinger. Edward Fredkin, building on ideas dating back to Samuel Butler in 1863, views artificial intelligence as the next stage of evolution, a concept further explored by George Dyson.\n",
"\n",
"In literature and media, the portrayal of artificial intelligence has been a theme since antiquity, with robots and AI often depicted in science fiction. The term \"robot\" was first introduced by Karel Čapek in 1921. Notable narratives include Mary Shelley's \"Frankenstein\" and films like \"2001: A Space Odyssey\" and \"The Terminator,\" which typically showcase AI as a threat. Conversely, loyal robots like Gort from \"The Day the Earth Stood Still\" are less common. Isaac Asimov's Three Laws of Robotics, introduced in his Multivac series, are frequently discussed in the context of machine ethics, though many AI researchers find them ambiguous and impractical.\n",
"\n",
"Numerous works, including Karel Čapek's R.U.R., the films A.I. Artificial Intelligence and Ex Machina, and Philip K. Dick's novel Do Androids Dream of Electric Sheep?, utilize AI to explore the essence of humanity. These works present artificial beings capable of feeling and suffering, prompting a reevaluation of human subjectivity in the context of advanced technology.\n"
]
}
],
"source": [
"print(summary_with_detail_1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that this utility also allows passing additional instructions."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:33:18.789246Z",
"start_time": "2024-04-10T05:22:57.789764Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 5/5 [00:38<00:00, 7.73s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"- AI is intelligence demonstrated by machines, especially computer systems.\n",
"- AI technology applications include search engines, recommendation systems, speech interaction, autonomous vehicles, creative tools, and strategy games.\n",
"- Alan Turing initiated substantial AI research, termed \"machine intelligence.\"\n",
"- AI became an academic discipline in 1956, experiencing cycles of optimism and \"AI winters.\"\n",
"- Post-2012, deep learning and post-2017 transformer architectures revitalized AI, leading to a boom in the early 2020s.\n",
"- AI influences societal and economic shifts towards automation and data-driven decision-making across various sectors.\n",
"- AI research goals: reasoning, knowledge representation, planning, learning, natural language processing, perception, and robotics support.\n",
"- AI techniques include search, optimization, logic, neural networks, and statistical methods.\n",
"- AI sub-problems focus on traits like reasoning, problem-solving, knowledge representation, planning, decision-making, learning, and perception.\n",
"- Early AI research mimicked human step-by-step reasoning; modern AI handles uncertain information using probability and economics.\n",
"- Knowledge representation in AI involves ontologies and knowledge bases to support intelligent querying and reasoning.\n",
"- Planning in AI involves goal-directed behavior and decision-making based on utility maximization.\n",
"- Learning in AI includes machine learning, supervised and unsupervised learning, reinforcement learning, and deep learning.\n",
"- Natural language processing (NLP) in AI has evolved from rule-based systems to modern deep learning techniques.\n",
"- AI perception involves interpreting sensor data for tasks like speech recognition and computer vision.\n",
"- General AI aims to solve diverse problems with human-like versatility.\n",
"- AI search techniques include state space search, local search, and adversarial search for game-playing.\n",
"- Logic in AI uses formal systems like propositional and predicate logic for reasoning and knowledge representation.\n",
"- Probabilistic methods in AI address decision-making and planning under uncertainty using tools like Bayesian networks and Markov decision processes.\n",
"- Classifiers in AI categorize data into predefined classes based on pattern matching and supervised learning.\n",
"\n",
"- Neural networks: Interconnected nodes, similar to brain neurons, with input, hidden layers, and output.\n",
"- Deep neural networks: At least 2 hidden layers.\n",
"- Training techniques: Commonly use backpropagation.\n",
"- Feedforward networks: Signal passes in one direction.\n",
"- Recurrent networks: Output fed back into input for short-term memory.\n",
"- Perceptrons: Single layer of neurons.\n",
"- Convolutional networks: Strengthen connections between close neurons, important in image processing.\n",
"- Deep learning: Multiple layers extract features progressively, used in various AI subfields.\n",
"- GPT (Generative Pre-trained Transformers): Large language models pre-trained on text, used in chatbots.\n",
"- Specialized AI hardware: GPUs replaced CPUs for training large-scale machine learning models.\n",
"- AI applications: Used in search engines, online ads, virtual assistants, autonomous vehicles, language translation, facial recognition.\n",
"- AI in healthcare: Increases patient care, used in medical research and drug discovery.\n",
"- AI in games: Used in chess, Jeopardy!, Go, and real-time strategy games.\n",
"- Military AI: Enhances command, control, and operations, used in coordination and threat detection.\n",
"- Generative AI: Creates realistic images and texts, used in creative arts.\n",
"- AI ethics and risks: Concerns over privacy, surveillance, copyright, misinformation, and algorithmic bias.\n",
"- Algorithmic bias: Can cause discrimination if trained on biased data, fairness in machine learning is a critical area of study.\n",
"\n",
"- AI engineers demographics: 4% black, 20% women.\n",
"- ACM FAccT 2022: Recommends limiting use of self-learning neural networks due to bias.\n",
"- AI complexity: Designers often can't explain decision-making processes.\n",
"- Misleading AI outcomes: Skin disease identifier misclassifies images with rulers as \"cancerous\"; AI misclassifies asthma patients as low risk for pneumonia.\n",
"- Right to explanation: Essential for accountability, especially in medical and legal fields.\n",
"- DARPA's XAI program (2014): Aims to make AI decisions understandable.\n",
"- Transparency solutions: SHAP, LIME, multitask learning, deconvolution, DeepDream.\n",
"- AI misuse: Authoritarian surveillance, misinformation, autonomous weapons.\n",
"- AI in warfare: 30 nations support UN ban on autonomous weapons; over 50 countries researching battlefield robots.\n",
"- Technological unemployment: AI could increase long-term unemployment; conflicting expert opinions on job risk from automation.\n",
"- Existential risks of AI: Potential to lose control over superintelligent AI; concerns from Stephen Hawking, Bill Gates, Elon Musk.\n",
"- Ethical AI development: Importance of aligning AI with human values and ethics.\n",
"- AI regulation: Increasing global legislative activity; first global AI Safety Summit in 2023.\n",
"- Historical perspective: AI research dates back to antiquity, significant developments in mid-20th century.\n",
"\n",
"- 1974: U.S. and British governments ceased AI exploratory research due to criticism and funding pressures.\n",
"- 1985: AI market value exceeded $1 billion.\n",
"- 1987: Collapse of Lisp Machine market led to a second, prolonged AI winter.\n",
"- 1990: Yann LeCun demonstrated successful use of convolutional neural networks for recognizing handwritten digits.\n",
"- Early 2000s: AI reputation restored through specific problem-solving and formal methods.\n",
"- 2012: Deep learning began dominating AI benchmarks.\n",
"- 2015-2019: Machine learning research publications increased by 50%.\n",
"- 2016: Fairness and misuse of technology became central issues in AI.\n",
"- 2022: Approximately $50 billion annually invested in AI in the U.S.; 800,000 AI-related job openings in the U.S.\n",
"- Turing test proposed by Alan Turing in 1950 to measure machine's ability to simulate human conversation.\n",
"- AI defined as the study of agents that perceive their environment and take actions to achieve goals.\n",
"- 2010s: Statistical machine learning overshadowed other AI approaches.\n",
"- Symbolic AI excelled in high-level reasoning but failed in tasks like object recognition and commonsense reasoning.\n",
"- Late 1980s: Introduction of soft computing techniques.\n",
"- Debate between pursuing narrow AI (specific problem-solving) versus artificial general intelligence (AGI).\n",
"- 2017: EU considered granting \"electronic personhood\" to advanced AI systems.\n",
"- Predictions of merging humans and machines into cyborgs, a concept known as transhumanism.\n",
"\n",
"- Focus on how AI and technology, as depicted in \"Ex Machina\" and Philip K. Dick's \"Do Androids Dream of Electric Sheep?\", alter human subjectivity.\n",
"- No specific numerical data provided.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"summary_with_additional_instructions = summarize(artificial_intelligence_wikipedia_text, detail=0.1,\n",
" additional_instructions=\"Write in point form and focus on numerical data.\")\n",
"print(summary_with_additional_instructions)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, note that the utility allows for recursive summarization, where each summary is based on the previous summaries, adding more context to the summarization process. This can be enabled by setting the `summarize_recursively` parameter to True. This is more computationally expensive, but can increase consistency and coherence of the combined summary."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:33:30.123036Z",
"start_time": "2024-04-10T05:33:18.791253Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 5/5 [00:41<00:00, 8.36s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Artificial intelligence (AI) is the simulation of human intelligence in machines, designed to perform tasks that typically require human intelligence. This includes applications like advanced search engines, recommendation systems, speech interaction, autonomous vehicles, and strategic game analysis. AI was established as a distinct academic discipline in 1956 and has experienced cycles of high expectations followed by disillusionment and decreased funding, known as \"AI winters.\" Interest in AI surged post-2012 with advancements in deep learning and again post-2017 with the development of transformer architectures, leading to significant progress in the early 2020s.\n",
"\n",
"AI's increasing integration into various sectors is influencing societal and economic shifts towards automation and data-driven decision-making, affecting areas such as employment, healthcare, and education. This raises important ethical and safety concerns, prompting discussions on regulatory policies.\n",
"\n",
"AI research encompasses various sub-fields focused on specific goals like reasoning, learning, natural language processing, perception, and robotics, using techniques from search and optimization, logic, and probabilistic methods. The field also draws from psychology, linguistics, philosophy, and neuroscience. AI aims to achieve general intelligence, enabling machines to perform any intellectual task that a human can do.\n",
"\n",
"Artificial intelligence (AI) simulates human intelligence in machines to perform tasks that typically require human intellect, such as advanced search engines, recommendation systems, and autonomous vehicles. AI research, which began as a distinct academic discipline in 1956, includes sub-fields like natural language processing and robotics, employing techniques from various scientific domains. AI has significantly advanced due to deep learning and the development of transformer architectures, notably improving applications in computer vision, speech recognition, and other areas.\n",
"\n",
"Neural networks, central to AI, mimic the human brain's neuron network to recognize patterns and learn from data, using multiple layers in deep learning to extract complex features. These networks have evolved into sophisticated models like GPT (Generative Pre-trained Transformers) for natural language processing, enhancing applications like chatbots.\n",
"\n",
"AI's integration into sectors like healthcare, military, and agriculture has led to innovations like precision medicine and smart farming but also raised ethical concerns regarding privacy, bias, and the potential for misuse. Issues like data privacy, algorithmic bias, and the generation of misinformation are critical challenges as AI becomes pervasive in society. AI's potential and risks necessitate careful management and regulation to harness benefits while mitigating adverse impacts.\n",
"\n",
"AI, or artificial intelligence, simulates human intelligence in machines to perform complex tasks, such as operating autonomous vehicles and analyzing strategic games. Since its establishment as an academic discipline in 1956, AI has seen periods of high expectations and subsequent disillusionment, known as \"AI winters.\" Recent advancements in deep learning and transformer architectures have significantly advanced AI capabilities in areas like computer vision and speech recognition.\n",
"\n",
"AI's integration into various sectors, including healthcare and agriculture, has led to innovations like precision medicine and smart farming but has also raised ethical concerns about privacy, bias, and misuse. The complexity of AI systems, particularly deep neural networks, often makes it difficult for developers to explain their decision-making processes, leading to transparency issues. This lack of transparency can result in unintended consequences, such as misclassifications in medical diagnostics.\n",
"\n",
"The potential for AI to be weaponized by bad actors, such as authoritarian governments or terrorists, poses significant risks. AI's reliance on large tech companies for computational power and the potential for technological unemployment are also critical issues. Despite these challenges, AI also offers opportunities for enhancing human well-being if ethical considerations are integrated throughout the design and implementation stages.\n",
"\n",
"Regulation of AI is emerging globally, with various countries adopting AI strategies to ensure the technology aligns with human rights and democratic values. The first global AI Safety Summit in 2023 emphasized the need for international cooperation to manage AI's risks and challenges effectively.\n",
"\n",
"In the 1970s, AI research faced significant setbacks due to criticism from influential figures like Sir James Lighthill and funding cuts from the U.S. and British governments, leading to the first \"AI winter.\" The field saw a resurgence in the 1980s with the success of expert systems and renewed government funding, but suffered another setback with the collapse of the Lisp Machine market in 1987, initiating a second AI winter. During this period, researchers began exploring \"sub-symbolic\" approaches, including neural networks, which gained prominence in the 1990s with successful applications like Yann LeCuns convolutional neural networks for digit recognition.\n",
"\n",
"By the early 21st century, AI was revitalized by focusing on narrow, specific problems, leading to practical applications and integration into various sectors. The field of artificial general intelligence (AGI) emerged, aiming to create versatile, fully intelligent machines. The 2010s saw deep learning dominate AI research, driven by hardware improvements and large datasets, which significantly increased interest and investment in AI.\n",
"\n",
"Philosophically, AI has been defined in various ways, focusing on external behavior rather than internal experience, aligning with Alan Turing's proposal of the Turing test. The field has debated the merits of symbolic vs. sub-symbolic AI, with ongoing discussions about machine consciousness and the ethical implications of potentially sentient AI. The concept of AI rights and welfare has also emerged, reflecting concerns about the moral status of advanced AI systems.\n",
"\n",
"Overall, AI research has oscillated between periods of intense optimism and profound setbacks, with current trends heavily favoring practical applications through narrow AI, while continuing to explore the broader implications and potential of general and superintelligent AI systems.\n",
"\n",
"Artificial Intelligence (AI) and its portrayal in media, such as the film \"Ex Machina\" and Philip K. Dick's novel \"Do Androids Dream of Electric Sheep?\", explore how technology, particularly AI, can alter our understanding of human subjectivity.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"recursive_summary = summarize(artificial_intelligence_wikipedia_text, detail=0.1, summarize_recursively=True)\n",
"print(recursive_summary)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.9"
}
},
"nbformat": 4,
"nbformat_minor": 1
}

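The recursive option described in the notebook's last markdown cell can be illustrated without calling the API. The sketch below is a minimal illustration under stated assumptions, not the notebook's `summarize` implementation: `summarize_chunks` and `fake_summarizer` are hypothetical names, and the summarizer is a stand-in for the model call. It only shows the assumed control flow, in which each chunk's prompt carries the summaries produced so far.

```python
from typing import Callable, List


def summarize_chunks(
    chunks: List[str],
    summarize_fn: Callable[[str], str],
    summarize_recursively: bool = False,
) -> str:
    """Summarize each chunk in order and join the partial summaries.

    summarize_fn is a hypothetical stand-in for the model call.
    """
    summaries: List[str] = []
    for chunk in chunks:
        if summarize_recursively and summaries:
            # Prepend the summaries produced so far as extra context.
            context = "\n\n".join(summaries)
            prompt = (
                f"Previous summaries:\n\n{context}\n\n"
                f"Text to summarize next:\n\n{chunk}"
            )
        else:
            prompt = chunk
        summaries.append(summarize_fn(prompt))
    return "\n\n".join(summaries)


# Demo with a fake summarizer that records the prompts it receives.
seen_prompts: List[str] = []


def fake_summarizer(prompt: str) -> str:
    seen_prompts.append(prompt)
    return f"summary#{len(seen_prompts)}"


result = summarize_chunks(
    ["chunk A", "chunk B", "chunk C"],
    fake_summarizer,
    summarize_recursively=True,
)
print(result)
```

Because every prompt after the first includes all earlier summaries, the input grows with each chunk, which is consistent with the extra cost the notebook mentions for `summarize_recursively=True`.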
@ -12,7 +12,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -27,11 +27,12 @@
}
],
"source": [
"import openai\n",
"from openai import OpenAI\n",
"client = OpenAI()\n",
"\n",
"embedding = openai.Embedding.create(\n",
"embedding = client.embeddings.create(\n",
" input=\"Your text goes here\", model=\"text-embedding-3-small\"\n",
")[\"data\"][0][\"embedding\"]\n",
").data[0].embedding\n",
"len(embedding)\n"
]
},
@ -50,13 +51,14 @@
"outputs": [],
"source": [
"# Negative example (slow and rate-limited)\n",
"import openai\n",
"from openai import OpenAI\n",
"client = OpenAI()\n",
"\n",
"num_embeddings = 10000 # Some large number\n",
"for i in range(num_embeddings):\n",
" embedding = openai.Embedding.create(\n",
" embedding = client.embeddings.create(\n",
" input=\"Your text goes here\", model=\"text-embedding-3-small\"\n",
" )[\"data\"][0][\"embedding\"]\n",
" ).data[0].embedding\n",
" print(len(embedding))"
]
},
@ -75,13 +77,14 @@
],
"source": [
"# Best practice\n",
"import openai\n",
"from tenacity import retry, wait_random_exponential, stop_after_attempt\n",
"from openai import OpenAI\n",
"client = OpenAI()\n",
"\n",
"# Retry up to 6 times with exponential backoff, starting at 1 second and maxing out at 20 seconds delay\n",
"@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(6))\n",
"def get_embedding(text: str, model=\"text-embedding-3-small\") -> list[float]:\n",
" return openai.Embedding.create(input=[text], model=model)[\"data\"][0][\"embedding\"]\n",
" return client.embeddings.create(input=[text], model=model).data[0].embedding\n",
"\n",
"embedding = get_embedding(\"Your text goes here\", model=\"text-embedding-3-small\")\n",
"print(len(embedding))"

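For readers who want to see what the retry decorator in the best-practice cell is doing, here is a minimal standard-library sketch of retrying with capped, randomized exponential backoff. It illustrates the general technique and is not tenacity's implementation; `call_with_backoff`, `flaky`, and the `sleep` parameter are hypothetical names introduced for this example, and the exact delay distribution is an assumption.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on failure with randomized exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            # Real code should catch only retryable errors (for example rate limits).
            if attempt == max_attempts:
                raise
            # The upper bound doubles on each attempt and is capped at max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter spreads out retries from concurrent clients.
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")


# Demo: a function that fails twice before succeeding.
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


delays: List[float] = []
# Record the delays instead of sleeping so the demo finishes instantly.
result = call_with_backoff(flaky, sleep=delays.append)
print(result, calls["count"], len(delays))
```

Passing `sleep` as a parameter keeps the helper easy to test; in normal use the default `time.sleep` applies the delays for real.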
@ -0,0 +1,532 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "dd290eb8-ad4f-461d-b5c5-64c22fc9cc24",
"metadata": {},
"source": [
"# Using Tool Required for Customer Service\n",
"\n",
"The `ChatCompletion` endpoint now includes the ability to specify whether a tool **must** be called every time, by adding `tool_choice='required'` as a parameter. \n",
"\n",
"This adds an element of determinism to how you build your wrapping application, as you can count on a tool being provided with every call. We'll demonstrate here how this can be useful for a contained flow like customer service, where having the ability to define specific exit points gives more control.\n",
"\n",
"The notebook concludes with a multi-turn evaluation, where we spin up a customer GPT to imitate our customer and test the LLM customer service agent we've set up."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "ba4759e0-ecfd-48f7-bbd8-79ea61aef872",
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"from openai import OpenAI\n",
"import os\n",
"\n",
"client = OpenAI()\n",
"GPT_MODEL = 'gpt-4-turbo'"
]
},
{
"cell_type": "markdown",
"id": "a33904a9-ba9f-4315-9e77-bb966c641dab",
"metadata": {},
"source": [
"## Config definition\n",
"\n",
"We will define `tools` and `instructions` which our LLM customer service agent will use. It will source the right instructions for the problem the customer is facing, and use those to answer the customer's query.\n",
"\n",
"As this is a demo example, we'll ask the model to make up values where it doesn't have external systems to source info."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "31fd0251-f741-46d6-979b-a2bbc1f95571",
"metadata": {},
"outputs": [],
"source": [
"# The tools our customer service LLM will use to communicate\n",
"tools = [\n",
"{\n",
" \"type\": \"function\",\n",
" \"function\": {\n",
" \"name\": \"speak_to_user\",\n",
" \"description\": \"Use this to speak to the user to give them information and to ask for anything required for their case.\",\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"message\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"Text of message to send to user. Can cover multiple topics.\"\n",
" }\n",
" },\n",
" \"required\": [\"message\"]\n",
" }\n",
" }\n",
"},\n",
"{\n",
" \"type\": \"function\",\n",
" \"function\": {\n",
" \"name\": \"get_instructions\",\n",
" \"description\": \"Used to get instructions to deal with the user's problem.\",\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"problem\": {\n",
" \"type\": \"string\",\n",
" \"enum\": [\"fraud\",\"refund\",\"information\"],\n",
" \"description\": \"\"\"The type of problem the customer has. Can be one of:\n",
" - fraud: Required to report and resolve fraud.\n",
" - refund: Required to submit a refund request.\n",
" - information: Used for any other informational queries.\"\"\"\n",
" }\n",
" },\n",
" \"required\": [\n",
" \"problem\"\n",
" ]\n",
" }\n",
" }\n",
"}\n",
"]\n",
"\n",
"# Example instructions that the customer service assistant can consult for relevant customer problems\n",
"INSTRUCTIONS = [ {\"type\": \"fraud\",\n",
" \"instructions\": \"\"\"• Ask the customer to describe the fraudulent activity, including the the date and items involved in the suspected fraud.\n",
"• Offer the customer a refund.\n",
"• Report the fraud to the security team for further investigation.\n",
"• Thank the customer for contacting support and invite them to reach out with any future queries.\"\"\"},\n",
" {\"type\": \"refund\",\n",
" \"instructions\": \"\"\"• Confirm the customer's purchase details and verify the transaction in the system.\n",
"• Check the company's refund policy to ensure the request meets the criteria.\n",
"• Ask the customer to provide a reason for the refund.\n",
"• Submit the refund request to the accounting department.\n",
"• Inform the customer of the expected time frame for the refund processing.\n",
"• Thank the customer for contacting support and invite them to reach out with any future queries.\"\"\"},\n",
" {\"type\": \"information\",\n",
" \"instructions\": \"\"\"• Greet the customer and ask how you can assist them today.\n",
"• Listen carefully to the customer's query and clarify if necessary.\n",
"• Provide accurate and clear information based on the customer's questions.\n",
"• Offer to assist with any additional questions or provide further details if needed.\n",
"• Ensure the customer is satisfied with the information provided.\n",
"• Thank the customer for contacting support and invite them to reach out with any future queries.\"\"\" }]"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "6c0ad691-28f4-4707-8e23-0d0a6c06ea1e",
"metadata": {},
"outputs": [],
"source": [
"assistant_system_prompt = \"\"\"You are a customer service assistant. Your role is to answer user questions politely and competently.\n",
"You should follow these instructions to solve the case:\n",
"- Understand their problem and get the relevant instructions.\n",
"- Follow the instructions to solve the customer's problem. Get their confirmation before performing a permanent operation like a refund or similar.\n",
"- Help them with any other problems or close the case.\n",
"\n",
"Only call a tool once in a single message.\n",
"If you need to fetch a piece of information from a system or document that you don't have access to, give a clear, confident answer with some dummy values.\"\"\"\n",
"\n",
"def submit_user_message(user_query,conversation_messages=[]):\n",
" \"\"\"Message handling function which loops through tool calls until it reaches one that requires a response.\n",
" Once it receives respond=True it returns the conversation_messages to the user.\"\"\"\n",
"\n",
" # Initiate a respond object. This will be set to True by our functions when a response is required\n",
" respond = False\n",
" \n",
" user_message = {\"role\":\"user\",\"content\": user_query}\n",
" conversation_messages.append(user_message)\n",
"\n",
" print(f\"User: {user_query}\")\n",
"\n",
" while respond is False:\n",
"\n",
" # Build a transient messages object to add the conversation messages to\n",
" messages = [\n",
" {\n",
" \"role\": \"system\",\n",
" \"content\": assistant_system_prompt\n",
" }\n",
" ]\n",
"\n",
" # Add the conversation messages to our messages call to the API\n",
" [messages.append(x) for x in conversation_messages]\n",
"\n",
" # Make the ChatCompletion call with tool_choice='required' so we can guarantee tools will be used\n",
" response = client.chat.completions.create(model=GPT_MODEL\n",
" ,messages=messages\n",
" ,temperature=0\n",
" ,tools=tools\n",
" ,tool_choice='required'\n",
" )\n",
"\n",
" conversation_messages.append(response.choices[0].message)\n",
"\n",
" # Execute the function and get an updated conversation_messages object back\n",
" # If it doesn't require a response, it will ask the assistant again. \n",
" # If not the results are returned to the user.\n",
" respond, conversation_messages = execute_function(response.choices[0].message,conversation_messages)\n",
" \n",
" return conversation_messages\n",
"\n",
"def execute_function(function_calls,messages):\n",
" \"\"\"Wrapper function to execute the tool calls\"\"\"\n",
"\n",
" for function_call in function_calls.tool_calls:\n",
" \n",
" function_id = function_call.id\n",
" function_name = function_call.function.name\n",
" print(f\"Calling function {function_name}\")\n",
" function_arguments = json.loads(function_call.function.arguments)\n",
" \n",
" if function_name == 'get_instructions':\n",
"\n",
" respond = False\n",
" \n",
" instruction_name = function_arguments['problem']\n",
" instructions = INSTRUCTIONS['type' == instruction_name]\n",
" \n",
" messages.append(\n",
" {\n",
" \"tool_call_id\": function_id,\n",
" \"role\": \"tool\",\n",
" \"name\": function_name,\n",
" \"content\": instructions['instructions'],\n",
" }\n",
" )\n",
" \n",
" elif function_name != 'get_instructions':\n",
"\n",
" respond = True\n",
" \n",
" messages.append(\n",
" {\n",
" \"tool_call_id\": function_id,\n",
" \"role\": \"tool\",\n",
" \"name\": function_name,\n",
" \"content\": function_arguments['message'],\n",
" }\n",
" )\n",
" \n",
" print(f\"Assistant: {function_arguments['message']}\")\n",
" \n",
" return (respond, messages)\n",
" "
]
},
{
"cell_type": "markdown",
"id": "ca6502e7-f664-43ba-b15c-962c69091633",
"metadata": {},
"source": [
"## Example\n",
"\n",
"To test this we will run an example for a customer who has experienced fraud, and see how the model handles it.\n",
"\n",
"Play the role of the user and provide plausible next steps to keep the conversation going."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "bb1530e4-dd82-4560-bd60-9cc9ac0dab73",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"User: Hi, I have had an item stolen that was supposed to be delivered to me yesterday.\n",
"Calling function get_instructions\n",
"Calling function speak_to_user\n",
"Assistant: I'm sorry to hear about the stolen item. Could you please provide me with more details about the fraudulent activity, including the date and the items involved? This information will help us to investigate the issue further and proceed with the necessary actions, including offering you a refund.\n"
]
}
],
"source": [
"messages = submit_user_message(\"Hi, I have had an item stolen that was supposed to be delivered to me yesterday.\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "ccff3dd7-d10f-4dc7-9737-6ea5d126e829",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"User: For sure, it was a shirt, it was supposed to be delivered yesterday but it never arrived.\n",
"Calling function speak_to_user\n",
"Assistant: Thank you for providing the details. I will now proceed to report this incident to our security team for further investigation and arrange a refund for the stolen shirt. Please confirm if you would like me to go ahead with the refund.\n",
"Calling function speak_to_user\n",
"Assistant: Thank you for contacting us about this issue. Please don't hesitate to reach out if you have any more questions or need further assistance in the future.\n"
]
}
],
"source": [
"messages = submit_user_message(\"For sure, it was a shirt, it was supposed to be delivered yesterday but it never arrived.\",messages)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "ce3a8869-8b14-4404-866a-4b540b13235c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"User: Yes I would like to proceed with the refund.\n",
"Calling function get_instructions\n",
"Calling function speak_to_user\n",
"Assistant: Thank you for confirming. I have processed the refund for the stolen shirt. The amount should be reflected in your account within 5-7 business days. If you have any more questions or need further assistance, please feel free to contact us.\n"
]
}
],
"source": [
"messages = submit_user_message(\"Yes I would like to proceed with the refund.\",messages)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "87e5cd3e-4edb-426c-8fd9-8fe3bde61bcd",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"User: Thanks very much.\n",
"Calling function speak_to_user\n",
"Assistant: You're welcome! If you need any more help in the future, don't hesitate to reach out. Have a great day!\n"
]
}
],
"source": [
"messages = submit_user_message(\"Thanks very much.\",messages)"
]
},
{
"cell_type": "markdown",
"id": "fb8d0a0f-ba20-4b78-a961-7431beb9fbce",
"metadata": {},
"source": [
"## Evaluation\n",
"\n",
"Now we'll do a simple evaluation where a GPT will pretend to be our customer. The two will go back and forth until a resolution is reached.\n",
"\n",
"We'll reuse the functions above, adding an `execute_conversation` function where the customer GPT will continue answering."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "f4931776-b3ac-4113-98e8-419a0965fd71",
"metadata": {},
"outputs": [],
"source": [
"customer_system_prompt = \"\"\"You are a user calling in to customer service.\n",
"You will talk to the agent until you have a resolution to your query.\n",
"Your query is {query}.\n",
"You will be presented with a conversation - provide answers for any assistant questions you receive. \n",
"Here is the conversation - you are the \"user\" and you are speaking with the \"assistant\":\n",
"{chat_history}\n",
"\n",
"If you don't know the details, respond with dummy values.\n",
"Once your query is resolved, respond with \"DONE\" \"\"\"\n",
"\n",
"# Initiate a bank of questions run through\n",
"questions = ['I want to get a refund for the suit I ordered last Friday.',\n",
" 'Can you tell me what your policy is for returning damaged goods?',\n",
" 'Please tell me what your complaint policy is']"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "22b12f59-4418-4aee-ae92-6c6ebcf0f2d3",
"metadata": {},
"outputs": [],
"source": [
"def execute_conversation(objective):\n",
"\n",
" conversation_messages = []\n",
"\n",
" done = False\n",
"\n",
" user_query = objective\n",
"\n",
" while done is False:\n",
"\n",
" conversation_messages = submit_user_message(user_query,conversation_messages)\n",
"\n",
" messages_string = ''\n",
" for x in conversation_messages:\n",
" if isinstance(x,dict):\n",
" if x['role'] == 'user':\n",
" messages_string += 'User: ' + x['content'] + '\\n'\n",
" elif x['role'] == 'tool':\n",
" if x['name'] == 'speak_to_user':\n",
" messages_string += 'Assistant: ' + x['content'] + '\\n'\n",
" else:\n",
" continue\n",
"\n",
" messages = [\n",
" {\n",
" \"role\": \"system\",\n",
" \"content\": customer_system_prompt.format(query=objective,chat_history=messages_string)\n",
" },\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": \"Continue the chat to solve your query. Remember, you are in the user in this exchange. Do not provide User: or Assistant: in your response\"\n",
" }\n",
" ]\n",
"\n",
" user_response = client.chat.completions.create(model=GPT_MODEL,messages=messages,temperature=0.5)\n",
"\n",
" conversation_messages.append({\n",
" \"role\": \"user\",\n",
" \"content\": user_response.choices[0].message.content\n",
" })\n",
"\n",
" if 'DONE' in user_response.choices[0].message.content:\n",
" done = True\n",
" print(\"Achieved objective, closing conversation\\n\\n\")\n",
"\n",
" else:\n",
" user_query = user_response.choices[0].message.content"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "9d9aac9f-f557-4e7e-b705-adf7d5aa1f3f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"User: I want to get a refund for the suit I ordered last Friday.\n",
"Calling function get_instructions\n",
"Calling function speak_to_user\n",
"Assistant: I understand you'd like a refund for the suit you ordered last Friday. Could you please provide more details about the issue with the suit? This will help us process your refund request accurately.\n",
"User: The suit I received is not the color I ordered. I ordered a navy blue suit, but the one I received is black.\n",
"Calling function speak_to_user\n",
"Assistant: Thank you for providing the details. I will proceed with the refund for the navy blue suit that was incorrectly sent as black. Please confirm if you would like me to go ahead with the refund.\n",
"User: Yes, please go ahead with the refund.\n",
"Calling function speak_to_user\n",
"Assistant: The refund for the incorrectly colored suit has been processed. You should see the amount credited back to your original payment method within 5-7 business days. Thank you for contacting us, and if you have any more questions or need further assistance, please feel free to reach out.\n",
"Achieved objective, closing conversation\n",
"\n",
"\n",
"User: Can you tell me what your policy is for returning damaged goods?\n",
"Calling function get_instructions\n",
"Calling function speak_to_user\n",
"Assistant: It seems there was a misunderstanding in my previous request. I'm looking for information on our policy for returning damaged goods. Could you please provide me with the details on how to handle returns for damaged items?\n",
"User: Yes, I'd appreciate that. Could you please tell me the steps I need to follow to return a damaged item?\n",
"Calling function get_instructions\n",
"Calling function speak_to_user\n",
"Assistant: I apologize for the confusion earlier. Here's the correct information regarding our policy for returning damaged goods:\n",
"\n",
"1. Please provide a description of the damage and the item involved.\n",
"2. Include the date of purchase and your order number if available.\n",
"3. You can choose to return the item by mail or in person at one of our stores. Please let us know which method you prefer, and we will provide the necessary details for the return process.\n",
"4. Once we receive the item, we will inspect it and process a refund or exchange based on your preference and our return policy guidelines.\n",
"\n",
"Please let me know if you need further assistance with this process or if there's anything else I can help you with!\n",
"User: I would like to return the item by mail. Could you please provide me with the details on how to do that?\n",
"Calling function speak_to_user\n",
"Assistant: Certainly! To return the damaged item by mail, please follow these steps:\n",
"\n",
"1. Pack the item securely in its original packaging, if possible.\n",
"2. Include all original materials such as manuals, accessories, and receipts.\n",
"3. Address the package to our returns department at the following address:\n",
" [Your Company's Return Address]\n",
"4. Use a trackable shipping service and keep the tracking number for your records.\n",
"5. Once we receive and inspect the item, we will process your refund or exchange as per your request.\n",
"\n",
"Please let me know if you need a prepaid shipping label or any other assistance with your return!\n",
"User: Yes, please, I would need a prepaid shipping label.\n",
"Calling function speak_to_user\n",
"Assistant: I will arrange for a prepaid shipping label to be sent to your email address. Please ensure that the email address we have on file is correct. You should receive the label shortly, which you can print and attach to your return package.\n",
"\n",
"Once you've shipped the item, please provide us with the tracking number so we can monitor the return's progress. If there's anything else you need help with, feel free to ask!\n",
"User: That sounds great, thank you! My email address is correct on file. I'll look out for the prepaid shipping label and send the package back as soon as I receive it.\n",
"Calling function speak_to_user\n",
"Assistant: You're welcome! I'm glad I could assist you. Once you receive the prepaid shipping label, please proceed with the return as discussed. If you have any more questions or need further assistance in the future, don't hesitate to reach out. Have a great day!\n",
"Achieved objective, closing conversation\n",
"\n",
"\n",
"User: Please tell me what your complaint policy is\n",
"Calling function get_instructions\n",
"Calling function speak_to_user\n",
"Assistant: It seems there was a misunderstanding. I will provide you with information about our complaint policy. Please give me a moment to gather the correct details for you.\n",
"User: Sure, I'll wait. Thank you.\n",
"Calling function speak_to_user\n",
"Assistant: Our complaint policy is designed to ensure that any issues you encounter are addressed promptly and fairly. Heres a brief overview:\n",
"\n",
"1. **Submission**: You can submit your complaint through our website, by email, or by calling our customer service hotline.\n",
"2. **Acknowledgment**: We will acknowledge receipt of your complaint within 24 hours.\n",
"3. **Investigation**: We will investigate your complaint thoroughly. This process typically takes 3-5 business days.\n",
"4. **Resolution**: After the investigation, we will contact you with the outcome and any steps we will take to resolve the issue.\n",
"5. **Follow-up**: If you are not satisfied with the resolution, you can request a review of the decision.\n",
"\n",
"Please let me know if you need more detailed information or if there's anything else I can assist you with!\n",
"User: That covers everything I needed to know, thank you!\n",
"Calling function speak_to_user\n",
"Assistant: You're welcome! I'm glad I could help. If you have any more questions in the future or need further assistance, feel free to reach out. Have a great day!\n",
"Achieved objective, closing conversation\n",
"\n",
"\n"
]
}
],
"source": [
"for x in questions:\n",
"\n",
" execute_conversation(x)"
]
},
{
"cell_type": "markdown",
"id": "f8fa6ca4-a776-4207-b440-4ee6fb8ab16a",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"You can now control your LLM's behaviour explicitly by making tool use mandatory, as well as spin up GPT testers to challenge your LLM and to act as automated test cases.\n",
"\n",
"We hope this has given you an appreciation for a great use case for tool use, and look forward to seeing what you build!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "openai_test",
"language": "python",
"name": "openai_test"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@ -0,0 +1,612 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "w9w5JBaUL-lO"
},
"source": [
"# Multimodal RAG with CLIP Embeddings and GPT-4 Vision\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3CCjcFSiMbvf"
},
"source": [
"Multimodal RAG integrates additional modalities into traditional text-based RAG, enhancing LLMs' question-answering by providing extra context and grounding textual data for improved understanding.\n",
"\n",
"Adopting the approach from the [clothing matchmaker cookbook](https://cookbook.openai.com/examples/how_to_combine_gpt4v_with_rag_outfit_assistant), we directly embed images for similarity search, bypassing the lossy process of text captioning, to boost retrieval accuracy.\n",
"\n",
"Using CLIP-based embeddings further allows fine-tuning with specific data or updating with unseen images.\n",
"\n",
"This technique is showcased through searching an enterprise knowledge base with user-provided tech images to deliver pertinent information."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "T-Mpdxit4x49"
},
"source": [
"# Installations"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "nbt3evfHUJTZ"
},
"source": [
"First let's install the relevant packages."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "7hgrcVEl0Ma1"
},
"outputs": [],
"source": [
"#installations\n",
"%pip install clip\n",
"%pip install torch\n",
"%pip install pillow\n",
"%pip install faiss-cpu\n",
"%pip install numpy\n",
"%pip install git+https://github.com/openai/CLIP.git\n",
"%pip install openai"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "GgrlBLTpT0si"
},
"source": [
"Then let's import all the needed packages.\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "pN1cWF-iyLUg"
},
"outputs": [],
"source": [
"# model imports\n",
"import faiss\n",
"import json\n",
"import torch\n",
"from openai import OpenAI\n",
"import torch.nn as nn\n",
"from torch.utils.data import DataLoader\n",
"import clip\n",
"client = OpenAI()\n",
"\n",
"# helper imports\n",
"from tqdm import tqdm\n",
"import json\n",
"import os\n",
"import numpy as np\n",
"import pickle\n",
"from typing import List, Union, Tuple\n",
"\n",
"# visualisation imports\n",
"from PIL import Image\n",
"import matplotlib.pyplot as plt\n",
"import base64"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9fONcWxRqll8"
},
"source": [
"Now let's load the CLIP model."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "_Ua9y98NRk70"
},
"outputs": [],
"source": [
"#load model on device. The device you are running inference/training on is either a CPU or GPU if you have.\n",
"device = \"cpu\"\n",
"model, preprocess = clip.load(\"ViT-B/32\",device=device)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Dev-zjfJ774W"
},
"source": [
"\n",
"We will now:\n",
"1. Create the image embedding database\n",
"2. Set up a query to the vision model\n",
"3. Perform the semantic search\n",
"4. Pass a user query to the image\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5Y1v2jkS42TS"
},
"source": [
"# Create image embedding database"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "wVBAMyhesAyi"
},
"source": [
"Next we will create our image embeddings knowledge base from a directory of images. This will be the knowledge base of technology that we search through to provide information to the user for an image they upload.\n",
"\n",
"We pass in the directory in which we store our images (as JPEGs) and loop through each to create our embeddings.\n",
"\n",
"We also have a description.json. This has an entry for every single image in our knowledge base. It has two keys: 'image_path' and 'description'. It maps each image to a useful description of this image to aid in answering the user question."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fDCz76gr8yAu"
},
"source": [
"First let's write a function to get all the image paths in a given directory. We will then get all the jpeg's from a directory called 'image_database'"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"id": "vE9i3zLuRk5c"
},
"outputs": [],
"source": [
"def get_image_paths(directory: str, number: int = None) -> List[str]:\n",
" image_paths = []\n",
" count = 0\n",
" for filename in os.listdir(directory):\n",
" if filename.endswith('.jpeg'):\n",
" image_paths.append(os.path.join(directory, filename))\n",
" if number is not None and count == number:\n",
" return [image_paths[-1]]\n",
" count += 1\n",
" return image_paths\n",
"direc = 'image_database/'\n",
"image_paths = get_image_paths(direc)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "hMldfjn189vC"
},
"source": [
"Next we will write a function to get the image embeddings from the CLIP model given a series of paths.\n",
"\n",
"We first preprocess the image using the preprocess function we got earlier. This performs a few things to ensure the input to the CLIP model is of the right format and dimensionality including resizing, normalization, colour channel adjustment etc.\n",
"\n",
"We then stack these preprocessed images together so we can pass them into the model at once rather than in a loop. And finally return the model output which is an array of embeddings."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"id": "fd3I_fPh8qvi"
},
"outputs": [],
"source": [
"def get_features_from_image_path(image_paths):\n",
" images = [preprocess(Image.open(image_path).convert(\"RGB\")) for image_path in image_paths]\n",
" image_input = torch.tensor(np.stack(images))\n",
" with torch.no_grad():\n",
" image_features = model.encode_image(image_input).float()\n",
" return image_features\n",
"image_features = get_features_from_image_path(image_paths)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UH_kyZAE-kHe"
},
"source": [
"We can now create our vector database."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"id": "TIeqpndF8tZk"
},
"outputs": [],
"source": [
"index = faiss.IndexFlatIP(image_features.shape[1])\n",
"index.add(image_features)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "swDe1c4v-mbz"
},
"source": [
"And also ingest our json for image-description mapping and create a list of jsons. We also create a helper function to search through this list for a given image we want, so we can obtain the description of that image"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"id": "tdjlXQqC8uNE"
},
"outputs": [],
"source": [
"data = []\n",
"image_path = 'train1.jpeg'\n",
"with open('description.json', 'r') as file:\n",
" for line in file:\n",
" data.append(json.loads(line))\n",
"def find_entry(data, key, value):\n",
" for entry in data:\n",
" if entry.get(key) == value:\n",
" return entry\n",
" return None"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fJXfCtPD5_63"
},
"source": [
"Let us display an example image, this will be the user uploaded image. This is a piece of tech that was unveiled at the 2024 CES. It is the DELTA Pro Ultra Whole House Battery Generator."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "RtkZ7W3g5sED"
},
"outputs": [],
"source": [
"im = Image.open(image_path)\n",
"plt.imshow(im)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5ivECCKSdbBy"
},
"source": [
"![Delta Pro](../images/train1.jpeg)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Sidjylki7Kye"
},
"source": [
"# Querying the vision model"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "H8O7X6ml7t38"
},
"source": [
"Now let's have a look at what GPT-4 Vision (which wouldn't have seen this technology before) will label it as.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "r4uDjS-gQAqm"
},
"source": [
"First we will need to write a function to encode our image in base64 as this is the format we will pass into the vision model. Then we will create a generic image_query function to allow us to query the LLM with an image input."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"id": "87gf6_xO8Y4i",
"outputId": "99be865f-12e8-4ef0-c2f5-5fd6e5c787f3"
},
"outputs": [
{
"data": {
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
},
"text/plain": [
"'Autonomous Delivery Robot'"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def encode_image(image_path):\n",
" with open(image_path, 'rb') as image_file:\n",
" encoded_image = base64.b64encode(image_file.read())\n",
" return encoded_image.decode('utf-8')\n",
"\n",
"def image_query(query, image_path):\n",
" response = client.chat.completions.create(\n",
" model='gpt-4-vision-preview',\n",
" messages=[\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": [\n",
" {\n",
" \"type\": \"text\",\n",
" \"text\": query,\n",
" },\n",
" {\n",
" \"type\": \"image_url\",\n",
" \"image_url\": {\n",
" \"url\": f\"data:image/jpeg;base64,{encode_image(image_path)}\",\n",
" },\n",
" }\n",
" ],\n",
" }\n",
" ],\n",
" max_tokens=300,\n",
" )\n",
" # Extract relevant features from the response\n",
" return response.choices[0].message.content\n",
"image_query('Write a short label of what is show in this image?', image_path)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "yfG_7c-jQAqm"
},
"source": [
"As we can see, it tries its best from the information it's been trained on but it makes a mistake due to it not having seen anything similar in its training data. This is because it is an ambiguous image making it difficult to extrapolate and deduce."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "szWZqTqf7SrA"
},
"source": [
"# Performing semantic search"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "eV8LaOncGH3j"
},
"source": [
"Now let's perform similarity search to find the two most similar images in our knowledge base. We do this by getting the embeddings of a user inputted image_path, retrieving the indexes and distances of the similar iamges in our database. Distance will be our proxy metric for similarity and a smaller distance means more similar. We then sort based on distance in descending order."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"id": "GzNEhKJ04D-F"
},
"outputs": [],
"source": [
"image_search_embedding = get_features_from_image_path([image_path])\n",
"distances, indices = index.search(image_search_embedding.reshape(1, -1), 2) #2 signifies the number of topmost similar images to bring back\n",
"distances = distances[0]\n",
"indices = indices[0]\n",
"indices_distances = list(zip(indices, distances))\n",
"indices_distances.sort(key=lambda x: x[1], reverse=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0O-GYQ-1QAqm"
},
"source": [
"We require the indices as we will use this to serach through our image_directory and selecting the image at the location of the index to feed into the vision model for RAG."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9-6SVzwSJVuT"
},
"source": [
"And let's see what it brought back (we display these in order of similarity):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Lt1ZYuKDFeww"
},
"outputs": [],
"source": [
"#display similar images\n",
"for idx, distance in indices_distances:\n",
" print(idx)\n",
" path = get_image_paths(direc, idx)[0]\n",
" im = Image.open(path)\n",
" plt.imshow(im)\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "GPTvKIUJ2tgz"
},
"source": [
"![Delta Pro2](../images/train2.jpeg)\n",
"\n",
"![Delta Pro3](../images/train17.jpeg)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "x4kF2-MJQAqm"
},
"source": [
"We can see here it brought back two images which contain the DELTA Pro Ultra Whole House Battery Generator. In one of the images it also has some background which could be distracting but manages to find the right image."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Qc2sOKzY7yv3"
},
"source": [
"# User querying the most similar image"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8Sio6OR4MDjI"
},
"source": [
"Now for our most similar image, we want to pass it and the description of it to gpt-v with a user query so they can inquire about the technology that they may have bought. This is where the power of the vision model comes in, where you can ask general queries for which the model hasn't been explicitly trained on to the model and it responds with high accuracy."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "uPzsRk66QAqn"
},
"source": [
"In our example below, we will inquire as to the capacity of the item in question."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 87
},
"id": "-_5W_xwitbr3",
"outputId": "99a40617-0153-492a-d8b0-6782b8421e40"
},
"outputs": [
{
"data": {
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
},
"text/plain": [
"'The portable home battery DELTA Pro has a base capacity of 3.6kWh. This capacity can be expanded up to 25kWh with additional batteries. The image showcases the DELTA Pro, which has an impressive 3600W power capacity for AC output as well.'"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"similar_path = get_image_paths(direc, indices_distances[0][0])[0]\n",
"element = find_entry(data, 'image_path', similar_path)\n",
"\n",
"user_query = 'What is the capacity of this item?'\n",
"prompt = f\"\"\"\n",
"Below is a user query, I want you to answer the query using the description and image provided.\n",
"\n",
"user query:\n",
"{user_query}\n",
"\n",
"description:\n",
"{element['description']}\n",
"\"\"\"\n",
"image_query(prompt, similar_path)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VIInamGaAG9L"
},
"source": [
"And we see it is able to answer the question. This was only possible by matching images directly and from there gathering the relevant description as context."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Ljrf0VKR_2q9"
},
"source": [
"# Conclusion"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PexvxTF5_7ay"
},
"source": [
"In this notebook, we have gone through how to use the CLIP model, an example of creating an image embedding database using the CLIP model, performing semantic search and finally providing a user query to answer the question."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gOgRBeh6eMiq"
},
"source": [
"The applications of this pattern of usage spread across many different application domains and this is easily improved to further enhance the technique. For example you may finetune CLIP, you may improve the retrieval process just like in RAG and you can prompt engineer GPT-V.\n"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.10.14"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

@ -0,0 +1,253 @@
Artificial intelligence (AI), in its broadest sense, is intelligence exhibited by machines, particularly computer systems. It is a field of research in computer science that develops and studies methods and software which enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined goals.[1] Such machines may be called AIs.
AI technology is widely used throughout industry, government, and science. Some high-profile applications include advanced web search engines (e.g., Google Search); recommendation systems (used by YouTube, Amazon, and Netflix); interacting via human speech (e.g., Google Assistant, Siri, and Alexa); autonomous vehicles (e.g., Waymo); generative and creative tools (e.g., ChatGPT and AI art); and superhuman play and analysis in strategy games (e.g., chess and Go).[2] However, many AI applications are not perceived as AI: "A lot of cutting edge AI has filtered into general applications, often without being called AI because once something becomes useful enough and common enough it's not labeled AI anymore."[3][4]
Alan Turing was the first person to conduct substantial research in the field that he called machine intelligence.[5] Artificial intelligence was founded as an academic discipline in 1956.[6] The field went through multiple cycles of optimism,[7][8] followed by periods of disappointment and loss of funding, known as AI winter.[9][10] Funding and interest vastly increased after 2012 when deep learning surpassed all previous AI techniques,[11] and after 2017 with the transformer architecture.[12] This led to the AI boom of the early 2020s, with companies, universities, and laboratories overwhelmingly based in the United States pioneering significant advances in artificial intelligence.[13]
The growing use of artificial intelligence in the 21st century is influencing a societal and economic shift towards increased automation, data-driven decision-making, and the integration of AI systems into various economic sectors and areas of life, impacting job markets, healthcare, government, industry, and education. This raises questions about the long-term effects, ethical implications, and risks of AI, prompting discussions about regulatory policies to ensure the safety and benefits of the technology.
The various sub-fields of AI research are centered around particular goals and the use of particular tools. The traditional goals of AI research include reasoning, knowledge representation, planning, learning, natural language processing, perception, and support for robotics.[a] General intelligence—the ability to complete any task performable by a human on an at least equal level—is among the field's long-term goals.[14]
To reach these goals, AI researchers have adapted and integrated a wide range of techniques, including search and mathematical optimization, formal logic, artificial neural networks, and methods based on statistics, operations research, and economics.[b] AI also draws upon psychology, linguistics, philosophy, neuroscience, and other fields.[15]
Goals
The general problem of simulating (or creating) intelligence has been broken into sub-problems. These consist of particular traits or capabilities that researchers expect an intelligent system to display. The traits described below have received the most attention and cover the scope of AI research.[a]
Reasoning and problem solving
Early researchers developed algorithms that imitated step-by-step reasoning that humans use when they solve puzzles or make logical deductions.[16] By the late 1980s and 1990s, methods were developed for dealing with uncertain or incomplete information, employing concepts from probability and economics.[17]
Many of these algorithms are insufficient for solving large reasoning problems because they experience a "combinatorial explosion": they become exponentially slower as the problems grow larger.[18] Even humans rarely use the step-by-step deduction that early AI research could model. They solve most of their problems using fast, intuitive judgments.[19] Accurate and efficient reasoning is an unsolved problem.
Knowledge representation
An ontology represents knowledge as a set of concepts within a domain and the relationships between those concepts.
Knowledge representation and knowledge engineering[20] allow AI programs to answer questions intelligently and make deductions about real-world facts. Formal knowledge representations are used in content-based indexing and retrieval,[21] scene interpretation,[22] clinical decision support,[23] knowledge discovery (mining "interesting" and actionable inferences from large databases),[24] and other areas.[25]
A knowledge base is a body of knowledge represented in a form that can be used by a program. An ontology is the set of objects, relations, concepts, and properties used by a particular domain of knowledge.[26] Knowledge bases need to represent things such as: objects, properties, categories and relations between objects;[27] situations, events, states and time;[28] causes and effects;[29] knowledge about knowledge (what we know about what other people know);[30] default reasoning (things that humans assume are true until they are told differently and will remain true even when other facts are changing);[31] and many other aspects and domains of knowledge.
Among the most difficult problems in knowledge representation are: the breadth of commonsense knowledge (the set of atomic facts that the average person knows is enormous);[32] and the sub-symbolic form of most commonsense knowledge (much of what people know is not represented as "facts" or "statements" that they could express verbally).[19] There is also the difficulty of knowledge acquisition, the problem of obtaining knowledge for AI applications.[c]
Planning and decision making
An "agent" is anything that perceives and takes actions in the world. A rational agent has goals or preferences and takes actions to make them happen.[d][35] In automated planning, the agent has a specific goal.[36] In automated decision making, the agent has preferences—there are some situations it would prefer to be in, and some situations it is trying to avoid. The decision making agent assigns a number to each situation (called the "utility") that measures how much the agent prefers it. For each possible action, it can calculate the "expected utility": the utility of all possible outcomes of the action, weighted by the probability that the outcome will occur. It can then choose the action with the maximum expected utility.[37]
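The expected-utility rule described above can be illustrated with a minimal Python sketch. The umbrella scenario, its probabilities and utilities, and the names `expected_utility` and `best_action` are all hypothetical and chosen only for illustration; they do not come from any particular AI system.

```python
from typing import Dict, List, Tuple

# Each action maps to a list of (probability, utility) pairs, one per outcome.
Outcomes = List[Tuple[float, float]]


def expected_utility(outcomes: Outcomes) -> float:
    """Utility of every possible outcome, weighted by its probability."""
    return sum(p * u for p, u in outcomes)


def best_action(actions: Dict[str, Outcomes]) -> str:
    """Return the name of the action with the maximum expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name]))


# Hypothetical numbers: 30% chance of rain, 70% chance of no rain.
actions = {
    "take_umbrella": [(0.3, 8.0), (0.7, 6.0)],
    "leave_umbrella": [(0.3, 0.0), (0.7, 10.0)],
}

for name, outcomes in actions.items():
    print(name, round(expected_utility(outcomes), 2))
print("chosen:", best_action(actions))
```

With these made-up numbers the agent leaves the umbrella at home, because a 70% chance of the best outcome outweighs the 30% chance of the worst one.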
In classical planning, the agent knows exactly what the effect of any action will be.[38] In most real-world problems, however, the agent may not be certain about the situation it is in (it is "unknown" or "unobservable") and it may not know for certain what will happen after each possible action (it is not "deterministic"). It must choose an action by making a probabilistic guess and then reassess the situation to see if the action worked.[39]
In some problems, the agent's preferences may be uncertain, especially if there are other agents or humans involved. These can be learned (e.g., with inverse reinforcement learning) or the agent can seek information to improve its preferences.[40] Information value theory can be used to weigh the value of exploratory or experimental actions.[41] The space of possible future actions and situations is typically intractably large, so the agents must take actions and evaluate situations while being uncertain what the outcome will be.
A Markov decision process has a transition model that describes the probability that a particular action will change the state in a particular way, and a reward function that supplies the utility of each state and the cost of each action. A policy associates a decision with each possible state. The policy could be calculated (e.g., by iteration), be heuristic, or it can be learned.[42]
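As a rough sketch of how a policy can be calculated by iteration, the Python example below runs value iteration on a tiny Markov decision process. The two-state problem, its probabilities and rewards, and the names `TOY_MDP`, `q_value` and `value_iteration` are invented for illustration. The sketch also assumes that rewards are attached to transitions and that future rewards are discounted by a factor `gamma`, which is one common formulation among several.

```python
from typing import Dict, List, Tuple

# mdp[state][action] -> list of (probability, next_state, reward) triples.
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]

# Hypothetical two-state problem: a device with a low or high battery.
TOY_MDP: Transitions = {
    "low": {
        "wait": [(1.0, "low", 0.0)],
        "charge": [(0.9, "high", -1.0), (0.1, "low", -1.0)],
    },
    "high": {
        "work": [(0.8, "high", 2.0), (0.2, "low", 2.0)],
        "wait": [(1.0, "high", 0.0)],
    },
}


def q_value(mdp: Transitions, state: str, action: str,
            values: Dict[str, float], gamma: float) -> float:
    """Expected reward plus discounted value of the next state."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in mdp[state][action])


def value_iteration(mdp: Transitions, gamma: float = 0.9, tol: float = 1e-8):
    """Repeat the Bellman update until the state values stop changing."""
    values = {s: 0.0 for s in mdp}
    while True:
        new_values = {
            s: max(q_value(mdp, s, a, values, gamma) for a in mdp[s]) for s in mdp
        }
        delta = max(abs(new_values[s] - values[s]) for s in mdp)
        values = new_values
        if delta < tol:
            break
    # The policy associates each state with its highest-valued action.
    policy = {}
    for s in mdp:
        policy[s] = max(mdp[s], key=lambda a: q_value(mdp, s, a, values, gamma))
    return values, policy


values, policy = value_iteration(TOY_MDP)
print(policy)
```

In this toy problem the resulting policy charges when the battery is low and works when it is high; a heuristic or learned policy would fill the same role of mapping each state to a decision.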
Game theory describes rational behavior of multiple interacting agents, and is used in AI programs that make decisions that involve other agents.[43]
Learning
Machine learning is the study of programs that can improve their performance on a given task automatically.[44] It has been a part of AI from the beginning.[e]
There are several kinds of machine learning. Unsupervised learning analyzes a stream of data and finds patterns and makes predictions without any other guidance.[47] Supervised learning requires a human to label the input data first, and comes in two main varieties: classification (where the program must learn to predict what category the input belongs in) and regression (where the program must deduce a numeric function based on numeric input).[48]
In reinforcement learning the agent is rewarded for good responses and punished for bad ones. The agent learns to choose responses that are classified as "good".[49] Transfer learning is when the knowledge gained from one problem is applied to a new problem.[50] Deep learning is a type of machine learning that runs inputs through biologically inspired artificial neural networks for all of these types of learning.[51]
Computational learning theory can assess learners by computational complexity, by sample complexity (how much data is required), or by other notions of optimization.[52]
Natural language processing
Natural language processing (NLP)[53] allows programs to read, write and communicate in human languages such as English. Specific problems include speech recognition, speech synthesis, machine translation, information extraction, information retrieval and question answering.[54]
Early work, based on Noam Chomsky's generative grammar and semantic networks, had difficulty with word-sense disambiguation[f] unless restricted to small domains called "micro-worlds" (due to the common sense knowledge problem[32]). Margaret Masterman believed that it was meaning and not grammar that was the key to understanding languages, and that thesauri and not dictionaries should be the basis of computational language structure.
Modern deep learning techniques for NLP include word embedding (representing words, typically as vectors encoding their meaning),[55] transformers (a deep learning architecture using an attention mechanism),[56] and others.[57] In 2019, generative pre-trained transformer (or "GPT") language models began to generate coherent text,[58][59] and by 2023 these models were able to get human-level scores on the bar exam, the SAT, the GRE, and in many other real-world applications.[60]
Perception
Machine perception is the ability to use input from sensors (such as cameras, microphones, wireless signals, active lidar, sonar, radar, and tactile sensors) to deduce aspects of the world. Computer vision is the ability to analyze visual input.[61]
The field includes speech recognition,[62] image classification,[63] facial recognition, object recognition,[64] and robotic perception.[65]
Social intelligence
Kismet, a robot head which was made in the 1990s; a machine that can recognize and simulate emotions.[66]
Affective computing is an interdisciplinary umbrella that comprises systems that recognize, interpret, process or simulate human feeling, emotion and mood.[67] For example, some virtual assistants are programmed to speak conversationally or even to banter humorously; it makes them appear more sensitive to the emotional dynamics of human interaction, or to otherwise facilitate human–computer interaction.
However, this tends to give naïve users an unrealistic conception of the intelligence of existing computer agents.[68] Moderate successes related to affective computing include textual sentiment analysis and, more recently, multimodal sentiment analysis, wherein AI classifies the affects displayed by a videotaped subject.[69]
General intelligence
A machine with artificial general intelligence should be able to solve a wide variety of problems with breadth and versatility similar to human intelligence.[14]
Techniques
AI research uses a wide variety of techniques to accomplish the goals above.[b]
Search and optimization
AI can solve many problems by intelligently searching through many possible solutions.[70] There are two very different kinds of search used in AI: state space search and local search.
State space search
State space search searches through a tree of possible states to try to find a goal state.[71] For example, planning algorithms search through trees of goals and subgoals, attempting to find a path to a target goal, a process called means-ends analysis.[72]
Simple exhaustive searches[73] are rarely sufficient for most real-world problems: the search space (the number of places to search) quickly grows to astronomical numbers. The result is a search that is too slow or never completes.[18] "Heuristics" or "rules of thumb" can help to prioritize choices that are more likely to reach a goal.[74]
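The role of a heuristic can be sketched with A* search, one well-known heuristic state space search, on a small grid. The grid layout, the function name `a_star` and the choice of Manhattan distance as the "rule of thumb" are assumptions made for this example only.

```python
import heapq


def a_star(grid, start, goal):
    """Length of a shortest path on a grid of '.' (free) and '#' (wall), or None."""

    def heuristic(cell):
        # Manhattan distance: never overestimates the steps still needed.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(heuristic(start), 0, start)]  # (estimate, cost so far, cell)
    best_cost = {start: 0}
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            return cost
        if cost > best_cost[cell]:
            continue  # stale queue entry; a cheaper route was already found
        row, col = cell
        for nxt in ((row + 1, col), (row - 1, col), (row, col + 1), (row, col - 1)):
            r, c = nxt
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] != "#":
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    heapq.heappush(frontier, (new_cost + heuristic(nxt), new_cost, nxt))
    return None  # goal is unreachable


grid = [
    "..#.",
    "..#.",
    "....",
]
steps = a_star(grid, (0, 0), (0, 3))
print(steps)  # 7
```

The priority queue always expands the state whose cost so far plus heuristic estimate is smallest, so states that look closer to the goal are tried first instead of being visited in exhaustive order.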
Adversarial search is used for game-playing programs, such as chess or Go. It searches through a tree of possible moves and counter-moves, looking for a winning position.[75]
Local search
Illustration of gradient descent for 3 different starting points. Two parameters (represented by the plane coordinates) are adjusted in order to minimize the loss function (the height).
Local search uses mathematical optimization to find a solution to a problem. It begins with some form of guess and refines it incrementally.[76]
Gradient descent is a type of local search that optimizes a set of numerical parameters by incrementally adjusting them to minimize a loss function. Variants of gradient descent are commonly used to train neural networks.[77]
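The incremental adjustment can be sketched on a simple two-parameter loss. The loss function, the starting point, the learning rate and the number of steps below are assumptions chosen so that the example converges quickly.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=200):
    """Repeatedly move the two parameters a small step against the gradient."""
    x, y = start
    for _ in range(steps):
        gx, gy = grad(x, y)
        x -= learning_rate * gx
        y -= learning_rate * gy
    return x, y


def grad(x, y):
    """Gradient of the example loss (x - 3)^2 + (y + 1)^2."""
    return 2 * (x - 3), 2 * (y + 1)


x, y = gradient_descent(grad, start=(0.0, 0.0))
print(round(x, 4), round(y, 4))  # 3.0 -1.0
```

Each step shrinks the distance to the minimum at (3, -1) by a constant factor here; on the non-convex losses of neural networks, convergence to a global minimum is not guaranteed.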
Another type of local search is evolutionary computation, which aims to iteratively improve a set of candidate solutions by "mutating" and "recombining" them, selecting only the fittest to survive each generation.[78]
Distributed search processes can coordinate via swarm intelligence algorithms. Two popular swarm algorithms used in search are particle swarm optimization (inspired by bird flocking) and ant colony optimization (inspired by ant trails).[79]
Logic
Formal logic is used for reasoning and knowledge representation.[80] Formal logic comes in two main forms: propositional logic (which operates on statements that are true or false and uses logical connectives such as "and", "or", "not" and "implies")[81] and predicate logic (which also operates on objects, predicates and relations and uses quantifiers such as "Every X is a Y" and "There are some Xs that are Ys").[82]
Deductive reasoning in logic is the process of proving a new statement (conclusion) from other statements that are given and assumed to be true (the premises).[83] Proofs can be structured as proof trees, in which nodes are labelled by sentences, and children nodes are connected to parent nodes by inference rules.
Given a problem and a set of premises, problem-solving reduces to searching for a proof tree whose root node is labelled by a solution of the problem and whose leaf nodes are labelled by premises or axioms. In the case of Horn clauses, problem-solving search can be performed by reasoning forwards from the premises or backwards from the problem.[84] In the more general case of the clausal form of first-order logic, resolution is a single, axiom-free rule of inference, in which a problem is solved by proving a contradiction from premises that include the negation of the problem to be solved.[85]
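Reasoning forwards from the premises can be sketched for the propositional case, in which each Horn clause has the form "if all atoms in the body hold, then the head holds". The rules and atom names below are invented for the example, and the sketch does not handle variables or quantifiers.

```python
def forward_chain(facts, rules):
    """Derive every atom that follows from the facts under the given rules.

    Each rule is a pair (body, head): if all atoms in body are known,
    head is added to the known atoms.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and all(atom in known for atom in body):
                known.add(head)
                changed = True
    return known


# Hypothetical rule base.
rules = [
    (("rain",), "wet_ground"),
    (("wet_ground", "cold"), "ice"),
    (("ice",), "slippery"),
]
derived = forward_chain({"rain", "cold"}, rules)
print("slippery" in derived)  # True
```

Backward reasoning would instead start from the goal ("slippery") and look for rules whose head matches it, recursively trying to establish each atom in the rule's body.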
Inference in both Horn clause logic and first-order logic is undecidable, and therefore intractable. However, backward reasoning with Horn clauses, which underpins computation in the logic programming language Prolog, is Turing complete. Moreover, its efficiency is competitive with computation in other symbolic programming languages.[86]
Fuzzy logic assigns a "degree of truth" between 0 and 1. It can therefore handle propositions that are vague and partially true.[87]
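One common way to combine degrees of truth, sketched below, uses minimum for "and", maximum for "or" and the complement for "not" (the operators proposed by Zadeh). Other operator families exist, and the example propositions and their truth degrees are made up.

```python
def fuzzy_and(a, b):
    """Degree to which both propositions hold (minimum)."""
    return min(a, b)


def fuzzy_or(a, b):
    """Degree to which at least one proposition holds (maximum)."""
    return max(a, b)


def fuzzy_not(a):
    """Degree to which the proposition does not hold (complement)."""
    return 1.0 - a


# Hypothetical degrees of truth for two vague propositions.
warm = 0.7
humid = 0.4
print(fuzzy_and(warm, humid))  # 0.4
print(fuzzy_or(warm, humid))   # 0.7
```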
Non-monotonic logics, including logic programming with negation as failure, are designed to handle default reasoning.[31] Other specialized versions of logic have been developed to describe many complex domains.
Probabilistic methods for uncertain reasoning
A simple Bayesian network, with the associated conditional probability tables
Many problems in AI (including in reasoning, planning, learning, perception, and robotics) require the agent to operate with incomplete or uncertain information. AI researchers have devised a number of tools to solve these problems using methods from probability theory and economics.[88] Precise mathematical tools have been developed that analyze how an agent can make choices and plan, using decision theory, decision analysis,[89] and information value theory.[90] These tools include models such as Markov decision processes,[91] dynamic decision networks,[92] game theory and mechanism design.[93]
Bayesian networks[94] are a tool that can be used for reasoning (using the Bayesian inference algorithm),[g][96] learning (using the expectation-maximization algorithm),[h][98] planning (using decision networks)[99] and perception (using dynamic Bayesian networks).[92]
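The smallest possible case of Bayesian inference is a network with two nodes, a hypothesis and a piece of evidence that depends on it. The sketch below applies Bayes' rule to such a network; the probabilities are invented for the example.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """P(hypothesis | evidence) for a two-node network, by Bayes' rule."""
    joint_h = prior * p_evidence_given_h
    joint_not_h = (1 - prior) * p_evidence_given_not_h
    return joint_h / (joint_h + joint_not_h)


# Hypothetical numbers: a condition with 1% prevalence, a test that detects
# it 90% of the time and gives a false alarm 5% of the time.
p = posterior(prior=0.01, p_evidence_given_h=0.9, p_evidence_given_not_h=0.05)
print(round(p, 3))  # 0.154
```

Even after a positive test the hypothesis remains fairly unlikely, because the low prior outweighs the evidence. Larger networks apply the same rule across many interdependent variables.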
Probabilistic algorithms can also be used for filtering, prediction, smoothing and finding explanations for streams of data, helping perception systems to analyze processes that occur over time (e.g., hidden Markov models or Kalman filters).[92]
Expectation-maximization clustering of Old Faithful eruption data starts from a random guess but then successfully converges on an accurate clustering of the two physically distinct modes of eruption.
Classifiers and statistical learning methods
The simplest AI applications can be divided into two types: classifiers (e.g., "if shiny then diamond"), on one hand, and controllers (e.g., "if diamond then pick up"), on the other hand. Classifiers[100] are functions that use pattern matching to determine the closest match. They can be fine-tuned based on chosen examples using supervised learning. Each pattern (also called an "observation") is labeled with a certain predefined class. All the observations combined with their class labels are known as a data set. When a new observation is received, that observation is classified based on previous experience.[48]
There are many kinds of classifiers in use. The decision tree is the simplest and most widely used symbolic machine learning algorithm.[101] The k-nearest neighbor algorithm was the most widely used analogical AI until the mid-1990s, and kernel methods such as the support vector machine (SVM) displaced k-nearest neighbor in the 1990s.[102] The naive Bayes classifier is reportedly the "most widely used learner"[103] at Google, due in part to its scalability.[104] Neural networks are also used as classifiers.[105]
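Classifying a new observation "based on previous experience" can be sketched with a k-nearest neighbor classifier. The tiny labeled data set, the two numeric features and the value of `k` are assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(data, query, k=3):
    """Label a query point by majority vote among its k closest observations."""
    nearest = sorted(data, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical data set: (feature vector, class label) pairs.
data = [
    ((1.0, 1.0), "dull"),
    ((1.2, 0.8), "dull"),
    ((0.9, 1.1), "dull"),
    ((5.0, 5.0), "shiny"),
    ((5.2, 4.9), "shiny"),
    ((4.8, 5.1), "shiny"),
]
print(knn_predict(data, (1.1, 1.0)))  # dull
print(knn_predict(data, (4.9, 5.0)))  # shiny
```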
Artificial neural networks
A neural network is an interconnected group of nodes, akin to the vast network of neurons in the human brain.
An artificial neural network is based on a collection of nodes also known as artificial neurons, which loosely model the neurons in a biological brain. It is trained to recognise patterns; once trained, it can recognise those patterns in fresh data. There is an input, at least one hidden layer of nodes and an output. Each node applies a function and once the weight crosses its specified threshold, the data is transmitted to the next layer. A network is typically called a deep neural network if it has at least 2 hidden layers.[105]
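The layered structure can be sketched with a network of three threshold neurons: two inputs, one hidden layer of two nodes and one output node. The weights and biases are hand-picked (not learned) so that the network computes exclusive-or; all names are assumptions for this example.

```python
def step(x):
    """Threshold activation: fire (1) once the weighted sum reaches zero."""
    return 1 if x >= 0 else 0


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the threshold."""
    return step(sum(i * w for i, w in zip(inputs, weights)) + bias)


def xor_network(x1, x2):
    # Hidden layer: one node behaves like OR, the other like AND.
    h_or = neuron((x1, x2), (1.0, 1.0), -0.5)
    h_and = neuron((x1, x2), (1.0, 1.0), -1.5)
    # Output node fires when OR is on and AND is off.
    return neuron((h_or, h_and), (1.0, -1.0), -0.5)


outputs = [xor_network(a, b) for a, b in ((0, 0), (0, 1), (1, 0), (1, 1))]
print(outputs)  # [0, 1, 1, 0]
```

A single threshold neuron cannot compute exclusive-or, which is why the hidden layer is needed here. In practice the weights are found by training rather than set by hand.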
Learning algorithms for neural networks use local search to choose the weights that will get the right output for each input during training. The most common training technique is the backpropagation algorithm.[106] Neural networks learn to model complex relationships between inputs and outputs and find patterns in data. In theory, a neural network can learn any function.[107]
In feedforward neural networks the signal passes in only one direction.[108] Recurrent neural networks feed the output signal back into the input, which allows short-term memories of previous input events. Long short-term memory is the most successful network architecture for recurrent networks.[109] Perceptrons[110] use only a single layer of neurons, whereas deep learning[111] uses multiple layers. Convolutional neural networks strengthen the connection between neurons that are "close" to each other; this is especially important in image processing, where a local set of neurons must identify an "edge" before the network can identify an object.[112]
Deep learning
Deep learning[111] uses several layers of neurons between the network's inputs and outputs. The multiple layers can progressively extract higher-level features from the raw input. For example, in image processing, lower layers may identify edges, while higher layers may identify the concepts relevant to a human such as digits or letters or faces.[113]
Deep learning has profoundly improved the performance of programs in many important subfields of artificial intelligence, including computer vision, speech recognition, natural language processing, image classification[114] and others. The reason that deep learning performs so well in so many applications is not known as of 2023.[115] The sudden success of deep learning in 2012–2015 did not occur because of some new discovery or theoretical breakthrough (deep neural networks and backpropagation had been described by many people, as far back as the 1950s)[i] but because of two factors: the incredible increase in computer power (including the hundred-fold increase in speed by switching to GPUs) and the availability of vast amounts of training data, especially the giant curated datasets used for benchmark testing, such as ImageNet.[j]
GPT
Generative pre-trained transformers (GPT) are large language models that are based on the semantic relationships between words in sentences (natural language processing). Text-based GPT models are pre-trained on a large corpus of text which can be from the internet. The pre-training consists in predicting the next token (a token being usually a word, subword, or punctuation). Throughout this pre-training, GPT models accumulate knowledge about the world, and can then generate human-like text by repeatedly predicting the next token. Typically, a subsequent training phase makes the model more truthful, useful and harmless, usually with a technique called reinforcement learning from human feedback (RLHF). Current GPT models are still prone to generating falsehoods called "hallucinations", although this can be reduced with RLHF and quality data. They are used in chatbots, which allow you to ask a question or request a task in simple text.[124][125]
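The idea of generating text by repeatedly predicting the next token can be sketched with a deliberately simple stand-in for a GPT model: a bigram counter that predicts the token most often seen after the current one. This is not a transformer and has no attention mechanism. The corpus and function names are assumptions for the example.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for each token, which tokens follow it in the training text."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def generate(counts, start, length):
    """Greedy generation: always append the most frequent next token."""
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break  # the last token was never followed by anything in training
        out.append(followers.most_common(1)[0][0])
    return out


# Hypothetical training corpus, split into word-level tokens.
corpus = "the cat sat on the mat the cat sat on the rug".split()
counts = train_bigram(corpus)
print(" ".join(generate(counts, "the", 4)))  # the cat sat on the
```

A real GPT model conditions on a long context rather than a single preceding token and usually samples from a probability distribution instead of always taking the most frequent continuation.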
Current models and services include: Gemini (formerly Bard), ChatGPT, Grok, Claude, Copilot and LLaMA.[126] Multimodal GPT models can process different types of data (modalities) such as images, videos, sound and text.[127]
Specialized hardware and software
Main articles: Programming languages for artificial intelligence and Hardware for artificial intelligence
In the late 2010s, graphics processing units (GPUs), increasingly designed with AI-specific enhancements and used with specialized TensorFlow software, replaced central processing units (CPUs) as the dominant means of training large-scale (commercial and academic) machine learning models.[128] Historically, specialized languages such as Lisp, Prolog, Python and others had been used.
Applications
Main article: Applications of artificial intelligence
AI and machine learning technology is used in most of the essential applications of the 2020s, including: search engines (such as Google Search), targeting online advertisements, recommendation systems (offered by Netflix, YouTube or Amazon), driving internet traffic, targeted advertising (AdSense, Facebook), virtual assistants (such as Siri or Alexa), autonomous vehicles (including drones, ADAS and self-driving cars), automatic language translation (Microsoft Translator, Google Translate), facial recognition (Apple's Face ID or Microsoft's DeepFace and Google's FaceNet) and image labeling (used by Facebook, Apple's iPhoto and TikTok).
Health and medicine
Main article: Artificial intelligence in healthcare
The application of AI in medicine and medical research has the potential to improve patient care and quality of life.[129] Through the lens of the Hippocratic Oath, medical professionals are ethically compelled to use AI if applications can more accurately diagnose and treat patients.
For medical research, AI is an important tool for processing and integrating big data. This is particularly important for organoid and tissue engineering development which use microscopy imaging as a key technique in fabrication.[130] It has been suggested that AI can overcome discrepancies in funding allocated to different fields of research.[130] New AI tools can deepen our understanding of biomedically relevant pathways. For example, AlphaFold 2 (2021) demonstrated the ability to approximate, in hours rather than months, the 3D structure of a protein.[131] In 2023, it was reported that AI guided drug discovery helped find a class of antibiotics capable of killing two different types of drug-resistant bacteria.[132]
Games
Main article: Game artificial intelligence
Game playing programs have been used since the 1950s to demonstrate and test AI's most advanced techniques.[133] Deep Blue became the first computer chess-playing system to beat a reigning world chess champion, Garry Kasparov, on 11 May 1997.[134] In 2011, in a Jeopardy! quiz show exhibition match, IBM's question answering system, Watson, defeated the two greatest Jeopardy! champions, Brad Rutter and Ken Jennings, by a significant margin.[135] In March 2016, AlphaGo won 4 out of 5 games of Go in a match with Go champion Lee Sedol, becoming the first computer Go-playing system to beat a professional Go player without handicaps. Then in 2017 it defeated Ke Jie, who was the best Go player in the world.[136] Other programs handle imperfect-information games, such as the poker-playing program Pluribus.[137] DeepMind developed increasingly generalistic reinforcement learning models, such as with MuZero, which could be trained to play chess, Go, or Atari games.[138] In 2019, DeepMind's AlphaStar achieved grandmaster level in StarCraft II, a particularly challenging real-time strategy game that involves incomplete knowledge of what happens on the map.[139] In 2021 an AI agent competed in a PlayStation Gran Turismo competition, winning against four of the world's best Gran Turismo drivers using deep reinforcement learning.[140]
Military
Main article: Military artificial intelligence
Various countries are deploying AI military applications.[141] The main applications enhance command and control, communications, sensors, integration and interoperability.[142] Research is targeting intelligence collection and analysis, logistics, cyber operations, information operations, and semiautonomous and autonomous vehicles.[141] AI technologies enable coordination of sensors and effectors, threat detection and identification, marking of enemy positions, target acquisition, coordination and deconfliction of distributed Joint Fires between networked combat vehicles involving manned and unmanned teams.[142] AI was incorporated into military operations in Iraq and Syria.[141]
In November 2023, US Vice President Kamala Harris disclosed a declaration signed by 31 nations to set guardrails for the military use of AI. The commitments include using legal reviews to ensure the compliance of military AI with international laws, and being cautious and transparent in the development of this technology.[143]
Generative AI
Main article: Generative artificial intelligence
Vincent van Gogh in watercolour created by generative AI software
In the early 2020s, generative AI gained widespread prominence. In March 2023, 58% of US adults had heard about ChatGPT and 14% had tried it.[144] The increasing realism and ease-of-use of AI-based text-to-image generators such as Midjourney, DALL-E, and Stable Diffusion sparked a trend of viral AI-generated photos. Widespread attention was gained by a fake photo of Pope Francis wearing a white puffer coat, the fictional arrest of Donald Trump, and a hoax of an attack on the Pentagon, as well as the usage in professional creative arts.[145][146]
Industry-specific tasks
There are also thousands of successful AI applications used to solve specific problems for specific industries or institutions. In a 2017 survey, one in five companies reported they had incorporated "AI" in some offerings or processes.[147] A few examples are energy storage, medical diagnosis, military logistics, applications that predict the result of judicial decisions, foreign policy, or supply chain management.
In agriculture, AI has helped farmers identify areas that need irrigation, fertilization, pesticide treatments or increasing yield. Agronomists use AI to conduct research and development. AI has been used to predict the ripening time for crops such as tomatoes, monitor soil moisture, operate agricultural robots, conduct predictive analytics, classify livestock pig call emotions, automate greenhouses, detect diseases and pests, and save water.
Artificial intelligence is used in astronomy to analyze increasing amounts of available data and applications, mainly for "classification, regression, clustering, forecasting, generation, discovery, and the development of new scientific insights", for example in discovering exoplanets, forecasting solar activity, and distinguishing between signals and instrumental effects in gravitational wave astronomy. It could also be used for activities in space such as space exploration, including analysis of data from space missions, real-time science decisions of spacecraft, space debris avoidance, and more autonomous operation.
Ethics
Main article: Ethics of artificial intelligence
AI has potential benefits and potential risks. AI may be able to advance science and find solutions for serious problems: Demis Hassabis of DeepMind hopes to "solve intelligence, and then use that to solve everything else".[148] However, as the use of AI has become widespread, several unintended consequences and risks have been identified.[149] Systems in production sometimes fail to factor ethics and bias into their AI training processes, especially when the AI algorithms are inherently unexplainable, as in deep learning.[150]
Risks and harm
Privacy and copyright
Further information: Information privacy and Artificial intelligence and copyright
Machine-learning algorithms require large amounts of data. The techniques used to acquire this data have raised concerns about privacy, surveillance and copyright.
Technology companies collect a wide range of data from their users, including online activity, geolocation data, video and audio.[151] For example, in order to build speech recognition algorithms, Amazon has recorded millions of private conversations and allowed temporary workers to listen to and transcribe some of them.[152] Opinions about this widespread surveillance range from those who see it as a necessary evil to those for whom it is clearly unethical and a violation of the right to privacy.[153]
AI developers argue that this is the only way to deliver valuable applications, and have developed several techniques that attempt to preserve privacy while still obtaining the data, such as data aggregation, de-identification and differential privacy.[154] Since 2016, some privacy experts, such as Cynthia Dwork, have begun to view privacy in terms of fairness. Brian Christian wrote that experts have pivoted "from the question of 'what they know' to the question of 'what they're doing with it'."[155]
Generative AI is often trained on unlicensed copyrighted works, including in domains such as images or computer code; the output is then used under the rationale of "fair use". Website owners who do not wish to have their copyrighted content AI-indexed or "scraped" can add code to their site, much as they would to keep the site from being indexed by a search engine; this option is currently offered by certain services such as OpenAI. Experts disagree about how well and under what circumstances the fair use rationale will hold up in courts of law; relevant factors may include "the purpose and character of the use of the copyrighted work" and "the effect upon the potential market for the copyrighted work".[156] In 2023, leading authors (including John Grisham and Jonathan Franzen) sued AI companies for using their work to train generative AI.[157][158]
Misinformation
See also: YouTube § Moderation and offensive content
YouTube, Facebook and others use recommender systems to guide users to more content. These AI programs were given the goal of maximizing user engagement (that is, the only goal was to keep people watching). The AI learned that users tended to choose misinformation, conspiracy theories, and extreme partisan content, and, to keep them watching, the AI recommended more of it. Users also tended to watch more content on the same subject, so the AI led people into filter bubbles where they received multiple versions of the same misinformation.[159] This convinced many users that the misinformation was true, and ultimately undermined trust in institutions, the media and the government.[160] The AI program had correctly learned to maximize its goal, but the result was harmful to society. After the U.S. election in 2016, major technology companies took steps to mitigate the problem.
In 2022, generative AI began to create images, audio, video and text that are indistinguishable from real photographs, recordings, films or human writing. It is possible for bad actors to use this technology to create massive amounts of misinformation or propaganda.[161] AI pioneer Geoffrey Hinton expressed concern about AI enabling "authoritarian leaders to manipulate their electorates" on a large scale, among other risks.[162]
Algorithmic bias and fairness
Main articles: Algorithmic bias and Fairness (machine learning)
Machine learning applications will be biased if they learn from biased data.[163] The developers may not be aware that the bias exists.[164] Bias can be introduced by the way training data is selected and by the way a model is deployed.[165][163] If a biased algorithm is used to make decisions that can seriously harm people (as it can in medicine, finance, recruitment, housing or policing) then the algorithm may cause discrimination.[166] Fairness in machine learning is the study of how to prevent the harm caused by algorithmic bias. It has become a serious area of academic study within AI. Researchers have discovered it is not always possible to define "fairness" in a way that satisfies all stakeholders.[167]
On June 28, 2015, Google Photos's new image labeling feature mistakenly identified Jacky Alcine and a friend as "gorillas" because they were black. The system was trained on a dataset that contained very few images of black people,[168] a problem called "sample size disparity".[169] Google "fixed" this problem by preventing the system from labelling anything as a "gorilla". Eight years later, in 2023, Google Photos still could not identify a gorilla, and neither could similar products from Apple, Facebook, Microsoft and Amazon.[170]
COMPAS is a commercial program widely used by U.S. courts to assess the likelihood of a defendant becoming a recidivist. In 2016, Julia Angwin at ProPublica discovered that COMPAS exhibited racial bias, despite the fact that the program was not told the races of the defendants. Although the error rate for both whites and blacks was calibrated to be equal at exactly 61%, the errors for each race were different: the system consistently overestimated the chance that a black person would re-offend and underestimated the chance that a white person would re-offend.[171] In 2017, several researchers[k] showed that it was mathematically impossible for COMPAS to accommodate all possible measures of fairness when the base rates of re-offense were different for whites and blacks in the data.[173]
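How two groups can share the same overall accuracy while experiencing different kinds of error can be sketched with invented numbers. The records below are not COMPAS data; the group names, counts and function name are assumptions made purely to illustrate the arithmetic.

```python
def error_rates(records):
    """Accuracy, false positive rate and false negative rate for one group.

    Each record is (predicted_high_risk, actually_reoffended).
    """
    tp = sum(1 for pred, actual in records if pred and actual)
    tn = sum(1 for pred, actual in records if not pred and not actual)
    fp = sum(1 for pred, actual in records if pred and not actual)
    fn = sum(1 for pred, actual in records if not pred and actual)
    return {
        "accuracy": (tp + tn) / len(records),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }


# Hypothetical predictions for two groups of ten people each.
groups = {
    "A": [(True, True)] * 3 + [(False, False)] * 3 + [(True, False)] * 3 + [(False, True)] * 1,
    "B": [(True, True)] * 3 + [(False, False)] * 3 + [(True, False)] * 1 + [(False, True)] * 3,
}
rates = {name: error_rates(records) for name, records in groups.items()}
for name, r in rates.items():
    print(name, r)
```

Both groups have an accuracy of 0.6, but group A's errors are mostly false positives (people wrongly flagged as high risk) while group B's are mostly false negatives. Equalizing one measure therefore does not equalize the others.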
A program can make biased decisions even if the data does not explicitly mention a problematic feature (such as "race" or "gender"). The feature will correlate with other features (like "address", "shopping history" or "first name"), and the program will make the same decisions based on these features as it would on "race" or "gender".[174] Moritz Hardt said "the most robust fact in this research area is that fairness through blindness doesn't work."[175]
Criticism of COMPAS highlighted that machine learning models are designed to make "predictions" that are only valid if we assume that the future will resemble the past. If they are trained on data that includes the results of racist decisions in the past, machine learning models must predict that racist decisions will be made in the future. If an application then uses these predictions as recommendations, some of these "recommendations" will likely be racist.[176] Thus, machine learning is not well suited to help make decisions in areas where there is hope that the future will be better than the past. It is necessarily descriptive and not prescriptive.[l]
Bias and unfairness may go undetected because the developers are overwhelmingly white and male: among AI engineers, about 4% are black and 20% are women.[169]
At its 2022 Conference on Fairness, Accountability, and Transparency (ACM FAccT 2022) in Seoul, South Korea, the Association for Computing Machinery presented and published findings recommending that, until AI and robotics systems are demonstrated to be free of bias mistakes, they should be considered unsafe, and that the use of self-learning neural networks trained on vast, unregulated sources of flawed internet data should be curtailed.[178]
Lack of transparency
See also: Explainable AI, Algorithmic transparency, and Right to explanation
Lidar testing vehicle for autonomous driving
Many AI systems are so complex that their designers cannot explain how they reach their decisions.[179] This is particularly true of deep neural networks, in which there are a large number of non-linear relationships between inputs and outputs. However, some popular explainability techniques exist.[180]
It is impossible to be certain that a program is operating correctly if no one knows how exactly it works. There have been many cases where a machine learning program passed rigorous tests, but nevertheless learned something different than what the programmers intended. For example, a system that could identify skin diseases better than medical professionals was found to actually have a strong tendency to classify images with a ruler as "cancerous", because pictures of malignancies typically include a ruler to show the scale.[181] Another machine learning system designed to help effectively allocate medical resources was found to classify patients with asthma as being at "low risk" of dying from pneumonia. Having asthma is actually a severe risk factor, but since the patients having asthma would usually get much more medical care, they were relatively unlikely to die according to the training data. The correlation between asthma and low risk of dying from pneumonia was real, but misleading.[182]
People who have been harmed by an algorithm's decision have a right to an explanation.[183] Doctors, for example, are expected to clearly and completely explain to their colleagues the reasoning behind any decision they make. Early drafts of the European Union's General Data Protection Regulation in 2016 included an explicit statement that this right exists.[m] Industry experts noted that this is an unsolved problem with no solution in sight. Regulators argued that nevertheless the harm is real: if the problem has no solution, the tools should not be used.[184]
DARPA established the XAI ("Explainable Artificial Intelligence") program in 2014 to try to solve these problems.[185]
There are several possible solutions to the transparency problem. SHAP addresses it by visualising the contribution of each feature to the output.[186] LIME can locally approximate a model with a simpler, interpretable model.[187] Multitask learning provides a large number of outputs in addition to the target classification. These other outputs can help developers deduce what the network has learned.[188] Deconvolution, DeepDream and other generative methods can allow developers to see what different layers of a deep network have learned and produce output that can suggest what the network is learning.[189]
Bad actors and weaponized AI
Main articles: Lethal autonomous weapon, Artificial intelligence arms race, and AI safety
Artificial intelligence provides a number of tools that are useful to bad actors, such as authoritarian governments, terrorists, criminals or rogue states.
A lethal autonomous weapon is a machine that locates, selects and engages human targets without human supervision.[n] Widely available AI tools can be used by bad actors to develop inexpensive autonomous weapons and, if produced at scale, they are potentially weapons of mass destruction.[191] Even when used in conventional warfare, it is unlikely that they will be unable to reliably choose targets and could potentially kill an innocent person.[191] In 2014, 30 nations (including China) supported a ban on autonomous weapons under the United Nations' Convention on Certain Conventional Weapons, however the United States and others disagreed.[192] By 2015, over fifty countries were reported to be researching battlefield robots.[193]
AI tools make it easier for authoritarian governments to efficiently control their citizens in several ways. Face and voice recognition allow widespread surveillance. Machine learning, operating on this data, can classify potential enemies of the state and prevent them from hiding. Recommendation systems can precisely target propaganda and misinformation for maximum effect. Deepfakes and generative AI aid in producing misinformation. Advanced AI can make authoritarian centralized decision making more competitive than liberal and decentralized systems such as markets. It lowers the cost and difficulty of digital warfare and advanced spyware.[194] All these technologies have been available since 2020 or earlier -- AI facial recognition systems are already being used for mass surveillance in China.[195][196]
There are many other ways that AI is expected to help bad actors, some of which cannot be foreseen. For example, machine-learning AI is able to design tens of thousands of toxic molecules in a matter of hours.[197]
Reliance on industry giants
Training AI systems requires an enormous amount of computing power. Usually only Big Tech companies have the financial resources to make such investments. Smaller startups such as Cohere and OpenAI end up buying access to data centers from Google and Microsoft respectively.[198]
Technological unemployment
Main articles: Workplace impact of artificial intelligence and Technological unemployment
Economists have frequently highlighted the risks of redundancies from AI, and speculated about unemployment if there is no adequate social policy for full employment.[199]
In the past, technology has tended to increase rather than reduce total employment, but economists acknowledge that "we're in uncharted territory" with AI.[200] A survey of economists showed disagreement about whether the increasing use of robots and AI will cause a substantial increase in long-term unemployment, but they generally agree that it could be a net benefit if productivity gains are redistributed.[201] Risk estimates vary; for example, in the 2010s, Michael Osborne and Carl Benedikt Frey estimated 47% of U.S. jobs are at "high risk" of potential automation, while an OECD report classified only 9% of U.S. jobs as "high risk".[o][203] The methodology of speculating about future employment levels has been criticised as lacking evidential foundation, and for implying that technology, rather than social policy, creates unemployment, as opposed to redundancies.[199] In April 2023, it was reported that 70% of the jobs for Chinese video game illustrators had been eliminated by generative artificial intelligence.[204][205]
Unlike previous waves of automation, many middle-class jobs may be eliminated by artificial intelligence; The Economist stated in 2015 that "the worry that AI could do to white-collar jobs what steam power did to blue-collar ones during the Industrial Revolution" is "worth taking seriously".[206] Jobs at extreme risk range from paralegals to fast food cooks, while job demand is likely to increase for care-related professions ranging from personal healthcare to the clergy.[207]
From the early days of the development of artificial intelligence, there have been arguments, for example, those put forward by Joseph Weizenbaum, about whether tasks that can be done by computers actually should be done by them, given the difference between computers and humans, and between quantitative calculation and qualitative, value-based judgement.[208]
Existential risk
Main article: Existential risk from artificial general intelligence
It has been argued AI will become so powerful that humanity may irreversibly lose control of it. This could, as physicist Stephen Hawking stated, "spell the end of the human race".[209] This scenario has been common in science fiction, when a computer or robot suddenly develops a human-like "self-awareness" (or "sentience" or "consciousness") and becomes a malevolent character.[p] These sci-fi scenarios are misleading in several ways.
First, AI does not require human-like "sentience" to be an existential risk. Modern AI programs are given specific goals and use learning and intelligence to achieve them. Philosopher Nick Bostrom argued that if one gives almost any goal to a sufficiently powerful AI, it may choose to destroy humanity to achieve it (he used the example of a paperclip factory manager).[211] Stuart Russell gives the example of a household robot that tries to find a way to kill its owner to prevent it from being unplugged, reasoning that "you can't fetch the coffee if you're dead."[212] In order to be safe for humanity, a superintelligence would have to be genuinely aligned with humanity's morality and values so that it is "fundamentally on our side".[213]
Second, Yuval Noah Harari argues that AI does not require a robot body or physical control to pose an existential risk. The essential parts of civilization are not physical. Things like ideologies, law, government, money and the economy are made of language; they exist because there are stories that billions of people believe. The current prevalence of misinformation suggests that an AI could use language to convince people to believe anything, even to take actions that are destructive.[214]
The opinions amongst experts and industry insiders are mixed, with sizable fractions both concerned and unconcerned by risk from eventual superintelligent AI.[215] Personalities such as Stephen Hawking, Bill Gates, and Elon Musk have expressed concern about existential risk from AI.[216] AI pioneers including Fei-Fei Li, Geoffrey Hinton, Yoshua Bengio, Cynthia Breazeal, Rana el Kaliouby, Demis Hassabis, Joy Buolamwini, and Sam Altman have expressed concerns about the risks of AI. In 2023, many leading AI experts issued the joint statement that "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war".[217]
Other researchers, however, spoke in favor of a less dystopian view. AI pioneer Juergen Schmidhuber did not sign the joint statement, emphasising that in 95% of all cases, AI research is about making "human lives longer and healthier and easier."[218] While the tools that are now being used to improve lives can also be used by bad actors, "they can also be used against the bad actors."[219][220] Andrew Ng also argued that "it's a mistake to fall for the doomsday hype on AI—and that regulators who do will only benefit vested interests."[221] Yann LeCun "scoffs at his peers' dystopian scenarios of supercharged misinformation and even, eventually, human extinction."[222] In the early 2010s, experts argued that the risks are too distant in the future to warrant research or that humans will be valuable from the perspective of a superintelligent machine.[223] However, after 2016, the study of current and future risks and possible solutions became a serious area of research.[224]
Ethical machines and alignment
Main articles: Machine ethics, AI safety, Friendly artificial intelligence, Artificial moral agents, and Human Compatible
Friendly AI are machines that have been designed from the beginning to minimize risks and to make choices that benefit humans. Eliezer Yudkowsky, who coined the term, argues that developing friendly AI should be a higher research priority: it may require a large investment and it must be completed before AI becomes an existential risk.[225]
Machines with intelligence have the potential to use their intelligence to make ethical decisions. The field of machine ethics provides machines with ethical principles and procedures for resolving ethical dilemmas.[226] The field of machine ethics is also called computational morality,[226] and was founded at an AAAI symposium in 2005.[227]
Other approaches include Wendell Wallach's "artificial moral agents"[228] and Stuart J. Russell's three principles for developing provably beneficial machines.[229]
Frameworks
Artificial Intelligence projects can have their ethical permissibility tested while designing, developing, and implementing an AI system. An AI framework such as the Care and Act Framework, containing the SUM values and developed by the Alan Turing Institute, tests projects in four main areas:[230][231]
RESPECT the dignity of individual people
CONNECT with other people sincerely, openly and inclusively
CARE for the wellbeing of everyone
PROTECT social values, justice and the public interest
Other developments in ethical frameworks include those decided upon during the Asilomar Conference, the Montreal Declaration for Responsible AI, and the IEEE's Ethics of Autonomous Systems initiative, among others;[232] however, these principles do not go without their criticisms, especially with regard to the people chosen to contribute to these frameworks.[233]
Promotion of the wellbeing of the people and communities that these technologies affect requires consideration of the social and ethical implications at all stages of AI system design, development and implementation, and collaboration between job roles such as data scientists, product managers, data engineers, domain experts, and delivery managers.[234]
Regulation
Main articles: Regulation of artificial intelligence, Regulation of algorithms, and AI safety
The first global AI Safety Summit was held in 2023 with a declaration calling for international co-operation.
The regulation of artificial intelligence is the development of public sector policies and laws for promoting and regulating artificial intelligence (AI); it is therefore related to the broader regulation of algorithms.[235] The regulatory and policy landscape for AI is an emerging issue in jurisdictions globally.[236] According to AI Index at Stanford, the annual number of AI-related laws passed in the 127 survey countries jumped from one passed in 2016 to 37 passed in 2022 alone.[237][238] Between 2016 and 2020, more than 30 countries adopted dedicated strategies for AI.[239] Most EU member states had released national AI strategies, as had Canada, China, India, Japan, Mauritius, the Russian Federation, Saudi Arabia, United Arab Emirates, US and Vietnam. Others were in the process of elaborating their own AI strategy, including Bangladesh, Malaysia and Tunisia.[239] The Global Partnership on Artificial Intelligence was launched in June 2020, stating a need for AI to be developed in accordance with human rights and democratic values, to ensure public confidence and trust in the technology.[239] Henry Kissinger, Eric Schmidt, and Daniel Huttenlocher published a joint statement in November 2021 calling for a government commission to regulate AI.[240] In 2023, OpenAI leaders published recommendations for the governance of superintelligence, which they believe may happen in less than 10 years.[241] In 2023, the United Nations also launched an advisory body to provide recommendations on AI governance; the body comprises technology company executives, government officials and academics.[242]
In a 2022 Ipsos survey, attitudes towards AI varied greatly by country; 78% of Chinese citizens, but only 35% of Americans, agreed that "products and services using AI have more benefits than drawbacks".[237] A 2023 Reuters/Ipsos poll found that 61% of Americans agree, and 22% disagree, that AI poses risks to humanity.[243] In a 2023 Fox News poll, 35% of Americans thought it "very important", and an additional 41% thought it "somewhat important", for the federal government to regulate AI, versus 13% responding "not very important" and 8% responding "not at all important".[244][245]
In November 2023, the first global AI Safety Summit was held in Bletchley Park in the UK to discuss the near and far term risks of AI and the possibility of mandatory and voluntary regulatory frameworks.[246] 28 countries including the United States, China, and the European Union issued a declaration at the start of the summit, calling for international co-operation to manage the challenges and risks of artificial intelligence.[247][248]
History
Main article: History of artificial intelligence
For a chronological guide, see Timeline of artificial intelligence.
The study of mechanical or "formal" reasoning began with philosophers and mathematicians in antiquity. The study of logic led directly to Alan Turing's theory of computation, which suggested that a machine, by shuffling symbols as simple as "0" and "1", could simulate any conceivable form of mathematical reasoning.[249][5] This, along with concurrent discoveries in cybernetics, information theory and neurobiology, led researchers to consider the possibility of building an "electronic brain".[q] They developed several areas of research that would become part of AI,[251] such as McCulloch and Pitts's design for "artificial neurons" in 1943,[252] and Turing's influential 1950 paper 'Computing Machinery and Intelligence', which introduced the Turing test and showed that "machine intelligence" was plausible.[253][5]
The field of AI research was founded at a workshop at Dartmouth College in 1956.[r][6] The attendees became the leaders of AI research in the 1960s.[s] They and their students produced programs that the press described as "astonishing":[t] computers were learning checkers strategies, solving word problems in algebra, proving logical theorems and speaking English.[u][7] Artificial intelligence laboratories were set up at a number of British and U.S. universities in the latter 1950s and early 1960s.[5]
Researchers in the 1960s and the 1970s were convinced that their methods would eventually succeed in creating a machine with general intelligence and considered this the goal of their field.[257] Herbert Simon predicted, "machines will be capable, within twenty years, of doing any work a man can do".[258] Marvin Minsky agreed, writing, "within a generation ... the problem of creating 'artificial intelligence' will substantially be solved".[259] They had, however, underestimated the difficulty of the problem.[v] In 1974, both the U.S. and British governments cut off exploratory research in response to the criticism of Sir James Lighthill[261] and ongoing pressure from the U.S. Congress to fund more productive projects.[262] Minsky's and Papert's book Perceptrons was understood as proving that artificial neural networks would never be useful for solving real-world tasks, thus discrediting the approach altogether.[263] The "AI winter", a period when obtaining funding for AI projects was difficult, followed.[9]
In the early 1980s, AI research was revived by the commercial success of expert systems,[264] a form of AI program that simulated the knowledge and analytical skills of human experts. By 1985, the market for AI had reached over a billion dollars. At the same time, Japan's fifth generation computer project inspired the U.S. and British governments to restore funding for academic research.[8] However, beginning with the collapse of the Lisp Machine market in 1987, AI once again fell into disrepute, and a second, longer-lasting winter began.[10]
Up to this point, most of AI's funding had gone to projects which used high level symbols to represent mental objects like plans, goals, beliefs and known facts. In the 1980s, some researchers began to doubt that this approach would be able to imitate all the processes of human cognition, especially perception, robotics, learning and pattern recognition,[265] and began to look into "sub-symbolic" approaches.[266] Rodney Brooks rejected "representation" in general and focussed directly on engineering machines that move and survive.[w] Judea Pearl, Lotfi Zadeh and others developed methods that handled incomplete and uncertain information by making reasonable guesses rather than precise logic.[88][271] But the most important development was the revival of "connectionism", including neural network research, by Geoffrey Hinton and others.[272] In 1990, Yann LeCun successfully showed that convolutional neural networks can recognize handwritten digits, the first of many successful applications of neural networks.[273]
AI gradually restored its reputation in the late 1990s and early 21st century by exploiting formal mathematical methods and by finding specific solutions to specific problems. This "narrow" and "formal" focus allowed researchers to produce verifiable results and collaborate with other fields (such as statistics, economics and mathematics).[274] By 2000, solutions developed by AI researchers were being widely used, although in the 1990s they were rarely described as "artificial intelligence".[275] However, several academic researchers became concerned that AI was no longer pursuing its original goal of creating versatile, fully intelligent machines. Beginning around 2002, they founded the subfield of artificial general intelligence (or "AGI"), which had several well-funded institutions by the 2010s.[14]
Deep learning began to dominate industry benchmarks in 2012 and was adopted throughout the field.[11] For many specific tasks, other methods were abandoned.[x] Deep learning's success was based on both hardware improvements (faster computers,[277] graphics processing units, cloud computing[278]) and access to large amounts of data[279] (including curated datasets,[278] such as ImageNet). Deep learning's success led to an enormous increase in interest and funding in AI.[y] The amount of machine learning research (measured by total publications) increased by 50% in the years 2015–2019.[239]
In 2016, issues of fairness and the misuse of technology were catapulted into center stage at machine learning conferences, publications vastly increased, funding became available, and many researchers re-focussed their careers on these issues. The alignment problem became a serious field of academic study.[224]
In the late teens and early 2020s, AGI companies began to deliver programs that created enormous interest. In 2015, AlphaGo, developed by DeepMind, beat the world champion Go player. The program was taught only the rules of the game and developed strategy by itself. GPT-3 is a large language model that was released in 2020 by OpenAI and is capable of generating high-quality human-like text.[280] These programs, and others, inspired an aggressive AI boom, where large companies began investing billions in AI research. According to 'AI Impacts', about $50 billion annually was invested in "AI" around 2022 in the U.S. alone and about 20% of new US Computer Science PhD graduates have specialized in "AI".[281] About 800,000 "AI"-related US job openings existed in 2022.[282]
Philosophy
Main article: Philosophy of artificial intelligence
Defining artificial intelligence
Main articles: Turing test, Intelligent agent, Dartmouth workshop, and Synthetic intelligence
Alan Turing wrote in 1950 "I propose to consider the question 'can machines think'?"[283] He advised changing the question from whether a machine "thinks", to "whether or not it is possible for machinery to show intelligent behaviour".[283] He devised the Turing test, which measures the ability of a machine to simulate human conversation.[253] Since we can only observe the behavior of the machine, it does not matter if it is "actually" thinking or literally has a "mind". Turing notes that we cannot determine these things about other people but "it is usual to have a polite convention that everyone thinks".[284]
Russell and Norvig agree with Turing that intelligence must be defined in terms of external behavior, not internal structure.[1] However, they are critical that the test requires the machine to imitate humans. "Aeronautical engineering texts," they wrote, "do not define the goal of their field as making 'machines that fly so exactly like pigeons that they can fool other pigeons.'"[285] AI founder John McCarthy agreed, writing that "Artificial intelligence is not, by definition, simulation of human intelligence".[286]
McCarthy defines intelligence as "the computational part of the ability to achieve goals in the world."[287] Another AI founder, Marvin Minsky similarly describes it as "the ability to solve hard problems".[288] The leading AI textbook defines it as the study of agents that perceive their environment and take actions that maximize their chances of achieving defined goals.[289] These definitions view intelligence in terms of well-defined problems with well-defined solutions, where both the difficulty of the problem and the performance of the program are direct measures of the "intelligence" of the machine—and no other philosophical discussion is required, or may not even be possible.
Another definition has been adopted by Google,[290] a major practitioner in the field of AI. This definition stipulates the ability of systems to synthesize information as the manifestation of intelligence, similar to the way it is defined in biological intelligence.
Evaluating approaches to AI
No established unifying theory or paradigm has guided AI research for most of its history.[z] The unprecedented success of statistical machine learning in the 2010s eclipsed all other approaches (so much so that some sources, especially in the business world, use the term "artificial intelligence" to mean "machine learning with neural networks"). This approach is mostly sub-symbolic, soft and narrow (see below). Critics argue that these questions may have to be revisited by future generations of AI researchers.
Symbolic AI and its limits
Symbolic AI (or "GOFAI")[292] simulated the high-level conscious reasoning that people use when they solve puzzles, express legal reasoning and do mathematics. Symbolic programs were highly successful at "intelligent" tasks such as algebra or IQ tests. In the 1960s, Newell and Simon proposed the physical symbol systems hypothesis: "A physical symbol system has the necessary and sufficient means of general intelligent action."[293]
However, the symbolic approach failed on many tasks that humans solve easily, such as learning, recognizing an object or commonsense reasoning. Moravec's paradox is the discovery that high-level "intelligent" tasks were easy for AI, but low level "instinctive" tasks were extremely difficult.[294] Philosopher Hubert Dreyfus had argued since the 1960s that human expertise depends on unconscious instinct rather than conscious symbol manipulation, and on having a "feel" for the situation, rather than explicit symbolic knowledge.[295] Although his arguments had been ridiculed and ignored when they were first presented, eventually, AI research came to agree with him.[aa][19]
The issue is not resolved: sub-symbolic reasoning can make many of the same inscrutable mistakes that human intuition does, such as algorithmic bias. Critics such as Noam Chomsky argue continuing research into symbolic AI will still be necessary to attain general intelligence,[297][298] in part because sub-symbolic AI is a move away from explainable AI: it can be difficult or impossible to understand why a modern statistical AI program made a particular decision. The emerging field of neuro-symbolic artificial intelligence attempts to bridge the two approaches.
Neat vs. scruffy
Main article: Neats and scruffies
"Neats" hope that intelligent behavior is described using simple, elegant principles (such as logic, optimization, or neural networks). "Scruffies" expect that it necessarily requires solving a large number of unrelated problems. Neats defend their programs with theoretical rigor; scruffies rely mainly on incremental testing to see if they work. This issue was actively discussed in the 1970s and 1980s,[299] but eventually was seen as irrelevant. Modern AI has elements of both.
Soft vs. hard computing
Main article: Soft computing
Finding a provably correct or optimal solution is intractable for many important problems.[18] Soft computing is a set of techniques, including genetic algorithms, fuzzy logic and neural networks, that are tolerant of imprecision, uncertainty, partial truth and approximation. Soft computing was introduced in the late 1980s and most successful AI programs in the 21st century are examples of soft computing with neural networks.
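As an illustration of "partial truth", the sketch below shows one soft-computing technique, fuzzy logic, in its simplest form: a statement such as "the room is warm" holds to a degree between 0 and 1 rather than being strictly true or false. The triangular membership function, the temperature thresholds, and the helper names are assumptions chosen for this example; the min, max and complement operators are the common Zadeh definitions, which are only one of several possible choices.

```python
def triangular(x, left, peak, right):
    """Degree (0.0 to 1.0) to which x belongs to a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)   # rising edge
    return (right - x) / (right - peak)     # falling edge


def fuzzy_and(a, b):
    return min(a, b)


def fuzzy_or(a, b):
    return max(a, b)


def fuzzy_not(a):
    return 1.0 - a


temp_c = 22.0
# Hypothetical fuzzy sets for room temperature in degrees Celsius.
warm = triangular(temp_c, 15.0, 25.0, 35.0)
hot = triangular(temp_c, 20.0, 35.0, 50.0)

# "Warm and not hot" is true to a degree, not simply true or false.
comfortable = fuzzy_and(warm, fuzzy_not(hot))
print(round(warm, 3), round(hot, 3), round(comfortable, 3))
```

At 22 °C the room is mostly "warm" and only slightly "hot", so the combined statement holds to the lower of the two degrees involved.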
Narrow vs. general AI
Main articles: Weak artificial intelligence and Artificial general intelligence
AI researchers are divided as to whether to pursue the goals of artificial general intelligence and superintelligence directly or to solve as many specific problems as possible (narrow AI) in hopes these solutions will lead indirectly to the field's long-term goals.[300][301] General intelligence is difficult to define and difficult to measure, and modern AI has had more verifiable successes by focusing on specific problems with specific solutions. The experimental sub-field of artificial general intelligence studies this area exclusively.
Machine consciousness, sentience and mind
Main articles: Philosophy of artificial intelligence and Artificial consciousness
The philosophy of mind does not know whether a machine can have a mind, consciousness and mental states, in the same sense that human beings do. This issue considers the internal experiences of the machine, rather than its external behavior. Mainstream AI research considers this issue irrelevant because it does not affect the goals of the field: to build machines that can solve problems using intelligence. Russell and Norvig add that "[t]he additional project of making a machine conscious in exactly the way humans are is not one that we are equipped to take on."[302] However, the question has become central to the philosophy of mind. It is also typically the central question at issue in artificial intelligence in fiction.
Consciousness
Main articles: Hard problem of consciousness and Theory of mind
David Chalmers identified two problems in understanding the mind, which he named the "hard" and "easy" problems of consciousness.[303] The easy problem is understanding how the brain processes signals, makes plans and controls behavior. The hard problem is explaining how this feels or why it should feel like anything at all, assuming we are right in thinking that it truly does feel like something (Dennett's consciousness illusionism says this is an illusion). Human information processing is easy to explain; human subjective experience, however, is difficult to explain. For example, it is easy to imagine a color-blind person who has learned to identify which objects in their field of view are red, but it is not clear what would be required for the person to know what red looks like.[304]
Computationalism and functionalism
Main articles: Computational theory of mind, Functionalism (philosophy of mind), and Chinese room
Computationalism is the position in the philosophy of mind that the human mind is an information processing system and that thinking is a form of computing. Computationalism argues that the relationship between mind and body is similar or identical to the relationship between software and hardware and thus may be a solution to the mind–body problem. This philosophical position was inspired by the work of AI researchers and cognitive scientists in the 1960s and was originally proposed by philosophers Jerry Fodor and Hilary Putnam.[305]
Philosopher John Searle characterized this position as "strong AI": "The appropriately programmed computer with the right inputs and outputs would thereby have a mind in exactly the same sense human beings have minds."[ab] Searle counters this assertion with his Chinese room argument, which attempts to show that, even if a machine perfectly simulates human behavior, there is still no reason to suppose it also has a mind.[309]
AI welfare and rights
It is difficult or impossible to reliably evaluate whether an advanced AI is sentient (has the ability to feel), and if so, to what degree.[310] But if there is a significant chance that a given machine can feel and suffer, then it may be entitled to certain rights or welfare protection measures, similarly to animals.[311][312] Sapience (a set of capacities related to high intelligence, such as discernment or self-awareness) may provide another moral basis for AI rights.[311] Robot rights are also sometimes proposed as a practical way to integrate autonomous agents into society.[313]
In 2017, the European Union considered granting "electronic personhood" to some of the most capable AI systems. Similarly to the legal status of companies, it would have conferred rights but also responsibilities.[314] Critics argued in 2018 that granting rights to AI systems would downplay the importance of human rights, and that legislation should focus on user needs rather than speculative futuristic scenarios. They also noted that robots lacked the autonomy to take part in society on their own.[315][316]
Progress in AI increased interest in the topic. Proponents of AI welfare and rights often argue that AI sentience, if it emerges, would be particularly easy to deny. They warn that this may be a moral blind spot analogous to slavery or factory farming, which could lead to large-scale suffering if sentient AI is created and carelessly exploited.[312][311]
Future
Superintelligence and the singularity
A superintelligence is a hypothetical agent that would possess intelligence far surpassing that of the brightest and most gifted human mind.[301]
If research into artificial general intelligence produced sufficiently intelligent software, it might be able to reprogram and improve itself. The improved software would be even better at improving itself, leading to what I. J. Good called an "intelligence explosion" and Vernor Vinge called a "singularity".[317]
However, technologies cannot improve exponentially indefinitely, and typically follow an S-shaped curve, slowing when they reach the physical limits of what the technology can do.[318]
Transhumanism
Robot designer Hans Moravec, cyberneticist Kevin Warwick, and inventor Ray Kurzweil have predicted that humans and machines will merge in the future into cyborgs that are more capable and powerful than either. This idea, called transhumanism, has roots in Aldous Huxley and Robert Ettinger.[319]
Edward Fredkin argues that "artificial intelligence is the next stage in evolution", an idea first proposed by Samuel Butler's "Darwin among the Machines" as far back as 1863, and expanded upon by George Dyson in his book of the same name in 1998.[320]
In fiction
Main article: Artificial intelligence in fiction
The word "robot" itself was coined by Karel Čapek in his 1921 play R.U.R., the title standing for "Rossum's Universal Robots".
Thought-capable artificial beings have appeared as storytelling devices since antiquity,[321] and have been a persistent theme in science fiction.[322]
A common trope in these works began with Mary Shelley's Frankenstein, where a human creation becomes a threat to its masters. This includes such works as Arthur C. Clarke's and Stanley Kubrick's 2001: A Space Odyssey (both 1968), with HAL 9000, the murderous computer in charge of the Discovery One spaceship, as well as The Terminator (1984) and The Matrix (1999). In contrast, the rare loyal robots such as Gort from The Day the Earth Stood Still (1951) and Bishop from Aliens (1986) are less prominent in popular culture.[323]
Isaac Asimov introduced the Three Laws of Robotics in many books and stories, most notably the "Multivac" series about a super-intelligent computer of the same name. Asimov's laws are often brought up during lay discussions of machine ethics;[324] while almost all artificial intelligence researchers are familiar with Asimov's laws through popular culture, they generally consider the laws useless for many reasons, one of which is their ambiguity.[325]
Several works use AI to force us to confront the fundamental question of what makes us human, showing us artificial beings that have the ability to feel, and thus to suffer. This appears in Karel Čapek's R.U.R., the films A.I. Artificial Intelligence and Ex Machina, as well as the novel Do Androids Dream of Electric Sheep?, by Philip K. Dick. Dick considers the idea that our understanding of human subjectivity is altered by technology created with artificial intelligence.


@ -10,6 +10,7 @@ Each provider has their own named directory, with a standard notebook to introdu
- [AnalyticDB](https://www.alibabacloud.com/help/en/analyticdb-for-postgresql/latest/get-started-with-analyticdb-for-postgresql)
- [Cassandra/Astra DB](https://docs.datastax.com/en/astra-serverless/docs/vector-search/qandasimsearch-quickstart.html)
- [Azure AI Search](https://learn.microsoft.com/azure/search/search-get-started-vector)
- [Azure SQL Database](https://learn.microsoft.com/azure/azure-sql/database/ai-artificial-intelligence-intelligent-applications?view=azuresql)
- [Chroma](https://docs.trychroma.com/getting-started)
- [Elasticsearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html)
- [Hologres](https://www.alibabacloud.com/help/en/hologres/latest/procedure-to-use-hologres)
@ -24,7 +25,7 @@ Each provider has their own named directory, with a standard notebook to introdu
- [Redis](https://github.com/RedisVentures/simple-vecsim-intro)
- [SingleStoreDB](https://www.singlestore.com/blog/how-to-get-started-with-singlestore/)
- [Supabase](https://supabase.com/docs/guides/ai)
- [Tembo](https://tembo.io/docs/tembo-stacks/vector-db)
- [Tembo](https://tembo.io/docs/product/stacks/ai/vectordb)
- [Typesense](https://typesense.org/docs/guide/)
- [Vespa AI](https://vespa.ai/)
- [Weaviate](https://weaviate.io/developers/weaviate/quickstart)


@ -56,7 +56,7 @@
path: examples/Clustering_for_transaction_classification.ipynb
date: 2022-10-20
authors:
- colin-jarvis
- colin-openai
- ted-at-openai
tags:
- embeddings
@ -207,13 +207,12 @@
- ted-at-openai
tags:
- completions
- tiktoken
- title: Multiclass Classification for Transactions
path: examples/Multiclass_classification_for_transactions.ipynb
date: 2022-10-20
authors:
- colin-jarvis
- colin-openai
tags:
- embeddings
- completions
@ -484,6 +483,15 @@
- embeddings
- completions
- title: Getting Started with OpenAI Evals
path: examples/evaluation/Getting_Started_with_OpenAI_Evals.ipynb
date: 2024-03-21
authors:
- royziv11
- shyamal-anadkat
tags:
- completions
- title: Fine-Tuned Q&A - collect data
path: examples/fine-tuned_qa/olympics-1-collect-data.ipynb
date: 2022-03-10
@ -1201,12 +1209,12 @@
path: examples/How_to_use_guardrails.ipynb
date: 2023-12-19
authors:
- colin-jarvis
- colin-openai
tags:
- guardrails
- title: How to combine GPT4 with Vision with RAG to create a clothing matchmaker app
path: examples/How_to_combine_GPT4v_with_RAG_Outfit_Assistant.ipynb
path: examples/How_to_combine_GPT4o_with_RAG_Outfit_Assistant.ipynb
date: 2024-02-16
authors:
- teomusatoiu
@ -1242,3 +1250,67 @@
- teomusatoiu
tags:
- moderation
- title: Summarizing Long Documents
path: examples/Summarizing_long_documents.ipynb
date: 2024-04-19
authors:
- joe-at-openai
tags:
- chat
- title: Using GPT4 Vision with Function Calling
path: examples/multimodal/Using_GPT4_Vision_With_Function_Calling.ipynb
date: 2024-04-09
authors:
- shyamal-anadkat
tags:
- chat
- vision
- title: Synthetic data generation (Part 1)
path: examples/SDG1.ipynb
date: 2024-04-10
authors:
- dylanra-openai
tags:
- completions
- title: CLIP embeddings to improve multimodal RAG with GPT-4 Vision
path: examples/custom_image_embedding_search.ipynb
date: 2024-04-10
authors:
- dylanra-openai
tags:
- vision
- embeddings
- title: Batch processing with the Batch API
path: examples/batch_processing.ipynb
date: 2024-04-24
authors:
- katiagg
tags:
- batch
- completions
- title: Using tool required for customer service
path: examples/Using_tool_required_for_customer_service.ipynb
date: 2024-05-01
authors:
- colin-openai
tags:
- completions
- functions
- title: Introduction to gpt-4o
path: examples/gpt4o/introduction_to_gpt4o.ipynb
date: 2024-05-13
authors:
- justonf
tags:
- completions
- vision
- whisper