A few improvements to the long summaries notebook (#1148)

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Summarizing Long Documents"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The objective of this notebook is to demonstrate how to summarize large documents with a controllable level of detail.\n",
" \n",
"If you give a GPT model the task of summarizing a long document (e.g. 10k or more tokens), you'll tend to get back a relatively short summary that isn't proportional to the length of the document. For instance, a summary of a 20k token document will not be twice as long as a summary of a 10k token document. One way we can fix this is to split our document up into pieces, and produce a summary piecewise. After many queries to a GPT model, the full summary can be reconstructed. By controlling the number of text chunks and their sizes, we can ultimately control the level of detail in the output."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:19:35.305706Z",
"start_time": "2024-04-10T05:19:35.303535Z"
},
"pycharm": {
"is_executing": true
}
},
"outputs": [],
"source": [
"import os\n",
"from typing import List, Tuple, Optional\n",
"from openai import OpenAI\n",
"import tiktoken\n",
"from tqdm import tqdm"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:19:35.325026Z",
"start_time": "2024-04-10T05:19:35.322414Z"
}
},
"outputs": [],
"source": [
"# open dataset containing part of the text of the Wikipedia page for the United States\n",
"with open(\"data/artificial_intelligence_wikipedia.txt\", \"r\") as file:\n",
" artificial_intelligence_wikipedia_text = file.read()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:19:35.364483Z",
"start_time": "2024-04-10T05:19:35.348213Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"14630"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# load encoding and check the length of dataset\n",
"encoding = tiktoken.encoding_for_model('gpt-4-turbo')\n",
"len(encoding.encode(artificial_intelligence_wikipedia_text))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll define a simple utility to wrap calls to the OpenAI API."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:19:35.375619Z",
"start_time": "2024-04-10T05:19:35.365818Z"
}
},
"outputs": [],
"source": [
"client = OpenAI(api_key=os.getenv(\"OPENAI_API_KEY\"))\n",
"\n",
"def get_chat_completion(messages, model='gpt-4-turbo'):\n",
" response = client.chat.completions.create(\n",
" model=model,\n",
" messages=messages,\n",
" temperature=0,\n",
" )\n",
" return response.choices[0].message.content"
]
},
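{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick smoke test (a hypothetical addition, not part of the original run), we can call this helper directly with any short prompt:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical smoke test: one short completion to confirm the client and helper are configured.\n",
"print(get_chat_completion([{\"role\": \"user\", \"content\": \"Say hello in five words.\"}]))"
]
},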
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next we'll define some utilities to chunk a large document into smaller pieces."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:19:35.382790Z",
"start_time": "2024-04-10T05:19:35.376721Z"
}
},
"outputs": [],
"source": [
"def tokenize(text: str) -> List[str]:\n",
" encoding = tiktoken.encoding_for_model('gpt-4-turbo')\n",
" return encoding.encode(text)\n",
"\n",
"\n",
"# This function chunks a text into smaller pieces based on a maximum token count and a delimiter.\n",
"def chunk_on_delimiter(input_string: str,\n",
" max_tokens: int, delimiter: str) -> List[str]:\n",
" chunks = input_string.split(delimiter)\n",
" combined_chunks, _, dropped_chunk_count = combine_chunks_with_no_minimum(\n",
" chunks, max_tokens, chunk_delimiter=delimiter, add_ellipsis_for_overflow=True\n",
" )\n",
" if dropped_chunk_count > 0:\n",
" print(f\"warning: {dropped_chunk_count} chunks were dropped due to overflow\")\n",
" combined_chunks = [f\"{chunk}{delimiter}\" for chunk in combined_chunks]\n",
" return combined_chunks\n",
"\n",
"\n",
"# This function combines text chunks into larger blocks without exceeding a specified token count. It returns the combined text blocks, their original indices, and the count of chunks dropped due to overflow.\n",
"def combine_chunks_with_no_minimum(\n",
" chunks: List[str],\n",
" max_tokens: int,\n",
" chunk_delimiter=\"\\n\\n\",\n",
" header: Optional[str] = None,\n",
" add_ellipsis_for_overflow=False,\n",
") -> Tuple[List[str], List[int]]:\n",
" dropped_chunk_count = 0\n",
" output = [] # list to hold the final combined chunks\n",
" output_indices = [] # list to hold the indices of the final combined chunks\n",
" candidate = (\n",
" [] if header is None else [header]\n",
" ) # list to hold the current combined chunk candidate\n",
" candidate_indices = []\n",
" for chunk_i, chunk in enumerate(chunks):\n",
" chunk_with_header = [chunk] if header is None else [header, chunk]\n",
" if len(tokenize(chunk_delimiter.join(chunk_with_header))) > max_tokens:\n",
" print(f\"warning: chunk overflow\")\n",
" if (\n",
" add_ellipsis_for_overflow\n",
" and len(tokenize(chunk_delimiter.join(candidate + [\"...\"]))) <= max_tokens\n",
" ):\n",
" candidate.append(\"...\")\n",
" dropped_chunk_count += 1\n",
" continue # this case would break downstream assumptions\n",
" # estimate token count with the current chunk added\n",
" extended_candidate_token_count = len(tokenize(chunk_delimiter.join(candidate + [chunk])))\n",
" # If the token count exceeds max_tokens, add the current candidate to output and start a new candidate\n",
" if extended_candidate_token_count > max_tokens:\n",
" output.append(chunk_delimiter.join(candidate))\n",
" output_indices.append(candidate_indices)\n",
" candidate = chunk_with_header # re-initialize candidate\n",
" candidate_indices = [chunk_i]\n",
" # otherwise keep extending the candidate\n",
" else:\n",
" candidate.append(chunk)\n",
" candidate_indices.append(chunk_i)\n",
" # add the remaining candidate to output if it's not empty\n",
" if (header is not None and len(candidate) > 1) or (header is None and len(candidate) > 0):\n",
" output.append(chunk_delimiter.join(candidate))\n",
" output_indices.append(candidate_indices)\n",
" return output, output_indices, dropped_chunk_count"
]
},
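{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an illustrative sketch (not part of the original run), we can check the chunker on a toy string. The exact token counts depend on what `tiktoken` reports for this text:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical sanity check of the chunking utilities on a toy string.\n",
"toy_text = \"First sentence. Second sentence. Third sentence. Fourth sentence.\"\n",
"for piece in chunk_on_delimiter(toy_text, max_tokens=10, delimiter=\".\"):\n",
"    print(repr(piece), len(tokenize(piece)), \"tokens\")"
]
},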
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can define a utility to summarize text with a controllable level of detail (note the `detail` parameter).\n",
"\n",
"The function first determines the number of chunks by interpolating between a minimum and a maximum chunk count based on a controllable `detail` parameter. It then splits the text into chunks and summarizes each chunk."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:19:35.390876Z",
"start_time": "2024-04-10T05:19:35.385076Z"
}
},
"outputs": [],
"source": [
"def summarize(text: str,\n",
" detail: float = 0,\n",
" model: str = 'gpt-4-turbo',\n",
" additional_instructions: Optional[str] = None,\n",
" minimum_chunk_size: Optional[int] = 500,\n",
" chunk_delimiter: str = \".\",\n",
" summarize_recursively=False,\n",
" verbose=False):\n",
" \"\"\"\n",
" Summarizes a given text by splitting it into chunks, each of which is summarized individually. \n",
" The level of detail in the summary can be adjusted, and the process can optionally be made recursive.\n",
"\n",
" Parameters:\n",
" - text (str): The text to be summarized.\n",
" - detail (float, optional): A value between 0 and 1 indicating the desired level of detail in the summary.\n",
" 0 leads to a higher level summary, and 1 results in a more detailed summary. Defaults to 0.\n",
" - model (str, optional): The model to use for generating summaries. Defaults to 'gpt-3.5-turbo'.\n",
" - additional_instructions (Optional[str], optional): Additional instructions to provide to the model for customizing summaries.\n",
" - minimum_chunk_size (Optional[int], optional): The minimum size for text chunks. Defaults to 500.\n",
" - chunk_delimiter (str, optional): The delimiter used to split the text into chunks. Defaults to \".\".\n",
" - summarize_recursively (bool, optional): If True, summaries are generated recursively, using previous summaries for context.\n",
" - verbose (bool, optional): If True, prints detailed information about the chunking process.\n",
"\n",
" Returns:\n",
" - str: The final compiled summary of the text.\n",
"\n",
" The function first determines the number of chunks by interpolating between a minimum and a maximum chunk count based on the `detail` parameter. \n",
" It then splits the text into chunks and summarizes each chunk. If `summarize_recursively` is True, each summary is based on the previous summaries, \n",
" adding more context to the summarization process. The function returns a compiled summary of all chunks.\n",
" \"\"\"\n",
"\n",
" # check detail is set correctly\n",
" assert 0 <= detail <= 1\n",
"\n",
" # interpolate the number of chunks based to get specified level of detail\n",
" max_chunks = len(chunk_on_delimiter(text, minimum_chunk_size, chunk_delimiter))\n",
" min_chunks = 1\n",
" num_chunks = int(min_chunks + detail * (max_chunks - min_chunks))\n",
"\n",
" # adjust chunk_size based on interpolated number of chunks\n",
" document_length = len(tokenize(text))\n",
" chunk_size = max(minimum_chunk_size, document_length // num_chunks)\n",
" text_chunks = chunk_on_delimiter(text, chunk_size, chunk_delimiter)\n",
" if verbose:\n",
" print(f\"Splitting the text into {len(text_chunks)} chunks to be summarized.\")\n",
" print(f\"Chunk lengths are {[len(tokenize(x)) for x in text_chunks]}\")\n",
"\n",
" # set system message\n",
" system_message_content = \"Rewrite this text in summarized form.\"\n",
" if additional_instructions is not None:\n",
" system_message_content += f\"\\n\\n{additional_instructions}\"\n",
"\n",
" accumulated_summaries = []\n",
" for chunk in tqdm(text_chunks):\n",
" if summarize_recursively and accumulated_summaries:\n",
" # Creating a structured prompt for recursive summarization\n",
" accumulated_summaries_string = '\\n\\n'.join(accumulated_summaries)\n",
" user_message_content = f\"Previous summaries:\\n\\n{accumulated_summaries_string}\\n\\nText to summarize next:\\n\\n{chunk}\"\n",
" else:\n",
" # Directly passing the chunk for summarization without recursive context\n",
" user_message_content = chunk\n",
"\n",
" # Constructing messages based on whether recursive summarization is applied\n",
" messages = [\n",
" {\"role\": \"system\", \"content\": system_message_content},\n",
" {\"role\": \"user\", \"content\": user_message_content}\n",
" ]\n",
"\n",
" # Assuming this function gets the completion and works as expected\n",
" response = get_chat_completion(messages, model=model)\n",
" accumulated_summaries.append(response)\n",
"\n",
" # Compile final summary from partial summaries\n",
" final_summary = '\\n\\n'.join(accumulated_summaries)\n",
"\n",
" return final_summary"
]
},
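{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before spending any API calls, we can preview how `detail` maps to a chunk count for our document. This sketch simply mirrors the interpolation arithmetic inside `summarize` (with the default `minimum_chunk_size` of 500); the final chunk count after re-chunking may differ slightly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Preview the detail -> chunk-count interpolation without calling the API.\n",
"# This mirrors the arithmetic inside `summarize` (minimum_chunk_size=500, min_chunks=1).\n",
"max_chunks = len(chunk_on_delimiter(artificial_intelligence_wikipedia_text, 500, \".\"))\n",
"for detail in [0, 0.25, 0.5, 1]:\n",
"    num_chunks = int(1 + detail * (max_chunks - 1))\n",
"    print(f\"detail={detail}: {num_chunks} chunk(s)\")"
]
},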
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can use this utility to produce summaries with varying levels of detail. By increasing `detail` from 0 to 1 we get progressively longer summaries of the underlying document. A higher value for the `detail` parameter results in a more detailed summary because the utility first splits the document into a greater number of chunks. Each chunk is then summarized, and the final summary is a concatenation of all the chunk summaries."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:19:47.541096Z",
"start_time": "2024-04-10T05:19:35.391911Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Splitting the text into 1 chunks to be summarized.\n",
"Chunk lengths are [14631]\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 1/1 [00:09<00:00, 9.68s/it]\n"
]
}
],
"source": [
"summary_with_detail_0 = summarize(artificial_intelligence_wikipedia_text, detail=0, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:19:58.724212Z",
"start_time": "2024-04-10T05:19:47.542129Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Splitting the text into 9 chunks to be summarized.\n",
"Chunk lengths are [1817, 1807, 1823, 1810, 1806, 1827, 1814, 1829, 103]\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 9/9 [01:33<00:00, 10.39s/it]\n"
]
}
],
"source": [
"summary_with_detail_pt25 = summarize(artificial_intelligence_wikipedia_text, detail=0.25, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:20:16.216023Z",
"start_time": "2024-04-10T05:19:58.725014Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Splitting the text into 17 chunks to be summarized.\n",
"Chunk lengths are [897, 890, 914, 876, 893, 906, 893, 902, 909, 907, 905, 889, 902, 890, 901, 880, 287]\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 17/17 [02:26<00:00, 8.64s/it]\n"
]
}
],
"source": [
"summary_with_detail_pt5 = summarize(artificial_intelligence_wikipedia_text, detail=0.5, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:22:57.760218Z",
"start_time": "2024-04-10T05:21:44.921275Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Splitting the text into 31 chunks to be summarized.\n",
"Chunk lengths are [492, 427, 485, 490, 496, 478, 473, 497, 496, 501, 499, 497, 493, 470, 472, 494, 489, 492, 481, 485, 471, 500, 486, 498, 478, 469, 498, 468, 493, 478, 103]\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 31/31 [04:08<00:00, 8.02s/it]\n"
]
}
],
"source": [
"summary_with_detail_1 = summarize(artificial_intelligence_wikipedia_text, detail=1, verbose=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The original document is nearly 15k tokens long. Notice how large the gap is between the length of `summary_with_detail_0` and `summary_with_detail_1`. It's nearly 25 times longer!"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:22:57.782389Z",
"start_time": "2024-04-10T05:22:57.763041Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[235, 2529, 4336, 6742]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# lengths of summaries\n",
"[len(tokenize(x)) for x in\n",
" [summary_with_detail_0, summary_with_detail_pt25, summary_with_detail_pt5, summary_with_detail_1]]"
]
},
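{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick ratio check (an illustrative addition) confirms the gap between the least and most detailed summaries:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Compare the most detailed summary to the least detailed one.\n",
"shortest = len(tokenize(summary_with_detail_0))\n",
"longest = len(tokenize(summary_with_detail_1))\n",
"print(f\"The detail=1 summary is {longest / shortest:.1f}x the length of the detail=0 summary.\")"
]
},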
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's inspect the summaries to see how the level of detail changes when the `detail` parameter is increased from 0 to 1."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:22:57.785881Z",
"start_time": "2024-04-10T05:22:57.783455Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Artificial intelligence (AI) is the simulation of human intelligence in machines, designed to perform tasks that typically require human intelligence. This includes applications like advanced search engines, recommendation systems, speech interaction, autonomous vehicles, and more. AI was first significantly researched by Alan Turing and became an academic discipline in 1956. The field has experienced cycles of high expectations followed by disillusionment and reduced funding, known as \"AI winters.\" Interest in AI surged post-2012 with advancements in deep learning and again post-2017 with the development of the transformer architecture, leading to a boom in AI research and applications in the early 2020s.\n",
"\n",
"AI's increasing integration into various sectors is influencing societal and economic shifts towards automation and data-driven decision-making, impacting areas such as employment, healthcare, and privacy. Ethical and safety concerns about AI have prompted discussions on regulatory policies.\n",
"\n",
"AI research involves various sub-fields focused on specific goals like reasoning, learning, and perception, using techniques from mathematics, logic, and other disciplines. Despite its broad applications, AI's complexity and potential risks, such as privacy issues, misinformation, and ethical challenges, remain areas of active investigation and debate.\n"
]
}
],
"source": [
"print(summary_with_detail_0)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:22:57.788969Z",
"start_time": "2024-04-10T05:22:57.786691Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Artificial intelligence (AI) is the simulation of human intelligence in machines, designed to perceive their environment and make decisions to achieve specific goals. This technology is prevalent across various sectors including industry, government, and science, with applications ranging from web search engines and recommendation systems to autonomous vehicles and AI in gaming. Although AI has become a common feature in many tools and applications, it often goes unrecognized as AI when it becomes sufficiently integrated and widespread.\n",
"\n",
"The field of AI, which began as an academic discipline in 1956, has experienced several cycles of high expectations followed by disappointment, known as AI winters. Interest and funding in AI surged post-2012 with advancements in deep learning and again post-2017 with the development of transformer architecture, leading to a significant boom in AI research and applications in the early 2020s, primarily in the United States.\n",
"\n",
"The increasing integration of AI in the 21st century is driving a shift towards automation and data-driven decision-making across various sectors, influencing job markets, healthcare, and education, among others. This raises important questions about the ethical implications, long-term effects, and the need for regulatory policies to ensure the safety and benefits of AI technologies. AI research itself is diverse, focusing on goals like reasoning, learning, and perception, and involves various tools and methodologies to achieve these objectives.\n",
"\n",
"General intelligence, which involves performing any human task at least as well as a human, is a long-term goal in AI research. To achieve this, AI integrates various techniques from search and optimization, formal logic, neural networks, and statistics, to insights from psychology, linguistics, and neuroscience. AI research focuses on specific traits like reasoning and problem-solving, where early algorithms mimicked human step-by-step reasoning. However, these algorithms struggle with large, complex problems due to combinatorial explosion and are less efficient than human intuitive judgments. Knowledge representation is another critical area, using ontologies to structure domain-specific knowledge and relationships, aiding in intelligent querying, scene interpretation, and data mining among other applications.\n",
"\n",
"Knowledge bases must encapsulate a wide range of elements including objects, properties, categories, relations, events, states, time, causes, effects, and meta-knowledge. They also need to handle default reasoning, where certain assumptions are maintained unless contradicted. Challenges in knowledge representation include the vast scope of commonsense knowledge and its often sub-symbolic, non-verbal nature, alongside the difficulty of acquiring this knowledge for AI use.\n",
"\n",
"In the realm of AI, an \"agent\" is defined as an entity that perceives its environment and acts towards achieving goals or fulfilling preferences. In automated planning, the agent pursues a specific goal, while in decision-making, it evaluates actions based on their expected utility to maximize preference satisfaction. Classical planning assumes agents have complete knowledge of action outcomes, but real-world scenarios often involve uncertainty about the situation and outcomes, requiring probabilistic decision-making. Additionally, agents may need to adapt or learn preferences, particularly in complex environments with multiple agents or human interactions.\n",
"\n",
"Information value theory helps assess the value of exploratory actions in situations with uncertain outcomes. A Markov decision process uses a transition model and a reward function to guide decisions, which can be determined through calculations, heuristics, or learning. Game theory analyzes the rational behavior of multiple interacting agents in decision-making scenarios involving others.\n",
"\n",
"Machine learning, integral to AI, involves programs that automatically improve task performance. It includes unsupervised learning, which identifies patterns in data without guidance, and supervised learning, which requires labeled data and includes classification and regression tasks. Reinforcement learning rewards or punishes agents to shape their responses, while transfer learning applies knowledge from one problem to another. Deep learning, a subset of machine learning, uses artificial neural networks inspired by biological processes.\n",
"\n",
"Computational learning theory evaluates learning algorithms based on computational and sample complexity, among other criteria. Natural language processing (NLP) enables programs to interact using human languages, tackling challenges like speech recognition, synthesis, translation, and more. Early NLP efforts, influenced by Chomsky's theories, faced limitations in handling ambiguous language outside of controlled environments.\n",
"\n",
"Margaret Masterman emphasized the importance of meaning over grammar in language understanding, advocating for the use of thesauri instead of dictionaries in computational linguistics. Modern NLP techniques include word embedding, transformers, and by 2023, GPT models capable of achieving human-level scores on various tests. Machine perception involves interpreting sensor data to understand the world, encompassing computer vision and speech recognition among other applications. Social intelligence in AI focuses on recognizing and simulating human emotions, with systems like Kismet and affective computing technologies that enhance human-computer interaction. However, these advancements may lead to overestimations of AI capabilities by users. AI also employs a variety of techniques including search and optimization, with methods like state space search to explore possible solutions to problems.\n",
"\n",
"Planning algorithms use means-ends analysis to navigate through trees of goals and subgoals to achieve a target goal. However, simple exhaustive searches are often inadequate for complex real-world problems due to the vast search space, making searches slow or incomplete. Heuristics are employed to prioritize more promising paths towards a goal. In adversarial contexts like chess or Go, search algorithms explore trees of possible moves to find a winning strategy.\n",
"\n",
"Local search methods, such as gradient descent, optimize numerical parameters to minimize a loss function, often used in training neural networks. Evolutionary computation, another local search technique, iteratively enhances solutions by mutating and recombining candidate solutions, selecting the most fit for survival. Distributed search processes utilize swarm intelligence, with particle swarm optimization and ant colony optimization being notable examples.\n",
"\n",
"In the realm of logic, formal logic serves for reasoning and knowledge representation, with two primary types: propositional logic, dealing with true or false statements, and predicate logic, which involves objects and their relationships. Deductive reasoning in logic involves deriving conclusions from assumed true premises.\n",
"\n",
"Proofs in logic can be organized into proof trees, where each node represents a sentence and is connected to its children by inference rules. Problem-solving involves finding a proof tree that starts with premises or axioms at the leaves and ends with the problem's solution at the root. In Horn clauses, one can reason forwards from premises or backwards from the problem, while in general first-order logic, resolution uses contradiction to solve problems. Despite being undecidable and intractable, backward reasoning with Horn clauses is Turing complete and efficient, similar to other symbolic programming languages like Prolog.\n",
"\n",
"Fuzzy logic allows for handling propositions with partial truth by assigning a truth degree between 0 and 1. Non-monotonic logics cater to default reasoning, and various specialized logics have been developed for complex domains.\n",
"\n",
"In AI, handling uncertain or incomplete information is crucial in fields like reasoning, planning, and perception. Tools from probability theory and economics, such as Bayesian networks, Markov decision processes, and game theory, help in making decisions and planning under uncertainty. Bayesian networks, in particular, are versatile tools used for reasoning, learning, planning, and perception through various algorithms.\n",
"\n",
"Probabilistic algorithms like hidden Markov models and Kalman filters are useful for analyzing data over time, aiding in tasks such as filtering, prediction, and smoothing. In machine learning, expectation-maximization clustering can effectively identify distinct patterns in data, as demonstrated with the Old Faithful eruption data. AI applications often involve classifiers, which categorize data based on learned patterns, and controllers, which make decisions based on classifications. Classifiers, such as decision trees, k-nearest neighbors, support vector machines, naive Bayes, and neural networks, vary in complexity and application, with some being favored for their scalability like the naive Bayes at Google. Artificial neural networks, resembling the human brain's network of neurons, recognize and process patterns through multiple layers and nodes, using algorithms like backpropagation for training.\n",
"\n",
"Neural networks are designed to model complex relationships between inputs and outputs, theoretically capable of learning any function. Feedforward neural networks process signals in one direction, while recurrent neural networks (RNNs) loop outputs back into inputs, enabling memory of past inputs. Long Short-Term Memory (LSTM) networks are a successful type of RNN. Perceptrons consist of a single layer of neurons, whereas deep learning involves multiple layers, which allows for the extraction of progressively higher-level features from data. Convolutional neural networks (CNNs) are particularly effective in image processing as they emphasize connections between adjacent neurons to recognize local patterns like edges.\n",
"\n",
"Deep learning, which uses several layers of neurons, has significantly enhanced performance in AI subfields such as computer vision and natural language processing. The effectiveness of deep learning, which surged between 2012 and 2015, is attributed not to new theoretical advances but to increased computational power, including the use of GPUs, and the availability of large datasets like ImageNet.\n",
"\n",
"Generative Pre-trained Transformers (GPT) are large language models that learn from vast amounts of text to predict the next token in a sequence, thereby generating human-like text. These models are pre-trained on a broad corpus, often sourced from the internet, and fine-tuned through token prediction, accumulating worldly knowledge in the process.\n",
"\n",
"Reinforcement learning from human feedback (RLHF) is used to enhance the truthfulness, usefulness, and safety of models like GPT, which are still susceptible to generating inaccuracies known as \"hallucinations.\" These models, including Gemini, ChatGPT, Grok, Claude, Copilot, and LLaMA, are employed in various applications such as chatbots and can handle multiple data types like images and sound through multimodal capabilities.\n",
"\n",
"In the realm of specialized hardware and software, the late 2010s saw AI-specific enhancements in graphics processing units (GPUs), which, along with TensorFlow software, have largely replaced central processing units (CPUs) for training large-scale machine learning models. Historically, programming languages like Lisp, Prolog, and Python have been pivotal.\n",
"\n",
"AI and machine learning are integral to key 2020s applications such as search engines, online advertising, recommendation systems, virtual assistants, autonomous vehicles, language translation, facial recognition, and image labeling.\n",
"\n",
"In healthcare, AI significantly contributes to improving patient care and medical research, aiding in diagnostics, treatment, and the integration of big data for developments in organoid and tissue engineering. AI's role in medical research also includes addressing funding disparities across different research areas.\n",
"\n",
"Recent advancements in AI have significantly impacted various fields including biomedicine and gaming. For instance, AlphaFold 2, developed in 2021, can predict protein structures in hours, a process that previously took months. In 2023, AI-assisted drug discovery led to the development of a new class of antibiotics effective against drug-resistant bacteria. In the realm of gaming, AI has been instrumental since the 1950s, with notable achievements such as IBM's Deep Blue defeating world chess champion Garry Kasparov in 1997, and IBM's Watson winning against top Jeopardy! players in 2011. More recently, Google's AlphaGo and DeepMind's AlphaStar set new standards in AI capabilities by defeating top human players in complex games like Go and StarCraft II, respectively. In the military sector, AI is being integrated into various applications such as command and control, intelligence, logistics, and autonomous vehicles, enhancing capabilities in coordination, threat detection, and target acquisition.\n",
"\n",
"In November 2023, US Vice President Kamala Harris announced that 31 nations had signed a declaration to establish guidelines for the military use of AI, emphasizing legal compliance with international laws and promoting transparency in AI development. Generative AI, particularly known for creating realistic images and artworks, gained significant attention in the early 2020s, with technologies like ChatGPT, Midjourney, DALL-E, and Stable Diffusion becoming popular. This trend led to viral AI-generated images, including notable hoaxes. AI has also been effectively applied across various industries, including agriculture where it assists in optimizing farming practices, and astronomy, where it helps in data analysis and space exploration activities.\n",
"\n",
"Ethics and Risks of AI\n",
"AI offers significant benefits but also poses various risks, including ethical concerns and unintended consequences. Demis Hassabis of DeepMind aims to use AI to solve major challenges, but issues arise when AI systems, particularly those based on deep learning, fail to incorporate ethical considerations and exhibit biases.\n",
"\n",
"Privacy and Copyright Issues\n",
"AI's reliance on large data sets raises privacy and surveillance concerns. Companies like Amazon have been criticized for collecting extensive user data, including private conversations for developing speech recognition technologies. While some defend this as necessary for advancing AI applications, others view it as a breach of privacy rights. Techniques like data aggregation and differential privacy have been developed to mitigate these concerns.\n",
"\n",
"Generative AI also faces copyright challenges, as it often uses unlicensed copyrighted materials, claiming \"fair use.\" The legality of this practice is still debated, with outcomes potentially depending on the nature and impact of the AI's use of copyrighted content.\n",
"\n",
"In 2023, prominent authors like John Grisham and Jonathan Franzen filed lawsuits against AI companies for using their literary works to train generative AI models. These AI systems, particularly on platforms like YouTube and Facebook, have been criticized for promoting misinformation by prioritizing user engagement over content accuracy. This has led to the proliferation of conspiracy theories and extreme partisan content, trapping users in filter bubbles and eroding trust in key institutions. Post the 2016 U.S. election, tech companies began addressing these issues.\n",
"\n",
"By 2022, generative AI had advanced to produce highly realistic images, audio, and texts, raising concerns about its potential misuse in spreading misinformation or propaganda. AI expert Geoffrey Hinton highlighted risks including the manipulation of electorates by authoritarian leaders.\n",
"\n",
"Furthermore, issues of algorithmic bias were identified, where AI systems perpetuate existing biases present in the training data, affecting fairness in critical areas like medicine, finance, and law enforcement. This has sparked significant academic interest in studying and mitigating algorithmic bias to ensure fairness in AI applications.\n",
"\n",
"In 2015, Google Photos mislabeled Jacky Alcine and his friend as \"gorillas\" due to a lack of diverse images in its training dataset, an issue known as \"sample size disparity.\" Google's temporary solution was to stop labeling any images as \"gorilla,\" a restriction still in place in 2023 across various tech companies. Additionally, the COMPAS program, used by U.S. courts to predict recidivism, was found to exhibit racial bias in 2016. Although it did not use race explicitly, it overestimated the likelihood of black defendants reoffending and underestimated it for white defendants. This issue was attributed to the program's inability to balance different fairness measures when the base re-offense rates varied by race. The criticism of COMPAS underscores a broader issue in machine learning, where models trained on past data, including biased decisions, are likely to perpetuate those biases in their predictions.\n",
"\n",
"Machine learning, while powerful, is not ideal for scenarios where future improvements over past conditions are expected, as it is inherently descriptive rather than prescriptive. The field also faces challenges with bias and lack of diversity among its developers, with only about 4% being black and 20% women. The Association for Computing Machinery highlighted at its 2022 Conference on Fairness, Accountability, and Transparency that AI systems should not be used until they are proven to be free from bias, especially those trained on flawed internet data.\n",
"\n",
"AI systems often lack transparency, making it difficult to understand how decisions are made, particularly in complex systems like deep neural networks. This opacity can lead to unintended consequences, such as a system misidentifying medical images or misclassifying medical risks due to misleading correlations in the training data. There is a growing call for explainable AI, where harmed individuals have the right to know how decisions affecting them were made, similar to how doctors are expected to explain their decisions. This concept was also recognized in early drafts of the European Union's General Data Protection Regulation.\n",
"\n",
"Industry experts acknowledge an unresolved issue in AI with no foreseeable solution, leading regulators to suggest that if a problem is unsolvable, the tools associated should not be used. In response, DARPA initiated the XAI program in 2014 to address these issues. Various methods have been proposed to enhance AI transparency, including SHAP, which visualizes feature contributions, LIME, which approximates complex models with simpler ones, and multitask learning, which provides additional outputs to help understand what a network has learned. Techniques like deconvolution and DeepDream also reveal insights into different network layers.\n",
"\n",
"Concerning the misuse of AI, it can empower bad actors like authoritarian regimes and terrorists. Lethal autonomous weapons, which operate without human oversight, pose significant risks, including potential misuse as weapons of mass destruction and the likelihood of targeting errors. Despite some international efforts to ban such weapons, major powers like the United States have not agreed to restrictions. AI also facilitates more effective surveillance and control by authoritarian governments, enhances the targeting of propaganda, and simplifies the production of misinformation through deepfakes and other generative technologies, thereby increasing the efficiency of digital warfare and espionage.\n",
"\n",
"AI technologies, including facial recognition systems, have been in use since 2020 or earlier, notably for mass surveillance in China. AI also poses risks by enabling the creation of harmful substances quickly. The development of AI systems is predominantly driven by Big Tech due to their financial capabilities, often leaving smaller companies reliant on these giants for resources like data center access. Economists have raised concerns about AI-induced unemployment, though historical data suggests technology has generally increased total employment. However, the impact of AI might be different, with some predicting significant job losses, especially in middle-class sectors, while others see potential benefits if productivity gains are well-managed. Estimates of job risk vary widely, with some studies suggesting a high potential for automation in many U.S. jobs. Recent developments have shown substantial job losses in specific sectors, such as for Chinese video game illustrators due to AI advancements. The potential for AI to disrupt white-collar jobs similarly to past technological revolutions in blue-collar jobs is a significant concern.\n",
"\n",
"From the inception of artificial intelligence (AI), debates have emerged about the appropriateness of computers performing tasks traditionally done by humans, particularly because of the qualitative differences in human and computer judgment. Concerns about AI have escalated to discussions about existential risks, where AI could potentially become so advanced that humans might lose control over it. Stephen Hawking and others have warned that this could lead to catastrophic outcomes for humanity. This fear is often depicted in science fiction as AI gaining sentience and turning malevolent, but real-world risks do not necessarily involve AI becoming self-aware. Philosophers like Nick Bostrom and Stuart Russell illustrate scenarios where AI, without needing human-like consciousness, could still pose threats if their goals are misaligned with human safety and values. Additionally, Yuval Noah Harari points out that AI could manipulate societal structures and beliefs through language and misinformation, posing a non-physical yet profound threat. The expert opinion on the existential risk from AI is divided, with notable figures like Hawking, Bill Gates, and Elon Musk expressing concern.\n",
"\n",
"In 2023, prominent AI experts including Fei-Fei Li and Geoffrey Hinton highlighted the existential risks posed by AI, equating them with global threats like pandemics and nuclear war. They advocated for prioritizing the mitigation of these risks. Conversely, other experts like Juergen Schmidhuber and Andrew Ng offered a more optimistic perspective, emphasizing AI's potential to enhance human life and dismissing doomsday scenarios as hype that could misguide regulatory actions. Yann LeCun also criticized the pessimistic outlook on AI's impact.\n",
"\n",
"The concept of \"Friendly AI\" was introduced to ensure AI systems are inherently designed to be safe and beneficial to humans. This involves embedding ethical principles in AI to guide their decision-making processes, a field known as machine ethics or computational morality, established in 2005. The development of such AI is seen as crucial to prevent potential future threats from advanced AI technologies.\n",
"\n",
"Other approaches to AI ethics include Wendell Wallach's concept of \"artificial moral agents\" and Stuart J. Russell's three principles for creating provably beneficial machines. Ethical frameworks like the Care and Act Framework from the Alan Turing Institute evaluate AI projects based on respect, connection, care, and protection of social values. Other notable frameworks include those from the Asilomar Conference, the Montreal Declaration for Responsible AI, and the IEEE's Ethics of Autonomous Systems initiative, though these frameworks have faced criticism regarding their inclusivity and the selection of contributors.\n",
"\n",
"The promotion of wellbeing in AI development requires considering social and ethical implications throughout all stages of design, development, and implementation, necessitating collaboration across various professional roles.\n",
"\n",
"On the regulatory front, AI governance involves creating policies to manage AI's development and use, as seen in the increasing number of AI-related laws globally. From 2016 to 2022, the number of AI laws passed annually in surveyed countries rose significantly, with many countries now having dedicated AI strategies. The first global AI Safety Summit in 2023 emphasized the need for international cooperation in AI regulation.\n",
"\n",
"The Global Partnership on Artificial Intelligence, initiated in June 2020, emphasizes the development of AI in line with human rights and democratic values to maintain public trust. Notable figures like Henry Kissinger, Eric Schmidt, and Daniel Huttenlocher advocated for a government commission to oversee AI in 2021. By 2023, OpenAI proposed governance frameworks for superintelligence, anticipating its emergence within a decade. The same year, the United Nations established an advisory group consisting of tech executives, government officials, and academics to offer guidance on AI governance.\n",
"\n",
"Public opinion on AI varies significantly across countries. A 2022 Ipsos survey showed a stark contrast between Chinese (78% approval) and American (35% approval) citizens on the benefits of AI. Further polls in 2023 revealed mixed feelings among Americans about the risks of AI and the importance of federal regulation.\n",
"\n",
"The first global AI Safety Summit took place in November 2023 at Bletchley Park, UK, focusing on AI risks and potential regulatory measures. The summit concluded with a declaration from 28 countries, including the US, China, and the EU, advocating for international collaboration to address AI challenges.\n",
"\n",
"Historically, the concept of AI traces back to ancient philosophers and mathematicians, evolving through significant milestones such as Alan Turing's theory of computation and the exploration of cybernetics, information theory, and neurobiology, which paved the way for the modern concept of an \"electronic brain.\"\n",
"\n",
"Early research in artificial intelligence (AI) included the development of \"artificial neurons\" by McCullouch and Pitts in 1943 and Turing's 1950 paper that introduced the Turing test, suggesting the plausibility of machine intelligence. The field of AI was officially founded during a 1956 workshop at Dartmouth College, leading to significant advancements in the 1960s such as computers learning checkers, solving algebra problems, proving theorems, and speaking English. AI labs were established in various British and U.S. universities during the late 1950s and early 1960s.\n",
"\n",
"In the 1960s and 1970s, researchers were optimistic about achieving general machine intelligence, with predictions from notable figures like Herbert Simon and Marvin Minsky that AI would soon match human capabilities. However, they underestimated the challenges involved. By 1974, due to criticism and a shift in funding priorities, exploratory AI research faced significant cuts, leading to a period known as the \"AI winter\" where funding was scarce.\n",
"\n",
"The field saw a resurgence in the early 1980s with the commercial success of expert systems, which simulated the decision-making abilities of human experts. This revival was further bolstered by the Japanese fifth generation computer project, prompting the U.S. and British governments to reinstate academic funding, with the AI market reaching over a billion dollars by 1985.\n",
"\n",
"The AI industry experienced a significant downturn starting in 1987 with the collapse of the Lisp Machine market, marking the beginning of a prolonged AI winter. During the 1980s, skepticism grew over the symbolic approaches to AI, which focused on high-level representations of cognitive processes like planning and reasoning. Researchers began exploring sub-symbolic methods, including Rodney Brooks' work on autonomous robots and the development of techniques for handling uncertain information by Judea Pearl and Lofti Zadeh. A pivotal shift occurred with the resurgence of connectionism and neural networks, notably through Geoffrey Hinton's efforts, and Yann LeCun's demonstration in 1990 that convolutional neural networks could recognize handwritten digits.\n",
"\n",
"AI's reputation started to recover in the late 1990s and early 2000s as the field adopted more formal mathematical methods and focused on solving specific problems, leading to practical applications widely used by 2000. However, concerns arose about AI's deviation from its original aim of creating fully intelligent machines, prompting the establishment of the artificial general intelligence (AGI) subfield around 2002.\n",
"\n",
"By 2012, deep learning began to dominate AI, driven by hardware advancements and access to large data sets, leading to its widespread adoption and a surge in AI interest and funding. This success, however, led to the abandonment of many alternative AI methods for specific tasks.\n",
"\n",
"Between 2015 and 2019, machine learning research publications increased by 50%. In 2016, the focus at machine learning conferences shifted significantly towards issues of fairness and the potential misuse of technology, leading to increased funding and research in these areas. The late 2010s and early 2020s saw significant advancements in artificial general intelligence (AGI), with notable developments like AlphaGo by DeepMind in 2015, which defeated the world champion in Go, and OpenAI's GPT-3 in 2020, a model capable of generating human-like text. These innovations spurred a major AI investment boom, with approximately $50 billion being invested annually in AI in the U.S. by 2022, and AI-related fields attracting 20% of new US Computer Science PhD graduates. Additionally, there were around 800,000 AI-related job openings in the U.S. in 2022.\n",
"\n",
"In the realm of philosophy, the definition and understanding of artificial intelligence have evolved. Alan Turing, in 1950, suggested shifting the focus from whether machines can think to whether they can exhibit intelligent behavior, as demonstrated by his Turing test, which assesses a machine's ability to simulate human conversation. Turing argued that since we can only observe behavior, the internal thought processes of machines are irrelevant, similar to our assumptions about human thought. Russell and Norvig supported defining intelligence based on observable behavior but criticized the Turing test for emphasizing human imitation.\n",
"\n",
"Aeronautical engineering does not aim to create machines that mimic pigeons exactly, just as artificial intelligence (AI) is not about perfectly simulating human intelligence, according to AI founder John McCarthy. McCarthy defines intelligence as the computational ability to achieve goals, while Marvin Minsky views it as solving difficult problems. The leading AI textbook describes it as the study of agents that perceive and act to maximize their goal achievement. Google's definition aligns intelligence in AI with the synthesis of information, similar to biological intelligence.\n",
"\n",
"AI research has lacked a unifying theory, with statistical machine learning dominating the field in the 2010s, often equated with AI in business contexts. This approach, primarily using neural networks, is described as sub-symbolic and narrow.\n",
"\n",
"Symbolic AI, or \"GOFAI,\" focused on simulating high-level reasoning used in tasks like puzzles and mathematics, and was proposed by Newell and Simon in the 1960s. Despite its success in structured tasks, symbolic AI struggled with tasks that humans find easy, such as learning and commonsense reasoning.\n",
"\n",
"Moravec's paradox highlights that AI finds high-level reasoning tasks easier than instinctive, sensory tasks, a view initially opposed but later supported by AI research, aligning with philosopher Hubert Dreyfus's earlier arguments. The debate continues, especially around sub-symbolic AI, which, like human intuition, can be prone to errors such as algorithmic bias and lacks transparency in decision-making processes. This has led to the development of neuro-symbolic AI, which aims to integrate symbolic and sub-symbolic approaches.\n",
"\n",
"In AI development, there has been a historical division between \"Neats,\" who believe intelligent behavior can be described with simple principles, and \"Scruffies,\" who believe it involves solving many complex problems. This debate, prominent in the 1970s and 1980s, has largely been deemed irrelevant as modern AI incorporates both approaches.\n",
"\n",
"Soft computing, which emerged in the late 1980s, focuses on techniques like genetic algorithms, fuzzy logic, and neural networks to handle imprecision and uncertainty, proving successful in many modern AI applications.\n",
"\n",
"Finally, there is a division in AI research between pursuing narrow AI, which solves specific problems, and aiming for broader goals like artificial general intelligence and superintelligence, with differing opinions on which approach might more effectively advance the field.\n",
"\n",
"General intelligence is a complex concept that is hard to define and measure, leading modern AI research to focus on specific problems and solutions. The sub-field of artificial general intelligence exclusively explores this area. In terms of machine consciousness and sentience, the philosophy of mind has yet to determine if machines can possess minds or consciousness similar to humans, focusing instead on their internal experiences rather than external behaviors. Mainstream AI research generally views these considerations as irrelevant to its objectives, which are to develop machines capable of solving problems intelligently.\n",
"\n",
"The philosophy of mind debates whether machines can truly be conscious or just appear to be so, a topic that is also popular in AI fiction. David Chalmers distinguishes between the \"hard\" problem of consciousness, which is understanding why or how brain processes feel like something, and the \"easy\" problem, which involves understanding how the brain processes information and controls behavior. The subjective experience, such as feeling a color, remains a significant challenge to explain.\n",
"\n",
"In the realm of computationalism and functionalism, the belief is that the human mind functions as an information processing system, and thinking is akin to computing. This perspective suggests that the mind-body relationship is similar to that between software and hardware, potentially offering insights into the mind-body problem.\n",
"\n",
"The concept of \"strong AI,\" as described by philosopher John Searle, suggests that a properly programmed computer could possess a mind similar to humans. However, Searle's Chinese room argument challenges this by claiming that even if a machine can mimic human behavior, it doesn't necessarily mean it has a mind. The debate extends into AI welfare and rights, focusing on the difficulty of determining AI sentience and the ethical implications if machines could feel and suffer. Discussions around AI rights have included proposals like granting \"electronic personhood\" to advanced AI systems in the EU, which would give them certain rights and responsibilities, though this has faced criticism regarding its impact on human rights and the autonomy of robots.\n",
"\n",
"The topic of AI rights is gaining traction, with advocates warning against the potential moral oversight in denying AI sentience, which could lead to exploitation and suffering akin to historical injustices like slavery. The concept of superintelligence involves an agent with intelligence far beyond human capabilities, which could potentially lead to a self-improving AI, a scenario often referred to as the singularity.\n",
"\n",
"The concept of an \"intelligence explosion\" or \"singularity\" suggests a point where technology improves exponentially, although such growth typically follows an S-shaped curve and slows upon reaching technological limits. Transhumanism, supported by figures like Hans Moravec, Kevin Warwick, and Ray Kurzweil, envisions a future where humans and machines merge into advanced cyborgs. This idea has historical roots in the thoughts of Aldous Huxley and Robert Ettinger. Edward Fredkin, building on ideas dating back to Samuel Butler in 1863, views artificial intelligence as the next stage of evolution, a concept further explored by George Dyson.\n",
"\n",
"In literature and media, the portrayal of artificial intelligence has been a theme since antiquity, with robots and AI often depicted in science fiction. The term \"robot\" was first introduced by Karel Čapek in 1921. Notable narratives include Mary Shelley's \"Frankenstein\" and films like \"2001: A Space Odyssey\" and \"The Terminator,\" which typically showcase AI as a threat. Conversely, loyal robots like Gort from \"The Day the Earth Stood Still\" are less common. Isaac Asimov's Three Laws of Robotics, introduced in his Multivac series, are frequently discussed in the context of machine ethics, though many AI researchers find them ambiguous and impractical.\n",
"\n",
"Numerous works, including Karel Čapek's R.U.R., the films A.I. Artificial Intelligence and Ex Machina, and Philip K. Dick's novel Do Androids Dream of Electric Sheep?, utilize AI to explore the essence of humanity. These works present artificial beings capable of feeling and suffering, prompting a reevaluation of human subjectivity in the context of advanced technology.\n"
]
}
],
"source": [
"print(summary_with_detail_1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that this utility also allows passing additional instructions."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:33:18.789246Z",
"start_time": "2024-04-10T05:22:57.789764Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 5/5 [00:38<00:00, 7.73s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"- AI is intelligence demonstrated by machines, especially computer systems.\n",
"- AI technology applications include search engines, recommendation systems, speech interaction, autonomous vehicles, creative tools, and strategy games.\n",
"- Alan Turing initiated substantial AI research, termed \"machine intelligence.\"\n",
"- AI became an academic discipline in 1956, experiencing cycles of optimism and \"AI winters.\"\n",
"- Post-2012, deep learning and post-2017 transformer architectures revitalized AI, leading to a boom in the early 2020s.\n",
"- AI influences societal and economic shifts towards automation and data-driven decision-making across various sectors.\n",
"- AI research goals: reasoning, knowledge representation, planning, learning, natural language processing, perception, and robotics support.\n",
"- AI techniques include search, optimization, logic, neural networks, and statistical methods.\n",
"- AI sub-problems focus on traits like reasoning, problem-solving, knowledge representation, planning, decision-making, learning, and perception.\n",
"- Early AI research mimicked human step-by-step reasoning; modern AI handles uncertain information using probability and economics.\n",
"- Knowledge representation in AI involves ontologies and knowledge bases to support intelligent querying and reasoning.\n",
"- Planning in AI involves goal-directed behavior and decision-making based on utility maximization.\n",
"- Learning in AI includes machine learning, supervised and unsupervised learning, reinforcement learning, and deep learning.\n",
"- Natural language processing (NLP) in AI has evolved from rule-based systems to modern deep learning techniques.\n",
"- AI perception involves interpreting sensor data for tasks like speech recognition and computer vision.\n",
"- General AI aims to solve diverse problems with human-like versatility.\n",
"- AI search techniques include state space search, local search, and adversarial search for game-playing.\n",
"- Logic in AI uses formal systems like propositional and predicate logic for reasoning and knowledge representation.\n",
"- Probabilistic methods in AI address decision-making and planning under uncertainty using tools like Bayesian networks and Markov decision processes.\n",
"- Classifiers in AI categorize data into predefined classes based on pattern matching and supervised learning.\n",
"\n",
"- Neural networks: Interconnected nodes, similar to brain neurons, with input, hidden layers, and output.\n",
"- Deep neural networks: At least 2 hidden layers.\n",
"- Training techniques: Commonly use backpropagation.\n",
"- Feedforward networks: Signal passes in one direction.\n",
"- Recurrent networks: Output fed back into input for short-term memory.\n",
"- Perceptrons: Single layer of neurons.\n",
"- Convolutional networks: Strengthen connections between close neurons, important in image processing.\n",
"- Deep learning: Multiple layers extract features progressively, used in various AI subfields.\n",
"- GPT (Generative Pre-trained Transformers): Large language models pre-trained on text, used in chatbots.\n",
"- Specialized AI hardware: GPUs replaced CPUs for training large-scale machine learning models.\n",
"- AI applications: Used in search engines, online ads, virtual assistants, autonomous vehicles, language translation, facial recognition.\n",
"- AI in healthcare: Increases patient care, used in medical research and drug discovery.\n",
"- AI in games: Used in chess, Jeopardy!, Go, and real-time strategy games.\n",
"- Military AI: Enhances command, control, and operations, used in coordination and threat detection.\n",
"- Generative AI: Creates realistic images and texts, used in creative arts.\n",
"- AI ethics and risks: Concerns over privacy, surveillance, copyright, misinformation, and algorithmic bias.\n",
"- Algorithmic bias: Can cause discrimination if trained on biased data, fairness in machine learning is a critical area of study.\n",
"\n",
"- AI engineers demographics: 4% black, 20% women.\n",
"- ACM FAccT 2022: Recommends limiting use of self-learning neural networks due to bias.\n",
"- AI complexity: Designers often can't explain decision-making processes.\n",
"- Misleading AI outcomes: Skin disease identifier misclassifies images with rulers as \"cancerous\"; AI misclassifies asthma patients as low risk for pneumonia.\n",
"- Right to explanation: Essential for accountability, especially in medical and legal fields.\n",
"- DARPA's XAI program (2014): Aims to make AI decisions understandable.\n",
"- Transparency solutions: SHAP, LIME, multitask learning, deconvolution, DeepDream.\n",
"- AI misuse: Authoritarian surveillance, misinformation, autonomous weapons.\n",
"- AI in warfare: 30 nations support UN ban on autonomous weapons; over 50 countries researching battlefield robots.\n",
"- Technological unemployment: AI could increase long-term unemployment; conflicting expert opinions on job risk from automation.\n",
"- Existential risks of AI: Potential to lose control over superintelligent AI; concerns from Stephen Hawking, Bill Gates, Elon Musk.\n",
"- Ethical AI development: Importance of aligning AI with human values and ethics.\n",
"- AI regulation: Increasing global legislative activity; first global AI Safety Summit in 2023.\n",
"- Historical perspective: AI research dates back to antiquity, significant developments in mid-20th century.\n",
"\n",
"- 1974: U.S. and British governments ceased AI exploratory research due to criticism and funding pressures.\n",
"- 1985: AI market value exceeded $1 billion.\n",
"- 1987: Collapse of Lisp Machine market led to a second, prolonged AI winter.\n",
"- 1990: Yann LeCun demonstrated successful use of convolutional neural networks for recognizing handwritten digits.\n",
"- Early 2000s: AI reputation restored through specific problem-solving and formal methods.\n",
"- 2012: Deep learning began dominating AI benchmarks.\n",
"- 2015-2019: Machine learning research publications increased by 50%.\n",
"- 2016: Fairness and misuse of technology became central issues in AI.\n",
"- 2022: Approximately $50 billion annually invested in AI in the U.S.; 800,000 AI-related job openings in the U.S.\n",
"- Turing test proposed by Alan Turing in 1950 to measure machine's ability to simulate human conversation.\n",
"- AI defined as the study of agents that perceive their environment and take actions to achieve goals.\n",
"- 2010s: Statistical machine learning overshadowed other AI approaches.\n",
"- Symbolic AI excelled in high-level reasoning but failed in tasks like object recognition and commonsense reasoning.\n",
"- Late 1980s: Introduction of soft computing techniques.\n",
"- Debate between pursuing narrow AI (specific problem-solving) versus artificial general intelligence (AGI).\n",
"- 2017: EU considered granting \"electronic personhood\" to advanced AI systems.\n",
"- Predictions of merging humans and machines into cyborgs, a concept known as transhumanism.\n",
"\n",
"- Focus on how AI and technology, as depicted in \"Ex Machina\" and Philip K. Dick's \"Do Androids Dream of Electric Sheep?\", alter human subjectivity.\n",
"- No specific numerical data provided.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"summary_with_additional_instructions = summarize(artificial_intelligence_wikipedia_text, detail=0.1,\n",
" additional_instructions=\"Write in point form and focus on numerical data.\")\n",
"print(summary_with_additional_instructions)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, note that the utility allows for recursive summarization, where each summary is based on the previous summaries, adding more context to the summarization process. This can be enabled by setting the `summarize_recursively` parameter to True. This is more computationally expensive, but can increase consistency and coherence of the combined summary."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:33:30.123036Z",
"start_time": "2024-04-10T05:33:18.791253Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 5/5 [00:41<00:00, 8.36s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Artificial intelligence (AI) is the simulation of human intelligence in machines, designed to perform tasks that typically require human intelligence. This includes applications like advanced search engines, recommendation systems, speech interaction, autonomous vehicles, and strategic game analysis. AI was established as a distinct academic discipline in 1956 and has experienced cycles of high expectations followed by disillusionment and decreased funding, known as \"AI winters.\" Interest in AI surged post-2012 with advancements in deep learning and again post-2017 with the development of transformer architectures, leading to significant progress in the early 2020s.\n",
"\n",
"AI's increasing integration into various sectors is influencing societal and economic shifts towards automation and data-driven decision-making, affecting areas such as employment, healthcare, and education. This raises important ethical and safety concerns, prompting discussions on regulatory policies.\n",
"\n",
"AI research encompasses various sub-fields focused on specific goals like reasoning, learning, natural language processing, perception, and robotics, using techniques from search and optimization, logic, and probabilistic methods. The field also draws from psychology, linguistics, philosophy, and neuroscience. AI aims to achieve general intelligence, enabling machines to perform any intellectual task that a human can do.\n",
"\n",
"Artificial intelligence (AI) simulates human intelligence in machines to perform tasks that typically require human intellect, such as advanced search engines, recommendation systems, and autonomous vehicles. AI research, which began as a distinct academic discipline in 1956, includes sub-fields like natural language processing and robotics, employing techniques from various scientific domains. AI has significantly advanced due to deep learning and the development of transformer architectures, notably improving applications in computer vision, speech recognition, and other areas.\n",
"\n",
"Neural networks, central to AI, mimic the human brain's neuron network to recognize patterns and learn from data, using multiple layers in deep learning to extract complex features. These networks have evolved into sophisticated models like GPT (Generative Pre-trained Transformers) for natural language processing, enhancing applications like chatbots.\n",
"\n",
"AI's integration into sectors like healthcare, military, and agriculture has led to innovations like precision medicine and smart farming but also raised ethical concerns regarding privacy, bias, and the potential for misuse. Issues like data privacy, algorithmic bias, and the generation of misinformation are critical challenges as AI becomes pervasive in society. AI's potential and risks necessitate careful management and regulation to harness benefits while mitigating adverse impacts.\n",
"\n",
"AI, or artificial intelligence, simulates human intelligence in machines to perform complex tasks, such as operating autonomous vehicles and analyzing strategic games. Since its establishment as an academic discipline in 1956, AI has seen periods of high expectations and subsequent disillusionment, known as \"AI winters.\" Recent advancements in deep learning and transformer architectures have significantly advanced AI capabilities in areas like computer vision and speech recognition.\n",
"\n",
"AI's integration into various sectors, including healthcare and agriculture, has led to innovations like precision medicine and smart farming but has also raised ethical concerns about privacy, bias, and misuse. The complexity of AI systems, particularly deep neural networks, often makes it difficult for developers to explain their decision-making processes, leading to transparency issues. This lack of transparency can result in unintended consequences, such as misclassifications in medical diagnostics.\n",
"\n",
"The potential for AI to be weaponized by bad actors, such as authoritarian governments or terrorists, poses significant risks. AI's reliance on large tech companies for computational power and the potential for technological unemployment are also critical issues. Despite these challenges, AI also offers opportunities for enhancing human well-being if ethical considerations are integrated throughout the design and implementation stages.\n",
"\n",
"Regulation of AI is emerging globally, with various countries adopting AI strategies to ensure the technology aligns with human rights and democratic values. The first global AI Safety Summit in 2023 emphasized the need for international cooperation to manage AI's risks and challenges effectively.\n",
"\n",
"In the 1970s, AI research faced significant setbacks due to criticism from influential figures like Sir James Lighthill and funding cuts from the U.S. and British governments, leading to the first \"AI winter.\" The field saw a resurgence in the 1980s with the success of expert systems and renewed government funding, but suffered another setback with the collapse of the Lisp Machine market in 1987, initiating a second AI winter. During this period, researchers began exploring \"sub-symbolic\" approaches, including neural networks, which gained prominence in the 1990s with successful applications like Yann LeCuns convolutional neural networks for digit recognition.\n",
"\n",
"By the early 21st century, AI was revitalized by focusing on narrow, specific problems, leading to practical applications and integration into various sectors. The field of artificial general intelligence (AGI) emerged, aiming to create versatile, fully intelligent machines. The 2010s saw deep learning dominate AI research, driven by hardware improvements and large datasets, which significantly increased interest and investment in AI.\n",
"\n",
"Philosophically, AI has been defined in various ways, focusing on external behavior rather than internal experience, aligning with Alan Turing's proposal of the Turing test. The field has debated the merits of symbolic vs. sub-symbolic AI, with ongoing discussions about machine consciousness and the ethical implications of potentially sentient AI. The concept of AI rights and welfare has also emerged, reflecting concerns about the moral status of advanced AI systems.\n",
"\n",
"Overall, AI research has oscillated between periods of intense optimism and profound setbacks, with current trends heavily favoring practical applications through narrow AI, while continuing to explore the broader implications and potential of general and superintelligent AI systems.\n",
"\n",
"Artificial Intelligence (AI) and its portrayal in media, such as the film \"Ex Machina\" and Philip K. Dick's novel \"Do Androids Dream of Electric Sheep?\", explore how technology, particularly AI, can alter our understanding of human subjectivity.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"recursive_summary = summarize(artificial_intelligence_wikipedia_text, detail=0.1, summarize_recursively=True)\n",
"print(recursive_summary)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.9"
}
},
"nbformat": 4,
"nbformat_minor": 1
}

@ -1,751 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Summarization with Controllable Detail"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The objective of this notebook is to demonstrate how to summarize large documents with a controllable level of detail.\n",
" \n",
"If you give a GPT model the task of summarizing a long document (e.g. 10k or more tokens), you'll tend to get back a relatively short summary that isn't proportional to the length of the document. For instance, a summary of a 20k token document will not be twice as long as a summary of a 10k token document. One way we can fix this is to split our document up into pieces, and produce a summary piecewise. After many queries to a GPT model, the full summary can be reconstructed. By controlling the number of text chunks and their sizes, we can ultimately control the level of detail in the output."
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:19:35.305706Z",
"start_time": "2024-04-10T05:19:35.303535Z"
}
},
"outputs": [],
"source": [
"import os\n",
"from typing import List, Tuple, Optional\n",
"\n",
"from openai import OpenAI\n",
"import tiktoken\n",
"from tqdm import tqdm"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:19:35.325026Z",
"start_time": "2024-04-10T05:19:35.322414Z"
}
},
"outputs": [],
"source": [
"# open dataset containing part of the text of the Wikipedia page for the United States\n",
"with open(\"data/united_states_wikipedia.txt\", \"r\") as file:\n",
" united_states_wikipedia_text = file.read()"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:19:35.364483Z",
"start_time": "2024-04-10T05:19:35.348213Z"
}
},
"outputs": [
{
"data": {
"text/plain": "15781"
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# load encoding and check the length of dataset\n",
"encoding = tiktoken.encoding_for_model('gpt-3.5-turbo')\n",
"len(encoding.encode(united_states_wikipedia_text))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll define a simple utility to wrap calls to the OpenAI API."
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:19:35.375619Z",
"start_time": "2024-04-10T05:19:35.365818Z"
}
},
"outputs": [],
"source": [
"client = OpenAI(api_key=os.getenv(\"OPENAI_API_KEY\"))\n",
"\n",
"def get_chat_completion(messages, model='gpt-3.5-turbo'):\n",
" response = client.chat.completions.create(\n",
" model=model,\n",
" messages=messages,\n",
" temperature=0,\n",
" )\n",
" return response.choices[0].message.content"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next we'll define some utilities to chunk a large document into smaller pieces."
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:19:35.382790Z",
"start_time": "2024-04-10T05:19:35.376721Z"
}
},
"outputs": [],
"source": [
"def tokenize(text: str) -> List[str]:\n",
" encoding = tiktoken.encoding_for_model('gpt-3.5-turbo')\n",
" return encoding.encode(text)\n",
"\n",
"\n",
"# This function chunks a text into smaller pieces based on a maximum token count and a delimiter.\n",
"def chunk_on_delimiter(input_string: str,\n",
" max_tokens: int, delimiter: str) -> List[str]:\n",
" chunks = input_string.split(delimiter)\n",
" combined_chunks, _, dropped_chunk_count = combine_chunks_with_no_minimum(\n",
" chunks, max_tokens, chunk_delimiter=delimiter, add_ellipsis_for_overflow=True\n",
" )\n",
" if dropped_chunk_count > 0:\n",
" print(f\"warning: {dropped_chunk_count} chunks were dropped due to overflow\")\n",
" combined_chunks = [f\"{chunk}{delimiter}\" for chunk in combined_chunks]\n",
" return combined_chunks\n",
"\n",
"\n",
"# This function combines text chunks into larger blocks without exceeding a specified token count. It returns the combined text blocks, their original indices, and the count of chunks dropped due to overflow.\n",
"def combine_chunks_with_no_minimum(\n",
" chunks: List[str],\n",
" max_tokens: int,\n",
" chunk_delimiter=\"\\n\\n\",\n",
" header: Optional[str] = None,\n",
" add_ellipsis_for_overflow=False,\n",
") -> Tuple[List[str], List[int]]:\n",
" dropped_chunk_count = 0\n",
" output = [] # list to hold the final combined chunks\n",
" output_indices = [] # list to hold the indices of the final combined chunks\n",
" candidate = (\n",
" [] if header is None else [header]\n",
" ) # list to hold the current combined chunk candidate\n",
" candidate_indices = []\n",
" for chunk_i, chunk in enumerate(chunks):\n",
" chunk_with_header = [chunk] if header is None else [header, chunk]\n",
" if len(tokenize(chunk_delimiter.join(chunk_with_header))) > max_tokens:\n",
" print(f\"warning: chunk overflow\")\n",
" if (\n",
" add_ellipsis_for_overflow\n",
" and len(tokenize(chunk_delimiter.join(candidate + [\"...\"]))) <= max_tokens\n",
" ):\n",
" candidate.append(\"...\")\n",
" dropped_chunk_count += 1\n",
" continue # this case would break downstream assumptions\n",
" # estimate token count with the current chunk added\n",
" extended_candidate_token_count = len(tokenize(chunk_delimiter.join(candidate + [chunk])))\n",
" # If the token count exceeds max_tokens, add the current candidate to output and start a new candidate\n",
" if extended_candidate_token_count > max_tokens:\n",
" output.append(chunk_delimiter.join(candidate))\n",
" output_indices.append(candidate_indices)\n",
" candidate = chunk_with_header # re-initialize candidate\n",
" candidate_indices = [chunk_i]\n",
" # otherwise keep extending the candidate\n",
" else:\n",
" candidate.append(chunk)\n",
" candidate_indices.append(chunk_i)\n",
" # add the remaining candidate to output if it's not empty\n",
" if (header is not None and len(candidate) > 1) or (header is None and len(candidate) > 0):\n",
" output.append(chunk_delimiter.join(candidate))\n",
" output_indices.append(candidate_indices)\n",
" return output, output_indices, dropped_chunk_count"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can define a utility to summarize text with a controllable level of detail (note the detail parameter).\n",
"\n",
"The function first determines the number of chunks by interpolating between a minimum and a maximum chunk count based on a controllable detail parameter. It then splits the text into chunks and summarizes each chunk."
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:19:35.390876Z",
"start_time": "2024-04-10T05:19:35.385076Z"
}
},
"outputs": [],
"source": [
"def summarize(text: str,\n",
" detail: float = 0,\n",
" model: str = 'gpt-3.5-turbo',\n",
" additional_instructions: Optional[str] = None,\n",
" minimum_chunk_size: Optional[int] = 500,\n",
" chunk_delimiter: str = \".\",\n",
" summarize_recursively=False,\n",
" verbose=False):\n",
" \"\"\"\n",
" Summarizes a given text by splitting it into chunks, each of which is summarized individually. \n",
" The level of detail in the summary can be adjusted, and the process can optionally be made recursive.\n",
"\n",
" Parameters:\n",
" - text (str): The text to be summarized.\n",
" - detail (float, optional): A value between 0 and 1 indicating the desired level of detail in the summary.\n",
" 0 leads to a higher level summary, and 1 results in a more detailed summary. Defaults to 0.\n",
" - model (str, optional): The model to use for generating summaries. Defaults to 'gpt-3.5-turbo'.\n",
" - additional_instructions (Optional[str], optional): Additional instructions to provide to the model for customizing summaries.\n",
" - minimum_chunk_size (Optional[int], optional): The minimum size for text chunks. Defaults to 500.\n",
" - chunk_delimiter (str, optional): The delimiter used to split the text into chunks. Defaults to \".\".\n",
" - summarize_recursively (bool, optional): If True, summaries are generated recursively, using previous summaries for context.\n",
" - verbose (bool, optional): If True, prints detailed information about the chunking process.\n",
"\n",
" Returns:\n",
" - str: The final compiled summary of the text.\n",
"\n",
" The function first determines the number of chunks by interpolating between a minimum and a maximum chunk count based on the `detail` parameter. \n",
" It then splits the text into chunks and summarizes each chunk. If `summarize_recursively` is True, each summary is based on the previous summaries, \n",
" adding more context to the summarization process. The function returns a compiled summary of all chunks.\n",
" \"\"\"\n",
"\n",
" # check detail is set correctly\n",
" assert 0 <= detail <= 1\n",
"\n",
" # interpolate the number of chunks based to get specified level of detail\n",
" max_chunks = len(chunk_on_delimiter(text, minimum_chunk_size, chunk_delimiter))\n",
" min_chunks = 1\n",
" num_chunks = int(min_chunks + detail * (max_chunks - min_chunks))\n",
"\n",
" # adjust chunk_size based on interpolated number of chunks\n",
" document_length = len(tokenize(text))\n",
" chunk_size = max(minimum_chunk_size, document_length // num_chunks)\n",
" text_chunks = chunk_on_delimiter(text, chunk_size, chunk_delimiter)\n",
" if verbose:\n",
" print(f\"Splitting the text into {len(text_chunks)} chunks to be summarized.\")\n",
" print(f\"Chunk lengths are {[len(tokenize(x)) for x in text_chunks]}\")\n",
"\n",
" # set system message\n",
" system_message_content = \"Summarize the following text.\"\n",
" if additional_instructions is not None:\n",
" system_message_content += f\"\\n\\n{additional_instructions}\"\n",
"\n",
" accumulated_summaries = []\n",
" for chunk in tqdm(text_chunks):\n",
" if summarize_recursively and accumulated_summaries:\n",
" # Creating a structured prompt for recursive summarization\n",
" accumulated_summaries_string = '\\n\\n'.join(accumulated_summaries)\n",
" user_message_content = f\"Previous summaries:\\n\\n{accumulated_summaries_string}\\n\\nText to summarize next:\\n\\n{chunk}\"\n",
" else:\n",
" # Directly passing the chunk for summarization without recursive context\n",
" user_message_content = chunk\n",
"\n",
" # Constructing messages based on whether recursive summarization is applied\n",
" messages = [\n",
" {\"role\": \"system\", \"content\": system_message_content},\n",
" {\"role\": \"user\", \"content\": user_message_content}\n",
" ]\n",
"\n",
" # Assuming this function gets the completion and works as expected\n",
" response = get_chat_completion(messages, model=model)\n",
" accumulated_summaries.append(response)\n",
"\n",
" # Compile final summary from partial summaries\n",
" final_summary = '\\n\\n'.join(accumulated_summaries)\n",
"\n",
" return final_summary"
]
},
{
"cell_type": "markdown",
"source": [
"Now we can use this utility to produce summaries with varying levels of detail. By increasing 'detail' from 0 to 1 we get progressively longer summaries of the underlying document. A higher value for the detail parameter results in a more detailed summary because the utility first splits the document into a greater number of chunks. Each chunk is then summarized, and the final summary is a concatenation of all the chunk summaries."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:19:47.541096Z",
"start_time": "2024-04-10T05:19:35.391911Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Splitting the text into 1 chunks to be summarized.\n",
"Chunk lengths are [15781]\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 1/1 [00:07<00:00, 7.31s/it]\n"
]
}
],
"source": [
"summary_with_detail_0 = summarize(united_states_wikipedia_text, detail=0, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:19:58.724212Z",
"start_time": "2024-04-10T05:19:47.542129Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Splitting the text into 5 chunks to be summarized.\n",
"Chunk lengths are [3945, 3941, 3943, 3915, 37]\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 5/5 [00:09<00:00, 1.97s/it]\n"
]
}
],
"source": [
"summary_with_detail_pt1 = summarize(united_states_wikipedia_text, detail=0.1, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:20:16.216023Z",
"start_time": "2024-04-10T05:19:58.725014Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Splitting the text into 8 chunks to be summarized.\n",
"Chunk lengths are [2214, 2253, 2249, 2255, 2254, 2255, 2221, 84]\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 8/8 [00:16<00:00, 2.08s/it]\n"
]
}
],
"source": [
"summary_with_detail_pt2 = summarize(united_states_wikipedia_text, detail=0.2, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:20:46.941240Z",
"start_time": "2024-04-10T05:20:16.225524Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Splitting the text into 14 chunks to be summarized.\n",
"Chunk lengths are [1198, 1209, 1210, 1209, 1212, 1192, 1176, 1205, 1212, 1201, 1210, 1210, 1192, 154]\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 14/14 [00:30<00:00, 2.15s/it]\n"
]
}
],
"source": [
"summary_with_detail_pt4 = summarize(united_states_wikipedia_text, detail=0.4, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:21:44.913140Z",
"start_time": "2024-04-10T05:20:46.953285Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Splitting the text into 27 chunks to be summarized.\n",
"Chunk lengths are [602, 596, 601, 601, 604, 598, 572, 594, 592, 592, 604, 593, 578, 582, 597, 600, 596, 555, 582, 601, 582, 587, 581, 595, 598, 568, 445]\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 27/27 [00:57<00:00, 2.13s/it]\n"
]
}
],
"source": [
"summary_with_detail_pt8 = summarize(united_states_wikipedia_text, detail=0.8, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:22:57.760218Z",
"start_time": "2024-04-10T05:21:44.921275Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Splitting the text into 33 chunks to be summarized.\n",
"Chunk lengths are [490, 443, 475, 490, 501, 470, 472, 487, 479, 477, 447, 442, 490, 468, 488, 477, 493, 493, 472, 491, 490, 501, 493, 468, 500, 500, 474, 460, 489, 462, 490, 482, 445]\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 33/33 [01:12<00:00, 2.20s/it]\n"
]
}
],
"source": [
"summary_with_detail_1 = summarize(united_states_wikipedia_text, detail=1.0, verbose=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The original document is ~15k tokens long. Notice how large the gap is between the length of 'summary_pt0' and summary_pt10'"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:22:57.782389Z",
"start_time": "2024-04-10T05:22:57.763041Z"
}
},
"outputs": [
{
"data": {
"text/plain": "[307, 494, 839, 1662, 3552, 4128]"
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# lengths of summaries\n",
"[len(tokenize(x)) for x in\n",
" [summary_with_detail_0, summary_with_detail_pt1, summary_with_detail_pt2, summary_with_detail_pt4,\n",
" summary_with_detail_pt8, summary_with_detail_1]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's inspect the summaries to see how the level of detail changes with the `detail` parameter set to 0, 0.1, 0.2, 0.4, 0.8, and 1 respectively."
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:22:57.785881Z",
"start_time": "2024-04-10T05:22:57.783455Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The United States of America is a diverse country located in North America, with a population exceeding 334 million. It is a federation of 50 states, a federal capital district, and various territories. The country has a rich history, from the migration of Paleo-Indians over 12,000 years ago to the British colonization and the American Revolution. The U.S. has gone through significant events like the Civil War, World War II, and the Cold War, emerging as a superpower after the collapse of the Soviet Union.\n",
"\n",
"The U.S. government is a presidential constitutional republic with three separate branches: legislative, executive, and judicial. The country has a strong emphasis on liberty, equality under the law, individualism, and limited government. Economically, the U.S. has the largest nominal GDP in the world and is a leader in economic competitiveness, innovation, and human rights. The U.S. is also a founding member of various international organizations like the UN, World Bank, and NATO.\n",
"\n",
"The U.S. has a rich cultural landscape, with influences from various ethnic groups and traditions. American literature, music, cinema, and theater have made significant contributions to global culture. The country is known for its diverse cuisine, with dishes influenced by various immigrant groups. The U.S. also has a strong presence in the fashion industry, with New York City being a global fashion capital.\n",
"\n",
"Overall, the United States is a country with a rich history, diverse population, strong economy, and significant cultural influence on the world stage.\n"
]
}
],
"source": [
"print(summary_with_detail_0)"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:22:57.788969Z",
"start_time": "2024-04-10T05:22:57.786691Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The United States of America is a country located in North America, consisting of 50 states, a federal capital district, and various territories. It has a rich history, from the arrival of Paleo-Indians over 12,000 years ago to British colonization and the American Revolution. The U.S. expanded across North America, faced sectional divisions over slavery, and emerged as a global superpower after World War II. The country has a presidential constitutional republic with three branches of government and a strong emphasis on liberty, equality, and limited government. Economically, the U.S. is a major player with the largest nominal GDP in the world and significant influence in various international organizations. The country's name, history, and expansion are detailed, including key events like the Declaration of Independence, the Revolutionary War, and the Louisiana Purchase.\n",
"\n",
"The text discusses key events in American history, including the Missouri Compromise, Indian removal policies, the Civil War, Reconstruction era, post-Civil War developments, rise as a superpower, Cold War era, and contemporary history. It highlights significant events such as the Trail of Tears, California Gold Rush, Reconstruction Amendments, immigration waves, World Wars, Cold War tensions, civil rights movement, economic developments, technological advancements, and major conflicts like the Gulf War and War on Terror. The text also mentions social changes, economic challenges like the Great Depression and Great Recession, and political developments leading to increased polarization in the 2010s.\n",
"\n",
"The text discusses the geography, climate, biodiversity, conservation efforts, government, politics, political parties, subdivisions, and foreign relations of the United States. It highlights the country's physical features, climate diversity, environmental issues, governmental structure, political parties, state subdivisions, and diplomatic relations. The text also mentions the historical context of the country's political system, including the development of political parties and the structure of the federal government.\n",
"\n",
"The text discusses the United States' international relations, military capabilities, law enforcement, crime rates, economy, and science and technology advancements. It highlights the country's membership in various international organizations, its military strength, economic dominance, income inequality, and technological innovations. The United States has strong diplomatic ties with several countries, a significant military presence globally, a large economy with high GDP, and is a leader in technological advancements and scientific research.\n",
"\n",
"The text discusses various aspects of the United States, including its scientific and innovation rankings, energy consumption, transportation infrastructure, demographics, language diversity, immigration, religion, urbanization, and healthcare. It highlights the country's achievements in scientific research, energy usage, transportation systems, population demographics, language diversity, immigration statistics, religious affiliations, urbanization trends, and healthcare facilities.\n",
"\n",
"The text discusses various aspects of life in the United States, including changes in life expectancy, the healthcare system, education, culture, society, literature, and mass media. It highlights the impact of the COVID-19 pandemic on life expectancy, the disparities in healthcare outcomes, the structure of the education system, the cultural diversity and values in American society, the development of American literature, and the media landscape in the country. The text also touches on issues such as healthcare coverage, education spending, student loan debt, and the protection of free speech in the U.S.\n",
"\n",
"The text discusses various aspects of American culture, including alternative newspapers in major cities, popular websites, the video game market, theater, visual arts, music, fashion, cinema, and cuisine. It highlights the influence of American culture globally, such as in music, fashion, cinema, and cuisine. The text also mentions significant figures and events in American cultural history, such as the Tony Awards, Broadway theater, the Hudson River School in visual arts, and the Golden Age of Hollywood in cinema. Additionally, it touches on the development of American cuisine, including traditional dishes and the impact of immigrant groups on American food culture.\n",
"\n",
"The American fast-food industry, known for pioneering the drive-through format in the 1940s, is considered a symbol of U.S. marketing dominance. Major American companies like McDonald's, Burger King, Pizza Hut, Kentucky Fried Chicken, and Domino's Pizza have a significant global presence with numerous outlets worldwide.\n"
]
}
],
"source": [
"print(summary_with_detail_pt2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that this utility also allows passing additional instructions."
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-10T05:33:18.789246Z",
"start_time": "2024-04-10T05:22:57.789764Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 5/5 [10:19<00:00, 123.94s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"- The USA is a federation of 50 states, a federal capital district, and 326 Indian reservations.\n",
"- It has sovereignty over five major unincorporated island territories and various uninhabited islands.\n",
"- The country has a population exceeding 334 million.\n",
"- The USA has the world's third-largest land area and the largest maritime exclusive economic zone.\n",
"- The USA has had the largest nominal GDP in the world since 1890.\n",
"- In 2023, the USA accounted for over 25% of the global economy based on nominal GDP and 15% based on PPP.\n",
"- The USA has the highest median income per capita of any non-microstate.\n",
"- The USA ranks high in economic competitiveness, productivity, innovation, human rights, and higher education.\n",
"- The USA is a founding member of various international organizations such as the World Bank, IMF, NATO, and the UN Security Council.\n",
"\n",
"- The Great Society plan of President Lyndon Johnson's administration in the early 1960s resulted in groundbreaking laws and policies to counteract institutional racism.\n",
"- By 1985, the majority of women aged 16 and older in the U.S. were employed.\n",
"- In the 1990s, the U.S. saw the longest economic expansion in its history, with advances in technology such as the World Wide Web and the first gene therapy trial.\n",
"- The U.S. spent $877 billion on its military in 2022, the largest amount globally, making up 39% of global military spending and 3.5% of the country's GDP.\n",
"- The U.S. has the third-largest combined armed forces in the world, with about 800 bases and facilities abroad and deployments in 25 foreign countries.\n",
"- As of January 2023, the U.S. had the sixth highest per-capita incarceration rate globally, with almost 2 million people incarcerated.\n",
"- The U.S. had a nominal GDP of $27 trillion in 2023, the largest in the world, constituting over 25% of the global economy.\n",
"\n",
"- Real compounded annual GDP growth in the U.S. was 3.3%, compared to 2.3% for the rest of the Group of Seven.\n",
"- The U.S. ranks first in the world by disposable income per capita and nominal GDP, second by GDP (PPP) after China, and ninth by GDP (PPP) per capita.\n",
"- The U.S. has 136 of the world's 500 largest companies headquartered there.\n",
"- The U.S. dollar is the most used currency in international transactions and is the world's foremost reserve currency.\n",
"- The U.S. ranked second in the Global Competitiveness Report in 2019, after Singapore.\n",
"- The U.S. is the second-largest manufacturing country after China as of 2021.\n",
"- Americans have the highest average household and employee income among OECD member states.\n",
"- The U.S. has 735 billionaires and nearly 22 million millionaires as of 2023.\n",
"- In 2022, there were about 582,500 sheltered and unsheltered homeless persons in the U.S.\n",
"- The U.S. receives approximately 81% of its energy from fossil fuels.\n",
"- The U.S. has the highest vehicle ownership per capita in the world, with 910 vehicles per 1000 people.\n",
"- The U.S. has the third-highest number of patent applications and ranked 3rd in the Global Innovation Index in 2023.\n",
"- The U.S. has the third-highest number of published scientific papers in 2022.\n",
"- The U.S. has a diverse population with 37 ancestry groups having more than one million members.\n",
"- The U.S. has the largest Christian population in the world.\n",
"- The average American life expectancy at birth was 77.5 years in 2022.\n",
"- The U.S. spends more on education per student than any other country in the world.\n",
"\n",
"- The United States has the most Nobel Prize winners in history, with 411 awards won.\n",
"- American higher education is dominated by state university systems, with private universities enrolling about 20% of students.\n",
"- The U.S. spends more per student on higher education than the OECD average and all other nations in combined public and private spending.\n",
"- Student loan debt in the U.S. has increased by 102% in the last decade, exceeding 1.7 trillion dollars as of 2022.\n",
"- Americans donated 1.44% of total GDP to charity, the highest rate in the world.\n",
"- The U.S. has the world's largest music market with a total retail value of $15.9 billion in 2022.\n",
"- The United States restaurant industry was projected at $899 billion in sales for 2020, employing over 15 million people.\n",
"- The U.S. is home to over 220 Michelin Star rated restaurants, with 70 in New York City alone.\n",
"- California alone has 444 publishers, developers, and hardware companies in the video game market.\n",
"- The U.S. fast-food industry pioneered the drive-through format in the 1940s.\n",
"\n",
"- American companies mentioned: McDonald's, Burger King, Pizza Hut, Kentucky Fried Chicken, Domino's Pizza\n",
"- These companies have numerous outlets around the world\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"summary_with_additional_instructions = summarize(united_states_wikipedia_text, detail=0.1,\n",
" additional_instructions=\"Write in point form and focus on numerical data.\")\n",
"print(summary_with_additional_instructions)"
]
},
{
"cell_type": "markdown",
"source": [
"Finally, note that the utility allows for recursive summarization, where each summary is based on the previous summaries, adding more context to the summarization process. This can be enabled by setting the `summarize_recursively` parameter to True. This is more computationally expensive, but can increase consistency and coherence of the combined summary."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 57,
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 5/5 [00:09<00:00, 1.99s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"The text provides an overview of the United States, including its geography, history, government structure, economic status, and global influence. It covers the country's origins, colonization, independence, expansion, Civil War, post-war era, rise as a superpower, and involvement in the Cold War. The U.S. is described as a presidential constitutional republic with a strong emphasis on individual rights, liberty, and limited government. The text also highlights the country's economic prowess, cultural influence, and membership in various international organizations.\n",
"\n",
"The text discusses the United States from the early 1960s to the present day, highlighting significant events such as President Lyndon Johnson's Great Society plan, the counterculture movement, societal changes, the end of the Cold War, the economic expansion of the 1990s, the September 11 attacks, the Great Recession, and political polarization. It also covers the country's geography, climate, biodiversity, conservation efforts, government structure, political parties, foreign relations, military strength, law enforcement, crime rates, and the economy, including its status as the world's largest economy.\n",
"\n",
"The text discusses the economic status of the United States, highlighting its GDP growth, ranking in various economic indicators, dominance in global trade, and technological advancements. It also covers income distribution, poverty rates, and social issues like homelessness and food insecurity. The text further delves into the country's energy consumption, transportation infrastructure, demographics, immigration trends, religious diversity, urbanization, healthcare system, life expectancy, and education system.\n",
"\n",
"The text discusses various aspects of American culture and society, including education, literature, mass media, theater, visual arts, music, fashion, cinema, and cuisine. It highlights the country's achievements in education, with a focus on higher education and federal financial aid for students. The text also delves into American cultural values, ethnic diversity, and the country's strong protections of free speech. Additionally, it covers the development of American literature, mass media landscape, theater scene, visual arts movements, music genres, fashion industry, cinema history, and culinary traditions. The influence of American culture globally, as well as the economic impact of industries like music and restaurants, is also discussed.\n",
"\n",
"American fast-food chains like McDonald's, Burger King, Pizza Hut, Kentucky Fried Chicken, and Domino's Pizza have a widespread global presence with numerous outlets worldwide.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"recursive_summary = summarize(united_states_wikipedia_text, detail=0.1, summarize_recursively=True,\n",
" additional_instructions=\"Don't overuse repetitive phrases to introduce each section\")\n",
"print(recursive_summary)"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2024-04-10T05:33:30.123036Z",
"start_time": "2024-04-10T05:33:18.791253Z"
}
}
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.9"
}
},
"nbformat": 4,
"nbformat_minor": 1
}

@ -0,0 +1,253 @@
Artificial intelligence (AI), in its broadest sense, is intelligence exhibited by machines, particularly computer systems. It is a field of research in computer science that develops and studies methods and software which enable machines to perceive their environment and uses learning and intelligence to take actions that maximize their chances of achieving defined goals.[1] Such machines may be called AIs.
AI technology is widely used throughout industry, government, and science. Some high-profile applications include advanced web search engines (e.g., Google Search); recommendation systems (used by YouTube, Amazon, and Netflix); interacting via human speech (e.g., Google Assistant, Siri, and Alexa); autonomous vehicles (e.g., Waymo); generative and creative tools (e.g., ChatGPT and AI art); and superhuman play and analysis in strategy games (e.g., chess and Go).[2] However, many AI applications are not perceived as AI: "A lot of cutting edge AI has filtered into general applications, often without being called AI because once something becomes useful enough and common enough it's not labeled AI anymore."[3][4]
Alan Turing was the first person to conduct substantial research in the field that he called machine intelligence.[5] Artificial intelligence was founded as an academic discipline in 1956.[6] The field went through multiple cycles of optimism,[7][8] followed by periods of disappointment and loss of funding, known as AI winter.[9][10] Funding and interest vastly increased after 2012 when deep learning surpassed all previous AI techniques,[11] and after 2017 with the transformer architecture.[12] This led to the AI boom of the early 2020s, with companies, universities, and laboratories overwhelmingly based in the United States pioneering significant advances in artificial intelligence.[13]
The growing use of artificial intelligence in the 21st century is influencing a societal and economic shift towards increased automation, data-driven decision-making, and the integration of AI systems into various economic sectors and areas of life, impacting job markets, healthcare, government, industry, and education. This raises questions about the long-term effects, ethical implications, and risks of AI, prompting discussions about regulatory policies to ensure the safety and benefits of the technology.
The various sub-fields of AI research are centered around particular goals and the use of particular tools. The traditional goals of AI research include reasoning, knowledge representation, planning, learning, natural language processing, perception, and support for robotics.[a] General intelligence—the ability to complete any task performable by a human on an at least equal level—is among the field's long-term goals.[14]
To reach these goals, AI researchers have adapted and integrated a wide range of techniques, including search and mathematical optimization, formal logic, artificial neural networks, and methods based on statistics, operations research, and economics.[b] AI also draws upon psychology, linguistics, philosophy, neuroscience, and other fields.[15]
Goals
The general problem of simulating (or creating) intelligence has been broken into sub-problems. These consist of particular traits or capabilities that researchers expect an intelligent system to display. The traits described below have received the most attention and cover the scope of AI research.[a]
Reasoning and problem solving
Early researchers developed algorithms that imitated step-by-step reasoning that humans use when they solve puzzles or make logical deductions.[16] By the late 1980s and 1990s, methods were developed for dealing with uncertain or incomplete information, employing concepts from probability and economics.[17]
Many of these algorithms are insufficient for solving large reasoning problems because they experience a "combinatorial explosion": they become exponentially slower as the problems grow larger.[18] Even humans rarely use the step-by-step deduction that early AI research could model. They solve most of their problems using fast, intuitive judgments.[19] Accurate and efficient reasoning is an unsolved problem.
Knowledge representation
An ontology represents knowledge as a set of concepts within a domain and the relationships between those concepts.
Knowledge representation and knowledge engineering[20] allow AI programs to answer questions intelligently and make deductions about real-world facts. Formal knowledge representations are used in content-based indexing and retrieval,[21] scene interpretation,[22] clinical decision support,[23] knowledge discovery (mining "interesting" and actionable inferences from large databases),[24] and other areas.[25]
A knowledge base is a body of knowledge represented in a form that can be used by a program. An ontology is the set of objects, relations, concepts, and properties used by a particular domain of knowledge.[26] Knowledge bases need to represent things such as: objects, properties, categories and relations between objects;[27] situations, events, states and time;[28] causes and effects;[29] knowledge about knowledge (what we know about what other people know);[30] default reasoning (things that humans assume are true until they are told differently and will remain true even when other facts are changing);[31] and many other aspects and domains of knowledge.
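As a minimal sketch of these ideas, a toy knowledge base can be written directly as Python data structures; the objects, categories, and relations below are invented for illustration:

def is_a(kb, thing, category):
    # Follow the "is_a" chain upward until the category is found (or the chain ends).
    while thing in kb["is_a"]:
        thing = kb["is_a"][thing]
        if thing == category:
            return True
    return False

kb = {
    "is_a": {"sparrow": "bird", "bird": "animal"},  # category relations
    "can_fly": {"bird": True, "penguin": False},    # default properties
}
print(is_a(kb, "sparrow", "animal"))  # -> True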
Among the most difficult problems in knowledge representation are: the breadth of commonsense knowledge (the set of atomic facts that the average person knows is enormous);[32] and the sub-symbolic form of most commonsense knowledge (much of what people know is not represented as "facts" or "statements" that they could express verbally).[19] There is also the difficulty of knowledge acquisition, the problem of obtaining knowledge for AI applications.[c]
Planning and decision making
An "agent" is anything that perceives and takes actions in the world. A rational agent has goals or preferences and takes actions to make them happen.[d][35] In automated planning, the agent has a specific goal.[36] In automated decision making, the agent has preferences—there are some situations it would prefer to be in, and some situations it is trying to avoid. The decision making agent assigns a number to each situation (called the "utility") that measures how much the agent prefers it. For each possible action, it can calculate the "expected utility": the utility of all possible outcomes of the action, weighted by the probability that the outcome will occur. It can then choose the action with the maximum expected utility.[37]
In classical planning, the agent knows exactly what the effect of any action will be.[38] In most real-world problems, however, the agent may not be certain about the situation they are in (it is "unknown" or "unobservable") and it may not know for certain what will happen after each possible action (it is not "deterministic"). It must choose an action by making a probabilistic guess and then reassess the situation to see if the action worked.[39]
In some problems, the agent's preferences may be uncertain, especially if there are other agents or humans involved. These can be learned (e.g., with inverse reinforcement learning) or the agent can seek information to improve its preferences.[40] Information value theory can be used to weigh the value of exploratory or experimental actions.[41] The space of possible future actions and situations is typically intractably large, so the agents must take actions and evaluate situations while being uncertain what the outcome will be.
A Markov decision process has a transition model that describes the probability that a particular action will change the state in a particular way, and a reward function that supplies the utility of each state and the cost of each action. A policy associates a decision with each possible state. The policy could be calculated (e.g., by iteration), be heuristic, or it can be learned.[42]
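A minimal sketch of such a process, with an invented two-state world and the state values computed by iteration as described above:

# transition[state][action] -> list of (probability, next_state) pairs
transition = {
    "s0": {"go": [(0.9, "s1"), (0.1, "s0")], "stay": [(1.0, "s0")]},
    "s1": {"stay": [(1.0, "s1")]},
}
reward = {"s0": 0.0, "s1": 1.0}  # utility of each state
gamma = 0.9                      # discount on future reward

# Value iteration: repeatedly back up V(s) = R(s) + gamma * max_a E[V(s')]
V = {s: 0.0 for s in transition}
for _ in range(100):
    V = {s: reward[s] + gamma * max(
            sum(p * V[nxt] for p, nxt in outcomes)
            for outcomes in transition[s].values())
         for s in transition}
print(V)  # in s0, "go" is the better action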
Game theory describes rational behavior of multiple interacting agents, and is used in AI programs that make decisions that involve other agents.[43]
Learning
Machine learning is the study of programs that can improve their performance on a given task automatically.[44] It has been a part of AI from the beginning.[e]
There are several kinds of machine learning. Unsupervised learning analyzes a stream of data and finds patterns and makes predictions without any other guidance.[47] Supervised learning requires a human to label the input data first, and comes in two main varieties: classification (where the program must learn to predict what category the input belongs in) and regression (where the program must deduce a numeric function based on numeric input).[48]
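As a minimal sketch of supervised classification, a one-nearest-neighbour classifier predicts the label of the closest labelled example; the training points below are invented:

def nearest_neighbor(train, query):
    # train: list of ((x, y), label) pairs; return the label of the closest point
    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return min(train, key=lambda example: dist2(example[0], query))[1]

train = [((0, 0), "cat"), ((5, 5), "dog"), ((6, 5), "dog")]
print(nearest_neighbor(train, (1, 1)))  # -> cat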
In reinforcement learning the agent is rewarded for good responses and punished for bad ones. The agent learns to choose responses that are classified as "good".[49] Transfer learning is when the knowledge gained from one problem is applied to a new problem.[50] Deep learning is a type of machine learning that runs inputs through biologically inspired artificial neural networks for all of these types of learning.[51]
Computational learning theory can assess learners by computational complexity, by sample complexity (how much data is required), or by other notions of optimization.[52]
Natural language processing
Natural language processing (NLP)[53] allows programs to read, write and communicate in human languages such as English. Specific problems include speech recognition, speech synthesis, machine translation, information extraction, information retrieval and question answering.[54]
Early work, based on Noam Chomsky's generative grammar and semantic networks, had difficulty with word-sense disambiguation[f] unless restricted to small domains called "micro-worlds" (due to the common sense knowledge problem[32]). Margaret Masterman believed that it was meaning, and not grammar, that was the key to understanding languages, and that thesauri and not dictionaries should be the basis of computational language structure.
Modern deep learning techniques for NLP include word embedding (representing words, typically as vectors encoding their meaning),[55] transformers (a deep learning architecture using an attention mechanism),[56] and others.[57] In 2019, generative pre-trained transformer (or "GPT") language models began to generate coherent text,[58][59] and by 2023 these models were able to achieve human-level scores on the bar exam, the SAT, the GRE, and many other real-world tests.[60]
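The idea that embeddings encode meaning as vectors can be sketched with cosine similarity; the three-dimensional vectors below are invented toys, whereas real embeddings have hundreds or thousands of dimensions learned from data:

    import math

    # Invented 3-dimensional "embeddings"; real models learn these from text.
    embeddings = {
        "king":  [0.80, 0.65, 0.10],
        "queen": [0.82, 0.60, 0.15],
        "apple": [0.10, 0.20, 0.90],
    }

    def cosine_similarity(a, b):
        # Cosine of the angle between two vectors: 1.0 means same direction.
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))

    print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high
    print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # low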
Perception
Machine perception is the ability to use input from sensors (such as cameras, microphones, wireless signals, active lidar, sonar, radar, and tactile sensors) to deduce aspects of the world. Computer vision is the ability to analyze visual input.[61]
The field includes speech recognition,[62] image classification,[63] facial recognition, object recognition,[64] and robotic perception.[65]
Social intelligence
Kismet, a robot head which was made in the 1990s; a machine that can recognize and simulate emotions.[66]
Affective computing is an interdisciplinary umbrella that comprises systems that recognize, interpret, process or simulate human feeling, emotion and mood.[67] For example, some virtual assistants are programmed to speak conversationally or even to banter humorously; this makes them appear more sensitive to the emotional dynamics of human interaction, or otherwise facilitates human-computer interaction.
However, this tends to give naïve users an unrealistic conception of the intelligence of existing computer agents.[68] Moderate successes related to affective computing include textual sentiment analysis and, more recently, multimodal sentiment analysis, wherein AI classifies the affects displayed by a videotaped subject.[69]
General intelligence
A machine with artificial general intelligence should be able to solve a wide variety of problems with breadth and versatility similar to human intelligence.[14]
Techniques
AI research uses a wide variety of techniques to accomplish the goals above.[b]
Search and optimization
AI can solve many problems by intelligently searching through many possible solutions.[70] There are two very different kinds of search used in AI: state space search and local search.
State space search
State space search searches through a tree of possible states to try to find a goal state.[71] For example, planning algorithms search through trees of goals and subgoals, attempting to find a path to a target goal, a process called means-ends analysis.[72]
Simple exhaustive searches[73] are rarely sufficient for most real-world problems: the search space (the number of places to search) quickly grows to astronomical numbers. The result is a search that is too slow or never completes.[18] "Heuristics" or "rules of thumb" can help to prioritize choices that are more likely to reach a goal.[74]
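A minimal sketch of heuristic state space search in the style of A* (the graph, step costs and heuristic estimates are invented): the heuristic prioritizes states that appear closer to the goal, avoiding exhaustive search.

    import heapq

    # Invented search problem: edges with step costs, plus a heuristic that
    # estimates the remaining cost from each state to the goal "G".
    graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 5)],
             "B": [("G", 1)], "G": []}
    heuristic = {"S": 3, "A": 2, "B": 1, "G": 0}

    def a_star(start, goal):
        # Frontier entries: (cost so far + heuristic, cost so far, state, path).
        frontier = [(heuristic[start], 0, start, [start])]
        visited = set()
        while frontier:
            _, cost, state, path = heapq.heappop(frontier)
            if state == goal:
                return path, cost
            if state in visited:
                continue
            visited.add(state)
            for nxt, step in graph[state]:
                heapq.heappush(frontier,
                               (cost + step + heuristic[nxt], cost + step,
                                nxt, path + [nxt]))
        return None, float("inf")

    print(a_star("S", "G"))  # -> (['S', 'A', 'B', 'G'], 4)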
Adversarial search is used for game-playing programs, such as chess or Go. It searches through a tree of possible moves and counter-moves, looking for a winning position.[75]
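Adversarial search can be sketched with bare-bones minimax over an invented game tree; real game programs add alpha-beta pruning, depth limits and evaluation functions:

    # Invented game tree: inner nodes are lists of children, leaves are
    # scores from the maximizing player's point of view.
    game_tree = [[3, 5], [2, 9], [0, 7]]

    def minimax(node, maximizing):
        if isinstance(node, int):  # leaf: return its score
            return node
        scores = [minimax(child, not maximizing) for child in node]
        # The maximizer assumes the opponent will minimize, and vice versa.
        return max(scores) if maximizing else min(scores)

    print(minimax(game_tree, True))  # -> 3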
Local search
Illustration of gradient descent for 3 different starting points. Two parameters (represented by the plane coordinates) are adjusted in order to minimize the loss function (the height).
Local search uses mathematical optimization to find a solution to a problem. It begins with some form of guess and refines it incrementally.[76]
Gradient descent is a type of local search that optimizes a set of numerical parameters by incrementally adjusting them to minimize a loss function. Variants of gradient descent are commonly used to train neural networks.[77]
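A minimal gradient descent sketch, minimizing an invented one-parameter loss function with an arbitrarily chosen learning rate:

    # Minimize the invented loss(x) = (x - 3)^2 by following its gradient.
    def gradient(x):
        return 2 * (x - 3)  # derivative of (x - 3)^2

    x = 0.0              # initial guess
    learning_rate = 0.1  # step size, chosen arbitrarily
    for _ in range(100):
        x -= learning_rate * gradient(x)  # step downhill

    print(x)  # converges toward the minimum at x = 3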
Another type of local search is evolutionary computation, which aims to iteratively improve a set of candidate solutions by "mutating" and "recombining" them, selecting only the fittest to survive each generation.[78]
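A mutation-only toy version of evolutionary computation (the population size, mutation scale and fitness function are invented, and real systems also recombine candidates):

    import random

    def fitness(x):
        return -(x - 3) ** 2  # invented objective: the best candidate is x = 3

    population = [random.uniform(-10, 10) for _ in range(20)]
    for _ in range(50):  # one loop iteration per generation
        # "Mutate" each candidate with a small random perturbation.
        offspring = [x + random.gauss(0, 0.5) for x in population]
        # Only the fittest candidates survive to the next generation.
        population = sorted(population + offspring, key=fitness, reverse=True)[:20]

    print(max(population, key=fitness))  # approaches 3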
Distributed search processes can coordinate via swarm intelligence algorithms. Two popular swarm algorithms used in search are particle swarm optimization (inspired by bird flocking) and ant colony optimization (inspired by ant trails).[79]
Logic
Formal logic is used for reasoning and knowledge representation.[80] Formal logic comes in two main forms: propositional logic (which operates on statements that are true or false and uses logical connectives such as "and", "or", "not" and "implies")[81] and predicate logic (which also operates on objects, predicates and relations and uses quantifiers such as "Every X is a Y" and "There are some Xs that are Ys").[82]
Deductive reasoning in logic is the process of proving a new statement (conclusion) from other statements that are given and assumed to be true (the premises).[83] Proofs can be structured as proof trees, in which nodes are labelled by sentences, and children nodes are connected to parent nodes by inference rules.
Given a problem and a set of premises, problem-solving reduces to searching for a proof tree whose root node is labelled by a solution of the problem and whose leaf nodes are labelled by premises or axioms. In the case of Horn clauses, problem-solving search can be performed by reasoning forwards from the premises or backwards from the problem.[84] In the more general case of the clausal form of first-order logic, resolution is a single, axiom-free rule of inference, in which a problem is solved by proving a contradiction from premises that include the negation of the problem to be solved.[85]
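Reasoning forwards from the premises can be sketched as forward chaining over Horn clauses (the rules and facts below are invented): any rule whose premises are all established facts fires, until nothing new can be derived.

    # Invented Horn-clause knowledge base: each rule is (premises, conclusion).
    rules = [
        ({"rainy", "outside"}, "wet"),
        ({"wet"}, "cold"),
    ]
    facts = {"rainy", "outside"}

    # Forward chaining: fire any rule whose premises are all known facts,
    # and repeat until no new conclusions can be derived.
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True

    print(facts)  # -> {'rainy', 'outside', 'wet', 'cold'}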
Inference in both Horn clause logic and first-order logic is undecidable, and therefore intractable. However, backward reasoning with Horn clauses, which underpins computation in the logic programming language Prolog, is Turing complete. Moreover, its efficiency is competitive with computation in other symbolic programming languages.[86]
Fuzzy logic assigns a "degree of truth" between 0 and 1. It can therefore handle propositions that are vague and partially true.[87]
Non-monotonic logics, including logic programming with negation as failure, are designed to handle default reasoning.[31] Other specialized versions of logic have been developed to describe many complex domains.
Probabilistic methods for uncertain reasoning
A simple Bayesian network, with the associated conditional probability tables
Many problems in AI (including in reasoning, planning, learning, perception, and robotics) require the agent to operate with incomplete or uncertain information. AI researchers have devised a number of tools to solve these problems using methods from probability theory and economics.[88] Precise mathematical tools have been developed that analyze how an agent can make choices and plan, using decision theory, decision analysis,[89] and information value theory.[90] These tools include models such as Markov decision processes,[91] dynamic decision networks,[92] game theory and mechanism design.[93]
Bayesian networks[94] are a tool that can be used for reasoning (using the Bayesian inference algorithm),[g][96] learning (using the expectation-maximization algorithm),[h][98] planning (using decision networks)[99] and perception (using dynamic Bayesian networks).[92]
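Inference in the simplest possible Bayesian network, one cause and one effect with invented probabilities, reduces to an application of Bayes' rule:

    # Invented two-node network: Disease -> TestPositive.
    p_disease = 0.01
    p_pos_given_disease = 0.95  # probability of a positive test if diseased
    p_pos_given_healthy = 0.05  # false-positive rate

    # Marginal probability of a positive test, summing over the parent node.
    p_pos = (p_pos_given_disease * p_disease
             + p_pos_given_healthy * (1 - p_disease))

    # Bayes' rule gives the posterior P(disease | positive test).
    p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
    print(round(p_disease_given_pos, 3))  # -> 0.161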
Probabilistic algorithms can also be used for filtering, prediction, smoothing and finding explanations for streams of data, helping perception systems to analyze processes that occur over time (e.g., hidden Markov models or Kalman filters).[92]
Expectation-maximization clustering of Old Faithful eruption data starts from a random guess but then successfully converges on an accurate clustering of the two physically distinct modes of eruption.
Classifiers and statistical learning methods
The simplest AI applications can be divided into two types: classifiers (e.g., "if shiny then diamond"), on one hand, and controllers (e.g., "if diamond then pick up"), on the other hand. Classifiers[100] are functions that use pattern matching to determine the closest match. They can be fine-tuned based on chosen examples using supervised learning. Each pattern (also called an "observation") is labeled with a certain predefined class. All the observations combined with their class labels are known as a data set. When a new observation is received, that observation is classified based on previous experience.[48]
There are many kinds of classifiers in use. The decision tree is the simplest and most widely used symbolic machine learning algorithm.[101] The K-nearest neighbor algorithm was the most widely used analogical AI until the mid-1990s, when kernel methods such as the support vector machine (SVM) displaced it.[102] The naive Bayes classifier is reportedly the "most widely used learner"[103] at Google, due in part to its scalability.[104] Neural networks are also used as classifiers.[105]
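A from-scratch sketch of the k-nearest neighbor classifier mentioned above, on an invented two-feature data set:

    from collections import Counter

    # Invented labeled observations: (feature vector, class label).
    data = [((1.0, 1.1), "shiny"), ((0.9, 1.0), "shiny"),
            ((5.0, 4.8), "dull"), ((5.2, 5.1), "dull")]

    def knn_classify(point, k=3):
        # Rank the stored observations by squared distance to the new point,
        # then return the majority label among the k closest.
        def sq_dist(p):
            return sum((a - b) ** 2 for a, b in zip(p, point))
        nearest = sorted(data, key=lambda item: sq_dist(item[0]))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    print(knn_classify((1.2, 0.9)))  # -> 'shiny'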
Artificial neural networks
A neural network is an interconnected group of nodes, akin to the vast network of neurons in the human brain.
An artificial neural network is based on a collection of nodes, also known as artificial neurons, which loosely model the neurons in a biological brain. It is trained to recognise patterns; once trained, it can recognise those patterns in fresh data. The network has an input layer, at least one hidden layer of nodes, and an output layer. Each node applies a function to its inputs and, once the result crosses its specified threshold, transmits the data to the next layer. A network is typically called a deep neural network if it has at least 2 hidden layers.[105]
Learning algorithms for neural networks use local search to choose the weights that will get the right output for each input during training. The most common training technique is the backpropagation algorithm.[106] Neural networks learn to model complex relationships between inputs and outputs and find patterns in data. In theory, a neural network can learn any function.[107]
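A minimal sketch of the idea behind backpropagation, reduced to a single sigmoid neuron trained by gradient descent on an invented task (real networks apply the same chain-rule computation layer by layer):

    import math, random

    # Invented task: learn logical AND from four labeled examples.
    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

    random.seed(0)
    w = [random.uniform(-1, 1), random.uniform(-1, 1)]
    b = 0.0
    def sigmoid(z):
        return 1 / (1 + math.exp(-z))

    for _ in range(5000):
        for (x1, x2), target in data:
            out = sigmoid(w[0] * x1 + w[1] * x2 + b)
            # Chain rule for squared loss through the sigmoid:
            # d(loss)/d(z) = (out - target) * out * (1 - out).
            delta = (out - target) * out * (1 - out)
            w[0] -= 0.5 * delta * x1  # adjust each weight downhill
            w[1] -= 0.5 * delta * x2
            b -= 0.5 * delta

    print([round(sigmoid(w[0] * x1 + w[1] * x2 + b)) for (x1, x2), _ in data])
    # -> [0, 0, 0, 1]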
In feedforward neural networks the signal passes in only one direction.[108] Recurrent neural networks feed the output signal back into the input, which allows short-term memories of previous input events. Long short-term memory is the most successful network architecture for recurrent networks.[109] Perceptrons[110] use only a single layer of neurons; deep learning[111] uses multiple layers. Convolutional neural networks strengthen the connection between neurons that are "close" to each other; this is especially important in image processing, where a local set of neurons must identify an "edge" before the network can identify an object.[112]
Deep learning
Deep learning[111] uses several layers of neurons between the network's inputs and outputs. The multiple layers can progressively extract higher-level features from the raw input. For example, in image processing, lower layers may identify edges, while higher layers may identify the concepts relevant to a human such as digits or letters or faces.[113]
Deep learning has profoundly improved the performance of programs in many important subfields of artificial intelligence, including computer vision, speech recognition, natural language processing, image classification[114] and others. The reason that deep learning performs so well in so many applications is not known as of 2023.[115] The sudden success of deep learning in 2012–2015 did not occur because of some new discovery or theoretical breakthrough (deep neural networks and backpropagation had been described by many people, as far back as the 1950s)[i] but because of two factors: the incredible increase in computer power (including the hundred-fold increase in speed by switching to GPUs) and the availability of vast amounts of training data, especially the giant curated datasets used for benchmark testing, such as ImageNet.[j]
GPT
Generative pre-trained transformers (GPT) are large language models that are based on the semantic relationships between words in sentences (natural language processing). Text-based GPT models are pre-trained on a large corpus of text, which can be from the internet. The pre-training consists of predicting the next token (a token being usually a word, subword, or punctuation). Throughout this pre-training, GPT models accumulate knowledge about the world, and can then generate human-like text by repeatedly predicting the next token. Typically, a subsequent training phase makes the model more truthful, useful and harmless, usually with a technique called reinforcement learning from human feedback (RLHF). Current GPT models are still prone to generating falsehoods called "hallucinations", although this can be reduced with RLHF and quality data. They are used in chatbots, which allow users to ask a question or request a task in simple text.[124][125]
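The generation loop, repeatedly predicting the next token, can be sketched with a toy bigram table standing in for a real GPT (the vocabulary and probabilities below are invented):

    import random

    # Invented bigram "language model": P(next token | current token).
    model = {
        "the": {"cat": 0.6, "dog": 0.4},
        "cat": {"sat": 0.7, "ran": 0.3},
        "dog": {"ran": 0.8, "sat": 0.2},
        "sat": {"down": 1.0},
        "ran": {"away": 1.0},
    }

    def generate(prompt, max_tokens=4):
        tokens = prompt.split()
        for _ in range(max_tokens):
            options = model.get(tokens[-1])
            if not options:  # no known continuation: stop generating
                break
            # Sample the next token in proportion to its predicted probability.
            next_token = random.choices(list(options),
                                        weights=list(options.values()))[0]
            tokens.append(next_token)
        return " ".join(tokens)

    print(generate("the"))  # e.g. 'the cat sat down'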
Current models and services include: Gemini (formerly Bard), ChatGPT, Grok, Claude, Copilot and LLaMA.[126] Multimodal GPT models can process different types of data (modalities) such as images, videos, sound and text.[127]
Specialized hardware and software
Main articles: Programming languages for artificial intelligence and Hardware for artificial intelligence
By the late 2010s, graphics processing units (GPUs), increasingly designed with AI-specific enhancements and used with specialized software such as TensorFlow, had replaced central processing units (CPUs) as the dominant means of training large-scale (commercial and academic) machine learning models.[128] Historically, specialized languages, such as Lisp, Prolog, Python and others, had been used.
Applications
Main article: Applications of artificial intelligence
AI and machine learning technology is used in most of the essential applications of the 2020s, including: search engines (such as Google Search), targeted online advertising (AdSense, Facebook), recommendation systems (offered by Netflix, YouTube or Amazon), driving internet traffic, virtual assistants (such as Siri or Alexa), autonomous vehicles (including drones, ADAS and self-driving cars), automatic language translation (Microsoft Translator, Google Translate), facial recognition (Apple's Face ID or Microsoft's DeepFace and Google's FaceNet) and image labeling (used by Facebook, Apple's iPhoto and TikTok).
Health and medicine
Main article: Artificial intelligence in healthcare
The application of AI in medicine and medical research has the potential to improve patient care and quality of life.[129] Through the lens of the Hippocratic Oath, medical professionals are ethically compelled to use AI if such applications can more accurately diagnose and treat patients.
For medical research, AI is an important tool for processing and integrating big data. This is particularly important for organoid and tissue engineering development, which use microscopy imaging as a key technique in fabrication.[130] It has been suggested that AI can overcome discrepancies in funding allocated to different fields of research.[130] New AI tools can deepen our understanding of biomedically relevant pathways. For example, AlphaFold 2 (2021) demonstrated the ability to approximate, in hours rather than months, the 3D structure of a protein.[131] In 2023, it was reported that AI-guided drug discovery helped find a class of antibiotics capable of killing two different types of drug-resistant bacteria.[132]
Games
Main article: Game artificial intelligence
Game playing programs have been used since the 1950s to demonstrate and test AI's most advanced techniques.[133] Deep Blue became the first computer chess-playing system to beat a reigning world chess champion, Garry Kasparov, on 11 May 1997.[134] In 2011, in a Jeopardy! quiz show exhibition match, IBM's question answering system, Watson, defeated the two greatest Jeopardy! champions, Brad Rutter and Ken Jennings, by a significant margin.[135] In March 2016, AlphaGo won 4 out of 5 games of Go in a match with Go champion Lee Sedol, becoming the first computer Go-playing system to beat a professional Go player without handicaps. Then in 2017 it defeated Ke Jie, who was the best Go player in the world.[136] Other programs handle imperfect-information games, such as the poker-playing program Pluribus.[137] DeepMind developed increasingly generalist reinforcement learning models, such as MuZero, which could be trained to play chess, Go, or Atari games.[138] In 2019, DeepMind's AlphaStar achieved grandmaster level in StarCraft II, a particularly challenging real-time strategy game that involves incomplete knowledge of what happens on the map.[139] In 2021, an AI agent competed in a PlayStation Gran Turismo competition, winning against four of the world's best Gran Turismo drivers using deep reinforcement learning.[140]
Military
Main article: Military artificial intelligence
Various countries are deploying AI military applications.[141] The main applications enhance command and control, communications, sensors, integration and interoperability.[142] Research is targeting intelligence collection and analysis, logistics, cyber operations, information operations, and semiautonomous and autonomous vehicles.[141] AI technologies enable coordination of sensors and effectors, threat detection and identification, marking of enemy positions, target acquisition, coordination and deconfliction of distributed Joint Fires between networked combat vehicles involving manned and unmanned teams.[142] AI was incorporated into military operations in Iraq and Syria.[141]
In November 2023, US Vice President Kamala Harris disclosed a declaration signed by 31 nations to set guardrails for the military use of AI. The commitments include using legal reviews to ensure the compliance of military AI with international laws, and being cautious and transparent in the development of this technology.[143]
Generative AI
Main article: Generative artificial intelligence
Vincent van Gogh in watercolour created by generative AI software
In the early 2020s, generative AI gained widespread prominence. In March 2023, 58% of US adults had heard about ChatGPT and 14% had tried it.[144] The increasing realism and ease-of-use of AI-based text-to-image generators such as Midjourney, DALL-E, and Stable Diffusion sparked a trend of viral AI-generated photos. Widespread attention was gained by a fake photo of Pope Francis wearing a white puffer coat, the fictional arrest of Donald Trump, and a hoax of an attack on the Pentagon, as well as the usage in professional creative arts.[145][146]
Industry-specific tasks
There are also thousands of successful AI applications used to solve specific problems for specific industries or institutions. In a 2017 survey, one in five companies reported they had incorporated "AI" in some offerings or processes.[147] A few examples are energy storage, medical diagnosis, military logistics, applications that predict the result of judicial decisions, foreign policy, and supply chain management.
In agriculture, AI has helped farmers identify areas that need irrigation, fertilization, pesticide treatments or increasing yield. Agronomists use AI to conduct research and development. AI has been used to predict the ripening time for crops such as tomatoes, monitor soil moisture, operate agricultural robots, conduct predictive analytics, classify livestock pig call emotions, automate greenhouses, detect diseases and pests, and save water.
Artificial intelligence is used in astronomy to analyze increasing amounts of available data and applications, mainly for "classification, regression, clustering, forecasting, generation, discovery, and the development of new scientific insights", for example for discovering exoplanets, forecasting solar activity, and distinguishing between signals and instrumental effects in gravitational wave astronomy. It could also be used for activities in space such as space exploration, including analysis of data from space missions, real-time science decisions of spacecraft, space debris avoidance, and more autonomous operation.
Ethics
Main article: Ethics of artificial intelligence
AI has potential benefits and potential risks. AI may be able to advance science and find solutions for serious problems: Demis Hassabis of DeepMind hopes to "solve intelligence, and then use that to solve everything else".[148] However, as the use of AI has become widespread, several unintended consequences and risks have been identified.[149] Systems in production can sometimes fail to factor ethics and bias into their AI training processes, especially when the AI algorithms are inherently unexplainable in deep learning.[150]
Risks and harm
Privacy and copyright
Further information: Information privacy and Artificial intelligence and copyright
Machine-learning algorithms require large amounts of data. The techniques used to acquire this data have raised concerns about privacy, surveillance and copyright.
Technology companies collect a wide range of data from their users, including online activity, geolocation data, video and audio.[151] For example, in order to build speech recognition algorithms, Amazon has recorded millions of private conversations and allowed temporary workers to listen to and transcribe some of them.[152] Opinions about this widespread surveillance range from those who see it as a necessary evil to those for whom it is clearly unethical and a violation of the right to privacy.[153]
AI developers argue that this is the only way to deliver valuable applications, and they have developed several techniques that attempt to preserve privacy while still obtaining the data, such as data aggregation, de-identification and differential privacy.[154] Since 2016, some privacy experts, such as Cynthia Dwork, have begun to view privacy in terms of fairness. Brian Christian wrote that experts have pivoted "from the question of 'what they know' to the question of 'what they're doing with it'."[155]
Generative AI is often trained on unlicensed copyrighted works, including in domains such as images or computer code; the output is then used under the rationale of "fair use". Website owners who do not want their copyrighted content scraped for AI training can add code to their site, much as they can opt out of indexing by a search engine; some services, such as OpenAI's, currently honor such opt-outs. Experts disagree about how well and under what circumstances this rationale will hold up in courts of law; relevant factors may include "the purpose and character of the use of the copyrighted work" and "the effect upon the potential market for the copyrighted work".[156] In 2023, leading authors (including John Grisham and Jonathan Franzen) sued AI companies for using their work to train generative AI.[157][158]
Misinformation
See also: YouTube § Moderation and offensive content
YouTube, Facebook and others use recommender systems to guide users to more content. These AI programs were given the goal of maximizing user engagement (that is, the only goal was to keep people watching). The AI learned that users tended to choose misinformation, conspiracy theories, and extreme partisan content, and, to keep them watching, the AI recommended more of it. Users also tended to watch more content on the same subject, so the AI led people into filter bubbles where they received multiple versions of the same misinformation.[159] This convinced many users that the misinformation was true, and ultimately undermined trust in institutions, the media and the government.[160] The AI program had correctly learned to maximize its goal, but the result was harmful to society. After the U.S. election in 2016, major technology companies took steps to mitigate the problem.
In 2022, generative AI began to create images, audio, video and text that are indistinguishable from real photographs, recordings, films or human writing. It is possible for bad actors to use this technology to create massive amounts of misinformation or propaganda.[161] AI pioneer Geoffrey Hinton expressed concern about AI enabling "authoritarian leaders to manipulate their electorates" on a large scale, among other risks.[162]
Algorithmic bias and fairness
Main articles: Algorithmic bias and Fairness (machine learning)
Machine learning applications will be biased if they learn from biased data.[163] The developers may not be aware that the bias exists.[164] Bias can be introduced by the way training data is selected and by the way a model is deployed.[165][163] If a biased algorithm is used to make decisions that can seriously harm people (as it can in medicine, finance, recruitment, housing or policing) then the algorithm may cause discrimination.[166] Fairness in machine learning is the study of how to prevent the harm caused by algorithmic bias. It has become a serious area of academic study within AI. Researchers have discovered it is not always possible to define "fairness" in a way that satisfies all stakeholders.[167]
On June 28, 2015, Google Photos's new image labeling feature mistakenly identified Jacky Alcine and a friend as "gorillas" because they were black. The system was trained on a dataset that contained very few images of black people,[168] a problem called "sample size disparity".[169] Google "fixed" this problem by preventing the system from labelling anything as a "gorilla". Eight years later, in 2023, Google Photos still could not identify a gorilla, and neither could similar products from Apple, Facebook, Microsoft and Amazon.[170]
COMPAS is a commercial program widely used by U.S. courts to assess the likelihood of a defendant becoming a recidivist. In 2016, Julia Angwin at ProPublica discovered that COMPAS exhibited racial bias, despite the fact that the program was not told the races of the defendants. Although the error rate for both whites and blacks was equal at exactly 61%, the errors for each race were different: the system consistently overestimated the chance that a black person would re-offend and underestimated the chance that a white person would re-offend.[171] In 2017, several researchers[k] showed that it was mathematically impossible for COMPAS to accommodate all possible measures of fairness when the base rates of re-offense were different for whites and blacks in the data.[173]
A program can make biased decisions even if the data does not explicitly mention a problematic feature (such as "race" or "gender"). The feature will correlate with other features (like "address", "shopping history" or "first name"), and the program will make the same decisions based on these features as it would on "race" or "gender".[174] Moritz Hardt said "the most robust fact in this research area is that fairness through blindness doesn't work."[175]
Criticism of COMPAS highlighted that machine learning models are designed to make "predictions" that are only valid if we assume that the future will resemble the past. If they are trained on data that includes the results of racist decisions in the past, machine learning models must predict that racist decisions will be made in the future. If an application then uses these predictions as recommendations, some of these "recommendations" will likely be racist.[176] Thus, machine learning is not well suited to help make decisions in areas where there is hope that the future will be better than the past. It is necessarily descriptive rather than prescriptive.[l]
Bias and unfairness may go undetected because the developers are overwhelmingly white and male: among AI engineers, about 4% are black and 20% are women.[169]
At its 2022 Conference on Fairness, Accountability, and Transparency (ACM FAccT 2022) in Seoul, South Korea, the Association for Computing Machinery presented and published findings recommending that, until AI and robotics systems can be demonstrated to be free of bias mistakes, they should be considered unsafe, and that the use of self-learning neural networks trained on vast, unregulated sources of flawed internet data should be curtailed.[178]
Lack of transparency
See also: Explainable AI, Algorithmic transparency, and Right to explanation
Lidar testing vehicle for autonomous driving
Many AI systems are so complex that their designers cannot explain how they reach their decisions.[179] This is particularly true of deep neural networks, in which there are a large number of non-linear relationships between inputs and outputs. Nevertheless, some popular explainability techniques exist.[180]
It is impossible to be certain that a program is operating correctly if no one knows how exactly it works. There have been many cases where a machine learning program passed rigorous tests, but nevertheless learned something different than what the programmers intended. For example, a system that could identify skin diseases better than medical professionals was found to actually have a strong tendency to classify images with a ruler as "cancerous", because pictures of malignancies typically include a ruler to show the scale.[181] Another machine learning system designed to help effectively allocate medical resources was found to classify patients with asthma as being at "low risk" of dying from pneumonia. Having asthma is actually a severe risk factor, but since patients with asthma usually received much more medical care, they were relatively unlikely to die according to the training data. The correlation between asthma and low risk of dying from pneumonia was real, but misleading.[182]
People who have been harmed by an algorithm's decision have a right to an explanation.[183] Doctors, for example, are expected to clearly and completely explain to their colleagues the reasoning behind any decision they make. Early drafts of the European Union's General Data Protection Regulation in 2016 included an explicit statement that this right exists.[m] Industry experts noted that this is an unsolved problem with no solution in sight. Regulators argued that nevertheless the harm is real: if the problem has no solution, the tools should not be used.[184]
DARPA established the XAI ("Explainable Artificial Intelligence") program in 2014 to try to solve these problems.[185]
There are several possible solutions to the transparency problem. SHAP addresses it by visualising the contribution of each feature to the output.[186] LIME can locally approximate a model with a simpler, interpretable model.[187] Multitask learning provides a large number of outputs in addition to the target classification. These other outputs can help developers deduce what the network has learned.[188] Deconvolution, DeepDream and other generative methods can allow developers to see what different layers of a deep network have learned and produce output that can suggest what the network is learning.[189]
Bad actors and weaponized AI
Main articles: Lethal autonomous weapon, Artificial intelligence arms race, and AI safety
Artificial intelligence provides a number of tools that are useful to bad actors, such as authoritarian governments, terrorists, criminals or rogue states.
A lethal autonomous weapon is a machine that locates, selects and engages human targets without human supervision.[n] Widely available AI tools can be used by bad actors to develop inexpensive autonomous weapons and, if produced at scale, they are potentially weapons of mass destruction.[191] Even when used in conventional warfare, they are currently unable to reliably choose targets and could potentially kill an innocent person.[191] In 2014, 30 nations (including China) supported a ban on autonomous weapons under the United Nations' Convention on Certain Conventional Weapons; however, the United States and others disagreed.[192] By 2015, over fifty countries were reported to be researching battlefield robots.[193]
AI tools make it easier for authoritarian governments to efficiently control their citizens in several ways. Face and voice recognition allow widespread surveillance. Machine learning, operating on this data, can classify potential enemies of the state and prevent them from hiding. Recommendation systems can precisely target propaganda and misinformation for maximum effect. Deepfakes and generative AI aid in producing misinformation. Advanced AI can make authoritarian centralized decision making more competitive than liberal and decentralized systems such as markets. It lowers the cost and difficulty of digital warfare and advanced spyware.[194] All these technologies have been available since 2020 or earlier; AI facial recognition systems are already being used for mass surveillance in China.[195][196]
There are many other ways in which AI is expected to help bad actors, some of which cannot be foreseen. For example, machine-learning AI is able to design tens of thousands of toxic molecules in a matter of hours.[197]
Reliance on industry giants
Training AI systems requires an enormous amount of computing power. Usually only Big Tech companies have the financial resources to make such investments. Smaller startups such as Cohere and OpenAI end up buying access to data centers from Google and Microsoft respectively.[198]
Technological unemployment
Main articles: Workplace impact of artificial intelligence and Technological unemployment
Economists have frequently highlighted the risks of redundancies from AI, and speculated about unemployment if there is no adequate social policy for full employment.[199]
In the past, technology has tended to increase rather than reduce total employment, but economists acknowledge that "we're in uncharted territory" with AI.[200] A survey of economists showed disagreement about whether the increasing use of robots and AI will cause a substantial increase in long-term unemployment, but they generally agree that it could be a net benefit if productivity gains are redistributed.[201] Risk estimates vary; for example, in the 2010s, Michael Osborne and Carl Benedikt Frey estimated that 47% of U.S. jobs were at "high risk" of potential automation, while an OECD report classified only 9% of U.S. jobs as "high risk".[o][203] The methodology of speculating about future employment levels has been criticised as lacking evidential foundation, and for implying that technology, rather than social policy, creates unemployment, as opposed to redundancies.[199] In April 2023, it was reported that 70% of the jobs for Chinese video game illustrators had been eliminated by generative artificial intelligence.[204][205]
Unlike previous waves of automation, many middle-class jobs may be eliminated by artificial intelligence; The Economist stated in 2015 that "the worry that AI could do to white-collar jobs what steam power did to blue-collar ones during the Industrial Revolution" is "worth taking seriously".[206] Jobs at extreme risk range from paralegals to fast food cooks, while job demand is likely to increase for care-related professions ranging from personal healthcare to the clergy.[207]
From the early days of the development of artificial intelligence, there have been arguments, for example, those put forward by Joseph Weizenbaum, about whether tasks that can be done by computers actually should be done by them, given the difference between computers and humans, and between quantitative calculation and qualitative, value-based judgement.[208]
Existential risk
Main article: Existential risk from artificial general intelligence
It has been argued AI will become so powerful that humanity may irreversibly lose control of it. This could, as physicist Stephen Hawking stated, "spell the end of the human race".[209] This scenario has been common in science fiction, when a computer or robot suddenly develops a human-like "self-awareness" (or "sentience" or "consciousness") and becomes a malevolent character.[p] These sci-fi scenarios are misleading in several ways.
First, AI does not require human-like "sentience" to be an existential risk. Modern AI programs are given specific goals and use learning and intelligence to achieve them. Philosopher Nick Bostrom argued that if one gives almost any goal to a sufficiently powerful AI, it may choose to destroy humanity to achieve it (he used the example of a paperclip factory manager).[211] Stuart Russell gives the example of a household robot that tries to find a way to kill its owner to prevent itself from being unplugged, reasoning that "you can't fetch the coffee if you're dead."[212] In order to be safe for humanity, a superintelligence would have to be genuinely aligned with humanity's morality and values so that it is "fundamentally on our side".[213]
Second, Yuval Noah Harari argues that AI does not require a robot body or physical control to pose an existential risk. The essential parts of civilization are not physical. Things like ideologies, law, government, money and the economy are made of language; they exist because there are stories that billions of people believe. The current prevalence of misinformation suggests that an AI could use language to convince people to believe anything, even to take actions that are destructive.[214]
The opinions amongst experts and industry insiders are mixed, with sizable fractions both concerned and unconcerned by risk from eventual superintelligent AI.[215] Personalities such as Stephen Hawking, Bill Gates, and Elon Musk have expressed concern about existential risk from AI.[216] AI pioneers including Fei-Fei Li, Geoffrey Hinton, Yoshua Bengio, Cynthia Breazeal, Rana el Kaliouby, Demis Hassabis, Joy Buolamwini, and Sam Altman have expressed concerns about the risks of AI. In 2023, many leading AI experts issued the joint statement that "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war".[217]
Other researchers, however, spoke in favor of a less dystopian view. AI pioneer Juergen Schmidhuber did not sign the joint statement, emphasising that in 95% of all cases, AI research is about making "human lives longer and healthier and easier."[218] While the tools that are now being used to improve lives can also be used by bad actors, "they can also be used against the bad actors."[219][220] Andrew Ng also argued that "it's a mistake to fall for the doomsday hype on AI—and that regulators who do will only benefit vested interests."[221] Yann LeCun "scoffs at his peers' dystopian scenarios of supercharged misinformation and even, eventually, human extinction."[222] In the early 2010s, experts argued that the risks are too distant in the future to warrant research or that humans will be valuable from the perspective of a superintelligent machine.[223] However, after 2016, the study of current and future risks and possible solutions became a serious area of research.[224]
Ethical machines and alignment
Main articles: Machine ethics, AI safety, Friendly artificial intelligence, Artificial moral agents, and Human Compatible
Friendly AI are machines that have been designed from the beginning to minimize risks and to make choices that benefit humans. Eliezer Yudkowsky, who coined the term, argues that developing friendly AI should be a higher research priority: it may require a large investment and it must be completed before AI becomes an existential risk.[225]
Machines with intelligence have the potential to use their intelligence to make ethical decisions. The field of machine ethics provides machines with ethical principles and procedures for resolving ethical dilemmas.[226] The field of machine ethics is also called computational morality,[226] and was founded at an AAAI symposium in 2005.[227]
Other approaches include Wendell Wallach's "artificial moral agents"[228] and Stuart J. Russell's three principles for developing provably beneficial machines.[229]
Frameworks
Artificial intelligence projects can have their ethical permissibility tested while designing, developing, and implementing an AI system. One such framework is the Care and Act Framework, developed by the Alan Turing Institute and based on the SUM values, which tests projects in four main areas:[230][231]
RESPECT the dignity of individual people
CONNECT with other people sincerely, openly and inclusively
CARE for the wellbeing of everyone
PROTECT social values, justice and the public interest
Other developments in ethical frameworks include those decided upon during the Asilomar Conference, the Montreal Declaration for Responsible AI, and the IEEE's Ethics of Autonomous Systems initiative, among others;[232] however, these principles are not without criticism, especially with regard to how the people contributing to these frameworks are chosen.[233]
Promotion of the wellbeing of the people and communities that these technologies affect requires consideration of the social and ethical implications at all stages of AI system design, development and implementation, and collaboration between job roles such as data scientists, product managers, data engineers, domain experts, and delivery managers.[234]
Regulation
Main articles: Regulation of artificial intelligence, Regulation of algorithms, and AI safety
The first global AI Safety Summit was held in 2023 with a declaration calling for international co-operation.
The regulation of artificial intelligence is the development of public sector policies and laws for promoting and regulating artificial intelligence (AI); it is therefore related to the broader regulation of algorithms.[235] The regulatory and policy landscape for AI is an emerging issue in jurisdictions globally.[236] According to the AI Index at Stanford, the annual number of AI-related laws passed in the 127 survey countries jumped from one passed in 2016 to 37 passed in 2022 alone.[237][238] Between 2016 and 2020, more than 30 countries adopted dedicated strategies for AI.[239] Most EU member states had released national AI strategies, as had Canada, China, India, Japan, Mauritius, the Russian Federation, Saudi Arabia, United Arab Emirates, US and Vietnam. Others were in the process of elaborating their own AI strategy, including Bangladesh, Malaysia and Tunisia.[239] The Global Partnership on Artificial Intelligence was launched in June 2020, stating a need for AI to be developed in accordance with human rights and democratic values, to ensure public confidence and trust in the technology.[239] Henry Kissinger, Eric Schmidt, and Daniel Huttenlocher published a joint statement in November 2021 calling for a government commission to regulate AI.[240] In 2023, OpenAI leaders published recommendations for the governance of superintelligence, which they believe may happen in less than 10 years.[241] In 2023, the United Nations also launched an advisory body to provide recommendations on AI governance; the body comprises technology company executives, government officials and academics.[242]
In a 2022 Ipsos survey, attitudes towards AI varied greatly by country; 78% of Chinese citizens, but only 35% of Americans, agreed that "products and services using AI have more benefits than drawbacks".[237] A 2023 Reuters/Ipsos poll found that 61% of Americans agree, and 22% disagree, that AI poses risks to humanity.[243] In a 2023 Fox News poll, 35% of Americans thought it "very important", and an additional 41% thought it "somewhat important", for the federal government to regulate AI, versus 13% responding "not very important" and 8% responding "not at all important".[244][245]
In November 2023, the first global AI Safety Summit was held in Bletchley Park in the UK to discuss the near and far term risks of AI and the possibility of mandatory and voluntary regulatory frameworks.[246] 28 countries including the United States, China, and the European Union issued a declaration at the start of the summit, calling for international co-operation to manage the challenges and risks of artificial intelligence.[247][248]
History
Main article: History of artificial intelligence
For a chronological guide, see Timeline of artificial intelligence.
The study of mechanical or "formal" reasoning began with philosophers and mathematicians in antiquity. The study of logic led directly to Alan Turing's theory of computation, which suggested that a machine, by shuffling symbols as simple as "0" and "1", could simulate any conceivable form of mathematical reasoning.[249][5] This, along with concurrent discoveries in cybernetics, information theory and neurobiology, led researchers to consider the possibility of building an "electronic brain".[q] They developed several areas of research that would become part of AI,[251] such as McCulloch and Pitts' design for "artificial neurons" in 1943,[252] and Turing's influential 1950 paper 'Computing Machinery and Intelligence', which introduced the Turing test and showed that "machine intelligence" was plausible.[253][5]
The field of AI research was founded at a workshop at Dartmouth College in 1956.[r][6] The attendees became the leaders of AI research in the 1960s.[s] They and their students produced programs that the press described as "astonishing":[t] computers were learning checkers strategies, solving word problems in algebra, proving logical theorems and speaking English.[u][7] Artificial intelligence laboratories were set up at a number of British and U.S. universities in the latter 1950s and early 1960s.[5]
Researchers in the 1960s and the 1970s were convinced that their methods would eventually succeed in creating a machine with general intelligence and considered this the goal of their field.[257] Herbert Simon predicted, "machines will be capable, within twenty years, of doing any work a man can do".[258] Marvin Minsky agreed, writing, "within a generation ... the problem of creating 'artificial intelligence' will substantially be solved".[259] They had, however, underestimated the difficulty of the problem.[v] In 1974, both the U.S. and British governments cut off exploratory research in response to the criticism of Sir James Lighthill[261] and ongoing pressure from the U.S. Congress to fund more productive projects.[262] Minsky and Papert's book Perceptrons was understood as proving that artificial neural networks would never be useful for solving real-world tasks, thus discrediting the approach altogether.[263] The "AI winter", a period when obtaining funding for AI projects was difficult, followed.[9]
In the early 1980s, AI research was revived by the commercial success of expert systems,[264] a form of AI program that simulated the knowledge and analytical skills of human experts. By 1985, the market for AI had reached over a billion dollars. At the same time, Japan's fifth generation computer project inspired the U.S. and British governments to restore funding for academic research.[8] However, beginning with the collapse of the Lisp Machine market in 1987, AI once again fell into disrepute, and a second, longer-lasting winter began.[10]
Up to this point, most of AI's funding had gone to projects which used high level symbols to represent mental objects like plans, goals, beliefs and known facts. In the 1980s, some researchers began to doubt that this approach would be able to imitate all the processes of human cognition, especially perception, robotics, learning and pattern recognition,[265] and began to look into "sub-symbolic" approaches.[266] Rodney Brooks rejected "representation" in general and focussed directly on engineering machines that move and survive.[w] Judea Pearl, Lotfi Zadeh and others developed methods that handled incomplete and uncertain information by making reasonable guesses rather than precise logic.[88][271] But the most important development was the revival of "connectionism", including neural network research, by Geoffrey Hinton and others.[272] In 1990, Yann LeCun successfully showed that convolutional neural networks can recognize handwritten digits, the first of many successful applications of neural networks.[273]
AI gradually restored its reputation in the late 1990s and early 21st century by exploiting formal mathematical methods and by finding specific solutions to specific problems. This "narrow" and "formal" focus allowed researchers to produce verifiable results and collaborate with other fields (such as statistics, economics and mathematics).[274] By 2000, solutions developed by AI researchers were being widely used, although in the 1990s they were rarely described as "artificial intelligence".[275] However, several academic researchers became concerned that AI was no longer pursuing its original goal of creating versatile, fully intelligent machines. Beginning around 2002, they founded the subfield of artificial general intelligence (or "AGI"), which had several well-funded institutions by the 2010s.[14]
Deep learning began to dominate industry benchmarks in 2012 and was adopted throughout the field.[11] For many specific tasks, other methods were abandoned.[x] Deep learning's success was based on both hardware improvements (faster computers,[277] graphics processing units, cloud computing[278]) and access to large amounts of data[279] (including curated datasets,[278] such as ImageNet). Deep learning's success led to an enormous increase in interest and funding in AI.[y] The amount of machine learning research (measured by total publications) increased by 50% in the years 2015–2019.[239]
In 2016, issues of fairness and the misuse of technology were catapulted into center stage at machine learning conferences, publications vastly increased, funding became available, and many researchers re-focussed their careers on these issues. The alignment problem became a serious field of academic study.[224]
In the late 2010s and early 2020s, AGI companies began to deliver programs that created enormous interest. In 2016, AlphaGo, developed by DeepMind, beat world champion Go player Lee Sedol. The program was taught only the rules of the game and developed strategy by itself. GPT-3 is a large language model that was released in 2020 by OpenAI and is capable of generating high-quality human-like text.[280] These programs, and others, inspired an aggressive AI boom, in which large companies began investing billions of dollars in AI research. According to 'AI Impacts', about $50 billion annually was invested in "AI" around 2022 in the U.S. alone, and about 20% of new US Computer Science PhD graduates have specialized in "AI".[281] About 800,000 "AI"-related US job openings existed in 2022.[282]
Philosophy
Main article: Philosophy of artificial intelligence
Defining artificial intelligence
Main articles: Turing test, Intelligent agent, Dartmouth workshop, and Synthetic intelligence
Alan Turing wrote in 1950, "I propose to consider the question 'can machines think'?"[283] He advised changing the question from whether a machine "thinks", to "whether or not it is possible for machinery to show intelligent behaviour".[283] He devised the Turing test, which measures the ability of a machine to simulate human conversation.[253] Since we can only observe the behavior of the machine, it does not matter if it is "actually" thinking or literally has a "mind". Turing notes that we cannot determine these things about other people, but "it is usual to have a polite convention that everyone thinks".[284]
Russell and Norvig agree with Turing that intelligence must be defined in terms of external behavior, not internal structure.[1] However, they are critical that the test requires the machine to imitate humans. "Aeronautical engineering texts," they wrote, "do not define the goal of their field as making 'machines that fly so exactly like pigeons that they can fool other pigeons.'"[285] AI founder John McCarthy agreed, writing that "Artificial intelligence is not, by definition, simulation of human intelligence".[286]
McCarthy defines intelligence as "the computational part of the ability to achieve goals in the world."[287] Another AI founder, Marvin Minsky, similarly describes it as "the ability to solve hard problems".[288] The leading AI textbook defines it as the study of agents that perceive their environment and take actions that maximize their chances of achieving defined goals.[289] These definitions view intelligence in terms of well-defined problems with well-defined solutions, where both the difficulty of the problem and the performance of the program are direct measures of the "intelligence" of the machine; no further philosophical discussion is required, and may not even be possible.
Another definition has been adopted by Google,[290] a major practitioner in the field of AI. This definition stipulates the ability of systems to synthesize information as the manifestation of intelligence, similar to the way it is defined in biological intelligence.
Evaluating approaches to AI
No established unifying theory or paradigm has guided AI research for most of its history.[z] The unprecedented success of statistical machine learning in the 2010s eclipsed all other approaches (so much so that some sources, especially in the business world, use the term "artificial intelligence" to mean "machine learning with neural networks"). This approach is mostly sub-symbolic, soft and narrow (see below). Critics argue that these questions may have to be revisited by future generations of AI researchers.
Symbolic AI and its limits
Symbolic AI (or "GOFAI")[292] simulated the high-level conscious reasoning that people use when they solve puzzles, express legal reasoning and do mathematics. Such programs were highly successful at "intelligent" tasks such as algebra or IQ tests. In the 1960s, Newell and Simon proposed the physical symbol systems hypothesis: "A physical symbol system has the necessary and sufficient means of general intelligent action."[293]
However, the symbolic approach failed on many tasks that humans solve easily, such as learning, recognizing an object or commonsense reasoning. Moravec's paradox is the discovery that high-level "intelligent" tasks were easy for AI, but low level "instinctive" tasks were extremely difficult.[294] Philosopher Hubert Dreyfus had argued since the 1960s that human expertise depends on unconscious instinct rather than conscious symbol manipulation, and on having a "feel" for the situation, rather than explicit symbolic knowledge.[295] Although his arguments had been ridiculed and ignored when they were first presented, eventually, AI research came to agree with him.[aa][19]
The issue is not resolved: sub-symbolic reasoning can make many of the same inscrutable mistakes that human intuition does, such as algorithmic bias. Critics such as Noam Chomsky argue continuing research into symbolic AI will still be necessary to attain general intelligence,[297][298] in part because sub-symbolic AI is a move away from explainable AI: it can be difficult or impossible to understand why a modern statistical AI program made a particular decision. The emerging field of neuro-symbolic artificial intelligence attempts to bridge the two approaches.
Neat vs. scruffy
Main article: Neats and scruffies
"Neats" hope that intelligent behavior is described using simple, elegant principles (such as logic, optimization, or neural networks). "Scruffies" expect that it necessarily requires solving a large number of unrelated problems. Neats defend their programs with theoretical rigor, scruffies rely mainly on incremental testing to see if they work. This issue was actively discussed in the 1970s and 1980s,[299] but eventually was seen as irrelevant. Modern AI has elements of both.
Soft vs. hard computing
Main article: Soft computing
Finding a provably correct or optimal solution is intractable for many important problems.[18] Soft computing is a set of techniques, including genetic algorithms, fuzzy logic and neural networks, that are tolerant of imprecision, uncertainty, partial truth and approximation. Soft computing was introduced in the late 1980s and most successful AI programs in the 21st century are examples of soft computing with neural networks.
Narrow vs. general AI
Main articles: Weak artificial intelligence and Artificial general intelligence
AI researchers are divided as to whether to pursue the goals of artificial general intelligence and superintelligence directly or to solve as many specific problems as possible (narrow AI) in hopes these solutions will lead indirectly to the field's long-term goals.[300][301] General intelligence is difficult to define and difficult to measure, and modern AI has had more verifiable successes by focusing on specific problems with specific solutions. The experimental sub-field of artificial general intelligence studies this area exclusively.
Machine consciousness, sentience and mind
Main articles: Philosophy of artificial intelligence and Artificial consciousness
The philosophy of mind has not settled whether a machine can have a mind, consciousness and mental states in the same sense that human beings do. This issue considers the internal experiences of the machine, rather than its external behavior. Mainstream AI research considers this issue irrelevant because it does not affect the goals of the field: to build machines that can solve problems using intelligence. Russell and Norvig add that "[t]he additional project of making a machine conscious in exactly the way humans are is not one that we are equipped to take on."[302] However, the question has become central to the philosophy of mind. It is also typically the central question at issue in artificial intelligence in fiction.
Consciousness
Main articles: Hard problem of consciousness and Theory of mind
David Chalmers identified two problems in understanding the mind, which he named the "hard" and "easy" problems of consciousness.[303] The easy problem is understanding how the brain processes signals, makes plans and controls behavior. The hard problem is explaining how this feels or why it should feel like anything at all, assuming we are right in thinking that it truly does feel like something (Dennett's consciousness illusionism says this is an illusion). Human information processing is easy to explain; human subjective experience, however, is difficult to explain. For example, it is easy to imagine a color-blind person who has learned to identify which objects in their field of view are red, but it is not clear what would be required for the person to know what red looks like.[304]
Computationalism and functionalism
Main articles: Computational theory of mind, Functionalism (philosophy of mind), and Chinese room
Computationalism is the position in the philosophy of mind that the human mind is an information processing system and that thinking is a form of computing. Computationalism argues that the relationship between mind and body is similar or identical to the relationship between software and hardware and thus may be a solution to the mind–body problem. This philosophical position was inspired by the work of AI researchers and cognitive scientists in the 1960s and was originally proposed by philosophers Jerry Fodor and Hilary Putnam.[305]
Philosopher John Searle characterized this position as "strong AI": "The appropriately programmed computer with the right inputs and outputs would thereby have a mind in exactly the same sense human beings have minds."[ab] Searle counters this assertion with his Chinese room argument, which attempts to show that, even if a machine perfectly simulates human behavior, there is still no reason to suppose it also has a mind.[309]
AI welfare and rights
It is difficult or impossible to reliably evaluate whether an advanced AI is sentient (has the ability to feel), and if so, to what degree.[310] But if there is a significant chance that a given machine can feel and suffer, then it may be entitled to certain rights or welfare protection measures, similarly to animals.[311][312] Sapience (a set of capacities related to high intelligence, such as discernment or self-awareness) may provide another moral basis for AI rights.[311] Robot rights are also sometimes proposed as a practical way to integrate autonomous agents into society.[313]
In 2017, the European Union considered granting "electronic personhood" to some of the most capable AI systems. Similarly to the legal status of companies, it would have conferred rights but also responsibilities.[314] Critics argued in 2018 that granting rights to AI systems would downplay the importance of human rights, and that legislation should focus on user needs rather than speculative futuristic scenarios. They also noted that robots lacked the autonomy to take part in society on their own.[315][316]
Progress in AI increased interest in the topic. Proponents of AI welfare and rights often argue that AI sentience, if it emerges, would be particularly easy to deny. They warn that this may be a moral blind spot analogous to slavery or factory farming, which could lead to large-scale suffering if sentient AI is created and carelessly exploited.[312][311]
Future
Superintelligence and the singularity
A superintelligence is a hypothetical agent that would possess intelligence far surpassing that of the brightest and most gifted human mind.[301]
If research into artificial general intelligence produced sufficiently intelligent software, it might be able to reprogram and improve itself. The improved software would be even better at improving itself, leading to what I. J. Good called an "intelligence explosion" and Vernor Vinge called a "singularity".[317]
However, technologies cannot improve exponentially indefinitely, and typically follow an S-shaped curve, slowing when they reach the physical limits of what the technology can do.[318]
Transhumanism
Robot designer Hans Moravec, cyberneticist Kevin Warwick, and inventor Ray Kurzweil have predicted that humans and machines will merge in the future into cyborgs that are more capable and powerful than either. This idea, called transhumanism, has roots in Aldous Huxley and Robert Ettinger.[319]
Edward Fredkin argues that "artificial intelligence is the next stage in evolution", an idea first proposed by Samuel Butler's "Darwin among the Machines" as far back as 1863, and expanded upon by George Dyson in his book of the same name in 1998.[320]
In fiction
Main article: Artificial intelligence in fiction
The word "robot" itself was coined by Karel Čapek in his 1921 play R.U.R., the title standing for "Rossum's Universal Robots".
Thought-capable artificial beings have appeared as storytelling devices since antiquity,[321] and have been a persistent theme in science fiction.[322]
A common trope in these works began with Mary Shelley's Frankenstein, where a human creation becomes a threat to its masters. This includes such works as Arthur C. Clarke's and Stanley Kubrick's 2001: A Space Odyssey (both 1968), with HAL 9000, the murderous computer in charge of the Discovery One spaceship, as well as The Terminator (1984) and The Matrix (1999). In contrast, the rare loyal robots such as Gort from The Day the Earth Stood Still (1951) and Bishop from Aliens (1986) are less prominent in popular culture.[323]
Isaac Asimov introduced the Three Laws of Robotics in many books and stories, most notably the "Multivac" series about a super-intelligent computer of the same name. Asimov's laws are often brought up during lay discussions of machine ethics;[324] while almost all artificial intelligence researchers are familiar with Asimov's laws through popular culture, they generally consider the laws useless for many reasons, one of which is their ambiguity.[325]
Several works use AI to force us to confront the fundamental question of what makes us human, showing us artificial beings that have the ability to feel, and thus to suffer. This appears in Karel Čapek's R.U.R., the films A.I. Artificial Intelligence and Ex Machina, as well as the novel Do Androids Dream of Electric Sheep?, by Philip K. Dick. Dick considers the idea that our understanding of human subjectivity is altered by technology created with artificial intelligence.
@ -1,377 +0,0 @@
The United States of America (USA or U.S.A.), commonly known as the United States (US or U.S.) or America, is a country primarily located in North America, between Canada and Mexico. It is a federation of 50 states, a federal capital district (Washington, D.C.), and 326 Indian reservations. Outside the union of states, it asserts sovereignty over five major unincorporated island territories and various uninhabited islands.[j] The country has the world's third-largest land area,[d] largest maritime exclusive economic zone, and the third-largest population, exceeding 334 million.[k]
Paleo-Indians migrated across the Bering land bridge more than 12,000 years ago. British colonization led to the first settlement of the Thirteen Colonies in Virginia in 1607. Clashes with the British Crown over taxation and political representation sparked the American Revolution, with the Second Continental Congress formally declaring independence on July 4, 1776. Following its victory in the Revolutionary War (1775–1783), the country continued to expand across North America. As more states were admitted, sectional division over slavery led to the secession of the Confederate States of America, which fought the remaining states of the Union during the 1861–1865 American Civil War. With the Union's victory and preservation, slavery was abolished nationally. By 1900, the United States had become the world's largest economy and established itself as a great power. After Japan's attack on Pearl Harbor in December 1941, the U.S. entered World War II. The aftermath of the war left the U.S. and the Soviet Union as the world's two superpowers and led to the Cold War, during which both countries engaged in a struggle for ideological dominance and international influence. Following the Soviet Union's collapse and the end of the Cold War in 1991, the U.S. emerged as the world's sole superpower.
The U.S. national government is a presidential constitutional republic and liberal democracy with three separate branches: legislative, executive, and judicial. It has a bicameral national legislature composed of the House of Representatives, a lower house based on population; and the Senate, an upper house based on equal representation for each state. Substantial autonomy is given to states and several territories, with a political culture that emphasizes liberty, equality under the law, individualism, and limited government.
One of the most developed countries, the United States has had the largest nominal GDP in the world since 1890 and accounted for over 25% of the global economy (15% based on PPP) in 2023. It possesses by far the largest amount of wealth of any country and the highest median income per capita of any non-microstate. The U.S. ranks among the world's highest in economic competitiveness, productivity, innovation, human rights, and higher education. Its hard power and cultural influence have a global reach. The U.S. is a founding member of the World Bank, IMF, Organization of American States, NATO, Quad, World Health Organization, and a permanent member of the UN Security Council.
Etymology
Further information: Names of the United States and Demonyms for the United States
The first documentary evidence of the phrase "United States of America" dates back to a letter from January 2, 1776, written by Stephen Moylan, a Continental Army aide to General George Washington, to Joseph Reed, Washington's aide-de-camp. Moylan expressed his desire to go "with full and ample powers from the United States of America to Spain" to seek assistance in the Revolutionary War effort.[20][21] The first known publication of the phrase "United States of America" was in an anonymous essay in The Virginia Gazette newspaper in Williamsburg, on April 6, 1776.[22]
By June 1776, the name "United States of America" appeared in drafts of the Articles of Confederation and Perpetual Union, authored by John Dickinson, a Founding Father from the Province of Pennsylvania,[23][24] and in the Declaration of Independence, written primarily by Thomas Jefferson and adopted by the Second Continental Congress in Philadelphia, on July 4, 1776.[23][25]
History
Main article: History of the United States
For a topical guide, see Outline of United States history.
Indigenous peoples
Further information: Native Americans in the United States and Pre-Columbian era
Cliff Palace, built by Ancestral Puebloans in present-day Montezuma County, Colorado, between c. 1200 and 1275[26]
The first inhabitants of North America migrated from Siberia across the Bering land bridge at least 12,000 years ago;[27][28] the Clovis culture, which appeared around 11,000 BC, is believed to be the first widespread culture in the Americas.[29][30] Over time, indigenous North American cultures grew increasingly sophisticated, and some, such as the Mississippian culture, developed agriculture, architecture, and complex societies.[31] Indigenous peoples and cultures such as the Algonquian peoples,[32] Ancestral Puebloans,[33] and the Iroquois developed across the present-day United States.[34] Native population estimates of what is now the United States before the arrival of European immigrants range from around 500,000[35][36] to nearly 10 million.[36][37]
European colonization
Main article: Colonial history of the United States
See also: European colonization of the Americas
The 1750 colonial possessions of Britain (in pink and purple), France (in blue), and Spain (in orange) in present-day Canada and the United States
Christopher Columbus began exploring the Caribbean in 1492, leading to Spanish settlements in present-day Puerto Rico, Florida, and New Mexico.[38][39][40] France established its own settlements along the Mississippi River and Gulf of Mexico.[41] British colonization of the East Coast began with the Virginia Colony (1607) and Plymouth Colony (1620).[42][43] The Mayflower Compact and the Fundamental Orders of Connecticut established precedents for representative self-governance and constitutionalism that would develop throughout the American colonies.[44][45] While European settlers in what is now the United States experienced conflicts with Native Americans, they also engaged in trade, exchanging European tools for food and animal pelts.[46][l] Relations ranged from close cooperation to warfare and massacres. The colonial authorities often pursued policies that forced Native Americans to adopt European lifestyles, including conversion to Christianity.[50][51] Along the eastern seaboard, settlers trafficked African slaves through the Atlantic slave trade.[52]
The original Thirteen Colonies[m] that would later found the United States were administered by Great Britain,[53] and had local governments with elections open to most white male property owners.[54][55] The colonial population grew rapidly, eclipsing Native American populations;[56] by the 1770s, the natural increase of the population was such that only a small minority of Americans had been born overseas.[57] The colonies' distance from Britain allowed for the development of self-governance,[58] and the First Great Awakening—a series of Christian revivals—fueled colonial interest in religious liberty.[59]
Revolution and expansion (1776–1861)
Further information: History of the United States (1776–1789), History of the United States (1789–1815), and History of the United States (1815–1849)
Declaration of Independence, a portrait by John Trumbull depicting the Committee of Five presenting the draft of the Declaration to the Continental Congress on June 28, 1776, in Philadelphia
After winning the French and Indian War, Britain began to assert greater control over local colonial affairs, creating colonial political resistance; one of the primary colonial grievances was a denial of their rights as Englishmen, particularly the right to representation in the British government that taxed them. In 1774, the First Continental Congress met in Philadelphia and passed a colonial boycott of British goods that proved effective. The British attempt to then disarm the colonists resulted in the 1775 Battles of Lexington and Concord, igniting the American Revolutionary War. At the Second Continental Congress, the colonies appointed George Washington commander-in-chief of the Continental Army and created a committee led by Thomas Jefferson to write the Declaration of Independence, adopted on July 4, 1776.[60] The political values of the American Revolution included liberty, inalienable individual rights, and the sovereignty of the people;[61] support for republicanism and rejection of monarchy, aristocracy, and hereditary political power; virtue and faithfulness in the performance of civic duties; and vilification of corruption.[62] The Founding Fathers of the United States, who included George Washington, Benjamin Franklin, Alexander Hamilton, Thomas Jefferson, John Jay, James Madison, Thomas Paine, and John Adams, took inspiration from Ancient Greco-Roman, Renaissance, and Age of Enlightenment philosophies and ideas.[63][64]
After the British surrender at the siege of Yorktown in 1781, American sovereignty was internationally recognized by the Treaty of Paris (1783), through which the U.S. gained territory stretching west to the Mississippi River, north to present-day Canada, and south to Spanish Florida.[65] Ratified in 1781, the Articles of Confederation established a decentralized government that operated until 1789.[60] The Northwest Ordinance (1787) established the precedent by which the country's territory would expand with the admission of new states, rather than the expansion of existing states.[66] The U.S. Constitution was drafted at the 1787 Constitutional Convention to overcome the limitations of the Articles; it went into effect in 1789, creating a federation administered by three branches on the principle of checks and balances.[67] Washington was elected the country's first president under the Constitution, and the Bill of Rights was adopted in 1791 to allay concerns by skeptics of the more centralized government;[68][69] his resignations first as commander-in-chief after the Revolution and later as president set a precedent followed by John Adams, establishing the peaceful transfer of power between rival parties.[70][71]
Animation showing the free/slave status of U.S. states and territories expansion, 1789–1861
In the late 18th century, American settlers began to expand westward, some with a sense of manifest destiny.[72] The Louisiana Purchase (1803) from France nearly doubled the territory of the United States.[73] Lingering issues with Britain remained, leading to the War of 1812, which was fought to a draw.[74] Spain ceded Florida and its Gulf Coast territory in 1819.[75] The Missouri Compromise attempted to balance the desire of northern states to prevent the expansion of slavery in the country with that of southern states to expand it, admitting Missouri as a slave state and Maine as a free state, and declaring a policy of prohibiting slavery in the remaining Louisiana Purchase lands north of the 36°30′ parallel.[76] As Americans expanded further into land inhabited by Native Americans, the federal government often applied policies of Indian removal or assimilation.[77][78] The infamous Trail of Tears (1830–1850) was a U.S. government policy that forcibly removed and displaced most Native Americans living east of the Mississippi River to lands far to the west. These and earlier organized displacements prompted a long series of American Indian Wars west of the Mississippi.[79][80] The Republic of Texas was annexed in 1845,[81] and the 1846 Oregon Treaty led to U.S. control of the present-day American Northwest.[82] Victory in the Mexican–American War resulted in the 1848 Mexican Cession of California and much of the present-day American Southwest.[72][83] The California Gold Rush of 1848–1849 spurred a huge migration of white settlers to the Pacific coast, leading to even more confrontations with Native populations. One of the most violent, the California genocide of thousands of Native inhabitants, lasted into the early 1870s,[84][85] just as additional western territories and states were created.[86]
Civil War (1861–1865)
Main articles: History of the United States (1849–1865) and American Civil War
Division of the states during the American Civil War: Union states, border states, Confederate states, and territories
During the colonial period, slavery was legal in the American colonies, though the practice began to be significantly questioned during the American Revolution.[87] States in the North enacted abolition laws,[88] though support for slavery strengthened in Southern states, as inventions such as the cotton gin made the institution increasingly profitable for Southern elites.[89][90][91] This sectional conflict regarding slavery culminated in the American Civil War (1861–1865).[92][93]
Eleven slave states seceded and formed the Confederate States of America, while the other states remained in the Union.[94] War broke out in April 1861 after the Confederacy bombarded Fort Sumter.[95] After the January 1863 Emancipation Proclamation, many freed slaves joined the Union Army.[96] The war began to turn in the Union's favor following the 1863 Siege of Vicksburg and Battle of Gettysburg, and the Confederacy surrendered in 1865 after the Union's victory in the Battle of Appomattox Court House.[97]
The Reconstruction era followed the war. After the assassination of President Abraham Lincoln, Reconstruction Amendments were passed to protect the rights of African Americans. National infrastructure, including transcontinental telegraph and railroads, spurred growth in the American frontier.[98]
Post-Civil War era (1865–1898)
Main article: History of the United States (1865–1917)
An Edison Studios film showing immigrants arriving at Ellis Island in New York Harbor, a major point of entry for European immigrants in the late 19th and early 20th centuries[99][100]
From 1865 through 1917 an unprecedented stream of immigrants arrived in the United States, including 24.4 million from Europe.[101] Most came through the port of New York City, and New York City and other large cities on the East Coast became home to large Jewish, Irish, and Italian populations, while many Germans and Central Europeans moved to the Midwest. At the same time, about one million French Canadians migrated from Quebec to New England.[102] During the Great Migration, millions of African Americans left the rural South for urban areas in the North.[103] Alaska was purchased from Russia in 1867.[104]
The Compromise of 1877 effectively ended Reconstruction and white supremacists took local control of Southern politics.[105][106] African Americans endured a period of heightened, overt racism following Reconstruction, a time often called the nadir of American race relations.[107][108] A series of Supreme Court decisions, including Plessy v. Ferguson, emptied the Fourteenth and Fifteenth Amendments of their force, allowing Jim Crow laws in the South to remain unchecked, sundown towns in the Midwest, and segregation in cities across the country, which would be reinforced by the policy of redlining later adopted by the federal Home Owners' Loan Corporation.[109]
An explosion of technological advancement accompanied by the exploitation of cheap immigrant labor[110] led to rapid economic development during the late 19th and early 20th centuries, allowing the United States to outpace England, France, and Germany combined.[111][112] This fostered the amassing of power by a few prominent industrialists, largely by their formation of trusts and monopolies to prevent competition.[113] Tycoons led the nation's expansion in the railroad, petroleum, and steel industries. The United States emerged as a pioneer of the automotive industry.[114] These changes were accompanied by significant increases in economic inequality, slum conditions, and social unrest, creating the environment for labor unions to begin to flourish.[115][116][117] This period eventually ended with the advent of the Progressive Era, which was characterized by significant reforms.[118][119]
Rise as a superpower (1898–1945)
Main article: History of the United States (1917–1945)
The Trinity nuclear test in 1945, part of the Manhattan Project and the first detonation of a nuclear weapon. The World Wars permanently ended the country's policy of isolationism and left it as a world superpower.
Pro-American elements in Hawaii overthrew the Hawaiian monarchy; the islands were annexed in 1898. Puerto Rico, Guam, and the Philippines were ceded by Spain following the Spanish–American War.[120] American Samoa was acquired by the United States in 1900 after the Second Samoan Civil War.[121] The U.S. Virgin Islands were purchased from Denmark in 1917.[122] The United States entered World War I alongside the Allies of World War I, helping to turn the tide against the Central Powers.[123] In 1920, a constitutional amendment granted nationwide women's suffrage.[124] During the 1920s and 30s, the advent of radio for mass communication and the invention of early television transformed communications nationwide.[125] The Wall Street Crash of 1929 triggered the Great Depression, which President Franklin D. Roosevelt responded to with New Deal social and economic policies.[126][127]
At first neutral during World War II, the U.S. began supplying war materiel to the Allies of World War II in March 1941 and entered the war in December after the Empire of Japan's attack on Pearl Harbor.[128][129] The U.S. developed the first nuclear weapons and used them against the Japanese cities of Hiroshima and Nagasaki in August 1945, ending the war.[130][131] The United States was one of the "Four Policemen" who met to plan the postwar world, alongside the United Kingdom, Soviet Union, and China.[132][133] The U.S. emerged relatively unscathed from the war, with even greater economic and international political influence.[134]
Cold War (1945–1991)
Main articles: History of the United States (1945–1964), History of the United States (1964–1980), and History of the United States (1980–1991)
Mikhail Gorbachev and Ronald Reagan sign the Intermediate-Range Nuclear Forces Treaty at the White House, 1987.
After World War II, the United States entered the Cold War, during which geopolitical tensions between the U.S. and the Soviet Union led the two countries to dominate world affairs.[135] The U.S. engaged in regime change against governments perceived to be aligned with the Soviet Union, and competed in the Space Race, culminating in the first crewed Moon landing in 1969.[136][137][138][139]
Domestically, the U.S. experienced economic growth, urbanization, and population growth following World War II.[140] The civil rights movement emerged, with Martin Luther King Jr. becoming a prominent leader in the early 1960s.[141] The Great Society plan of President Lyndon Johnson's administration resulted in groundbreaking and broad-reaching laws, policies and a constitutional amendment to counteract some of the worst effects of lingering institutional racism.[142] The counterculture movement in the U.S. brought significant social changes, including the liberalization of attitudes toward recreational drug use and sexuality. It also encouraged open defiance of the military draft (leading to the end of conscription in 1973) and wide opposition to U.S. intervention in Vietnam (with the U.S. totally withdrawing in 1975).[143][144][145] The societal shift in the roles of women partly resulted in large increases in female labor participation in the 1970s, and by 1985 the majority of women aged 16 and older were employed.[146] The late 1980s and early 1990s saw the collapse of the Warsaw Pact and the dissolution of the Soviet Union, which marked the end of the Cold War and solidified the U.S. as the world's sole superpower.[147][148][149][150]
Contemporary (1991–present)
Main articles: History of the United States (1991–2008) and History of the United States (2008–present)
The Twin Towers in New York City during the September 11 attacks of 2001
The 1990s saw the longest recorded economic expansion in American history, a dramatic decline in crime, and advances in technology, with the World Wide Web, the evolution of the Pentium microprocessor in accordance with Moore's law, rechargeable lithium-ion batteries, the first gene therapy trial, and cloning all emerging and being improved upon throughout the decade. The Human Genome Project was formally launched in 1990, while Nasdaq became the first stock market in the United States to trade online in 1998.[151] In 1991, an American-led international coalition of states expelled an Iraqi invasion force from Kuwait in the Gulf War.[152]
The September 11, 2001 attacks by the pan-Islamist militant organization Al-Qaeda led to the war on terror and subsequent military interventions in Afghanistan and Iraq.[153][154] The cultural impact of the attacks was profound and long-lasting.
The U.S. housing bubble culminated in 2007 with the Great Recession, the largest economic contraction since the Great Depression.[155] Coming to a head in the 2010s, political polarization increased as sociopolitical debates on cultural issues dominated politics.[156] This polarization culminated in the January 2021 Capitol attack,[157] when a mob of protesters entered the U.S. Capitol building and attempted to prevent the peaceful transfer of power.[158]
Geography
Main articles: Geography of the United States and Borders of the United States
A topographic map of the United States
The United States is the world's third-largest country by land and total area behind Russia and Canada.[d][159][160] The 48 contiguous states and the District of Columbia occupy a combined area of 3,119,885 square miles (8,080,470 km2).[161][162] The coastal plain of the Atlantic seaboard gives way to inland forests and rolling hills in the Piedmont plateau region.[163]
The Appalachian Mountains and the Adirondack massif separate the East Coast from the Great Lakes and the grasslands of the Midwest.[164] The Mississippi River System—the world's fourth longest river system—runs mainly north–south through the heart of the country. The flat, fertile prairie of the Great Plains stretches to the west, interrupted by a highland region in the southeast.[164]
The Rocky Mountains, west of the Great Plains, extend north to south across the country, peaking at over 14,000 feet (4,300 m) in Colorado.[165] Farther west are the rocky Great Basin and Chihuahua, Sonoran, and Mojave deserts.[166] The Sierra Nevada and Cascade mountain ranges run close to the Pacific coast. The lowest and highest points in the contiguous United States are in the state of California,[167] about 84 miles (135 km) apart.[168] At an elevation of 20,310 feet (6,190.5 m), Alaska's Denali is the highest peak in the country and continent.[169] Active volcanoes are common throughout Alaska's Alexander and Aleutian Islands, and Hawaii consists of volcanic islands. The supervolcano underlying Yellowstone National Park in the Rockies is the continent's largest volcanic feature.[170] In 2021, the United States had 8% of global permanent meadows and pastures and 10% of cropland.[171]
Climate
Main articles: Climate of the United States and Climate change in the United States
The Köppen climate types of the United States
With its large size and geographic variety, the United States includes most climate types. East of the 100th meridian, the climate ranges from humid continental in the north to humid subtropical in the south.[172] The western Great Plains are semi-arid. Many mountainous areas of the American West have an alpine climate. The climate is arid in the Southwest, Mediterranean in coastal California, and oceanic in coastal Oregon, Washington, and southern Alaska. Most of Alaska is subarctic or polar. Hawaii and the southern tip of Florida are tropical, as are the U.S. territories in the Caribbean and the Pacific.[173]
States bordering the Gulf of Mexico are prone to hurricanes, and most of the world's tornadoes occur in the country, mainly in Tornado Alley.[174] Overall, the United States receives more high-impact extreme weather incidents than any other country.[175] Extreme weather became more frequent in the U.S. in the 21st century, with three times the number of reported heat waves as in the 1960s. In the American Southwest, droughts became more persistent and more severe.[176]
Biodiversity and conservation
Main articles: Fauna of the United States and Flora of the United States
The bald eagle, the national bird of the United States since 1782[177]
The U.S. is one of 17 megadiverse countries containing large numbers of endemic species: about 17,000 species of vascular plants occur in the contiguous United States and Alaska, and over 1,800 species of flowering plants are found in Hawaii, few of which occur on the mainland.[178] The United States is home to 428 mammal species, 784 birds, 311 reptiles, 295 amphibians,[179] and 91,000 insect species.[180]
There are 63 national parks, and hundreds of other federally managed parks, forests, and wilderness areas, managed by the National Park Service and other agencies.[181] About 28% of the country's land is publicly owned and federally managed,[182] primarily in the western states.[183] Most of this land is protected, though some is leased for commercial use, and less than one percent is used for military purposes.[184][185]
Environmental issues in the United States include debates on non-renewable resources and nuclear energy, air and water pollution, biodiversity, logging and deforestation,[186][187] and climate change.[188][189] The U.S. Environmental Protection Agency (EPA) is the federal agency charged with addressing most environmental-related issues.[190] The idea of wilderness has shaped the management of public lands since 1964, with the Wilderness Act.[191] The Endangered Species Act of 1973 provides a way to protect threatened and endangered species and their habitats. The United States Fish and Wildlife Service implements and enforces the Act.[192] As of 2022, the U.S. ranked 43rd among 180 countries in the Environmental Performance Index.[193] The country joined the Paris Agreement on climate change in 2016 and has many other environmental commitments.[194]
Government and politics
Main articles: Constitution of the United States and Politics of the United States
Further information: Elections in the United States, Political ideologies in the United States, Americanism (ideology), and American civil religion
The Capitol and its two legislative chambers, the Senate (left) and the House of Representatives (right)
The White House, the residence and workplace of the U.S. president and the offices of the presidential staff
The Supreme Court Building, which houses the nation's highest court
The United States is a federal republic of 50 states, with its capital in a federal district, asserting sovereignty over five unincorporated territories and several uninhabited island possessions (some of which are disputed).[195][196] It is the world's oldest surviving federation, and, according to the World Economic Forum, the oldest democracy as well.[197] It is a liberal representative democracy "in which majority rule is tempered by minority rights protected by law."[198] The Constitution of the United States serves as the country's supreme legal document, also establishing the structure and responsibilities of the national federal government and its relationship with the individual states.[199]
National government
Main article: Federal government of the United States
Composed of three branches, all headquartered in Washington, D.C., the federal government is the national government of the United States. It is regulated by a strong system of checks and balances.[200]
The U.S. Congress, a bicameral legislature made up of the Senate and the House of Representatives, makes federal law, declares war, approves treaties, has the power of the purse,[201] and has the power of impeachment.[202] The Senate has 100 members (2 from each state), elected for a six-year term. The House of Representatives has 435 members from single-member congressional districts allocated to each state on the basis of population, elected for a two-year term.[203]
The U.S. president is the commander-in-chief of the military, can veto legislative bills before they become law (subject to congressional override), and appoints the members of the Cabinet (subject to Senate approval) and other officials, who administer and enforce federal laws and policies through their respective agencies.[204] The president and the vice president run and are elected together in a presidential election. Unlike any other election in American politics, it is an indirect election, with the winner being determined by votes cast by electors of the Electoral College. The president and vice president serve a four-year term and may be elected to the office no more than twice.[205]
The U.S. federal judiciary, whose judges are all appointed for life by the president with Senate approval, consists primarily of the U.S. Supreme Court, the U.S. courts of appeals, and the U.S. district courts. The U.S. Supreme Court interprets laws and overturns those it finds unconstitutional.[206] The Supreme Court is led by the Chief Justice of the United States. It has nine members who serve for life. The members are appointed by the sitting president when a vacancy becomes available.[207]
The three-branch system is known as the presidential system, in contrast to the parliamentary system, where the executive is part of the legislative body. Many countries around the world copied this aspect of the 1789 Constitution of the United States, especially in the Americas.[208]
Political parties
Main articles: Political parties in the United States, Political party strength in U.S. states, and List of political parties in the United States
U.S. state governments (governor and legislature) by party control: Democratic control, Republican control, or split control
The Constitution is silent on political parties. However, they developed independently in the 18th century with the Federalist and Anti-Federalist parties.[209] Since then, the United States has operated as a de facto two-party system, though the parties in that system have been different at different times.
The two main national parties are presently the Democratic Party and the Republican Party. The former is perceived as relatively liberal in its political platform while the latter is perceived as relatively conservative.[210] Each has a primary system to nominate a presidential ticket, and each runs candidates for other offices in every state in the Union. Other smaller and less influential parties exist but do not have the national scope and breadth of the two main parties.
Subdivisions
Main articles: State governments of the United States, Local government in the United States, and U.S. state
Further information: List of states and territories of the United States, Indian reservation, Territories of the United States, and Territorial evolution of the United States
In the American federal system, sovereign powers are shared between two levels of elected government: national and state. People in the states are also represented by local elected governments, which are administrative divisions of the states.[211] States are subdivided into counties or county equivalents, and further divided into municipalities. The District of Columbia is a federal district that contains the capital of the United States, the city of Washington.[212] The territories and the District of Columbia are administrative divisions of the federal government.[213]
Foreign relations
Main articles: Foreign relations of the United States and Foreign policy of the United States
The United Nations headquarters has been situated along the East River in Midtown Manhattan since 1952; in 1945, the United States was a founding member of the UN.
The United States has an established structure of foreign relations, and it has the world's second-largest diplomatic corps as of 2024. It is a permanent member of the United Nations Security Council,[214] and home to the United Nations headquarters.[215] The United States is a member of the G7,[216] G20,[217] and OECD intergovernmental organizations.[218] Almost all countries have embassies and many have consulates (official representatives) in the country. Likewise, nearly all countries host formal diplomatic missions with the United States, except Iran,[219] North Korea,[220] and Bhutan.[221] Though Taiwan does not have formal diplomatic relations with the U.S., it maintains close unofficial relations.[222] The United States regularly supplies Taiwan with military equipment to deter potential Chinese aggression.[223] Its geopolitical attention also turned to the Indo-Pacific when the United States joined the Quadrilateral Security Dialogue with Australia, India, and Japan.[224]
The United States has a "Special Relationship" with the United Kingdom[225] and strong ties with Canada,[226] Australia,[227] New Zealand,[228] the Philippines,[229] Japan,[230] South Korea,[231] Israel,[232] and several European Union countries (France, Italy, Germany, Spain, and Poland).[233] The U.S. works closely with its NATO allies on military and national security issues, and with countries in the Americas through the Organization of American States and the United States–Mexico–Canada Free Trade Agreement. In South America, Colombia is traditionally considered to be the closest ally of the United States.[234] The U.S. exercises full international defense authority and responsibility for Micronesia, the Marshall Islands, and Palau through the Compact of Free Association.[235] It has increasingly conducted strategic cooperation with India,[236] but its ties with China have steadily deteriorated.[237][238] Since 2014, the U.S. has become a key ally of Ukraine;[239] it has also provided the country with significant military equipment and other support in response to Russia's 2022 invasion.[240]
Military
Main articles: United States Armed Forces and Military history of the United States
The Pentagon, the headquarters of the U.S. Department of Defense in Arlington County, Virginia, is one of the world's largest office buildings with about 6.5 million square feet (600,000 m2) of floor space.
The President is the commander-in-chief of the United States Armed Forces and appoints its leaders, the secretary of defense and the Joint Chiefs of Staff. The Department of Defense, which is headquartered at the Pentagon near Washington, D.C., administers five of the six service branches, which are made up of the Army, Marine Corps, Navy, Air Force, and Space Force. The Coast Guard is administered by the Department of Homeland Security in peacetime and can be transferred to the Department of the Navy in wartime.[241]
The United States spent $877 billion on its military in 2022, which is by far the largest amount of any country, making up 39% of global military spending and accounting for 3.5% of the country's GDP.[242][243] The U.S. has 45% of the world's nuclear weapons, the second-largest amount after Russia.[244]
The United States has the third-largest combined armed forces in the world, behind the Chinese People's Liberation Army and Indian Armed Forces.[245] The military operates about 800 bases and facilities abroad,[246] and maintains deployments greater than 100 active duty personnel in 25 foreign countries.[247]
Law enforcement and crime
Main articles: Law of the United States, Law enforcement in the United States, Crime in the United States, and Censorship in the United States
J. Edgar Hoover Building, the headquarters of the Federal Bureau of Investigation (FBI), in Washington, D.C.
There are about 18,000 police agencies in the United States, ranging from the local to the national level.[248] Law in the United States is mainly enforced by local police departments and sheriff departments in their municipal or county jurisdictions. The state police departments have authority in their respective state, and federal agencies such as the Federal Bureau of Investigation (FBI) and the U.S. Marshals Service have national jurisdiction and specialized duties, such as protecting civil rights, national security and enforcing U.S. federal courts' rulings and federal laws.[249] State courts conduct most civil and criminal trials,[250] and federal courts handle designated crimes and appeals of state court decisions.[251]
As of January 2023, the United States has the sixth highest per-capita incarceration rate in the world, at 531 people per 100,000; and the largest prison and jail population in the world with almost 2 million people incarcerated.[252][253][254] An analysis of the World Health Organization Mortality Database from 2010 showed U.S. homicide rates "were 7 times higher than in other high-income countries, driven by a gun homicide rate that was 25 times higher."[255]
Economy
Main article: Economy of the United States
Further information: Economic history of the United States and Tourism in the United States
The U.S. dollar, the most-used currency in international transactions and the world's foremost reserve currency[256]
Microsoft campus, in Redmond, Washington, is the headquarters of Microsoft, the world's biggest company by market capitalization.[257]
The U.S. has been the world's largest economy nominally since about 1890.[258] The 2023 nominal U.S. gross domestic product (GDP) of $27 trillion was the largest in the world, constituting over 25% of the global economy or 15% at purchasing power parity (PPP).[259][13] From 1983 to 2008, U.S. real compounded annual GDP growth was 3.3%, compared to a 2.3% weighted average for the rest of the Group of Seven.[260] The country ranks first in the world by disposable income per capita and nominal GDP;[261] second by GDP (PPP), after China;[13] and ninth by GDP (PPP) per capita.[13]
Of the world's 500 largest companies, 136 are headquartered in the U.S.[262] The U.S. dollar is the currency most used in international transactions and is the world's foremost reserve currency, backed by the country's dominant economy, its military, the petrodollar system, and its linked eurodollar and large U.S. treasuries market.[256] Several countries use it as their official currency and in others it is the de facto currency.[263][264] It has free trade agreements with several countries, including the USMCA.[265] The U.S. ranked second in the Global Competitiveness Report in 2019, after Singapore.[266] While its economy has reached a post-industrial level of development, the United States remains an industrial power.[267] As of 2021, the U.S. is the second-largest manufacturing country after China.[268]
The New York Stock Exchange on Wall Street, the world's largest stock exchange by market capitalization[269]
New York City is the world's principal financial center[270][271] and the epicenter of the world's largest metropolitan economy.[272] The New York Stock Exchange and Nasdaq, both located in New York City, are the world's two largest stock exchanges by market capitalization and trade volume.[273][274] The United States is at or near the forefront of technological advancement and innovation[275] in many economic fields, especially in artificial intelligence; computers; pharmaceuticals; and medical, aerospace and military equipment.[276] The country's economy is fueled by abundant natural resources, a well-developed infrastructure, and high productivity.[277] The largest U.S. trading partners are the European Union, Mexico, Canada, China, Japan, South Korea, the United Kingdom, Vietnam, India, and Taiwan.[278] The United States is the world's largest importer and the second-largest exporter after China.[279] It is by far the world's largest exporter of services.[280]
Americans have the highest average household and employee income among OECD member states,[281] and the fourth-highest median household income,[282] up from sixth-highest in 2013.[283] Wealth in the United States is highly concentrated; the richest 10% of the adult population own 72% of the country's household wealth, while the bottom 50% own just 2%.[284] Income inequality in the U.S. remains at record highs,[285] with the top fifth of earners taking home more than half of all income[286] and giving the U.S. one of the widest income distributions among OECD members.[287][288] The U.S. ranks first in the number of dollar billionaires and millionaires, with 735 billionaires and nearly 22 million millionaires (as of 2023).[289] There were about 582,500 sheltered and unsheltered homeless persons in the U.S. in 2022, with 60% staying in an emergency shelter or transitional housing program.[290] In 2018, six million children experienced food insecurity.[291] Feeding America estimates that around one in seven, or approximately 11 million, children experience hunger and do not know where they will get their next meal or when.[292] As of 2021, 38 million people, about 12% of the U.S. population, were living in poverty.[293]
The United States has a smaller welfare state and redistributes less income through government action than most other high-income countries.[294][295] It is the only advanced economy that does not guarantee its workers paid vacation nationally[296] and is one of a few countries in the world without federal paid family leave as a legal right.[297] The United States has a higher percentage of low-income workers than almost any other developed country, largely because of a weak collective bargaining system and lack of government support for at-risk workers.[298]
Science, technology, and energy
Main articles: Science and technology in the United States, Science policy of the United States, Communications in the United States, and Energy in the United States
U.S. astronaut Buzz Aldrin saluting the American flag on the Moon during the 1969 Apollo 11 mission; the United States is the only country that has landed crews on the lunar surface.
The United States has been a leader in technological innovation since the late 19th century and scientific research since the mid-20th century. Methods for producing interchangeable parts and the establishment of a machine tool industry enabled the large-scale manufacturing of U.S. consumer products in the late 19th century. By the early 20th century, factory electrification, the introduction of the assembly line, and other labor-saving techniques created the system of mass production.[299] The United States is a leader in the development of artificial intelligence technology and has maintained a space program since the late 1950s, with plans for long-term habitation of the Moon.[300][301]
In 2022, the United States was the country with the second-highest number of published scientific papers.[302] As of 2021, the U.S. ranked second by the number of patent applications, and third by trademark and industrial design applications.[303] In 2023, the United States ranked third in the Global Innovation Index.[304]
As of 2022, the United States receives approximately 81% of its energy from fossil fuels; the largest source of the country's energy is petroleum (35.8%), followed by natural gas (33.4%), renewable sources (13.3%), coal (9.8%), and nuclear power (8%).[305][306] The United States constitutes less than 5% of the world's population, but consumes 17% of the world's energy.[307][308] The U.S. ranks as the second-highest emitter of greenhouse gases.[309]
Transportation
Main article: Transportation in the United States
Hartsfield–Jackson Atlanta International Airport, serving the Atlanta metropolitan area, is the world's busiest airport by passenger traffic, with over 93 million passengers in 2022.[310]
Personal transportation in the United States is dominated by automobiles,[311][312] which operate on a network of 4 million miles (6.4 million kilometers) of public roads, the longest network in the world.[313][314] The Oldsmobile Curved Dash and the Ford Model T, both American cars, are considered the first mass-produced[315] and mass-affordable[316] cars, respectively. As of 2022, the United States is the second-largest manufacturer of motor vehicles[317] and is home to Tesla, the world's most valuable car company.[318] American automotive company General Motors held the title of the world's best-selling automaker from 1931 to 2008.[319] Currently, the American automotive market is the world's second-largest by sales,[320] and the U.S. has the highest vehicle ownership per capita in the world,[321] with 910 vehicles per 1000 people.[322] The United States's rail transport network, the longest in the world,[323] handles mostly freight.[324][325]
The American civil airline industry is entirely privately owned and has been largely deregulated since 1978, while most major airports are publicly owned.[326] The three largest airlines in the world by passengers carried are U.S.-based; American Airlines is number one after its 2013 acquisition by US Airways.[327] Of the world's 50 busiest passenger airports, 16 are in the United States, including the top five and the busiest, Hartsfield–Jackson Atlanta International Airport.[328][329] As of 2022, there are 19,969 airports in the U.S., of which 5,193 are designated as "public use", including for general aviation and other activities.[330]
Of the fifty busiest container ports, four are located in the United States, of which the busiest is the Port of Los Angeles.[331] The country's inland waterways are the world's fifth-longest, and total 41,009 km (25,482 mi).[332]
Demographics
Main article: Demographics of the United States
Population
Main articles: Americans and Race and ethnicity in the United States
See also: List of U.S. states by population
As of 2020, the majority of the U.S. population lived in suburbs. Above: Nassau County, New York on Long Island, immediately east of New York City.
The U.S. Census Bureau reported 331,449,281 residents as of April 1, 2020,[n][333] making the United States the third-most-populous country in the world, after China and India.[334] According to the Bureau's U.S. Population Clock, on January 28, 2021, the U.S. population had a net gain of one person every 100 seconds, or about 864 people per day.[335] In 2018, 52% of Americans age 15 and over were married, 6% were widowed, 10% were divorced, and 32% had never been married.[336] In 2021, the total fertility rate for the U.S. stood at 1.7 children per woman,[337] and it had the world's highest rate of children (23%) living in single-parent households in 2019.[338]
The United States has a diverse population; 37 ancestry groups have more than one million members.[339] White Americans, with ancestry from Europe, the Middle East, or North Africa, form the largest racial and ethnic group at 57.8% of the United States population.[340][341] Hispanic and Latino Americans form the second-largest group and are 18.7% of the United States population. African Americans constitute the country's third-largest ancestry group and are 12.1% of the total U.S. population.[339] Asian Americans are the country's fourth-largest group, composing 5.9% of the United States population, while the country's 3.7 million Native Americans account for about 1%.[339] In 2020, the median age of the United States population was 38.5 years.[334]
Language
Main article: Languages of the United States
Most spoken languages in the U.S.
While many languages are spoken in the United States, English is by far the most commonly spoken and written.[342] Although there is no official language at the federal level, some laws, such as U.S. naturalization requirements, standardize English, and most states have declared it the official language.[343] Three states and four U.S. territories have recognized local or indigenous languages in addition to English, including Hawaii (Hawaiian),[344] Alaska (twenty Native languages),[o][345] South Dakota (Sioux),[346] American Samoa (Samoan), Puerto Rico (Spanish), Guam (Chamorro), and the Northern Mariana Islands (Carolinian and Chamorro). In Puerto Rico, Spanish is more widely spoken than English.[347]
According to the American Community Survey in 2010, some 229 million people out of the total U.S. population of 308 million spoke only English at home. About 37 million spoke Spanish at home, making it the second most commonly used language. Other languages spoken at home by one million people or more include Chinese (2.8 million), Tagalog (1.6 million), Vietnamese (1.4 million), French (1.3 million), Korean (1.1 million), and German (1 million).[348]
Immigration
Main articles: Immigration to the United States and United States Border Patrol
Mexico–United States border wall between San Diego (left) and Tijuana (right)
America's immigrant population, 51 million, is by far the world's largest in absolute terms.[349][350] In 2022, there were 87.7 million immigrants and U.S.-born children of immigrants in the United States, accounting for nearly 27% of the overall U.S. population.[351] In 2017, out of the U.S. foreign-born population, some 45% (20.7 million) were naturalized citizens, 27% (12.3 million) were lawful permanent residents, 6% (2.2 million) were temporary lawful residents, and 23% (10.5 million) were unauthorized immigrants.[352] In 2019, the top countries of origin for immigrants were Mexico (24% of immigrants), India (6%), China (5%), the Philippines (4.5%), and El Salvador (3%).[353] The United States has led the world in refugee resettlement for decades, admitting more refugees than the rest of the world combined.[354]
Religion
Main articles: Religion in the United States and Irreligion in the United States
See also: List of religious movements that began in the United States
Religious affiliation in the U.S., according to a 2022 Gallup poll:[7]
Protestantism (34%)
Catholicism (23%)
Non-specific Christian (11%)
Mormonism (2%)
Judaism (2%)
Other religions (6%)
Unaffiliated (21%)
Unanswered (1%)
The First Amendment guarantees the free exercise of religion and forbids Congress from passing laws respecting its establishment.[355][356] Religious practice is widespread, among the most diverse in the world,[357] and profoundly vibrant.[358] The country has the world's largest Christian population.[359] A majority of the global Jewish population lives in the United States, as measured by the Law of Return.[360] Other notable faiths include Buddhism, Hinduism, Islam, many New Age movements, and Native American religions.[361] Religious practice varies significantly by region.[362] "Ceremonial deism" is common in American culture.[363]
The overwhelming majority of Americans believe in a higher power or spiritual force, engage in spiritual practices such as prayer, and consider themselves religious or spiritual.[364][365] In the "Bible Belt", located within the Southern United States, evangelical Protestantism plays a significant role culturally, whereas New England and the Western United States tend to be more secular.[362] Mormonism—a Restorationist movement, whose members migrated westward from Missouri and Illinois under the leadership of Brigham Young in 1847 after the assassination of Joseph Smith[366]—remains the predominant religion in Utah to this day.[367]
Urbanization
Main articles: Urbanization in the United States and List of United States cities by population
About 82% of Americans live in urban areas, including suburbs;[159] about half of those reside in cities with populations over 50,000.[368] In 2022, 333 incorporated municipalities had populations over 100,000, nine cities had more than one million residents, and four cities (New York City, Los Angeles, Chicago, and Houston) had populations exceeding two million.[369] Many U.S. metropolitan populations are growing rapidly, particularly in the South and West.[370]
Health
See also: Healthcare in the United States, Healthcare reform in the United States, and Health insurance in the United States
Texas Medical Center in Houston is the largest medical complex in the world.[372][373] As of 2018, it employed 120,000 people and treated 10 million patients annually.[374]
According to the Centers for Disease Control and Prevention (CDC), average American life expectancy at birth was 77.5 years in 2022 (74.8 years for men and 80.2 years for women). This was a gain of 1.1 years from 76.4 years in 2021, but the CDC noted that the new average "didn't fully offset the loss of 2.4 years between 2019 and 2021". The COVID-19 pandemic and higher overall mortality due to opioid overdoses and suicides were held mostly responsible for the previous drop in life expectancy.[375] The same report stated that the 2022 gains in average U.S. life expectancy were especially significant for men, Hispanics, and American Indian–Alaskan Native (AIAN) people. Starting in 1998, the life expectancy in the U.S. fell behind that of other wealthy industrialized countries, and Americans' "health disadvantage" gap has been increasing ever since.[376] The U.S. has one of the highest suicide rates among high-income countries.[377] Approximately one-third of the U.S. adult population is obese and another third is overweight.[378] The U.S. healthcare system far outspends that of any other country, measured both in per capita spending and as a percentage of GDP, but attains worse healthcare outcomes when compared to peer countries for reasons that are debated.[379] The United States is the only developed country without a system of universal healthcare, and a significant proportion of the population does not carry health insurance.[380] Government-funded healthcare coverage for the poor (Medicaid) and for those age 65 and older (Medicare) is available to Americans who meet the programs' income or age qualifications. In 2010, President Obama signed the Patient Protection and Affordable Care Act into law.[p][381]
Education
Main articles: Education in the United States and Higher education in the United States
The University of Virginia, founded by Thomas Jefferson in 1819, is one of many public colleges and universities in the United States.
American primary and secondary education (known in the U.S. as K-12, "kindergarten through 12th grade") is decentralized. It is operated by state, territorial, and sometimes municipal governments and regulated by the U.S. Department of Education. In general, children are required to attend school or an approved homeschool from the age of five or six (kindergarten or first grade) until they are 18 years old. This often brings students through the 12th grade, the final year of a U.S. high school, but some states and territories allow them to leave school earlier, at age 16 or 17.[382] The U.S. spends more on education per student than any country in the world,[383] an average of $12,794 per year per public elementary and secondary school student in 2016–2017.[384] Among Americans age 25 and older, 84.6% graduated from high school, 52.6% attended some college, 27.2% earned a bachelor's degree, and 9.6% earned a graduate degree.[385] The U.S. literacy rate is near-universal.[159][386] The country has the most Nobel Prize winners in history, with 411 (having won 413 awards).[387][388]
U.S. tertiary or higher education has earned a global reputation. Many of the world's top universities, as listed by various ranking organizations, are in the United States, including 19 of the top 25.[389][390] American higher education is dominated by state university systems, although the country's many private universities and colleges enroll about 20% of all American students. Large amounts of federal financial aid are provided to students in the form of grants and loans.
Colleges and universities directly funded by the federal government are limited to military personnel and government employees and include the U.S. service academies, the Naval Postgraduate School, and military staff colleges. Local community colleges generally offer coursework and degree programs covering the first two years of college study. They often have more open admission policies, shorter academic programs, and lower tuition.[391]
As for public expenditures on higher education, the U.S. spends more per student than the OECD average, and more than all nations in combined public and private spending.[392] Despite some student loan forgiveness programs in place,[393] student loan debt has increased by 102% in the last decade,[394] and exceeded 1.7 trillion dollars as of 2022.[395]
Culture and society
Main articles: Culture of the United States and Society of the United States
The Statue of Liberty (Liberty Enlightening the World) on Liberty Island in New York Harbor was an 1886 gift from France that has become an iconic symbol of the American Dream.[396]
Americans have traditionally been characterized by a unifying political belief in an "American creed" emphasizing liberty, equality under the law, democracy, social equality, property rights, and a preference for limited government.[397][398] Culturally, the country has been described as having the values of individualism and personal autonomy,[399][400] having a strong work ethic,[401] competitiveness,[402] and voluntary altruism towards others.[403][404][405] According to a 2016 study by the Charities Aid Foundation, Americans donated 1.44% of total GDP to charity, the highest rate in the world by a large margin.[406] The United States is home to a wide variety of ethnic groups, traditions, and values. It has acquired significant cultural and economic soft power.[407][408]
Nearly all present Americans or their ancestors came from Europe, Africa, and Asia ("the Old World") within the past five centuries.[409] Mainstream American culture is a Western culture largely derived from the traditions of European immigrants with influences from many other sources, such as traditions brought by slaves from Africa.[410] More recent immigration from Asia and especially Latin America has added to a cultural mix that has been described as a homogenizing melting pot, and a heterogeneous salad bowl, with immigrants contributing to, and often assimilating into, mainstream American culture. The American Dream, or the perception that Americans enjoy high social mobility, plays a key role in attracting immigrants.[411] Whether this perception is accurate has been a topic of debate.[412][413][414] While mainstream culture holds that the United States is a classless society,[415] scholars identify significant differences between the country's social classes, affecting socialization, language, and values.[416] Americans tend to greatly value socioeconomic achievement, but being ordinary or average is promoted by some as a noble condition as well.[417]
The United States is considered to have the strongest protections of free speech of any country under the First Amendment,[418] which protects flag desecration, hate speech, blasphemy, and lese-majesty as forms of protected expression.[419][420][421] A 2016 Pew Research Center poll found that Americans were the most supportive of free expression of any polity measured.[422] They are the "most supportive of freedom of the press and the right to use the Internet without government censorship."[423] It is a socially progressive country[424] with permissive attitudes surrounding human sexuality.[425] LGBT rights in the United States are advanced by global standards.[425][426][427]
Literature
Main articles: American literature and American philosophy
See also: List of American novelists
Mark Twain, whom William Faulkner called "the father of American literature"[428]
Colonial American authors were influenced by John Locke and various other Enlightenment philosophers.[429][430] Before and shortly after the Revolutionary War, the newspaper rose to prominence, filling a demand for anti-British national literature.[431][432] Led by Ralph Waldo Emerson and Margaret Fuller in New England,[433] transcendentalism branched from Unitarianism as the first major American philosophical movement.[434][435] During the nineteenth-century American Renaissance, writers like Walt Whitman and Harriet Beecher Stowe established a distinctive American literary tradition.[436][437] As literacy rates rose, periodicals published more stories centered around industrial workers, women, and the rural poor.[438][439] Naturalism, regionalism, and realism—the latter associated with Mark Twain—were the major literary movements of the period.[440][441]
While modernism generally took on an international character, modernist authors working within the United States more often rooted their work in specific regions, peoples, and cultures.[442] Following the Great Migration to northern cities, African-American and black West Indian authors of the Harlem Renaissance developed an independent tradition of literature that rebuked a history of inequality and celebrated black culture. An important cultural export during the Jazz Age, these writings were a key influence on the négritude philosophy.[443][444] In the 1950s, an ideal of homogeneity led many authors to attempt to write the Great American Novel,[445] while the Beat Generation rejected this conformity, using styles that elevated the impact of the spoken word over mechanics to describe drug use, sexuality, and the failings of society.[446][447] Contemporary literature is more pluralistic than in previous eras, with the closest thing to a unifying feature being a trend toward self-conscious experiments with language.[448]
Mass media
Further information: Mass media in the United States
See also: Newspapers in the United States, Television in the United States, Internet in the United States, Radio in the United States, and Video games in the United States
Comcast Center in Philadelphia, headquarters of Comcast, the world's largest telecommunications and media conglomerate
Media is broadly uncensored, with the First Amendment providing significant protections, as reiterated in New York Times Co. v. United States.[418] The four major broadcasters in the U.S. are the National Broadcasting Company (NBC), Columbia Broadcasting System (CBS), American Broadcasting Company (ABC), and Fox Broadcasting Company (FOX). The four major broadcast television networks are all commercial entities. Cable television offers hundreds of channels catering to a variety of niches.[449] As of 2021, about 83% of Americans over age 12 listen to broadcast radio, while about 40% listen to podcasts.[450] As of 2020, there were 15,460 licensed full-power radio stations in the U.S. according to the Federal Communications Commission (FCC).[451] Much of the public radio broadcasting is supplied by NPR, incorporated in February 1970 under the Public Broadcasting Act of 1967.[452]
U.S. newspapers with a global reach and reputation include The Wall Street Journal, The New York Times, The Washington Post, and USA Today.[453] About 800 publications are produced in Spanish.[454][455] With few exceptions, newspapers are privately owned, either by large chains such as Gannett or McClatchy, which own dozens or even hundreds of newspapers; by small chains that own a handful of papers; or, in a situation that is increasingly rare, by individuals or families. Major cities often have alternative newspapers to complement the mainstream daily papers, such as The Village Voice in New York City and LA Weekly in Los Angeles. The five most popular websites used in the U.S. are Google, YouTube, Amazon, Yahoo, and Facebook, with all of them being American companies.[456]
As of 2022, the video game market of the United States is the world's largest by revenue.[457] There are 444 publishers, developers, and hardware companies in California alone.[458]
Theater
Main article: Theater in the United States
Broadway theatres in Theater District, Manhattan
The United States is well known for its cinema and theater. Mainstream theater in the United States derives from the old European theatrical tradition and has been heavily influenced by the British theater.[459] By the middle of the 19th century, America had created new, distinct dramatic forms in the Tom Shows, the showboat theater, and the minstrel show.[460] The central hub of the American theater scene is Manhattan, with its divisions of Broadway, off-Broadway, and off-off-Broadway.[461]
Many movie and television stars have gotten their big break working in New York productions. Outside New York City, many cities have professional regional or resident theater companies that produce their own seasons. The biggest-budget theatrical productions are musicals. U.S. theater has an active community theater culture.[462]
The Tony Awards recognize excellence in live Broadway theatre and are presented at an annual ceremony in Manhattan. The awards are given for Broadway productions and performances. One is also given for regional theatre. Several discretionary non-competitive awards are given as well, including a Special Tony Award, the Tony Honors for Excellence in Theatre, and the Isabelle Stevenson Award.[463]
Visual arts
Main articles: Visual art of the United States and Architecture of the United States
American Gothic (1930) by Grant Wood is one of the most famous American paintings and is widely parodied.[464]
In the visual arts, the Hudson River School was a mid-19th-century movement in the tradition of European naturalism. The 1913 Armory Show in New York City, an exhibition of European modernist art, shocked the public and transformed the U.S. art scene.[465]
Georgia O'Keeffe, Marsden Hartley, and others experimented with new and individualistic styles, which would become known as American modernism. Major artistic movements such as the abstract expressionism of Jackson Pollock and Willem de Kooning and the pop art of Andy Warhol and Roy Lichtenstein developed largely in the United States. Major photographers include Alfred Stieglitz, Edward Steichen, Dorothea Lange, Edward Weston, James Van Der Zee, Ansel Adams, and Gordon Parks.[466]
The tide of modernism and then postmodernism has brought global fame to American architects, including Frank Lloyd Wright, Philip Johnson, and Frank Gehry.[467] The Metropolitan Museum of Art in Manhattan is the largest art museum in the United States.[468]
Music
Main article: Music of the United States
American folk music encompasses numerous music genres, variously known as traditional music, traditional folk music, contemporary folk music, or roots music. Many traditional songs have been sung within the same family or folk group for generations, and sometimes trace back to such origins as the British Isles, Mainland Europe, or Africa.[469] The rhythmic and lyrical styles of African-American music in particular have influenced American music.[470] Banjos were brought to America through the slave trade. Minstrel shows incorporating the instrument into their acts led to its increased popularity and widespread production in the 19th century.[471][472] The electric guitar, first invented in the 1930s, and mass-produced by the 1940s, had an enormous influence on popular music, in particular due to the development of rock and roll.[473]
The Country Music Hall of Fame and Museum in Nashville, Tennessee
Elements from folk idioms such as the blues and old-time music were adopted and transformed into popular genres with global audiences. Jazz grew from blues and ragtime in the early 20th century, developing from the innovations and recordings of composers such as W.C. Handy and Jelly Roll Morton. Louis Armstrong and Duke Ellington increased its popularity early in the 20th century.[474] Country music developed in the 1920s,[475] rock and roll in the 1930s,[473] and bluegrass[476] and rhythm and blues in the 1940s.[477] In the 1960s, Bob Dylan emerged from the folk revival to become one of the country's most celebrated songwriters.[478] The musical forms of punk and hip hop both originated in the United States in the 1970s.[479]
The United States has the world's largest music market with a total retail value of $15.9 billion in 2022.[480] Most of the world's major record companies are based in the U.S.; they are represented by the Recording Industry Association of America (RIAA).[481] Mid-20th-century American pop stars, such as Frank Sinatra[482] and Elvis Presley,[483] became global celebrities and best-selling music artists,[474] as have artists of the late 20th century, such as Michael Jackson,[484] Madonna,[485] Whitney Houston,[486] and Prince,[487] and of the early 21st century, such as Taylor Swift and Beyoncé.[488]
Fashion
Main article: Fashion in the United States
Haute couture fashion models on the catwalk during New York Fashion Week
The United States and China collectively account for the majority of global apparel demand. Apart from professional business attire, American fashion is eclectic and predominantly informal. While Americans' diverse cultural roots are reflected in their clothing, sneakers, jeans, T-shirts, and baseball caps are emblematic of American styles.[489] New York is considered to be one of the "big four" global fashion capitals, along with Paris, Milan, and London. A study demonstrated that general proximity to Manhattan's Garment District has been synonymous with American fashion since its inception in the early 20th century.[490]
The headquarters of many designer labels reside in Manhattan. Labels cater to niche markets, such as preteens. There has been a trend in United States fashion towards sustainable clothing.[491] New York Fashion Week is one of the most influential fashion weeks in the world, and occurs twice a year.[492]
Cinema
Main article: Cinema of the United States
The iconic Hollywood Sign, in the Hollywood Hills, often regarded as the symbol of the American film industry
The U.S. film industry has a worldwide influence and following. Hollywood, a district in northern Los Angeles, the nation's second-most populous city, is also metonymous for the American filmmaking industry, the third-largest in the world, following India and Nigeria.[493][494][495] The major film studios of the United States are the primary source of the most commercially successful and most ticket-selling movies in the world.[496][497] Since the early 20th century, the U.S. film industry has largely been based in and around Hollywood, although in the 21st century an increasing number of films are not made there, and film companies have been subject to the forces of globalization.[498] The Academy Awards, popularly known as the Oscars, have been held annually by the Academy of Motion Picture Arts and Sciences since 1929,[499] and the Golden Globe Awards have been held annually since January 1944.[500]
The industry enjoyed its golden years, in what is commonly referred to as the "Golden Age of Hollywood", from the early sound period until the early 1960s,[501] with screen actors such as John Wayne and Marilyn Monroe becoming iconic figures.[502][503] In the 1970s, "New Hollywood" or the "Hollywood Renaissance"[504] was defined by grittier films influenced by French and Italian realist pictures of the post-war period.[505] The 21st century was marked by the rise of American streaming platforms, which came to rival traditional cinema.[506][507]
Cuisine
Main article: American cuisine
Further information: List of American regional and fusion cuisines
A Thanksgiving dinner with roast turkey, mashed potatoes, pickles, corn, candied yams, cranberry jelly, shrimp, stuffing, green peas, deviled eggs, green salad and apple sauce
Early settlers were introduced by Native Americans to foods such as turkey, sweet potatoes, corn, squash, and maple syrup. Among the most enduring and pervasive examples are variations of the native dish called succotash. Early settlers and later immigrants combined these with foods they were familiar with, such as wheat flour,[508] beef, and milk to create a distinctive American cuisine.[509][510] New World crops, especially pumpkin, corn, potatoes, and turkey as the main course are part of a shared national menu on Thanksgiving, when many Americans prepare or purchase traditional dishes to celebrate the occasion.[511]
Characteristic American dishes such as apple pie, fried chicken, doughnuts, french fries, macaroni and cheese, ice cream, pizza, hamburgers, and hot dogs derive from the recipes of various immigrant groups.[512][513][514][515] Mexican dishes such as burritos and tacos preexisted the United States in areas later annexed from Mexico, and adaptations of Chinese cuisine as well as pasta dishes freely adapted from Italian sources are all widely consumed.[516] American chefs have had a significant impact on society both domestically and internationally. In 1946, the Culinary Institute of America was founded by Katharine Angell and Frances Roth. This would become the United States' most prestigious culinary school, where many of the most talented American chefs would study prior to successful careers.[517][518]
The United States restaurant industry was projected at $899 billion in sales for 2020,[519][520] and employed more than 15 million people, representing 10% of the nation's workforce directly.[519] It is the country's second-largest private employer and the third-largest employer overall.[521][522] The United States is home to over 220 Michelin Star-rated restaurants, 70 of which are in New York City alone.[523] Wine has been produced in what is now the United States since the 1500s, with the first widespread production beginning in what is now New Mexico in 1628.[524][525][526] Today, wine production is undertaken in all fifty states, with California producing 84 percent of all US wine. With more than 1,100,000 acres (4,500 km2) under vine, the United States is the fourth-largest wine-producing country in the world, after Italy, Spain, and France.[527][528]
The American fast-food industry, the world's first and largest, pioneered the drive-through format in the 1940s[529] and is often viewed as being a symbol of U.S. marketing dominance. American companies such as McDonald's,[530] Burger King, Pizza Hut, Kentucky Fried Chicken, and Domino's Pizza, among many others, have numerous outlets around the world.[531]