{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Summarizing with Controllable Detail"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The objective of this notebook is to demonstrate how to summarize large documents with a controllable level of detail. If you give a GPT model the task of summarizing a long document (e.g. 10k or more tokens), you'll tend to get back a relatively short summary that isn't proportional to the length of the document. For instance, a summary of a 20k token document will not be twice as long as a summary of a 10k token document. One way to fix this is to split the document into pieces and summarize each piece with its own query to the model; the partial summaries are then concatenated into the full summary. By controlling the number of text chunks and their sizes, we can ultimately control the level of detail in the output."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import openai\n",
    "import tiktoken\n",
    "from typing import List, Tuple, Optional\n",
    "from tqdm import tqdm"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# open dataset containing part of the text of the Wikipedia page for the United States\n",
    "with open(\"data/united_states_wikipedia.txt\", \"r\") as file:\n",
    "    united_states_wikipedia_text = file.read()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": "15781"
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# load encoding and check the length of dataset\n",
    "encoding = tiktoken.encoding_for_model('gpt-3.5-turbo')\n",
    "len(encoding.encode(united_states_wikipedia_text))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll define a simple utility to wrap calls to the OpenAI API."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_chat_completion(messages, model='gpt-3.5-turbo'):\n",
    "    # Note: openai.ChatCompletion is the pre-1.0 interface of the openai package.\n",
    "    response = openai.ChatCompletion.create(\n",
    "        model=model,\n",
    "        messages=messages,\n",
    "        temperature=0,  # keep the summaries as repeatable as possible\n",
    "    )\n",
    "    return response.choices[0].message['content']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next we'll define some utilities to chunk a large document into smaller pieces."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "def tokenize(text: str) -> List[str]:\n",
    "    encoding = tiktoken.encoding_for_model('gpt-3.5-turbo')\n",
    "    return encoding.encode(text)\n",
    "\n",
    "\n",
    "def chunk_on_delimiter(input_string, max_tokens, delimiter):\n",
    "    chunks = input_string.split(delimiter)\n",
    "    combined_chunks, _, dropped_chunk_count = combine_chunks_with_no_minimum(\n",
    "        chunks, max_tokens, chunk_delimiter=delimiter, add_ellipsis_for_overflow=True\n",
    "    )\n",
    "    if dropped_chunk_count > 0:\n",
    "        print(f\"warning: {dropped_chunk_count} chunks were dropped due to overflow\")\n",
    "    combined_chunks = [f\"{chunk}{delimiter}\" for chunk in combined_chunks]\n",
    "    return combined_chunks\n",
    "\n",
    "\n",
    "def combine_chunks_with_no_minimum(\n",
    "    chunks: List[str],\n",
    "    max_tokens: int,\n",
    "    chunk_delimiter=\"\\n\\n\",\n",
    "    header: Optional[str] = None,\n",
    "    add_ellipsis_for_overflow=False,\n",
    ") -> Tuple[List[str], List[int]]:\n",
    "    dropped_chunk_count = 0\n",
    "    output = []  # list to hold the final combined chunks\n",
    "    output_indices = []  # list to hold the indices of the final combined chunks\n",
    "    candidate = (\n",
    "        [] if header is None else [header]\n",
    "    )  # list to hold the current combined chunk candidate\n",
    "    candidate_indices = []\n",
    "    for chunk_i, chunk in enumerate(chunks):\n",
    "        chunk_with_header = [chunk] if header is None else [header, chunk]\n",
    "        if len(tokenize(chunk_delimiter.join(chunk_with_header))) > max_tokens:\n",
    "            print(f\"warning: chunk overflow\")\n",
    "            if (\n",
    "                add_ellipsis_for_overflow\n",
    "                and len(tokenize(chunk_delimiter.join(candidate + [\"...\"]))) <= max_tokens\n",
    "            ):\n",
    "                candidate.append(\"...\")\n",
    "                dropped_chunk_count += 1\n",
    "            continue  # this case would break downstream assumptions\n",
    "        # estimate token count with the current chunk added\n",
    "        extended_candidate_token_count = len(tokenize(chunk_delimiter.join(candidate + [chunk])))\n",
    "        # If the token count exceeds max_tokens, add the current candidate to output and start a new candidate\n",
    "        if extended_candidate_token_count > max_tokens:\n",
    "            output.append(chunk_delimiter.join(candidate))\n",
    "            output_indices.append(candidate_indices)\n",
    "            candidate = chunk_with_header  # re-initialize candidate\n",
    "            candidate_indices = [chunk_i]\n",
    "        # otherwise keep extending the candidate\n",
    "        else:\n",
    "            candidate.append(chunk)\n",
    "            candidate_indices.append(chunk_i)\n",
    "    # add the remaining candidate to output if it's not empty\n",
    "    if (header is not None and len(candidate) > 1) or (header is None and len(candidate) > 0):\n",
    "        output.append(chunk_delimiter.join(candidate))\n",
    "        output_indices.append(candidate_indices)\n",
    "    return output, output_indices, dropped_chunk_count"
   ]
  },
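  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can define a utility to summarize text with a controllable level of detail, set through the `detail` parameter: 0 summarizes the whole document as a single chunk, and 1 uses the largest number of chunks that `minimum_chunk_size` allows."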
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "def summarize(text: str,\n",
    "              detail: float = 0,\n",
    "              model: str = 'gpt-3.5-turbo',\n",
    "              additional_instructions: Optional[str] = None,\n",
    "              minimum_chunk_size: Optional[int] = 500,\n",
    "              chunk_delimiter: str = \".\",\n",
    "              summarize_recursively: bool = False,\n",
    "              verbose: bool = False):\n",
    "    \"\"\"\n",
    "    Summarizes a given text by splitting it into chunks, each of which is summarized individually. \n",
    "    The level of detail in the summary can be adjusted, and the process can optionally be made recursive.\n",
    "\n",
    "    Parameters:\n",
    "    - text (str): The text to be summarized.\n",
    "    - detail (float, optional): A value between 0 and 1 indicating the desired level of detail in the summary.\n",
    "      0 leads to a higher level summary, and 1 results in a more detailed summary. Defaults to 0.\n",
    "    - model (str, optional): The model to use for generating summaries. Defaults to 'gpt-3.5-turbo'.\n",
    "    - additional_instructions (Optional[str], optional): Additional instructions to provide to the model for customizing summaries.\n",
    "    - minimum_chunk_size (Optional[int], optional): The minimum size for text chunks. Defaults to 500.\n",
    "    - chunk_delimiter (str, optional): The delimiter used to split the text into chunks. Defaults to \".\".\n",
    "    - summarize_recursively (bool, optional): If True, summaries are generated recursively, using previous summaries for context.\n",
    "    - verbose (bool, optional): If True, prints detailed information about the chunking process.\n",
    "\n",
    "    Returns:\n",
    "    - str: The final compiled summary of the text.\n",
    "\n",
    "    The function first determines the number of chunks by interpolating between a minimum and a maximum chunk count based on the `detail` parameter. \n",
    "    It then splits the text into chunks and summarizes each chunk. If `summarize_recursively` is True, each summary is based on the previous summaries, \n",
    "    adding more context to the summarization process. The function returns a compiled summary of all chunks.\n",
    "    \"\"\"\n",
    "    \n",
    "    # check detail is set correctly\n",
    "    assert 0 <= detail <= 1\n",
    "\n",
    "    # interpolate the number of chunks to get the specified level of detail\n",
    "    max_chunks = len(chunk_on_delimiter(text, minimum_chunk_size, chunk_delimiter))\n",
    "    min_chunks = 1\n",
    "    num_chunks = int(min_chunks + detail * (max_chunks - min_chunks))\n",
    "\n",
    "    # adjust chunk_size based on interpolated number of chunks\n",
    "    document_length = len(tokenize(text))\n",
    "    chunk_size = max(minimum_chunk_size, document_length // num_chunks)\n",
    "    text_chunks = chunk_on_delimiter(text, chunk_size, chunk_delimiter)\n",
    "    if verbose:\n",
    "        print(f\"Splitting the text into {len(text_chunks)} chunks to be summarized.\")\n",
    "        print(f\"Chunk lengths are {[len(tokenize(x)) for x in text_chunks]}\")\n",
    "\n",
    "    # set system message\n",
    "    system_message_content = \"Summarize the following text.\"\n",
    "    if additional_instructions is not None:\n",
    "        system_message_content += f\"\\n\\n{additional_instructions}\"\n",
    "\n",
    "    accumulated_summaries = []\n",
    "    for chunk in tqdm(text_chunks):\n",
    "        if summarize_recursively and accumulated_summaries:\n",
    "            # Creating a structured prompt for recursive summarization\n",
    "            accumulated_summaries_string = '\\n\\n'.join(accumulated_summaries)\n",
    "            user_message_content = f\"Previous summaries:\\n\\n{accumulated_summaries_string}\\n\\nText to summarize next:\\n\\n{chunk}\"\n",
    "        else:\n",
    "            # Directly passing the chunk for summarization without recursive context\n",
    "            user_message_content = chunk\n",
    "\n",
    "        # Constructing messages based on whether recursive summarization is applied\n",
    "        messages = [\n",
    "            {\"role\": \"system\", \"content\": system_message_content},\n",
    "            {\"role\": \"user\", \"content\": user_message_content}\n",
    "        ]\n",
    "\n",
    "        # Request the summary of this chunk from the model\n",
    "        response = get_chat_completion(messages, model=model)\n",
    "        accumulated_summaries.append(response)\n",
    "\n",
    "    # Compile final summary from partial summaries\n",
    "    final_summary = '\\n\\n'.join(accumulated_summaries)\n",
    "\n",
    "    return final_summary"
   ]
  },
  {
   "cell_type": "markdown",
   "source": [
    "Now we can use this utility to produce summaries with varying levels of detail. By increasing 'detail' from 0 to 1 we get progressively longer summaries of the underlying document. A higher value for the detail parameter results in a more detailed summary because the utility first splits the document into a greater number of chunks. Each chunk is then summarized, and the final summary is a concatenation of all the chunk summaries."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Splitting the text into 1 chunks to be summarized.\n",
      "Chunk lengths are [15781]\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1/1 [00:05<00:00,  5.98s/it]\n"
     ]
    }
   ],
   "source": [
    "summary_with_detail_0 = summarize(united_states_wikipedia_text, detail=0, verbose=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Splitting the text into 5 chunks to be summarized.\n",
      "Chunk lengths are [3945, 3941, 3943, 3915, 37]\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 5/5 [00:15<00:00,  3.18s/it]\n"
     ]
    }
   ],
   "source": [
    "summary_with_detail_pt1 = summarize(united_states_wikipedia_text, detail=0.1, verbose=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Splitting the text into 8 chunks to be summarized.\n",
      "Chunk lengths are [2214, 2253, 2249, 2255, 2254, 2255, 2221, 84]\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 8/8 [00:19<00:00,  2.46s/it]\n"
     ]
    }
   ],
   "source": [
    "summary_with_detail_pt2 = summarize(united_states_wikipedia_text, detail=0.2, verbose=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Splitting the text into 14 chunks to be summarized.\n",
      "Chunk lengths are [1198, 1209, 1210, 1209, 1212, 1192, 1176, 1205, 1212, 1201, 1210, 1210, 1192, 154]\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 14/14 [00:37<00:00,  2.69s/it]\n"
     ]
    }
   ],
   "source": [
    "summary_with_detail_pt4 = summarize(united_states_wikipedia_text, detail=0.4, verbose=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Splitting the text into 27 chunks to be summarized.\n",
      "Chunk lengths are [602, 596, 601, 601, 604, 598, 572, 594, 592, 592, 604, 593, 578, 582, 597, 600, 596, 555, 582, 601, 582, 587, 581, 595, 598, 568, 445]\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 27/27 [01:20<00:00,  2.99s/it]\n"
     ]
    }
   ],
   "source": [
    "summary_with_detail_pt8 = summarize(united_states_wikipedia_text, detail=0.8, verbose=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Splitting the text into 33 chunks to be summarized.\n",
      "Chunk lengths are [490, 443, 475, 490, 501, 470, 472, 487, 479, 477, 447, 442, 490, 468, 488, 477, 493, 493, 472, 491, 490, 501, 493, 468, 500, 500, 474, 460, 489, 462, 490, 482, 445]\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 33/33 [01:25<00:00,  2.58s/it]\n"
     ]
    }
   ],
   "source": [
    "summary_with_detail_1 = summarize(united_states_wikipedia_text, detail=1.0, verbose=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The original document is about 15.8k tokens long. Notice how large the gap is between the lengths of `summary_with_detail_0` and `summary_with_detail_1`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": "[291, 681, 965, 1734, 3542, 4182]"
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# lengths of summaries\n",
    "[len(tokenize(x)) for x in [summary_with_detail_0, summary_with_detail_pt1, summary_with_detail_pt2, summary_with_detail_pt4, summary_with_detail_pt8, summary_with_detail_1]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's inspect the summaries to get a feel for what that means."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The United States of America is a diverse country located in North America, with a population exceeding 334 million. It is a federation of 50 states, a federal capital district, and various territories. The country has a rich history, from the migration of Paleo-Indians over 12,000 years ago to the American Revolution and the Civil War. The U.S. emerged as a superpower after World War II and played a significant role in the Cold War era.\n",
      "\n",
      "The U.S. government is a presidential constitutional republic with three separate branches: legislative, executive, and judicial. The country has a strong emphasis on liberty, equality under the law, individualism, and limited government. Economically, the U.S. has the largest nominal GDP in the world and is a leader in economic competitiveness, innovation, and human rights. The U.S. is also a founding member of various international organizations.\n",
      "\n",
      "The U.S. has a rich cultural landscape, with influences from various ethnic groups and traditions. American literature, music, cinema, theater, and visual arts have made significant contributions to global culture. The country is known for its diverse cuisine, with dishes influenced by various immigrant groups. The U.S. also has a strong tradition in education, with a focus on higher learning and research.\n",
      "\n",
      "Overall, the United States is a country with a rich history, diverse culture, and significant global influence in various fields such as politics, economics, and the arts.\n"
     ]
    }
   ],
   "source": [
    "print(summary_with_detail_0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The United States of America is a country located in North America, consisting of 50 states, a federal capital district, and various territories. It has a rich history, from the arrival of Paleo-Indians over 12,000 years ago to British colonization, the American Revolution, and the Civil War. The U.S. is a presidential constitutional republic with a strong emphasis on liberty, equality, and limited government. It is a global economic powerhouse, with the largest nominal GDP since 1890 and significant influence in international organizations. The country's history includes European colonization, conflicts with Native Americans, the Revolutionary War, and westward expansion. The U.S. Constitution, drafted in 1787, established a federal government with three branches and a system of checks and balances. The U.S. has played a significant role in world events, including World War II and the Cold War, emerging as a superpower after the collapse of the Soviet Union.\n",
      "\n",
      "The text discusses key events in American history from the Missouri Compromise to the rise of the United States as a superpower, the Cold War era, and contemporary history. It covers topics such as the Civil War, Reconstruction, post-Civil War era, immigration, industrialization, World Wars, Cold War, civil rights movement, and modern events like the September 11 attacks and the Great Recession. The text highlights significant developments in American society, politics, and economy over the years.\n",
      "\n",
      "The text discusses the geography, climate, biodiversity, conservation efforts, government and politics, political parties, subdivisions, and foreign relations of the United States. It highlights the country's physical features, climate diversity, environmental issues, governmental structure, political parties, state subdivisions, and diplomatic relations. The text also mentions the Capitol attack in January 2021 and the country's environmental commitments and challenges.\n",
      "\n",
      "The text discusses the United States' international relations, military capabilities, law enforcement, crime rates, economy, and scientific advancements. It highlights the country's membership in various international organizations, strong alliances with countries like the UK, Canada, and Japan, and its military spending and capabilities. The text also covers law enforcement agencies, incarceration rates, economic status as the world's largest economy, technological advancements, and scientific achievements, including the country's leadership in artificial intelligence and space exploration. Additionally, it addresses income inequality, poverty rates, and social welfare policies in the United States.\n",
      "\n",
      "The text provides information about various aspects of the United States, including its scientific and innovation rankings, energy consumption, transportation infrastructure, demographics, language diversity, immigration, religion, urbanization, and healthcare. The United States ranks high in scientific publications and patents, but also consumes a significant amount of fossil fuels and emits greenhouse gases. The country has a diverse population with various racial and ethnic groups, and English is the most commonly spoken language. Immigration plays a significant role in the U.S., with a large immigrant population. The country is religiously diverse, with a majority believing in a higher power. Urban areas are home to most Americans, with rapid population growth in many metropolitan areas. The U.S. has a complex healthcare system, with notable medical facilities like the Texas Medical Center in Houston.\n",
      "\n",
      "The text discusses various aspects of life in the United States, including changes in life expectancy, healthcare, education, culture, literature, and mass media. It mentions that life expectancy in the U.S. has fluctuated due to factors like the COVID-19 pandemic, opioid overdoses, and suicides. The U.S. healthcare system is criticized for its high spending and poor outcomes compared to other countries. The education system is decentralized, with high spending per student. The U.S. has a diverse culture and society, with strong protections for free speech and progressive attitudes towards human sexuality. American literature has a rich history, influenced by various movements and authors. The mass media landscape includes major broadcasters, cable television, radio, and newspapers with global reach.\n",
      "\n",
      "The text discusses various aspects of American culture, including alternative newspapers in major cities, popular websites, the video game market, theater, visual arts, music, fashion, cinema, and cuisine. It highlights the influence of American culture globally, such as in music, fashion, cinema, and cuisine. The text also mentions significant figures and events in American cultural history, such as the Tony Awards, Broadway theater, the Hudson River School, and the Golden Age of Hollywood. Additionally, it touches on the impact of American cuisine, the film industry, and the arts on both domestic and international audiences.\n",
      "\n",
      "The American fast-food industry, known for pioneering the drive-through format in the 1940s, is considered a symbol of U.S. marketing dominance. Major companies like McDonald's, Burger King, Pizza Hut, KFC, and Domino's Pizza have a global presence with numerous outlets worldwide.\n"
     ]
    }
   ],
   "source": [
    "print(summary_with_detail_pt2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that this utility also allows passing additional instructions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 5/5 [00:17<00:00,  3.54s/it]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "- The USA is a federation of 50 states, a federal capital district, and 326 Indian reservations.\n",
      "- It has sovereignty over five major unincorporated island territories and various uninhabited islands.\n",
      "- The population of the USA exceeds 334 million.\n",
      "- The USA has the world's third-largest land area and largest maritime exclusive economic zone.\n",
      "- The USA has had the largest nominal GDP in the world since 1890 and accounted for over 25% of the global economy in 2023.\n",
      "- The USA has the highest median income per capita of any non-microstate.\n",
      "- The USA ranks high in economic competitiveness, productivity, innovation, human rights, and higher education.\n",
      "- The USA is a founding member of various international organizations such as the World Bank, IMF, NATO, and the UN Security Council.\n",
      "\n",
      "- In the early 1960s, President Lyndon Johnson's Great Society plan led to groundbreaking laws and policies to counteract institutional racism.\n",
      "- By 1985, the majority of women aged 16 and older in the US were employed.\n",
      "- In the 1990s, the US saw the longest economic expansion in its history, with advances in technology such as the World Wide Web and gene therapy.\n",
      "- The US spent $877 billion on its military in 2022, the largest amount globally, accounting for 39% of global military spending and 3.5% of the country's GDP.\n",
      "- The US has the third-largest combined armed forces in the world, behind China and India.\n",
      "- As of January 2023, the US had the sixth highest per-capita incarceration rate globally, with almost 2 million people incarcerated.\n",
      "- The US had a nominal GDP of $27 trillion in 2023, constituting over 25% of the global economy.\n",
      "\n",
      "- Real compounded annual GDP growth in the US was 3.3%, compared to 2.3% for the rest of the Group of Seven.\n",
      "- The US ranks first in the world by disposable income per capita and nominal GDP, second by GDP (PPP) after China, and ninth by GDP (PPP) per capita.\n",
      "- The US has 136 of the world's 500 largest companies headquartered in the country.\n",
      "- The US dollar is the most used currency in international transactions and is the world's foremost reserve currency.\n",
      "- The US ranked second in the Global Competitiveness Report in 2019.\n",
      "- The US is the second-largest manufacturing country after China as of 2021.\n",
      "- The US has the highest average household and employee income among OECD member states.\n",
      "- The US has 735 billionaires and nearly 22 million millionaires as of 2023.\n",
      "- In 2022, there were about 582,500 sheltered and unsheltered homeless persons in the US.\n",
      "- The US receives approximately 81% of its energy from fossil fuels.\n",
      "- The US has the highest vehicle ownership per capita in the world, with 910 vehicles per 1000 people.\n",
      "- The US has 333 incorporated municipalities with populations over 100,000.\n",
      "- The average American life expectancy at birth was 77.5 years in 2022.\n",
      "- Approximately one-third of the US adult population is obese, and another third is overweight.\n",
      "- The US spends more on education per student than any other country in the world, averaging $12,794 per year per public elementary and secondary school student in 2016-2017.\n",
      "\n",
      "- The United States has the most Nobel Prize winners in history, with 411 awards won.\n",
      "- American higher education is dominated by state university systems, with private universities enrolling about 20% of students.\n",
      "- The U.S. spends more per student on higher education than the OECD average and all other nations in combined public and private spending.\n",
      "- Student loan debt in the U.S. has increased by 102% in the last decade, exceeding 1.7 trillion dollars as of 2022.\n",
      "- Americans donated 1.44% of total GDP to charity, the highest rate in the world.\n",
      "- The U.S. has the world's largest music market with a total retail value of $15.9 billion in 2022.\n",
      "- The U.S. restaurant industry was projected at $899 billion in sales for 2020, employing over 15 million people.\n",
      "- The United States is home to over 220 Michelin Star rated restaurants, with 70 in New York City alone.\n",
      "- California produces 84% of all US wine, making the U.S. the fourth-largest wine producing country in the world.\n",
      "\n",
      "- American companies with global presence include McDonald's, Burger King, Pizza Hut, Kentucky Fried Chicken, and Domino's Pizza\n",
      "- These companies have numerous outlets around the world\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "summary_with_additional_instructions = summarize(united_states_wikipedia_text, detail=0.1, additional_instructions=\"Write in point form and focus on numerical data.\")\n",
    "print(summary_with_additional_instructions)"
   ]
  },
  {
   "cell_type": "markdown",
   "source": [
    "Finally, note that the utility allows for recursive summarization, where each chunk is summarized with all of the previous summaries included in the prompt as context. This can be enabled by setting the `summarize_recursively` parameter to `True`. This is more computationally expensive, because each prompt grows with the accumulated summaries, but it can increase the consistency and coherence of the combined summary."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 5/5 [00:12<00:00,  2.40s/it]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The United States of America is a country located in North America with 50 states, a federal capital district, and various territories. It has a rich history, from early indigenous migrations to European colonization, the American Revolution, Civil War, and emergence as a global superpower. The U.S. government is a presidential constitutional republic with a strong emphasis on liberty, equality, and limited government. Economically, the U.S. is a powerhouse with a significant global influence. The country has been involved in major historical events such as World War II, the Cold War, and the civil rights movement.\n",
      "\n",
      "In the 1960s, the U.S. saw significant social changes with President Lyndon Johnson's Great Society plan addressing institutional racism, the counterculture movement influencing attitudes towards drug use and sexuality, and opposition to the Vietnam War. The 1980s and 1990s marked the end of the Cold War, solidifying the U.S. as a superpower. The 1990s saw economic growth, technological advancements, and the Gulf War. The 21st century brought challenges like the September 11 attacks, the Great Recession, and increased political polarization. The U.S. has diverse geography and climate, rich biodiversity, and faces environmental issues. The government is a federal republic with a strong system of checks and balances, and a two-party political system. The U.S. has a significant global presence in foreign relations, a powerful military, and law enforcement agencies. Economically, it is a leading global player with a diverse economy and the world's largest GDP.\n",
      "\n",
      "The text discusses the economic strength of the United States, highlighting its high GDP, disposable income per capita, and global economic influence. It mentions the dominance of the U.S. dollar in international transactions and its leading position in various economic sectors such as technology, finance, and manufacturing. The text also touches on income inequality, poverty rates, and social issues like homelessness and food insecurity in the country. Additionally, it covers the U.S.'s role in scientific innovation, energy consumption, transportation infrastructure, demographics, immigration, religion, urbanization, healthcare, and education.\n",
      "\n",
      "The text discusses various aspects of American culture and society, including education, literature, mass media, theater, visual arts, music, fashion, cinema, and cuisine. It highlights the country's achievements in education, with a global reputation for tertiary education and numerous top-ranking universities. The text also covers American values, ethnic diversity, free speech protections, and progressive attitudes towards human sexuality and LGBT rights. Additionally, it mentions the country's contributions to literature, mass media, theater, visual arts, music, and cuisine, showcasing its significant influence and global presence in these areas.\n",
      "\n",
      "American fast-food chains like McDonald's, Burger King, Pizza Hut, Kentucky Fried Chicken, and Domino's Pizza have a widespread global presence with numerous outlets worldwide.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "recursive_summary = summarize(united_states_wikipedia_text, detail=0.1, summarize_recursively=True, additional_instructions=\"Don't overuse repetitive phrases to introduce each section\")\n",
    "print(recursive_summary)"
   ],
   "metadata": {
    "collapsed": false
   }
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}