{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Translate a book written in LaTeX from Slovenian into English\n",
"\n",
"With permission of the author, we will demonstrate how to translate the book [Euclidean Plane Geometry](https://sites.google.com/site/projektivna/), written by Milan Mitrović from Slovenian into English, without modifying any of the LaTeX commands.\n",
"\n",
"To achieve this, we will first split the book into chunks, each roughly a page long, then translate each chunk into English, and finally stitch them back together."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Read in the data"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1485565"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from openai import OpenAI\n",
"import os\n",
"from transformers import GPT2Tokenizer\n",
"\n",
"client = OpenAI(api_key=os.environ.get(\"OPENAI_API_KEY\", \"<your OpenAI API key if you didn't set as an env var>\"))\n",
"\n",
"# OpenAI GPT-2 tokenizer is the same as GPT-3 tokenizer\n",
"# we use it to count the number of tokens in the text\n",
"tokenizer = GPT2Tokenizer.from_pretrained(\"gpt2\")\n",
"\n",
"with open(\"data/geometry_slovenian.tex\", \"r\") as f:\n",
" text = f.read()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.1 Count the tokens in each chunk"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Token indices sequence length is longer than the specified maximum sequence length for this model (1327 > 1024). Running this sequence through the model will result in indexing errors\n"
]
},
{
"data": {
"text/plain": [
"1473"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chunks = text.split('\\n\\n')\n",
"ntokens = []\n",
"for chunk in chunks:\n",
" ntokens.append(len(tokenizer.encode(chunk)))\n",
"max(ntokens)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It turns out that a double newline is a good separator in this case, in order not to break the flow of the text. Also no individual chunk is larger than 1500 tokens. The model we will use is text-davinci-002, which has a limit of 4096 tokens, so we don't need to worry about breaking the chunks down further.\n",
"\n",
"We will group the shorter chunks into chunks of around 1000 tokens, to increase the coherence of the text, and decrease the frequency of breaks within the text."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"869"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def group_chunks(chunks, ntokens, max_len=1000, hard_max_len=3000):\n",
" \"\"\"\n",
" Group very short chunks, to form approximately page long chunks.\n",
" \"\"\"\n",
" batches = []\n",
" cur_batch = \"\"\n",
" cur_tokens = 0\n",
" \n",
" # iterate over chunks, and group the short ones together\n",
" for chunk, ntoken in zip(chunks, ntokens):\n",
" # discard chunks that exceed hard max length\n",
" if ntoken > hard_max_len:\n",
" print(f\"Warning: Chunk discarded for being too long ({ntoken} tokens > {hard_max_len} token limit). Preview: '{chunk[:50]}...'\")\n",
" continue\n",
"\n",
" # if room in current batch, add new chunk\n",
" if cur_tokens + 1 + ntoken <= max_len:\n",
" cur_batch += \"\\n\\n\" + chunk\n",
" cur_tokens += 1 + ntoken # adds 1 token for the two newlines\n",
" # otherwise, record the batch and start a new one\n",
" else:\n",
" batches.append(cur_batch)\n",
" cur_batch = chunk\n",
" cur_tokens = ntoken\n",
" \n",
" if cur_batch: # add the last batch if it's not empty\n",
" batches.append(cur_batch)\n",
" \n",
" return batches\n",
"\n",
"\n",
"chunks = group_chunks(chunks, ntokens)\n",
"len(chunks)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice that adding a sample untranslated and translated first command, where only the content of the chapter name needs to be translated, helps to get more consistent results.\n",
"\n",
"The format of the prompt sent to the model consists of:\n",
"1. A high level instruction to translate only the text, but not commands into the desired language\n",
"2. A sample untranslated command, where only the content of the chapter name needs to be translated\n",
"3. The chunk of text to be translated\n",
"4. The translated sample command from 2, which shows the model the beginning of the translation process\n",
"\n",
"The expected output is the translated chunk of text."
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Let $\\mathcal{I}=\\mathcal{S}_{AB} \\circ\\mathcal{S}_{CA}\n",
" \\circ\\mathcal{S}_{BC}$. By \\ref{izoZrcdrsprq} is\n",
" $\\mathcal{I}$ a mirror reflection. Let $A_1$, $B_1$ and $C_1$ be in order the center points of the lines $BC$, $AC$ and $AB$ of the triangle $ABC$.\n",
" Because it is a right triangle is $\\mathcal{I}(A_1C_1)=A_1C_1$, which\n",
" means that the line $A_1C_1$ is of this mirror reflection. It is not\n",
" difficult to prove that for the point $A'_1=\\mathcal{I}(A_1)$ (both\n",
" lie on the axis $A_1C_1$) is\n",
" $\\overrightarrow{A_1A'_1}=3\\overrightarrow{A_1C_1}$, so\n",
" $\\mathcal{I}=\\mathcal{G}_{3\\overrightarrow{A_1C_1}}$.\n",
"\n",
"\\item \\res{Given are the points $A$ and $B$ on the same side of the line\n",
"$p$.\n",
"Draw the line $XY$, which lies on the line $p$ and is consistent\n",
"with the given line $l$, so that the sum\n",
"$|AX|+|XY|+|YB|$ is minimal.}\n",
"\n",
"Let $A'=\\mathcal{G}_{\\overrightarrow{MN}}(A)$ (where $M,N\\in\n",
"p$ and $MN\\cong l$). The point $Y$ is obtained as the intersection of the lines $p$\n",
"and $X'Y$ (see also example \\ref{HeronProbl}).\n",
"\n",
"\\item \\res{Let $ABC$ be an isosceles right triangle with a right angle at the vertex $A$. What does the composite\n",
"$\\mathcal{G}_{\\overrightarrow{AB}}\\circ \\mathcal{G}_{\\overrightarrow{CA}}$ represent?}\n",
"\n",
"Let $p$ and $q$ be the simetrali of the sides $CA$ and $AB$ of the triangle\n",
"$ABC$. By \\ref{izoZrcDrsKompSrOsn} is:\n",
" $$\\mathcal{G}_{\\overrightarrow{AB}}\\circ\n",
" \\mathcal{G}_{\\overrightarrow{CA}}=\n",
" \\mathcal{S}_q\\circ\\mathcal{S}_A\\circ\\mathcal{S}_A\\circ\\mathcal{S}_p=\n",
" \\mathcal{S}_q\\circ\\mathcal{S}_p.$$ Because $ABC$ is an isosceles\n",
" right triangle with a right angle at the vertex $A$, the lines $p$ and $q$ are perpendicular and intersect at the center $S$\n",
" of the hypotenuse $BC$. Therefore\n",
" $\\mathcal{G}_{\\overrightarrow{AB}}\\circ\n",
" \\mathcal{G}_{\\overrightarrow{CA}}=\\mathcal{S}_q\n",
" \\circ\\mathcal{S}_p=\\mathcal{S}_S$.\n",
"\n",
"\\item \\res{In the same plane are given the lines\n",
"$a$, $b$ and $c$.\n",
"Draw the points $A\\in a$ and $B\\in b$\n",
"so that $\\mathcal{S}_c(A)=B$.}\n"
]
}
],
"source": [
"def translate_chunk(chunk, model='gpt-3.5-turbo',\n",
" dest_language='English',\n",
" sample_translation=(\"\\poglavje{Osnove Geometrije} \\label{osn9Geom}\", \"\\poglavje{The basics of Geometry} \\label{osn9Geom}\")\n",
" ):\n",
" prompt = f'''Translate only the text from the following LaTeX document into {dest_language}. Leave all LaTeX commands unchanged\n",
" \n",
"\"\"\"\n",
"{sample_translation[0]}\n",
"{chunk}\"\"\"\n",
"\n",
"{sample_translation[1]}\n",
"'''\n",
" response = client.chat.completions.create(\n",
" messages=[{\"role\": \"user\", \"content\":prompt}],\n",
" model=model,\n",
" temperature=0,\n",
" top_p=1,\n",
" max_tokens=1500,\n",
" )\n",
" result = response.choices[0].message.content.strip()\n",
" result = result.replace('\"\"\"', '') # remove the double quotes, as we used them to surround the text\n",
" return result\n",
"print(translate_chunk(chunks[800], model='gpt-3.5-turbo', dest_language='English'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see here that this one chunk in particular translates only the text, but leaves LaTeX commands intact.\n",
"\n",
"Let's now translate all the chunks in the book - this will take 2-3 hours, as we're processing requests sequentially."
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0 / 869\n",
"1 / 869\n",
"2 / 869\n",
"3 / 869\n",
"4 / 869\n",
"5 / 869\n",
"6 / 869\n",
"7 / 869\n",
"8 / 869\n",
"9 / 869\n",
"10 / 869\n",
"11 / 869\n",
"12 / 869\n",
"13 / 869\n",
"14 / 869\n",
"15 / 869\n",
"16 / 869\n",
"17 / 869\n",
"18 / 869\n",
"19 / 869\n",
"20 / 869\n",
"21 / 869\n",
"22 / 869\n",
"23 / 869\n",
"24 / 869\n",
"25 / 869\n",
"26 / 869\n",
"27 / 869\n",
"28 / 869\n",
"29 / 869\n",
"30 / 869\n",
"31 / 869\n",
"32 / 869\n",
"33 / 869\n",
"34 / 869\n",
"35 / 869\n",
"36 / 869\n",
"37 / 869\n",
"38 / 869\n",
"39 / 869\n",
"40 / 869\n",
"41 / 869\n",
"42 / 869\n",
"43 / 869\n",
"44 / 869\n",
"45 / 869\n",
"46 / 869\n",
"47 / 869\n",
"48 / 869\n",
"49 / 869\n",
"50 / 869\n",
"51 / 869\n",
"52 / 869\n",
"53 / 869\n",
"54 / 869\n",
"55 / 869\n",
"56 / 869\n",
"57 / 869\n",
"58 / 869\n",
"59 / 869\n",
"60 / 869\n",
"61 / 869\n",
"62 / 869\n",
"63 / 869\n",
"64 / 869\n",
"65 / 869\n",
"66 / 869\n",
"67 / 869\n",
"68 / 869\n",
"69 / 869\n",
"70 / 869\n",
"71 / 869\n",
"72 / 869\n",
"73 / 869\n",
"74 / 869\n",
"75 / 869\n",
"76 / 869\n",
"77 / 869\n",
"78 / 869\n",
"79 / 869\n",
"80 / 869\n",
"81 / 869\n",
"82 / 869\n",
"83 / 869\n",
"84 / 869\n",
"85 / 869\n",
"86 / 869\n",
"87 / 869\n",
"88 / 869\n",
"89 / 869\n",
"90 / 869\n",
"91 / 869\n",
"92 / 869\n",
"93 / 869\n",
"94 / 869\n",
"95 / 869\n",
"96 / 869\n",
"97 / 869\n",
"98 / 869\n",
"99 / 869\n",
"100 / 869\n",
"101 / 869\n",
"102 / 869\n",
"103 / 869\n",
"104 / 869\n",
"105 / 869\n",
"106 / 869\n",
"107 / 869\n",
"108 / 869\n",
"109 / 869\n",
"110 / 869\n",
"111 / 869\n",
"112 / 869\n",
"113 / 869\n",
"114 / 869\n",
"115 / 869\n",
"116 / 869\n",
"117 / 869\n",
"118 / 869\n",
"119 / 869\n",
"120 / 869\n",
"121 / 869\n",
"122 / 869\n",
"123 / 869\n",
"124 / 869\n",
"125 / 869\n",
"126 / 869\n",
"127 / 869\n",
"128 / 869\n",
"129 / 869\n",
"130 / 869\n",
"131 / 869\n",
"132 / 869\n",
"133 / 869\n",
"134 / 869\n",
"135 / 869\n",
"136 / 869\n",
"137 / 869\n",
"138 / 869\n",
"139 / 869\n",
"140 / 869\n",
"141 / 869\n",
"142 / 869\n",
"143 / 869\n",
"144 / 869\n",
"145 / 869\n",
"146 / 869\n",
"147 / 869\n",
"148 / 869\n",
"149 / 869\n",
"150 / 869\n",
"151 / 869\n",
"152 / 869\n",
"153 / 869\n",
"154 / 869\n",
"155 / 869\n",
"156 / 869\n",
"157 / 869\n",
"158 / 869\n",
"159 / 869\n",
"160 / 869\n",
"161 / 869\n",
"162 / 869\n",
"163 / 869\n",
"164 / 869\n",
"165 / 869\n",
"166 / 869\n",
"167 / 869\n",
"168 / 869\n",
"169 / 869\n",
"170 / 869\n",
"171 / 869\n",
"172 / 869\n",
"173 / 869\n",
"174 / 869\n",
"175 / 869\n",
"176 / 869\n",
"177 / 869\n",
"178 / 869\n",
"179 / 869\n",
"180 / 869\n",
"181 / 869\n",
"182 / 869\n",
"183 / 869\n",
"184 / 869\n",
"185 / 869\n",
"186 / 869\n",
"187 / 869\n",
"188 / 869\n",
"189 / 869\n",
"190 / 869\n",
"191 / 869\n",
"192 / 869\n",
"193 / 869\n",
"194 / 869\n",
"195 / 869\n",
"196 / 869\n",
"197 / 869\n",
"198 / 869\n",
"199 / 869\n",
"200 / 869\n",
"201 / 869\n",
"202 / 869\n",
"203 / 869\n",
"204 / 869\n",
"205 / 869\n",
"206 / 869\n",
"207 / 869\n",
"208 / 869\n",
"209 / 869\n",
"210 / 869\n",
"211 / 869\n",
"212 / 869\n",
"213 / 869\n",
"214 / 869\n",
"215 / 869\n",
"216 / 869\n",
"217 / 869\n",
"218 / 869\n",
"219 / 869\n",
"220 / 869\n",
"221 / 869\n",
"222 / 869\n",
"223 / 869\n",
"224 / 869\n",
"225 / 869\n",
"226 / 869\n",
"227 / 869\n",
"228 / 869\n",
"229 / 869\n",
"230 / 869\n",
"231 / 869\n",
"232 / 869\n",
"233 / 869\n",
"234 / 869\n",
"235 / 869\n",
"236 / 869\n",
"237 / 869\n",
"238 / 869\n",
"239 / 869\n",
"240 / 869\n",
"241 / 869\n",
"242 / 869\n",
"243 / 869\n",
"244 / 869\n",
"245 / 869\n",
"246 / 869\n",
"247 / 869\n",
"248 / 869\n",
"249 / 869\n",
"250 / 869\n",
"251 / 869\n",
"252 / 869\n",
"253 / 869\n",
"254 / 869\n",
"255 / 869\n",
"256 / 869\n",
"257 / 869\n",
"258 / 869\n",
"259 / 869\n",
"260 / 869\n",
"261 / 869\n",
"262 / 869\n",
"263 / 869\n",
"264 / 869\n",
"265 / 869\n",
"266 / 869\n",
"267 / 869\n",
"268 / 869\n",
"269 / 869\n",
"270 / 869\n",
"271 / 869\n",
"272 / 869\n",
"273 / 869\n",
"274 / 869\n",
"275 / 869\n",
"276 / 869\n",
"277 / 869\n",
"278 / 869\n",
"279 / 869\n",
"280 / 869\n",
"281 / 869\n",
"282 / 869\n",
"283 / 869\n",
"284 / 869\n",
"285 / 869\n",
"286 / 869\n",
"287 / 869\n",
"288 / 869\n",
"289 / 869\n",
"290 / 869\n",
"291 / 869\n",
"292 / 869\n",
"293 / 869\n",
"294 / 869\n",
"295 / 869\n",
"296 / 869\n",
"297 / 869\n",
"298 / 869\n",
"299 / 869\n",
"300 / 869\n",
"301 / 869\n",
"302 / 869\n",
"303 / 869\n",
"304 / 869\n",
"305 / 869\n",
"306 / 869\n",
"307 / 869\n",
"308 / 869\n",
"309 / 869\n",
"310 / 869\n",
"311 / 869\n",
"312 / 869\n",
"313 / 869\n",
"314 / 869\n",
"315 / 869\n",
"316 / 869\n",
"317 / 869\n",
"318 / 869\n",
"319 / 869\n",
"320 / 869\n",
"321 / 869\n",
"322 / 869\n",
"323 / 869\n",
"324 / 869\n",
"325 / 869\n",
"326 / 869\n",
"327 / 869\n",
"328 / 869\n",
"329 / 869\n",
"330 / 869\n",
"331 / 869\n",
"332 / 869\n",
"333 / 869\n",
"334 / 869\n",
"335 / 869\n",
"336 / 869\n",
"337 / 869\n",
"338 / 869\n",
"339 / 869\n",
"340 / 869\n",
"341 / 869\n",
"342 / 869\n",
"343 / 869\n",
"344 / 869\n",
"345 / 869\n",
"346 / 869\n",
"347 / 869\n",
"348 / 869\n",
"349 / 869\n",
"350 / 869\n",
"351 / 869\n",
"352 / 869\n",
"353 / 869\n",
"354 / 869\n",
"355 / 869\n",
"356 / 869\n",
"357 / 869\n",
"358 / 869\n",
"359 / 869\n",
"360 / 869\n",
"361 / 869\n",
"362 / 869\n",
"363 / 869\n",
"364 / 869\n",
"365 / 869\n",
"366 / 869\n",
"367 / 869\n",
"368 / 869\n",
"369 / 869\n",
"370 / 869\n",
"371 / 869\n",
"372 / 869\n",
"373 / 869\n",
"374 / 869\n",
"375 / 869\n",
"376 / 869\n",
"377 / 869\n",
"378 / 869\n",
"379 / 869\n",
"380 / 869\n",
"381 / 869\n",
"382 / 869\n",
"383 / 869\n",
"384 / 869\n",
"385 / 869\n",
"386 / 869\n",
"387 / 869\n",
"388 / 869\n",
"389 / 869\n",
"390 / 869\n",
"391 / 869\n",
"392 / 869\n",
"393 / 869\n",
"394 / 869\n",
"395 / 869\n",
"396 / 869\n",
"397 / 869\n",
"398 / 869\n",
"399 / 869\n",
"400 / 869\n",
"401 / 869\n",
"402 / 869\n",
"403 / 869\n",
"404 / 869\n",
"405 / 869\n",
"406 / 869\n",
"407 / 869\n",
"408 / 869\n",
"409 / 869\n",
"410 / 869\n",
"411 / 869\n",
"412 / 869\n",
"413 / 869\n",
"414 / 869\n",
"415 / 869\n",
"416 / 869\n",
"417 / 869\n",
"418 / 869\n",
"419 / 869\n",
"420 / 869\n",
"421 / 869\n",
"422 / 869\n",
"423 / 869\n",
"424 / 869\n",
"425 / 869\n",
"426 / 869\n",
"427 / 869\n",
"428 / 869\n",
"429 / 869\n",
"430 / 869\n",
"431 / 869\n",
"432 / 869\n",
"433 / 869\n",
"434 / 869\n",
"435 / 869\n",
"436 / 869\n",
"437 / 869\n",
"438 / 869\n",
"439 / 869\n",
"440 / 869\n",
"441 / 869\n",
"442 / 869\n",
"443 / 869\n",
"444 / 869\n",
"445 / 869\n",
"446 / 869\n",
"447 / 869\n",
"448 / 869\n",
"449 / 869\n",
"450 / 869\n",
"451 / 869\n",
"452 / 869\n",
"453 / 869\n",
"454 / 869\n",
"455 / 869\n",
"456 / 869\n",
"457 / 869\n",
"458 / 869\n",
"459 / 869\n",
"460 / 869\n",
"461 / 869\n",
"462 / 869\n",
"463 / 869\n",
"464 / 869\n",
"465 / 869\n",
"466 / 869\n",
"467 / 869\n",
"468 / 869\n",
"469 / 869\n",
"470 / 869\n",
"471 / 869\n",
"472 / 869\n",
"473 / 869\n",
"474 / 869\n",
"475 / 869\n",
"476 / 869\n",
"477 / 869\n",
"478 / 869\n",
"479 / 869\n",
"480 / 869\n",
"481 / 869\n",
"482 / 869\n",
"483 / 869\n",
"484 / 869\n",
"485 / 869\n",
"486 / 869\n",
"487 / 869\n",
"488 / 869\n",
"489 / 869\n",
"490 / 869\n",
"491 / 869\n",
"492 / 869\n",
"493 / 869\n",
"494 / 869\n",
"495 / 869\n",
"496 / 869\n",
"497 / 869\n",
"498 / 869\n",
"499 / 869\n",
"500 / 869\n",
"501 / 869\n",
"502 / 869\n",
"503 / 869\n",
"504 / 869\n",
"505 / 869\n",
"506 / 869\n",
"507 / 869\n",
"508 / 869\n",
"509 / 869\n",
"510 / 869\n",
"511 / 869\n",
"512 / 869\n",
"513 / 869\n",
"514 / 869\n",
"515 / 869\n",
"516 / 869\n",
"517 / 869\n",
"518 / 869\n",
"519 / 869\n",
"520 / 869\n",
"521 / 869\n",
"522 / 869\n",
"523 / 869\n",
"524 / 869\n",
"525 / 869\n",
"526 / 869\n",
"527 / 869\n",
"528 / 869\n",
"529 / 869\n",
"530 / 869\n",
"531 / 869\n",
"532 / 869\n",
"533 / 869\n",
"534 / 869\n",
"535 / 869\n",
"536 / 869\n",
"537 / 869\n",
"538 / 869\n",
"539 / 869\n",
"540 / 869\n",
"541 / 869\n",
"542 / 869\n",
"543 / 869\n",
"544 / 869\n",
"545 / 869\n",
"546 / 869\n",
"547 / 869\n",
"548 / 869\n",
"549 / 869\n",
"550 / 869\n",
"551 / 869\n",
"552 / 869\n",
"553 / 869\n",
"554 / 869\n",
"555 / 869\n",
"556 / 869\n",
"557 / 869\n",
"558 / 869\n",
"559 / 869\n",
"560 / 869\n",
"561 / 869\n",
"562 / 869\n",
"563 / 869\n",
"564 / 869\n",
"565 / 869\n",
"566 / 869\n",
"567 / 869\n",
"568 / 869\n",
"569 / 869\n",
"570 / 869\n",
"571 / 869\n",
"572 / 869\n",
"573 / 869\n",
"574 / 869\n",
"575 / 869\n",
"576 / 869\n",
"577 / 869\n",
"578 / 869\n",
"579 / 869\n",
"580 / 869\n",
"581 / 869\n",
"582 / 869\n",
"583 / 869\n",
"584 / 869\n",
"585 / 869\n",
"586 / 869\n",
"587 / 869\n",
"588 / 869\n",
"589 / 869\n",
"590 / 869\n",
"591 / 869\n",
"592 / 869\n",
"593 / 869\n",
"594 / 869\n",
"595 / 869\n",
"596 / 869\n",
"597 / 869\n",
"598 / 869\n",
"599 / 869\n",
"600 / 869\n",
"601 / 869\n",
"602 / 869\n",
"603 / 869\n",
"604 / 869\n",
"605 / 869\n",
"606 / 869\n",
"607 / 869\n",
"608 / 869\n",
"609 / 869\n",
"610 / 869\n",
"611 / 869\n",
"612 / 869\n",
"613 / 869\n",
"614 / 869\n",
"615 / 869\n",
"616 / 869\n",
"617 / 869\n",
"618 / 869\n",
"619 / 869\n",
"620 / 869\n",
"621 / 869\n",
"622 / 869\n",
"623 / 869\n",
"624 / 869\n",
"625 / 869\n",
"626 / 869\n",
"627 / 869\n",
"628 / 869\n",
"629 / 869\n",
"630 / 869\n",
"631 / 869\n",
"632 / 869\n",
"633 / 869\n",
"634 / 869\n",
"635 / 869\n",
"636 / 869\n",
"637 / 869\n",
"638 / 869\n",
"639 / 869\n",
"640 / 869\n",
"641 / 869\n",
"642 / 869\n",
"643 / 869\n",
"644 / 869\n",
"645 / 869\n",
"646 / 869\n",
"647 / 869\n",
"648 / 869\n",
"649 / 869\n",
"650 / 869\n",
"651 / 869\n",
"652 / 869\n",
"653 / 869\n",
"654 / 869\n",
"655 / 869\n",
"656 / 869\n",
"657 / 869\n",
"658 / 869\n",
"659 / 869\n",
"660 / 869\n",
"661 / 869\n",
"662 / 869\n",
"663 / 869\n",
"664 / 869\n",
"665 / 869\n",
"666 / 869\n",
"667 / 869\n",
"668 / 869\n",
"669 / 869\n",
"670 / 869\n",
"671 / 869\n",
"672 / 869\n",
"673 / 869\n",
"674 / 869\n",
"675 / 869\n",
"676 / 869\n",
"677 / 869\n",
"678 / 869\n",
"679 / 869\n",
"680 / 869\n",
"681 / 869\n",
"682 / 869\n",
"683 / 869\n",
"684 / 869\n",
"685 / 869\n",
"686 / 869\n",
"687 / 869\n",
"688 / 869\n",
"689 / 869\n",
"690 / 869\n",
"691 / 869\n",
"692 / 869\n",
"693 / 869\n",
"694 / 869\n",
"695 / 869\n",
"696 / 869\n",
"697 / 869\n",
"698 / 869\n",
"699 / 869\n",
"700 / 869\n",
"701 / 869\n",
"702 / 869\n",
"703 / 869\n",
"704 / 869\n",
"705 / 869\n",
"706 / 869\n",
"707 / 869\n",
"708 / 869\n",
"709 / 869\n",
"710 / 869\n",
"711 / 869\n",
"712 / 869\n",
"713 / 869\n",
"714 / 869\n",
"715 / 869\n",
"716 / 869\n",
"717 / 869\n",
"718 / 869\n",
"719 / 869\n",
"720 / 869\n",
"721 / 869\n",
"722 / 869\n",
"723 / 869\n",
"724 / 869\n",
"725 / 869\n",
"726 / 869\n",
"727 / 869\n",
"728 / 869\n",
"729 / 869\n",
"730 / 869\n",
"731 / 869\n",
"732 / 869\n",
"733 / 869\n",
"734 / 869\n",
"735 / 869\n",
"736 / 869\n",
"737 / 869\n",
"738 / 869\n",
"739 / 869\n",
"740 / 869\n",
"741 / 869\n",
"742 / 869\n",
"743 / 869\n",
"744 / 869\n",
"745 / 869\n",
"746 / 869\n",
"747 / 869\n",
"748 / 869\n",
"749 / 869\n",
"750 / 869\n",
"751 / 869\n",
"752 / 869\n",
"753 / 869\n",
"754 / 869\n",
"755 / 869\n",
"756 / 869\n",
"757 / 869\n",
"758 / 869\n",
"759 / 869\n",
"760 / 869\n",
"761 / 869\n",
"762 / 869\n",
"763 / 869\n",
"764 / 869\n",
"765 / 869\n",
"766 / 869\n",
"767 / 869\n",
"768 / 869\n",
"769 / 869\n",
"770 / 869\n",
"771 / 869\n",
"772 / 869\n",
"773 / 869\n",
"774 / 869\n",
"775 / 869\n",
"776 / 869\n",
"777 / 869\n",
"778 / 869\n",
"779 / 869\n",
"780 / 869\n",
"781 / 869\n",
"782 / 869\n",
"783 / 869\n",
"784 / 869\n",
"785 / 869\n",
"786 / 869\n",
"787 / 869\n",
"788 / 869\n",
"789 / 869\n",
"790 / 869\n",
"791 / 869\n",
"792 / 869\n",
"793 / 869\n",
"794 / 869\n",
"795 / 869\n",
"796 / 869\n",
"797 / 869\n",
"798 / 869\n",
"799 / 869\n",
"800 / 869\n",
"801 / 869\n",
"802 / 869\n",
"803 / 869\n",
"804 / 869\n",
"805 / 869\n",
"806 / 869\n",
"807 / 869\n",
"808 / 869\n",
"809 / 869\n",
"810 / 869\n",
"811 / 869\n",
"812 / 869\n",
"813 / 869\n",
"814 / 869\n",
"815 / 869\n",
"816 / 869\n",
"817 / 869\n",
"818 / 869\n",
"819 / 869\n",
"820 / 869\n",
"821 / 869\n",
"822 / 869\n",
"823 / 869\n",
"824 / 869\n",
"825 / 869\n",
"826 / 869\n",
"827 / 869\n",
"828 / 869\n",
"829 / 869\n",
"830 / 869\n",
"831 / 869\n",
"832 / 869\n",
"833 / 869\n",
"834 / 869\n",
"835 / 869\n",
"836 / 869\n",
"837 / 869\n",
"838 / 869\n",
"839 / 869\n",
"840 / 869\n",
"841 / 869\n",
"842 / 869\n",
"843 / 869\n",
"844 / 869\n",
"845 / 869\n",
"846 / 869\n",
"847 / 869\n",
"848 / 869\n",
"849 / 869\n",
"850 / 869\n",
"851 / 869\n",
"852 / 869\n",
"853 / 869\n",
"854 / 869\n",
"855 / 869\n",
"856 / 869\n",
"857 / 869\n",
"858 / 869\n",
"859 / 869\n",
"860 / 869\n",
"861 / 869\n",
"862 / 869\n",
"863 / 869\n",
"864 / 869\n",
"865 / 869\n",
"866 / 869\n",
"867 / 869\n",
"868 / 869\n"
]
}
],
"source": [
"dest_language = \"English\"\n",
"\n",
"translated_chunks = []\n",
"for i, chunk in enumerate(chunks):\n",
" print(str(i+1) + \" / \" + str(len(chunks)))\n",
" # translate each chunk\n",
" translated_chunks.append(translate_chunk(chunk, model='gpt-3.5-turbo', dest_language=dest_language))\n",
"\n",
"# join the chunks together\n",
"result = '\\n\\n'.join(translated_chunks)\n",
"\n",
"# save the final result\n",
"with open(f\"data/geometry_{dest_language}.tex\", \"w\") as f:\n",
" f.write(result)"
]
}
],
"metadata": {
"interpreter": {
"hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49"
},
"kernelspec": {
"display_name": "Python 3.9.10 64-bit",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.10"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}