langchain/docs/use_cases/evaluation
vowelparrot 5ca7ce77cd
Remove pythonrepl from LLM-MathChain (#2943)
Use numexpr evaluate instead of the python REPL to avoid malicious code
injection.

Tested against the (limited) math dataset and got the same score as
before.

For more permissive tools (like the REPL tool itself), other approaches
ought to be provided (some combination of Sanitizer + Restricted python
+ unprivileged-docker + ...), but for a calculator tool, only
mathematical expressions should be permitted.

See https://github.com/hwchase17/langchain/issues/814
2023-04-16 08:50:32 -07:00
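The latest commit swaps the Python REPL for numexpr's `evaluate` so that a calculator tool accepts only mathematical expressions. The sketch below shows the same allowlist principle with a different, standard-library-only technique: it parses the input with Python's `ast` module and rejects every node type that is not plain arithmetic. This is not LangChain's code. `safe_eval`, the allowed operators and functions, and the exponent limit are hypothetical choices made for illustration.

```python
import ast
import math
import operator


def _bounded_pow(base, exponent):
    # Hypothetical limit: refuse huge exponents so inputs like "9 ** 9 ** 9"
    # cannot tie up the process. Other resource limits are out of scope here.
    if abs(exponent) > 1000:
        raise ValueError("Exponent too large")
    return operator.pow(base, exponent)


# Hypothetical allowlist: only these operators, functions and names are accepted.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: _bounded_pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_FUNCS = {"sqrt": math.sqrt, "log": math.log, "exp": math.exp,
          "sin": math.sin, "cos": math.cos}
_CONSTS = {"pi": math.pi, "e": math.e}


def safe_eval(expression: str) -> float:
    """Evaluate a purely arithmetic expression; raise ValueError otherwise."""
    tree = ast.parse(expression, mode="eval")
    return _eval_node(tree.body)


def _eval_node(node: ast.AST) -> float:
    # Numeric literals (bool is a subclass of int, so exclude it explicitly).
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    # Binary arithmetic such as "a + b" or "a ** b".
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    # Unary plus and minus.
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    # Named constants such as "pi".
    if isinstance(node, ast.Name) and node.id in _CONSTS:
        return _CONSTS[node.id]
    # Calls to allowlisted functions by bare name, positional arguments only.
    if (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCS
            and not node.keywords):
        return _FUNCS[node.func.id](*[_eval_node(arg) for arg in node.args])
    # Everything else (attribute access, imports, lambdas, ...) is rejected.
    raise ValueError(f"Disallowed expression element: {ast.dump(node)}")
```

For example, `safe_eval("37593 * 67")` evaluates normally, while `safe_eval("__import__('os').system('ls')")` raises `ValueError` before anything runs. An allowlist like this fails closed: any syntax it does not recognize is refused, which is the property the commit message asks of a calculator tool.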
agent_benchmarking.ipynb Remove pythonrepl from LLM-MathChain (#2943) 2023-04-16 08:50:32 -07:00
agent_vectordb_sota_pg.ipynb bump version to 131 (#2391) 2023-04-04 07:21:50 -07:00
benchmarking_template.ipynb Harrison/agent eval (#1620) 2023-03-14 12:37:48 -07:00
data_augmented_question_answering.ipynb Typo docs - Update data_augmented_question_answering.ipynb propriterary-> proprietary (#2626) 2023-04-09 12:24:53 -07:00
huggingface_datasets.ipynb Update huggingface_datasets.ipynb (#1417) 2023-03-04 00:22:31 -08:00
llm_math.ipynb Harrison/llm math (#1808) 2023-03-20 07:53:26 -07:00
openapi_eval.ipynb Harrison/move eval (#2533) 2023-04-07 07:53:13 -07:00
qa_benchmarking_pg.ipynb WIP: Harrison/base retriever (#1765) 2023-03-24 07:46:49 -07:00
qa_benchmarking_sota.ipynb WIP: Harrison/base retriever (#1765) 2023-03-24 07:46:49 -07:00
qa_generation.ipynb Harrison/agent eval (#1620) 2023-03-14 12:37:48 -07:00
question_answering.ipynb fix typo (#2532) 2023-04-07 07:25:22 -07:00
sql_qa_benchmarking_chinook.ipynb Harrison/agent eval (#1620) 2023-03-14 12:37:48 -07:00