langchain/docs/use_cases/evaluation
vowelparrot 5ca7ce77cd
Remove pythonrepl from LLM-MathChain (#2943)
Use numexpr's evaluate function instead of the Python REPL to avoid
malicious code injection.

Tested against the (limited) math dataset and got the same score as
before.

For more permissive tools (like the REPL tool itself), other approaches
ought to be provided (some combination of Sanitizer + Restricted python
+ unprivileged-docker + ...), but for a calculator tool, only
mathematical expressions should be permitted.
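
As an illustration of the "only mathematical expressions" principle, here is a minimal sketch. It is not the numexpr-based code from this commit: it swaps in a standard-library `ast` allowlist so that it runs without extra dependencies. The names `evaluate_math` and `_eval_node` are hypothetical and do not come from LangChain.

```python
import ast
import operator

# Allowlisted operators: any syntax not listed here is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval_node(node):
    """Recursively evaluate an AST node, allowing only plain arithmetic."""
    # Numeric literals only (bool, str, bytes, None, etc. are rejected).
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left = _eval_node(node.left)
        right = _eval_node(node.right)
        return _BINARY_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    # Names, calls, attribute access, subscripts, etc. all end up here.
    raise ValueError(f"Disallowed expression element: {type(node).__name__}")


def evaluate_math(expression):
    """Hypothetical helper: evaluate an arithmetic string without eval/exec."""
    tree = ast.parse(expression.strip(), mode="eval")
    return _eval_node(tree.body)


if __name__ == "__main__":
    print(evaluate_math("2 * (3 + 4)"))
    try:
        evaluate_math("__import__('os').system('ls')")
    except ValueError as err:
        print("rejected:", err)
```

Exponentiation is left out of the allowlist on purpose, because unbounded exponents can exhaust memory or CPU. A real calculator tool would need to add it back with explicit limits.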

See https://github.com/hwchase17/langchain/issues/814
1 year ago
agent_benchmarking.ipynb                 Remove pythonrepl from LLM-MathChain (#2943)  1 year ago
agent_vectordb_sota_pg.ipynb             bump version to 131 (#2391)  1 year ago
benchmarking_template.ipynb              Harrison/agent eval (#1620)  2 years ago
data_augmented_question_answering.ipynb  Typo docs - Update data_augmented_question_answering.ipynb propriterary-> proprietary (#2626)  1 year ago
huggingface_datasets.ipynb               Update huggingface_datasets.ipynb (#1417)  2 years ago
llm_math.ipynb                           Harrison/llm math (#1808)  2 years ago
openapi_eval.ipynb                       Harrison/move eval (#2533)  1 year ago
qa_benchmarking_pg.ipynb                 WIP: Harrison/base retriever (#1765)  2 years ago
qa_benchmarking_sota.ipynb               WIP: Harrison/base retriever (#1765)  2 years ago
qa_generation.ipynb                      Harrison/agent eval (#1620)  2 years ago
question_answering.ipynb                 fix typo (#2532)  1 year ago
sql_qa_benchmarking_chinook.ipynb        Harrison/agent eval (#1620)  2 years ago