langchain/docs/scripts/execute_notebooks.sh
ccurme 595dc592c9
docs: run how-to guides in CI (#27615)
Add how-to guides to [Run notebooks
job](https://github.com/langchain-ai/langchain/actions/workflows/run_notebooks.yml)
and fix existing notebooks.

- As with tutorials, cassettes must be updated when HTTP calls in guides
change (by running the existing
[script](https://github.com/langchain-ai/langchain/blob/master/docs/scripts/update_cassettes.sh)).
- Cassettes now total ~62 MB over 474 files.
- `docs/scripts/prepare_notebooks_for_ci.py` lists a number of notebooks
that do not run (e.g., due to requiring additional infra, slowness,
requiring `input()`, etc.).
2024-10-30 12:35:38 -04:00

#!/bin/bash
# Read the list of notebooks to skip from the JSON file
SKIP_NOTEBOOKS=$(python -c "import json; print('\n'.join(json.load(open('docs/notebooks_no_execution.json'))))")
# Get the working directory or specific notebook file from the input parameter
WORKING_DIRECTORY=$1
# Function to execute a single notebook
execute_notebook() {
    file="$1"
    index="$2"
    total="$3"
    echo "Starting execution of $file ($index/$total)"
    start_time=$(date +%s)
    # Capture stdout and stderr so the notebook's output can be printed on failure
    if ! output=$(time poetry run jupyter nbconvert --to notebook --execute --ExecutePreprocessor.kernel_name=python3 "$file" 2>&1); then
        end_time=$(date +%s)
        execution_time=$((end_time - start_time))
        echo "Error in $file. Execution time: $execution_time seconds"
        echo "Error details: $output"
        exit 1
    fi
    end_time=$(date +%s)
    execution_time=$((end_time - start_time))
    echo "Finished $file. Execution time: $execution_time seconds"
}
export -f execute_notebook
# Determine the directories (or single notebook) to search
if [ "$WORKING_DIRECTORY" == "all" ]; then
    search_paths=(docs/docs/tutorials docs/docs/how_to)
else
    search_paths=("$WORKING_DIRECTORY")
fi
# Read the matching notebooks into an array, one path per line, so that
# paths containing spaces or glob characters are not split or expanded
mapfile -t notebooks_array < <(find "${search_paths[@]}" -name "*.ipynb" | grep -v ".ipynb_checkpoints" | grep -vFf <(echo "$SKIP_NOTEBOOKS"))
total_notebooks=${#notebooks_array[@]}
# Execute notebooks sequentially with progress indication
for i in "${!notebooks_array[@]}"; do
    file="${notebooks_array[$i]}"
    index=$((i + 1))
    execute_notebook "$file" "$index" "$total_notebooks"
done
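
The script filters the notebook list with a fixed-string `grep -v` against the skip list. One edge case may be worth guarding against: assuming standard grep behavior, an empty pattern matches every line, so an empty skip list (for example, a JSON file containing `[]`) would cause `grep -v` to discard every notebook. The sketch below illustrates one way to handle this. The helper name `filter_notebooks` and the sample paths are hypothetical and not part of the original script, and it passes the patterns with `-e` instead of the script's `-f <(...)` so that it also runs under plain `sh`.

```shell
# Hypothetical helper (not in the original script): reads candidate paths on
# stdin and drops every path containing one of the newline-separated entries
# in the skip list passed as $1.
filter_notebooks() {
    skip_list="$1"
    if [ -z "$skip_list" ]; then
        # An empty skip list would give grep a single empty pattern, which
        # matches every line; with -v that would remove all notebooks.
        cat
    else
        # -F: treat entries as fixed strings; -v: keep non-matching lines.
        # "|| true" hides grep's exit status 1 when nothing is left.
        grep -vF -e "$skip_list" || true
    fi
}

# Illustrative paths only; these are assumptions, not real repository files.
candidates=$(printf '%s\n' \
    docs/docs/how_to/a.ipynb \
    docs/docs/how_to/b.ipynb \
    docs/docs/tutorials/c.ipynb)
skip="docs/docs/how_to/b.ipynb"

kept=$(printf '%s\n' "$candidates" | filter_notebooks "$skip")
kept_without_skips=$(printf '%s\n' "$candidates" | filter_notebooks "")

printf '%s\n' "$kept"
```

With a non-empty skip list the helper behaves like the pipeline in the script; with an empty one it passes every candidate through unchanged.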