langchain/tests/unit_tests
Cristóbal Carnero Liñán e494b0a09f
feat (documents): add a source code loader based on AST manipulation (#6486)
#### Summary

A new approach to loading source code is implemented:

Each top-level function and class in the code is loaded into its own
document. Then, an additional document is created with the remaining
top-level code, in which each already loaded function or class is
replaced by a placeholder comment.

This could improve the accuracy of QA chains over source code.

For instance, given this script:

```
class MyClass:
    def __init__(self, name):
        self.name = name

    def greet(self):
        print(f"Hello, {self.name}!")

def main():
    name = input("Enter your name: ")
    obj = MyClass(name)
    obj.greet()

if __name__ == '__main__':
    main()
```

The loader will create three documents with this content:

First document:
```
class MyClass:
    def __init__(self, name):
        self.name = name

    def greet(self):
        print(f"Hello, {self.name}!")
```

Second document:
```
def main():
    name = input("Enter your name: ")
    obj = MyClass(name)
    obj.greet()
```

Third document:
```
# Code for: class MyClass:

# Code for: def main():

if __name__ == '__main__':
    main()
```
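
The splitting shown above can be approximated with Python's standard `ast` module. The following is a minimal sketch, not the loader's actual implementation: the function name `split_top_level` is hypothetical, and only the `# Code for:` placeholder format is taken from the example above. It assumes Python 3.8+ (for `end_lineno`) and does not handle decorators specially.

```python
import ast
from typing import List, Optional


def split_top_level(source: str) -> List[str]:
    """Return one string per top-level function/class, then the simplified remainder.

    Hypothetical helper for illustration only.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    segments: List[str] = []
    # Copy of the source lines; extracted bodies are marked None and dropped later.
    simplified: List[Optional[str]] = list(lines)

    for node in tree.body:  # tree.body holds only top-level statements
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            start = node.lineno - 1  # lineno is 1-based
            end = node.end_lineno    # inclusive 1-based end, so usable as a slice end
            segments.append("\n".join(lines[start:end]))
            # Keep a placeholder comment where the definition used to start.
            simplified[start] = f"# Code for: {lines[start]}"
            for i in range(start + 1, end):
                simplified[i] = None

    remainder = "\n".join(line for line in simplified if line is not None)
    return segments + [remainder]
```

Calling `split_top_level` on the script above should yield the class, the `main` function, and the simplified top-level code as three separate strings, in that order.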

A threshold parameter is added to control whether small scripts are
split in this way or not.

At the moment, only Python and JavaScript are supported. The
appropriate parser is determined by examining the file extension.
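
As an illustration of the threshold and the extension-based parser selection described above, here is a small sketch. All names (`EXTENSION_TO_LANGUAGE`, `detect_language`, `should_split`) are hypothetical, and it assumes the threshold is a minimum number of source lines; the commit message does not state the unit.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping: the commit only states that Python and JavaScript are supported.
EXTENSION_TO_LANGUAGE = {".py": "python", ".js": "javascript"}


def detect_language(filename: str) -> Optional[str]:
    """Return the language implied by the file extension, or None if unsupported."""
    return EXTENSION_TO_LANGUAGE.get(Path(filename).suffix.lower())


def should_split(source: str, filename: str, threshold: int = 0) -> bool:
    """Decide whether a file should be split into per-definition documents.

    Assumption: the threshold is a minimum line count; shorter scripts are
    kept as a single document.
    """
    if detect_language(filename) is None:
        return False
    return len(source.splitlines()) >= threshold
```

With this sketch, a file with an unsupported extension or fewer lines than `threshold` would be loaded whole, while larger `.py` and `.js` files would go through the splitting step.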

#### Tests

This PR adds:

- Unit tests
- Integration tests

#### Dependencies

Only one dependency was added, and it is optional (it is needed only
for the JavaScript parser).

#### Documentation

A notebook is added showing how the loader can be used.

#### Who can review?

@eyurtsev @hwchase17

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
2023-06-27 15:58:47 -07:00

| Name | Last commit | Date |
| --- | --- | --- |
| `agents` | Fix breaking tags (#6765) | 2023-06-26 09:28:11 -07:00 |
| `callbacks` | split up batch llm calls into separate runs (#5804) | 2023-06-24 21:03:31 -07:00 |
| `chains` | split up batch llm calls into separate runs (#5804) | 2023-06-24 21:03:31 -07:00 |
| `chat_models` | add FunctionMessage support to _convert_dict_to_message() in OpenAI chat model (#6382) | 2023-06-20 08:25:55 -07:00 |
| `client` | Update to RunOnDataset helper functions to accept evaluator callbacks (#6629) | 2023-06-26 23:58:13 -07:00 |
| `data` | | |
| `docstore` | | |
| `document_loaders` | feat (documents): add a source code loader based on AST manipulation (#6486) | 2023-06-27 15:58:47 -07:00 |
| `evaluation` | Permit Constitutional Principles (#6807) | 2023-06-27 00:23:54 -07:00 |
| `examples` | Doc refactor (#6300) | 2023-06-16 11:52:56 -07:00 |
| `llms` | split up batch llm calls into separate runs (#5804) | 2023-06-24 21:03:31 -07:00 |
| `load` | Include placeholder value for all secrets, not just kwargs (#6421) | 2023-06-19 15:41:45 +01:00 |
| `memory` | Implemented appending arbitrary messages (#5293) | 2023-05-29 07:18:59 -07:00 |
| `output_parsers` | Update String Evaluator (#6615) | 2023-06-26 14:16:14 -07:00 |
| `prompts` | Fix for #6431 - chatprompt template with partial variables giving validation error (#6456) | 2023-06-19 22:08:15 -07:00 |
| `retrievers` | Harrison/myscale self query (#6376) | 2023-06-18 16:53:10 -07:00 |
| `tools` | Zapier update oauth support (#6780) | 2023-06-27 11:46:32 -07:00 |
| `utilities` | Fix graphql tool (#4984) | 2023-05-19 15:27:50 -07:00 |
| `vectorstores` | Add maximal relevance search to SKLearnVectorStore (#5430) | 2023-05-30 16:13:33 -07:00 |
| `__init__.py` | | |
| `conftest.py` | Add pytest --only-extended and --only-core options (#4494) | 2023-05-12 11:35:22 -04:00 |
| `test_bash.py` | Add Mastodon toots loader (#5036) | 2023-05-22 16:43:07 -07:00 |
| `test_cache.py` | Add caching to BaseChatModel (issue #1644) (#5089) | 2023-06-24 11:45:09 -07:00 |
| `test_dependencies.py` | Fix class promotion (#6187) | 2023-06-18 16:55:18 -07:00 |
| `test_document_transformers.py` | | |
| `test_formatting.py` | | |
| `test_math_utils.py` | add get_top_k_cosine_similarity method to get max top k score and index (#5059) | 2023-05-22 11:55:48 -07:00 |
| `test_pytest_config.py` | Block sockets for unit-tests (#4803) | 2023-05-16 14:41:24 -04:00 |
| `test_python.py` | option for csv agent to not include df in prompt (#4610) | 2023-05-12 21:55:22 -07:00 |
| `test_schema.py` | | |
| `test_sql_database_schema.py` | | |
| `test_sql_database.py` | Fix SQLAlchemy truncating text when it is too big (#5206) | 2023-06-01 21:33:31 -04:00 |
| `test_text_splitter.py` | MD header text splitter returns Documents (#6571) | 2023-06-22 09:25:38 -07:00 |