If you use `refresh_schema=False`, then the metadata constraint doesn't
exist. ATM, we used default `None` in the constraint check, but then
`any` fails because it can't iterate over None value
Issue: When the third-party package is not installed, whenever we need
to `pip install <package>` the ImportError is raised.
But sometimes, the `ValueError` or `ModuleNotFoundError` is raised. It
is bad for consistency.
Change: replaced the `ValueError` or `ModuleNotFoundError` with
`ImportError` when we raise an error with the `pip install <package>`
message.
Note: Ideally, we replace all `try: import... except... raise ... `with
helper functions like `import_aim` or just use the existing
[langchain_core.utils.utils.guard_import](https://api.python.langchain.com/en/latest/utils/langchain_core.utils.utils.guard_import.html#langchain_core.utils.utils.guard_import)
But it would be much bigger refactoring. @baskaryan Please, advice on
this.
Deduplicate documents using MD5 of the page_content. Also allows for
custom deduplication with graph ingestion method by providing metadata
id attribute
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Neo4j tools use particular node labels and relationship types to store
metadata, but are irrelevant for text2cypher or graph generation, so we
want to ignore them in the schema representation.
Current implementation doesn't have an indexed property that would
optimize the import. I have added a `baseEntityLabel` parameter that
allows you to add a secondary node label, which has an indexed id
`property`. By default, the behaviour is identical to previous version.
Since multi-labeled nodes are terrible for text2cypher, I removed the
secondary label from schema representation object and string, which is
used in text2cypher.
Previously, if this did not find a mypy cache then it wouldnt run
this makes it always run
adding mypy ignore comments with existing uncaught issues to unblock other prs
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
The timeout function comes in handy when you want to kill longrunning
queries.
The value sanitization removes all lists that are larger than 128
elements. The idea here is to remove embedding properties from results.