langchain/libs/community/tests/integration_tests/document_loaders/test_geodataframe.py

from __future__ import annotations

from typing import TYPE_CHECKING

import pytest
from langchain_core.documents import Document

from langchain_community.document_loaders import GeoDataFrameLoader

if TYPE_CHECKING:
    from geopandas import GeoDataFrame
else:
    GeoDataFrame = "geopandas.GeoDataFrame"


@pytest.fixture
def sample_gdf() -> GeoDataFrame:
    import geopandas

    # TODO: geopandas.datasets was deprecated in 0.14 and removed in 1.0;
    # switch to another data source (e.g. the geodatasets package) before upgrading.
    path_to_data = geopandas.datasets.get_path("nybb")
    gdf = geopandas.read_file(path_to_data)
    gdf["area"] = gdf.area
    gdf["crs"] = gdf.crs.to_string()
    return gdf.head(2)


@pytest.mark.requires("geopandas")
def test_load_returns_list_of_documents(sample_gdf: GeoDataFrame) -> None:
    loader = GeoDataFrameLoader(sample_gdf)
    docs = loader.load()
    assert isinstance(docs, list)
    assert all(isinstance(doc, Document) for doc in docs)
    assert len(docs) == 2


@pytest.mark.requires("geopandas")
def test_load_converts_dataframe_columns_to_document_metadata(
    sample_gdf: GeoDataFrame,
) -> None:
    loader = GeoDataFrameLoader(sample_gdf)
    docs = loader.load()
    for i, doc in enumerate(docs):
        assert doc.metadata["area"] == sample_gdf.loc[i, "area"]
        assert doc.metadata["crs"] == sample_gdf.loc[i, "crs"]