mirror of
https://github.com/hwchase17/langchain
synced 2024-10-29 17:07:25 +00:00
b485c3048b
Causing problems indexing docs and notebook images don't render after markdown conversion anyways
189 lines
6.1 KiB
Plaintext
189 lines
6.1 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "ca4c8c2a",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Geopandas\n",
|
|
"\n",
|
|
"[Geopandas](https://geopandas.org/en/stable/index.html) is an open source project to make working with geospatial data in python easier. \n",
|
|
"\n",
|
|
"GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. \n",
|
|
"\n",
|
|
"Geometric operations are performed by shapely. Geopandas further depends on fiona for file access and matplotlib for plotting.\n",
|
|
"\n",
|
|
"LLM applications (chat, QA) that utilize geospatial data are an interesting area for exploration."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "00b3bf80",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"! pip install sodapy\n",
|
|
"! pip install pandas\n",
|
|
"! pip install geopandas"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"id": "cecc9320",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import ast\n",
|
|
"import pandas as pd\n",
|
|
"import geopandas as gpd\n",
|
|
"from langchain.document_loaders import OpenCityDataLoader"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "04981332",
|
|
"metadata": {},
|
|
"source": [
|
|
"Create a GeoPandas dataframe from [`Open City Data`](https://python.langchain.com/docs/integrations/document_loaders/open_city_data) as an example input."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "5e7de46b",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Load Open City Data\n",
|
|
"dataset = \"tmnf-yvry\" # San Francisco crime data\n",
|
|
"loader = OpenCityDataLoader(city_id=\"data.sfgov.org\", dataset_id=dataset, limit=5000)\n",
|
|
"docs = loader.load()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 30,
|
|
"id": "7cda2e38",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Convert list of dictionaries to DataFrame\n",
|
|
"df = pd.DataFrame([ast.literal_eval(d.page_content) for d in docs])\n",
|
|
"\n",
|
|
"# Extract latitude and longitude\n",
|
|
"df[\"Latitude\"] = df[\"location\"].apply(lambda loc: loc[\"coordinates\"][1])\n",
|
|
"df[\"Longitude\"] = df[\"location\"].apply(lambda loc: loc[\"coordinates\"][0])\n",
|
|
"\n",
|
|
"# Create geopandas DF\n",
|
|
"gdf = gpd.GeoDataFrame(\n",
|
|
" df, geometry=gpd.points_from_xy(df.Longitude, df.Latitude), crs=\"EPSG:4326\"\n",
|
|
")\n",
|
|
"\n",
|
|
"# Only keep valid longitudes and latitudes for San Francisco\n",
|
|
"gdf = gdf[\n",
|
|
" (gdf[\"Longitude\"] >= -123.173825)\n",
|
|
" & (gdf[\"Longitude\"] <= -122.281780)\n",
|
|
" & (gdf[\"Latitude\"] >= 37.623983)\n",
|
|
" & (gdf[\"Latitude\"] <= 37.929824)\n",
|
|
"]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "030a535c",
|
|
"metadata": {},
|
|
"source": [
|
|
"Visiualization of the sample of SF crimne data. "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "8148a63e",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import matplotlib.pyplot as plt\n",
|
|
"\n",
|
|
"# Load San Francisco map data\n",
|
|
"sf = gpd.read_file(\"https://data.sfgov.org/resource/3psu-pn9h.geojson\")\n",
|
|
"\n",
|
|
"# Plot the San Francisco map and the points\n",
|
|
"fig, ax = plt.subplots(figsize=(10, 10))\n",
|
|
"sf.plot(ax=ax, color=\"white\", edgecolor=\"black\")\n",
|
|
"gdf.plot(ax=ax, color=\"red\", markersize=5)\n",
|
|
"plt.show()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "a081a9d1",
|
|
"metadata": {},
|
|
"source": [
|
|
"Load GeoPandas dataframe as a `Document` for downstream processing (embedding, chat, etc). \n",
|
|
"\n",
|
|
"The `geometry` will be the default `page_content` columns, and all other columns are placed in `metadata`.\n",
|
|
"\n",
|
|
"But, we can specify the `page_content_column`."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 32,
|
|
"id": "381a5f7b",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain.document_loaders import GeoDataFrameLoader\n",
|
|
"\n",
|
|
"loader = GeoDataFrameLoader(data_frame=gdf, page_content_column=\"geometry\")\n",
|
|
"docs = loader.load()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 33,
|
|
"id": "74baf6ee",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"Document(page_content='POINT (-122.420084075249 37.7083109744362)', metadata={'pdid': '4133422003074', 'incidntnum': '041334220', 'incident_code': '03074', 'category': 'ROBBERY', 'descript': 'ROBBERY, BODILY FORCE', 'dayofweek': 'Monday', 'date': '2004-11-22T00:00:00.000', 'time': '17:50', 'pddistrict': 'INGLESIDE', 'resolution': 'NONE', 'address': 'GENEVA AV / SANTOS ST', 'x': '-122.420084075249', 'y': '37.7083109744362', 'location': {'type': 'Point', 'coordinates': [-122.420084075249, 37.7083109744362]}, ':@computed_region_26cr_cadq': '9', ':@computed_region_rxqg_mtj9': '8', ':@computed_region_bh8s_q3mv': '309', ':@computed_region_6qbp_sg9q': nan, ':@computed_region_qgnn_b9vv': nan, ':@computed_region_ajp5_b2md': nan, ':@computed_region_yftq_j783': nan, ':@computed_region_p5aj_wyqh': nan, ':@computed_region_fyvs_ahh9': nan, ':@computed_region_6pnf_4xz7': nan, ':@computed_region_jwn9_ihcz': nan, ':@computed_region_9dfj_4gjx': nan, ':@computed_region_4isq_27mq': nan, ':@computed_region_pigm_ib2e': nan, ':@computed_region_9jxd_iqea': nan, ':@computed_region_6ezc_tdp2': nan, ':@computed_region_h4ep_8xdi': nan, ':@computed_region_n4xg_c4py': nan, ':@computed_region_fcz8_est8': nan, ':@computed_region_nqbw_i6c3': nan, ':@computed_region_2dwj_jsy4': nan, 'Latitude': 37.7083109744362, 'Longitude': -122.420084075249})"
|
|
]
|
|
},
|
|
"execution_count": 33,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"docs[0]"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.11.3"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|