{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "c5fdf490-fa77-4e56-92d1-53101fff75ba", "metadata": {}, "source": [ "# cuSpatial Python User's Guide\n", "\n", "cuSpatial is a GPU-accelerated Python library for spatial data analysis including distance and \n", "trajectory computations, spatial data indexing and spatial join operations. cuSpatial's \n", "Python API provides an accessible interface to high-performance spatial algorithms accelerated\n", "by CUDA-enabled GPUs." ] }, { "attachments": {}, "cell_type": "markdown", "id": "caadf3ca-be3c-4523-877c-4c35dd25093a", "metadata": {}, "source": [ "## Contents\n", "\n", "This guide provides a working example for all of the python API components of cuSpatial. \n", "The following list links to each subsection.\n", "\n", "* [Installing cuSpatial](#Installing-cuSpatial)\n", "* [GPU accelerated memory layout](#GPU-accelerated-memory-layout)\n", "* [Input / Output](#Input-/-Output)\n", "* [Geopandas and cuDF integration](#Geopandas-and-cuDF-integration)\n", "* [Trajectories](#Trajectories)\n", "* [Bounding](#Bounding)\n", "* [Projection](#Projection)\n", "* [Distance](#Distance)\n", "* [Filtering](#Filtering)\n", "* [Spatial joins](#Spatial-joins)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "115c8382-f83f-476f-9a26-a64a45b3a8da", "metadata": {}, "source": [ "## Installing cuSpatial\n", "Read the [RAPIDS Quickstart Guide]( https://rapids.ai/start.html ) to learn more about installing all RAPIDS libraries, including cuSpatial.\n", "\n", "If you are working on a system with a CUDA-enabled GPU and have CUDA installed, uncomment the \n", "following cell and install cuSpatial:" ] }, { "cell_type": "code", "execution_count": 1, "id": "7265f9d2-9203-4da2-bbb2-b35c7f933641", "metadata": {}, "outputs": [], "source": [ "# !conda create -n rapids-24.02 -c rapidsai -c conda-forge -c nvidia \\ \n", "# cuspatial=24.02 python=3.9 cudatoolkit=11.5 " ] }, { "attachments": {}, "cell_type": "markdown", "id": "051b6e68-9ffd-473a-89e2-313fe1c59d18", "metadata": {}, "source": [ "For other options to create a RAPIDS environment, such as docker or build from source, see \n", "[RAPIDS Release Selector]( https://rapids.ai/start.html#get-rapids). \n", "\n", "If you wish to contribute to cuSpatial, you should create a source build using the excellent [rapids-compose](https://github.com/trxcllnt/rapids-compose)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7b770cb4-793e-467a-a306-2d3409545748", "metadata": {}, "source": [ "## GPU accelerated memory layout\n", "\n", "cuSpatial uses `GeoArrow` buffers, a GPU-friendly data format for geometric data that is well \n", "suited for massively parallel programming. See [I/O]((#Input-/-Output) on the fastest methods to get your \n", "data into cuSpatial. GeoArrow extends [PyArrow](\n", "https://arrow.apache.org/docs/python/index.html ) bindings and introduces several new types suited \n", "for geometry applications. GeoArrow supports [ListArrays](\n", "https://arrow.apache.org/docs/python/data.html#arrays) for `Points`, `MultiPoints`, \n", "`LineStrings`, `MultiLineStrings`, `Polygons`, and `MultiPolygons`. Using an Arrow [DenseArray](\n", "https://arrow.apache.org/docs/python/data.html#union-arrays), \n", "GeoArrow stores heterogeneous types of Features. DataFrames of geometry objects and their \n", "metadata can be loaded and transformed in a method similar to those in [GeoPandas.GeoSeries](\n", "https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.html)." 
] }, { "cell_type": "code", "execution_count": 2, "id": "88d05bb9-c924-4d0b-8736-cd5183602d76", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Imports used throughout this notebook.\n", "import cuspatial\n", "import cudf\n", "import cupy\n", "import geopandas\n", "import pandas as pd\n", "import numpy as np\n", "from shapely.geometry import *\n", "from shapely import wkt" ] }, { "cell_type": "code", "execution_count": 3, "id": "4937d4aa-4e32-49ab-a22e-96dfb0098d07", "metadata": { "tags": [] }, "outputs": [], "source": [ "# For deterministic result\n", "np.random.seed(0)\n", "cupy.random.seed(0)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "4b1251d1-558a-4899-8e7a-8066db0ad091", "metadata": {}, "source": [ "## Input / Output\n", "\n", "The primary method of loading features into cuSpatial is using [cuspatial.from_geopandas](\n", "https://docs.rapids.ai/api/cuspatial/stable/api_docs/io.html?highlight=from_geopandas#cuspatial.from_geopandas).\n", "\n", "One can also create feature geometries directly using any Python buffer that supports \n", "`__array_interface__` for coordinates and their feature offsets." ] }, { "attachments": {}, "cell_type": "markdown", "id": "11b973bd-87e1-4b67-ab8c-23c3b8291335", "metadata": {}, "source": [ "### [cuspatial.from_geopandas](https://docs.rapids.ai/api/cuspatial/stable/api_docs/io.html?highlight=from_geopandas#cuspatial.from_geopandas)\n", "\n", "The easiest way to get data into cuSpatial is via `cuspatial.from_geopandas`." ] }, { "cell_type": "code", "execution_count": 4, "id": "255fbfbe-8be1-498c-9a26-f4a3f31bdded", "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_5364/3038749724.py:1: FutureWarning: The geopandas.dataset module is deprecated and will be removed in GeoPandas 1.0. You can get the original 'naturalearth_lowres' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.\n", " host_dataframe = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\"))\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " pop_est continent name iso_a3 gdp_md_est \\\n", "0 889953.0 Oceania Fiji FJI 5496 \n", "1 58005463.0 Africa Tanzania TZA 63177 \n", "2 603253.0 Africa W. Sahara ESH 907 \n", "3 37589262.0 North America Canada CAN 1736425 \n", "4 328239523.0 North America United States of America USA 21433226 \n", "\n", " geometry \n", "0 MULTIPOLYGON (((180.00000 -16.06713, 180.00000... \n", "1 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... \n", "2 POLYGON ((-8.66559 27.65643, -8.66512 27.58948... \n", "3 MULTIPOLYGON (((-122.84000 49.00000, -122.9742... \n", "4 MULTIPOLYGON (((-122.84000 49.00000, -120.0000... 
\n", "(GPU)\n", "\n" ] } ], "source": [ "host_dataframe = geopandas.read_file(geopandas.datasets.get_path(\n", " \"naturalearth_lowres\"\n", "))\n", "gpu_dataframe = cuspatial.from_geopandas(host_dataframe)\n", "print(gpu_dataframe.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "da5c775b-7458-4e3c-a573-e1bd060e3365", "metadata": {}, "source": [ "## Geopandas and cuDF integration\n", "\n", "A cuSpatial [GeoDataFrame](\n", "https://docs.rapids.ai/api/cuspatial/stable/api_docs/geopandas_compatibility.html#cuspatial.GeoDataFrame ) is a collection of [cudf](\n", "https://docs.rapids.ai/api/cudf/stable/ ) [Series](\n", "https://docs.rapids.ai/api/cudf/stable/api_docs/series.html ) and\n", "[cuspatial.GeoSeries](\n", "https://docs.rapids.ai/api/cuspatial/stable/api_docs/geopandas_compatibility.html#cuspatial.GeoSeries ) `\"geometry\"` objects. \n", "Both types of series are stored on the GPU, and\n", "`GeoSeries` is represented internally using `GeoArrow` data layout.\n", "\n", "One of the most important features of cuSpatial is that it is highly integrated with `cuDF`. \n", "You can use any `cuDF` operation on cuSpatial non-feature columns, and most operations will work \n", "with a `geometry` column. Operations that reduce or collate the number of rows in your DataFrame, \n", "for example `groupby`, are not supported at this time." ] }, { "cell_type": "code", "execution_count": 5, "id": "956451e2-a520-441d-a939-575ed179917b", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " pop_est continent name iso_a3 gdp_md_est \\\n", "103 38041754.0 Asia Afghanistan AFG 19291 \n", "125 2854191.0 Europe Albania ALB 15279 \n", "82 43053054.0 Africa Algeria DZA 171091 \n", "74 31825295.0 Africa Angola AGO 88815 \n", "159 4490.0 Antarctica Antarctica ATA 898 \n", ".. ... ... ... ... ... \n", "2 603253.0 Africa W. Sahara ESH 907 \n", "157 29161922.0 Asia Yemen YEM 22581 \n", "70 17861030.0 Africa Zambia ZMB 23309 \n", "48 14645468.0 Africa Zimbabwe ZWE 21440 \n", "73 1148130.0 Africa eSwatini SWZ 4471 \n", "\n", " geometry \n", "103 POLYGON ((66.51861 37.36278, 67.07578 37.35614... \n", "125 POLYGON ((21.02004 40.84273, 20.99999 40.58000... \n", "82 POLYGON ((-8.68440 27.39574, -8.66512 27.58948... \n", "74 MULTIPOLYGON (((12.99552 -4.78110, 12.63161 -4... \n", "159 MULTIPOLYGON (((-48.66062 -78.04702, -48.15140... \n", ".. ... \n", "2 POLYGON ((-8.66559 27.65643, -8.66512 27.58948... \n", "157 POLYGON ((52.00001 19.00000, 52.78218 17.34974... \n", "70 POLYGON ((30.74001 -8.34001, 31.15775 -8.59458... \n", "48 POLYGON ((31.19141 -22.25151, 30.65987 -22.151... \n", "73 POLYGON ((32.07167 -26.73382, 31.86806 -27.177... \n", "\n", "[177 rows x 6 columns]\n", "(GPU)\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_5364/567044009.py:1: FutureWarning: The geopandas.dataset module is deprecated and will be removed in GeoPandas 1.0. 
You can get the original 'naturalearth_lowres' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.\n", " host_dataframe = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\"))\n" ] } ], "source": [ "host_dataframe = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\"))\n", "gpu_dataframe = cuspatial.from_geopandas(host_dataframe)\n", "continents_dataframe = gpu_dataframe.sort_values(\"name\")\n", "print(continents_dataframe)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "5453f308-317f-4775-ba3c-ff0633755bc4", "metadata": {}, "source": [ "You can also convert between GPU-backed `cuspatial.GeoDataFrame` and CPU-backed \n", "`geopandas.GeoDataFrame` with `from_geopandas` and `to_geopandas`, enabling you to \n", "take advantage of any native GeoPandas operation. Note, however, that GeoPandas runs on \n", " the CPU and therefore will not have as high performance as cuSpatial operations. The following \n", "example displays the `Polygon` associated with the first item in the dataframe sorted \n", "alphabetically by name." ] }, { "cell_type": "code", "execution_count": 6, "id": "cd8d1c39-b44f-4d06-9e11-04c2dca4cf15", "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_5364/1940355512.py:1: FutureWarning: The geopandas.dataset module is deprecated and will be removed in GeoPandas 1.0. You can get the original 'naturalearth_lowres' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.\n", " host_dataframe = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\"))\n" ] }, { "data": { "image/svg+xml": [ "" ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "host_dataframe = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\"))\n", "gpu_dataframe = cuspatial.from_geopandas(host_dataframe)\n", "sorted_dataframe = gpu_dataframe.sort_values(\"name\")\n", "host_dataframe = sorted_dataframe.to_geopandas()\n", "host_dataframe['geometry'].iloc[0]" ] }, { "attachments": {}, "cell_type": "markdown", "id": "56b80722-38b6-457c-a7f0-591af6efd3ff", "metadata": {}, "source": [ "## Trajectories\n", "\n", "A trajectory is a `LineString` coupled with a time sample for each point in the `LineString`. \n", "Use `cuspatial.trajectory.derive_trajectories` to group trajectory datasets and sort by time.\n", "\n", "\n", "\n", "### [cuspatial.derive_trajectories](https://docs.rapids.ai/api/cuspatial/stable/api_docs/trajectory.html#cuspatial.derive_trajectories)" ] }, { "cell_type": "code", "execution_count": 7, "id": "cb5acdad-53aa-418f-9948-8445515bd2b2", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " object_id x y timestamp\n", "0 1 0.680146 0.874341 1970-01-01 00:00:00.125\n", "1 1 0.843522 0.044402 1970-01-01 00:00:00.834\n", "2 1 0.837039 0.351025 1970-01-01 00:00:01.335\n", "3 1 0.946184 0.479038 1970-01-01 00:00:01.791\n", "4 1 0.117322 0.182117 1970-01-01 00:00:02.474\n", "0 0\n", "1 2455\n", "2 4899\n", "3 7422\n", "4 9924\n", " ... 
\n", "394 987408\n", "395 989891\n", "396 992428\n", "397 994975\n", "398 997448\n", "Length: 399, dtype: int32\n" ] } ], "source": [ "# 1m random trajectory samples\n", "ids = cupy.random.randint(1, 400, 1000000)\n", "timestamps = cupy.random.random(1000000)*1000000\n", "xy= cupy.random.random(2000000)\n", "trajs = cuspatial.GeoSeries.from_points_xy(xy)\n", "sorted_trajectories, trajectory_offsets = \\\n", " cuspatial.core.trajectory.derive_trajectories(ids, trajs, timestamps)\n", "# sorted_trajectories is a DataFrame containing all trajectory samples\n", "# sorted first by `object_id` and then by `timestamp`.\n", "print(sorted_trajectories.head())\n", "# trajectory_offsets is a Series containing the start position of each\n", "# trajectory in sorted_trajectories.\n", "print(trajectory_offsets)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "3c4a90f9-8661-4fda-9026-473e6ce87bd2", "metadata": {}, "source": [ "`derive_trajectories` sorts the trajectories by `object_id`, then `timestamp`, and returns a \n", "tuple containing the sorted trajectory data frame in the first index position and the offsets \n", "buffer defining the start and stop of each trajectory in the second index position. \n", "\n", "### [cuspatial.trajectory_distances_and_speeds](https://docs.rapids.ai/api/cuspatial/stable/api_docs/trajectory.html#cuspatial.trajectory_distances_and_speeds)\n", "\n", "Use `trajectory_distance_and_speed` to calculate the overall distance travelled in meters and \n", "the speed of a set of trajectories with the same format as the result returned by `derive_trajectories`." ] }, { "cell_type": "code", "execution_count": 8, "id": "03b75847-090d-40f8-8147-cb10b900d6ec", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " distance speed\n", "trajectory_id \n", "0 1.278996e+06 1280.320089\n", "1 1.267179e+06 1268.370390\n", "2 1.294437e+06 1295.905261\n", "3 1.323413e+06 1323.956714\n", "4 1.309590e+06 1311.561012\n" ] } ], "source": [ "trajs = cuspatial.GeoSeries.from_points_xy(\n", " sorted_trajectories[[\"x\", \"y\"]].interleave_columns()\n", ")\n", "d_and_s = cuspatial.core.trajectory.trajectory_distances_and_speeds(\n", " len(cudf.Series(ids).unique()),\n", " sorted_trajectories['object_id'],\n", " trajs,\n", " sorted_trajectories['timestamp']\n", ")\n", "print(d_and_s.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "eeb0da7c-0bdf-49ec-8244-4165acc96074", "metadata": {}, "source": [ "Finally, compute the bounding boxes of trajectories that follow the format of the above two \n", "examples:" ] }, { "attachments": {}, "cell_type": "markdown", "id": "12a3ce7e-82b7-4b51-8227-5e157a48701c", "metadata": { "tags": [] }, "source": [ "## Bounding\n", "\n", "Compute the bounding boxes of `n` polygons or linestrings:\n", "\n", "\n", "\n", "### [cuspatial.trajectory_bounding_boxes](https://docs.rapids.ai/api/cuspatial/stable/api_docs/trajectory.html#cuspatial.trajectory_bounding_boxes)\n", "\n", "`trajectory_bounding_boxes` works out of the box with the values returned by `derive_trajectories`. \n", "Its arguments are the number of incoming objects, the offsets of those objects, and x and y point buffers." 
] }, { "cell_type": "code", "execution_count": 9, "id": "452f60cb-28cc-4ad8-8aa2-9d73e3d56ec6", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " x_min y_min x_max y_max\n", "0 0.000098 0.000243 0.999422 0.999068\n", "1 0.000663 0.000414 0.999813 0.998456\n", "2 0.000049 0.000220 0.999748 0.999220\n", "3 0.000006 0.000303 0.999729 0.999762\n", "4 0.001190 0.000074 0.999299 0.999858\n" ] } ], "source": [ "bounding_boxes = cuspatial.core.trajectory.trajectory_bounding_boxes(\n", " len(cudf.Series(ids, dtype=\"int32\").unique()),\n", " sorted_trajectories['object_id'],\n", " trajs\n", ")\n", "print(bounding_boxes.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "a56dfe17-1739-4b20-85c9-fcb5902c1585", "metadata": {}, "source": [ "### [cuspatial.polygon_bounding_boxes](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.polygon_bounding_boxes)\n", "\n", "`polygon_bounding_boxes` supports more complex geometry objects such as `Polygon`s with multiple \n", "rings. The combination of `part_offset` and `ring_offset` allows the function to use only the \n", "exterior ring for computing the bounding box." ] }, { "cell_type": "code", "execution_count": 10, "id": "9266aac4-f925-4fb7-b287-5f0b795d5756", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " minx miny maxx maxy\n", "0 29.339998 -11.720938 40.316590 -0.950000\n", "1 -17.063423 20.999752 -8.665124 27.656426\n", "2 46.466446 40.662325 87.359970 55.385250\n", "3 55.928917 37.144994 73.055417 45.586804\n", "4 12.182337 -13.257227 31.174149 5.256088\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_5364/1016569337.py:1: FutureWarning: The geopandas.dataset module is deprecated and will be removed in GeoPandas 1.0. 
You can get the original 'naturalearth_lowres' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.\n", " host_dataframe = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\"))\n" ] } ], "source": [ "host_dataframe = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\"))\n", "single_polygons = cuspatial.from_geopandas(\n", " host_dataframe['geometry'][host_dataframe['geometry'].type == \"Polygon\"]\n", ")\n", "bounding_box_polygons = cuspatial.core.spatial.bounding.polygon_bounding_boxes(\n", " single_polygons\n", ")\n", "print(bounding_box_polygons.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "85197478-801c-4d2d-8b10-c1136d7bb15c", "metadata": {}, "source": [ "### [cuspatial.linestring_bounding_boxes](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.linestring_bounding_boxes)\n", "\n", "Equivalently, we can treat trajectories as Linestrings and compute the same bounding boxes from \n", "the above trajectory calculation more generally:" ] }, { "cell_type": "code", "execution_count": 11, "id": "15b5bb38-702f-4360-b48c-2e49ffd650d7", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " minx miny maxx maxy\n", "0 -0.000002 0.000143 0.999522 0.999168\n", "1 0.000563 0.000314 0.999913 0.998556\n", "2 -0.000051 0.000120 0.999848 0.999320\n", "3 -0.000094 0.000203 0.999829 0.999862\n", "4 0.001090 -0.000026 0.999399 0.999958\n" ] } ], "source": [ "lines = cuspatial.GeoSeries.from_linestrings_xy(\n", " trajs.points.xy, trajectory_offsets, cupy.arange(len(trajectory_offsets))\n", ")\n", "trajectory_bounding_boxes = cuspatial.core.spatial.bounding.linestring_bounding_boxes(\n", " lines, 0.0001\n", ")\n", "print(trajectory_bounding_boxes.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "81c4d3ca-5d3f-4ae1-ae8e-ac1e252f3e17", "metadata": {}, "source": [ "## Projection\n", "\n", "cuSpatial provides a simple sinusoidal longitude / latitude to Cartesian coordinate transform. \n", "This function requires an origin point to determine the scaling parameters for the lonlat inputs. \n", "\n", "### [cuspatial.sinusoidal_projection](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.sinusoidal_projection)\n", "\n", "The following cell converts the lonlat coordinates of the country of Afghanistan to Cartesian \n", "coordinates in km, centered around the center of the country, suitable for graphing and display." ] }, { "cell_type": "code", "execution_count": 12, "id": "a7a870dd-c0ae-41c1-a66c-cff4bd2db0ec", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 POINT (112.174 -281.590)\n", "1 POINT (62.152 -280.852)\n", "2 POINT (-5.573 -257.391)\n", "3 POINT (-33.071 -243.849)\n", "4 POINT (-98.002 -279.540)\n", "dtype: geometry\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_5364/2658722012.py:1: FutureWarning: The geopandas.dataset module is deprecated and will be removed in GeoPandas 1.0. 
You can get the original 'naturalearth_lowres' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.\n", " host_dataframe = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\"))\n" ] } ], "source": [ "host_dataframe = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\"))\n", "gpu_dataframe = cuspatial.from_geopandas(host_dataframe)\n", "afghanistan = gpu_dataframe['geometry'][gpu_dataframe['name'] == 'Afghanistan']\n", "points = cuspatial.GeoSeries.from_points_xy(afghanistan.polygons.xy)\n", "projected = cuspatial.sinusoidal_projection(\n", " afghanistan.polygons.x.mean(),\n", " afghanistan.polygons.y.mean(),\n", " points\n", ")\n", "print(projected.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c9085e80-e7ba-48e9-8c50-12e544e3af46", "metadata": {}, "source": [ "## Distance\n", "cuSpatial provides a growing suite of distance computation functions. Parallel distance functions \n", "come in two main forms: pairwise, which computes a distance for each corresponding pair of input \n", "geometries; and all-pairs, which computes a distance for each element of the Cartesian product \n", "of input geometries (for each input geometry in A, compute the distance from A to each input\n", "geometry in B).\n", " \n", "Several pairwise distance functions are included in cuSpatial, such as `haversine` and `pairwise_linestring`. \n", "The `hausdorff` clustering distance algorithm is also available, computing the Hausdorff \n", "distance across the Cartesian product of its single input.\n", "\n", "### [cuspatial.directed_hausdorff_distance](https://docs.rapids.ai/api/cuspatial/stable/api_docs/trajectory.html#cuspatial.directed_hausdorff_distance)\n", "\n", "The directed Hausdorff distance from one space to another is the greatest of the distances \n", "from any point in the first space to the closest point in the second. This is especially useful \n", "as a similarity metric between trajectories.\n", "\n", "\n", "\n", "[Hausdorff distance](https://en.wikipedia.org/wiki/Hausdorff_distance)" ] }, { "cell_type": "code", "execution_count": 13, "id": "e75b0352-0f80-404d-a113-f301601cd5a3", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0 1 2 3 4 5 6 \\\n", "0 0.000000 0.034755 0.031989 0.031959 0.031873 0.038674 0.029961 \n", "1 0.030328 0.000000 0.038672 0.032086 0.031049 0.032170 0.032275 \n", "2 0.027640 0.030539 0.000000 0.036737 0.033055 0.043447 0.028812 \n", "3 0.031497 0.033380 0.035224 0.000000 0.032581 0.035484 0.030339 \n", "4 0.031079 0.032256 0.035731 0.039084 0.000000 0.036416 0.031369 \n", "\n", " 7 8 9 ... 388 389 390 391 \\\n", "0 0.029117 0.040962 0.033259 ... 0.031614 0.036447 0.035548 0.028233 \n", "1 0.030215 0.034443 0.032998 ... 0.030594 0.035665 0.031473 0.031916 \n", "2 0.031807 0.039269 0.033250 ... 0.031998 0.033636 0.034646 0.032615 \n", "3 0.034792 0.045755 0.031810 ... 0.033623 0.031359 0.034923 0.032287 \n", "4 0.030388 0.033751 0.034029 ... 
0.030705 0.040339 0.034328 0.029027 \n", "\n", " 392 393 394 395 396 397 \n", "0 0.034176 0.030057 0.033863 0.031111 0.034590 0.033850 \n", "1 0.037483 0.033489 0.041403 0.029784 0.035374 0.038179 \n", "2 0.036681 0.030642 0.038432 0.032481 0.034810 0.036695 \n", "3 0.032808 0.029771 0.040891 0.030802 0.032279 0.038443 \n", "4 0.035645 0.027703 0.037529 0.029356 0.031260 0.035501 \n", "\n", "[5 rows x 398 columns]\n" ] } ], "source": [ "coordinates = sorted_trajectories[['x', 'y']].interleave_columns()\n", "spaces = cuspatial.GeoSeries.from_multipoints_xy(\n", " coordinates, trajectory_offsets\n", ")\n", "hausdorff_distances = cuspatial.core.spatial.distance.directed_hausdorff_distance(\n", " spaces\n", ")\n", "print(hausdorff_distances.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "993d7566-203e-4b87-adad-088e2fd92eed", "metadata": {}, "source": [ "### [cuspatial.haversine_distance](https://docs.rapids.ai/api/cuspatial/stable/api_docs/gis.html#cuspatial.haversine_distance)\n", "\n", "Haversine distance is the great circle distance between longitude and latitude pairs. cuSpatial \n", "uses `lon/lat` ordering to mirror the `x/y` ordering of Cartesian \n", "coordinates.\n", "\n", "" ] }, { "cell_type": "code", "execution_count": 14, "id": "70f66319-c4d2-4a93-ab98-0debcce4a719", "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_5364/491857862.py:1: FutureWarning: The geopandas.dataset module is deprecated and will be removed in GeoPandas 1.0. You can get the original 'naturalearth_lowres' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.\n", " host_dataframe = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\"))\n" ] }, { "data": { "text/plain": [ "0 9959.695143\n", "1 9803.166859\n", "2 9876.857085\n", "3 9925.097106\n", "4 9927.268486\n", "Name: None, dtype: float64" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "host_dataframe = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\"))\n", "gpu_dataframe = cuspatial.from_geopandas(host_dataframe)\n", "polygons_first = gpu_dataframe['geometry'][0:10]\n", "polygons_second = gpu_dataframe['geometry'][10:20]\n", "\n", "points_first = polygons_first.polygons.xy[0:1000]\n", "points_second = polygons_second.polygons.xy[0:1000]\n", "\n", "first = cuspatial.GeoSeries.from_points_xy(points_first)\n", "second = cuspatial.GeoSeries.from_points_xy(points_second)\n", "\n", "# The number of coordinates in two sets of polygons vary, so\n", "# we'll just compare the first set of 1000 values here.\n", "# haversine_distance returns great-circle distances in kilometers\n", "# (Earth radius 6371 km by default).\n", "distances_in_km = cuspatial.haversine_distance(\n", " first, second\n", ")\n", "cudf.Series(distances_in_km).head()" ] }, { "cell_type": "markdown", "id": "f21c1709-dfc1-411b-b9d5-ec5dc345367f", "metadata": {}, "source": [ "The above method reads the GeoPandas data from CPU memory into GPU memory and then cuSpatial processes it. If the data is already in a cuDF GPU dataframe, you can quickly calculate Haversine distances using the method below. This maximizes speed by keeping all the processing on the GPU and is very useful when working on large datasets."
] }, { "cell_type": "code", "execution_count": 15, "id": "c32741b0-c550-4d0f-b494-45191f5b60fd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " latitude longitude atlanta_lat atlanta_lng atlanta_dist\n", "0 17.1167 -61.7833 33.749 -84.388 11961.556540\n", "1 17.1333 -61.7833 33.749 -84.388 11963.392729\n", "2 25.3330 55.5170 33.749 -84.388 12243.126130\n", "3 25.2550 55.3640 33.749 -84.388 12233.867463\n", "4 24.4330 54.6510 33.749 -84.388 12139.822218\n", "5 24.2620 55.6090 33.749 -84.388 12124.483127\n", "6 35.3170 69.0170 33.749 -84.388 13418.538383\n", "7 34.2100 62.2280 33.749 -84.388 13258.725239\n", "8 34.5660 69.2120 33.749 -84.388 13336.375942\n", "9 31.5000 65.8500 33.749 -84.388 12976.749248\n", "10 36.7167 3.2500 33.749 -84.388 13547.245294\n", "11 30.5667 2.8667 33.749 -84.388 12866.528267\n", "12 28.0500 9.6331 33.749 -84.388 12554.544289\n", "13 22.8000 5.4331 33.749 -84.388 11990.825098\n", "14 35.7297 0.6500 33.749 -84.388 13451.775999\n", "15 36.9700 7.7900 33.749 -84.388 13553.372737\n", "16 36.7800 3.0700 33.749 -84.388 13555.211584\n", "17 36.8000 3.0300 33.749 -84.388 13557.641136\n", "18 36.8000 3.0400 33.749 -84.388 13557.588738\n", "19 36.7200 4.0500 33.749 -84.388 13543.496327\n" ] } ], "source": [ "# Generate data to be used to create a cuDF dataframe. \n", "# The data to be processed by Haversine MUST be a Float.\n", "a = {\"latitude\":[17.1167, 17.1333, 25.333, 25.255, 24.433, 24.262, 35.317, 34.21, 34.566, 31.5, 36.7167, 30.5667, 28.05, 22.8, 35.7297, 36.97, 36.78, 36.8, 36.8, 36.72],\n", " \"longitude\": [-61.7833, -61.7833, 55.517, 55.364, 54.651, 55.609, 69.017, 62.228, 69.212, 65.85, 3.25, 2.8667, 9.6331, 5.4331, 0.65, 7.79, 3.07, 3.03, 3.04, 4.05]}\n", "df = cudf.DataFrame(data=a)\n", "\n", "# Create cuSpatial GeoSeries from cuDF Dataframe\n", "cuGeoSeries = cuspatial.GeoSeries.from_points_xy(df[['longitude', 'latitude']].interleave_columns())\n", "\n", "# Create Comparator cuSpatial GeoSeries from a comparator point\n", "df['atlanta_lat'] = 33.7490\n", "df['atlanta_lng'] = -84.3880\n", "atlGeoSeries = cuspatial.GeoSeries.from_points_xy(df[['atlanta_lat', 'atlanta_lng']].interleave_columns())\n", "\n", "# Calculate Haversine Distance of cuDF dataframe to comparator point\n", "df['atlanta_dist'] = cuspatial.haversine_distance(cuGeoSeries, atlGeoSeries)\n", "print(df.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7f2239c5-58d0-4912-9bd7-246cc6741c0a", "metadata": {}, "source": [ "### Pairwise distance\n", "\n", "`pairwise_linestring_distance` computes the distance between a `GeoSeries` of Linestrings of \n", "length `n` and a corresponding `GeoSeries` of Linestrings of `n` length. It returns the \n", "minimum distance from any point in the first linestring of the pair to the nearest segment \n", "or point within the second Linestring of the pair.\n", "\n", "The input accepts a pair of geoseries as input sequences of linestring arrays.\n", "\n", "The below example uses the polygons from `naturalearth_lowres` and treats them as linestrings. 
\n", "The first example computes the distances between all polygons and themselves, while the second \n", "example computes the distance between the first 50 polygons and the second 50 polygons.\n", "\n", "### [cuspatial.pairwise_linestring_distance](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.pairwise_linestring_distance)" ] }, { "cell_type": "code", "execution_count": 16, "id": "35dfb7c9-1914-488a-b22e-8d0067ea7a8b", "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_5364/1097934054.py:1: FutureWarning: The geopandas.dataset module is deprecated and will be removed in GeoPandas 1.0. You can get the original 'naturalearth_lowres' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.\n", " host_dataframe = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\"))\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "0 0.0\n", "1 0.0\n", "2 0.0\n", "3 0.0\n", "4 0.0\n", "dtype: float64\n", "0 152.200610\n", "1 44.076445\n", "2 2.417269\n", "3 44.197151\n", "4 75.821029\n", "dtype: float64\n" ] } ], "source": [ "host_dataframe = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\"))\n", "\n", "gpu_boundaries = cuspatial.from_geopandas(host_dataframe.geometry.boundary)\n", "zeros = cuspatial.pairwise_linestring_distance(\n", " gpu_boundaries[0:50],\n", " gpu_boundaries[0:50]\n", ")\n", "print(zeros.head())\n", "lines1 = gpu_boundaries[0:50]\n", "lines2 = gpu_boundaries[50:100]\n", "distances = cuspatial.pairwise_linestring_distance(\n", " lines1, lines2\n", ")\n", "print(distances.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "de6b73ac-1b48-422c-8463-37367ad73507", "metadata": {}, "source": [ "`pairwise_point_linestring_distance` computes the distance between pairs of points and \n", "linestrings. It can be used with polygons treated as linestrings as well. In the following \n", "example the minimum distance from a country's center to it's border is computed.\n", "\n", "### [cuspatial.pairwise_point_linestring_distance](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.pairwise_point_linestring_distance)\n", "\n", "Using WGS 84 Pseudo-Mercator, distances are in meters." ] }, { "cell_type": "code", "execution_count": 17, "id": "40e9a41e-21af-47cc-a142-b19a67941f7f", "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_5364/2846028812.py:2: FutureWarning: The geopandas.dataset module is deprecated and will be removed in GeoPandas 1.0. You can get the original 'naturalearth_lowres' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.\n", " host_dataframe = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\")).to_crs(3857)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " pop_est continent name iso_a3 gdp_md_est \\\n", "1 58005463.0 Africa Tanzania TZA 63177 \n", "2 603253.0 Africa W. Sahara ESH 907 \n", "5 18513930.0 Asia Kazakhstan KAZ 181665 \n", "6 33580650.0 Asia Uzbekistan UZB 57921 \n", "11 86790567.0 Africa Dem. Rep. Congo COD 50400 \n", "\n", " geometry border_distance \n", "1 POLYGON ((3774143.866 -105758.362, 3792946.708... 8047.288391 \n", "2 POLYGON ((-964649.018 3205725.605, -964597.245... 593137.492497 \n", "5 POLYGON ((9724867.413 6311418.173, 9640131.701... 37091.213890 \n", "6 POLYGON ((6230350.563 5057973.384, 6225978.591... 
278633.467299 \n", "11 POLYGON ((3266113.592 -501451.658, 3286149.877... 35812.988244 \n", "(GPU)\n", "\n" ] } ], "source": [ "# Convert input dataframe to Pseudo-Mercator projection.\n", "host_dataframe = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\")).to_crs(3857)\n", "polygons = host_dataframe[host_dataframe['geometry'].type == \"Polygon\"]\n", "gpu_polygons = cuspatial.from_geopandas(polygons)\n", "# Extract mean_x and mean_y from each country\n", "mean_x = [gpu_polygons['geometry'].iloc[[ix]].polygons.x.mean() for ix in range(len(gpu_polygons))]\n", "mean_y = [gpu_polygons['geometry'].iloc[[ix]].polygons.y.mean() for ix in range(len(gpu_polygons))]\n", "# Convert mean_x/mean_y values into Points for use in API.\n", "points = cuspatial.GeoSeries([Point(point) for point in zip(mean_x, mean_y)])\n", "# Convert Polygons into Linestrings for use in API.\n", "linestring_df = cuspatial.from_geopandas(geopandas.geoseries.GeoSeries(\n", " [MultiLineString(mapping(polygons['geometry'].iloc[ix])[\"coordinates\"]) for ix in range(len(polygons))]\n", "))\n", "gpu_polygons['border_distance'] = cuspatial.pairwise_point_linestring_distance(\n", " points, linestring_df\n", ")\n", "print(gpu_polygons.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "008d320d-ca47-459f-9fff-8769494c8a61", "metadata": {}, "source": [ "### cuspatial.pairwise_point_polygon_distance\n", "\n", "Using WGS 84 Pseudo-Mercator, distances are in meters." ] }, { "cell_type": "code", "execution_count": 18, "id": "258c9a8c-7fe3-4047-80b7-00878d9fb2f1", "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_5364/3261459244.py:1: FutureWarning: The geopandas.dataset module is deprecated and will be removed in GeoPandas 1.0. You can get the original 'naturalearth_cities' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.\n", " cities = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_cities\")).to_crs(3857)\n", "/tmp/ipykernel_5364/3261459244.py:2: FutureWarning: The geopandas.dataset module is deprecated and will be removed in GeoPandas 1.0. You can get the original 'naturalearth_lowres' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.\n", " countries = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\")).to_crs(3857)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
pop_estcontinentnameiso_a3gdp_md_estgeometrydistance_fromdistance
0889953.0OceaniaFijiFJI5496MULTIPOLYGON (((20037508.343 -1812498.413, 200...Vatican City1.969350e+07
158005463.0AfricaTanzaniaTZA63177POLYGON ((3774143.866 -105758.362, 3792946.708...San Marino5.929777e+06
2603253.0AfricaW. SaharaESH907POLYGON ((-964649.018 3205725.605, -964597.245...Vaduz3.421172e+06
337589262.0North AmericaCanadaCAN1736425MULTIPOLYGON (((-13674486.249 6274861.394, -13...Lobamba1.296059e+07
4328239523.0North AmericaUnited States of AmericaUSA21433226MULTIPOLYGON (((-13674486.249 6274861.394, -13...Luxembourg8.174897e+06
\n", "
" ], "text/plain": [ " pop_est continent name iso_a3 gdp_md_est \\\n", "0 889953.0 Oceania Fiji FJI 5496 \n", "1 58005463.0 Africa Tanzania TZA 63177 \n", "2 603253.0 Africa W. Sahara ESH 907 \n", "3 37589262.0 North America Canada CAN 1736425 \n", "4 328239523.0 North America United States of America USA 21433226 \n", "\n", " geometry distance_from \\\n", "0 MULTIPOLYGON (((20037508.343 -1812498.413, 200... Vatican City \n", "1 POLYGON ((3774143.866 -105758.362, 3792946.708... San Marino \n", "2 POLYGON ((-964649.018 3205725.605, -964597.245... Vaduz \n", "3 MULTIPOLYGON (((-13674486.249 6274861.394, -13... Lobamba \n", "4 MULTIPOLYGON (((-13674486.249 6274861.394, -13... Luxembourg \n", "\n", " distance \n", "0 1.969350e+07 \n", "1 5.929777e+06 \n", "2 3.421172e+06 \n", "3 1.296059e+07 \n", "4 8.174897e+06 \n", "(GPU)" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cities = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_cities\")).to_crs(3857)\n", "countries = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\")).to_crs(3857)\n", "\n", "gpu_cities = cuspatial.from_geopandas(cities)\n", "gpu_countries = cuspatial.from_geopandas(countries)\n", "\n", "dist = cuspatial.pairwise_point_polygon_distance(\n", " gpu_cities.geometry[:len(gpu_countries)], gpu_countries.geometry\n", ")\n", "\n", "gpu_countries[\"distance_from\"] = cities.name\n", "gpu_countries[\"distance\"] = dist\n", "\n", "gpu_countries.head()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e0b4d618", "metadata": {}, "source": [ "### cuspatial.pairwise_linestring_polygon_distance\n", "\n", "Using WGS 84 Pseudo-Mercator, distances are in meters." ] }, { "cell_type": "code", "execution_count": 19, "id": "5863871e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namegeometry
index
0Columbus AvenueLINESTRING (-8234860.077 4980333.535, -8234863...
1West 80th StreetLINESTRING (-8235173.854 4980508.442, -8235160...
2Amsterdam AvenueLINESTRING (-8235173.854 4980508.442, -8235168...
3West 80th StreetLINESTRING (-8235369.475 4980617.398, -8235349...
4BroadwayLINESTRING (-8235369.475 4980617.398, -8235373...
\n", "
" ], "text/plain": [ " name geometry\n", "index \n", "0 Columbus Avenue LINESTRING (-8234860.077 4980333.535, -8234863...\n", "1 West 80th Street LINESTRING (-8235173.854 4980508.442, -8235160...\n", "2 Amsterdam Avenue LINESTRING (-8235173.854 4980508.442, -8235168...\n", "3 West 80th Street LINESTRING (-8235369.475 4980617.398, -8235349...\n", "4 Broadway LINESTRING (-8235369.475 4980617.398, -8235373..." ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# all driveways within 2km range of central park, nyc\n", "\n", "# The dataset is downloaded and processed as follows:\n", "# import osmnx as ox\n", "# graph = ox.graph_from_point((40.769361, -73.977655), dist=2000, network_type=\"drive\")\n", "# nodes, streets = ox.graph_to_gdfs(graph)\n", "# streets = streets.to_crs(3857)\n", "# streets = streets.reset_index(drop=True)\n", "# streets.index.name = \"index\"\n", "# streets[[\"name\", \"geometry\"]].to_csv(\"streets_3857.csv\")\n", "\n", "# The data is under notebooks/streets_3857.csv\n", "streets = pd.read_csv(\"./streets_3857.csv\", index_col=\"index\")\n", "streets.geometry = streets.geometry.apply(wkt.loads)\n", "streets = geopandas.GeoDataFrame(streets)\n", "streets.head()" ] }, { "cell_type": "code", "execution_count": 20, "id": "f4c67464", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
geometry
index
0POLYGON ((-8236139.639 4975314.625, -8235990.3...
0POLYGON ((-8236139.639 4975314.625, -8235990.3...
0POLYGON ((-8236139.639 4975314.625, -8235990.3...
0POLYGON ((-8236139.639 4975314.625, -8235990.3...
0POLYGON ((-8236139.639 4975314.625, -8235990.3...
\n", "
" ], "text/plain": [ " geometry\n", "index \n", "0 POLYGON ((-8236139.639 4975314.625, -8235990.3...\n", "0 POLYGON ((-8236139.639 4975314.625, -8235990.3...\n", "0 POLYGON ((-8236139.639 4975314.625, -8235990.3...\n", "0 POLYGON ((-8236139.639 4975314.625, -8235990.3...\n", "0 POLYGON ((-8236139.639 4975314.625, -8235990.3..." ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The polygon of the Empire State Building\n", "\n", "# The dataset is downloaded and processed as follows:\n", "# esb = ox.geometries.geometries_from_place('Empire State Building, New York', tags={\"building\": True})\n", "# esb = esb.to_crs(3857)\n", "# esb = esb.geometry.reset_index(drop=True)\n", "# esb.index.name = \"index\"\n", "# esb.to_csv(\"esb_3857.csv\")\n", "\n", "# The data is under notebooks/esb_3857.csv\n", "esb = pd.read_csv(\"./esb_3857.csv\", index_col=\"index\")\n", "esb.geometry = esb.geometry.apply(wkt.loads)\n", "esb = geopandas.GeoDataFrame(esb)\n", "esb = pd.concat([esb.iloc[0:1]] * len(streets))\n", "esb.head()" ] }, { "cell_type": "code", "execution_count": 21, "id": "d4e68e87", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namedist
0Columbus Avenue4993.583717
1West 80th Street5103.472213
2Amsterdam Avenue5208.373183
3West 80th Street5276.886560
4Broadway5178.999774
.........
1859East 70th Street4193.471831
1860West 65th Street3999.118604
1861Dyer Avenue1470.617115
1862Dyer Avenue1468.714127
1863Dyer Avenue1548.205363
\n", "

1864 rows × 2 columns

\n", "
" ], "text/plain": [ " name dist\n", "0 Columbus Avenue 4993.583717\n", "1 West 80th Street 5103.472213\n", "2 Amsterdam Avenue 5208.373183\n", "3 West 80th Street 5276.886560\n", "4 Broadway 5178.999774\n", "... ... ...\n", "1859 East 70th Street 4193.471831\n", "1860 West 65th Street 3999.118604\n", "1861 Dyer Avenue 1470.617115\n", "1862 Dyer Avenue 1468.714127\n", "1863 Dyer Avenue 1548.205363\n", "\n", "[1864 rows x 2 columns]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Straight line distance between the driveways to the Empire State Building\n", "gpu_streets = cuspatial.from_geopandas(streets.geometry)\n", "gpu_esb = cuspatial.from_geopandas(esb.geometry)\n", "\n", "dist = cuspatial.pairwise_linestring_polygon_distance(gpu_streets, gpu_esb).rename(\"dist\")\n", "pd.concat([streets[\"name\"].reset_index(drop=True), dist.to_pandas()], axis=1)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "3e1bb692", "metadata": {}, "source": [ "### cuspatial.pairwise_polygon_distance\n", "\n", "Using WGS 84 Pseudo-Mercator, distances are in meters." ] }, { "cell_type": "code", "execution_count": 22, "id": "951625da", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_5364/3213563529.py:1: FutureWarning: The geopandas.dataset module is deprecated and will be removed in GeoPandas 1.0. You can get the original 'naturalearth_lowres' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.\n", " countries = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\")).to_crs(3857)\n" ] } ], "source": [ "countries = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\")).to_crs(3857)\n", "gpu_countries = cuspatial.from_geopandas(countries)\n", "\n", "african_countries = gpu_countries[gpu_countries.continent == \"Africa\"].sort_values(\"pop_est\", ascending=False)\n", "asian_countries = gpu_countries[gpu_countries.continent == \"Asia\"].sort_values(\"pop_est\", ascending=False)" ] }, { "cell_type": "code", "execution_count": 23, "id": "2f0e1118", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AfricaAsiadist
0NigeriaChina7.383366e+06
1EthiopiaIndia2.883313e+06
2EgyptIndonesia6.776043e+06
3Dem. Rep. CongoPakistan4.227767e+06
4South AfricaBangladesh8.189086e+06
5TanzaniaJapan1.096947e+07
6KenyaPhilippines8.399282e+06
7UgandaVietnam7.773975e+06
8AlgeriaTurkey2.000180e+06
9SudanIran1.625828e+06
\n", "
" ], "text/plain": [ " Africa Asia dist\n", "0 Nigeria China 7.383366e+06\n", "1 Ethiopia India 2.883313e+06\n", "2 Egypt Indonesia 6.776043e+06\n", "3 Dem. Rep. Congo Pakistan 4.227767e+06\n", "4 South Africa Bangladesh 8.189086e+06\n", "5 Tanzania Japan 1.096947e+07\n", "6 Kenya Philippines 8.399282e+06\n", "7 Uganda Vietnam 7.773975e+06\n", "8 Algeria Turkey 2.000180e+06\n", "9 Sudan Iran 1.625828e+06" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Straight line distance between the top 10 most populated countries in Asia and Africa\n", "population_top10_africa = african_countries[:10].reset_index(drop=True)\n", "population_top10_asia = asian_countries[:10].reset_index(drop=True)\n", "dist = cuspatial.pairwise_polygon_distance(\n", " population_top10_africa.geometry, population_top10_asia.geometry)\n", "\n", "cudf.concat([\n", " population_top10_africa[\"name\"].rename(\"Africa\"),\n", " population_top10_asia[\"name\"].rename(\"Asia\"), \n", " dist.rename(\"dist\")], axis=1\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "f5f27dc3-46ee-4a62-82de-20f76744382f", "metadata": {}, "source": [ "## Filtering\n", "\n", "The filtering module contains `points_in_spatial_window`, which returns from a set of points only those points that fall within a spatial window defined by four bounding coordinates: `min_x`, `max_x`, `min_y`, and `max_y`. The following example finds only the points of polygons that fall within 1 standard deviation of the mean of all of the polygons.\n", "\n", "\n", "\n", "### [cuspatial.points_in_spatial_window](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.points_in_spatial_window)" ] }, { "cell_type": "code", "execution_count": 24, "id": "d1ade9da-c9e2-45c4-9685-dffeda3fd358", "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_5364/3414785716.py:1: FutureWarning: The geopandas.dataset module is deprecated and will be removed in GeoPandas 1.0. You can get the original 'naturalearth_lowres' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.\n", " host_dataframe = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\"))\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "0 POINT (33.90371 -0.95000)\n", "1 POINT (34.07262 -1.05982)\n", "2 POINT (37.69869 -3.09699)\n", "3 POINT (37.76690 -3.67712)\n", "4 POINT (39.20222 -4.67677)\n", "dtype: geometry\n" ] } ], "source": [ "host_dataframe = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\"))\n", "gpu_dataframe = cuspatial.from_geopandas(host_dataframe)\n", "geometry = gpu_dataframe['geometry']\n", "points = cuspatial.GeoSeries.from_points_xy(geometry.polygons.xy)\n", "mean_x, std_x = (geometry.polygons.x.mean(), geometry.polygons.x.std())\n", "mean_y, std_y = (geometry.polygons.y.mean(), geometry.polygons.y.std())\n", "avg_points = cuspatial.points_in_spatial_window(\n", " points,\n", " mean_x - std_x,\n", " mean_x + std_x,\n", " mean_y - std_y,\n", " mean_y + std_y\n", ")\n", "print(avg_points.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "5027a3dd-78bb-4d17-af94-506d0ed689c8", "metadata": {}, "source": [ "With some careful grouping, one can reconstruct the original complete polygons that fall within the range." 
] }, { "attachments": {}, "cell_type": "markdown", "id": "3b33ce2b-965f-42a1-a89e-66d7ca80d907", "metadata": {}, "source": [ "## Set Operations" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d73548f3-c9bb-43ff-9788-858f3b7d08e4", "metadata": {}, "source": [ "### Linestring Intersections\n", "\n", "cuSpatial provides a linestring-linestring intersection algorithm to compute the overlapping geometries between two linestrings.\n", "The API also returns the ids for each returned geometry to help user to trace back the source geometry." ] }, { "cell_type": "code", "execution_count": 25, "id": "cc72a44d-a9bf-4432-9898-de899ac45869", "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_5364/199363072.py:3: FutureWarning: The geopandas.dataset module is deprecated and will be removed in GeoPandas 1.0. You can get the original 'naturalearth_lowres' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.\n", " host_dataframe = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\"))\n" ] } ], "source": [ "from cuspatial.core.binops.intersection import pairwise_linestring_intersection\n", "\n", "host_dataframe = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\"))\n", "usa_boundary = cuspatial.from_geopandas(host_dataframe[host_dataframe.name == \"United States of America\"].geometry.boundary)\n", "canada_boundary = cuspatial.from_geopandas(host_dataframe[host_dataframe.name == \"Canada\"].geometry.boundary)\n", "\n", "list_offsets, geometries, look_back_ids = pairwise_linestring_intersection(usa_boundary, canada_boundary)" ] }, { "cell_type": "code", "execution_count": 26, "id": "1125fd17-afe1-4b9c-b48d-8842dd3700b3", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "\n", "[\n", " 0,\n", " 142\n", "]\n", "dtype: int32" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The first integer series shows that the result contains 1 row (since we only have 1 pair of linestrings as input).\n", "# This row contains 144 geometires.\n", "list_offsets" ] }, { "cell_type": "code", "execution_count": 27, "id": "b281e3bb-42d4-4d60-9cb2-b7dcc20b4776", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "0 POINT (-130.53611 54.80275)\n", "1 POINT (-130.53611 54.80278)\n", "2 POINT (-130.53611 54.80275)\n", "3 POINT (-129.98000 55.28500)\n", "4 POINT (-130.53611 54.80278)\n", " ... \n", "137 LINESTRING (-120.00000 49.00000, -117.03121 49...\n", "138 LINESTRING (-122.84000 49.00000, -120.00000 49...\n", "139 LINESTRING (-117.03121 49.00000, -107.05000 49...\n", "140 LINESTRING (-83.89077 46.11693, -83.61613 46.1...\n", "141 LINESTRING (-82.69009 41.67511, -82.43928 41.6...\n", "Length: 142, dtype: geometry" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The second element is a geoseries that contains the intersecting geometries, with 144 rows, including points and linestrings.\n", "geometries" ] }, { "cell_type": "code", "execution_count": 28, "id": "e19873b9-2614-4242-ad67-caa47f807d04", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
lhs_linestring_idlhs_segment_idrhs_linestring_idrhs_segment_id
0[8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, ...[18, 16, 18, 15, 17, 14, 16, 13, 15, 14, 11, 1...[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...[9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 15, 15...
\n", "
" ], "text/plain": [ " lhs_linestring_id \\\n", "0 [8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, ... \n", "\n", " lhs_segment_id \\\n", "0 [18, 16, 18, 15, 17, 14, 16, 13, 15, 14, 11, 1... \n", "\n", " rhs_linestring_id \\\n", "0 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... \n", "\n", " rhs_segment_id \n", "0 [9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 15, 15... " ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The third element is a dataframe that contains IDs to the input segments and linestrings, 4 for each result row.\n", "# Each represents ids to lhs, rhs linestring and segment ids.\n", "look_back_ids" ] }, { "attachments": {}, "cell_type": "markdown", "id": "a17bd64a", "metadata": {}, "source": [ "## Spatial Joins\n", "\n", "cuSpatial provides a number of functions to facilitate high-performance spatial joins, \n", "including unindexed and quadtree-indexed point-in-polygon and quadtree-indexed point to nearest\n", "linestring.\n", "\n", "The API for spatial joins does not yet match GeoPandas, but with knowledge of cuSpatial data formats\n", "you can call `cuspatial.point_in_polygon` for large numbers of points on 32 polygons or less, or\n", "call `cuspatial.quadtree_point_in_polygon` for large numbers of points and polygons. \n", "\n", "### Unindexed Point-in-polygon Join\n", "\n", "### [cuspatial.point_in_polygon](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.point_in_polygon)" ] }, { "cell_type": "code", "execution_count": 29, "id": "bf7b2256", "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_5364/1271339229.py:1: FutureWarning: The geopandas.dataset module is deprecated and will be removed in GeoPandas 1.0. You can get the original 'naturalearth_lowres' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.\n", " host_dataframe = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\"))\n" ] }, { "data": { "text/plain": [ "1 11896\n", "2 1268\n", "5 50835\n", "6 7792\n", "11 29318\n", "dtype: int64" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "host_dataframe = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\"))\n", "single_polygons = host_dataframe[host_dataframe['geometry'].type == \"Polygon\"]\n", "gpu_dataframe = cuspatial.from_geopandas(single_polygons)\n", "x_points = (cupy.random.random(10000000) - 0.5) * 360\n", "y_points = (cupy.random.random(10000000) - 0.5) * 180\n", "xy = cudf.DataFrame({\"x\": x_points, \"y\": y_points}).interleave_columns()\n", "points = cuspatial.GeoSeries.from_points_xy(xy)\n", "\n", "short_dataframe = gpu_dataframe.iloc[0:31]\n", "geometry = short_dataframe['geometry']\n", "\n", "points_in_polygon = cuspatial.point_in_polygon(\n", " points, geometry\n", ")\n", "sum_of_points_in_polygons_0_to_31 = points_in_polygon.sum()\n", "sum_of_points_in_polygons_0_to_31.head()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "fd9c4eef", "metadata": {}, "source": [ "cuSpatial includes another join algorithm, `quadtree_point_in_polygon` that uses an indexing \n", "quadtree for faster calculations. `quadtree_point_in_polygon` also supports a number of \n", "polygons limited only by memory constraints." 
] }, { "attachments": {}, "cell_type": "markdown", "id": "387565b3-75ae-4789-a950-daffc2d4da01", "metadata": {}, "source": [ "### Quadtree Indexing\n", "\n", "The indexing module is used to create a spatial quadtree. Use \n", "```\n", "cuspatial.quadtree_on_points(\n", " points,\n", " x_min,\n", " x_max,\n", " y_min,\n", " y_max,\n", " scale,\n", " max_depth,\n", " max_size\n", ")\n", "```\n", "to create the quadtree object that is used by the `quadtree_point_in_polygon` \n", "function in the `join` module.\n", "\n", "The function uses a set of points and a user-defined bounding box to build an \n", "indexing quad tree. Be sure to adjust the parameters appropriately, with larger \n", "parameter values for larger datasets.\n", "\n", "`scale`: A scaling function that increases the size of the point space from an \n", "origin defined by `{x_min, y_min}`. This can increase the likelihood of generating \n", "well-separated quads.\n", "\n", "`max_depth`: In order for a quadtree to index points effectively, it must have a\n", "depth that is log-scaled with the size of the number of points. Each level of the \n", "quad tree contains 4 quads. The number of available quads $q$ for indexing is then \n", "equal to $q = 4^{d}$ where $d$ is the `max_depth` parameter. With an input size \n", "of `10m` points and `max_depth = 7`, $\\frac{10^6}{4^7}$ points will be most \n", "efficiently packed into the leaves of the quad tree.\n", "\n", "`max_size`: The maximum number of points allowed in an internal node before it is\n", "split into four leaf notes. As the quadtree is generated, a leaf node containing\n", "usable index points will be created as points are added. If the number of points\n", "in this leaf exceeds `max_size`, the leaf will be subdivided, with four new\n", "leaves added and the original node removed from the set of leaves. This number is\n", "probably optimized in most datasets by making it a significant fraction of the\n", "optimal leaf size computation from above. Consider $10,000,000 / 4^7 / 4 = 153$.\n", "\n", "### [cuspatial.quadtree_on_points](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.quadtree_on_points)" ] }, { "cell_type": "code", "execution_count": 30, "id": "e3a0a9a3-0bdd-4f05-bcb5-7db4b99a44a3", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 1507\n", "1 1726\n", "2 4242\n", "3 7371\n", "4 11341\n", "dtype: uint32\n", " key level is_internal_node length offset\n", "0 0 0 True 4 2\n", "1 1 0 True 2 6\n", "2 0 1 True 4 8\n", "3 1 1 True 4 12\n", "4 2 1 True 2 16\n" ] } ], "source": [ "x_points = (cupy.random.random(10000000) - 0.5) * 360\n", "y_points = (cupy.random.random(10000000) - 0.5) * 180\n", "xy = cudf.DataFrame({\"x\": x_points, \"y\": y_points}).interleave_columns()\n", "points = cuspatial.GeoSeries.from_points_xy(xy)\n", "\n", "scale = 5\n", "max_depth = 7\n", "max_size = 125\n", "point_indices, quadtree = cuspatial.quadtree_on_points(points,\n", " x_points.min(),\n", " x_points.max(),\n", " y_points.min(),\n", " y_points.max(),\n", " scale,\n", " max_depth,\n", " max_size)\n", "print(point_indices.head())\n", "print(quadtree.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "0ab7e64d-1199-498c-b020-b3e7393337a5", "metadata": {}, "source": [ "### Indexed Spatial Joins\n", "\n", "The quadtree spatial index (`point_indices` and `quadtree`) is used by `quadtree_point_in_polygon`\n", "and `quadtree_point_to_nearest_linestring` to accelerate larger spatial joins. 
\n", "`quadtree_point_in_polygon` depends on a number of intermediate products calculated here using the\n", "following functions.\n", "\n", "### [cuspatial.join_quadtree_and_bounding_boxes](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.join_quadtree_and_bounding_boxes)\n", "### [cuspatial.quadtree_point_in_polygon](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.quadtree_point_in_polygon)" ] }, { "cell_type": "code", "execution_count": 31, "id": "023bd25a-35be-435d-ab0b-ecbd7a47e147", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Empty DataFrame\n", "Columns: [polygon_index, point_index]\n", "Index: []\n" ] } ], "source": [ "polygons = gpu_dataframe['geometry']\n", "\n", "poly_bboxes = cuspatial.polygon_bounding_boxes(\n", " polygons\n", ")\n", "intersections = cuspatial.join_quadtree_and_bounding_boxes(\n", " quadtree,\n", " poly_bboxes,\n", " polygons.polygons.x.min(),\n", " polygons.polygons.x.max(),\n", " polygons.polygons.y.min(),\n", " polygons.polygons.y.max(),\n", " scale,\n", " max_depth\n", ")\n", "polygons_and_points = cuspatial.quadtree_point_in_polygon(\n", " intersections,\n", " quadtree,\n", " point_indices,\n", " points,\n", " polygons\n", ")\n", "print(polygons_and_points.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "aa1552a1-fe4b-4d30-b76f-054a060593ae", "metadata": {}, "source": [ "You can see above that polygon 270 maps to the first 5 points. In order to bring this back to \n", "a specific row of the original dataframe, the individual polygons must be mapped back to their \n", "original MultiPolygon row. This is left an an exercise.\n", "\n", "### [cuspatial.quadtree_point_to_nearest_linestring](https://docs.rapids.ai/api/cuspatial/nightly/api_docs/spatial.html#cuspatial.quadtree_point_to_nearest_linestring)\n", "\n", "`cuspatial.quadtree_point_to_nearest_linestring` can be used to find the Polygon or Linestring \n", "nearest to a set of points from another set of mixed geometries. " ] }, { "cell_type": "code", "execution_count": 32, "id": "784aff8e-c9ed-4a81-aa87-bf301b3b90af", "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_5364/2951982051.py:1: FutureWarning: The geopandas.dataset module is deprecated and will be removed in GeoPandas 1.0. You can get the original 'naturalearth_lowres' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.\n", " host_countries = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\"))\n", "/tmp/ipykernel_5364/2951982051.py:2: FutureWarning: The geopandas.dataset module is deprecated and will be removed in GeoPandas 1.0. 
You can get the original 'naturalearth_cities' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.\n", " host_cities = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_cities\"))\n" ] } ], "source": [ "host_countries = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_lowres\"))\n", "host_cities = geopandas.read_file(geopandas.datasets.get_path(\"naturalearth_cities\"))\n", "gpu_countries = cuspatial.from_geopandas(host_countries[host_countries['geometry'].type == \"Polygon\"])\n", "gpu_cities = cuspatial.from_geopandas(host_cities[host_cities['geometry'].type == 'Point'])" ] }, { "cell_type": "code", "execution_count": 33, "id": "fea24c78-cf5c-45c6-b860-338238e61323", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " point_index linestring_index distance\n", "0 0 21 10.857363\n", "1 1 21 10.937690\n", "2 2 19 0.522859\n", "3 3 19 0.050204\n", "4 4 129 0.104261\n" ] } ], "source": [ "polygons = gpu_countries['geometry'].polygons\n", "\n", "boundaries = cuspatial.GeoSeries.from_linestrings_xy(\n", " cudf.DataFrame({\"x\": polygons.x, \"y\": polygons.y}).interleave_columns(),\n", " polygons.ring_offset,\n", " cupy.arange(len(polygons.ring_offset))\n", ")\n", "\n", "point_indices, quadtree = cuspatial.quadtree_on_points(gpu_cities['geometry'],\n", " polygons.x.min(),\n", " polygons.x.max(),\n", " polygons.y.min(),\n", " polygons.y.max(),\n", " scale,\n", " max_depth,\n", " max_size)\n", "poly_bboxes = cuspatial.linestring_bounding_boxes(\n", " boundaries,\n", " 2.0\n", ")\n", "intersections = cuspatial.join_quadtree_and_bounding_boxes(\n", " quadtree,\n", " poly_bboxes,\n", " polygons.x.min(),\n", " polygons.x.max(),\n", " polygons.y.min(),\n", " polygons.y.max(),\n", " scale,\n", " max_depth\n", ")\n", "result = cuspatial.quadtree_point_to_nearest_linestring(\n", " intersections,\n", " quadtree,\n", " point_indices,\n", " gpu_cities['geometry'],\n", " boundaries\n", ")\n", "print(result.head())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "3e4e07f6", "metadata": {}, "source": [ "_Images used with permission from Wikipedia Creative Commons_" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" }, "vscode": { "interpreter": { "hash": "ef2a625a21f49284d4111fd61c77079c8ec37c2ac9f170a08eb051e93ed3e888" } } }, "nbformat": 4, "nbformat_minor": 5 }