{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# **Multiprocessing**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kenoz/SITS_utils/blob/main/docs/source/tutorials/colab_sits_ex04.ipynb)" ] }, { "cell_type": "markdown", "metadata": { "id": "K2FIkbDYrq9l" }, "source": [ "---\n", "\n", "We aim to retrieve satellite time series for a set of points randomly located in Europe. Rather than processing the points sequentially, we use the `sits.Multiproc()` class to distribute the computation across CPU cores and reduce the processing time.\n", "\n", "
\n", "

\n", "\n", "\n", "> _The `sits.Multiproc()` class needs a multi-core CPU to work efficiently._\n", "\n", "---\n", "\n", "## 1. Installation of the SITS package and its dependencies\n", "\n", "First, install the `sits` package with [pip](https://pypi.org/project/SITS/). We also need some other packages for displaying data." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 33951, "status": "ok", "timestamp": 1738164860034, "user": { "displayName": "kose tetistraining", "userId": "06823399031118728700" }, "user_tz": -60 }, "id": "xoL6NstiVcp9", "outputId": "07ba85a1-4718-4771-db40-2b371dd55b9b" }, "outputs": [], "source": [ "# SITS package\n", "!pip install -q --upgrade sits\n", "\n", "# other packages\n", "!pip install -q \"dask[dataframe]\"\n", "!pip install -q mapclassify\n", "#!pip install -q netCDF4\n", "#!pip install -q folium\n", "#!pip install -q matplotlib" ] }, { "cell_type": "markdown", "metadata": { "id": "OYAb0l91zqj4" }, "source": [ "Now we can import `sits` and some other libraries." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "executionInfo": { "elapsed": 11651, "status": "ok", "timestamp": 1738164875497, "user": { "displayName": "kose tetistraining", "userId": "06823399031118728700" }, "user_tz": -60 }, "id": "neqafgGHWIkj" }, "outputs": [], "source": [ "import os\n", "# sits lib\n", "from sits import sits\n", "# geospatial libs\n", "import geopandas as gpd\n", "import pandas as pd\n", "# date format\n", "from datetime import datetime\n", "# ignore warning messages\n", "import warnings\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "markdown", "metadata": { "id": "o4n__qxr0Kqi" }, "source": [ "## 2. Handling the input vector file\n", "\n", "### 2.1. Data loading\n", "\n", "The GeoJSON vector file, stored in the [GitHub repository](https://github.com/kenoz/SITS_utils), includes 24 points over Europe. 
We download it into our current workspace. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 732, "status": "ok", "timestamp": 1738164882882, "user": { "displayName": "kose tetistraining", "userId": "06823399031118728700" }, "user_tz": -60 }, "id": "soIORtc8bGDd", "outputId": "6cb289c1-0ecd-4958-ff6b-c461c70f67d2" }, "outputs": [], "source": [ "!mkdir -p test_data\n", "![ ! -f test_data/rand_pts.geojson ] && wget https://raw.githubusercontent.com/kenoz/SITS_utils/refs/heads/main/sits/data/rand_pts.geojson -P test_data" ] }, { "cell_type": "markdown", "metadata": { "id": "n0jbqcwQ1ppL" }, "source": [ "We load the vector file, named `rand_pts.geojson`, as a GeoDataFrame object with the `sits.Vec2gdf()` class." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "executionInfo": { "elapsed": 715, "status": "ok", "timestamp": 1738164886246, "user": { "displayName": "kose tetistraining", "userId": "06823399031118728700" }, "user_tz": -60 }, "id": "7MWDWpr7nxB3", "outputId": "ccf75a57-6edf-4bd1-ba96-5499a2a0f34c" }, "outputs": [ { "data": { "text/html": [ "

\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   id  pt_id  geometry
0  1   1      POINT (8.49138 49.85437)
1  3   2      POINT (8.41277 53.14555)
2  7   3      POINT (11.17678 50.01380)
3  9   4      POINT (23.79724 40.06894)
4  10  5      POINT (16.80020 48.98809)
\n", "
" ], "text/plain": [ " id pt_id geometry\n", "0 1 1 POINT (8.49138 49.85437)\n", "1 3 2 POINT (8.41277 53.14555)\n", "2 7 3 POINT (11.17678 50.01380)\n", "3 9 4 POINT (23.79724 40.06894)\n", "4 10 5 POINT (16.80020 48.98809)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_dir = 'test_data'\n", "random_pts = sits.Vec2gdf(os.path.join(data_dir, 'rand_pts.geojson'))\n", "random_pts.gdf.head()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 332, "status": "ok", "timestamp": 1738164889222, "user": { "displayName": "kose tetistraining", "userId": "06823399031118728700" }, "user_tz": -60 }, "id": "1Godd47g2qoL", "outputId": "d56be488-a437-4077-8a58-bb1f48834626" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "epsg code for 'random_pts.gdf': 4326\n" ] } ], "source": [ "# check epsg\n", "print(f\"epsg code for 'random_pts.gdf': {random_pts.gdf.crs.to_epsg()}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "ipDQq-9jqWjo" }, "source": [ "### 2.2. Buffer and Bounding box calculation\n", "\n", "We check the coordinate reference system (CRS). We calculate a polygon for each point according to a given buffer distance with the method `set_buffer()` of class `sits.Vec2gdf`. Then we extract the bounding box with the method `set_bbox()` of class `sits.Vec2gdf`." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "executionInfo": { "elapsed": 451, "status": "ok", "timestamp": 1738164892681, "user": { "displayName": "kose tetistraining", "userId": "06823399031118728700" }, "user_tz": -60 }, "id": "NeEJovzoICUn" }, "outputs": [], "source": [ "# buffer distance of 0.01 degree (1.11 km approx.)\n", "random_pts.set_buffer('gdf', 0.01)\n", "\n", "# bbox of buffer polygon\n", "random_pts.set_bbox('buffer')" ] }, { "cell_type": "markdown", "metadata": { "id": "s_EwD_3SyNqF" }, "source": [ "We display the `sits.Vec2gdf` objects on an interactive map.\n", "\n", "* `.gdf` _in green_\n", "* `.buffer` _in blue_\n", "* `.bbox` _in red_" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 321 }, "executionInfo": { "elapsed": 7795, "status": "ok", "timestamp": 1738164902990, "user": { "displayName": "kose tetistraining", "userId": "06823399031118728700" }, "user_tz": -60 }, "id": "G4br5m8AoOKQ", "outputId": "2b9c9ffd-a527-4d1b-8b6c-1445b768b470" }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import folium\n", "\n", "f = folium.Figure(height=300)\n", "m = folium.Map(location=[45.0, 10], zoom_start=4).add_to(f)\n", "random_pts.gdf.explore(m=m, height=400, color='green')\n", "random_pts.buffer.explore(m=m, height=400)\n", "random_pts.bbox.explore(m=m, height=400, color='red')" ] }, { "cell_type": "markdown", "metadata": { "id": "wBDpvyb0y6ko" }, "source": [ "### 2.3. CRS management\n", "\n", "To request data from a STAC catalog, we need to provide the bounding box coordinates in Lat/Long, i.e., in EPSG:4326. We also need to specify the CRS in which we want to obtain the satellite time series. 
As we are working in Europe, one of the most appropriate CRSs is EPSG:3035 (ETRS89-extended / LAEA Europe).\n", "\n", "Here we calculate the coordinates in EPSG:4326 and EPSG:3035. Since there are several features (polygons), we store the coordinates in a DataFrame." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "executionInfo": { "elapsed": 226, "status": "ok", "timestamp": 1738164908671, "user": { "displayName": "kose tetistraining", "userId": "06823399031118728700" }, "user_tz": -60 }, "id": "fvx9nHWDjjyN", "outputId": "e37643e0-0906-47e0-d5d5-749ebc0677ee", "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   bbox_4326  bbox_3035  bbox_tuple
0  [8.481383388733258, 49.84437017219194, 8.50138...  [4211764.670013768, 2971302.3517032275, 421324...  ([8.481383388733258, 49.84437017219194, 8.5013...
1  [8.402773153496353, 53.135550217099215, 8.4227...  [4214105.200943511, 3337504.924185555, 4215492...  ([8.402773153496353, 53.135550217099215, 8.422...
2  [11.166778399392497, 50.00380142653234, 11.186...  [4404616.995128865, 2988598.266294405, 4406085...  ([11.166778399392497, 50.00380142653234, 11.18...
3  [23.78724480625752, 40.05894472525313, 23.8072...  [5495790.682699846, 1992060.233281068, 5497840...  ([23.78724480625752, 40.05894472525313, 23.807...
4  [16.790197637816156, 48.978094737864616, 16.81...  [4817203.221624014, 2896784.6045746827, 481886...  ([16.790197637816156, 48.978094737864616, 16.8...
\n", "
" ], "text/plain": [ " bbox_4326 \\\n", "0 [8.481383388733258, 49.84437017219194, 8.50138... \n", "1 [8.402773153496353, 53.135550217099215, 8.4227... \n", "2 [11.166778399392497, 50.00380142653234, 11.186... \n", "3 [23.78724480625752, 40.05894472525313, 23.8072... \n", "4 [16.790197637816156, 48.978094737864616, 16.81... \n", "\n", " bbox_3035 \\\n", "0 [4211764.670013768, 2971302.3517032275, 421324... \n", "1 [4214105.200943511, 3337504.924185555, 4215492... \n", "2 [4404616.995128865, 2988598.266294405, 4406085... \n", "3 [5495790.682699846, 1992060.233281068, 5497840... \n", "4 [4817203.221624014, 2896784.6045746827, 481886... \n", "\n", " bbox_tuple \n", "0 ([8.481383388733258, 49.84437017219194, 8.5013... \n", "1 ([8.402773153496353, 53.135550217099215, 8.422... \n", "2 ([11.166778399392497, 50.00380142653234, 11.18... \n", "3 ([23.78724480625752, 40.05894472525313, 23.807... \n", "4 ([16.790197637816156, 48.978094737864616, 16.8... " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# extraction of bbox coordinates in EPSG:4326\n", "bbox_latlong = pd.concat([random_pts.bbox, random_pts.bbox['geometry'].bounds], axis=1)\n", "bbox_latlong['bbox_4326'] = bbox_latlong[['minx', 'miny', 'maxx', 'maxy']].values.tolist()\n", "\n", "# extraction of bbox coordinates in EPSG:3035\n", "bbox_3035 = random_pts.bbox.to_crs(3035)\n", "test_3035_bounds = pd.concat([bbox_3035, bbox_3035['geometry'].bounds], axis=1)\n", "test_3035_bounds['bbox_3035'] = test_3035_bounds[['minx', 'miny', 'maxx', 'maxy']].values.tolist()\n", "\n", "# concatenation of both coordinates (EPSG:4326 + EPSG:3035)\n", "test_process = pd.concat([bbox_latlong['bbox_4326'], test_3035_bounds['bbox_3035']], axis=1)\n", "test_process['bbox_tuple'] = test_process.apply(lambda row: (row['bbox_4326'], row['bbox_3035']), axis=1)\n", "\n", "# quicklook of the output table\n", "test_process.head()" ] }, { "cell_type": "markdown", "metadata": { "id": "RZ8CtvFHtH78" }, 
"source": [ "## 3. Multiprocessing approach\n", "\n", "### 3.1. How does it work?\n", "\n", "If you need to process several points or polygons, we recommand the use of `sits.Multiproc()` class. This class call in the background the `sits.stacAttack()` class, distributing the process through the available CPUs.\n", "\n", "You can tune the process with the `sits.Multiproc().addParams_*()` methods:\n", "\n", "* `Multiproc().addParams_stacAttack()`: configure the `sits.stacAttack()` instance,\n", "* `Multiproc().addParams_searchItems()`: configure the `sits.stacAttack.searchItems()` method,\n", "* `Multiproc().addParams_loadCube()`: configure the `sits.stacAttack().loadCube()` method,\n", "* `Multiproc().addParams_mask()`: configure the `sits.stacAttack().mask()` method.\n", "\n", "Then the `Multiproc().fetch_func()` calls `dask.delayed()` that is a function that defers execution of Python code, building a task graph for parallel computation. It turns functions/operations into lazy tasks. `Multiproc().dask_compute()` calls `dask.compute()` that schedules and runs efficiently (e.g., on multiple cores or a cluster) these operations.\n", "\n", "### 3.2. Producing images from the vector layer\n", "\n", "Here is an example of parallelization to produce images that have the same dimensions as the input bounding boxes (_see 'image' argument_). So the output images share only the same spatial resolution. The dimensions (size in x and y) will different.\n", "\n", "
[Figure: multiproc_image]
\n", "

" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 409682, "status": "ok", "timestamp": 1738168270835, "user": { "displayName": "kose tetistraining", "userId": "06823399031118728700" }, "user_tz": -60 }, "id": "2wXfY_z_jjyN", "outputId": "98fad65a-c984-4563-e267-5d75baa2dca3", "tags": [] }, "outputs": [], "source": [ "%%time\n", "\n", "multi = sits.Multiproc('image', 'nc', data_dir)\n", "\n", "multi.addParams_stacAttack(bands=['B03', 'B04', 'B08', 'SCL'])\n", "multi.addParams_searchItems(date_start=datetime(2024, 1, 1),\n", " date_end=datetime(2025, 1, 1),\n", " query={\"eo:cloud_cover\": {\"lt\": 10}})\n", "multi.addParams_loadCube(resolution=20)\n", "multi.addParams_mask(mask_values=[0, 1, 3, 8, 9, 10])\n", "\n", "for gid, i in enumerate(test_process['bbox_tuple'][:2]): # here we process only the two first images, remove or modify the slicing\n", " multi.fetch_func(i[0], i[1], gid, mask=True, gapfill=True)\n", "multi.dask_compute();" ] }, { "cell_type": "markdown", "metadata": { "id": "ww_Zjvx9391P" }, "source": [ "### 3.3. Producing patches from the vector layer\n", "\n", "It is also possible to specify the output image size. The option, called \"patch\", refers to a small, localized region or segment of an input image. These patches need to be of the same size in deep leaning models to ensure consistent processing, especially in architectures like convolutional neural networks (CNNs) or vision transformers (ViTs).\n", "\n", "
[Figure: multiproc_patch]
\n", "

" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "-YudS0E8jjyN", "tags": [] }, "outputs": [], "source": [ "%%time\n", "\n", "multi = sits.Multiproc('patch', 'nc', data_dir)\n", "\n", "multi.addParams_stacAttack(bands=['B03', 'B04', 'B08', 'SCL'])\n", "multi.addParams_searchItems(date_start=datetime(2024, 1, 1),\n", " date_end=datetime(2025, 2, 1),\n", " query={\"eo:cloud_cover\": {\"lt\": 10}})\n", "multi.addParams_loadCube(dimx=10, dimy=10, resolution=20)\n", "multi.addParams_mask()\n", "\n", "for gid, i in enumerate(test_process['bbox_tuple'][:2]): # here we process only the two first patches, remove or modify the slicing\n", " multi.fetch_func(i[0], i[1], gid, mask=True, gapfill=True)\n", "multi.dask_compute();" ] } ], "metadata": { "colab": { "provenance": [ { "file_id": "https://github.com/kenoz/SITS_utils/blob/main/examples/colab_sits_ex02.ipynb", "timestamp": 1738147041012 } ] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9" } }, "nbformat": 4, "nbformat_minor": 4 }