microsoft · aboydnw · Jun 29, 2026 · Jul 1, 2026 · Jul 1, 2026 · Jul 1, 2026
diff --git a/quickstarts/rustac.ipynb b/quickstarts/rustac.ipynb
@@ -0,0 +1,257 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Querying Planetary Computer STAC items with rustac",
+    "",
+    "This notebook queries the [Sentinel-2 L2A](https://planetarycomputer.microsoft.com/dataset/sentinel-2-l2a) STAC-GeoParquet snapshot with [rustac](https://github.com/stac-utils/rustac-py), a Rust-powered STAC toolkit that reads GeoParquet through a bundled DuckDB engine. Key benefits:",
+    "",
+    "1. **Filter pushdown**: spatial, temporal, and property predicates go into the Parquet read, so only matching items cross the network.",
+    "2. **No download step**: query the snapshot in place instead of pulling the whole file into pandas first.",
+    "3. **Bundled engine**: the DuckDB reader ships with rustac, so there's no separate database to install.",
+    "4. **Arrow-native**: results come back as an Arrow table you can write to Parquet or hand to GeoPandas without a Python-dictionary round-trip.",
+    "5. **Bulk scale**: built for scanning across many items at once, where the STAC API would need many paged requests.",
+    "",
+    "We'll pull a handful of low-cloud Sentinel-2 scenes over Portland, Oregon from the historical snapshot, then filter them, export to Parquet, and bridge them into GeoPandas.",
+    "",
+    "The companion [rustac tutorial](../overview/rustac.md) has the full narrative.",
+    "",
+    "> **Note on coverage:** the Sentinel-2 GeoParquet snapshot is a point-in-time export covering 2015-07 through 2018-10, not the live archive. Use it for bulk historical analysis. For current acquisitions, query the live STAC API with `pystac-client`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Install"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install --quiet 'rustac[arrow]' pystac-client planetary-computer geopandas pyarrow"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Find the GeoParquet snapshot",
+    "",
+    "Every Planetary Computer collection carries a collection-level `geoparquet-items` asset that points at its STAC-items snapshot. The href resolves to a directory of monthly `part-NNNN` Parquet files on the `pcstacitems` storage account.",
+    "",
+    "**Expected result:** `abfs://items/sentinel-2-l2a.parquet`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pystac_client\n",
+    "\n",
+    "catalog = pystac_client.Client.open(\"https://planetarycomputer.microsoft.com/api/stac/v1\")\n",
+    "asset = catalog.get_collection(\"sentinel-2-l2a\").assets[\"geoparquet-items\"]\n",
+    "asset.href"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Authenticate the DuckDB engine",
+    "",
+    "The account name looks public, but reads still need a SAS token. The high-level `rustac.search()` coroutine cannot pass Azure credentials, so use `rustac.DuckdbClient` instead and configure its connection once: fetch a container SAS from the Planetary Computer token API, register an Azure secret, and switch the Azure transport to curl.",
-    "The account name looks public, but reads still need a SAS token. The high-level `rustac.search()` coroutine cannot pass Azure credentials, so use `rustac.DuckdbClient` instead and configure its connection once: fetch a container SAS from the Planetary Computer token API, register an Azure secret, and switch the Azure transport to curl.",
+    "Reading from the account needs a Shared Access Signature (SAS) token. The high-level `rustac.search()` coroutine cannot pass Azure credentials, so use `rustac.DuckdbClient` instead and configure its connection once: fetch a container SAS from the Planetary Computer token API, register an Azure secret, and switch the Azure transport to curl.",
-    "The account name looks public, but reads still need a SAS token. The high-level `rustac.search()` coroutine cannot pass Azure credentials, so use `rustac.DuckdbClient` instead and configure its connection once: fetch a container SAS from the Planetary Computer token API, register an Azure secret, and switch the Azure transport to curl.",
+    "Reading from the account needs a Shared Access Signature (SAS) token. The high-level `rustac.search()` coroutine cannot pass Azure credentials, so use `rustac.DuckdbClient` instead and configure its connection once: fetch a container SAS from the Planetary Computer token API, register an Azure secret, and switch the Azure transport to curl.",
+    "",
+    "The `azure_transport_option_type = 'curl'` line is not optional. Without it, DuckDB's default Azure transport fails with an opaque SSL CA-certificate error. The SAS expires after about an hour, so long-running jobs re-fetch it.",
+    "",
+    "**Expected result:** a configured `client`, no output printed."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import json\n",
+    "import urllib.request\n",
+    "import rustac\n",
+    "\n",
+    "sas = json.load(urllib.request.urlopen(\n",
+    "    \"https://planetarycomputer.microsoft.com/api/sas/v1/token/pcstacitems/items\"\n",
+    "))[\"token\"]\n",
+    "\n",
+    "client = rustac.DuckdbClient()\n",
+    "client.execute(\"INSTALL azure; LOAD azure; SET azure_transport_option_type = 'curl';\")\n",
+    "client.execute(\n",
+    "    \"CREATE SECRET pc (TYPE azure, PROVIDER config, ACCOUNT_NAME 'pcstacitems', \"\n",
+    "    f\"CONNECTION_STRING 'BlobEndpoint=https://pcstacitems.blob.core.windows.net;SharedAccessSignature={sas}')\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Query the snapshot",
+    "",
+    "Point `search()` at the snapshot glob and pass the filters you want pushed into the read, so only matching rows cross the network. The glob (`/*.parquet`) spans the monthly part files.",
-    "Point `search()` at the snapshot glob and pass the filters you want pushed into the read, so only matching rows cross the network. The glob (`/*.parquet`) spans the monthly part files.",
+    "Point `search()` at the snapshot glob and pass the filters you want pushed into the read, so only matching rows cross the network. The glob (`/*.parquet`) spans the monthly partition files.",
-    "Point `search()` at the snapshot glob and pass the filters you want pushed into the read, so only matching rows cross the network. The glob (`/*.parquet`) spans the monthly part files.",
+    "Point `search()` at the snapshot glob and pass the filters you want pushed into the read, so only matching rows cross the network. The glob (`/*.parquet`) spans the monthly partition files.",
+    "",
+    "`DuckdbClient.search` is synchronous (no `await`); the bundled engine scans the part files for you, which takes a little time on the first call.",
-    "`DuckdbClient.search` is synchronous (no `await`); the bundled engine scans the part files for you, which takes a little time on the first call.",
+    "`DuckdbClient.search` is synchronous (no `await`); this is a search over X parquet files without an index, so expect it to take a couple of minutes.",
-    "`DuckdbClient.search` is synchronous (no `await`); the bundled engine scans the part files for you, which takes a little time on the first call.",
+    "`DuckdbClient.search` is synchronous (no `await`); this is a search over X parquet files without an index, so expect it to take a couple of minutes.",
+    "",
+    "**Expected result:** a handful of Portland scenes (5)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "SNAPSHOT = \"az://items/sentinel-2-l2a.parquet/*.parquet\"\n",
+    "\n",
+    "items = client.search(\n",
+    "    SNAPSHOT,\n",
+    "    collections=[\"sentinel-2-l2a\"],\n",
+    "    bbox=[-122.7, 45.5, -122.6, 45.6],\n",
+    "    datetime=\"2017-07-01/2017-08-01\",\n",
+    ")\n",
+    "len(items)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Filter on space, time, and properties",
+    "",
+    "`search()` accepts the STAC query parameters you would expect, including CQL2-JSON property filters. Here we add a cloud-cover threshold.",
+    "",
+    "**Expected result:** a list of low-cloud scenes over the wider summer window."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "items = client.search(\n",
+    "    SNAPSHOT,\n",
+    "    collections=[\"sentinel-2-l2a\"],\n",
+    "    bbox=[-122.7, 45.5, -122.6, 45.6],\n",
+    "    datetime=\"2017-06-01/2017-09-01\",\n",
+    "    filter={\"op\": \"<\", \"args\": [{\"property\": \"eo:cloud_cover\"}, 20]},\n",
+    ")\n",
+    "len(items)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Write results without materializing Python objects",
+    "",
+    "For bulk work, skip the round-trip through Python dictionaries. `search_to_arrow` takes the same arguments as `search` and returns an Arrow table. It is backed by `arro3` (rustac's Arrow runtime), so adopt it into PyArrow with `pa.table(...)`, a zero-copy hand-off, before writing it to a new Parquet file.",
+    "",
+    "**Expected result:** a `portland-2017.parquet` file written locally; `table.num_rows` is 5."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pyarrow as pa\n",
+    "import pyarrow.parquet as pq\n",
+    "\n",
+    "table = pa.table(client.search_to_arrow(\n",
+    "    SNAPSHOT,\n",
+    "    collections=[\"sentinel-2-l2a\"],\n",
+    "    bbox=[-122.7, 45.5, -122.6, 45.6],\n",
+    "    datetime=\"2017-07-01/2017-08-01\",\n",
+    "))\n",
+    "pq.write_table(table, \"portland-2017.parquet\")\n",
+    "table.num_rows"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Bridge to GeoPandas",
+    "",
+    "To analyze results as a GeoDataFrame, convert the Arrow table. The snapshot geometries are lon/lat, but the CRS is not carried on the Arrow table, so set it explicitly.",
+    "",
+    "**Expected result:** a 5-row GeoDataFrame with `Polygon` geometry."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import geopandas\n",
+    "\n",
+    "gdf = geopandas.GeoDataFrame.from_arrow(table).set_crs(4326)\n",
+    "gdf[[\"id\", \"datetime\", \"eo:cloud_cover\"]].head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## From metadata to pixels",
+    "",
+    "A search returns item metadata, not imagery. To read the actual rasters, pull asset hrefs from the results and sign them, then hand them to a reader such as [async-geotiff](../overview/async-geotiff.md) for windowed reads.",
+    "",
+    "**Expected result:** a signed HTTPS URL for the B04 (red) band of the first scene."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import planetary_computer\n",
+    "\n",
+    "href = planetary_computer.sign(items[0][\"assets\"][\"B04\"][\"href\"])\n",
+    "href"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## You're done",
+    "",
+    "If every cell above ran, you pushed a spatial, temporal, and cloud-cover filter into a remote GeoParquet snapshot, read back only the matching Sentinel-2 scenes, wrote them to Parquet, and bridged them into GeoPandas: STAC-GeoParquet \u2192 filtered Arrow table \u2192 GeoDataFrame, with no full download.",
-    "If every cell above ran, you pushed a spatial, temporal, and cloud-cover filter into a remote GeoParquet snapshot, read back only the matching Sentinel-2 scenes, wrote them to Parquet, and bridged them into GeoPandas: STAC-GeoParquet \u2192 filtered Arrow table \u2192 GeoDataFrame, with no full download.",
+    "If every cell above ran, you used a spatial, temporal, and cloud-cover filter to read matching Sentinel-2 scenes, wrote them to Parquet, and bridged them into GeoPandas: STAC-GeoParquet \u2192 filtered Arrow table \u2192 GeoDataFrame.",
-    "If every cell above ran, you pushed a spatial, temporal, and cloud-cover filter into a remote GeoParquet snapshot, read back only the matching Sentinel-2 scenes, wrote them to Parquet, and bridged them into GeoPandas: STAC-GeoParquet \u2192 filtered Arrow table \u2192 GeoDataFrame, with no full download.",
+    "If every cell above ran, you used a spatial, temporal, and cloud-cover filter to read matching Sentinel-2 scenes, wrote them to Parquet, and bridged them into GeoPandas: STAC-GeoParquet \u2192 filtered Arrow table \u2192 GeoDataFrame.",
+    "",
+    "Swap in your own bbox, datetime window, or collection and the same pattern applies. For small or current-data queries, the live STAC API through `pystac-client` is the better tool. To ask analytical questions or join the catalog against other datasets, see the [DuckDB tutorial](../overview/duckdb.md), which uses the same authentication. For the imagery behind the metadata, hand a signed asset href to the [async-geotiff tutorial](../overview/async-geotiff.md)."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.12.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}