Docs add rustac quickstart #319
Companion notebook for the rustac tutorial: queries the Sentinel-2 L2A STAC-GeoParquet snapshot over Portland via rustac's bundled DuckDB engine, with property filters, an Arrow/Parquet write, and a GeoPandas bridge. Verified end-to-end against the live Planetary Computer. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a Key benefits list and a We'll setup line to the intro, and close with a You're done recap, matching the lonboard quickstart's structure. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…l-quickstart reference) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| "import pystac_client\n", | ||
| "\n", | ||
| "catalog = pystac_client.Client.open(\"https://planetarycomputer.microsoft.com/api/stac/v1\")\n", | ||
| "asset = catalog.get_collection(\"sentinel-2-l2a\").assets[\"geoparquet-items\"]\n", | ||
| "asset.href" |
Since this is a rustac tutorial, we can use rustac to search as well.
| "source": [ | ||
| "## Query the snapshot", | ||
| "", | ||
| "Point `search()` at the snapshot glob and pass the filters you want pushed into the read, so only matching rows cross the network. The glob (`/*.parquet`) spans the monthly part files.", |
I think it calls them "part" in a couple of other places as well; it should be "partition".
| "Point `search()` at the snapshot glob and pass the filters you want pushed into the read, so only matching rows cross the network. The glob (`/*.parquet`) spans the monthly part files.", | |
| "Point `search()` at the snapshot glob and pass the filters you want pushed into the read, so only matching rows cross the network. The glob (`/*.parquet`) spans the monthly partition files.", |
| "## Find the GeoParquet snapshot", | ||
| "", | ||
| "Every Planetary Computer collection carries a collection-level `geoparquet-items` asset that points at its STAC-items snapshot. The href resolves to a directory of monthly `part-NNNN` Parquet files on the `pcstacitems` storage account.", | ||
| "", | ||
| "**Expected result:** `abfs://items/sentinel-2-l2a.parquet`." |
It might be useful to list the partition files, e.g. with obstore.
from obstore.auth.planetary_computer import PlanetaryComputerCredentialProvider
from obstore.store import AzureStore

store = AzureStore(
    credential_provider=PlanetaryComputerCredentialProvider(
        "https://pcstacitems.blob.core.windows.net/items"
    )
)
for list_stream in store.list(prefix="sentinel-2-l2a.parquet"):
    for object_meta in list_stream:
        print(object_meta)
| "", | ||
| "Point `search()` at the snapshot glob and pass the filters you want pushed into the read, so only matching rows cross the network. The glob (`/*.parquet`) spans the monthly part files.", | ||
| "", | ||
| "`DuckdbClient.search` is synchronous (no `await`); the bundled engine scans the part files for you, which takes a little time on the first call.", |
We can get a count of the number of parquet files from the obstore list that I suggest above.
| "`DuckdbClient.search` is synchronous (no `await`); the bundled engine scans the part files for you, which takes a little time on the first call.", | |
| "`DuckdbClient.search` is synchronous (no `await`); this is a search over X parquet files without an index, so expect it to take a couple of minutes.", |
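To fill in the X in the suggested wording, the partition files can be counted from an obstore-style listing. A minimal sketch, assuming each listed entry is a mapping with a `path` key (as obstore's object metadata is); the helper name `count_parquet_objects` and the sample paths are hypothetical, and the fake listing stands in for a real `store.list(...)` call so the snippet runs offline:

```python
from typing import Iterable, Mapping


def count_parquet_objects(list_stream: Iterable[Iterable[Mapping]]) -> int:
    """Count listed objects whose path ends in .parquet."""
    return sum(
        1
        for batch in list_stream          # the listing arrives in batches
        for object_meta in batch          # each entry describes one object
        if str(object_meta["path"]).endswith(".parquet")
    )


# Hypothetical listing used only for illustration; real paths will differ.
fake_listing = [
    [
        {"path": "sentinel-2-l2a.parquet/part-0001.parquet"},
        {"path": "sentinel-2-l2a.parquet/_SUCCESS"},
    ],
    [{"path": "sentinel-2-l2a.parquet/part-0002.parquet"}],
]

n_files = count_parquet_objects(fake_listing)
print(f"search scans {n_files} parquet files")
```

With a real store, the same helper could be handed the listing directly, and the resulting number substituted into the notebook text.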
| "outputs": [], | ||
| "source": [ | ||
| "items = client.search(\n", | ||
| " SNAPSHOT,\n", |
Since we're narrowing the datetime, we could show how you can speed up the query by narrowing the glob to something like az://items/sentinel-2-l2a.parquet/part-*_2017-*.parquet.
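A rough sketch of that narrowing, assuming the partition file names embed a date after an underscore as the example glob implies. The helper `narrowed_glob` and the two sample file names are hypothetical, and the real naming should be checked against a listing of the container first:

```python
from fnmatch import fnmatchcase

SNAPSHOT_DIR = "az://items/sentinel-2-l2a.parquet"


def narrowed_glob(snapshot_dir: str, year: int) -> str:
    """Build a glob that only matches partition files for one year."""
    return f"{snapshot_dir.rstrip('/')}/part-*_{year}-*.parquet"


glob_2017 = narrowed_glob(SNAPSHOT_DIR, 2017)

# Hypothetical partition names, used only to show which files the glob keeps.
names = [
    "az://items/sentinel-2-l2a.parquet/part-0001_2017-01-01.parquet",
    "az://items/sentinel-2-l2a.parquet/part-0013_2018-01-01.parquet",
]
matching = [name for name in names if fnmatchcase(name, glob_2017)]
print(glob_2017)
print(matching)
```

Passing the narrowed glob in place of the full `/*.parquet` glob should mean fewer files are opened when the datetime filter only covers that year.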
| "source": [ | ||
| "## Authenticate the DuckDB engine", | ||
| "", | ||
| "The account name looks public, but reads still need a SAS token. The high-level `rustac.search()` coroutine cannot pass Azure credentials, so use `rustac.DuckdbClient` instead and configure its connection once: fetch a container SAS from the Planetary Computer token API, register an Azure secret, and switch the Azure transport to curl.", |
I don't know what "looks public" means.
| "The account name looks public, but reads still need a SAS token. The high-level `rustac.search()` coroutine cannot pass Azure credentials, so use `rustac.DuckdbClient` instead and configure its connection once: fetch a container SAS from the Planetary Computer token API, register an Azure secret, and switch the Azure transport to curl.", | |
| "Reading from the account needs a Shared Access Signature (SAS) token. The high-level `rustac.search()` coroutine cannot pass Azure credentials, so use `rustac.DuckdbClient` instead and configure its connection once: fetch a container SAS from the Planetary Computer token API, register an Azure secret, and switch the Azure transport to curl.", |
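For reference, the three steps named in that cell could look roughly like the sketch below. It is not the notebook's actual code: the secret name `pc_items` and both helper names are hypothetical, the connection-string form of the secret is an assumption about how DuckDB's Azure extension accepts a SAS, and running the statements on the `DuckdbClient` connection is left out because the call for that is not shown in this diff.

```python
import json
from urllib.request import urlopen

TOKEN_API = "https://planetarycomputer.microsoft.com/api/sas/v1/token"


def fetch_sas_token(account: str, container: str) -> str:
    """Fetch a short-lived container SAS from the Planetary Computer token API."""
    with urlopen(f"{TOKEN_API}/{account}/{container}") as resp:
        return json.load(resp)["token"]


def azure_setup_sql(account: str, sas_token: str) -> list[str]:
    """Build the DuckDB statements: register an Azure secret, switch to curl."""
    # Assumption: the SAS is supplied through an Azure connection string.
    connection_string = (
        f"BlobEndpoint=https://{account}.blob.core.windows.net;"
        f"SharedAccessSignature={sas_token}"
    )
    escaped = connection_string.replace("'", "''")  # escape SQL single quotes
    return [
        f"CREATE OR REPLACE SECRET pc_items (TYPE azure, CONNECTION_STRING '{escaped}');",
        "SET azure_transport_option_type = 'curl';",
    ]


# Placeholder token so the sketch runs offline; a real run would use
# fetch_sas_token("pcstacitems", "items") instead.
statements = azure_setup_sql("pcstacitems", "sv=placeholder&sig=placeholder")
for statement in statements:
    print(statement)
```

The token expires, so a long-running session would need to fetch a fresh one and replace the secret.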
| "source": [ | ||
| "## You're done", | ||
| "", | ||
| "If every cell above ran, you pushed a spatial, temporal, and cloud-cover filter into a remote GeoParquet snapshot, read back only the matching Sentinel-2 scenes, wrote them to Parquet, and bridged them into GeoPandas: STAC-GeoParquet \u2192 filtered Arrow table \u2192 GeoDataFrame, with no full download.", |
| "If every cell above ran, you pushed a spatial, temporal, and cloud-cover filter into a remote GeoParquet snapshot, read back only the matching Sentinel-2 scenes, wrote them to Parquet, and bridged them into GeoPandas: STAC-GeoParquet \u2192 filtered Arrow table \u2192 GeoDataFrame, with no full download.", | |
| "If every cell above ran, you used a spatial, temporal, and cloud-cover filter to read matching Sentinel-2 scenes, wrote them to Parquet, and bridged them into GeoPandas: STAC-GeoParquet \u2192 filtered Arrow table \u2192 GeoDataFrame.", |
Draft rustac quickstart notebook: filter, export, and GeoPandas-bridge Sentinel-2 items from the GeoParquet snapshot.
Related to the tutorial PR here: microsoft/PlanetaryComputerDataCatalog#533