Docs add rustac quickstart #319

Draft
aboydnw wants to merge 4 commits into microsoft:main from aboydnw:docs-add-rustac-quickstart

aboydnw commented Jul 1, 2026

Draft rustac quickstart notebook: filter and export Sentinel-2 items from the GeoParquet snapshot, then bridge them into GeoPandas.

Related to the tutorial PR: microsoft/PlanetaryComputerDataCatalog#533

aboydnw and others added 4 commits June 29, 2026 20:40
Companion notebook for the rustac tutorial: queries the Sentinel-2 L2A
STAC-GeoParquet snapshot over Portland via rustac's bundled DuckDB engine,
with property filters, an Arrow/Parquet write, and a GeoPandas bridge.
Verified end-to-end against the live Planetary Computer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a "Key benefits" list and a "We'll set up" line to the intro, and close
with a "You're done" recap, matching the lonboard quickstart's structure.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…l-quickstart reference)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
gadomski self-requested a review July 2, 2026 13:14
Comment thread quickstarts/rustac.ipynb
Comment on lines +57 to +61
"import pystac_client\n",
"\n",
"catalog = pystac_client.Client.open(\"https://planetarycomputer.microsoft.com/api/stac/v1\")\n",
"asset = catalog.get_collection(\"sentinel-2-l2a\").assets[\"geoparquet-items\"]\n",
"asset.href"

Collaborator

Since this is a rustac tutorial, we can use rustac to search as well.
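A minimal sketch of that suggestion, assuming rustac's async `read()` function fetches a STAC object by URL and returns it as a plain dict. The helpers `snapshot_href` and `fetch_snapshot_href` are hypothetical; the first only pulls the `geoparquet-items` href out of the collection document, so it can be checked without network access.

```python
from typing import Any

COLLECTION_URL = (
    "https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-2-l2a"
)


def snapshot_href(collection: dict[str, Any]) -> str:
    """Return the href of the collection-level geoparquet-items asset (hypothetical helper)."""
    return collection["assets"]["geoparquet-items"]["href"]


async def fetch_snapshot_href(url: str = COLLECTION_URL) -> str:
    """Read the collection with rustac and return its snapshot href (hypothetical helper)."""
    # Imported here so snapshot_href() stays usable without rustac installed.
    import rustac

    # Assumption: rustac.read() is a coroutine returning the STAC object as a dict.
    collection = await rustac.read(url)
    return snapshot_href(collection)
```

In a notebook cell, `await fetch_snapshot_href()` would replace the `pystac_client` lookup; per the notebook's stated expected result, it should return `abfs://items/sentinel-2-l2a.parquet`.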

Comment thread quickstarts/rustac.ipynb
"source": [
"## Query the snapshot",
"",
"Point `search()` at the snapshot glob and pass the filters you want pushed into the read, so only matching rows cross the network. The glob (`/*.parquet`) spans the monthly part files.",

Collaborator

I think it calls them "part" in a couple of other places as well; those should say "partition" too.

Suggested change
"Point `search()` at the snapshot glob and pass the filters you want pushed into the read, so only matching rows cross the network. The glob (`/*.parquet`) spans the monthly part files.",
"Point `search()` at the snapshot glob and pass the filters you want pushed into the read, so only matching rows cross the network. The glob (`/*.parquet`) spans the monthly partition files.",

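For the "pass the filters you want pushed into the read" step, a sketch of the spatial, temporal, and cloud-cover arguments. It assumes `DuckdbClient.search` accepts STAC-API-style `bbox`, `datetime`, and CQL2-JSON `filter` keyword arguments; the helper `search_params`, the approximate Portland bounding box, and the example dates are illustrative, not taken from the notebook.

```python
from typing import Any

# Approximate Portland, Oregon bounding box (west, south, east, north); illustrative only.
PORTLAND_BBOX = [-122.84, 45.43, -122.47, 45.65]


def search_params(max_cloud: float, start: str, end: str) -> dict[str, Any]:
    """Build STAC-style search arguments: bbox, datetime interval, CQL2-JSON filter."""
    return {
        "bbox": PORTLAND_BBOX,
        # STAC datetime intervals are written as "start/end".
        "datetime": f"{start}/{end}",
        # CQL2-JSON: keep items whose eo:cloud_cover is at most max_cloud.
        "filter": {
            "op": "<=",
            "args": [{"property": "eo:cloud_cover"}, max_cloud],
        },
    }
```

Under that assumption, the call would look like `client.search(SNAPSHOT, **search_params(10, "2017-06-01T00:00:00Z", "2017-08-31T23:59:59Z"))`.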
Comment thread quickstarts/rustac.ipynb
Comment on lines +44 to +48
"## Find the GeoParquet snapshot",
"",
"Every Planetary Computer collection carries a collection-level `geoparquet-items` asset that points at its STAC-items snapshot. The href resolves to a directory of monthly `part-NNNN` Parquet files on the `pcstacitems` storage account.",
"",
"**Expected result:** `abfs://items/sentinel-2-l2a.parquet`."

Collaborator

It might be useful to list the partition files, e.g. with obstore.

from obstore.auth.planetary_computer import PlanetaryComputerCredentialProvider
from obstore.store import AzureStore

# The credential provider fetches a SAS token for this container from the Planetary Computer.
store = AzureStore(
    credential_provider=PlanetaryComputerCredentialProvider(
        "https://pcstacitems.blob.core.windows.net/items"
    )
)
# list() yields batches of object metadata; print every object under the snapshot prefix.
for batch in store.list(prefix="sentinel-2-l2a.parquet"):
    for object_meta in batch:
        print(object_meta)

Comment thread quickstarts/rustac.ipynb
"",
"Point `search()` at the snapshot glob and pass the filters you want pushed into the read, so only matching rows cross the network. The glob (`/*.parquet`) spans the monthly part files.",
"",
"`DuckdbClient.search` is synchronous (no `await`); the bundled engine scans the part files for you, which takes a little time on the first call.",

Collaborator

We can get the number of Parquet files from the obstore listing I suggested above.

Suggested change
"`DuckdbClient.search` is synchronous (no `await`); the bundled engine scans the part files for you, which takes a little time on the first call.",
"`DuckdbClient.search` is synchronous (no `await`); this is a search over X parquet files without an index, so expect it to take a couple of minutes.",

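One way to fill in X, assuming the obstore listing suggested above: `store.list()` yields batches of object-metadata dicts, each with a `path` and a `size`. The helper `count_partitions` is hypothetical; it counts the `.parquet` objects and totals their size, and accepts plain dicts so it can be tried without credentials.

```python
from collections.abc import Iterable, Mapping
from typing import Any


def count_partitions(batches: Iterable[Iterable[Mapping[str, Any]]]) -> tuple[int, int]:
    """Count .parquet objects and sum their sizes in bytes across listing batches."""
    count = 0
    total_bytes = 0
    for batch in batches:
        for meta in batch:
            # Only Parquet partition files contribute to the count.
            if meta["path"].endswith(".parquet"):
                count += 1
                total_bytes += meta["size"]
    return count, total_bytes
```

With the store from the earlier comment, `count_partitions(store.list(prefix="sentinel-2-l2a.parquet"))` would give the file count to put in place of X.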
Comment thread quickstarts/rustac.ipynb
"outputs": [],
"source": [
"items = client.search(\n",
" SNAPSHOT,\n",

Collaborator

Since we're narrowing the datetime, we could show how you can speed up the query by narrowing the glob to something like az://items/sentinel-2-l2a.parquet/part-*_2017-*.parquet.
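A sketch of the narrowing idea. The `part-*_<year>-*.parquet` shape comes from the example glob in the comment above; the helper `year_glob` and the sample partition names are hypothetical. Python's `fnmatch` stands in for DuckDB's glob matching, which is close enough for names inside a single directory (unlike DuckDB's, `fnmatch`'s `*` also matches `/`).

```python
from fnmatch import fnmatchcase

SNAPSHOT_DIR = "az://items/sentinel-2-l2a.parquet"


def year_glob(year: int, base: str = SNAPSHOT_DIR) -> str:
    """Build a glob that only matches partition files for one year."""
    return f"{base}/part-*_{year}-*.parquet"


# Hypothetical partition names, used only to show which files the glob keeps.
names = [
    f"{SNAPSHOT_DIR}/part-0001_2016-12-01_2016-12-31.parquet",
    f"{SNAPSHOT_DIR}/part-0002_2017-01-01_2017-01-31.parquet",
    f"{SNAPSHOT_DIR}/part-0003_2017-02-01_2017-02-28.parquet",
]
kept = [name for name in names if fnmatchcase(name, year_glob(2017))]
```

Passing `year_glob(2017)` in place of `SNAPSHOT` means the engine only opens partitions whose names mention 2017, instead of scanning every file and discarding rows by datetime.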

Comment thread quickstarts/rustac.ipynb
"source": [
"## Authenticate the DuckDB engine",
"",
"The account name looks public, but reads still need a SAS token. The high-level `rustac.search()` coroutine cannot pass Azure credentials, so use `rustac.DuckdbClient` instead and configure its connection once: fetch a container SAS from the Planetary Computer token API, register an Azure secret, and switch the Azure transport to curl.",

Collaborator

I don't know what "looks public" means.

Suggested change
"The account name looks public, but reads still need a SAS token. The high-level `rustac.search()` coroutine cannot pass Azure credentials, so use `rustac.DuckdbClient` instead and configure its connection once: fetch a container SAS from the Planetary Computer token API, register an Azure secret, and switch the Azure transport to curl.",
"Reading from the account needs a Shared Access Signature (SAS) token. The high-level `rustac.search()` coroutine cannot pass Azure credentials, so use `rustac.DuckdbClient` instead and configure its connection once: fetch a container SAS from the Planetary Computer token API, register an Azure secret, and switch the Azure transport to curl.",

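A sketch of the three configuration steps under stated assumptions: the token endpoint follows the Planetary Computer SAS API pattern `/api/sas/v1/token/{account}/{container}` and returns JSON with a `token` field; the secret uses DuckDB's Azure extension `CREATE SECRET (TYPE AZURE, CONNECTION_STRING ...)` form with a standard Azure `BlobEndpoint`/`SharedAccessSignature` connection string; and `azure_transport_option_type` is the DuckDB setting that selects curl. The function names and the secret name `pc_items` are hypothetical, and how the statements are executed on the `rustac.DuckdbClient` connection is left out because that depends on what the client exposes.

```python
import json
from urllib.request import urlopen

TOKEN_URL = "https://planetarycomputer.microsoft.com/api/sas/v1/token/pcstacitems/items"


def fetch_sas_token(url: str = TOKEN_URL) -> str:
    """Fetch a container-level SAS token from the Planetary Computer token API."""
    with urlopen(url) as response:
        return json.load(response)["token"]


def azure_setup_sql(token: str, account: str = "pcstacitems") -> list[str]:
    """Return DuckDB statements that register an Azure secret and switch to curl."""
    conn = f"BlobEndpoint=https://{account}.blob.core.windows.net;SharedAccessSignature={token}"
    # Double any single quotes so the value is safe inside a SQL string literal.
    conn = conn.replace("'", "''")
    return [
        "INSTALL azure",
        "LOAD azure",
        f"CREATE OR REPLACE SECRET pc_items (TYPE AZURE, CONNECTION_STRING '{conn}')",
        "SET azure_transport_option_type = 'curl'",
    ]
```

The token fetch is kept separate from the SQL builder so an expired token can be refreshed by re-running both without touching the search cells.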
Comment thread quickstarts/rustac.ipynb
"source": [
"## You're done",
"",
"If every cell above ran, you pushed a spatial, temporal, and cloud-cover filter into a remote GeoParquet snapshot, read back only the matching Sentinel-2 scenes, wrote them to Parquet, and bridged them into GeoPandas: STAC-GeoParquet \u2192 filtered Arrow table \u2192 GeoDataFrame, with no full download.",

Collaborator

Suggested change
"If every cell above ran, you pushed a spatial, temporal, and cloud-cover filter into a remote GeoParquet snapshot, read back only the matching Sentinel-2 scenes, wrote them to Parquet, and bridged them into GeoPandas: STAC-GeoParquet \u2192 filtered Arrow table \u2192 GeoDataFrame, with no full download.",
"If every cell above ran, you used a spatial, temporal, and cloud-cover filter to read matching Sentinel-2 scenes, wrote them to Parquet, and bridged them into GeoPandas: STAC-GeoParquet \u2192 filtered Arrow table \u2192 GeoDataFrame.",

2 participants