fix: build StaticTable version-hint metadata locations with forward slashes #3604

Open
anxkhn wants to merge 1 commit into apache:main from anxkhn:fix/static-table-version-hint-uri-separator

@anxkhn anxkhn commented Jul 5, 2026

Rationale for this change

StaticTable._metadata_location_from_version_hint builds the version-hint.text
location and the resolved metadata-file location with os.path.join:

version_hint_location = os.path.join(metadata_location, "metadata", "version-hint.text")
...
return os.path.join(metadata_location, "metadata", content)

Iceberg locations are URIs (s3://, gs://, abfss://, file://, ...) and are
always forward-slash separated. os.path.join uses the host OS separator, so on
Windows it produces backslash-separated keys such as
s3://warehouse/wh/nyc.db/taxis\metadata\version-hint.text. An object store
treats the backslashes as ordinary key characters rather than separators, so
that key does not name the real file: FileIO.new_input(...) cannot find it and
the lookup fails.
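
The separator difference can be reproduced on any platform, because the standard library exposes both join implementations directly: ntpath.join is what os.path.join resolves to on Windows, and posixpath.join is what it resolves to on Linux and macOS. A minimal illustration, using the example location from this description (it does not touch PyIceberg itself):

```python
import ntpath
import posixpath

table_root = "s3://warehouse/wh/nyc.db/taxis"

# What os.path.join does on Windows: components are joined with a backslash.
windows_location = ntpath.join(table_root, "metadata", "version-hint.text")

# What os.path.join does on POSIX: components are joined with a forward slash.
posix_location = posixpath.join(table_root, "metadata", "version-hint.text")

print(windows_location)
print(posix_location)
```

Only the POSIX result is a usable location for the version-hint file; the Windows result mixes the URI's forward slashes with backslashes.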

This breaks the documented "pass the table root path" usage of
StaticTable.from_metadata on Windows. The API docs advertise exactly this path:

If your table metadata directory contains a version-hint.text file, you can
just specify the table root path, and PyIceberg will read the version hint file
and load the latest metadata file.
StaticTable.from_metadata("s3://warehouse/wh/nyc.db/taxis")

This is the only place in the codebase that builds a table/metadata location
with os.path.join; every other location builder already uses explicit forward
slashes (for example pyiceberg/table/locations.py, and the metadata-path helpers
in pyiceberg/catalog/__init__.py). This change makes
_metadata_location_from_version_hint consistent with them:

metadata_dir = f"{metadata_location.rstrip('/')}/metadata"
version_hint_location = f"{metadata_dir}/version-hint.text"

Behavior is unchanged on POSIX. The now-unused os import (this was its only use
in the module) is removed to keep the linter happy.
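
As a standalone sketch of the two lines above: the function name version_hint_location is hypothetical and exists only for illustration (in the PR the logic lives inside _metadata_location_from_version_hint). It shows that the rstrip('/') call keeps a trailing slash on the table root from producing a doubled separator.

```python
def version_hint_location(table_root: str) -> str:
    """Build the version-hint.text location with explicit forward slashes.

    Hypothetical helper mirroring the two lines quoted in the PR description.
    """
    # Drop any trailing slash so the join never yields "//metadata".
    metadata_dir = f"{table_root.rstrip('/')}/metadata"
    return f"{metadata_dir}/version-hint.text"


without_slash = version_hint_location("s3://warehouse/wh/nyc.db/taxis")
with_slash = version_hint_location("s3://warehouse/wh/nyc.db/taxis/")

print(without_slash)
print(with_slash)
```

Both calls produce the same location, and the result does not depend on the host operating system because no os.path function is involved.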

Are these changes tested?

Yes. tests/table/test_init.py gains
test_static_table_version_hint_location_uses_forward_slashes, a regression test
that is deliberately OS-independent: it swaps os.path.join for ntpath.join
(the Windows join) via monkeypatch, stubs load_file_io with a small fake
FileIO, and asserts the resolved metadata location stays a valid forward-slash
URI (.../metadata/version-hint.text and .../metadata/<file>.metadata.json,
with no backslash). It is parametrized over the three version-hint content forms
the method handles (full *.metadata.json filename, a numeric version, and a
non-numeric value). It fails on the old os.path.join code (backslash separators)
and passes after this change, on any platform.
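
The core idea of the OS-independent test can be sketched outside the test suite. This is not the test added in the PR: it uses unittest.mock.patch instead of pytest's monkeypatch so that it runs on its own, it omits the fake FileIO, and the two builder functions are hypothetical stand-ins for the old and new code paths.

```python
import ntpath
import os
from unittest import mock


def join_with_os_path(table_root: str, file_name: str) -> str:
    # Stand-in for the old behavior: separator depends on os.path.join.
    return os.path.join(table_root, "metadata", file_name)


def join_with_forward_slashes(table_root: str, file_name: str) -> str:
    # Stand-in for the new behavior: separator is always "/".
    return f"{table_root.rstrip('/')}/metadata/{file_name}"


def locations_under_windows_join(table_root: str, file_name: str) -> tuple[str, str]:
    # Temporarily make os.path.join behave like the Windows implementation,
    # regardless of the platform the code is running on.
    with mock.patch("os.path.join", ntpath.join):
        old = join_with_os_path(table_root, file_name)
        new = join_with_forward_slashes(table_root, file_name)
    return old, new


old_location, new_location = locations_under_windows_join(
    "s3://warehouse/wh/nyc.db/taxis", "version-hint.text"
)
print(old_location)
print(new_location)
```

With the Windows join patched in, the os.path.join-based builder yields backslashes while the forward-slash builder is unaffected, which is the property the regression test asserts.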

The pre-existing test_static_table_version_hint_same_as_table only exercises a
local temp path, where the separator is already / on Linux CI, so it never
caught this. Windows is not yet in CI (tracked separately in #2477), which is why
the OS-independent approach is used here.

  • pytest tests/table/test_init.py -k version_hint -> passes (existing + 3 new cases).
  • pytest tests/table/test_init.py -> passes.
  • pytest tests/test_serializers.py -> passes (exercises the real
    from_metadata -> _metadata_location_from_version_hint path over local files).
  • make lint -> clean.

The integration suite (Docker + Spark) was not run locally; this change is covered
by the unit tests above and does not alter POSIX behavior.

Are there any user-facing changes?

No API or signature changes. This is a bug fix:
StaticTable.from_metadata(<table root>) and
StaticTable._metadata_location_from_version_hint now build correct
forward-slash metadata URIs on Windows. POSIX behavior is unchanged.

Related: #2477 (infra: run tests for windows).

…lashes

StaticTable._metadata_location_from_version_hint built the version-hint and
resolved metadata locations with os.path.join. Iceberg locations are URIs and
must always be forward-slash separated, but os.path.join uses the OS separator,
so on Windows it produced backslash-separated keys such as
s3://.../taxis\metadata\version-hint.text. That is an invalid object-store key,
which breaks the documented "pass the table root path" usage of
StaticTable.from_metadata on Windows.

Join the components with explicit forward slashes, matching how locations are
built elsewhere in pyiceberg (for example pyiceberg/table/locations.py). Behavior
is unchanged on POSIX. The now-unused os import is removed.

Add an OS-independent regression test that swaps os.path.join for the Windows
join so the metadata location is asserted to stay a valid forward-slash URI on
any platform (the existing test only exercised local temp paths, which use the
POSIX separator on Linux CI and never caught this).

Related: apache#2477