Skip to content

fix: decode shell skill output as UTF-8#6288

Open
VectorPeak wants to merge 2 commits into
google:mainfrom
VectorPeak:skill-script-utf8-output
Open

fix: decode shell skill output as UTF-8#6288
VectorPeak wants to merge 2 commits into
google:mainfrom
VectorPeak:skill-script-utf8-output

Conversation

@VectorPeak

@VectorPeak VectorPeak commented Jul 3, 2026

Copy link
Copy Markdown

Please ensure you have read the contribution guide before creating a pull request.

Link to Issue or Description of Change

1. Link to an existing issue (if applicable):

2. Or, if no issue exists, describe the change:

Problem:

What Problem This Solves

RunSkillScriptTool builds a generated Python wrapper when executing shell-based skill scripts. In the shell script path, that wrapper currently calls:

subprocess.run(..., capture_output=True, text=True)

without passing an explicit encoding or errors policy.

Because the wrapper requests text-mode captured output, Python decodes the child process stdout and stderr before the wrapper prints the JSON shell-result envelope back to ADK. When no explicit encoding is supplied, that decode path can depend on the platform default text encoding. On Windows environments using a non-UTF-8 default encoding such as cp936/gbk, UTF-8 bytes emitted by bash or a shell script can fail while subprocess is reading the captured stream.

In the local Windows full unit-test log, this appeared as a reader-thread failure inside Python's subprocess implementation:

UnicodeDecodeError: 'gbk' codec can't decode byte 0xff in position 47: illegal multibyte sequence

That decode failure caused the generated wrapper to lose the captured stderr path. As a result, existing shell integration tests that expected stderr to map to warning or error instead observed success.

Changes

This PR makes the generated shell wrapper decode captured stdout/stderr with an explicit UTF-8 contract:

subprocess.run(
    ...,
    capture_output=True,
    text=True,
    encoding="utf-8",
    errors="replace",
    ...
)

The change is intentionally narrow:

  • it only affects shell script execution inside RunSkillScriptTool
  • it keeps the existing capture_output=True and text=True behavior
  • it keeps the existing timeout, working directory, JSON envelope, and status classification logic
  • it keeps non-zero shell return codes visible even when stderr already contains output
  • it does not change Python skill script execution, skill loading, resource materialization, shell argument construction, or JSON envelope parsing

errors="replace" prevents a single undecodable byte from making the wrapper fail before it can return stdout, stderr, and the command return code to the caller.

Evidence

The previous local Windows full unit-test run contained two focused failures in the shell skill path:

test_integration_shell_stdout_and_stderr
AssertionError: assert 'success' == 'warning'

Exception in thread Thread-2 (_readerthread):
  File "...\\Lib\\subprocess.py", line 1599, in _readerthread
    buffer.append(fh.read())
UnicodeDecodeError: 'gbk' codec can't decode byte 0xff in position 47: illegal multibyte sequence

and:

test_integration_shell_stderr_only
AssertionError: assert 'success' == 'error'

Exception in thread Thread-2 (_readerthread):
  File "...\\Lib\\subprocess.py", line 1599, in _readerthread
    buffer.append(fh.read())
UnicodeDecodeError: 'gbk' codec can't decode byte 0xff in position 47: illegal multibyte sequence

The PR also updates the generated-wrapper unit coverage to assert that shell wrappers include explicit UTF-8 decoding and replacement error handling:

assert "encoding='utf-8'" in code_input.code
assert "errors='replace'" in code_input.code

Focused local validation passed on Windows:

uv run pytest tests/unittests/tools/test_skill_toolset.py -q

Result:

113 passed, 4 warnings in 17.62s

Focused WSL/Linux validation also passed for the PR-relevant shell wrapper tests:

/home/zxy/.local/bin/uv run --extra test pytest tests/unittests/tools/test_skill_toolset.py::test_execute_script_shell_success tests/unittests/tools/test_skill_toolset.py::test_integration_shell_stdout_and_stderr tests/unittests/tools/test_skill_toolset.py::test_integration_shell_stderr_only tests/unittests/tools/test_skill_toolset.py::test_integration_shell_nonzero_exit -q

Result:

4 passed, 4 warnings in 47.03s

Manual RunSkillScriptTool shell execution validation also passed through UnsafeLocalCodeExecutor for:

both.sh -> warning, stdout preserved, stderr preserved
err.sh  -> error, stderr preserved
fail.sh -> error, Exit code 42 preserved

The full tests/unittests suite is not claimed as passing here. A full local Windows run still had unrelated failures outside this PR:

23 failed, 7923 passed, 21 skipped, 31 xfailed, 9 xpassed, 1803 warnings, 5 errors, 6 subtests passed

Those failures include existing Windows path/temporary-directory cleanup, load_web_page, artifact file:// URI, edit-file newline, telemetry, sample, and evaluation-manager failures unrelated to this shell wrapper change.

Possible call chain / impact

The affected path is shell skill execution through RunSkillScriptTool:

RunSkillScriptTool.run_async()
  -> _SkillScriptCodeExecutor.execute_script_async()
    -> generated Python wrapper
      -> shell script branch for .sh / .bash files
        -> subprocess.run(..., capture_output=True, text=True)
        -> JSON shell result envelope
      -> RunSkillScriptTool parses stdout/stderr/returncode
      -> result status becomes success / warning / error

When stderr and return codes are preserved correctly, the existing status mapping can work as intended:

stdout only        -> success
stdout + stderr   -> warning
stderr only       -> error
non-zero return   -> error

Without explicit decoding, a Windows locale mismatch can make the subprocess reader fail before the wrapper returns the captured stderr content. That can hide stderr from the ADK tool result and make shell script failures or warnings look like successful executions.

This PR does not attempt to preserve arbitrary legacy-encoded shell output byte-for-byte. If a shell script intentionally emits non-UTF-8 bytes, invalid byte sequences are represented with replacement characters so the wrapper can still return the shell result envelope instead of dropping the result.

Solution:

Decode generated shell wrapper output explicitly as UTF-8 and tolerate undecodable bytes with replacement characters:

  • add encoding="utf-8" to the generated subprocess.run(...)
  • add errors="replace" so malformed bytes do not prevent the JSON result envelope from being emitted
  • keep timeout, cwd, stdout/stderr capture, returncode handling, Python-script execution, resource materialization, and status classification behavior unchanged
  • keep non-zero shell exit codes visible even when stderr already contains output

Testing Plan

Unit Tests:

  • I have added or updated unit tests for my change.
  • All unit tests pass locally.

Focused unit tests passed locally on Windows:

uv run pytest tests/unittests/tools/test_skill_toolset.py -q

Result:

113 passed, 4 warnings in 17.62s

Focused WSL/Linux validation also passed for the PR-relevant shell wrapper tests:

/home/zxy/.local/bin/uv run --extra test pytest tests/unittests/tools/test_skill_toolset.py::test_execute_script_shell_success tests/unittests/tools/test_skill_toolset.py::test_integration_shell_stdout_and_stderr tests/unittests/tools/test_skill_toolset.py::test_integration_shell_stderr_only tests/unittests/tools/test_skill_toolset.py::test_integration_shell_nonzero_exit -q

Result:

4 passed, 4 warnings in 47.03s

Full unit suites were attempted in both Windows and WSL/Linux environments, but this PR does not claim the full suite as passing because both runs still have failures outside this change.

Windows full unit suite:

uv run pytest tests/unittests -q

Result:

23 failed, 7923 passed, 21 skipped, 31 xfailed, 9 xpassed, 1803 warnings, 5 errors, 6 subtests passed in 270.77s

WSL/Linux full unit suite:

/home/zxy/.local/bin/uv run --extra test pytest tests/unittests -q

Result:

1 failed, 7963 passed, 19 skipped, 31 xfailed, 9 xpassed, 1817 warnings, 6 subtests passed in 1062.94s

The remaining WSL/Linux failure is stable when rerun by itself and is outside this PR's touched area:

tests/unittests/workflow/test_workflow_parallel_worker.py::test_parallel_worker_failure_propagates_and_cancels_others
AssertionError: tracker included {'task-3': True} when the test expected only {'task-1': True}

Manual End-to-End (E2E) Tests:

Manually validated the real RunSkillScriptTool shell execution path with UnsafeLocalCodeExecutor using shell scripts that produce stdout, stderr, and non-zero exit status.

Result summary:

both.sh -> warning, stdout preserved, stderr preserved
err.sh  -> error, stderr preserved
fail.sh -> error, Exit code 42 preserved

Checklist

  • I have read the CONTRIBUTING.md document.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • I have manually tested my changes end-to-end.
  • Any dependent changes have been merged and published in downstream modules.

Additional context

Python's subprocess.run() supports explicit encoding and errors parameters for text-mode captured streams. This PR uses those parameters so generated shell wrappers do not depend on the host locale when decoding captured stdout and stderr.

No dependent downstream changes are required for this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RunSkillScriptTool shell output can fail to decode on Windows locale defaults

1 participant