[bot] Merge master/2a72b14c into rel/dev #1681
Merged
_args_match checked emit_widget (renamed to user_requested_search in the tool schema) and limit (optional, server-side default). Both mismatched on every real model call, so tool_correctness was always False regardless of whether the model used the right keywords and object types. Fix: evaluate only keywords (case-insensitive) and object_types — the two fields that actually determine whether the search was semantically correct.
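To make the narrowed comparison concrete, here is a minimal sketch of a check that looks only at keywords and object_types. It is not the SDK's actual _args_match: the function name args_match_sketch, the plain-dict argument shape, and the order-insensitive comparison via sorted() are all assumptions made for illustration, and the sketch assumes well-typed string lists.

```python
from typing import Any


def args_match_sketch(actual: dict[str, Any], expected: dict[str, Any]) -> bool:
    """Hypothetical helper: compare only keywords and object_types.

    Other fields (emit_widget / user_requested_search, limit, ...) are
    ignored on purpose, so they can no longer cause a mismatch.
    """
    # Keywords: case-insensitive; order-insensitive is an assumption here.
    actual_kw = sorted(k.lower() for k in (actual.get("keywords") or []))
    expected_kw = sorted(k.lower() for k in (expected.get("keywords") or []))

    # Object types: compared as emitted (case-sensitive).
    actual_ot = sorted(actual.get("object_types") or [])
    expected_ot = sorted(expected.get("object_types") or [])

    return actual_kw == expected_kw and actual_ot == expected_ot
```

With this shape, a call that differs from the expectation only in limit or in the renamed flag still scores as a match, while a wrong keyword or object type does not.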
parsed_arguments() returns raw model-emitted JSON, so a bad tool call
like {"keywords":[1]} or mixed-type object_types would raise on
.lower()/sorted() and abort the whole eval run instead of scoring
tool_correctness=False. Add _normalize_str_list to drop non-string
entries defensively; valid comparisons (incl. case-insensitive keyword
match) are unchanged.
Addresses CodeRabbit review on PR #1675.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
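The defensive normalization described in this commit could look roughly like the sketch below. The names normalize_str_list_sketch and keywords_match_sketch are hypothetical, and returning an empty list for a non-list value is an assumption; the real _normalize_str_list may differ in both respects.

```python
from typing import Any


def normalize_str_list_sketch(value: Any) -> list[str]:
    """Hypothetical helper: keep only the string entries of a list.

    Non-list input yields an empty list (an assumption of this sketch).
    """
    if not isinstance(value, list):
        return []
    return [item for item in value if isinstance(item, str)]


def keywords_match_sketch(actual: Any, expected: list[str]) -> bool:
    """Case-insensitive keyword comparison that cannot raise on bad input."""
    actual_norm = sorted(k.lower() for k in normalize_str_list_sketch(actual))
    expected_norm = sorted(k.lower() for k in expected)
    return actual_norm == expected_norm


# A malformed call such as {"keywords": [1]} now scores False instead of raising.
print(keywords_match_sketch([1], ["revenue"]))
```

Dropping the non-string entries only prevents the crash. Any strings that survive are still compared, so a list like ["Revenue", 3] can still match ["revenue"].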
Address review nits on search_tool _args_match:
- reword _normalize_str_list comment: dropping non-string entries prevents a crash, it does not force a mismatch (surviving strings still compare)
- note that object_types is compared case-sensitively on purpose (controlled ObjectType StrEnum values emitted verbatim)

No behavior change. keywords/object_types are declared list[str] in the search_objects schema, so string-collapse false-negatives cannot occur.

JIRA: TRIVIAL
risk: nonprod
fix(eval): fix search_tool correctness always scoring 0%
Codecov Report
❌ Patch coverage is

Additional details and impacted files

@@            Coverage Diff            @@
##           rel/dev   #1681   +/-   ##
=======================================
  Coverage    77.79%   77.80%
=======================================
  Files          271      271
  Lines        18599    18602    +3
=======================================
+ Hits         14470    14474    +4
+ Misses        4129     4128    -1
🚀 Automated PR to perform merge from master into rel/dev with changes up to 2a72b14 (created by https://github.com/gooddata/gooddata-python-sdk/actions/runs/28648738929).