Clarify how to get a real benchmark run from Benchmark Comparison #19

sharpninja merged 2 commits into main from
Conversation
Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com> Agent-Logs-Url: https://github.com/sharpninja/graphrag/sessions/41510239-d5d7-4e97-8ce6-2dcdf2b97a3c
Pull request overview
This PR improves the UX around Benchmark Comparison runs by making --dry-run output explicitly explain how to trigger a full benchmark run (i.e., which secrets are required and where to rerun the workflow).
Changes:
- Updates the Benchmark Comparison workflow step summary to list required secrets and the manual rerun path when falling back to `--dry-run`.
- Enhances `scripts/benchmark_smoke.py` to emit a prominent dry-run notice in the generated markdown report whenever any result has `status == "dry_run"`.
- Adds a unit test ensuring the report includes the new "how to get a real benchmark run" guidance.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/unit/test_benchmark_smoke.py | Adds unit coverage asserting dry-run reports include the new guidance text. |
| scripts/benchmark_smoke.py | Adds dry-run detection and injects a prominent guidance block into the markdown report. |
| .github/workflows/benchmark-comparison.yml | Improves workflow summary messaging when secrets are missing and the run is forced into `--dry-run`. |
```python
def has_dry_run_results(*result_groups: list[OperationResult]) -> bool:
    """Return whether any result group contains dry-run benchmark results."""
    return any(result.status == "dry_run" for results in result_groups for result in results)
```
The `has_dry_run_results` return statement is likely to fail `ruff format --check` due to line wrapping (the generator expression exceeds the formatter's line length). Reformat this line (e.g., split the `any(...)` call across multiple lines) or run `ruff format` so CI formatting checks pass.
Suggested change:

```python
    return any(
        result.status == "dry_run"
        for results in result_groups
        for result in results
    )
```
Description
Benchmark Comparison could complete in `--dry-run` mode without telling users how to trigger a real benchmark. This change adds explicit guidance in both the workflow summary and generated comparison artifact so dry-run output points to the required secrets and rerun path.

Related Issues

Addresses confusion around benchmark runs reporting `dry_run` without actionable next steps for running the full benchmark workload.

Proposed Changes
Workflow summary

- Updated `benchmark-comparison.yml` to state which secrets gate a full benchmark run.
- Directs users to rerun via Actions → Benchmark Comparison → Run workflow once those secrets are configured.

Generated benchmark report

- Added a prominent dry-run notice to the report generated by `scripts/benchmark_smoke.py` when any result has `status == "dry_run"`.
- The notice lists the required secrets: `OPENAI_API_KEY`, `GRAPHRAG_API_BASE`, `AZURE_AI_SEARCH_URL_ENDPOINT`, `AZURE_AI_SEARCH_API_KEY`.

Focused coverage
Checklist
Additional Notes
This is intentionally a minimal UX/documentation change: benchmark execution behavior is unchanged. The workflow still falls back to `--dry-run` when secrets are unavailable; it now explains how to get out of that mode.