
Clarify how to get a real benchmark run from Benchmark Comparison#19

Merged
sharpninja merged 2 commits into `main` from `copilot/get-real-benchmark-run` on Mar 21, 2026

Conversation


Copilot AI commented Mar 21, 2026

Description

Benchmark Comparison could complete in `--dry-run` mode without telling users how to trigger a real benchmark. This change adds explicit guidance to both the workflow summary and the generated comparison artifact, so dry-run output points to the required secrets and the rerun path.

Related Issues

Addresses confusion around benchmark runs reporting `dry_run` status without actionable next steps for running the full benchmark workload.

Proposed Changes

  • Workflow summary

    • Extend the dry-run branch in benchmark-comparison.yml to state which secrets gate a full benchmark run.
    • Point users to Actions → Benchmark Comparison → Run workflow once those secrets are configured.
  • Generated benchmark report

    • Add a dry-run notice to scripts/benchmark_smoke.py when any result has status == "dry_run".
    • Include the exact secrets required for a full run:
      • OPENAI_API_KEY
      • GRAPHRAG_API_BASE
      • AZURE_AI_SEARCH_URL_ENDPOINT
      • AZURE_AI_SEARCH_API_KEY
  • Focused coverage

    • Add a unit test that verifies dry-run reports include the new “how to get a real benchmark run” guidance.
> This comparison used `--dry-run`, so it validated commands without executing the real benchmark workload.
> To get a real benchmark run in GitHub Actions, configure these secrets and rerun the `Benchmark Comparison` workflow:
> `OPENAI_API_KEY`, `GRAPHRAG_API_BASE`, `AZURE_AI_SEARCH_URL_ENDPOINT`, and `AZURE_AI_SEARCH_API_KEY`.
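
The notice above could be produced by a small helper along these lines (a hedged sketch with hypothetical names; the actual implementation lives in `scripts/benchmark_smoke.py` and may differ):

```python
def render_dry_run_notice(results: list[dict]) -> str:
    """Return a markdown notice if any benchmark result ran in dry-run mode."""
    required_secrets = [
        "OPENAI_API_KEY",
        "GRAPHRAG_API_BASE",
        "AZURE_AI_SEARCH_URL_ENDPOINT",
        "AZURE_AI_SEARCH_API_KEY",
    ]
    # Only emit the notice when at least one result reports dry_run status.
    if not any(r.get("status") == "dry_run" for r in results):
        return ""
    secrets = ", ".join(f"`{name}`" for name in required_secrets)
    return (
        "> This comparison used `--dry-run`, so it validated commands without "
        "executing the real benchmark workload.\n"
        "> To get a real benchmark run, configure these secrets and rerun the "
        f"`Benchmark Comparison` workflow: {secrets}.\n"
    )
```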

Checklist

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have updated the documentation (if necessary).
  • I have added appropriate unit tests (if applicable).

Additional Notes

This is intentionally a minimal UX/documentation change: benchmark execution behavior is unchanged. The workflow still falls back to `--dry-run` when secrets are unavailable; it now explains how to get out of that mode.
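
As an illustration of that fallback, here is a rough shell sketch of the gating logic (the function name and message wording are assumptions, not the workflow's actual step):

```shell
#!/bin/sh
# Sketch: run the real benchmark only when every required secret is present;
# otherwise fall back to --dry-run and say how to get out of that mode.
required_secrets="OPENAI_API_KEY GRAPHRAG_API_BASE AZURE_AI_SEARCH_URL_ENDPOINT AZURE_AI_SEARCH_API_KEY"

choose_mode() {
  mode="real"
  for name in $required_secrets; do
    # Look up each secret by name; any empty value forces dry-run mode.
    eval "value=\${$name:-}"
    [ -n "$value" ] || mode="dry-run"
  done
  echo "$mode"
}

if [ "$(choose_mode)" = "dry-run" ]; then
  echo "Secrets missing: falling back to --dry-run."
  echo "Configure the secrets above, then rerun via Actions -> Benchmark Comparison -> Run workflow."
else
  echo "All secrets present: running the real benchmark."
fi
```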



Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com>
Agent-Logs-Url: https://github.com/sharpninja/graphrag/sessions/41510239-d5d7-4e97-8ce6-2dcdf2b97a3c
Copilot AI changed the title [WIP] Update benchmarking process to include real run Clarify how to get a real benchmark run from Benchmark Comparison Mar 21, 2026
Copilot AI requested a review from sharpninja March 21, 2026 19:37
@sharpninja sharpninja marked this pull request as ready for review March 21, 2026 19:49
Copilot AI review requested due to automatic review settings March 21, 2026 19:49

@sharpninja sharpninja merged commit 0c9fd3f into main Mar 21, 2026
1 check passed

Copilot AI left a comment


Pull request overview

This PR improves the UX around Benchmark Comparison runs by making --dry-run output explicitly explain how to trigger a full benchmark run (i.e., which secrets are required and where to rerun the workflow).

Changes:

  • Updates the Benchmark Comparison workflow step summary to list required secrets and the manual rerun path when falling back to --dry-run.
  • Enhances scripts/benchmark_smoke.py to emit a prominent dry-run notice in the generated markdown report whenever any result is status == "dry_run".
  • Adds a unit test ensuring the report includes the new “how to get a real benchmark run” guidance.
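
A minimal pytest-style sketch of what such a test could look like (`build_report` and the result shape are placeholders, not the repository's real API):

```python
def build_report(results: list[dict]) -> str:
    """Placeholder report builder mirroring the described dry-run behavior."""
    lines = ["# Benchmark Comparison"]
    if any(r.get("status") == "dry_run" for r in results):
        # Inject the guidance block only for dry-run results.
        lines.append(
            "How to get a real benchmark run: configure the required "
            "secrets and rerun the `Benchmark Comparison` workflow."
        )
    return "\n".join(lines)

def test_dry_run_report_includes_guidance():
    report = build_report([{"status": "dry_run"}])
    assert "real benchmark run" in report.lower()
```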

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `tests/unit/test_benchmark_smoke.py` | Adds unit coverage asserting dry-run reports include the new guidance text. |
| `scripts/benchmark_smoke.py` | Adds dry-run detection and injects a prominent guidance block into the markdown report. |
| `.github/workflows/benchmark-comparison.yml` | Improves workflow summary messaging when secrets are missing and the run is forced into `--dry-run`. |


```python
def has_dry_run_results(*result_groups: list[OperationResult]) -> bool:
    """Return whether any result group contains dry-run benchmark results."""
    return any(result.status == "dry_run" for results in result_groups for result in results)
```

Copilot AI Mar 21, 2026


The `has_dry_run_results` return statement is likely to fail `ruff format --check` due to line wrapping (the generator expression exceeds the formatter's line length). Reformat this line (e.g., split the `any(...)` call across multiple lines) or run `ruff format` so CI formatting checks pass.

Suggested change:

```diff
-    return any(result.status == "dry_run" for results in result_groups for result in results)
+    return any(
+        result.status == "dry_run"
+        for results in result_groups
+        for result in results
+    )
```
