
Clarify how to get a real benchmark run from Benchmark Comparison#19

Merged
sharpninja merged 2 commits into `main` from `copilot/get-real-benchmark-run` on Mar 21, 2026

Conversation


Copilot AI commented Mar 21, 2026

Description

Benchmark Comparison could complete in `--dry-run` mode without telling users how to trigger a real benchmark. This change adds explicit guidance to both the workflow summary and the generated comparison artifact, so dry-run output points to the required secrets and the rerun path.

Related Issues

Addresses confusion around benchmark runs reporting `dry_run` status without actionable next steps for running the full benchmark workload.

Proposed Changes

  • Workflow summary

    • Extend the dry-run branch in benchmark-comparison.yml to state which secrets gate a full benchmark run.
    • Point users to Actions → Benchmark Comparison → Run workflow once those secrets are configured.
  • Generated benchmark report

    • Add a dry-run notice to scripts/benchmark_smoke.py when any result has status == "dry_run".
    • Include the exact secrets required for a full run:
      • OPENAI_API_KEY
      • GRAPHRAG_API_BASE
      • AZURE_AI_SEARCH_URL_ENDPOINT
      • AZURE_AI_SEARCH_API_KEY
  • Focused coverage

    • Add a unit test that verifies dry-run reports include the new “how to get a real benchmark run” guidance.
> This comparison used `--dry-run`, so it validated commands without executing the real benchmark workload.
> To get a real benchmark run in GitHub Actions, configure these secrets and rerun the `Benchmark Comparison` workflow:
> `OPENAI_API_KEY`, `GRAPHRAG_API_BASE`, `AZURE_AI_SEARCH_URL_ENDPOINT`, and `AZURE_AI_SEARCH_API_KEY`.
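
The notice above could be produced by a small helper along these lines (a hedged sketch with hypothetical names; the actual implementation lives in `scripts/benchmark_smoke.py` and may differ):

```python
def render_dry_run_notice(results: list[dict]) -> str:
    """Return a markdown notice if any benchmark result ran in dry-run mode."""
    required_secrets = [
        "OPENAI_API_KEY",
        "GRAPHRAG_API_BASE",
        "AZURE_AI_SEARCH_URL_ENDPOINT",
        "AZURE_AI_SEARCH_API_KEY",
    ]
    # Only emit the notice when at least one result reports dry_run status.
    if not any(r.get("status") == "dry_run" for r in results):
        return ""
    secrets = ", ".join(f"`{name}`" for name in required_secrets)
    return (
        "> This comparison used `--dry-run`, so it validated commands without "
        "executing the real benchmark workload.\n"
        "> To get a real benchmark run, configure these secrets and rerun the "
        f"`Benchmark Comparison` workflow: {secrets}.\n"
    )
```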

Checklist

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have updated the documentation (if necessary).
  • I have added appropriate unit tests (if applicable).

Additional Notes

This is intentionally a minimal UX/documentation change: benchmark execution behavior is unchanged. The workflow still falls back to `--dry-run` when secrets are unavailable; it now explains how to get out of that mode.
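
As an illustration of that fallback, here is a rough shell sketch of the gating logic (the function name and message wording are assumptions, not the workflow's actual step):

```shell
#!/bin/sh
# Sketch: run the real benchmark only when every required secret is present;
# otherwise fall back to --dry-run and say how to get out of that mode.
required_secrets="OPENAI_API_KEY GRAPHRAG_API_BASE AZURE_AI_SEARCH_URL_ENDPOINT AZURE_AI_SEARCH_API_KEY"

choose_mode() {
  mode="real"
  for name in $required_secrets; do
    # Look up each secret by name; any empty value forces dry-run mode.
    eval "value=\${$name:-}"
    [ -n "$value" ] || mode="dry-run"
  done
  echo "$mode"
}

if [ "$(choose_mode)" = "dry-run" ]; then
  echo "Secrets missing: falling back to --dry-run."
  echo "Configure the secrets above, then rerun via Actions -> Benchmark Comparison -> Run workflow."
else
  echo "All secrets present: running the real benchmark."
fi
```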



Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com>
Agent-Logs-Url: https://github.com/sharpninja/graphrag/sessions/41510239-d5d7-4e97-8ce6-2dcdf2b97a3c
Copilot AI changed the title [WIP] Update benchmarking process to include real run Clarify how to get a real benchmark run from Benchmark Comparison Mar 21, 2026
Copilot AI requested a review from sharpninja March 21, 2026 19:37
@sharpninja sharpninja marked this pull request as ready for review March 21, 2026 19:49
Copilot AI review requested due to automatic review settings March 21, 2026 19:49

@sharpninja sharpninja merged commit 0c9fd3f into main Mar 21, 2026
1 check passed

Copilot AI left a comment


Pull request overview

This PR improves the UX around Benchmark Comparison runs by making --dry-run output explicitly explain how to trigger a full benchmark run (i.e., which secrets are required and where to rerun the workflow).

Changes:

  • Updates the Benchmark Comparison workflow step summary to list required secrets and the manual rerun path when falling back to --dry-run.
  • Enhances scripts/benchmark_smoke.py to emit a prominent dry-run notice in the generated markdown report whenever any result is status == "dry_run".
  • Adds a unit test ensuring the report includes the new “how to get a real benchmark run” guidance.
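
A minimal pytest-style sketch of what such a test could look like (`build_report` and the result shape are placeholders, not the repository's real API):

```python
def build_report(results: list[dict]) -> str:
    """Placeholder report builder mirroring the described dry-run behavior."""
    lines = ["# Benchmark Comparison"]
    if any(r.get("status") == "dry_run" for r in results):
        # Inject the guidance block only for dry-run results.
        lines.append(
            "How to get a real benchmark run: configure the required "
            "secrets and rerun the `Benchmark Comparison` workflow."
        )
    return "\n".join(lines)

def test_dry_run_report_includes_guidance():
    report = build_report([{"status": "dry_run"}])
    assert "real benchmark run" in report.lower()
```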

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `tests/unit/test_benchmark_smoke.py` | Adds unit coverage asserting dry-run reports include the new guidance text. |
| `scripts/benchmark_smoke.py` | Adds dry-run detection and injects a prominent guidance block into the markdown report. |
| `.github/workflows/benchmark-comparison.yml` | Improves workflow summary messaging when secrets are missing and the run is forced into `--dry-run`. |


```python
def has_dry_run_results(*result_groups: list[OperationResult]) -> bool:
    """Return whether any result group contains dry-run benchmark results."""
    return any(result.status == "dry_run" for results in result_groups for result in results)
```

Copilot AI Mar 21, 2026


The `has_dry_run_results` return statement is likely to fail `ruff format --check` due to line wrapping (the generator expression exceeds the formatter's line length). Reformat this line (e.g., split the `any(...)` call across multiple lines) or run `ruff format` so CI formatting checks pass.

Suggested change:

```diff
-    return any(result.status == "dry_run" for results in result_groups for result in results)
+    return any(
+        result.status == "dry_run"
+        for results in result_groups
+        for result in results
+    )
```
