
Enable manual benchmark runs and fix benchmark CI follow-ups#17

Merged
sharpninja merged 3 commits into main from copilot/make-benchmark-pipeline-triggerable on Mar 21, 2026

Conversation


Copilot AI commented Mar 21, 2026

Description

This pull request makes the benchmark comparison workflow manually triggerable from GitHub Actions and addresses follow-up CI failures uncovered on this PR. In addition to preserving the existing draft-PR skip behavior, it fixes the benchmark smoke fixture-loading logic and resolves the repository spellcheck failures.

Related Issues

Addresses the benchmark pipeline triggerability issue and the follow-up benchmark/spellcheck CI failures found during review.

Proposed Changes

  • Workflow trigger

    • Add workflow_dispatch to .github/workflows/benchmark-comparison.yml so the benchmark comparison workflow can be launched on demand.
  • Job gating

    • Update the benchmark job condition to allow non-PR events while still skipping draft pull requests.
  • Benchmark smoke fix

    • Update scripts/benchmark_smoke.py so workflow_config entries that are not mappings, such as skip_assert: true, are ignored when collecting expected_artifacts.
  • Test coverage

    • Add coverage in tests/unit/test_benchmark_smoke.py for the Azure fixture case with non-mapping workflow_config entries.
    • Narrow the imported module spec in the test helper before use so the test file also satisfies type checking.
  • Spellcheck

    • Update dictionary.txt with the repository-specific terms that were causing the existing cspell workflow to fail.
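The workflow-trigger and job-gating changes above can be sketched roughly as follows. This is an illustrative fragment only; the exact event list, job name, and runner in .github/workflows/benchmark-comparison.yml are assumptions, not the actual file contents.

```yaml
on:
  pull_request:
    types: [opened, synchronize, reopened, ready_for_review]
  # New: allow launching the comparison on demand from the Actions UI.
  workflow_dispatch: {}

jobs:
  benchmark:
    # Non-PR events (push, workflow_dispatch) always run;
    # PR events still skip drafts.
    if: github.event_name != 'pull_request' || github.event.pull_request.draft == false
    runs-on: ubuntu-latest
```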

Checklist

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have updated the documentation (if necessary).
  • I have added appropriate unit tests (if applicable).

Additional Notes

Validated locally with:

  • python3 -m pytest tests/unit/test_benchmark_smoke.py
  • ./scripts/spellcheck.sh

This PR still includes the original workflow-only change, but now also contains the minimal follow-up fixes required to get the related benchmark and spellcheck CI paths passing.



Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com>
Agent-Logs-Url: https://github.com/sharpninja/graphrag/sessions/4253a90a-67dd-4ef4-8111-25ea94ea578c
Copilot AI changed the title from "[WIP] Make benchmark pipeline manually triggerable" to "Enable manual runs for the benchmark comparison workflow" Mar 21, 2026
Copilot AI requested a review from sharpninja March 21, 2026 17:46
@sharpninja sharpninja marked this pull request as ready for review March 21, 2026 17:46
Copilot AI review requested due to automatic review settings March 21, 2026 17:46


Copilot AI left a comment


Pull request overview

Enables ad hoc benchmark runs by making the existing Benchmark Comparison GitHub Actions workflow manually triggerable, while preserving the behavior that skips draft pull request runs.

Changes:

  • Added workflow_dispatch trigger to allow manual workflow execution from the Actions UI.
  • Updated the benchmark job if: condition to skip only draft PRs, while allowing push and manual (workflow_dispatch) events to run.

@sharpninja
Owner

@copilot discover the causes of the pipeline failures, but do not change any code.


Copilot AI commented Mar 21, 2026

@copilot discover the causes of the pipeline failures, but do not change any code.

Investigated on 69ce79c with no code changes. The failures are from multiple existing issues:

  • Benchmark Comparison: scripts/benchmark_smoke.py:97-100 assumes every workflow_config value is a mapping and calls .get(...), but tests/fixtures/azure/config.json includes "skip_assert": true, so the run crashes with AttributeError: 'bool' object has no attribute 'get' before benchmarking starts.
  • Python Build and Type Check: pyright fails in tests/unit/test_benchmark_smoke.py:14-18 because spec_from_file_location(...) can return None, but the test uses module_from_spec(spec) and spec.loader/spec.name without narrowing first.
  • Python Notebook Tests: notebooks fail during execution because ModelConfig for the Azure provider is created without GRAPHRAG_API_BASE, producing ValidationError: api_base must be specified with the 'azure' model provider.
  • Spellcheck: scripts/spellcheck.sh / cspell reports 149 existing unknown-word findings across dotnet/docs, dotnet source, README, seed data, and related files.

None of the failures point to the workflow_dispatch change itself in .github/workflows/benchmark-comparison.yml.
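The first failure above comes from treating every workflow_config value as a mapping. A minimal Python sketch of the guard that fixes it (the function name and config shape here are assumptions for illustration, not the actual scripts/benchmark_smoke.py code):

```python
def collect_expected_artifacts(workflow_config: dict) -> list[str]:
    """Gather expected artifact names, ignoring non-mapping entries."""
    expected: list[str] = []
    for key, entry in workflow_config.items():
        if not isinstance(entry, dict):
            # e.g. "skip_assert": true in tests/fixtures/azure/config.json
            # is a bool; calling .get(...) on it raised
            # AttributeError: 'bool' object has no attribute 'get'
            continue
        expected.extend(entry.get("expected_artifacts", []))
    return expected
```

With the isinstance check in place, a fixture mixing mapping and scalar entries is processed without crashing.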

Copilot AI requested a review from sharpninja March 21, 2026 17:52
@sharpninja
Owner

@copilot fix these:

Benchmark Comparison: scripts/benchmark_smoke.py:97-100 assumes every workflow_config value is a mapping and calls .get(...), but tests/fixtures/azure/config.json includes "skip_assert": true, so the run crashes with AttributeError: 'bool' object has no attribute 'get' before benchmarking starts.
Spellcheck: scripts/spellcheck.sh / cspell reports 149 existing unknown-word findings across dotnet/docs, dotnet source, README, seed data, and related files.

Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com>
Agent-Logs-Url: https://github.com/sharpninja/graphrag/sessions/ed8e64ed-c4d1-4644-8660-94891a6c9f68

Copilot AI commented Mar 21, 2026

@copilot fix these:

Benchmark Comparison: scripts/benchmark_smoke.py:97-100 assumes every workflow_config value is a mapping and calls .get(...), but tests/fixtures/azure/config.json includes "skip...

Addressed in b278d2f.

  • scripts/benchmark_smoke.py now ignores non-mapping workflow_config entries when collecting expected_artifacts, so the Azure fixture no longer crashes on skip_assert: true.
  • tests/unit/test_benchmark_smoke.py now covers the Azure fixture case and narrows the import spec before use.
  • dictionary.txt now includes the repository-specific words that were causing the spellcheck workflow to fail.

Validated with python3 -m pytest tests/unit/test_benchmark_smoke.py and ./scripts/spellcheck.sh.
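The spec-narrowing fix mentioned above follows a standard importlib pattern; a hedged sketch (the helper name is an assumption, not the actual tests/unit/test_benchmark_smoke.py code):

```python
import importlib.util
from types import ModuleType


def import_script(path: str, name: str = "benchmark_smoke") -> ModuleType:
    """Load a script as a module, narrowing the spec to satisfy pyright."""
    spec = importlib.util.spec_from_file_location(name, path)
    # spec_from_file_location is typed ModuleSpec | None, and spec.loader
    # may also be None, so narrow both before use.
    if spec is None or spec.loader is None:
        raise ImportError(f"could not load {name} from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

The explicit None checks are what let module_from_spec(spec) and spec.loader.exec_module(...) type-check without ignores.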

Copilot AI changed the title from "Enable manual runs for the benchmark comparison workflow" to "Enable manual benchmark runs and fix benchmark CI follow-ups" Mar 21, 2026
@sharpninja sharpninja merged commit 6277c98 into main Mar 21, 2026