
Enable manual benchmark runs and fix benchmark CI follow-ups#17

Merged
sharpninja merged 3 commits into main from copilot/make-benchmark-pipeline-triggerable on Mar 21, 2026

Conversation


Copilot AI commented Mar 21, 2026

Description

This pull request makes the benchmark comparison workflow manually triggerable from GitHub Actions and addresses follow-up CI failures uncovered on this PR. In addition to preserving the existing draft-PR skip behavior, it fixes the benchmark smoke fixture-loading logic and resolves the repository spellcheck failures.

Related Issues

Addresses the benchmark pipeline triggerability issue and the follow-up benchmark/spellcheck CI failures found during review.

Proposed Changes

  • Workflow trigger

    • Add workflow_dispatch to .github/workflows/benchmark-comparison.yml so the benchmark comparison workflow can be launched on demand.
  • Job gating

    • Update the benchmark job condition to allow non-PR events while still skipping draft pull requests.
  • Benchmark smoke fix

    • Update scripts/benchmark_smoke.py so workflow_config entries that are not mappings, such as skip_assert: true, are ignored when collecting expected_artifacts.
  • Test coverage

    • Add coverage in tests/unit/test_benchmark_smoke.py for the Azure fixture case with non-mapping workflow_config entries.
    • Narrow the imported module spec in the test helper before use so the test file also satisfies type checking.
  • Spellcheck

    • Update dictionary.txt with the repository-specific terms that were causing the existing cspell workflow to fail.
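The workflow-trigger and job-gating changes above can be sketched roughly as follows. This is an illustrative fragment only; the exact event list, job name, and runner in .github/workflows/benchmark-comparison.yml are assumptions, not the actual file contents.

```yaml
on:
  pull_request:
    types: [opened, synchronize, reopened, ready_for_review]
  # New: allow launching the comparison on demand from the Actions UI.
  workflow_dispatch: {}

jobs:
  benchmark:
    # Non-PR events (push, workflow_dispatch) always run;
    # PR events still skip drafts.
    if: github.event_name != 'pull_request' || github.event.pull_request.draft == false
    runs-on: ubuntu-latest
```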

Checklist

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have updated the documentation (if necessary).
  • I have added appropriate unit tests (if applicable).

Additional Notes

Validated locally with:

  • python3 -m pytest tests/unit/test_benchmark_smoke.py
  • ./scripts/spellcheck.sh

This PR still includes the original workflow-only change, but now also contains the minimal follow-up fixes required to get the related benchmark and spellcheck CI paths passing.



Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com>
Agent-Logs-Url: https://github.com/sharpninja/graphrag/sessions/4253a90a-67dd-4ef4-8111-25ea94ea578c
Copilot AI changed the title from "[WIP] Make benchmark pipeline manually triggerable" to "Enable manual runs for the benchmark comparison workflow" Mar 21, 2026
Copilot AI requested a review from sharpninja March 21, 2026 17:46
@sharpninja sharpninja marked this pull request as ready for review March 21, 2026 17:46
Copilot AI review requested due to automatic review settings March 21, 2026 17:46


Copilot AI left a comment


Pull request overview

Enables ad hoc benchmark runs by making the existing Benchmark Comparison GitHub Actions workflow manually triggerable, while preserving the behavior that skips draft pull request runs.

Changes:

  • Added workflow_dispatch trigger to allow manual workflow execution from the Actions UI.
  • Updated the benchmark job if: condition to skip only draft PRs, while allowing push and manual (workflow_dispatch) events to run.

@sharpninja
Owner

@copilot discover the causes of the pipeline failures, but do not change any code.


Copilot AI commented Mar 21, 2026

@copilot discover the causes of the pipeline failures, but do not change any code.

Investigated on 69ce79c with no code changes. The failures are from multiple existing issues:

  • Benchmark Comparison: scripts/benchmark_smoke.py:97-100 assumes every workflow_config value is a mapping and calls .get(...), but tests/fixtures/azure/config.json includes "skip_assert": true, so the run crashes with AttributeError: 'bool' object has no attribute 'get' before benchmarking starts.
  • Python Build and Type Check: pyright fails in tests/unit/test_benchmark_smoke.py:14-18 because spec_from_file_location(...) can return None, but the test uses module_from_spec(spec) and spec.loader/spec.name without narrowing first.
  • Python Notebook Tests: notebooks fail during execution because ModelConfig for the Azure provider is created without GRAPHRAG_API_BASE, producing ValidationError: api_base must be specified with the 'azure' model provider.
  • Spellcheck: scripts/spellcheck.sh / cspell reports 149 existing unknown-word findings across dotnet/docs, dotnet source, README, seed data, and related files.

None of the failures point to the workflow_dispatch change itself in .github/workflows/benchmark-comparison.yml.
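The first failure above comes from treating every workflow_config value as a mapping. A minimal Python sketch of the guard that fixes it (the function name and config shape here are assumptions for illustration, not the actual scripts/benchmark_smoke.py code):

```python
def collect_expected_artifacts(workflow_config: dict) -> list[str]:
    """Gather expected artifact names, ignoring non-mapping entries."""
    expected: list[str] = []
    for key, entry in workflow_config.items():
        if not isinstance(entry, dict):
            # e.g. "skip_assert": true in tests/fixtures/azure/config.json
            # is a bool; calling .get(...) on it raised
            # AttributeError: 'bool' object has no attribute 'get'
            continue
        expected.extend(entry.get("expected_artifacts", []))
    return expected
```

With the isinstance check in place, a fixture mixing mapping and scalar entries is processed without crashing.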

Copilot AI requested a review from sharpninja March 21, 2026 17:52
@sharpninja
Owner

@copilot fix these:

Benchmark Comparison: scripts/benchmark_smoke.py:97-100 assumes every workflow_config value is a mapping and calls .get(...), but tests/fixtures/azure/config.json includes "skip_assert": true, so the run crashes with AttributeError: 'bool' object has no attribute 'get' before benchmarking starts.
Spellcheck: scripts/spellcheck.sh / cspell reports 149 existing unknown-word findings across dotnet/docs, dotnet source, README, seed data, and related files.

Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com>
Agent-Logs-Url: https://github.com/sharpninja/graphrag/sessions/ed8e64ed-c4d1-4644-8660-94891a6c9f68

Copilot AI commented Mar 21, 2026

@copilot fix these:

Benchmark Comparison: scripts/benchmark_smoke.py:97-100 assumes every workflow_config value is a mapping and calls .get(...), but tests/fixtures/azure/config.json includes "skip...

Addressed in b278d2f.

  • scripts/benchmark_smoke.py now ignores non-mapping workflow_config entries when collecting expected_artifacts, so the Azure fixture no longer crashes on skip_assert: true.
  • tests/unit/test_benchmark_smoke.py now covers the Azure fixture case and narrows the import spec before use.
  • dictionary.txt now includes the repository-specific words that were causing the spellcheck workflow to fail.

Validated with python3 -m pytest tests/unit/test_benchmark_smoke.py and ./scripts/spellcheck.sh.
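The spec-narrowing fix mentioned above follows a standard importlib pattern; a hedged sketch (the helper name is an assumption, not the actual tests/unit/test_benchmark_smoke.py code):

```python
import importlib.util
from types import ModuleType


def import_script(path: str, name: str = "benchmark_smoke") -> ModuleType:
    """Load a script as a module, narrowing the spec to satisfy pyright."""
    spec = importlib.util.spec_from_file_location(name, path)
    # spec_from_file_location is typed ModuleSpec | None, and spec.loader
    # may also be None, so narrow both before use.
    if spec is None or spec.loader is None:
        raise ImportError(f"could not load {name} from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

The explicit None checks are what let module_from_spec(spec) and spec.loader.exec_module(...) type-check without ignores.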

Copilot AI changed the title from "Enable manual runs for the benchmark comparison workflow" to "Enable manual benchmark runs and fix benchmark CI follow-ups" Mar 21, 2026
@sharpninja sharpninja merged commit 6277c98 into main Mar 21, 2026