
Add a separate benchmark comparison pipeline for Python and .NET #15

Merged
sharpninja merged 2 commits into main from copilot/add-benchmarking-separate-pipeline
Mar 21, 2026

Conversation

Copilot AI commented Mar 20, 2026

Description

Adds a dedicated benchmark workflow that runs both implementations against the shared smoke-test fixtures and publishes a single comparison report. The pipeline measures indexing plus all configured query paths so reviewers can see a like-for-like Python vs .NET result set in one place.

Related Issues

Addresses the benchmarking request for:

  • a separate pipeline for Python and .NET benchmarks
  • an apple-to-apples comparison report
  • coverage of indexing and querying flows from the existing test data

Proposed Changes

  • Benchmark workflow

    • Added .github/workflows/benchmark-comparison.yml
    • Runs Python and .NET benchmarks as a standalone pipeline
    • Publishes JSON outputs plus a markdown comparison report artifact
    • Uses the existing smoke fixture configs as the shared benchmark contract
  • Shared benchmark harness

    • Added scripts/benchmark_smoke.py
    • Discovers benchmark cases from tests/fixtures/*/config.json
    • Executes:
      • indexing for each fixture
      • every configured query for each fixture
    • Emits per-implementation JSON results and a consolidated markdown report
  • Comparison report

    • Produces a single benchmark-comparison.md with:
      • per-operation timing
      • per-implementation status
      • side-by-side Python/.NET comparison
      • Python workflow metrics from stats.json when available
      • explicit missing-output reporting, which currently surfaces .NET indexing gaps instead of masking them
  • Fixture-driven scope

    • Reuses the existing smoke fixtures rather than introducing benchmark-only datasets
    • Keeps the benchmark aligned with the same indexing/query coverage already encoded in fixture configs
  • Focused test coverage

    • Added tests/unit/test_benchmark_smoke.py
    • Covers fixture loading, command generation, and report rendering
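
The fixture-discovery step described above could look roughly like this. This is a minimal sketch, not the actual `scripts/benchmark_smoke.py` code: the dataclass fields and the config keys (`queries`, `workflows`, `expected_artifacts`) are assumptions inferred from the unit tests quoted in the review below, and it already includes the `isinstance` filter the review suggests so scalar flags like `"skip_assert": true` don't break artifact collection.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class QueryCase:
    method: str
    query: str


@dataclass
class FixtureCase:
    name: str
    queries: list
    expected_artifacts: tuple


def load_fixture_cases(repo_root: Path, names) -> list:
    """Discover benchmark cases from tests/fixtures/<name>/config.json.

    Hypothetical sketch: the config schema here is assumed, not verified
    against the real fixture files.
    """
    cases = []
    for name in sorted(names):
        config_path = repo_root / "tests" / "fixtures" / name / "config.json"
        config = json.loads(config_path.read_text())
        queries = [QueryCase(q["method"], q["query"]) for q in config.get("queries", [])]
        # Only dict-shaped workflow entries carry artifacts; scalar flags
        # such as "skip_assert": true are skipped rather than crashing.
        expected_artifacts = tuple(
            artifact
            for workflow in config.get("workflows", {}).values()
            if isinstance(workflow, dict)
            for artifact in workflow.get("expected_artifacts", [])
        )
        cases.append(FixtureCase(name, queries, expected_artifacts))
    return cases


# Demo against a throwaway fixture tree (invented content for illustration).
root = Path(tempfile.mkdtemp())
fixture_dir = root / "tests" / "fixtures" / "text"
fixture_dir.mkdir(parents=True)
(fixture_dir / "config.json").write_text(json.dumps({
    "queries": [{"method": "local", "query": "who leads the project?"}],
    "workflows": {
        "skip_assert": True,
        "create_communities": {"expected_artifacts": ["community_reports.csv"]},
    },
}))
cases = load_fixture_cases(root, ["text"])
print(cases[0].name, cases[0].expected_artifacts)
```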

Example of the generated benchmark flow:

uv run python scripts/benchmark_smoke.py run \
  --implementation python \
  --repo-root "$PWD" \
  --output benchmark-python.json

uv run python scripts/benchmark_smoke.py run \
  --implementation dotnet \
  --repo-root "$PWD" \
  --output benchmark-dotnet.json

uv run python scripts/benchmark_smoke.py compare \
  --python-results benchmark-python.json \
  --dotnet-results benchmark-dotnet.json \
  --output benchmark-comparison.md

Checklist

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have updated the documentation (if necessary).
  • I have added appropriate unit tests (if applicable).

Additional Notes

The report is intentionally status-aware, not timing-only. For the current .NET CLI, query timings are recorded, while indexing is reported as missing its expected outputs, so the comparison stays honest and actionable instead of implying feature parity that does not yet exist.
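
The status-aware rendering described above might look something like this. This is a hypothetical sketch, not the actual report code: the row format, status values, and result shape are invented for illustration; the point is only that a missing result is surfaced explicitly rather than dropped from the table.

```python
def render_row(operation: str, results: dict) -> str:
    """Render one markdown table row; results maps implementation -> result dict.

    Hypothetical sketch of status-aware reporting: an implementation with no
    recorded result is shown as "missing output" instead of being omitted.
    """
    cells = [operation]
    for impl in ("python", "dotnet"):
        result = results.get(impl)
        if result is None:
            cells.append("missing output")
        elif result.get("status") == "ok":
            cells.append(f"{result['seconds']:.2f}s")
        else:
            cells.append(result.get("status", "error"))
    return "| " + " | ".join(cells) + " |"


# Example: Python indexing succeeded, .NET indexing produced no outputs.
row = render_row("index:text", {"python": {"status": "ok", "seconds": 12.3}})
print(row)  # | index:text | 12.30s | missing output |
```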


Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com>
Agent-Logs-Url: https://github.com/sharpninja/graphrag/sessions/041ee5aa-dc22-44d5-b2e8-5608209d11ac
Copilot AI changed the title [WIP] Add benchmarking for python and dotnet implementations Add a separate benchmark comparison pipeline for Python and .NET Mar 20, 2026
Copilot AI requested a review from sharpninja March 20, 2026 15:46
@sharpninja sharpninja marked this pull request as ready for review March 21, 2026 14:36
Copilot AI review requested due to automatic review settings March 21, 2026 14:36

@sharpninja sharpninja merged commit 47366ce into main Mar 21, 2026
14 of 25 checks passed
Copilot AI left a comment

Pull request overview

Adds a standalone CI workflow and shared Python harness to benchmark Python vs .NET implementations using the existing smoke-test fixtures, producing a single comparison report artifact.

Changes:

  • Added Benchmark Comparison GitHub Actions workflow to run Python + .NET benchmarks and publish JSON + markdown report artifacts.
  • Added scripts/benchmark_smoke.py to discover fixtures, run indexing + queries, and render a consolidated comparison report.
  • Added unit tests covering fixture loading, command generation, and report rendering.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| .github/workflows/benchmark-comparison.yml | New workflow that builds/runs both implementations and uploads benchmark artifacts. |
| scripts/benchmark_smoke.py | New benchmark harness for fixture discovery, execution, JSON output, and markdown comparison rendering. |
| tests/unit/test_benchmark_smoke.py | Unit tests for key benchmark harness behaviors. |
| .semversioner/next-release/patch-20260320154154809493.json | Semversioner entry documenting the patch-level change. |

    )
    expected_artifacts = tuple(
        artifact
        for workflow in workflow_config.values()

Copilot AI Mar 21, 2026

load_fixture_cases() assumes every workflow_config value is a dict and calls workflow.get(...), but tests/fixtures/azure/config.json contains non-dict entries (e.g., "skip_assert": true). This will raise AttributeError when the benchmark workflow loads all fixtures. Filter to dict values (or explicitly skip known non-workflow keys like azure/skip_assert) before reading expected_artifacts.

Suggested change:
-        for workflow in workflow_config.values()
+        for workflow in workflow_config.values()
+        if isinstance(workflow, dict)
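
The failure mode this comment describes can be reproduced in isolation. This is a minimal, self-contained repro; the config shape is invented for illustration and only mimics a scalar flag sitting next to dict-shaped workflow entries:

```python
workflow_config = {
    "skip_assert": True,  # scalar flag, not a workflow dict
    "run_index": {"expected_artifacts": ["a.csv"]},
}

# Unfiltered iteration calls .get() on the bool and raises AttributeError.
caught = None
try:
    tuple(a for w in workflow_config.values() for a in w.get("expected_artifacts", []))
except AttributeError as exc:
    caught = exc
print("unfiltered raised:", caught)

# Filtering to dict values, as the suggestion does, skips the flag safely.
safe = tuple(
    a
    for w in workflow_config.values()
    if isinstance(w, dict)
    for a in w.get("expected_artifacts", [])
)
print("filtered:", safe)  # ('a.csv',)
```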

Copilot uses AI. Check for mistakes.
Comment on lines +22 to +31
def test_load_fixture_cases_uses_smoke_configs():
    module = load_module()
    repo_root = Path(__file__).resolve().parents[2]

    fixtures = module.load_fixture_cases(repo_root, ["text", "min-csv"])

    assert [fixture.name for fixture in fixtures] == ["min-csv", "text"]
    assert fixtures[0].queries[0].method == "local"
    assert fixtures[1].queries[-1].method == "basic"
    assert "community_reports.csv" in fixtures[1].expected_artifacts

Copilot AI Mar 21, 2026

Unit tests only exercise text/min-csv fixtures, so the non-dict workflow_config values in the azure fixture (e.g., skip_assert) aren’t covered and the current loader bug would slip through. Add a test case that loads the azure fixture (or a minimal config with skip_assert) to ensure fixture parsing works for all fixture shapes.
