Sync vector store dimensions with embedding output across upstream and dotnet parity by Copilot · Pull Request #14 · sharpninja/graphrag

Copilot · 2026-03-20T13:12:57Z

Description

This PR pulls in the upstream 3502c22 behavior change and applies the matching functionality to the C# codebase. It also audits the dotnet implementation against the recent Python changes and closes the only remaining parity gap in this area: vector-store dimensions drifting from the actual embedding model output.

Related Issues

Addresses the upstream sync and C# parity audit work for recent GraphRAG changes.

Proposed Changes

Upstream merge
- Apply the Python validate_config.py change from upstream so embedding validation now captures the embedding response and realigns vector_store.vector_size and index schema dimensions to the detected embedding width.
- Add the corresponding semversioner patch metadata from upstream.

Dotnet parity
- Add GraphRagConfig.SyncVectorStoreDimensions(...) to update vector-store dimensions when the configured embed-text model returns a different embedding width.
- Add VectorStoreConfig.WithVectorSize(...) and IndexSchema.WithVectorSize(...) so dimension updates propagate consistently through the immutable dotnet config model.

Audit follow-up
- Update the upstream sync note for 3502c222 to document the parity decision and resulting dotnet implementation.
- Confirm there were no other missed Python changes after the last sync beyond this vector-size alignment behavior.
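
The upstream behavior described above (capture the embedding probe response, then realign the vector-store and index-schema dimensions) can be sketched roughly as follows. This is a minimal illustration, not the actual validate_config.py code: the `IndexSchema` and `VectorStoreConfig` classes here are simplified stand-ins, and `detect_embedding_width` and `realign_vector_size` are hypothetical helper names chosen for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IndexSchema:
    # Simplified stand-in for illustration; not the real graphrag class.
    index_name: str
    vector_size: int


@dataclass
class VectorStoreConfig:
    # Simplified stand-in for illustration; not the real graphrag class.
    vector_size: int
    index_schema: dict[str, IndexSchema] = field(default_factory=dict)


def detect_embedding_width(embeddings: list[list[float]]) -> int | None:
    """Return the length of the first embedding vector, or None if there is none."""
    if not embeddings or not embeddings[0]:
        return None
    return len(embeddings[0])


def realign_vector_size(store: VectorStoreConfig, embeddings: list[list[float]]) -> bool:
    """Mutate `store` so every size matches the probe; return True if anything changed."""
    width = detect_embedding_width(embeddings)
    if width is None:
        # Nothing to learn from an empty probe response; leave the config alone.
        return False
    changed = store.vector_size != width
    store.vector_size = width
    for schema in store.index_schema.values():
        if schema.vector_size != width:
            schema.vector_size = width
            changed = True
    return changed


# Hypothetical scenario: the config says 3072, but the model returns 1536-wide vectors.
store = VectorStoreConfig(
    vector_size=3072,
    index_schema={"entity_description": IndexSchema("entity_description", 3072)},
)
probe = [[0.0] * 1536]
changed = realign_vector_size(store, probe)
```

In this sketch the store and its schema both end up at the probed width, and an empty probe response is treated as "no information" rather than as an error. Whether upstream handles the empty case the same way is an assumption here.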

Example of the new dotnet parity surface:

var updatedConfig = config.SyncVectorStoreDimensions(
    embeddingModelId,
    new LlmEmbeddingResponse([[1.0f, 2.0f, 3.0f]])
);

Checklist

I have tested these changes locally.
I have reviewed the code changes.
I have updated the documentation (if necessary).
I have added appropriate unit tests (if applicable).

Additional Notes

The dotnet implementation does not have a direct equivalent of Python's validate_config.py, so the parity change is implemented at the configuration-model layer instead of in CLI validation flow. This keeps the change minimal while preserving the same functional outcome once embedding dimensions are known.
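
To illustrate what "implemented at the configuration-model layer" could look like for an immutable config, here is a small sketch written in Python for brevity rather than C#. The method names mirror the dotnet ones from this PR, but the field names, the tuple-based schema collection, and the rule that a response from a non-matching model ID is ignored are assumptions made for this example and are not taken from the actual C# code.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IndexSchema:
    index_name: str
    vector_size: int

    def with_vector_size(self, vector_size: int) -> IndexSchema:
        # Return a modified copy; the original instance stays untouched.
        return replace(self, vector_size=vector_size)


@dataclass(frozen=True)
class VectorStoreConfig:
    vector_size: int
    index_schema: tuple[IndexSchema, ...] = ()

    def with_vector_size(self, vector_size: int) -> VectorStoreConfig:
        # Propagate the new size to every schema so the two never drift apart.
        schemas = tuple(s.with_vector_size(vector_size) for s in self.index_schema)
        return replace(self, vector_size=vector_size, index_schema=schemas)


@dataclass(frozen=True)
class GraphRagConfig:
    embedding_model_id: str  # hypothetical field name
    vector_store: VectorStoreConfig

    def sync_vector_store_dimensions(
        self, model_id: str, embeddings: list[list[float]]
    ) -> GraphRagConfig:
        # Assumption: responses from other models and empty responses are ignored.
        if model_id != self.embedding_model_id or not embeddings or not embeddings[0]:
            return self
        width = len(embeddings[0])
        already_aligned = self.vector_store.vector_size == width and all(
            s.vector_size == width for s in self.vector_store.index_schema
        )
        if already_aligned:
            return self
        return replace(self, vector_store=self.vector_store.with_vector_size(width))


config = GraphRagConfig(
    embedding_model_id="embed",
    vector_store=VectorStoreConfig(3072, (IndexSchema("entity_description", 3072),)),
)
updated = config.sync_vector_store_dimensions("embed", [[1.0, 2.0, 3.0]])
```

Returning the same instance when nothing needs to change keeps the call cheap to make unconditionally, and callers can compare identities to see whether a resync happened. That is a design choice of this sketch, not a documented guarantee of the dotnet API.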


Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com>

Copilot

Pull request overview

This PR syncs vector-store dimensions to the actual embedding model output, matching upstream behavior in Python and adding equivalent parity support in the dotnet immutable configuration layer.

Changes:

- Python: capture the embedding probe response during config validation and realign vector_store.vector_size and the index schema vector sizes to the detected embedding width.
- Dotnet: add GraphRagConfig.SyncVectorStoreDimensions(...) plus WithVectorSize(...) helpers on VectorStoreConfig and IndexSchema, with unit tests covering the copy/sync behavior.
- Docs/metadata: update the upstream sync note for 3502c222 and add semversioner patch metadata.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

File	Description
packages/graphrag/graphrag/index/validate_config.py	Sync Python vector store and index schema dimensions to detected embedding width during validation.
dotnet/src/GraphRag/Config/Models/GraphRagConfig.cs	Add `SyncVectorStoreDimensions` API to return an updated config when embedding width differs.
dotnet/src/GraphRag.Vectors/VectorStoreConfig.cs	Add `WithVectorSize` helper to update vector size and propagate to schema.
dotnet/src/GraphRag.Vectors/IndexSchema.cs	Add `WithVectorSize` helper to update schema vector dimension immutably.
dotnet/tests/GraphRag.Tests.Unit/Vectors/VectorStoreConfigTests.cs	Unit test `VectorStoreConfig.WithVectorSize` updates copy and synchronizes schema.
dotnet/tests/GraphRag.Tests.Unit/Vectors/IndexSchemaTests.cs	Unit test `IndexSchema.WithVectorSize` returns updated copy without mutating original.
dotnet/tests/GraphRag.Tests.Unit/Config/GraphRagConfigMethodTests.cs	Unit tests for `GraphRagConfig.SyncVectorStoreDimensions` matching/mismatch/empty-response cases.
docs/upstream-sync/upstream-3502c222.md	Document manual review results and dotnet parity approach for this upstream change.
.semversioner/next-release/patch-20260315024056229023.json	Add patch note for the vector-size reconfiguration behavior.

Initial plan

dfd04b3

Copilot AI assigned Copilot and sharpninja Mar 20, 2026

Copilot started work on behalf of sharpninja March 20, 2026 13:13

feat: sync vector store dimensions with embedding output

88024f8

Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Merge upstream changes and audit C# codebase for Python updates~~ Sync vector store dimensions with embedding output across upstream and dotnet parity Mar 20, 2026

Copilot AI requested a review from sharpninja March 20, 2026 13:29

Copilot finished work on behalf of sharpninja March 20, 2026 13:29

sharpninja approved these changes Mar 20, 2026

sharpninja marked this pull request as ready for review March 20, 2026 15:26

Copilot AI review requested due to automatic review settings March 20, 2026 15:26

Copilot started reviewing on behalf of sharpninja March 20, 2026 15:26

sharpninja merged commit 18c7624 into main Mar 20, 2026
16 of 25 checks passed

Copilot AI reviewed Mar 20, 2026

sharpninja merged 2 commits into main from copilot/merge-upstream-and-audit-codebase
