Sync vector store dimensions with embedding output across upstream and dotnet parity #14
Merged
sharpninja merged 2 commits into main on Mar 20, 2026
Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Merge upstream changes and audit C# codebase for Python updates" to "Sync vector store dimensions with embedding output across upstream and dotnet parity" on Mar 20, 2026
sharpninja approved these changes on Mar 20, 2026
Pull request overview
This PR syncs vector-store dimensions to the actual embedding model output, matching upstream behavior in Python and adding equivalent parity support in the dotnet immutable configuration layer.
Changes:
- Python: capture the embedding probe response during config validation and realign `vector_store.vector_size` and the index schema vector sizes to the detected embedding width.
- Dotnet: add `GraphRagConfig.SyncVectorStoreDimensions(...)` plus `WithVectorSize(...)` helpers on `VectorStoreConfig` and `IndexSchema`, with unit tests covering the copy/sync behavior.
- Docs/metadata: update the upstream sync note for `3502c222` and add semversioner patch metadata.
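As a rough illustration of the Python-side behavior described above, the realignment step might look like the sketch below. It is not the upstream `validate_config.py` code: the `IndexSchema` and `VectorStoreConfig` classes here are simplified stand-ins, `sync_vector_size` is a hypothetical helper name, and ignoring an empty probe response is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class IndexSchema:
    """Hypothetical stand-in for one index schema entry."""
    index_name: str
    vector_size: int


@dataclass
class VectorStoreConfig:
    """Hypothetical stand-in for the vector_store section of the config."""
    vector_size: int
    index_schema: dict[str, IndexSchema] = field(default_factory=dict)


def sync_vector_size(store: VectorStoreConfig, probe_embedding: list[float]) -> bool:
    """Realign configured sizes to the width of a probed embedding.

    Returns True if any configured size changed. An empty probe
    response is ignored (assumption: nothing reliable to sync to).
    """
    detected = len(probe_embedding)
    if detected == 0:
        return False

    changed = store.vector_size != detected
    store.vector_size = detected

    # Keep every index schema in step with the store-level size.
    for schema in store.index_schema.values():
        if schema.vector_size != detected:
            schema.vector_size = detected
            changed = True
    return changed
```

In this sketch a store configured for 1536 dimensions that receives a 768-wide probe embedding ends up with 768 at both the store and schema level, and a second call with the same probe reports no change.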
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| packages/graphrag/graphrag/index/validate_config.py | Sync Python vector store and index schema dimensions to detected embedding width during validation. |
| dotnet/src/GraphRag/Config/Models/GraphRagConfig.cs | Add SyncVectorStoreDimensions API to return an updated config when embedding width differs. |
| dotnet/src/GraphRag.Vectors/VectorStoreConfig.cs | Add WithVectorSize helper to update vector size and propagate to schema. |
| dotnet/src/GraphRag.Vectors/IndexSchema.cs | Add WithVectorSize helper to update schema vector dimension immutably. |
| dotnet/tests/GraphRag.Tests.Unit/Vectors/VectorStoreConfigTests.cs | Unit test VectorStoreConfig.WithVectorSize updates copy and synchronizes schema. |
| dotnet/tests/GraphRag.Tests.Unit/Vectors/IndexSchemaTests.cs | Unit test IndexSchema.WithVectorSize returns updated copy without mutating original. |
| dotnet/tests/GraphRag.Tests.Unit/Config/GraphRagConfigMethodTests.cs | Unit tests for GraphRagConfig.SyncVectorStoreDimensions matching/mismatch/empty-response cases. |
| docs/upstream-sync/upstream-3502c222.md | Document manual review results and dotnet parity approach for this upstream change. |
| .semversioner/next-release/patch-20260315024056229023.json | Add patch note for the vector-size reconfiguration behavior. |
Description
This PR pulls in the upstream `3502c22` behavior change and applies the matching functionality to the C# codebase. It also audits the dotnet implementation against the recent Python changes and closes the only remaining parity gap in this area: vector-store dimensions drifting from the actual embedding model output.
Related Issues
Addresses the upstream sync and C# parity audit work for recent GraphRAG changes.
Proposed Changes
Upstream merge
- Pulled in the `validate_config.py` change from upstream so embedding validation now captures the embedding response and realigns `vector_store.vector_size` and index schema dimensions to the detected embedding width.
Dotnet parity
- Added `GraphRagConfig.SyncVectorStoreDimensions(...)` to update vector-store dimensions when the configured embed-text model returns a different embedding width.
- Added `VectorStoreConfig.WithVectorSize(...)` and `IndexSchema.WithVectorSize(...)` so dimension updates propagate consistently through the immutable dotnet config model.
Audit follow-up
- Updated the upstream sync note for `3502c222` to document the parity decision and resulting dotnet implementation.
Example of the new dotnet parity surface:
Checklist
Additional Notes
The dotnet implementation does not have a direct equivalent of Python's `validate_config.py`, so the parity change is implemented at the configuration-model layer instead of in the CLI validation flow. This keeps the change minimal while preserving the same functional outcome once embedding dimensions are known.
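The copy-and-sync shape of a configuration-model approach can be sketched with frozen dataclasses. This is a Python analogue for illustration only, not the C# API: the snake_case method names mirror the dotnet helpers named in this PR, but the parameter (a raw embedding sequence) and the rule of returning the same instance on a match or an empty response are assumptions.

```python
from dataclasses import dataclass, field, replace
from typing import Mapping, Sequence


@dataclass(frozen=True)
class IndexSchema:
    """Hypothetical immutable schema entry."""
    index_name: str
    vector_size: int

    def with_vector_size(self, vector_size: int) -> "IndexSchema":
        # Return an updated copy; the original instance is untouched.
        return replace(self, vector_size=vector_size)


@dataclass(frozen=True)
class VectorStoreConfig:
    """Hypothetical immutable vector store config."""
    vector_size: int
    index_schema: Mapping[str, IndexSchema] = field(default_factory=dict)

    def with_vector_size(self, vector_size: int) -> "VectorStoreConfig":
        # Propagate the new size to every schema so the two never drift.
        schemas = {
            name: schema.with_vector_size(vector_size)
            for name, schema in self.index_schema.items()
        }
        return replace(self, vector_size=vector_size, index_schema=schemas)


@dataclass(frozen=True)
class GraphRagConfig:
    """Hypothetical root config holding only the vector store section."""
    vector_store: VectorStoreConfig

    def sync_vector_store_dimensions(self, embedding: Sequence[float]) -> "GraphRagConfig":
        width = len(embedding)
        # Assumption: an empty response or a matching width means no update.
        if width == 0 or width == self.vector_store.vector_size:
            return self
        return replace(self, vector_store=self.vector_store.with_vector_size(width))
```

Because every helper returns a new object, callers that still hold the old configuration keep seeing the old dimensions, which is the trade-off of doing the sync in an immutable model rather than mutating in place during validation.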