Skip to content

Sync vector store dimensions with embedding output across upstream and dotnet parity#14

Merged
sharpninja merged 2 commits into main from copilot/merge-upstream-and-audit-codebase (Mar 20, 2026)

Copilot AI commented Mar 20, 2026

Description

This PR pulls in the behavior change from upstream commit 3502c22 and ports the equivalent behavior to the C# codebase. It also audits the dotnet implementation against recent Python changes and closes the only remaining parity gap in this area: vector-store dimensions drifting from the actual embedding model output.

Related Issues

Addresses the upstream sync and C# parity audit work for recent GraphRAG changes.

Proposed Changes

  • Upstream merge

    • Apply the Python validate_config.py change from upstream so embedding validation now captures the embedding response and realigns vector_store.vector_size and index schema dimensions to the detected embedding width.
    • Add the corresponding semversioner patch metadata from upstream.
  • Dotnet parity

    • Add GraphRagConfig.SyncVectorStoreDimensions(...), which returns an updated config with corrected vector-store dimensions when the configured embed-text model returns a different embedding width.
    • Add VectorStoreConfig.WithVectorSize(...) and IndexSchema.WithVectorSize(...) so dimension updates propagate consistently through the immutable dotnet config model.
  • Audit follow-up

    • Update the upstream sync note for 3502c222 to document the parity decision and resulting dotnet implementation.
    • Confirm that, apart from this vector-size alignment, no other Python changes since the last sync were missed.
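The Python-side realignment described above can be sketched as follows. This is a minimal illustration, not the upstream validate_config.py code: the IndexSchema and VectorStoreConfig dataclasses, their fields, and the realign_vector_store helper are hypothetical simplifications, and the embedding probe response is modeled as a plain list of float vectors.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical, simplified stand-ins for the real GraphRAG config models;
# field names follow the PR description, not the actual classes.
@dataclass
class IndexSchema:
    name: str
    vector_size: int


@dataclass
class VectorStoreConfig:
    vector_size: int
    index_schema: dict[str, IndexSchema] = field(default_factory=dict)


def detect_embedding_width(embeddings: list[list[float]]) -> int | None:
    """Return the width of the first probe vector, or None if the probe returned nothing."""
    if not embeddings or not embeddings[0]:
        return None
    return len(embeddings[0])


def realign_vector_store(store: VectorStoreConfig, embeddings: list[list[float]]) -> bool:
    """Update store and schema sizes in place to match the probe; return True if anything changed."""
    width = detect_embedding_width(embeddings)
    if width is None:
        # Nothing to align against: leave the configured sizes alone.
        return False
    changed = store.vector_size != width
    store.vector_size = width
    for schema in store.index_schema.values():
        if schema.vector_size != width:
            schema.vector_size = width
            changed = True
    return changed
```

For example, a store configured for 1536 dimensions whose probe returns a single 3-element vector ends up with vector_size == 3 on both the store and every schema.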

Example of the new dotnet parity surface:

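// Returns an updated copy of config when the detected width (3 here) differs from the configured vector size.
var updatedConfig = config.SyncVectorStoreDimensions(
    embeddingModelId,
    new LlmEmbeddingResponse([[1.0f, 2.0f, 3.0f]])
);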

Checklist

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have updated the documentation (if necessary).
  • I have added appropriate unit tests (if applicable).

Additional Notes

The dotnet implementation does not have a direct equivalent of Python's validate_config.py, so the parity change is implemented at the configuration-model layer instead of in CLI validation flow. This keeps the change minimal while preserving the same functional outcome once embedding dimensions are known.
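To illustrate the configuration-model approach, here is a Python sketch of the same copy-on-write semantics. It assumes, based on the matching, mismatch, and empty-response test cases listed for GraphRagConfig.SyncVectorStoreDimensions, that a matching or empty probe leaves the config untouched while a mismatch yields an updated copy whose schemas carry the new size. The frozen dataclasses and the with_vector_size / sync_vector_store_dimensions names are hypothetical Python analogues of the dotnet types, not code from either codebase.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


# Hypothetical immutable stand-ins for the dotnet IndexSchema / VectorStoreConfig.
@dataclass(frozen=True)
class IndexSchema:
    name: str
    vector_size: int

    def with_vector_size(self, size: int) -> IndexSchema:
        # Return a copy; the original schema is never mutated.
        return self if size == self.vector_size else replace(self, vector_size=size)


@dataclass(frozen=True)
class VectorStoreConfig:
    vector_size: int
    schemas: tuple[IndexSchema, ...] = ()

    def with_vector_size(self, size: int) -> VectorStoreConfig:
        # Propagate the new size to every schema so store and schemas never disagree.
        return replace(
            self,
            vector_size=size,
            schemas=tuple(s.with_vector_size(size) for s in self.schemas),
        )


def sync_vector_store_dimensions(
    store: VectorStoreConfig, embeddings: list[list[float]]
) -> VectorStoreConfig:
    """Return `store` itself if the probe is empty or already matches, else an updated copy."""
    if not embeddings or not embeddings[0]:
        return store
    width = len(embeddings[0])
    if store.vector_size == width and all(s.vector_size == width for s in store.schemas):
        return store
    return store.with_vector_size(width)
```

Here dataclasses.replace on frozen dataclasses stands in for whatever copy mechanism the dotnet types use; returning the original instance on a no-op lets a caller detect "nothing changed" with an identity check.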



Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Merge upstream changes and audit C# codebase for Python updates" to "Sync vector store dimensions with embedding output across upstream and dotnet parity" (Mar 20, 2026)
Copilot AI requested a review from sharpninja March 20, 2026 13:29
sharpninja marked this pull request as ready for review March 20, 2026 15:26
Copilot AI review requested due to automatic review settings March 20, 2026 15:26
sharpninja merged commit 18c7624 into main Mar 20, 2026
16 of 25 checks passed

Copilot AI left a comment


Pull request overview

This PR syncs vector-store dimensions to the actual embedding model output, matching upstream behavior in Python and adding equivalent parity support in the dotnet immutable configuration layer.

Changes:

  • Python: capture the embedding probe response during config validation and realign vector_store.vector_size and the index schema vector sizes to the detected embedding width.
  • Dotnet: add GraphRagConfig.SyncVectorStoreDimensions(...) plus WithVectorSize(...) helpers on VectorStoreConfig and IndexSchema, with unit tests covering the copy/sync behavior.
  • Docs/metadata: update the upstream sync note for 3502c222 and add semversioner patch metadata.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| packages/graphrag/graphrag/index/validate_config.py | Sync Python vector store and index schema dimensions to detected embedding width during validation. |
| dotnet/src/GraphRag/Config/Models/GraphRagConfig.cs | Add SyncVectorStoreDimensions API to return an updated config when embedding width differs. |
| dotnet/src/GraphRag.Vectors/VectorStoreConfig.cs | Add WithVectorSize helper to update vector size and propagate to schema. |
| dotnet/src/GraphRag.Vectors/IndexSchema.cs | Add WithVectorSize helper to update schema vector dimension immutably. |
| dotnet/tests/GraphRag.Tests.Unit/Vectors/VectorStoreConfigTests.cs | Unit test VectorStoreConfig.WithVectorSize updates copy and synchronizes schema. |
| dotnet/tests/GraphRag.Tests.Unit/Vectors/IndexSchemaTests.cs | Unit test IndexSchema.WithVectorSize returns updated copy without mutating original. |
| dotnet/tests/GraphRag.Tests.Unit/Config/GraphRagConfigMethodTests.cs | Unit tests for GraphRagConfig.SyncVectorStoreDimensions matching/mismatch/empty-response cases. |
| docs/upstream-sync/upstream-3502c222.md | Document manual review results and dotnet parity approach for this upstream change. |
| .semversioner/next-release/patch-20260315024056229023.json | Add patch note for the vector-size reconfiguration behavior. |


Labels

None yet

Projects

None yet


3 participants