feat(dbapi): use inline begin to eliminate BeginTransaction RPC#1502
feat(dbapi): use inline begin to eliminate BeginTransaction RPC#1502waiho-gumloop wants to merge 7 commits intogoogleapis:mainfrom
Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
Summary of ChangesHello @waiho-gumloop, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request optimizes transaction handling in the Spanner DBAPI by leveraging the existing inline begin mechanism. It removes an unnecessary explicit Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a valuable optimization by removing the explicit BeginTransaction RPC call in transaction_checkout. Instead, it leverages the existing inline begin functionality, where the transaction is initiated with the first SQL execution. This change effectively reduces latency by saving one network round-trip per transaction. The modification is well-supported by a new unit test ensuring begin() is not called, and the detailed description provides a thorough safety analysis, confirming the change is sound.
f176484 to
880081e
Compare
…ests Add test_dbapi_inline_begin.py with 7 mockserver tests that verify: - Read-write DBAPI transactions send no BeginTransactionRequest - First ExecuteSqlRequest uses TransactionSelector(begin=...) - Read + write + commit request sequence is correct - DML-only transactions use inline begin - Read-only transactions still use explicit BeginTransaction - Transaction retry after abort works with inline begin Update existing mockserver tests that expected BeginTransactionRequest for read-write DBAPI transactions: - test_tags.py: Remove BeginTransactionRequest from expected sequences for all read-write tag tests, adjust tag index offsets - test_dbapi_isolation_level.py: Verify isolation level on the inline begin field of ExecuteSqlRequest instead of BeginTransactionRequest Made-with: Cursor
a98de4c to
48d4df8
Compare
Made-with: Cursor
- Consolidate 3 redundant single-read tests into one comprehensive test that verifies: no BeginTransactionRequest, inline begin on first ExecuteSqlRequest, correct request sequence, and correct query results - Rename test_second_statement_uses_transaction_id to test_read_then_write_full_lifecycle with additional assertions: CommitRequest.transaction_id matches the transaction ID from inline begin - Strengthen test_rollback to verify RollbackRequest is sent with a non-empty transaction_id (was only checking no BeginTransactionRequest) - Add CommitRequest assertions to abort retry test: both the aborted and successful commits carry valid transaction IDs - Assert cursor.fetchall() return values in read tests to verify inline begin doesn't corrupt result set metadata - Add RollbackRequest import Made-with: Cursor
|
📢 Migration Notice: 📢 This library is moving to the google-cloud-python monorepo soon. We kept this PR open due to recent activity. We would like to finalize this PR so it can be merged if it is critical. If we don't hear from the PR author, we will close this PR in the next few days. The PR can then be re-opened in the monorepo once the migration is complete and work can continue there. |
|
Do not merge yet: This needs a bit of work, as it does not handle the situation where the first statement in a transaction fails, and thereby also fails to actually start a transaction and return a transaction ID. |
Add retries if the first statement in a read/write transaction fails, as the statement then does not return a transaction ID. In order to ensure that we get a transaction ID, we first execute an explicit BeginTransaction RPC and then retry the original statement. We return the response of the retry to the application, regardless whether the retry fails or succeeds. The reason that we do a retry with a BeginTransaction AND include the first statement, is to guarantee transaction consistency. If we were to leave the first statement out of the transaction, then it will not be guaranteed that the error condition that cause the failure in the first place is actually still true when the transaction commits. This would break the transaction guarantees. Example (pseudo-code): ```sql -- The following statement fails with ALREADY_EXISTS insert into some_table (id, value) values (1, 'One'); -- Execute an explicit BeginTransaction RPC. begin; -- Retry the initial statement. This ensures that -- whatever the response is, this response will be -- valid for the entire transaction. insert into some_table (id, value) values (1, 'One'); -- This is guaranteed to return a row. select * from some_table where id=1; -- ... execute the rest of the transaction ... commit; ``` If we had not included the initial insert statement in the retried transaction, then there is no guarantee that the select statement would actually return any rows, as other transactions could in theory have deleted it in the meantime.
423d205 to
ead03f3
Compare
ead03f3 to
d65c25c
Compare
|
A couple of additional notes on this implementation: First Statement FailsThe implementation now also handles the case where the first statement in a read/write transaction fails. From the commit message: Add retries if the first statement in a read/write transaction fails, as the The reason that we do a retry with a BeginTransaction AND include the Example (pseudo-code): -- The following statement fails with ALREADY_EXISTS
insert into some_table (id, value) values (1, 'One');
-- Execute an explicit BeginTransaction RPC.
begin;
-- Retry the initial statement. This ensures that
-- whatever the response is, this response will be
-- valid for the entire transaction.
insert into some_table (id, value) values (1, 'One');
-- This is guaranteed to return a row.
select * from some_table where id=1;
-- ... execute the rest of the transaction ...
commit;If we had not included the initial insert statement in the retried transaction, Performance GainThe initial description mentions a performance gain of '~16ms measured on the emulator'. The actual gain on real Spanner is likely to be somewhere between 3ms to 5ms, assuming that the applications runs in the same region as the Spanner instance it is talking to. |
Fixes googleapis/google-cloud-python#15871
Summary
Connection.transaction_checkout()currently callsTransaction.begin()explicitly before the first query, which sends a standaloneBeginTransactiongRPC RPC. This is unnecessary because theTransactionclass already supports inline begin — piggybackingBeginTransactiononto the firstExecuteSql/ExecuteBatchDmlrequest viaTransactionSelector(begin=...).This PR removes the explicit
begin()call, letting the existing inline begin logic inexecute_sql(),execute_update(), andbatch_update()handle transaction creation. This saves one gRPC round-trip per transaction (~16ms measured on the emulator).What changed
google/cloud/spanner_dbapi/connection.py—transaction_checkout():self._transaction.begin()on L413_transaction_id=Nonetests/unit/spanner_dbapi/test_connection.py:test_transaction_checkout_does_not_call_beginto assertbegin()is not calledtests/mockserver_tests/test_dbapi_inline_begin.py(new):BeginTransactionRequestsent, firstExecuteSqlRequestusesTransactionSelector(begin=...), transaction ID reuse on second statement, rollback, read-only unaffected, retry after aborttests/mockserver_tests/test_tags.py:BeginTransactionRequestfrom expected RPC sequences, adjusted tag index offsetstests/mockserver_tests/test_dbapi_isolation_level.py:ExecuteSqlRequest.transaction.begin.isolation_levelinstead ofBeginTransactionRequest.options.isolation_levelWhy this is safe
The inline begin code path already exists and is battle-tested —
Session.run_in_transaction()creates aTransactionwithout callingbegin()and relies on the same inline begin logic (session.py L566).Specific safety analysis:
transaction_checkout()callers always execute SQL immediately: It's only called fromrun_statement()→execute_sql()andbatch_dml_executor→batch_update(). Both set_transaction_idvia inline begin before any commit/rollback path.execute_sql/execute_update/batch_updatehandle_transaction_id is None: They acquire a lock, use_make_txn_selector()which returnsTransactionSelector(begin=...), and store the returned_transaction_id(transaction.py L612-L623).rollback()handles_transaction_id is None: Skips the RPC — correct when no server-side transaction exists (transaction.py L163).commit()handles_transaction_id is None: Falls back to_begin_mutations_only_transaction()for mutation-only transactions (transaction.py L263-L267).Retry mechanism is compatible:
_set_connection_for_retry()resets_spanner_transaction_started=False, so replayed statements go throughtransaction_checkout()again, create a freshTransaction, and use inline begin.PEP 249 conformance
This change is fully conformant with PEP 249 (DB-API 2.0). The spec does not define a
begin()method — transactions are implicit. The PEP author clarified on the DB-SIG mailing list that "transactions start implicitly after you connect and after you call.commit()or.rollback()", and the mechanism by which the driver starts the server-side transaction is an implementation detail. Deferring the server-side begin to the first SQL execution (as psycopg2 and other mature DB-API drivers do) is the standard approach. The observable transactional semantics — atomicity betweencommit()/rollback()calls — are unchanged.Performance impact
Before (4 RPCs per read-write transaction):
After (3 RPCs per read-write transaction):
Measured ~16ms savings per transaction on the Spanner emulator.
Context
This optimization was identified while profiling SQLAlchemy/DBAPI performance against Spanner. The DBAPI was created ~2 years before inline begin support was added to the Python client library (inline begin landed in PR googleapis/python-spanner#740, Dec 2022). The
run_in_transactionpath was updated to use inline begin, buttransaction_checkoutwas not.Test plan
tests/unit/spanner_dbapi/)begin()is not called bytransaction_checkout()tests/mockserver_tests/test_dbapi_inline_begin.py)test_tags.py,test_dbapi_isolation_level.py)tests/system/test_dbapi.py)