181
/* ========================================================================
182
======================= TRANSACTIONS ===================================*/
185
Transaction handling in the server
186
==================================
188
In each client connection, MySQL maintains two transactional
190
- a statement transaction,
191
- a standard, also called normal transaction.
195
"Statement transaction" is a non-standard term that comes
196
from the times when MySQL supported BerkeleyDB storage engine.
198
First of all, it should be said that in BerkeleyDB auto-commit
199
mode auto-commits operations that are atomic to the storage
200
engine itself, such as a write of a record, and are too
201
high-granular to be atomic from the application perspective
202
(MySQL). One SQL statement could involve many BerkeleyDB
203
auto-committed operations and thus BerkeleyDB auto-commit was of
206
Secondly, instead of SQL standard savepoints, BerkeleyDB
207
provided the concept of "nested transactions". In a nutshell,
208
transactions could be arbitrarily nested, but when the parent
209
transaction was committed or aborted, all its child (nested)
210
transactions were handled committed or aborted as well.
211
Commit of a nested transaction, in turn, made its changes
212
visible, but not durable: it destroyed the nested transaction,
213
all its changes would become available to the parent and
214
currently active nested transactions of this parent.
216
So the mechanism of nested transactions was employed to
217
provide "all or nothing" guarantee of SQL statements
218
required by the standard.
219
A nested transaction would be created at start of each SQL
220
statement, and destroyed (committed or aborted) at statement
221
end. Such nested transaction was internally referred to as
222
a "statement transaction" and gave birth to the term.
224
<Historical note ends>
226
Since then a statement transaction is started for each statement
227
that accesses transactional tables or uses the binary log. If
228
the statement succeeds, the statement transaction is committed.
229
If the statement fails, the transaction is rolled back. Commits
230
of statement transactions are not durable -- each such
231
transaction is nested in the normal transaction, and if the
232
normal transaction is rolled back, the effects of all enclosed
233
statement transactions are undone as well. Technically,
234
a statement transaction can be viewed as a savepoint which is
235
maintained automatically in order to make effects of one
238
The normal transaction is started by the user and is ended
239
usually upon a user request as well. The normal transaction
240
encloses transactions of all statements issued between
241
its beginning and its end.
242
In autocommit mode, the normal transaction is equivalent
243
to the statement transaction.
245
Since MySQL supports PSEA (pluggable storage engine
246
architecture), more than one transactional engine can be
247
active at a time. Hence transactions, from the server
248
point of view, are always distributed. In particular,
249
transactional state is maintained independently for each
250
engine. In order to commit a transaction the two phase
251
commit protocol is employed.
253
Not all statements are executed in context of a transaction.
254
Administrative and status information statements do not modify
255
engine data, and thus do not start a statement transaction and
256
also have no effect on the normal transaction. Examples of such
257
statements are SHOW STATUS and RESET SLAVE.
259
Similarly DDL statements are not transactional,
260
and therefore a transaction is [almost] never started for a DDL
261
statement. The difference between a DDL statement and a purely
262
administrative statement though is that a DDL statement always
263
commits the current transaction before proceeding, if there is
266
At last, SQL statements that work with non-transactional
267
engines also have no effect on the transaction state of the
268
connection. Even though they are written to the binary log,
269
and the binary log is, overall, transactional, the writes
270
are done in "write-through" mode, directly to the binlog
271
file, followed with a OS cache sync, in other words,
272
bypassing the binlog undo log (translog).
273
They do not commit the current normal transaction.
274
A failure of a statement that uses non-transactional tables
275
would cause a rollback of the statement transaction, but
276
in case there no non-transactional tables are used,
277
no statement transaction is started.
282
The server stores its transaction-related data in
283
session->transaction. This structure has two members of type
284
Session_TRANS. These members correspond to the statement and
285
normal transactions respectively:
287
- session->transaction.stmt contains a list of engines
288
that are participating in the given statement
289
- session->transaction.all contains a list of engines that
290
have participated in any of the statement transactions started
291
within the context of the normal transaction.
292
Each element of the list contains a pointer to the storage
293
engine, engine-specific transactional data, and engine-specific
296
In autocommit mode session->transaction.all is empty.
297
Instead, data of session->transaction.stmt is
298
used to commit/rollback the normal transaction.
300
The list of registered engines has a few important properties:
301
- no engine is registered in the list twice
302
- engines are present in the list a reverse temporal order --
303
new participants are always added to the beginning of the list.
305
Transaction life cycle
306
----------------------
308
When a new connection is established, session->transaction
309
members are initialized to an empty state.
310
If a statement uses any tables, all affected engines
311
are registered in the statement engine list. In
312
non-autocommit mode, the same engines are registered in
313
the normal transaction list.
314
At the end of the statement, the server issues a commit
315
or a roll back for all engines in the statement list.
316
At this point transaction flags of an engine, if any, are
317
propagated from the statement list to the list of the normal
319
When commit/rollback is finished, the statement list is
320
cleared. It will be filled in again by the next statement,
321
and emptied again at the next statement's end.
323
The normal transaction is committed in a similar way
324
(by going over all engines in session->transaction.all list)
325
but at different times:
326
- upon COMMIT SQL statement is issued by the user
327
- implicitly, by the server, at the beginning of a DDL statement
328
or SET AUTOCOMMIT={0|1} statement.
330
The normal transaction can be rolled back as well:
331
- if the user has requested so, by issuing ROLLBACK SQL
333
- if one of the storage engines requested a rollback
334
by setting session->transaction_rollback_request. This may
335
happen in case, e.g., when the transaction in the engine was
336
chosen a victim of the internal deadlock resolution algorithm
337
and rolled back internally. When such a situation happens, there
338
is little the server can do and the only option is to rollback
339
transactions in all other participating engines. In this case
340
the rollback is accompanied by an error sent to the user.
342
As follows from the use cases above, the normal transaction
343
is never committed when there is an outstanding statement
344
transaction. In most cases there is no conflict, since
345
commits of the normal transaction are issued by a stand-alone
346
administrative or DDL statement, thus no outstanding statement
347
transaction of the previous statement exists. Besides,
348
all statements that manipulate with the normal transaction
349
are prohibited in stored functions and triggers, therefore
350
no conflicting situation can occur in a sub-statement either.
351
The remaining rare cases when the server explicitly has
352
to commit the statement transaction prior to committing the normal
353
one cover error-handling scenarios (see for example
356
When committing a statement or a normal transaction, the server
357
either uses the two-phase commit protocol, or issues a commit
358
in each engine independently. The two-phase commit protocol
360
- all participating engines support two-phase commit (provide
361
plugin::StorageEngine::prepare PSEA API call) and
362
- transactions in at least two engines modify data (i.e. are
365
Note that the two phase commit is used for
366
statement transactions, even though they are not durable anyway.
367
This is done to ensure logical consistency of data in a multiple-
369
For example, imagine that some day MySQL supports unique
370
constraint checks deferred till the end of statement. In such
371
case a commit in one of the engines may yield ER_DUP_KEY,
372
and MySQL should be able to gracefully abort statement
373
transactions of other participants.
375
After the normal transaction has been committed,
376
session->transaction.all list is cleared.
378
When a connection is closed, the current normal transaction, if
381
Roles and responsibilities
382
--------------------------
384
The server has no way to know that an engine participates in
385
the statement and a transaction has been started
386
in it unless the engine says so. Thus, in order to be
387
a part of a transaction, the engine must "register" itself.
388
This is done by invoking trans_register_ha() server call.
389
Normally the engine registers itself whenever Cursor::external_lock()
390
is called. trans_register_ha() can be invoked many times: if
391
an engine is already registered, the call does nothing.
392
In case autocommit is not set, the engine must register itself
393
twice -- both in the statement list and in the normal transaction
395
In which list to register is a parameter of trans_register_ha().
397
Note, that although the registration interface in itself is
398
fairly clear, the current usage practice often leads to undesired
399
effects. E.g. since a call to trans_register_ha() in most engines
400
is embedded into implementation of Cursor::external_lock(), some
401
DDL statements start a transaction (at least from the server
402
point of view) even though they are not expected to. E.g.
403
CREATE TABLE does not start a transaction, since
404
Cursor::external_lock() is never called during CREATE TABLE. But
405
CREATE TABLE ... SELECT does, since Cursor::external_lock() is
406
called for the table that is being selected from. This has no
407
practical effects currently, but must be kept in mind
410
Once an engine is registered, the server will do the rest
413
During statement execution, whenever any of data-modifying
414
PSEA API methods is used, e.g. Cursor::write_row() or
415
Cursor::update_row(), the read-write flag is raised in the
416
statement transaction for the involved engine.
417
Currently All PSEA calls are "traced", and the data can not be
418
changed in a way other than issuing a PSEA call. Important:
419
unless this invariant is preserved the server will not know that
420
a transaction in a given engine is read-write and will not
421
involve the two-phase commit protocol!
423
At the end of a statement, server call
424
ha_autocommit_or_rollback() is invoked. This call in turn
425
invokes plugin::StorageEngine::prepare() for every involved engine.
426
Prepare is followed by a call to plugin::StorageEngine::commit_one_phase()
427
If a one-phase commit will suffice, plugin::StorageEngine::prepare() is not
428
invoked and the server only calls plugin::StorageEngine::commit_one_phase().
429
At statement commit, the statement-related read-write engine
430
flag is propagated to the corresponding flag in the normal
431
transaction. When the commit is complete, the list of registered
434
Rollback is handled in a similar fashion.
436
Additional notes on DDL and the normal transaction.
437
---------------------------------------------------
439
DDLs and operations with non-transactional engines
440
do not "register" in session->transaction lists, and thus do not
441
modify the transaction state. Besides, each DDL in
442
MySQL is prefixed with an implicit normal transaction commit
443
(a call to Session::endActiveTransaction()), and thus leaves nothing
445
However, as it has been pointed out with CREATE TABLE .. SELECT,
446
some DDL statements can start a *new* transaction.
448
Behaviour of the server in this case is currently badly
450
DDL statements use a form of "semantic" logging
451
to maintain atomicity: if CREATE TABLE .. SELECT failed,
452
the newly created table is deleted.
453
In addition, some DDL statements issue interim transaction
454
commits: e.g. ALTER Table issues a commit after data is copied
455
from the original table to the internal temporary table. Other
456
statements, e.g. CREATE TABLE ... SELECT do not always commit
458
And finally there is a group of DDL statements such as
459
RENAME/DROP Table that doesn't start a new transaction
462
This diversity makes it hard to say what will happen if
463
by chance a stored function is invoked during a DDL --
464
whether any modifications it makes will be committed or not
465
is not clear. Fortunately, SQL grammar of few DDLs allows
466
invocation of a stored function.
468
A consistent behaviour is perhaps to always commit the normal
469
transaction after all DDLs, just like the statement transaction
470
is always committed at the end of all statements.
474
Register a storage engine for a transaction.
476
Every storage engine MUST call this function when it starts
477
a transaction or a statement (that is it must be called both for the
478
"beginning of transaction" and "beginning of statement").
479
Only storage engines registered for the transaction/statement
480
will know when to commit/rollback it.
483
trans_register_ha is idempotent - storage engine may register many
484
times per transaction.
487
void trans_register_ha(Session *session, bool all, plugin::StorageEngine *engine)
489
Session_TRANS *trans;
490
Ha_trx_info *ha_info;
494
trans= &session->transaction.all;
495
session->server_status|= SERVER_STATUS_IN_TRANS;
498
trans= &session->transaction.stmt;
500
ha_info= session->getEngineInfo(engine, all ? 1 : 0);
502
if (ha_info->is_started())
503
return; /* already registered, return */
505
ha_info->register_ha(trans, engine);
507
trans->no_2pc|= not engine->has_2pc();
508
if (session->transaction.xid_state.xid.is_null())
509
session->transaction.xid_state.xid.set(session->query_id);
513
Check if we can skip the two-phase commit.
515
A helper function to evaluate if two-phase commit is mandatory.
516
As a side effect, propagates the read-only/read-write flags
517
of the statement transaction to its enclosing normal transaction.
519
@retval true we must run a two-phase commit. Returned
520
if we have at least two engines with read-write changes.
521
@retval false Don't need two-phase commit. Even if we have two
522
transactional engines, we can run two independent
523
commits if changes in one of the engines are read-only.
528
ha_check_and_coalesce_trx_read_only(Session *session, Ha_trx_info *ha_list,
531
/* The number of storage engines that have actual changes. */
532
unsigned rw_ha_count= 0;
533
Ha_trx_info *ha_info;
535
for (ha_info= ha_list; ha_info; ha_info= ha_info->next())
537
if (ha_info->is_trx_read_write())
542
Ha_trx_info *ha_info_all= session->getEngineInfo(ha_info->engine(), 1);
543
assert(ha_info != ha_info_all);
545
Merge read-only/read-write information about statement
546
transaction to its enclosing normal transaction. Do this
547
only if in a real transaction -- that is, if we know
548
that ha_info_all is registered in session->transaction.all.
549
Since otherwise we only clutter the normal transaction flags.
551
if (ha_info_all->is_started()) /* false if autocommit. */
552
ha_info_all->coalesce_trx_with(ha_info);
554
else if (rw_ha_count > 1)
557
It is a normal transaction, so we don't need to merge read/write
558
information up, and the need for two-phase commit has been
559
already established. Break the loop prematurely.
564
return rw_ha_count > 1;
572
1 transaction was rolled back
574
2 error during commit, data may be inconsistent
577
Since we don't support nested statement transactions in 5.0,
578
we can't commit or rollback stmt transactions while we are inside
579
stored functions or triggers. So we simply do nothing now.
580
TODO: This should be fixed in later ( >= 5.1) releases.
582
int ha_commit_trans(Session *session, bool all)
584
int error= 0, cookie= 0;
586
'all' means that this is either an explicit commit issued by
587
user, or an implicit commit issued by a DDL.
589
Session_TRANS *trans= all ? &session->transaction.all : &session->transaction.stmt;
590
bool is_real_trans= all || session->transaction.all.ha_list == 0;
591
Ha_trx_info *ha_info= trans->ha_list;
594
We must not commit the normal transaction if a statement
595
transaction is pending. Otherwise statement transaction
596
flags will not get propagated to its normal transaction's
599
assert(session->transaction.stmt.ha_list == NULL ||
600
trans == &session->transaction.stmt);
606
if (is_real_trans && wait_if_global_read_lock(session, 0, 0))
608
ha_rollback_trans(session, all);
612
must_2pc= ha_check_and_coalesce_trx_read_only(session, ha_info, all);
614
if (!trans->no_2pc && must_2pc)
616
for (; ha_info && !error; ha_info= ha_info->next())
619
plugin::StorageEngine *engine= ha_info->engine();
621
Do not call two-phase commit if this particular
622
transaction is read-only. This allows for simpler
623
implementation in engines that are always read-only.
625
if (! ha_info->is_trx_read_write())
628
Sic: we know that prepare() is not NULL since otherwise
629
trans->no_2pc would have been set.
631
if ((err= engine->prepare(session, all)))
633
my_error(ER_ERROR_DURING_COMMIT, MYF(0), err);
636
status_var_increment(session->status_var.ha_prepare_count);
640
ha_rollback_trans(session, all);
645
error=ha_commit_one_phase(session, all) ? (cookie ? 2 : 1) : 0;
648
start_waiting_global_read_lock(session);
655
This function does not care about global read lock. A caller should.
657
int ha_commit_one_phase(Session *session, bool all)
660
Session_TRANS *trans=all ? &session->transaction.all : &session->transaction.stmt;
661
bool is_real_trans=all || session->transaction.all.ha_list == 0;
662
Ha_trx_info *ha_info= trans->ha_list, *ha_info_next;
665
for (; ha_info; ha_info= ha_info_next)
668
plugin::StorageEngine *engine= ha_info->engine();
669
if ((err= engine->commit(session, all)))
671
my_error(ER_ERROR_DURING_COMMIT, MYF(0), err);
674
status_var_increment(session->status_var.ha_commit_count);
675
ha_info_next= ha_info->next();
676
ha_info->reset(); /* keep it conveniently zero-filled */
681
session->transaction.xid_state.xid.null();
684
session->variables.tx_isolation=session->session_tx_isolation;
685
session->transaction.cleanup();
693
* We commit the normal transaction by finalizing the transaction message
694
* and propogating the message to all registered replicators.
696
ReplicationServices &replication_services= ReplicationServices::singleton();
697
replication_services.commitTransaction(session);
704
int ha_rollback_trans(Session *session, bool all)
707
Session_TRANS *trans=all ? &session->transaction.all : &session->transaction.stmt;
708
Ha_trx_info *ha_info= trans->ha_list, *ha_info_next;
709
bool is_real_trans=all || session->transaction.all.ha_list == 0;
712
We must not rollback the normal transaction if a statement
713
transaction is pending.
715
assert(session->transaction.stmt.ha_list == NULL ||
716
trans == &session->transaction.stmt);
720
for (; ha_info; ha_info= ha_info_next)
723
plugin::StorageEngine *engine= ha_info->engine();
724
if ((err= engine->rollback(session, all)))
726
my_error(ER_ERROR_DURING_ROLLBACK, MYF(0), err);
729
status_var_increment(session->status_var.ha_rollback_count);
730
ha_info_next= ha_info->next();
731
ha_info->reset(); /* keep it conveniently zero-filled */
737
* We need to signal the ROLLBACK to ReplicationServices here
738
* BEFORE we set the transaction ID to NULL. This is because
739
* if a bulk segment was sent to replicators, we need to send
740
* a rollback statement with the corresponding transaction ID
743
ReplicationServices &replication_services= ReplicationServices::singleton();
744
replication_services.rollbackTransaction(session);
747
session->transaction.xid_state.xid.null();
750
session->variables.tx_isolation=session->session_tx_isolation;
751
session->transaction.cleanup();
755
session->transaction_rollback_request= false;
758
If a non-transactional table was updated, warn; don't warn if this is a
759
slave thread (because when a slave thread executes a ROLLBACK, it has
760
been read from the binary log, so it's 100% sure and normal to produce
761
error ER_WARNING_NOT_COMPLETE_ROLLBACK. If we sent the warning to the
762
slave SQL thread, it would not stop the thread but just be printed in
763
the error log; but we don't want users to wonder why they have this
764
message in the error log, so we don't send it.
766
if (is_real_trans && session->transaction.all.modified_non_trans_table && session->killed != Session::KILL_CONNECTION)
767
push_warning(session, DRIZZLE_ERROR::WARN_LEVEL_WARN,
768
ER_WARNING_NOT_COMPLETE_ROLLBACK,
769
ER(ER_WARNING_NOT_COMPLETE_ROLLBACK));
774
This is used to commit or rollback a single statement depending on
778
Note that if the autocommit is on, then the following call inside
779
InnoDB will commit or rollback the whole transaction (= the statement). The
780
autocommit mechanism built into InnoDB is based on counting locks, but if
781
the user has used LOCK TABLES then that mechanism does not know to do the
784
int ha_autocommit_or_rollback(Session *session, int error)
786
if (session->transaction.stmt.ha_list)
790
if (ha_commit_trans(session, 0))
795
(void) ha_rollback_trans(session, 0);
796
if (session->transaction_rollback_request)
797
(void) ha_rollback(session);
800
session->variables.tx_isolation=session->session_tx_isolation;
807
return the list of XID's to a client, the same way SHOW commands do.
810
I didn't find in XA specs that an RM cannot return the same XID twice,
811
so mysql_xa_recover does not filter XID's to ensure uniqueness.
812
It can be easily fixed later, if necessary.
814
bool mysql_xa_recover(Session *session)
816
List<Item> field_list;
820
field_list.push_back(new Item_int("formatID", 0, MY_INT32_NUM_DECIMAL_DIGITS));
821
field_list.push_back(new Item_int("gtrid_length", 0, MY_INT32_NUM_DECIMAL_DIGITS));
822
field_list.push_back(new Item_int("bqual_length", 0, MY_INT32_NUM_DECIMAL_DIGITS));
823
field_list.push_back(new Item_empty_string("data",XIDDATASIZE));
825
if (session->client->sendFields(&field_list))
828
pthread_mutex_lock(&LOCK_xid_cache);
829
while ((xs= (XID_STATE*)hash_element(&xid_cache, i++)))
831
if (xs->xa_state==XA_PREPARED)
833
session->client->store((int64_t)xs->xid.formatID);
834
session->client->store((int64_t)xs->xid.gtrid_length);
835
session->client->store((int64_t)xs->xid.bqual_length);
836
session->client->store(xs->xid.data,
837
xs->xid.gtrid_length+xs->xid.bqual_length);
838
if (session->client->flush())
840
pthread_mutex_unlock(&LOCK_xid_cache);
846
pthread_mutex_unlock(&LOCK_xid_cache);
852
int ha_rollback_to_savepoint(Session *session, SAVEPOINT *sv)
855
Session_TRANS *trans= &session->transaction.all;
856
Ha_trx_info *ha_info, *ha_info_next;
860
rolling back to savepoint in all storage engines that were part of the
861
transaction when the savepoint was set
863
for (ha_info= sv->ha_list; ha_info; ha_info= ha_info->next())
866
plugin::StorageEngine *engine= ha_info->engine();
868
if ((err= engine->savepoint_rollback(session,
871
my_error(ER_ERROR_DURING_ROLLBACK, MYF(0), err);
874
status_var_increment(session->status_var.ha_savepoint_rollback_count);
875
trans->no_2pc|= not engine->has_2pc();
878
rolling back the transaction in all storage engines that were not part of
879
the transaction when the savepoint was set
881
for (ha_info= trans->ha_list; ha_info != sv->ha_list;
882
ha_info= ha_info_next)
885
plugin::StorageEngine *engine= ha_info->engine();
886
if ((err= engine->rollback(session, !(0))))
888
my_error(ER_ERROR_DURING_ROLLBACK, MYF(0), err);
891
status_var_increment(session->status_var.ha_rollback_count);
892
ha_info_next= ha_info->next();
893
ha_info->reset(); /* keep it conveniently zero-filled */
895
trans->ha_list= sv->ha_list;
901
according to the sql standard (ISO/IEC 9075-2:2003)
902
section "4.33.4 SQL-statements and transaction states",
903
SAVEPOINT is *not* transaction-initiating SQL-statement
905
int ha_savepoint(Session *session, SAVEPOINT *sv)
908
Session_TRANS *trans= &session->transaction.all;
909
Ha_trx_info *ha_info= trans->ha_list;
910
for (; ha_info; ha_info= ha_info->next())
913
plugin::StorageEngine *engine= ha_info->engine();
915
#ifdef NOT_IMPLEMENTED /*- TODO (examine this againt the original code base) */
916
if (! engine->savepoint_set)
918
my_error(ER_CHECK_NOT_IMPLEMENTED, MYF(0), "SAVEPOINT");
923
if ((err= engine->savepoint_set(session, (void *)(sv+1))))
925
my_error(ER_GET_ERRNO, MYF(0), err);
928
status_var_increment(session->status_var.ha_savepoint_count);
931
Remember the list of registered storage engines. All new
932
engines are prepended to the beginning of the list.
934
sv->ha_list= trans->ha_list;
938
int ha_release_savepoint(Session *session, SAVEPOINT *sv)
941
Ha_trx_info *ha_info= sv->ha_list;
943
for (; ha_info; ha_info= ha_info->next())
946
plugin::StorageEngine *engine= ha_info->engine();
947
/* Savepoint life time is enclosed into transaction life time. */
949
if ((err= engine->savepoint_release(session,
952
my_error(ER_GET_ERRNO, MYF(0), err);
170
} /* namespace drizzled */