~drizzle-trunk/drizzle/development

1861.2.1 by David Shrewsbury
Add some initial Transaction Log plugin and Replication documentation.
1
Drizzle Replication
2
===================
3
4
Replication events are recorded using messages in the `Google Protocol Buffer
5
<http://code.google.com/p/protobuf/>`_ (GPB) format. GPB messages can contain
6
sub-messages. There is a single main "envelope" message, Transaction, that
7
is passed to plugins that subscribe to the replication stream.
8
9
10
Message Definitions
11
-------------------
12
13
The GPB messages are defined in .proto files in the drizzled/message
14
directory of the Drizzle source code. The primary definition file is
15
transaction.proto. Messages defined in this file are related in the
16
following ways::
17
18
19
  ------------------------------------------------------------------
20
  |                                                                |
21
  | Transaction message                                            |
22
  |                                                                |
23
  |   -----------------------------------------------------------  |
24
  |   |                                                         |  |
25
  |   | TransactionContext message                              |  |
26
  |   |                                                         |  |
27
  |   -----------------------------------------------------------  |
28
  |   -----------------------------------------------------------  |
29
  |   |                                                         |  |
30
  |   | Statement message 1                                     |  |
31
  |   |                                                         |  |
32
  |   -----------------------------------------------------------  |
33
  |   -----------------------------------------------------------  |
34
  |   |                                                         |  |
35
  |   | Statement message 2                                     |  |
36
  |   |                                                         |  |
37
  |   -----------------------------------------------------------  |
38
  |                             ...                                |
39
  |   -----------------------------------------------------------  |
40
  |   |                                                         |  |
41
  |   | Statement message N                                     |  |
42
  |   |                                                         |  |
43
  |   -----------------------------------------------------------  |
44
  ------------------------------------------------------------------
45
46
with each Statement message looking like so::
47
48
  ------------------------------------------------------------------
49
  |                                                                |
50
  | Statement message                                              |
51
  |                                                                |
52
  |   -----------------------------------------------------------  |
53
  |   |                                                         |  |
54
  |   | Common information                                      |  |
55
  |   |                                                         |  |
56
  |   |  - Type of Statement (INSERT, DELETE, etc)              |  |
57
  |   |  - Start Timestamp                                      |  |
58
  |   |  - End Timestamp                                        |  |
59
  |   |  - (OPTIONAL) Actual SQL query string                   |  |
60
  |   |                                                         |  |
61
  |   -----------------------------------------------------------  |
62
  |   -----------------------------------------------------------  |
63
  |   |                                                         |  |
64
  |   | Statement subclass message 1 (see below)                |  |
65
  |   |                                                         |  |
66
  |   -----------------------------------------------------------  |
67
  |                             ...                                |
68
  |   -----------------------------------------------------------  |
69
  |   |                                                         |  |
70
  |   | Statement subclass message N (see below)                |  |
71
  |   |                                                         |  |
72
  |   -----------------------------------------------------------  |
73
  ------------------------------------------------------------------
74
75
The Transaction Message
76
^^^^^^^^^^^^^^^^^^^^^^^
77
78
The main "envelope" message which represents an atomic transaction
79
which changed the state of a server is the Transaction message class.
80
81
The Transaction message contains two pieces:
82
83
#. A TransactionContext message containing information about the
84
   transaction as a whole, such as the ID of the executing server,
85
   the start and end timestamp of the transaction, and a globally-
86
   unique identifier for the transaction.
87
#. A vector of Statement messages representing the distinct SQL
88
   statements which modified the state of the server.  The Statement
89
   message is, itself, a generic envelope message containing a
90
   sub-message which describes the specific data modification which
91
   occurred on the server (such as, for instance, an INSERT statement.
92
93
The Statement Message
94
^^^^^^^^^^^^^^^^^^^^^
95
96
The generic "envelope" message containing information common to each
97
SQL statement executed against a server (such as a start and end timestamp
98
and the type of the SQL statement) as well as a Statement subclass message
99
describing the specific data modification event on the server.
100
101
Each Statement message contains a type member which indicates how readers
102
of the Statement should construct the inner Statement subclass representing
103
a data change.
104
105
106
How Bulk Operations Work
107
------------------------
108
109
Certain operations which change large volumes of data on a server
110
present a specific set of problems for a transaction coordinator or
111
replication service. If all operations must complete atomically on a
112
publishing server before replicas are delivered the complete
113
transactional unit:
114
115
#. The publishing server could consume a large amount of memory
116
   building an in-memory Transaction message containing all the
117
   operations contained  in the entire transaction.
118
#. A replica, or subscribing server, is wasting time waiting on the
119
   eventual completion (commit) of the large transaction on the
120
   publishing server. It could be applying pieces of the large
121
   transaction in the meantime...
122
123
In order to prevent the problems inherent in (1) and (2) above, Drizzle's
124
replication system uses a mechanism which provides bulk change
125
operations.
126
127
When a regular SQL statement modifies or inserts more rows than a
128
certain threshold, Drizzle's replication services component will begin
129
sending Transaction messages to replicas which contain a chunk
130
(or "segment") of the data which has been changed on the publisher.
131
132
When data is inserted, updated, or modified in the database, a
133
header containing information about modified tables and fields is
134
matched with one or more data segments which contain the actual
135
values changed in the statement.
136
137
It's easiest to understand this mechanism by following through a real-world
138
scenario.
139
140
Suppose the following table::
141
142
  CREATE TABLE test.person
143
  (
144
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
145
  , first_name VARCHAR(50)
146
  , last_name VARCHAR(50)
147
  , is_active CHAR(1) NOT NULL DEFAULT 'Y'
148
  );
149
150
Also suppose that test.t1 contains 1 million records.
151
152
Next, suppose a client issues the SQL statement::
153
154
  UPDATE test.person SET is_active = 'N';
155
156
It is clear that one million records could be updated by this statement
157
(we say, "could be" since Drizzle does not actually update a record if
158
the UPDATE would not change the existing record...).
159
160
In order to prevent the publishing server from having to construct an
161
enormous Transaction message, Drizzle's replication services component
162
will do the following:
163
164
#. Construct a Transaction message with a transaction context containing
165
   information about the originating server, the transaction ID, and
166
   timestamp information.
167
#. Construct an UpdateHeader message with information about the tables
168
   and fields involved in the UPDATE statement.  Push this UpdateHeader
169
   message onto the Transaction message's statement vector.
170
#. Construct an UpdateData message.  Set the segment_id member to 1.
171
   Set the end_segment member to true.
172
#. For every record updated in a storage engine, the ReplicationServices
173
   component builds a new UpdateRecord message and appends this message
174
   to the aforementioned UpdateData message's record vector.
175
#. After a certain threshold of records is reached, the
176
   ReplicationServices component sets the current UpdateData message's
177
   end_segment member to false, and proceeds to send the Transaction
178
   message to replicators.
179
#. The ReplicationServices component then constructs a new Transaction
180
   message and constructs a transaction context with the same
181
   transaction ID and server information.
182
#. A new UpdateData message is created.  The message's segment_id is
183
   set to N+1 and as new records are updated, new UpdateRecord messages
184
   are appended to the UpdateData message's record vector.
185
#. While records are being updated, we repeat steps 5 through 7, with
186
   only the final UpdateData message having its end_segment member set
187
   to true.
188
189
Handling ROLLBACKs
190
------------------
191
1861.2.2 by David Shrewsbury
Correct rollback handling section.
192
When a transaction is rolled back, one of two things happen depending
193
on whether the transaction is made up of either a single Transaction
194
message, or if it is made up of multiple Transaction messages (e.g, bulk
195
load).
1861.2.1 by David Shrewsbury
Add some initial Transaction Log plugin and Replication documentation.
196
1861.2.2 by David Shrewsbury
Correct rollback handling section.
197
* For a transaction encapsulated entirely within a single Transaction
198
  message, the entire message is simply discarded and not sent through
199
  the replication stream.
200
* For a transaction which is made up of multiple messages, and at least
201
  one message has already been sent through the replication stream, then
202
  the Transaction message will contain a Statement message with type =
203
  ROLLBACK.