2425.2.2
by Daniel Nichter
A lot of doc changes: rewrite and expand Configuration and Administration, re-order top-level sections, enhance Contributing, add Release Notes, add Help and Support, fix title casing, label all plugins, other misc. enhancements. |
1 |
*******************
|
2 |
Drizzle Replication
|
|
3 |
*******************
|
|
4 |
||
5 |
Replication events are recorded using messages in the `Google Protocol Buffer |
|
6 |
<http://code.google.com/p/protobuf/>`_ (GPB) format. GPB messages can contain
|
|
7 |
sub-messages. There is a single main "envelope" message, Transaction, that |
|
8 |
is passed to plugins that subscribe to the replication stream. |
|
9 |
||
10 |
Configuration Options
|
|
11 |
#####################
|
|
12 |
||
13 |
**transaction_message_threshold**
|
|
14 |
||
15 |
Controls the size, in bytes, of the Transaction messages. When a Transaction |
|
16 |
message exceeds this size, a new Transaction message with the same |
|
17 |
transaction ID will be created to continue the replication events. |
|
18 |
See :ref:`bulk-operations` below. |
|
19 |
||
20 |
||
21 |
**replicate_query**
|
|
22 |
||
23 |
Controls whether the originating SQL query will be included within each |
|
24 |
Statement message contained in the enclosing Transaction message. The |
|
25 |
default global value is FALSE which will not include the query in the |
|
26 |
messages. It can be controlled per session, as well. For example: |
|
27 |
||
28 |
.. code-block:: mysql |
|
29 |
||
30 |
drizzle> set @@replicate_query = 1; |
|
31 |
||
32 |
The stored query should be used as a guide only, and never executed |
|
33 |
on a slave to perform replication as this will lead to incorrect results. |
|
34 |
||
35 |
Message Definitions
|
|
36 |
###################
|
|
37 |
||
38 |
The GPB messages are defined in .proto files in the drizzled/message |
|
39 |
directory of the Drizzle source code. The primary definition file is |
|
40 |
transaction.proto. Messages defined in this file are related in the |
|
41 |
following ways:: |
|
42 |
||
43 |
||
44 |
------------------------------------------------------------------ |
|
45 |
| |
|
|
46 |
| Transaction message |
|
|
47 |
| |
|
|
48 |
| ----------------------------------------------------------- |
|
|
49 |
| | | |
|
|
50 |
| | TransactionContext message | |
|
|
51 |
| | | |
|
|
52 |
| ----------------------------------------------------------- |
|
|
53 |
| ----------------------------------------------------------- |
|
|
54 |
| | | |
|
|
55 |
| | Statement message 1 | |
|
|
56 |
| | | |
|
|
57 |
| ----------------------------------------------------------- |
|
|
58 |
| ----------------------------------------------------------- |
|
|
59 |
| | | |
|
|
60 |
| | Statement message 2 | |
|
|
61 |
| | | |
|
|
62 |
| ----------------------------------------------------------- |
|
|
63 |
| ... |
|
|
64 |
| ----------------------------------------------------------- |
|
|
65 |
| | | |
|
|
66 |
| | Statement message N | |
|
|
67 |
| | | |
|
|
68 |
| ----------------------------------------------------------- |
|
|
69 |
------------------------------------------------------------------ |
|
70 |
||
71 |
with each Statement message looking like so::
|
|
72 |
||
73 |
------------------------------------------------------------------
|
|
74 |
| |
|
|
75 |
| Statement message |
|
|
76 |
| |
|
|
77 |
| ----------------------------------------------------------- |
|
|
78 |
| | | |
|
|
79 |
| | Common information | |
|
|
80 |
| | | |
|
|
81 |
| | - Type of Statement (INSERT, DELETE, etc) | |
|
|
82 |
| | - Start Timestamp | |
|
|
83 |
| | - End Timestamp | |
|
|
84 |
| | - (OPTIONAL) Actual SQL query string | |
|
|
85 |
| | | |
|
|
86 |
| ----------------------------------------------------------- |
|
|
87 |
| ----------------------------------------------------------- |
|
|
88 |
| | | |
|
|
89 |
| | Statement subclass message 1 (see below) | |
|
|
90 |
| | | |
|
|
91 |
| ----------------------------------------------------------- |
|
|
92 |
| ... |
|
|
93 |
| ----------------------------------------------------------- |
|
|
94 |
| | | |
|
|
95 |
| | Statement subclass message N (see below) | |
|
|
96 |
| | | |
|
|
97 |
| ----------------------------------------------------------- |
|
|
98 |
------------------------------------------------------------------
|
|
99 |
||
100 |
The Transaction Message
|
|
101 |
#######################
|
|
102 |
||
103 |
The main "envelope" message which represents an atomic transaction |
|
104 |
which changed the state of a server is the Transaction message class. |
|
105 |
||
106 |
The Transaction message contains two pieces: |
|
107 |
||
108 |
#. A TransactionContext message containing information about the
|
|
109 |
transaction as a whole, such as the ID of the executing server, |
|
110 |
the start and end timestamp of the transaction, segmenting |
|
111 |
metadata and a unique identifier for the transaction. |
|
112 |
#. A vector of Statement messages representing the distinct SQL
|
|
113 |
statements which modified the state of the server. The Statement |
|
114 |
message is, itself, a generic envelope message containing a |
|
115 |
sub-message which describes the specific data modification which |
|
116 |
occurred on the server (such as, for instance, an INSERT statement). |
|
117 |
||
118 |
The Statement Message
|
|
119 |
#####################
|
|
120 |
||
121 |
The generic "envelope" message containing information common to each |
|
122 |
SQL statement executed against a server (such as a start and end timestamp |
|
123 |
and the type of the SQL statement) as well as a Statement subclass message |
|
124 |
describing the specific data modification event on the server. |
|
125 |
||
126 |
Each Statement message contains a type member which indicates how readers |
|
127 |
of the Statement should construct the inner Statement subclass representing |
|
128 |
a data change. |
|
129 |
||
130 |
Statements are recorded separately as sometimes individual statements |
|
131 |
have to be rolled back. |
|
132 |
||
133 |
||
134 |
.. _bulk-operations: |
|
135 |
||
136 |
How Bulk Operations Work
|
|
137 |
########################
|
|
138 |
||
139 |
Certain operations which change large volumes of data on a server |
|
140 |
present a specific set of problems for a transaction coordinator or |
|
141 |
replication service. If all operations must complete atomically on a |
|
142 |
publishing server before replicas are delivered the complete |
|
143 |
transactional unit: |
|
144 |
||
145 |
#. The publishing server could consume a large amount of memory
|
|
146 |
building an in-memory Transaction message containing all the |
|
147 |
operations contained in the entire transaction. |
|
148 |
#. A replica, or subscribing server, is wasting time waiting on the
|
|
149 |
eventual completion (commit) of the large transaction on the |
|
150 |
publishing server. It could be applying pieces of the large |
|
151 |
transaction in the meantime... |
|
152 |
||
153 |
In order to prevent the problems inherent in (1) and (2) above, Drizzle's |
|
154 |
replication system uses a mechanism which provides bulk change |
|
155 |
operations. |
|
156 |
||
157 |
A single transaction in the database can possibly be represented with |
|
158 |
multiple protobuf Transaction messages if the message grows too large. |
|
159 |
This can happen if you have a bulk transaction, or a single statement |
|
160 |
affecting a very large number of rows, or just a large transaction with |
|
161 |
many statements/changes. |
|
162 |
||
163 |
For the first two examples, it is likely that the Statement sub-message |
|
164 |
itself will get segmented, causing another Transaction message to be |
|
165 |
created to hold the rest of the Statement's row changes. In these cases, |
|
166 |
it is enough to look at the segment information stored in the Statement |
|
167 |
message (see example below). |
|
168 |
||
169 |
For the last example, the Statement sub-messages may or may not be |
|
170 |
segmented, but we could still need to split the individual Statements up into |
|
171 |
multiple Transaction messages to keep the Transaction message size from |
|
172 |
growing too large. In this case, the segment information in the Statement |
|
173 |
submessages is not helpful if the Statement isn't segmented. We need this |
|
174 |
information in the Transaction message itself. |
|
175 |
||
176 |
Segmenting a Single SQL Statement
|
|
177 |
*********************************
|
|
178 |
||
179 |
When a regular SQL statement modifies or inserts more rows than a |
|
180 |
certain threshold, Drizzle's replication services component will begin |
|
181 |
sending Transaction messages to replicas which contain a chunk |
|
182 |
(or "segment") of the data which has been changed on the publisher. |
|
183 |
||
184 |
When data is inserted, updated, or modified in the database, a |
|
185 |
header containing information about modified tables and fields is |
|
186 |
matched with one or more data segments which contain the actual |
|
187 |
values changed in the statement. |
|
188 |
||
189 |
It's easiest to understand this mechanism by following through a real-world |
|
190 |
scenario. |
|
191 |
||
192 |
Suppose the following table: |
|
193 |
||
194 |
.. code-block:: mysql |
|
195 |
||
196 |
CREATE TABLE test.person |
|
197 |
(
|
|
198 |
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY |
|
199 |
, first_name VARCHAR(50) |
|
200 |
, last_name VARCHAR(50) |
|
201 |
, is_active CHAR(1) NOT NULL DEFAULT 'Y' |
|
202 |
);
|
|
203 |
||
204 |
Also suppose that test.t1 contains 1 million records. |
|
205 |
||
206 |
Next, suppose a client issues the SQL statement: |
|
207 |
||
208 |
.. code-block:: mysql |
|
209 |
||
210 |
UPDATE test.person SET is_active = 'N'; |
|
211 |
||
212 |
It is clear that one million records could be updated by this statement |
|
213 |
(we say, "could be" since Drizzle does not actually update a record if |
|
214 |
the UPDATE would not change the existing record...). |
|
215 |
||
216 |
In order to prevent the publishing server from having to construct an |
|
217 |
enormous Transaction message, Drizzle's replication services component |
|
218 |
will do the following: |
|
219 |
||
220 |
#. Construct a Transaction message with a transaction context containing
|
|
221 |
information about the originating server, the transaction ID, and |
|
222 |
timestamp information. |
|
223 |
#. Construct an UpdateHeader message with information about the tables
|
|
224 |
and fields involved in the UPDATE statement. Push this UpdateHeader |
|
225 |
message onto the Transaction message's statement vector. |
|
226 |
#. Construct an UpdateData message. Set the *segment_id* member to 1. |
|
227 |
Set the *end_segment* member to true.
|
|
228 |
#. For every record updated in a storage engine, the ReplicationServices
|
|
229 |
component builds a new UpdateRecord message and appends this message |
|
230 |
to the aforementioned UpdateData message's record vector. |
|
231 |
#. After a certain threshold of records is reached, the
|
|
232 |
ReplicationServices component sets the current UpdateData message's |
|
233 |
*end_segment* member to false, and proceeds to send the Transaction
|
|
234 |
message to replicators. |
|
235 |
#. The ReplicationServices component then constructs a new Transaction
|
|
236 |
message and constructs a transaction context with the same |
|
237 |
transaction ID and server information. |
|
238 |
#. A new UpdateData message is created. The message's *segment_id* is |
|
239 |
set to N+1 and as new records are updated, new UpdateRecord messages |
|
240 |
are appended to the UpdateData message's record vector. |
|
241 |
#. While records are being updated, we repeat steps 5 through 7, with
|
|
242 |
only the final UpdateData message having its *end_segment* member set
|
|
243 |
to true. |
|
244 |
||
245 |
Segmenting a Transaction
|
|
246 |
************************
|
|
247 |
||
248 |
The Transaction protobuf message also contains *segment_id* member and a
|
|
249 |
*end_segment* member. These values are also set appropriately when a
|
|
250 |
Statement sub-message is segmented, as described above. |
|
251 |
||
252 |
These values are also set when a Transaction must be segmented along |
|
253 |
individual Statement boundaries (i.e., the Statement message itself |
|
254 |
is **not** segmented). In either case, it is enough to check the
|
|
255 |
*end_segment* and *segment_id* values of the Transaction message |
|
256 |
to determine if this is a multi-message transaction. |
|
257 |
||
258 |
Handling ROLLBACKs
|
|
259 |
##################
|
|
260 |
||
261 |
Both transactions and individual statements may be rolled back. |
|
262 |
||
263 |
When a transaction is rolled back, one of two things happen depending |
|
264 |
on whether the transaction is made up of either a single Transaction |
|
265 |
message, or if it is made up of multiple Transaction messages (e.g, bulk |
|
266 |
load). |
|
267 |
||
268 |
* For a transaction encapsulated entirely within a single Transaction
|
|
269 |
message, the entire message is simply discarded and not sent through |
|
270 |
the replication stream. |
|
271 |
* For a transaction which is made up of multiple messages, and at least
|
|
272 |
one message has already been sent through the replication stream, then |
|
273 |
the Transaction message will contain a Statement message with type = |
|
274 |
ROLLBACK. This signifies to rollback the entire transaction. |
|
275 |
||
276 |
A special Statement message type, ROLLBACK_STATEMENT, is used when |
|
277 |
we have a segmented Statement message (see above) and we need to tell the |
|
278 |
receiver to undo any changes made for this single statement, but not |
|
279 |
for the entire transaction. If the receiver cannot handle rolling back |
|
280 |
a single statement, then a message buffering strategy should be employed |
|
281 |
to guarantee that a statement was indeed applied successfully before |
|
282 |
executing on the receiver. |
|
283 |
||
284 |
.. _replication_streams: |
|
285 |
||
286 |
Replication Streams
|
|
287 |
###################
|
|
288 |
||
289 |
The Drizzle kernel handles delivering replication messages to plugins by |
|
290 |
maintaining a list of replication streams. A stream is represented as a |
|
291 |
registered *replicator* and *applier* pair. |
|
292 |
||
293 |
When a replication message is generated within the kernel, the replication |
|
294 |
services module of the kernel will send this message to each registered |
|
295 |
*replicator*. The *replicator* will then do something useful with it and |
|
296 |
send it to each *applier* with which it is associated.
|
|
297 |
||
298 |
Replicators
|
|
299 |
***********
|
|
300 |
||
301 |
A registered *replicator* is a plugin that implements the TransactionReplicator
|
|
302 |
API. Each replicator will be plugged into the kernel to receive the Google |
|
303 |
Protobuf messages that are generated as the database is changed. Ideally, |
|
304 |
each registered replicator will transform or modify the messages it receives |
|
305 |
to implement a specific behavior. For example, filtering by schema name. |
|
306 |
||
307 |
Each registered replicator should have a unique name. The default replicator, |
|
308 |
cleverly named **default_replicator**, does no transformation at all on the
|
|
309 |
replication messages. |
|
310 |
||
311 |
Appliers
|
|
312 |
********
|
|
313 |
||
314 |
A registered *applier* is a plugin that implements the TransactionApplier
|
|
315 |
API. Appliers are responsible for applying the replication messages that it |
|
316 |
will receive from a registered replicator. The word "apply" is used loosely |
|
317 |
here. An applier may do anything with the replication messages that provides |
|
318 |
useful behavior. For example, an applier may simply write the messages to a |
|
319 |
file on disk, or it may send the messages over the network to some other |
|
320 |
service to be processed. |
|
321 |
||
322 |
At the point of registration with the Drizzle kernel, each applier specifies |
|
323 |
the name of a registered replicator that it should be attached to in order to |
|
324 |
make the replication stream pair. |