~drizzle-trunk/drizzle/development

Viewing changes to docs/replication/drizzle.rst

Committer: Daniel Nichter
Date: 2011-10-23 05:45:09 UTC
mto: This revision was merged to the branch mainline in revision 2448.
Revision ID: daniel@percona.com-20111023054509-5w1z0g4hn1c4guqv

A lot of doc changes: rewrite and expand Configuration and Administration, re-order top-level sections, enhance Contributing, add Release Notes, add Help and Support, fix title casing, label all plugins, other misc. enhancements.

files added:
docs/administration/drizzled.rst

docs/administration/plugins.rst

docs/configuration/index.rst

docs/contributing/getting_started.rst

docs/contributing/more_ways.rst

docs/help.rst

docs/release_notes

docs/release_notes/drizzle-7.0.rst

docs/replication

docs/replication/drizzle.rst

files removed:
docs/clients/errors.rst

docs/contributing/introduction.rst

docs/replication.rst

files modified:
docs/administration/authorization.rst

docs/brief_history_of_drizzle.rst

docs/clients/drizzle.rst

docs/clients/drizzledump.rst

docs/configuration/drizzled.rst

docs/configuration/options.rst

docs/configuration/values.rst

docs/configuration/variables.rst

docs/contributing/code.rst

docs/contributing/documentation.rst

docs/how_to_report_a_bug.rst

docs/index.rst

docs/installing/from_source.rst

docs/installing/redhat.rst

docs/installing/requirements.rst

docs/installing/ubuntu.rst

docs/versioning.rst

plugin/ascii/docs/index.rst

plugin/auth_all/docs/index.rst

plugin/auth_file/docs/index.rst

plugin/auth_http/docs/index.rst

plugin/auth_ldap/docs/index.rst

plugin/auth_pam/docs/index.rst

plugin/benchmark/docs/index.rst

plugin/catalog/docs/index.rst

plugin/charlength/docs/index.rst

plugin/coercibility_function/docs/index.rst

plugin/collation_dictionary/docs/index.rst

plugin/compression/docs/index.rst

plugin/connection_id/docs/index.rst

plugin/crc32/docs/index.rst

plugin/debug/docs/index.rst

plugin/default_replicator/docs/index.rst

plugin/drizzle_protocol/docs/index.rst

plugin/errmsg_stderr/docs/index.rst

plugin/error_dictionary/docs/index.rst

plugin/filtered_replicator/docs/index.rst

plugin/function_dictionary/docs/index.rst

plugin/function_engine/docs/index.rst

plugin/hex_functions/docs/index.rst

plugin/http_functions/docs/index.rst

plugin/information_schema_dictionary/docs/index.rst

plugin/innobase/docs/index.rst

plugin/json_server/docs/index.rst

plugin/length/docs/index.rst

plugin/logging_gearman/docs/index.rst

plugin/logging_query/docs/index.rst

plugin/logging_stats/docs/index.rst

plugin/math_functions/docs/index.rst

plugin/md5/docs/index.rst

plugin/memcached_functions/docs/index.rst

plugin/memcached_stats/docs/index.rst

plugin/memory/docs/index.rst

plugin/multi_thread/docs/index.rst

plugin/myisam/docs/index.rst

plugin/mysql_protocol/docs/index.rst

plugin/mysql_unix_socket_protocol/docs/index.rst

plugin/pbms/docs/index.rst

plugin/performance_dictionary/docs/index.rst

plugin/protocol_dictionary/docs/index.rst

plugin/query_log/docs/index.rst

plugin/rabbitmq/docs/index.rst

plugin/rand_function/docs/index.rst

plugin/regex_policy/docs/index.rst

plugin/registry_dictionary/docs/index.rst

plugin/replication_dictionary/docs/index.rst

plugin/reverse_function/docs/index.rst

plugin/schema_dictionary/docs/index.rst

plugin/schema_engine/docs/index.rst

plugin/session_dictionary/docs/index.rst

plugin/show_dictionary/docs/index.rst

plugin/show_schema_proto/docs/index.rst

plugin/shutdown_function/docs/index.rst

plugin/signal_handler/docs/index.rst

plugin/simple_user_policy/docs/index.rst

plugin/slave/docs/index.rst

plugin/sleep/docs/index.rst

plugin/status_dictionary/docs/index.rst

plugin/string_functions/docs/index.rst

plugin/substr_functions/docs/index.rst

plugin/syslog/docs/index.rst

plugin/table_cache_dictionary/docs/index.rst

plugin/transaction_log/docs/index.rst

plugin/trigger_dictionary/docs/index.rst

plugin/user_locks/docs/index.rst

plugin/utility_dictionary/docs/index.rst

plugin/utility_functions/docs/index.rst

plugin/uuid_function/docs/index.rst

plugin/version/docs/index.rst

plugin/zeromq/docs/index.rst

Show diffs side-by-side

added added

removed removed

docs/replication/drizzle.rst

*******************

Drizzle Replication

*******************

Replication events are recorded using messages in the `Google Protocol Buffer

<http://code.google.com/p/protobuf/>`_ (GPB) format. GPB messages can contain

sub-messages. There is a single main "envelope" message, Transaction, that

is passed to plugins that subscribe to the replication stream.

Configuration Options

#####################

**transaction_message_threshold**

Controls the size, in bytes, of the Transaction messages. When a Transaction

message exceeds this size, a new Transaction message with the same

transaction ID will be created to continue the replication events.

See :ref:`bulk-operations` below.

**replicate_query**

Controls whether the originating SQL query will be included within each

Statement message contained in the enclosing Transaction message. The

default global value is FALSE which will not include the query in the

messages. It can be controlled per session, as well. For example:

.. code-block:: mysql

drizzle> set @@replicate_query = 1;

The stored query should be used as a guide only, and never executed

on a slave to perform replication as this will lead to incorrect results.

Message Definitions

###################

The GPB messages are defined in .proto files in the drizzled/message

directory of the Drizzle source code. The primary definition file is

transaction.proto. Messages defined in this file are related in the

following ways::

------------------------------------------------------------------

| |

| Transaction message |

| |

| ----------------------------------------------------------- |

| | | |

| | TransactionContext message | |

| | | |

| ----------------------------------------------------------- |

| | | |

| | Statement message 1 | |

| | | |

| ----------------------------------------------------------- |

| | | |

| | Statement message 2 | |

| | | |

| ----------------------------------------------------------- |

| ... |

| ----------------------------------------------------------- |

| | | |

| | Statement message N | |

| | | |

| ----------------------------------------------------------- |

------------------------------------------------------------------

with each Statement message looking like so::

------------------------------------------------------------------

| |

| Statement message |

| |

| ----------------------------------------------------------- |

| | | |

| | Common information | |

| | | |

| | - Type of Statement (INSERT, DELETE, etc) | |

| | - Start Timestamp | |

| | - End Timestamp | |

| | - (OPTIONAL) Actual SQL query string | |

| | | |

| ----------------------------------------------------------- |

| | | |

| | Statement subclass message 1 (see below) | |

| | | |

| ----------------------------------------------------------- |

| ... |

| ----------------------------------------------------------- |

| | | |

| | Statement subclass message N (see below) | |

| | | |

| ----------------------------------------------------------- |

------------------------------------------------------------------

100

The Transaction Message

101

#######################

102

103

The main "envelope" message which represents an atomic transaction

104

which changed the state of a server is the Transaction message class.

105

106

The Transaction message contains two pieces:

107

108

#. A TransactionContext message containing information about the

109

transaction as a whole, such as the ID of the executing server,

110

the start and end timestamp of the transaction, segmenting

111

metadata and a unique identifier for the transaction.

112

#. A vector of Statement messages representing the distinct SQL

113

statements which modified the state of the server. The Statement

114

message is, itself, a generic envelope message containing a

115

sub-message which describes the specific data modification which

116

occurred on the server (such as, for instance, an INSERT statement).

117

118

The Statement Message

119

#####################

120

121

The generic "envelope" message containing information common to each

122

SQL statement executed against a server (such as a start and end timestamp

123

and the type of the SQL statement) as well as a Statement subclass message

124

describing the specific data modification event on the server.

125

126

Each Statement message contains a type member which indicates how readers

127

of the Statement should construct the inner Statement subclass representing

128

a data change.

129

130

Statements are recorded separately as sometimes individual statements

131

have to be rolled back.

132

133

134

.. _bulk-operations:

135

136

How Bulk Operations Work

137

########################

138

139

Certain operations which change large volumes of data on a server

140

present a specific set of problems for a transaction coordinator or

141

replication service. If all operations must complete atomically on a

142

publishing server before replicas are delivered the complete

143

transactional unit:

144

145

#. The publishing server could consume a large amount of memory

146

building an in-memory Transaction message containing all the

147

operations contained in the entire transaction.

148

#. A replica, or subscribing server, is wasting time waiting on the

149

eventual completion (commit) of the large transaction on the

150

publishing server. It could be applying pieces of the large

151

transaction in the meantime...

152

153

In order to prevent the problems inherent in (1) and (2) above, Drizzle's

154

replication system uses a mechanism which provides bulk change

155

operations.

156

157

A single transaction in the database can possibly be represented with

158

multiple protobuf Transaction messages if the message grows too large.

159

This can happen if you have a bulk transaction, or a single statement

160

affecting a very large number of rows, or just a large transaction with

161

many statements/changes.

162

163

For the first two examples, it is likely that the Statement sub-message

164

itself will get segmented, causing another Transaction message to be

165

created to hold the rest of the Statement's row changes. In these cases,

166

it is enough to look at the segment information stored in the Statement

167

message (see example below).

168

169

For the last example, the Statement sub-messages may or may not be

170

segmented, but we could still need to split the individual Statements up into

171

multiple Transaction messages to keep the Transaction message size from

172

growing too large. In this case, the segment information in the Statement

173

submessages is not helpful if the Statement isn't segmented. We need this

174

information in the Transaction message itself.

175

176

Segmenting a Single SQL Statement

177

*********************************

178

179

When a regular SQL statement modifies or inserts more rows than a

180

certain threshold, Drizzle's replication services component will begin

181

sending Transaction messages to replicas which contain a chunk

182

(or "segment") of the data which has been changed on the publisher.

183

184

When data is inserted, updated, or modified in the database, a

185

header containing information about modified tables and fields is

186

matched with one or more data segments which contain the actual

187

values changed in the statement.

188

189

It's easiest to understand this mechanism by following through a real-world

190

scenario.

191

192

Suppose the following table:

193

194

.. code-block:: mysql

195

196

CREATE TABLE test.person

197

(

198

id INT NOT NULL AUTO_INCREMENT PRIMARY KEY

199

, first_name VARCHAR(50)

200

, last_name VARCHAR(50)

201

, is_active CHAR(1) NOT NULL DEFAULT 'Y'

202

);

203

204

Also suppose that test.t1 contains 1 million records.

205

206

Next, suppose a client issues the SQL statement:

207

208

.. code-block:: mysql

209

210

UPDATE test.person SET is_active = 'N';

211

212

It is clear that one million records could be updated by this statement

213

(we say, "could be" since Drizzle does not actually update a record if

214

the UPDATE would not change the existing record...).

215

216

In order to prevent the publishing server from having to construct an

217

enormous Transaction message, Drizzle's replication services component

218

will do the following:

219

220

#. Construct a Transaction message with a transaction context containing

221

information about the originating server, the transaction ID, and

222

timestamp information.

223

#. Construct an UpdateHeader message with information about the tables

224

and fields involved in the UPDATE statement. Push this UpdateHeader

225

message onto the Transaction message's statement vector.

226

#. Construct an UpdateData message. Set the *segment_id* member to 1.

227

Set the *end_segment* member to true.

228

#. For every record updated in a storage engine, the ReplicationServices

229

component builds a new UpdateRecord message and appends this message

230

to the aforementioned UpdateData message's record vector.

231

#. After a certain threshold of records is reached, the

232

ReplicationServices component sets the current UpdateData message's

233

*end_segment* member to false, and proceeds to send the Transaction

234

message to replicators.

235

#. The ReplicationServices component then constructs a new Transaction

236

message and constructs a transaction context with the same

237

transaction ID and server information.

238

#. A new UpdateData message is created. The message's *segment_id* is

239

set to N+1 and as new records are updated, new UpdateRecord messages

240

are appended to the UpdateData message's record vector.

241

#. While records are being updated, we repeat steps 5 through 7, with

242

only the final UpdateData message having its *end_segment* member set

243

to true.

244

245

Segmenting a Transaction

246

************************

247

248

The Transaction protobuf message also contains *segment_id* member and a

249

*end_segment* member. These values are also set appropriately when a

250

Statement sub-message is segmented, as described above.

251

252

These values are also set when a Transaction must be segmented along

253

individual Statement boundaries (i.e., the Statement message itself

254

is **not** segmented). In either case, it is enough to check the

255

*end_segment* and *segment_id* values of the Transaction message

256

to determine if this is a multi-message transaction.

257

258

Handling ROLLBACKs

259

##################

260

261

Both transactions and individual statements may be rolled back.

262

263

When a transaction is rolled back, one of two things happen depending

264

on whether the transaction is made up of either a single Transaction

265

message, or if it is made up of multiple Transaction messages (e.g, bulk

266

load).

267

268

* For a transaction encapsulated entirely within a single Transaction

269

message, the entire message is simply discarded and not sent through

270

the replication stream.

271

* For a transaction which is made up of multiple messages, and at least

272

one message has already been sent through the replication stream, then

273

the Transaction message will contain a Statement message with type =

274

ROLLBACK. This signifies to rollback the entire transaction.

275

276

A special Statement message type, ROLLBACK_STATEMENT, is used when

277

we have a segmented Statement message (see above) and we need to tell the

278

receiver to undo any changes made for this single statement, but not

279

for the entire transaction. If the receiver cannot handle rolling back

280

a single statement, then a message buffering strategy should be employed

281

to guarantee that a statement was indeed applied successfully before

282

executing on the receiver.

283

284

.. _replication_streams:

285

286

Replication Streams

287

###################

288

289

The Drizzle kernel handles delivering replication messages to plugins by

290

maintaining a list of replication streams. A stream is represented as a

291

registered *replicator* and *applier* pair.

292

293

When a replication message is generated within the kernel, the replication

294

services module of the kernel will send this message to each registered

295

*replicator*. The *replicator* will then do something useful with it and

296

send it to each *applier* with which it is associated.

297

298

Replicators

299

***********

300

301

A registered *replicator* is a plugin that implements the TransactionReplicator

302

API. Each replicator will be plugged into the kernel to receive the Google

303

Protobuf messages that are generated as the database is changed. Ideally,

304

each registered replicator will transform or modify the messages it receives

305

to implement a specific behavior. For example, filtering by schema name.

306

307

Each registered replicator should have a unique name. The default replicator,

308

cleverly named **default_replicator**, does no transformation at all on the

309

replication messages.

310

311

Appliers

312

********

313

314

A registered *applier* is a plugin that implements the TransactionApplier

315

API. Appliers are responsible for applying the replication messages that it

316

will receive from a registered replicator. The word "apply" is used loosely

317

here. An applier may do anything with the replication messages that provides

318

useful behavior. For example, an applier may simply write the messages to a

319

file on disk, or it may send the messages over the network to some other

320

service to be processed.

321

322

At the point of registration with the Drizzle kernel, each applier specifies

323

the name of a registered replicator that it should be attached to in order to

324

make the replication stream pair.

Older »