~drizzle-trunk/drizzle/development

Committer: Monty Taylor
Date: 2010-10-20 16:35:23 UTC
mfrom: (1862.1.4 ld-version-script)
Revision ID: mordred@inaugust.com-20101020163523-u7ibkag4we6yfrik

Added some replication docs.
Fixed MySQL 5.0 foreign key support in drizzledump.
Fixed the build so that haildb can live in the tree normally.

files added:
config/drizzle.ver

docs/replication.rst

files removed:
plugin/haildb/test-suite-dir

plugin/haildb/test-suite-dir/haildb

files renamed:
plugin/haildb/test-suite-dir/haildb/tests/ => plugin/haildb/tests/

files modified:
Makefile.am

client/drizzledump_mysql.cc

config/pandora-plugin

configure.ac

docs/index.rst

plugin/haildb/plugin.ini

plugin/innobase/plugin.ini

plugin/transaction_log/docs/index.rst

tests/valgrind.supp

Show diffs side-by-side

added added

removed removed

docs/replication.rst

Drizzle Replication

===================

Replication events are recorded using messages in the `Google Protocol Buffer

<http://code.google.com/p/protobuf/>`_ (GPB) format. GPB messages can contain

sub-messages. There is a single main "envelope" message, Transaction, that

is passed to plugins that subscribe to the replication stream.

Message Definitions

-------------------

The GPB messages are defined in .proto files in the drizzled/message

directory of the Drizzle source code. The primary definition file is

transaction.proto. Messages defined in this file are related in the

following ways::

------------------------------------------------------------------

| |

| Transaction message |

| |

| ----------------------------------------------------------- |

| | | |

| | TransactionContext message | |

| | | |

| ----------------------------------------------------------- |

| | | |

| | Statement message 1 | |

| | | |

| ----------------------------------------------------------- |

| | | |

| | Statement message 2 | |

| | | |

| ----------------------------------------------------------- |

| ... |

| ----------------------------------------------------------- |

| | | |

| | Statement message N | |

| | | |

| ----------------------------------------------------------- |

------------------------------------------------------------------

with each Statement message looking like so::

------------------------------------------------------------------

| |

| Statement message |

| |

| ----------------------------------------------------------- |

| | | |

| | Common information | |

| | | |

| | - Type of Statement (INSERT, DELETE, etc) | |

| | - Start Timestamp | |

| | - End Timestamp | |

| | - (OPTIONAL) Actual SQL query string | |

| | | |

| ----------------------------------------------------------- |

| | | |

| | Statement subclass message 1 (see below) | |

| | | |

| ----------------------------------------------------------- |

| ... |

| ----------------------------------------------------------- |

| | | |

| | Statement subclass message N (see below) | |

| | | |

| ----------------------------------------------------------- |

------------------------------------------------------------------

The Transaction Message

^^^^^^^^^^^^^^^^^^^^^^^

The main "envelope" message which represents an atomic transaction

which changed the state of a server is the Transaction message class.

The Transaction message contains two pieces:

#. A TransactionContext message containing information about the

transaction as a whole, such as the ID of the executing server,

the start and end timestamp of the transaction, and a globally-

unique identifier for the transaction.

#. A vector of Statement messages representing the distinct SQL

statements which modified the state of the server. The Statement

message is, itself, a generic envelope message containing a

sub-message which describes the specific data modification which

occurred on the server (such as, for instance, an INSERT statement.

The Statement Message

^^^^^^^^^^^^^^^^^^^^^

The generic "envelope" message containing information common to each

SQL statement executed against a server (such as a start and end timestamp

and the type of the SQL statement) as well as a Statement subclass message

describing the specific data modification event on the server.

100

101

Each Statement message contains a type member which indicates how readers

102

of the Statement should construct the inner Statement subclass representing

103

a data change.

104

105

106

How Bulk Operations Work

107

------------------------

108

109

Certain operations which change large volumes of data on a server

110

present a specific set of problems for a transaction coordinator or

111

replication service. If all operations must complete atomically on a

112

publishing server before replicas are delivered the complete

113

transactional unit:

114

115

#. The publishing server could consume a large amount of memory

116

building an in-memory Transaction message containing all the

117

operations contained in the entire transaction.

118

#. A replica, or subscribing server, is wasting time waiting on the

119

eventual completion (commit) of the large transaction on the

120

publishing server. It could be applying pieces of the large

121

transaction in the meantime...

122

123

In order to prevent the problems inherent in (1) and (2) above, Drizzle's

124

replication system uses a mechanism which provides bulk change

125

operations.

126

127

When a regular SQL statement modifies or inserts more rows than a

128

certain threshold, Drizzle's replication services component will begin

129

sending Transaction messages to replicas which contain a chunk

130

(or "segment") of the data which has been changed on the publisher.

131

132

When data is inserted, updated, or modified in the database, a

133

header containing information about modified tables and fields is

134

matched with one or more data segments which contain the actual

135

values changed in the statement.

136

137

It's easiest to understand this mechanism by following through a real-world

138

scenario.

139

140

Suppose the following table::

141

142

CREATE TABLE test.person

143

(

144

id INT NOT NULL AUTO_INCREMENT PRIMARY KEY

145

, first_name VARCHAR(50)

146

, last_name VARCHAR(50)

147

, is_active CHAR(1) NOT NULL DEFAULT 'Y'

148

);

149

150

Also suppose that test.t1 contains 1 million records.

151

152

Next, suppose a client issues the SQL statement::

153

154

UPDATE test.person SET is_active = 'N';

155

156

It is clear that one million records could be updated by this statement

157

(we say, "could be" since Drizzle does not actually update a record if

158

the UPDATE would not change the existing record...).

159

160

In order to prevent the publishing server from having to construct an

161

enormous Transaction message, Drizzle's replication services component

162

will do the following:

163

164

#. Construct a Transaction message with a transaction context containing

165

information about the originating server, the transaction ID, and

166

timestamp information.

167

#. Construct an UpdateHeader message with information about the tables

168

and fields involved in the UPDATE statement. Push this UpdateHeader

169

message onto the Transaction message's statement vector.

170

#. Construct an UpdateData message. Set the segment_id member to 1.

171

Set the end_segment member to true.

172

#. For every record updated in a storage engine, the ReplicationServices

173

component builds a new UpdateRecord message and appends this message

174

to the aforementioned UpdateData message's record vector.

175

#. After a certain threshold of records is reached, the

176

ReplicationServices component sets the current UpdateData message's

177

end_segment member to false, and proceeds to send the Transaction

178

message to replicators.

179

#. The ReplicationServices component then constructs a new Transaction

180

message and constructs a transaction context with the same

181

transaction ID and server information.

182

#. A new UpdateData message is created. The message's segment_id is

183

set to N+1 and as new records are updated, new UpdateRecord messages

184

are appended to the UpdateData message's record vector.

185

#. While records are being updated, we repeat steps 5 through 7, with

186

only the final UpdateData message having its end_segment member set

187

to true.

188

189

Handling ROLLBACKs

190

------------------

191

192

When a transaction is rolled back, one of two things happen depending

193

on whether the transaction is made up of either a single Transaction

194

message, or if it is made up of multiple Transaction messages (e.g, bulk

195

load).

196

197

* For a transaction encapsulated entirely within a single Transaction

198

message, the entire message is simply discarded and not sent through

199

the replication stream.

200

* For a transaction which is made up of multiple messages, and at least

201

one message has already been sent through the replication stream, then

202

the Transaction message will contain a Statement message with type =

203

ROLLBACK.

b'\\ No newline at end of file'

Older »