.. Drizzle Client & Protocol Library

.. Copyright (C) 2008 Eric Day (eday@oddments.org)
.. All rights reserved.

.. Use and distribution licensed under the BSD license. See
.. the COPYING.BSD file in the root source directory for full text.

Drizzle Protocol
================

`This is currently a proposed draft as of November 29, 2008`

The Drizzle protocol works over TCP, UDP, and Unix Domain Sockets
(UDS, also known as IPC sockets), although there are limitations when
using UDP (discussed below). In the case of TCP and UDS,
a connection is made, a command is sent, and a response loop is
started. Socket communication ends when either side closes the
connection or a QUIT command is issued.

TCP and UDS communications will be full duplex. This means that as
the client is sending a command, it is possible for the server to
report an error before the sending of data completes. This allows
the server to do preliminary checks (table exists, authentication,
...) before a request is completely sent so the client may abort. This
will primarily be used for large requests (INSERTing large BLOBs).

TCP and UDS communications will also allow for pipelining of requests
and concurrent command execution. This means a client does not need
to wait for a command to finish before a new command is sent. It is
even possible for a later command to complete and return a result
before an earlier command. Result packets may be interleaved, so a
client issuing concurrent commands must be able to parse results
concurrently.

UDP sockets are supported to allow small, fast updates for
applications such as statistics gathering. Since UDP does not
guarantee delivery, this method should not be used for applications
that require reliable transport. When using UDP, the authentication
packet (if needed) and command packet are bundled into a single UDP
packet and sent. This puts a limitation on the size of the request
being made, and this limit can be different between network hosts. The
absolute limit is 65,507 bytes (the 65,535-byte maximum IPv4 datagram
size minus 28 bytes used for IPv4 and UDP headers), but again, this can
depend on the network hosts. Responses are optional when issuing UDP
commands, and this preference is specified in the handshake packet.

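The 65,507-byte figure can be checked with a few lines of arithmetic. This is only a sketch: the constant and function names are made up for illustration, and the 20-byte IPv4 header assumes no IP options are present.

```python
# Hypothetical names; only the resulting 65,507 limit comes from the text.
IPV4_MAX_DATAGRAM = 65535  # largest value of the 16-bit IPv4 total-length field
IPV4_HEADER = 20           # minimum IPv4 header size (assumes no options)
UDP_HEADER = 8             # fixed UDP header size

MAX_UDP_PAYLOAD = IPV4_MAX_DATAGRAM - IPV4_HEADER - UDP_HEADER


def fits_in_udp(auth_packet, command_packet):
    """Return True if the bundled packets fit in one UDP datagram."""
    return len(auth_packet) + len(command_packet) <= MAX_UDP_PAYLOAD
```

In practice the usable size may be smaller than this upper bound, since it depends on the network hosts.
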
All sizes given throughout this document are in bytes. All multi-byte
binary objects, such as lengths and multi-byte bit-fields, are packed
in little-endian byte order.


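As a small illustration of little-endian packing, the sketch below uses Python's standard ``struct`` module. The values are arbitrary examples and are not taken from the protocol.

```python
import struct

# "<" selects little-endian with no padding; H is 2 bytes, I is 4 bytes.
two_byte = struct.pack("<H", 0x1234)   # least significant byte comes first
four_byte = struct.pack("<I", 1)

# Unpacking reverses the operation.
(value,) = struct.unpack("<H", two_byte)
```
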
Packet Sequence Overview
------------------------

The sequence of packets for a simple connection and command that
responds with an OK packet:

C: Command
S: OK

The sequence of packets for a simple connection and query command
with results:

C: Command
S: OK
S: Fields (optional, multiple packets)
S: Rows (multiple packets)
S: EOF

When authentication is required for a command, the server will ask
for it. For example:

C: Command
S: Authentication Required
C: Authentication Credentials
S: OK
S: Fields
S: Rows
S: EOF

The server will use the most recent credential information when
processing subsequent commands.

If a client wishes to multiplex commands on a single connection,
it can do so using the command identifiers. Here is an example of
how the packets could be ordered, but this will largely depend on
the server's ability to process the commands concurrently and the
processing time for each command.

C: Command (Command ID=1)
C: Command (Command ID=2)
S: OK (Command ID=2)
S: Fields (Command ID=2)
S: OK (Command ID=1)
S: Fields (Command ID=2)
S: Rows (Command ID=2)
S: EOF (Command ID=2)

As you can see, the commands may be executed with results generated
in any order, and the packets containing the results may be interleaved.


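A client that multiplexes commands has to sort incoming packets by command ID. The sketch below shows one way to do that, assuming each received packet has already been reduced to a ``(command_id, packet_type)`` pair. That representation and the ``demultiplex`` name are illustrative only and are not part of the protocol.

```python
from collections import defaultdict


def demultiplex(packets):
    """Group (command_id, packet_type) pairs by command ID in arrival order."""
    streams = defaultdict(list)
    for command_id, packet_type in packets:
        streams[command_id].append(packet_type)
    return dict(streams)


# Server packets in the order they might arrive on the wire.
wire_order = [
    (2, "OK"), (2, "Fields"), (1, "OK"),
    (2, "Fields"), (2, "Rows"), (2, "EOF"),
]
streams = demultiplex(wire_order)
```

Each per-command list can then be handed to its own result parser, regardless of how the server interleaved the packets.
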
Length Encoding
---------------

Some lengths used within the protocol packets are length encoded. This
means the size of the length field will vary between 1 and 9 bytes,
and is determined by the value of the first byte.

0-252 - Actual length
253 - NULL value (only applicable in row results)
254 - Following 8 bytes contain actual length
255 - Depends on context, usually signifies end


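The table above could be implemented roughly as follows. This is a sketch rather than a reference implementation: the function and constant names are hypothetical, and the context-dependent marker 255 is left to the caller.

```python
NULL_MARKER = 253     # NULL value (row results only)
LONG_MARKER = 254     # the following 8 bytes hold the length
CONTEXT_MARKER = 255  # meaning depends on context; not handled here


def encode_length(value):
    """Encode a length (or None for NULL) as 1 or 9 bytes."""
    if value is None:
        return bytes([NULL_MARKER])
    if not 0 <= value < 2 ** 64:
        raise ValueError("length out of range")
    if value <= 252:
        return bytes([value])
    # Multi-byte values are little-endian, like the rest of the protocol.
    return bytes([LONG_MARKER]) + value.to_bytes(8, "little")


def decode_length(buf, offset=0):
    """Return (value, bytes_consumed); value is None for NULL."""
    first = buf[offset]
    if first <= 252:
        return first, 1
    if first == NULL_MARKER:
        return None, 1
    if first == LONG_MARKER:
        if len(buf) - offset < 9:
            raise ValueError("truncated 8-byte length")
        return int.from_bytes(buf[offset + 1:offset + 9], "little"), 9
    raise ValueError("marker 255 is context dependent")
```
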
Packets
-------

Packets consist of two layers. The first is meant to be small,
simple, and have just enough information for fast router and proxy
processing. It consists of a fixed-size part, along with a
variable-sized client ID (explained later), a series of data chunks,
followed by a checksum at the end. With the chunked transfer encoding,
the sender does not have to pre-compute the packet data length before
sending, and packets of any size are supported. It also allows for a
large packet to be aborted gracefully (without having to close the
connection) in the event of an error.

+-------------------------------------------------------------------------+
+                                 32 Bits                                 +
+-------------------------------------------------------------------------+

+-----+----------------+----------------+---------------------------------+
|  0  |     Magic      |    Protocol    |           Command ID            |
+-----+----------------+----------------+---------------------------------+

+-----+---------------------------------+---------------------------------+
| 32  |      Command / Result Code      |        Client ID Length         |
+-----+---------------------------------+---------------------------------+

+-----+---------------------------------+---------------------------------+
| 64  |               Client ID (optional, variable length)               |
+-----+---------------------------------+---------------------------------+

+-----+---------------------------------+---------------------------------+
| 64+ |     Chunk Length and Value Pairs (optional, variable length)      |
+-----+---------------------------------+---------------------------------+

+-----+---------------------------------+---------------------------------+
| 64+ |        Chunk Length = 0         |                                 |
+-----+---------------------------------+---------------------------------+

+-----+---------------------------------+---------------------------------+
| 80+ |                             Checksum                              |
+-----+---------------------------------+---------------------------------+

The first part of a packet is:

1-byte Magic number, the value should be 0x44.

1-byte Protocol version, currently 1.

2-byte Command ID. This is a unique number among all other queries
currently being executed on the connection. The client is
responsible for choosing a unique number while generating a
command packet, and all response packets associated with that
command must have the same command ID. Once a command has been
completed, the client may reuse the ID.

2-byte Command/result code. For commands, this may be:

1 ECHO
  The entire packet is simply echoed back to the caller.
2 SET
  Set protocol options.
3 QUERY
  Execute query.
4 QUERY_RO
  Same as QUERY, but hints that this is a read-only
  query. This is only useful for routers/proxies that may want
  to redirect the request to a read slave.

Result codes may be:

1 OK
  Single packet success response. No data associated
  with the result besides parameters.
2 ERROR
  Single packet error response.
3 DATA
  Start of a multi-packet result set.
4 DATA_END
  Marks the end of a series of data packets. This is
  useful so a low-level router or proxy can know when a
  response is complete without inspecting the contents of
  the packets.

2-byte Client ID length.
X-byte Client ID (length is value of client ID length).
  The client ID is there for the client and routers/proxies to use.
  The server treats this as opaque data, and will only preserve it
  to send in responses. This can be used as a sharding key, to keep
  state information in a proxy, or any other use.

Next, zero or more chunks are given, terminated by a chunk length of
0. Each chunk consists of a length and then that amount of data.

2-byte Chunk length
X-byte Chunk (length is value of chunk length)

After the chunk length of 0 is given, a checksum value is given
that was computed for the entire packet.

4-byte Checksum

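Putting the first layer together, a packet could be serialized along these lines. This is a sketch under stated assumptions, not the library's implementation: ``build_packet`` and its arguments are hypothetical, and the checksum is assumed to be a CRC32 over every byte that precedes the checksum field, which the text above does not spell out.

```python
import struct
import zlib

MAGIC = 0x44          # magic number from the header description
PROTOCOL_VERSION = 1
MAX_CHUNK = 0xFFFF    # largest value a 2-byte chunk length can hold


def build_packet(command_id, code, data=b"", client_id=b"",
                 chunk_size=MAX_CHUNK):
    """Serialize one first-layer packet: header, chunks, checksum."""
    if not 1 <= chunk_size <= MAX_CHUNK:
        raise ValueError("chunk_size must be between 1 and 65535")
    # Fixed-size part: magic, version, command ID, code, client ID length.
    out = bytearray(struct.pack("<BBHHH", MAGIC, PROTOCOL_VERSION,
                                command_id, code, len(client_id)))
    out += client_id
    # Chunk length and value pairs.
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        out += struct.pack("<H", len(chunk)) + chunk
    out += struct.pack("<H", 0)  # a chunk length of 0 ends the chunk list
    # ASSUMPTION: CRC32 over all bytes that precede the checksum field.
    out += struct.pack("<I", zlib.crc32(bytes(out)))
    return bytes(out)


packet = build_packet(command_id=7, code=3, data=b"SELECT 1", client_id=b"ab")
small = build_packet(1, 1, b"abcdef", chunk_size=4)
```

Because each chunk carries its own length, a sender written this way could also emit chunks as data becomes available instead of buffering the whole payload first.
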
The second layer of the protocol is encapsulated inside of the
chunked encoding. This consists of zero or more packet parameters,
an end of parameter marker, followed by an optional data set that is
given until the end of a packet (or the end of all chunks).


Packet Parameters
-----------------

Packet parameter names are defined in a global namespace, although
not all parameters are relevant for all packet types. Parameters are
enumerated, and the name is specified with a 1-byte value representing
the enumerated name. Each packet parameter may have a value associated
with it, and each parameter defines the size and how that value is
given. The list of possible packet parameters is:

0 END_OF_PARAMETERS - Marks the end of a parameter list.

Parameters used for setting options:

1 AUTH - 1-byte value with authentication mechanism
  to use. Possible values are:
    0 - None.
    1 - MD5 on user and password.
    2 - 3-way handshake.
2 CHECKSUM - 1-byte value with preferred checksum
  type. Possible values are:
    0 - None.
    1 - CRC32.
3 COMPRESSION - 1-byte value with preferred compression
  type. Possible values are:
    0 - None.
    1 - zlib.
    2 - bzip2.
4 FIELD_ENCODING - 1-byte value with preferred field encoding
  type. Possible values are:
    0 - String.
    1 - Native.
5 FIELD_INFO - 1-byte value to determine if field information
  should be sent. Possible values are:
    0 - None.
    1 - Send field info.

(6-63 Reserved for future options that can be set)

Parameters used in responses:

64 STATUS - 4-byte bit field.
65 NUM_ROWS_AFFECTED - Length-encoded count of rows affected.
66 NUM_ROWS_SCANNED - Length-encoded count of rows scanned.
67 NUM_WARNINGS - Length-encoded count of warnings encountered.
68 INSERT_ID - Last insert ID.
69 ERROR_CODE - 4-byte error code.
70 ERROR_STRING - Length-encoded string.
71 SQL_STATE - Length-encoded string.
72 NUM_FIELDS - 4-byte integer.
73 FIELD_START - No value, starts a new set of field parameters.
74 FIELD_TYPE - 2-byte enumerated type.
75 FIELD_LENGTH - Length-encoded value.
76 FIELD_FLAGS - 4-byte bit-field.
77 DB_NAME - Length-encoded string.
78 TABLE_NAME - Length-encoded string.
79 ORIG_TABLE_NAME - Length-encoded string.
80 FIELD_NAME - Length-encoded string.
81 ORIG_FIELD_NAME - Length-encoded string.
82 DEFAULT_VALUE - Length-encoded string.

(83-255 Reserved for future response parameters)

"Length-encoded string" means a length-encoded value, followed by a
string of that length.


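To make the parameter layout concrete, here is a sketch that encodes the parameters of an ERROR response. The helper names are hypothetical, and two details are assumptions because the text does not specify them: strings are encoded as UTF-8, and ERROR_CODE is treated as an unsigned integer.

```python
import struct

END_OF_PARAMETERS = 0
ERROR_CODE = 69    # 4-byte error code
ERROR_STRING = 70  # length-encoded string


def encode_length(value):
    """Length encoding: 1 byte up to 252, else marker 254 plus 8 bytes."""
    if value <= 252:
        return bytes([value])
    return b"\xfe" + value.to_bytes(8, "little")


def encode_string(text):
    """Length-encoded string: a length-encoded value, then the bytes."""
    data = text.encode("utf-8")  # ASSUMPTION: character set is not specified
    return encode_length(len(data)) + data


def encode_error_parameters(code, message):
    """Each parameter is a 1-byte ID followed by its value."""
    return (bytes([ERROR_CODE]) + struct.pack("<I", code)
            + bytes([ERROR_STRING]) + encode_string(message)
            + bytes([END_OF_PARAMETERS]))


params = encode_error_parameters(1064, "bad")
```
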
Command
-------

Inside of the chunked data, command packets consist of zero or more
parameters depending on which options are being set, followed by
an end of parameter marker, and then all data until the end of the
chunks is considered arguments for the command. For a QUERY, this
will be the actual query to run.


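As an illustration, the chunked data of a QUERY command might be assembled as sketched below. The function name is hypothetical, the query text is assumed to be UTF-8, and only the 1-byte option parameters (IDs 1 through 5) are handled.

```python
END_OF_PARAMETERS = 0
FIELD_ENCODING = 4  # 1-byte value: 0 = string, 1 = native
FIELD_INFO = 5      # 1-byte value: 0 = none, 1 = send field info


def build_query_payload(query, options=None):
    """Return option parameters, the end marker, then the query text."""
    payload = bytearray()
    for param_id, value in (options or {}).items():
        if not 1 <= param_id <= 5:
            raise ValueError("only the 1-byte option parameters are handled")
        if not 0 <= value <= 255:
            raise ValueError("option value must fit in 1 byte")
        payload += bytes([param_id, value])
    payload.append(END_OF_PARAMETERS)
    payload += query.encode("utf-8")  # ASSUMPTION: encoding is not specified
    return bytes(payload)


payload = build_query_payload("SELECT 1", {FIELD_ENCODING: 1, FIELD_INFO: 1})
```

The resulting bytes would then be split into chunks by the first protocol layer.
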
OK/ERROR
--------

The server responds with an OK or ERROR if no row data is given. A
list of parameters may follow, and the list is terminated by an end of
parameter value.


DATA
----

A data packet consists of a series of parameters, followed by the end
of parameter marker, and then a series of length-encoded values holding
field values. The NUM_FIELDS parameter must be given before any values,
as this indicates where each new row starts. The field values may be
either in string format or in a native data type, depending on the value
of FIELD_ENCODING.

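The sketch below shows how a client might split the value section of a DATA packet into rows. It rests on assumptions the text does not confirm: each field value is a length-encoded length followed by that many bytes, 253 marks a NULL field, and rows do not span packets. ``decode_rows`` is a hypothetical name.

```python
def decode_rows(data, num_fields):
    """Split length-prefixed field values into rows of num_fields each."""
    if num_fields < 1:
        raise ValueError("num_fields must be positive")
    rows, row, pos = [], [], 0
    while pos < len(data):
        first = data[pos]
        pos += 1
        if first == 253:                 # NULL value
            row.append(None)
        else:
            if first == 254:             # 8-byte little-endian length
                length = int.from_bytes(data[pos:pos + 8], "little")
                pos += 8
            elif first == 255:
                raise ValueError("marker 255 is context dependent")
            else:
                length = first
            if pos + length > len(data):
                raise ValueError("truncated field value")
            row.append(data[pos:pos + length])
            pos += length
        if len(row) == num_fields:       # NUM_FIELDS marks the row boundary
            rows.append(row)
            row = []
    if row:
        raise ValueError("incomplete row")
    return rows


sample = bytes([1]) + b"1" + bytes([3]) + b"abc" + bytes([1]) + b"2" + bytes([253])
rows = decode_rows(sample, 2)
```
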
There may be multiple rows inside of a single DATA result packet. In
the case of large result sets, the result should be split into multiple
DATA packets since other concurrent commands on the connection will
block if a single large packet is sent. Breaking the resulting rows
into multiple DATA packets allows other commands to send interleaved
response packets.
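
One way a server could split a large result set is to group already-encoded rows into bounded batches, each sent as its own DATA packet. The size limit below is a hypothetical tuning knob; the protocol itself does not define one.

```python
def batch_rows(encoded_rows, max_payload):
    """Group encoded rows into batches of at most max_payload bytes."""
    batches, current = [], bytearray()
    for row in encoded_rows:
        if current and len(current) + len(row) > max_payload:
            batches.append(bytes(current))
            current = bytearray()
        current += row  # a single oversized row still gets its own batch
    if current:
        batches.append(bytes(current))
    return batches


batches = batch_rows([b"aaaa", b"bb", b"cccc", b"d"], max_payload=6)
```

Between batches, the server is free to send packets that belong to other command IDs.
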