1
/* -*- mode: c++; c-basic-offset: 2; indent-tabs-mode: nil; -*-
2
* vim:expandtab:shiftwidth=2:tabstop=2:smarttab:
4
* Copyright (C) 2008-2009 Sun Microsystems, Inc.
6
* This program is free software; you can redistribute it and/or modify
7
* it under the terms of the GNU General Public License as published by
8
* the Free Software Foundation; version 2 of the License.
10
* This program is distributed in the hope that it will be useful,
11
* but WITHOUT ANY WARRANTY; without even the implied warranty of
12
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
13
* GNU General Public License for more details.
15
* You should have received a copy of the GNU General Public License
16
* along with this program; if not, write to the Free Software
17
* Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
20
#ifndef DRIZZLED_OPTIMIZER_SEL_ARG_H
21
#define DRIZZLED_OPTIMIZER_SEL_ARG_H
32
A construction block of the SEL_ARG-graph.
34
The following description only covers graphs of SEL_ARG objects with
35
sel_arg->type==KEY_RANGE:
37
One SEL_ARG object represents an "elementary interval" in form
39
min_value <=? table.keypartX <=? max_value
41
The interval is a non-empty interval of any kind: with[out] minimum/maximum
42
bound, [half]open/closed, single-point interval, etc.
44
1. SEL_ARG GRAPH STRUCTURE
46
SEL_ARG objects are linked together in a graph. The meaning of the graph
47
is better demostrated by an example:
52
| part=1 $ part=2 $ part=3
54
| +-------+ $ +-------+ $ +--------+
55
| | kp1<1 |--$-->| kp2=5 |--$-->| kp3=10 |
56
| +-------+ $ +-------+ $ +--------+
62
\->| kp1=2 |--$--------------$-+
63
+-------+ $ $ | +--------+
65
+-------+ $ $ | +--------+
66
| kp1=3 |--$--------------$-+ |
67
+-------+ $ $ +--------+
71
The entire graph is partitioned into "interval lists".
73
An interval list is a sequence of ordered disjoint intervals over the same
74
key part. SEL_ARG are linked via "next" and "prev" pointers. Additionally,
75
all intervals in the list form an RB-tree, linked via left/right/parent
76
pointers. The RB-tree root SEL_ARG object will be further called "root of the
79
In the example pic, there are 4 interval lists:
80
"kp<1 OR kp1=2 OR kp1=3", "kp2=5", "kp3=10 OR kp3=12", "kp3=11 OR kp3=13".
81
The vertical lines represent SEL_ARG::next/prev pointers.
83
In an interval list, each member X may have SEL_ARG::next_key_part pointer
84
pointing to the root of another interval list Y. The pointed interval list
85
must cover a key part with greater number (i.e. Y->part > X->part).
87
In the example pic, the next_key_part pointers are represented by
90
2. SEL_ARG GRAPH SEMANTICS
92
It represents a condition in a special form (we don't have a name for it ATM)
93
The SEL_ARG::next/prev is "OR", and next_key_part is "AND".
95
For example, the picture represents the condition in form:
96
(kp1 < 1 AND kp2=5 AND (kp3=10 OR kp3=12)) OR
97
(kp1=2 AND (kp3=11 OR kp3=14)) OR
98
(kp1=3 AND (kp3=11 OR kp3=14))
103
Use get_mm_tree() to construct SEL_ARG graph from WHERE condition.
104
Then walk the SEL_ARG graph and get a list of dijsoint ordered key
105
intervals (i.e. intervals in form
107
(constA1, .., const1_K) < (keypart1,.., keypartK) < (constB1, .., constB_K)
109
Those intervals can be used to access the index. The uses are in:
110
- check_quick_select() - Walk the SEL_ARG graph and find an estimate of
111
how many table records are contained within all
113
- get_quick_select() - Walk the SEL_ARG, materialize the key intervals,
114
and create QuickRangeSelect object that will
115
read records within these intervals.
117
4. SPACE COMPLEXITY NOTES
119
SEL_ARG graph is a representation of an ordered disjoint sequence of
120
intervals over the ordered set of index tuple values.
122
For multi-part keys, one can construct a WHERE expression such that its
123
list of intervals will be of combinatorial size. Here is an example:
125
(keypart1 IN (1,2, ..., n1)) AND
126
(keypart2 IN (1,2, ..., n2)) AND
127
(keypart3 IN (1,2, ..., n3))
129
For this WHERE clause the list of intervals will have n1*n2*n3 intervals
132
(keypart1, keypart2, keypart3) = (k1, k2, k3), where 1 <= k{i} <= n{i}
134
SEL_ARG graph structure aims to reduce the amount of required space by
135
"sharing" the elementary intervals when possible (the pic at the
136
beginning of this comment has examples of such sharing). The sharing may
137
prevent combinatorial blowup:
139
There are WHERE clauses that have combinatorial-size interval lists but
140
will be represented by a compact SEL_ARG graph.
142
(keypartN IN (1,2, ..., n1)) AND
144
(keypart2 IN (1,2, ..., n2)) AND
145
(keypart1 IN (1,2, ..., n3))
147
but not in all cases:
149
- There are WHERE clauses that do have a compact SEL_ARG-graph
150
representation but get_mm_tree() and its callees will construct a
151
graph of combinatorial size.
153
(keypart1 IN (1,2, ..., n1)) AND
154
(keypart2 IN (1,2, ..., n2)) AND
156
(keypartN IN (1,2, ..., n3))
158
- There are WHERE clauses for which the minimal possible SEL_ARG graph
159
representation will have combinatorial size.
161
By induction: Let's take any interval on some keypart in the middle:
165
Then let's AND it with this interval 'structure' from preceding and
168
(kp14=c1 AND kp16=c3) OR keypart14=c2) (*)
170
We will obtain this SEL_ARG graph:
174
+---------+ $ +---------+ $ +---------+
175
| kp14=c1 |--$-->| kp15=c0 |--$-->| kp16=c3 |
176
+---------+ $ +---------+ $ +---------+
178
+---------+ $ +---------+ $
179
| kp14=c2 |--$-->| kp15=c0 | $
180
+---------+ $ +---------+ $
183
Note that we had to duplicate "kp15=c0" and there was no way to avoid
185
The induction step: AND the obtained expression with another "wrapping"
187
When the process ends because of the limit on max. number of keyparts
190
WHERE clause length is O(3*#max_keyparts)
191
SEL_ARG graph size is O(2^(#max_keyparts/2))
193
(it is also possible to construct a case where instead of 2 in 2^n we
194
have a bigger constant, e.g. 4, and get a graph with 4^(31/2)= 2^31
197
We avoid consuming too much memory by setting a limit on the number of
198
SEL_ARG object we can construct during one range analysis invocation.
201
class SEL_ARG :public memory::SqlAlloc
204
uint8_t min_flag,max_flag,maybe_flag;
205
uint8_t part; // Which key part
208
Number of children of this element in the RB-tree, plus 1 for this
213
Valid only for elements which are RB-tree roots: Number of times this
214
RB-tree is referred to (it is referred by SEL_ARG::next_key_part or by
215
SEL_TREE::keys[i] or by a temporary SEL_ARG* variable)
220
unsigned char *min_value,*max_value; // Pointer to range
223
eq_tree() requires that left == right == 0 if the type is MAYBE_KEY.
225
SEL_ARG *left,*right; /* R-B tree children */
226
SEL_ARG *next,*prev; /* Links for bi-directional interval list */
227
SEL_ARG *parent; /* R-B tree parent */
228
SEL_ARG *next_key_part;
229
enum leaf_color { BLACK,RED } color;
230
enum Type { IMPOSSIBLE, MAYBE, MAYBE_KEY, KEY_RANGE } type;
241
SEL_ARG(Field *,const unsigned char *, const unsigned char *);
243
SEL_ARG(Field *field,
245
unsigned char *min_value,
246
unsigned char *max_value,
251
SEL_ARG(enum Type type_arg)
263
inline bool is_same(SEL_ARG *arg)
265
if (type != arg->type || part != arg->part)
267
if (type != KEY_RANGE)
269
return (cmp_min_to_min(arg) == 0 && cmp_max_to_max(arg) == 0);
272
inline void merge_flags(SEL_ARG *arg)
274
maybe_flag|= arg->maybe_flag;
277
inline void maybe_smaller()
282
/* Return true iff it's a single-point null interval */
283
inline bool is_null_interval()
285
return (maybe_null && max_value[0] == 1);
288
inline int cmp_min_to_min(SEL_ARG *arg)
290
return sel_cmp(field,min_value, arg->min_value, min_flag, arg->min_flag);
293
inline int cmp_min_to_max(SEL_ARG *arg)
295
return sel_cmp(field,min_value, arg->max_value, min_flag, arg->max_flag);
298
inline int cmp_max_to_max(SEL_ARG *arg)
300
return sel_cmp(field,max_value, arg->max_value, max_flag, arg->max_flag);
303
inline int cmp_max_to_min(SEL_ARG *arg)
305
return sel_cmp(field,max_value, arg->min_value, max_flag, arg->min_flag);
308
SEL_ARG *clone_and(SEL_ARG *arg);
310
SEL_ARG *clone_first(SEL_ARG *arg);
312
SEL_ARG *clone_last(SEL_ARG *arg);
314
SEL_ARG *clone(RangeParameter *param, SEL_ARG *new_parent, SEL_ARG **next);
316
bool copy_min(SEL_ARG *arg);
318
bool copy_max(SEL_ARG *arg);
320
void copy_min_to_min(SEL_ARG *arg);
322
void copy_min_to_max(SEL_ARG *arg);
324
void copy_max_to_min(SEL_ARG *arg);
326
/* returns a number of keypart values (0 or 1) appended to the key buffer */
327
int store_min(uint32_t length, unsigned char **min_key, uint32_t min_key_flag);
329
/* returns a number of keypart values (0 or 1) appended to the key buffer */
330
int store_max(uint32_t length, unsigned char **max_key, uint32_t max_key_flag);
332
/* returns a number of keypart values appended to the key buffer */
333
int store_min_key(KEY_PART *key, unsigned char **range_key, uint32_t *range_key_flag);
335
/* returns a number of keypart values appended to the key buffer */
336
int store_max_key(KEY_PART *key, unsigned char **range_key, uint32_t *range_key_flag);
338
SEL_ARG *insert(SEL_ARG *key);
339
SEL_ARG *tree_delete(SEL_ARG *key);
340
SEL_ARG *find_range(SEL_ARG *key);
341
SEL_ARG *rb_insert(SEL_ARG *leaf);
343
friend SEL_ARG *rb_delete_fixup(SEL_ARG *root,SEL_ARG *key, SEL_ARG *par);
351
inline bool simple_key()
353
return (! next_key_part && elements == 1);
356
void increment_use_count(long count)
360
next_key_part->use_count+= count;
361
count*= (next_key_part->use_count - count);
362
for (SEL_ARG *pos= next_key_part->first(); pos; pos= pos->next)
363
if (pos->next_key_part)
364
pos->increment_use_count(count);
370
for (SEL_ARG *pos= first(); pos; pos= pos->next)
371
if (pos->next_key_part)
373
pos->next_key_part->use_count--;
374
pos->next_key_part->free_tree();
378
inline SEL_ARG **parent_ptr()
380
return parent->left == this ? &parent->left : &parent->right;
385
Check if this SEL_ARG object represents a single-point interval
391
Check if this SEL_ARG object (not tree) represents a single-point
392
interval, i.e. if it represents a "keypart = const" or
396
true This SEL_ARG object represents a singlepoint interval
400
bool is_singlepoint()
403
Check for NEAR_MIN ("strictly less") and NO_MIN_RANGE (-inf < field)
404
flags, and the same for right edge.
406
if (min_flag || max_flag)
408
unsigned char *min_val= min_value;
409
unsigned char *max_val= max_value;
413
/* First byte is a NULL value indicator */
414
if (*min_val != *max_val)
418
return true; /* This "x IS NULL" */
422
return ! field->key_cmp(min_val, max_val);
425
SEL_ARG *clone_tree(RangeParameter *param);
430
Check if a compare is ok, when one takes ranges in account
431
Returns -2 or 2 if the ranges where 'joined' like < 2 and >= 2
433
int sel_cmp(Field *in_field,
440
/* First check if there was a compare to a min or max element */
441
if (a_flag & (NO_MIN_RANGE | NO_MAX_RANGE))
443
if ((a_flag & (NO_MIN_RANGE | NO_MAX_RANGE)) ==
444
(b_flag & (NO_MIN_RANGE | NO_MAX_RANGE)))
446
return (a_flag & NO_MIN_RANGE) ? -1 : 1;
448
if (b_flag & (NO_MIN_RANGE | NO_MAX_RANGE))
449
return (b_flag & NO_MIN_RANGE) ? 1 : -1;
451
if (in_field->real_maybe_null()) // If null is part of key
458
goto end; // NULL where equal
459
a++; b++; // Skip NULL marker
461
cmp= in_field->key_cmp(a , b);
462
if (cmp) return cmp < 0 ? -1 : 1; // The values differed
464
// Check if the compared equal arguments was defined with open/closed range
466
if (a_flag & (NEAR_MIN | NEAR_MAX))
468
if ((a_flag & (NEAR_MIN | NEAR_MAX)) == (b_flag & (NEAR_MIN | NEAR_MAX)))
470
if (! (b_flag & (NEAR_MIN | NEAR_MAX)))
471
return (a_flag & NEAR_MIN) ? 2 : -2;
472
return (a_flag & NEAR_MIN) ? 1 : -1;
474
if (b_flag & (NEAR_MIN | NEAR_MAX))
475
return (b_flag & NEAR_MIN) ? -2 : 2;
476
return 0; // The elements where equal
482
SEL_ARG *rb_delete_fixup(SEL_ARG *root,
486
extern SEL_ARG null_element;
488
} /* namespace optimizer */
490
} /* namespace drizzled */
492
#endif /* DRIZZLED_OPTIMIZER_SEL_ARG_H */