1
/* -*- mode: c++; c-basic-offset: 2; indent-tabs-mode: nil; -*-
2
* vim:expandtab:shiftwidth=2:tabstop=2:smarttab:
4
* Copyright (C) 2008-2009 Sun Microsystems
6
* This program is free software; you can redistribute it and/or modify
7
* it under the terms of the GNU General Public License as published by
8
* the Free Software Foundation; version 2 of the License.
10
* This program is distributed in the hope that it will be useful,
11
* but WITHOUT ANY WARRANTY; without even the implied warranty of
12
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
13
* GNU General Public License for more details.
15
* You should have received a copy of the GNU General Public License
16
* along with this program; if not, write to the Free Software
17
* Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
20
#ifndef DRIZZLED_OPTIMIZER_SEL_ARG_H
21
#define DRIZZLED_OPTIMIZER_SEL_ARG_H
32
A construction block of the SEL_ARG-graph.
34
The following description only covers graphs of SEL_ARG objects with
35
sel_arg->type==KEY_RANGE:
37
One SEL_ARG object represents an "elementary interval" in form
39
min_value <=? table.keypartX <=? max_value
41
The interval is a non-empty interval of any kind: with[out] minimum/maximum
42
bound, [half]open/closed, single-point interval, etc.
44
1. SEL_ARG GRAPH STRUCTURE
46
SEL_ARG objects are linked together in a graph. The meaning of the graph
47
is better demostrated by an example:
52
| part=1 $ part=2 $ part=3
54
| +-------+ $ +-------+ $ +--------+
55
| | kp1<1 |--$-->| kp2=5 |--$-->| kp3=10 |
56
| +-------+ $ +-------+ $ +--------+
62
\->| kp1=2 |--$--------------$-+
63
+-------+ $ $ | +--------+
65
+-------+ $ $ | +--------+
66
| kp1=3 |--$--------------$-+ |
67
+-------+ $ $ +--------+
71
The entire graph is partitioned into "interval lists".
73
An interval list is a sequence of ordered disjoint intervals over the same
74
key part. SEL_ARG are linked via "next" and "prev" pointers. Additionally,
75
all intervals in the list form an RB-tree, linked via left/right/parent
76
pointers. The RB-tree root SEL_ARG object will be further called "root of the
79
In the example pic, there are 4 interval lists:
80
"kp<1 OR kp1=2 OR kp1=3", "kp2=5", "kp3=10 OR kp3=12", "kp3=11 OR kp3=13".
81
The vertical lines represent SEL_ARG::next/prev pointers.
83
In an interval list, each member X may have SEL_ARG::next_key_part pointer
84
pointing to the root of another interval list Y. The pointed interval list
85
must cover a key part with greater number (i.e. Y->part > X->part).
87
In the example pic, the next_key_part pointers are represented by
90
2. SEL_ARG GRAPH SEMANTICS
92
It represents a condition in a special form (we don't have a name for it ATM)
93
The SEL_ARG::next/prev is "OR", and next_key_part is "AND".
95
For example, the picture represents the condition in form:
96
(kp1 < 1 AND kp2=5 AND (kp3=10 OR kp3=12)) OR
97
(kp1=2 AND (kp3=11 OR kp3=14)) OR
98
(kp1=3 AND (kp3=11 OR kp3=14))
103
Use get_mm_tree() to construct SEL_ARG graph from WHERE condition.
104
Then walk the SEL_ARG graph and get a list of dijsoint ordered key
105
intervals (i.e. intervals in form
107
(constA1, .., const1_K) < (keypart1,.., keypartK) < (constB1, .., constB_K)
109
Those intervals can be used to access the index. The uses are in:
110
- check_quick_select() - Walk the SEL_ARG graph and find an estimate of
111
how many table records are contained within all
113
- get_quick_select() - Walk the SEL_ARG, materialize the key intervals,
114
and create QuickRangeSelect object that will
115
read records within these intervals.
117
4. SPACE COMPLEXITY NOTES
119
SEL_ARG graph is a representation of an ordered disjoint sequence of
120
intervals over the ordered set of index tuple values.
122
For multi-part keys, one can construct a WHERE expression such that its
123
list of intervals will be of combinatorial size. Here is an example:
125
(keypart1 IN (1,2, ..., n1)) AND
126
(keypart2 IN (1,2, ..., n2)) AND
127
(keypart3 IN (1,2, ..., n3))
129
For this WHERE clause the list of intervals will have n1*n2*n3 intervals
132
(keypart1, keypart2, keypart3) = (k1, k2, k3), where 1 <= k{i} <= n{i}
134
SEL_ARG graph structure aims to reduce the amount of required space by
135
"sharing" the elementary intervals when possible (the pic at the
136
beginning of this comment has examples of such sharing). The sharing may
137
prevent combinatorial blowup:
139
There are WHERE clauses that have combinatorial-size interval lists but
140
will be represented by a compact SEL_ARG graph.
142
(keypartN IN (1,2, ..., n1)) AND
144
(keypart2 IN (1,2, ..., n2)) AND
145
(keypart1 IN (1,2, ..., n3))
147
but not in all cases:
149
- There are WHERE clauses that do have a compact SEL_ARG-graph
150
representation but get_mm_tree() and its callees will construct a
151
graph of combinatorial size.
153
(keypart1 IN (1,2, ..., n1)) AND
154
(keypart2 IN (1,2, ..., n2)) AND
156
(keypartN IN (1,2, ..., n3))
158
- There are WHERE clauses for which the minimal possible SEL_ARG graph
159
representation will have combinatorial size.
161
By induction: Let's take any interval on some keypart in the middle:
165
Then let's AND it with this interval 'structure' from preceding and
168
(kp14=c1 AND kp16=c3) OR keypart14=c2) (*)
170
We will obtain this SEL_ARG graph:
174
+---------+ $ +---------+ $ +---------+
175
| kp14=c1 |--$-->| kp15=c0 |--$-->| kp16=c3 |
176
+---------+ $ +---------+ $ +---------+
178
+---------+ $ +---------+ $
179
| kp14=c2 |--$-->| kp15=c0 | $
180
+---------+ $ +---------+ $
183
Note that we had to duplicate "kp15=c0" and there was no way to avoid
185
The induction step: AND the obtained expression with another "wrapping"
187
When the process ends because of the limit on max. number of keyparts
190
WHERE clause length is O(3*#max_keyparts)
191
SEL_ARG graph size is O(2^(#max_keyparts/2))
193
(it is also possible to construct a case where instead of 2 in 2^n we
194
have a bigger constant, e.g. 4, and get a graph with 4^(31/2)= 2^31
197
We avoid consuming too much memory by setting a limit on the number of
198
SEL_ARG object we can construct during one range analysis invocation.
201
class SEL_ARG :public memory::SqlAlloc
207
uint8_t part; // Which key part
210
Number of children of this element in the RB-tree, plus 1 for this
215
Valid only for elements which are RB-tree roots: Number of times this
216
RB-tree is referred to (it is referred by SEL_ARG::next_key_part or by
217
SEL_TREE::keys[i] or by a temporary SEL_ARG* variable)
222
unsigned char *min_value,*max_value; // Pointer to range
225
eq_tree() requires that left == right == 0 if the type is MAYBE_KEY.
227
SEL_ARG *left,*right; /* R-B tree children */
228
SEL_ARG *next,*prev; /* Links for bi-directional interval list */
229
SEL_ARG *parent; /* R-B tree parent */
230
SEL_ARG *next_key_part;
255
SEL_ARG(Field *,const unsigned char *, const unsigned char *);
257
SEL_ARG(Field *field,
259
unsigned char *min_value,
260
unsigned char *max_value,
265
SEL_ARG(enum Type type_arg)
277
inline bool is_same(SEL_ARG *arg)
279
if (type != arg->type || part != arg->part)
281
if (type != KEY_RANGE)
283
return (cmp_min_to_min(arg) == 0 && cmp_max_to_max(arg) == 0);
286
inline void merge_flags(SEL_ARG *arg)
288
maybe_flag|= arg->maybe_flag;
291
inline void maybe_smaller()
296
/* Return true iff it's a single-point null interval */
297
inline bool is_null_interval()
299
return (maybe_null && max_value[0] == 1);
302
inline int cmp_min_to_min(SEL_ARG *arg)
304
return sel_cmp(field,min_value, arg->min_value, min_flag, arg->min_flag);
307
inline int cmp_min_to_max(SEL_ARG *arg)
309
return sel_cmp(field,min_value, arg->max_value, min_flag, arg->max_flag);
312
inline int cmp_max_to_max(SEL_ARG *arg)
314
return sel_cmp(field,max_value, arg->max_value, max_flag, arg->max_flag);
317
inline int cmp_max_to_min(SEL_ARG *arg)
319
return sel_cmp(field,max_value, arg->min_value, max_flag, arg->min_flag);
322
SEL_ARG *clone_and(SEL_ARG *arg);
324
SEL_ARG *clone_first(SEL_ARG *arg);
326
SEL_ARG *clone_last(SEL_ARG *arg);
328
SEL_ARG *clone(RangeParameter *param, SEL_ARG *new_parent, SEL_ARG **next);
330
bool copy_min(SEL_ARG *arg);
332
bool copy_max(SEL_ARG *arg);
334
void copy_min_to_min(SEL_ARG *arg);
336
void copy_min_to_max(SEL_ARG *arg);
338
void copy_max_to_min(SEL_ARG *arg);
340
/* returns a number of keypart values (0 or 1) appended to the key buffer */
341
int store_min(uint32_t length, unsigned char **min_key, uint32_t min_key_flag);
343
/* returns a number of keypart values (0 or 1) appended to the key buffer */
344
int store_max(uint32_t length, unsigned char **max_key, uint32_t max_key_flag);
346
/* returns a number of keypart values appended to the key buffer */
347
int store_min_key(KEY_PART *key, unsigned char **range_key, uint32_t *range_key_flag);
349
/* returns a number of keypart values appended to the key buffer */
350
int store_max_key(KEY_PART *key, unsigned char **range_key, uint32_t *range_key_flag);
352
SEL_ARG *insert(SEL_ARG *key);
353
SEL_ARG *tree_delete(SEL_ARG *key);
354
SEL_ARG *find_range(SEL_ARG *key);
355
SEL_ARG *rb_insert(SEL_ARG *leaf);
357
friend SEL_ARG *rb_delete_fixup(SEL_ARG *root,SEL_ARG *key, SEL_ARG *par);
365
inline bool simple_key()
367
return (! next_key_part && elements == 1);
370
void increment_use_count(long count)
374
next_key_part->use_count+= count;
375
count*= (next_key_part->use_count - count);
376
for (SEL_ARG *pos= next_key_part->first(); pos; pos= pos->next)
377
if (pos->next_key_part)
378
pos->increment_use_count(count);
384
for (SEL_ARG *pos= first(); pos; pos= pos->next)
385
if (pos->next_key_part)
387
pos->next_key_part->use_count--;
388
pos->next_key_part->free_tree();
392
inline SEL_ARG **parent_ptr()
394
return parent->left == this ? &parent->left : &parent->right;
399
Check if this SEL_ARG object represents a single-point interval
405
Check if this SEL_ARG object (not tree) represents a single-point
406
interval, i.e. if it represents a "keypart = const" or
410
true This SEL_ARG object represents a singlepoint interval
414
bool is_singlepoint()
417
Check for NEAR_MIN ("strictly less") and NO_MIN_RANGE (-inf < field)
418
flags, and the same for right edge.
420
if (min_flag || max_flag)
422
unsigned char *min_val= min_value;
423
unsigned char *max_val= max_value;
427
/* First byte is a NULL value indicator */
428
if (*min_val != *max_val)
432
return true; /* This "x IS NULL" */
436
return ! field->key_cmp(min_val, max_val);
439
SEL_ARG *clone_tree(RangeParameter *param);
444
Check if a compare is ok, when one takes ranges in account
445
Returns -2 or 2 if the ranges where 'joined' like < 2 and >= 2
447
int sel_cmp(Field *in_field,
454
/* First check if there was a compare to a min or max element */
455
if (a_flag & (NO_MIN_RANGE | NO_MAX_RANGE))
457
if ((a_flag & (NO_MIN_RANGE | NO_MAX_RANGE)) ==
458
(b_flag & (NO_MIN_RANGE | NO_MAX_RANGE)))
460
return (a_flag & NO_MIN_RANGE) ? -1 : 1;
462
if (b_flag & (NO_MIN_RANGE | NO_MAX_RANGE))
463
return (b_flag & NO_MIN_RANGE) ? 1 : -1;
465
if (in_field->real_maybe_null()) // If null is part of key
472
goto end; // NULL where equal
473
a++; b++; // Skip NULL marker
475
cmp= in_field->key_cmp(a , b);
476
if (cmp) return cmp < 0 ? -1 : 1; // The values differed
478
// Check if the compared equal arguments was defined with open/closed range
480
if (a_flag & (NEAR_MIN | NEAR_MAX))
482
if ((a_flag & (NEAR_MIN | NEAR_MAX)) == (b_flag & (NEAR_MIN | NEAR_MAX)))
484
if (! (b_flag & (NEAR_MIN | NEAR_MAX)))
485
return (a_flag & NEAR_MIN) ? 2 : -2;
486
return (a_flag & NEAR_MIN) ? 1 : -1;
488
if (b_flag & (NEAR_MIN | NEAR_MAX))
489
return (b_flag & NEAR_MIN) ? -2 : 2;
490
return 0; // The elements where equal
496
SEL_ARG *rb_delete_fixup(SEL_ARG *root,
500
extern SEL_ARG null_element;
502
} /* namespace optimizer */
504
} /* namespace drizzled */
506
#endif /* DRIZZLED_OPTIMIZER_SEL_ARG_H */