1
/* Copyright (C) 2000-2002 MySQL AB
2
Copyright (C) 2008 eBay, Inc
4
This program is free software; you can redistribute it and/or modify
5
it under the terms of the GNU General Public License as published by
6
the Free Software Foundation; version 2 of the License.
8
This program is distributed in the hope that it will be useful,
9
but WITHOUT ANY WARRANTY; without even the implied warranty of
10
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
11
GNU General Public License for more details.
13
You should have received a copy of the GNU General Public License
14
along with this program; if not, write to the Free Software
15
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */
17
/* Implements various base dataspace-related functions - allocate, free, clear */
23
MySQL Heap tables keep data in arrays of fixed-size chunks.
24
These chunks are organized into two groups of HP_BLOCK structures:
25
- group1 contains indexes, with one HP_BLOCK per key
27
- group2 contains record data, with single HP_BLOCK
28
for all records, referenced by HP_SHARE.recordspace.block
30
While columns used in index are usually small, other columns
31
in the table may need to accomodate larger data. Typically,
32
larger data is placed into VARCHAR or BLOB columns. With actual
33
sizes varying, Heap Engine has to support variable-sized records
34
in memory. Heap Engine implements the concept of dataspace
35
(HP_DATASPACE), which incorporates HP_BLOCK for the record data,
36
and adds more information for managing variable-sized records.
38
Variable-size records are stored in multiple "chunks",
39
which means that a single record of data (database "row") can
40
consist of multiple chunks organized into one "set". HP_BLOCK
41
contains chunks. In variable-size format, one record
42
is represented as one or many chunks, depending on the actual
43
data, while in fixed-size mode, one record is always represented
44
as one chunk. The index structures would always point to the first
45
chunk in the chunkset.
47
At the time of table creation, Heap Engine attempts to find out
48
if variable-size records are desired. A user can request
49
variable-size records by providing either row_type=dynamic or
50
block_size=NNN table create option. Heap Engine will check
51
whether block_size provides enough space in the first chunk
52
to keep all null bits and columns that are used in indexes.
53
If block_size is too small, table creation will be aborted
54
with an error. Heap Engine will revert to fixed-size allocation
55
mode if block_size provides no memory benefits (no VARCHAR
56
fields extending past first chunk).
58
In order to improve index search performance, Heap Engine needs
59
to keep all null flags and all columns used as keys inside
60
the first chunk of a chunkset. In particular, this means that
61
all columns used as keys should be defined first in the table
62
creation SQL. The length of data used by null bits and key columns
63
is stored as fixed_data_length inside HP_SHARE. fixed_data_length
64
will extend past last key column if more fixed-length fields can
65
fit into the first chunk.
67
Variable-size records are necessary only in the presence
68
of variable-size columns. Heap Engine will be looking for VARCHAR
69
columns, which declare length of 32 or more. If no such columns
70
are found, table will be switched to fixed-size format. You should
71
always try to put such columns at the end of the table definition.
73
Whenever data is being inserted or updated in the table
74
Heap Engine will calculate how many chunks are necessary.
75
For insert operations, Heap Engine allocates new chunkset in
76
the recordspace. For update operations it will modify length of
77
the existing chunkset, unlinking unnecessary chunks at the end,
78
or allocating and adding more if larger length is necessary.
80
When writing data to chunks or copying data back to record,
81
Heap Engine will first copy fixed_data_length of data using single
82
memcpy call. The rest of the columns are processed one-by-one.
83
Non-VARCHAR columns are copied in their full format. VARCHAR's
84
are copied based on their actual length. Any NULL values after
85
fixed_data_length are skipped.
87
The allocation and contents of the actual chunks varies between
88
fixed and variable-size modes. Total chunk length is always
89
aligned to the next sizeof(uchar*). Here is the format of
91
uchar[] - sizeof=chunk_dataspace_length, but at least
92
sizeof(uchar*) bytes. Keeps actual data or pointer
93
to the next deleted chunk.
94
chunk_dataspace_length equals to full record length
95
uchar - status field (1 means "in use", 0 means "deleted")
96
Variable-size uses different format:
97
uchar[] - sizeof=chunk_dataspace_length, but at least
98
sizeof(uchar*) bytes. Keeps actual data or pointer
99
to the next deleted chunk.
100
chunk_dataspace_length is set according to table
102
uchar* - pointer to the next chunk in this chunkset,
103
or NULL for the last chunk
104
uchar - status field (1 means "first", 0 means "deleted",
107
When allocating a new chunkset of N chunks, Heap Engine will try
108
to allocate chunks one-by-one, linking them as they become
109
allocated. Allocation of a single chunk will attempt to reuse
110
a deleted (freed) chunk. If no free chunks are available,
111
it will attempt to allocate a new area inside HP_BLOCK.
112
Freeing chunks will place them at the front of free list
113
referenced by del_link in HP_DATASPACE. The newly freed chunk
114
will contain reference to the previously freed chunk in its first
115
sizeof(uchar*) of the payload space.
118
1. It is not very nice to require people to keep key columns
119
at the beginning of the table creation SQL. There are three
120
proposed resolutions:
121
a. Leave it as is. It's a reasonable limitation
122
b. Add new HA_KEEP_KEY_COLUMNS_TO_FRONT flag to handler.h and
123
make table.cpp align columns when it creates the table
124
c. Make HeapEngine reorder columns in the chunk data, so that
125
key columns go first. Add parallel HA_KEYSEG structures
126
to distinguish positions in record vs. positions in
127
the first chunk. Copy all data field-by-field rather than
128
using single memcpy unless DBA kept key columns to
130
2. heap_check_heap needs verify linked chunks, looking for
131
issues such as orphans, cycles, and bad links. However,
132
Heap Engine today does not do similar things even for
134
3. With new HP_DATASPACE allocation mechaism, BLOB will become
135
increasingly simple to implement, but I may not have time
136
for that. In one approach, BLOB data can be placed at
137
the end of the same record. In another approach (which I
138
prefer) BLOB data would have its own HP_DATASPACE with
139
variable-size entries.
140
4. In a more sophisticated implementation, some space can
141
be saved even with all fixed-size columns if many of them
142
have NULL value, as long as these columns are not used
144
5. In variable-size format status should be moved to lower
145
bits of the "next" pointer. Pointer is always aligned
146
to sizeof(uchar*), which is at least 4, leaving 2 lower
147
bits free. This will save 8 bytes per chunk
149
6. As we do not want to modify FRM format, BLOCK_SIZE option
150
of "CREATE TABLE" is saved as "RAID_CHUNKSIZE" for
154
static uchar *hp_allocate_one_chunk(HP_DATASPACE *info);
160
Frees memory and zeros-out any relevant counters in the dataspace
162
@param info the dataspace to clear
165
void hp_clear_dataspace(HP_DATASPACE *info)
167
if (info->block.levels)
169
VOID(hp_free_level(&info->block,info->block.levels,info->block.root,
172
info->block.levels=0;
173
info->del_chunk_count= info->chunk_count= 0;
175
info->total_data_length= 0;
180
Allocate or reallocate a chunkset in the dataspace
182
Attempts to allocate a new chunkset or change the size of an existing chunkset
184
@param info the hosting dataspace
185
@param chunk_count the number of chunks that we expect as the result
186
@param existing_set non-null value asks function to resize existing chunkset,
187
return value would point to this set
189
@return Pointer to the first chunk in the new or updated chunkset, or NULL if unsuccessful
192
static uchar *hp_allocate_variable_chunkset(HP_DATASPACE *info,
193
uint chunk_count, uchar* existing_set)
195
int alloc_count= chunk_count, i;
196
uchar *first_chunk= 0, *curr_chunk= 0, *prev_chunk= 0, *last_existing_chunk= 0;
202
first_chunk= existing_set;
204
curr_chunk= existing_set;
205
while (curr_chunk && alloc_count)
207
prev_chunk= curr_chunk;
208
curr_chunk= *((uchar**)(curr_chunk + info->offset_link));
216
/* We came through all chunks and there is more left, let's truncate the list */
217
*((uchar**)(prev_chunk + info->offset_link)) = NULL;
218
hp_free_chunks(info, curr_chunk);
224
last_existing_chunk = prev_chunk;
227
/* We can reach this point only if we're allocating new chunkset or more chunks in existing set */
229
for (i=0; i<alloc_count; i++)
231
curr_chunk= hp_allocate_one_chunk(info);
234
/* no space in the current block */
236
if (last_existing_chunk)
238
/* Truncate whatever was added at the end of the existing chunkset */
239
prev_chunk= last_existing_chunk;
240
curr_chunk= *((uchar**)(prev_chunk + info->offset_link));
241
*((uchar**)(prev_chunk + info->offset_link)) = NULL;
242
hp_free_chunks(info, curr_chunk);
244
else if (first_chunk)
246
/* free any chunks previously allocated */
247
hp_free_chunks(info, first_chunk);
253
/* mark as if this chunk is last in the chunkset */
254
*((uchar**) (curr_chunk + info->offset_link))= 0;
258
/* tie them into a linked list */
259
*((uchar**) (prev_chunk + info->offset_link))= curr_chunk;
260
curr_chunk[info->offset_status]= CHUNK_STATUS_LINKED; /* Record linked from active */
264
curr_chunk[info->offset_status]= CHUNK_STATUS_ACTIVE; /* Record active */
269
first_chunk= curr_chunk;
272
prev_chunk= curr_chunk;
280
Allocate a new chunkset in the dataspace
282
Attempts to allocate a new chunkset
284
@param info the hosting dataspace
285
@param chunk_count the number of chunks that we expect as the result
287
@return Pointer to the first chunk in the new or updated chunkset, or NULL if unsuccessful
290
uchar *hp_allocate_chunkset(HP_DATASPACE *info, uint chunk_count)
295
if (info->is_variable_size)
297
result = hp_allocate_variable_chunkset(info, chunk_count, NULL);
301
result= hp_allocate_one_chunk(info);
304
result[info->offset_status]= CHUNK_STATUS_ACTIVE;
315
Reallocate an existing chunkset in the dataspace
317
Attempts to change the size of an existing chunkset
319
@param info the hosting dataspace
320
@param chunk_count the number of chunks that we expect as the result
321
@param pos pointer to the existing chunkset
323
@return Error code or zero if successful
326
int hp_reallocate_chunkset(HP_DATASPACE *info, uint chunk_count, uchar* pos)
329
if (!info->is_variable_size)
331
/* Update should never change chunk_count in fixed-size mode */
332
my_errno=HA_ERR_WRONG_COMMAND;
336
/* Reallocate never moves the first chunk */
337
if (!hp_allocate_variable_chunkset(info, chunk_count, pos))
345
Allocate a single chunk in the dataspace
347
Attempts to allocate a new chunk or reuse one from deleted list
349
@param info the hosting dataspace
351
@return Pointer to the chunk, or NULL if unsuccessful
354
static uchar *hp_allocate_one_chunk(HP_DATASPACE *info)
357
size_t length, block_pos;
361
curr_chunk=info->del_link;
362
info->del_link= *((uchar**) curr_chunk);
363
info->del_chunk_count--;
368
block_pos= (info->chunk_count % info->block.records_in_block);
371
if (hp_get_new_block(&info->block,&length))
373
/* no space in the current block */
377
info->total_data_length+= length;
381
curr_chunk= ((uchar*) info->block.level_info[0].last_blocks +
382
block_pos * info->block.recbuffer);
390
Free a list of chunks
392
Reclaims all chunks linked by the pointer,
393
which could be the whole chunkset or a part of an existing chunkset
395
@param info the hosting dataspace
396
@param pos pointer to the head of the chunkset
399
void hp_free_chunks(HP_DATASPACE *info, uchar *pos)
401
uchar* curr_chunk= pos;
404
info->del_chunk_count++;
405
*((uchar**) curr_chunk)= info->del_link;
406
info->del_link= curr_chunk;
408
curr_chunk[info->offset_status]= CHUNK_STATUS_DELETED;
411
if (!info->is_variable_size)
416
/* Delete next chunk in this chunkset */
417
curr_chunk= *((uchar**)(curr_chunk + info->offset_link));