keep a primary key that is unique; lookup using secondary keys which point to primary key which holds byte offset
want to keep byte offset info in as few places as possible so there's less to update
all additions/deletions/modifications require updating ALL key indices (adding/removing/rearranging etc.)
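The arrangement above can be sketched roughly as follows. This is a minimal in-memory illustration, not a file-based implementation; the names `primary_index`, `name_index`, `find_offsets`, `move_record` and the sample keys and offsets are all made up for the example.

```python
# Hypothetical sample data: primary key -> byte offset in the data file.
primary_index = {"ID001": 0, "ID002": 64, "ID003": 128}

# Secondary index on last name: (secondary key, primary key) pairs.
# It holds no byte offsets, only primary keys.
name_index = [("Becker", "ID001"), ("Becker", "ID003"), ("Smith", "ID002")]


def find_offsets(last_name):
    """Resolve a secondary key to byte offsets by going through the primary index."""
    primary_keys = [pk for name, pk in name_index if name == last_name]
    return [primary_index[pk] for pk in primary_keys]


def move_record(primary_key, new_offset):
    """A record that moves in the data file touches only the primary index."""
    primary_index[primary_key] = new_offset


print(find_offsets("Becker"))  # [0, 128]
```

Because the byte offset lives in one place only, `move_record` never has to touch `name_index`.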
Secondary Keys
secondary keys are usually based on some field(s) in the data record; the primary key is sometimes inaccessible (not meaningful) to the user
Update doesn't change any key: simplest; indices unaffected unless record must be moved - then update only key that has byte offset (primary key)
Update changes secondary key: rearrange secondary key index if necessary
Update changes primary key: rearrange primary key index; update pointers in all secondary indices (the pointer can be an RRN or index, or a string/number directly related to a data field). Secondary indices should not need to be moved
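The three update cases can be sketched as three small functions. This is only an illustration under assumed structures: a dictionary for the primary index, a list of (secondary key, primary key) pairs ordered by secondary key, and hypothetical function names and sample data.

```python
primary_index = {"ID001": 0, "ID002": 64}               # primary key -> byte offset
name_index = [("Becker", "ID001"), ("Smith", "ID002")]  # ordered by secondary key


def move_record(pk, new_offset):
    """Case 1: no key changes, but the record moved. Only the primary index changes."""
    primary_index[pk] = new_offset


def change_secondary_key(pk, old_name, new_name):
    """Case 2: the secondary key changes, so its index entry may have to move."""
    name_index.remove((old_name, pk))
    name_index.append((new_name, pk))
    name_index.sort(key=lambda entry: entry[0])  # restore secondary-key order


def change_primary_key(old_pk, new_pk):
    """Case 3: the primary key changes. The primary index is rearranged and the
    pointers in the secondary index are rewritten in place (entries do not move)."""
    primary_index[new_pk] = primary_index.pop(old_pk)
    for i, (name, pk) in enumerate(name_index):
        if pk == old_pk:
            name_index[i] = (name, new_pk)
```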
once you have several secondary keys - can begin to search using logical expressions - involves parsing the request and building an efficient search request (Selective Indices)
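As a rough illustration of searching with logical expressions, each secondary index can return a set of primary keys, and AND / OR become set intersection / union. The two indices, the `lookup` helper, and the data below are assumptions made up for this sketch; parsing the request itself is not shown.

```python
# Hypothetical secondary indices: secondary key -> set of primary keys.
name_index = {"Becker": {"ID001", "ID003"}, "Smith": {"ID002"}}
city_index = {"Calgary": {"ID001", "ID002"}, "Banff": {"ID003"}}


def lookup(index, key):
    """Return the primary keys filed under a secondary key (empty set if none)."""
    return index.get(key, set())


# name = Becker AND city = Calgary
both = lookup(name_index, "Becker") & lookup(city_index, "Calgary")

# name = Smith OR city = Banff
either = lookup(name_index, "Smith") | lookup(city_index, "Banff")

print(sorted(both))    # ['ID001']
print(sorted(either))  # ['ID002', 'ID003']
```

Only the primary keys that survive the set operations need to be looked up in the primary index, so the data file is read as little as possible.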
updates may be expensive (secondary indices will get rearranged a lot)
duplicate secondary keys result in wasted space
Solution 1: create an array of references for each secondary key (e.g. one entry for all the 'Becker's) - cuts down on the need to rearrange the secondary key index - just change the 'pointer' and leave the entry
Problem: it wastes space if secondary key records are fixed size (unused array slots), and the array can fill up
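A small sketch of Solution 1 and its problem, assuming a fixed array of `MAX_REFS` slots per secondary key. The constant, the function names, and the data are hypothetical.

```python
MAX_REFS = 4  # assumed fixed number of reference slots per secondary key

# secondary key -> fixed-size array of primary keys (None marks an unused slot)
index = {
    "Becker": ["ID001", "ID003", None, None],
    "Smith": ["ID002", None, None, None],
}


def add_reference(name, pk):
    """Add a primary key under a secondary key without rearranging the index."""
    refs = index.setdefault(name, [None] * MAX_REFS)
    slot = refs.index(None)  # raises ValueError when the array is already full
    refs[slot] = pk


def wasted_slots():
    """Count the unused slots, i.e. the space wasted by fixed-size entries."""
    return sum(refs.count(None) for refs in index.values())


print(wasted_slots())  # 5
```

Adding another 'Becker' only fills a slot in an existing entry, but most slots stay empty, and a popular key eventually overflows its array.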
Inverted Lists
Solution: create linked list of entries for secondary keys
inverted because we are trying to go back from secondary key to primary key
now we won't have to rearrange the secondary key index often: only when a new secondary key value appears
Secondary Key File only needs to be rearranged when a new key is saved
Rearranging Secondary Key File will likely be faster because it tends to be smaller
Less need for sorting = less overhead
Secondary Key Linked List contains fixed-length records that are entry sequenced - never needs sorting (need to manage holes left by deletions though)
Disadvantages:
Records in the Secondary Key Linked List aren't necessarily grouped together - affects disk access (could institute paging system)
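The inverted-list arrangement can be sketched with two structures: a key file mapping each secondary key to the RRN of its first list entry, and an entry-sequenced list file of [primary key, next RRN] records. This is a simplified in-memory illustration: a dictionary stands in for the sorted Secondary Key File, -1 is an assumed end-of-list marker, new entries are pushed onto the front of a chain rather than kept in primary-key order, and all names are hypothetical.

```python
END = -1  # assumed marker for "no next entry"

key_file = {"Becker": 0, "Smith": 1}  # secondary key -> RRN of first list entry
list_file = [                         # entry sequenced: [primary key, next RRN]
    ["ID001", 2],
    ["ID002", END],
    ["ID003", END],
]


def primary_keys_for(name):
    """Follow the chain of list entries for one secondary key."""
    result = []
    rrn = key_file.get(name, END)
    while rrn != END:
        pk, rrn = list_file[rrn]
        result.append(pk)
    return result


def add_reference(name, pk):
    """Append to the list file; the key file changes only for a brand-new key
    or to repoint the head of an existing chain."""
    list_file.append([pk, key_file.get(name, END)])
    key_file[name] = len(list_file) - 1


print(primary_keys_for("Becker"))  # ['ID001', 'ID003']
```

Note how the two 'Becker' entries sit at RRN 0 and RRN 2: entries for one key are not necessarily adjacent, which is the disk-access disadvantage mentioned above.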
Binding: When are keys bound to physical addresses?
Primary Keys are bound when they are created
Secondary Keys are bound when they are used (through de-referencing)
may result in more disk accesses for searches; but substantially cuts down the cost of updates
less error prone than direct binding
Practice Safe Programming!
Changes should affect as few places as possible.
Code execution is fast and cheap. (yeah, right!)
A FEW WORDS ON SEARCHING....
Binary Search is O(log2 N); pretty good... but for large files like 1,000,000 records, binary search takes about 20 comparisons (log2 1,000,000 is roughly 19.9).... if that means 20 disk accesses, this could be unacceptable.
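A quick check of the figure of 20, as a sketch: a plain binary search that counts its probes, run over 1,000,000 keys. Each probe stands in for one disk access; the function names are made up for the example.

```python
def max_probes(n):
    """Worst-case number of probes for binary search over n sorted keys:
    floor(log2 n) + 1, which equals n.bit_length() for n >= 1."""
    return n.bit_length()


def binary_search(keys, sought):
    """Return (position, probes); position is -1 if the key is absent."""
    lower, upper, probes = 0, len(keys) - 1, 0
    while lower <= upper:
        mid = (lower + upper) // 2
        probes += 1  # each probe would be one read of the file
        if keys[mid] == sought:
            return mid, probes
        if keys[mid] < sought:
            lower = mid + 1
        else:
            upper = mid - 1
    return -1, probes


print(max_probes(1_000_000))  # 20
```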
Interpolation Search:
more like how people do it (e.g. opening a phone book near the back when looking for a name starting with 'T')
choose next location based on its estimated position in the file
$$\text{next} = \text{Lower} + \frac{\text{key(Sought)} - \text{key(Lower)}}{\text{key(Upper)} - \text{key(Lower)}} \, (\text{Upper} - \text{Lower})$$
O(log2 log2 N) on average, assuming the keys are roughly uniformly distributed
more work than binary, but fewer disk accesses
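A minimal sketch of interpolation search over a sorted list of numeric keys, using the position estimate given above. It assumes numeric keys and uses integer arithmetic for the estimate; the function name and the probe counter are additions for illustration.

```python
def interpolation_search(keys, sought):
    """Return (position, probes); position is -1 if the key is absent."""
    lower, upper, probes = 0, len(keys) - 1, 0
    while lower <= upper and keys[lower] <= sought <= keys[upper]:
        if keys[upper] == keys[lower]:
            nxt = lower  # all remaining keys are equal; avoid dividing by zero
        else:
            # next = Lower + (key(Sought) - key(Lower)) / (key(Upper) - key(Lower)) * (Upper - Lower)
            nxt = lower + (sought - keys[lower]) * (upper - lower) // (keys[upper] - keys[lower])
        probes += 1  # each probe would be one read of the file
        if keys[nxt] == sought:
            return nxt, probes
        if keys[nxt] < sought:
            lower = nxt + 1
        else:
            upper = nxt - 1
    return -1, probes


even_keys = list(range(0, 2_000_000, 2))  # 1,000,000 evenly spaced keys
print(interpolation_search(even_keys, 1_234_568))  # (617284, 1)
```

With perfectly evenly spaced keys the first estimate lands on the record. With skewed keys the estimates get worse, which is why the average-case figure depends on the key distribution.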
Self-Organizing Lists:
a record is moved to the head of the list each time it is accessed (move-to-front)
most frequently used records are closest to the front
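Moving an accessed record to the head of the list might look like the sketch below, using a Python list as the sequentially searched list. The function name is hypothetical.

```python
def access_move_to_front(records, key):
    """Sequentially find key, move it to the head, and return where it was found.
    Raises ValueError if the key is not in the list."""
    position = records.index(key)
    records.insert(0, records.pop(position))
    return position


recs = ["a", "b", "c", "d"]
access_move_to_front(recs, "c")
print(recs)  # ['c', 'a', 'b', 'd']
```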
Transpose Method:
switch sought record with its immediate predecessor
commonly sought records tend to migrate to the head of the list
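The transpose method, sketched the same way with a Python list as the list being searched. The function name is hypothetical.

```python
def access_transpose(records, key):
    """Sequentially find key, swap it with its immediate predecessor,
    and return where it was found. Raises ValueError if the key is absent."""
    position = records.index(key)
    if position > 0:
        records[position - 1], records[position] = records[position], records[position - 1]
    return position


recs = ["a", "b", "c", "d"]
access_transpose(recs, "d")
print(recs)  # ['a', 'b', 'd', 'c']
```

A record moves only one place per access, so a single stray access does not disturb the order much.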
Count Method:
keep a count of # times record is accessed
record gets moved to location ahead of those with lower counts
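A sketch of the count method, assuming each list entry is a [key, count] pair and the list is kept in non-increasing order of count. The representation and the function name are assumptions for illustration.

```python
def access_count(records, key):
    """Find key, bump its access count, and move it ahead of every record
    with a lower count. Returns where it was found.
    Raises StopIteration if the key is absent."""
    position = next(i for i, rec in enumerate(records) if rec[0] == key)
    record = records.pop(position)
    record[1] += 1
    new_position = position
    while new_position > 0 and records[new_position - 1][1] < record[1]:
        new_position -= 1
    records.insert(new_position, record)
    return position


recs = [["a", 3], ["b", 2], ["c", 2], ["d", 0]]
access_count(recs, "c")
print(recs)  # [['a', 3], ['c', 3], ['b', 2], ['d', 0]]
```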
CPSC 461: Copyright © 1998-2002 Katrin Becker. Last Modified November 17, 2000, 12:48 PM