- CPSC 461: Copyright © 2002 Katrin Becker 1998-2002 Last Modified August 5, 2000 12:44 PM
Ring Files
Sometimes also called:
- circular lists
- threaded lists*
- multilists*
- * The last two usually imply that members are linked in more than one way, though the list itself need not be circular.
We've seen how quickly multiple indices get ugly:
- they waste space
- they are hard to maintain
- they really don't do much to improve search times for non-trivial logical queries
Ring files, used in some of the largest databases, are one solution.
Rather than being maintained separately, the indices are incorporated into the data themselves: records are linked to each other.
All records with a common attribute value (same value for a given field) are linked together in a circularly linked list headed by a header record which contains the actual attribute value.
The header contains information which pertains to all members of its ring.
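As a minimal sketch of the ring just described, assuming an in-memory model where a Python object reference stands in for a disk link pointer: the names `Record`, `insert_member`, and `members` are illustrative and do not come from the notes.

```python
class Record:
    """A ring node: either a header (is_header=True) or a member record."""

    def __init__(self, payload, is_header=False):
        self.payload = payload      # attribute value for a header, record data for a member
        self.is_header = is_header
        self.next = self            # a ring of one node points back at itself


def insert_member(header, payload):
    """Link a new member into the ring immediately after the header."""
    member = Record(payload)
    member.next = header.next
    header.next = member
    return member


def members(header):
    """Yield each member payload, stopping when the walk returns to the header."""
    node = header.next
    while node is not header:
        yield node.payload
        node = node.next


dept = Record("Sales", is_header=True)   # the header holds the shared attribute value
for name in ("Ann", "Bob", "Cy"):
    insert_member(dept, name)

ring = list(members(dept))               # most recently inserted member comes first
```

Inserting directly after the header keeps insertion cheap; a ring that must be read in a particular order (seniority, say) would instead insert at the right position in the walk.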
Rings can be nested to many levels:
- member records of a level i ring become header records for the next level down (level i-1)
- the bottom level is level 1
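The nesting can be sketched as follows, assuming each record carries two links: one within the ring it belongs to and one into the ring it heads. The names `Node`, `add_child`, and `walk` are hypothetical, and the sketch counts depth downward from the entry point rather than numbering levels upward from 1.

```python
class Node:
    def __init__(self, label):
        self.label = label
        self.next = self    # link within the ring this node is a member of
        self.down = self    # link into the ring this node heads (self means empty)


def add_child(header, label):
    """Add a member to the ring that `header` heads, one level down."""
    child = Node(label)
    child.next = header.down    # the last member's link leads back to the header
    header.down = child
    return child


def walk(header, depth=0):
    """Yield (depth, label) for every record reachable below `header`."""
    node = header.down
    while node is not header:
        yield depth, node.label
        yield from walk(node, depth + 1)   # a member is the header of its own ring
        node = node.next


company = Node("Company")                  # entry point
calgary = add_child(company, "Calgary")    # member of one ring, header of the next
add_child(calgary, "Sales")
add_child(calgary, "Stock")

outline = list(walk(company))
```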
If the attribute has a finite number of values (ordinal, of reasonable size), then the values themselves need not be stored in the data records; they need only appear in the header record. This is potentially a substantial saving: all the record needs to store is a pointer.
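The saving has a cost that a short sketch makes visible: a record that does not store the value must follow its ring to the header to recover it. The class and function names here are assumptions made for illustration, and the step count is returned only to show the length of the walk.

```python
class Node:
    def __init__(self, data=None, value=None, is_header=False):
        self.data = data            # member payload; no copy of the attribute value
        self.value = value          # attribute value, stored on the header only
        self.is_header = is_header
        self.next = self


def add_member(header, data):
    """Link a member record in directly after the header."""
    node = Node(data=data)
    node.next = header.next
    header.next = node
    return node


def attribute_value(record):
    """Follow the ring from `record` to its header; return (value, links followed)."""
    node, steps = record, 0
    while not node.is_header:
        node = node.next
        steps += 1
    return node.value, steps


colour = Node(value="red", is_header=True)
widget = add_member(colour, "widget")
gadget = add_member(colour, "gadget")   # ring is now: header, gadget, widget
```

Here `attribute_value(widget)` reaches the header in one step and `attribute_value(gadget)` in two, so the space saved is paid for with a walk that can be as long as the ring.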
If the attribute is not ordinal, or has a large number of values, it may be prudent to group a range of values into one ring. In this case the value range must be stored in the header, but the actual value must still be stored in the record.
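A rough sketch of range grouping, assuming inclusive numeric ranges and a plain Python list of headers standing in for whatever higher-level ring would hold them; `RangeHeader`, `Member`, `insert`, and `find` are hypothetical names.

```python
class RangeHeader:
    def __init__(self, low, high):
        self.low, self.high = low, high   # inclusive value range covered by this ring
        self.next = self


class Member:
    def __init__(self, value, data):
        self.value = value    # the actual value must be kept in the record
        self.data = data
        self.next = None


def header_for(headers, value):
    """Return the header whose range covers `value`, or None."""
    for h in headers:
        if h.low <= value <= h.high:
            return h
    return None


def insert(headers, value, data):
    h = header_for(headers, value)
    if h is None:
        raise ValueError(f"no ring covers {value}")
    m = Member(value, data)
    m.next, h.next = h.next, m
    return m


def find(headers, value):
    """Pick the ring by range, then compare the stored value of each member."""
    h = header_for(headers, value)
    hits = []
    node = h.next if h else None
    while node is not None and node is not h:
        if node.value == value:
            hits.append(node.data)
        node = node.next
    return hits


salary_rings = [RangeHeader(0, 29999), RangeHeader(30000, 59999)]
insert(salary_rings, 25000, "Ann")
insert(salary_rings, 42000, "Bob")
insert(salary_rings, 51000, "Cy")
```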
One of the chief advantages of this structure is that it can answer queries efficiently, even when they are sometimes complex.
Slide 2 shows a fairly complex hierarchical structure. Remember that each box represents a complete ring (if we drew them all it would look like spaghetti). Here we have employees in various locations segregated into specific departments, accessible in order of seniority; we have a warehouse at each location as well as stock information.
Relationships between rings are not necessarily hierarchical. We can set up rings that relate members of the same ring, rings that provide multiple pathways between records, and rings that relate lower rings back to higher-order ones.
Searching/Queries (a.k.a. Navigation):
Not all constructs are allowable, desirable, or implementable in practice. The ease with which membership arrows can be drawn hides the complexity of the underlying structures very effectively. The query processor for such a structure also becomes quite complex.
Often, when a query is posed, there is more than one way to proceed with the search, and some of these alternatives will be more efficient (sometimes substantially so) than others. In some cases, the only way to choose among them is to ask the user for further input.
A problem with this set-up is that we cannot always access the values we need for the search by following just one path. We can solve this with a parallel search or by making an initial search and then tracking down the remaining required values 'singly'.
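The two approaches can be sketched for a query on two attributes, assuming each record holds one link per ring it belongs to. This is one reading of the text, not a definitive query processor: `search_parallel` walks both rings and keeps the records found on both, while `search_singly` walks one ring and chases each member's other link to its header. All names are hypothetical.

```python
class Node:
    def __init__(self, name, header_of=None):
        self.name = name
        self.header_of = header_of   # attribute name if this node is a ring header
        self.links = {}              # attribute name -> next node in that ring


def make_header(attr, value):
    h = Node(value, header_of=attr)
    h.links[attr] = h
    return h


def join(header, record):
    """Link `record` into the ring headed by `header`."""
    attr = header.header_of
    record.links[attr] = header.links[attr]
    header.links[attr] = record


def ring_members(header):
    attr = header.header_of
    node = header.links[attr]
    while node is not header:
        yield node
        node = node.links[attr]


def header_of_record(record, attr):
    """Follow one record's `attr` links until the ring header is reached."""
    node = record.links[attr]
    while node.header_of != attr:
        node = node.links[attr]
    return node


def search_parallel(h1, h2):
    on_second = {id(r) for r in ring_members(h2)}
    return [r.name for r in ring_members(h1) if id(r) in on_second]


def search_singly(h1, h2):
    attr = h2.header_of
    return [r.name for r in ring_members(h1) if header_of_record(r, attr) is h2]


sales, stock = make_header("dept", "Sales"), make_header("dept", "Stock")
calgary, banff = make_header("loc", "Calgary"), make_header("loc", "Banff")
for name, dept, loc in (("Ann", sales, calgary), ("Bob", sales, banff), ("Cy", stock, calgary)):
    rec = Node(name)
    join(dept, rec)
    join(loc, rec)
```

Which variant is cheaper depends on the ring lengths: the first reads both rings in full, while the second reads one ring plus a partial walk of another ring for each member.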
Structure:
All records in a multiring file have similar structures, but their contents and size depend on the ring to which they belong, and there may be a considerable number of record categories. (Note that this violates the general rule that a file should contain records that are identical in format.) Ring membership as well as file membership must be known before a record can be processed.
One way of handling this is to begin every record with a field, identical in format across all records, that identifies the type of record we have found.
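A small sketch of such a type field, using Python's standard `struct` module. The one-byte type code, the two record layouts, and the field choices are all invented for illustration; the notes do not specify any of them.

```python
import struct

# Hypothetical layouts: type code -> (type name, struct format for the rest of the record)
LAYOUTS = {
    1: ("department", "<10sI"),    # name, link to first member
    2: ("employee", "<20sHI"),     # name, seniority, next-in-ring link
}


def pack_record(type_code, *fields):
    """Prefix the packed fields with the one-byte type code."""
    _, fmt = LAYOUTS[type_code]
    return struct.pack("<B", type_code) + struct.pack(fmt, *fields)


def unpack_record(raw):
    """Read the leading type code first, then decode the rest with that type's layout."""
    (type_code,) = struct.unpack_from("<B", raw, 0)
    name, fmt = LAYOUTS[type_code]
    return name, struct.unpack_from(fmt, raw, 1)


raw = pack_record(2, b"Ann", 5, 42)
kind, fields = unpack_record(raw)
```

Because the first byte has the same meaning in every record, a reader can decide how to interpret the remaining bytes without knowing in advance which ring the record came from.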
Generally there will be many link pointers, and they will not always be used. Because there are many record types, each type needs only the fields it actually uses, so little field space is wasted otherwise and the records will in general be quite dense.
Physical Placement of Records:
- records frequently accessed together should be stored with a high degree of locality
- entire rings might be best stored together (on one cylinder = clustered ring)
- when frequent reference to the header record is required, the header should perhaps be part of the cluster
- in a dynamically changing database, optimal clustering is very difficult to maintain
Header Records:
Each ring must have a header. A header is either an entry point, a member of another ring, or both. Headers may contain no data other than the attribute value, or they may be actual data records.