-
-
Computer Science 461Programming Assignment
- NOTE: The programming part of this assignment is worth 75% of the total marks. the other 25% is for documentation and discussion.
-
- Write a program that will create and manipulate an indexed, varying length record file with the following characteristics:
- 1. Each record is identified using the time of the transaction in the form yyyymmddhhmm where month, day, hour (00-23), and minute are represented by two digits as shown. The year is a 4-digit number. These ID numbers are unique and cannot be modified.
-
- 2. Each record has a minimum of 3 and a maximum of 16 fields. One of these fields must be the identification number. Field order is the same for all records but all fields except the ID field ( and # of children) may vary in length from 0 to a maximum of 100 bytes.
- 3. Updates to an existing record may change the size of existing fields, add fields previously null, or `delete' existing fields. The ID field may not be changed.
- 4. Records within the file are kept in sorted order by ID number, but since the ID number is time dependent this may not involve much maintenance. Record order can not be changed. You are not to sort the file at any time.
- 5. All fields other than the ID field contain printable ACSII data with no white space other than blanks.
This program is to be run in a `batch' environment. There will be no user interaction whatsoever. There will be no opportunities to request further information from the user at run time. All user requests will be contained in a `script' file. There will be one script file per run and both the data file name and the script file name will be specified in the command line, which must have the form:
<command> <datafile> <scriptfile>
SAMPLE COMMAND LINE:
a1 datafile dofile1.in
The name of the source program must be a1.
Each run should produce a log file containing a full account of the modifications that were made and any errors that occurred. Don't let it get too wordy (some kind of tabular form is suggested). The name of the log file must be related to the script file. For example, if the script file is called `run1.mods', then the log file will be called `run1.log'. Your program must attempt all requests (i.e. it may not terminate prematurely unless the file names in the command line are invalid). All requests (from the script file) must be echoed in the log file.
Your implementation must execute the following operations:
1. OPEN FILE:
- This is done at startup. The user may not open a file from within the script.
- If the file doesn't already exist then create the file for an unspecified number of records (i.e. open a new file). Each record must specify the byte offset (i.e. count of bytes) to the next record and the record's size. These values are to be located at the beginning of each record making the maximum number of fields for each record equal to 18. The ID field must be third.
- If the file does exist, then open it and prepare it for reading and writing.
- 2. ADD RECORD:
- Add a new record to the file.
- Form of request:
- a <new record><cr>
- a yyyymmddhhmm|.......rest of record[CR]
- Each ADD will have exactly 15 delimiters but some may have no values between the delimiters
-
- 3. MODIFY RECORD:
- Using the ID number and a field specification (the field number), the user may modify an existing field or add data to blank fields.
- The ID field is field #1 (the next record `pointer' is field 0)
- All field modifications change the entire field.
- Only one field is to be modified per request. Most modifications to the same record appear one after the other but this is not guaranteed.
- Modification requests need not appear in any particular order.
- Form of request:
- m <record identifier>,<field identifier>,<data for field><cr>
- m yyyymmddhhmm,field#,<new field data>[CR]
- this command can be used to delete a field (make it null)
-
- 4. DELETE RECORD:
- Using the ID number and a field specification (the field number), the user may delete an record.
- The ID field is field #1 (the next record `pointer' is field 0)
- Deletion requests need not appear in any particular order.
- Form of request:
- d <record identifier><cr>
- d yyyymmddhhmm[CR]
-
- 5. PRINT RECORD:
- Print a single record
- Form of request:
- p <record identifier><cr>
- p yyyymmddhhmm
- Records are to be printed in the log file.
- 6. CLOSE FILE:
- Done during cleanup. Print statistical information (file size, number of records, # of erros found, total requests processed) at the end of the log file.
- 7. COMMENTS:
- Any request line in the script file may contain a comment. It must appear at the end of the line, is delimited by `{`, which follows the last valid character of the request and has no preceding blanks. The comment itself may contain blanks and is always terminated by the end of line.
Add, delete, modify and print requests MUST be in the above format and may be in upper or lower case.
An initial "build" file will be provided. The format of the build file is the same as any other script file for this assignment. Each line in the original build file consists of one add request followed by the record terminated by a carriage return. The first record field on the line is the ID number and each field is separated from the next by a vertical bar: '|'. There are no other acceptable delimiters or special characters (i.e. all characters except the delimiters and the carriage return are part of a field). The delimiter may not appear as part of any field. You will also be provided with at least one test script. Include any additional test files and scripts you create.
Example input record:
a 9609082147|becker|||220-5769||bonus|||||||||[CR]
a 9609082313|smith|||555-6789|junk|junk|run!||||||||[CR]
.....
Your implementation should make efficient use of disk space and minimize disk access as much as possible. Justify all design decisions regarding data structures used and how they affect efficiency and speed. Comment on any special problems encountered as a result of your design and possible solutions or modifications recommended for the next version. Include some discussion of file behaviour as it applies to possible fragmentation and wasted space. Remember that this application should be able to deal with files containing millions of records so design your structures and algorithms accordingly.
Your program must maintain a free list for space avaiable for re-use within the file.
HINT: Keep it simple. However, you may not assume that the index will always fit in RAM.
You are free to examine the contents of the data files and make assuptions about the data (such as setting some fields as fixed). Your choices must be justifiable and you must have contingency plans ( i.e. if you set a field as fixed, what will you do if you get a value larger than your max.?).
Record order MAY NOT BE CHANGED (i.e. the data may not be sorted). Note that both sequential and random access must be possible. The build file will be in ascending order. You can assume the 'build file' has been checked for valid record ID's.
- If a record becomes bigger due to modification, it can be moved.
The use of header records is permitted.
Data file records may have no more than 18 fields in total.
The YEAR part of the ID field is a 4-digit year (the 'Y2K' problem' is not yours).
You may sort the requests from the script file if you must but you must devise a way for the user to find and edit errors that doesn't require the user to work too hard. Remember, in a real life situation there may be thousands of requests in each script file. The method you choose must be well documented and well justified.
- Keep the 'request parser' as simple as possible. There is no need to give detailed error messages for invalid requests.
All code MUST be fully documented and both a programmer's manual and a user's manual must be supplied (can be in same document but must be obviously separate). Programmer documentation must include a description of the structures used, algorithm outlines, and justification for your decisions. If adequate documentation is not supplied your maximum grade will be C.
BONUS:
1. Include an 'echo on/off' facility for the script requests. The format of the request must be in keeping with the others.
2. Implement a secondary index using the name field as the key.
3. Allow for print requests to print to a file other than the log file. (i.e. allow for a file name at the end of a print request) If you implement this bonus, Standard Output (the screen) must be an acceptable argument.
4. Allow a print request that prints a range of records (i.e. from some lower bound to some upper bound - by primary key)
-
IF YOU DEVIATE FROM THE SPECS YOU MUST HAVE A GOOD REASON. In other words you may use your own judgment but you must be able to convince your TA that it's really a good idea!
A major portion of the marks for this assignment is allowed for discussion. You need not implement the most efficient methods you know. Give it some thought at the beginning and then do the implementation. Do not re-design your program if you discover inefficiencies. Instead, discuss what is good about your design and what is not. Make some suggestions for how this program could be re-designed to be more efficient.
- Discussion:
- - Describe and fully justify your structure :
- - How are the 2 numeric fields defined?
- - Are all fields defined as varying length?
- - Is the internal field order the same as in th specification? Why?
- - How does the record structure and size relate to physical disk organization and access?
- - How is the index implemented?
- - Since the key is numeric (and large), storing it as text would be quite innefficient); however it is too large to be stored as a 32-bit integer. How did you solve this problem? NOTE: cutting the year to 2 digits is NOT acceptable!
- - What kind of index search did you implement?
- - Remember, it must be efficient for a file with millions of records - even when the entire index can't be stored in memory at once.
-
-
- DATA FIELD STRUCTURE:
-
- 1. ID (yyyymmddhhmm)
- 2. Last Name
- 3. First Name
- 4. Middle Name
- 5. Address
- 6. City
- 7. Province/State/etc.
- 8. Country
- 9. Planet
- 10. Education/Profession
- 11. Type of Dwelling
- 12. # of children
- 13. Pet # 1
- 14. Pet # 2
- 15. Saying / Comment
- 16. tag: `tv','silly','made-up','fiction'
-
Copyright © Katrin Becker 1998, 1999 Last Modified