CPSC 461: Copyright © 2002 Katrin Becker 1998-2002 Last Modified October 17, 2002 11:14 AM
FILE FORMATS II

Major Reference: All sections marked with (***) are from "The Encyclopedia of Graphics File Formats" by James Murray and William VanRyper, published by O'Reilly.

Formal vs "De Facto" Standards:

Formal Standards come about through official channels such as ISO (International Organization for Standardization), ANSI (American National Standards Institute), IEC (International Electrotechnical Commission), and ITU (International Telecommunication Union). Standards from these bodies go through many iterations, with committees representing various interest groups, so the process takes considerable time.
A "De Facto Standard" is one created by a single company or individual, which has become widely accepted. It often takes considerably less time, but there is no guarantee that it will be maintained, fixed, or updated if necessary. Backwards compatibility isn't necessarily considered. Examples of this type of standard are: BMP, ICO, WMF, GIF, TIFF.

Some Specifics:

Scientific: FITS, BUFR-GRIB
Graphics: GIF, JPEG, PNG
Video: MPEG, QuickTime, AVI
Audio: MIDI, MP3

FITS: Flexible Image Transport System
- copy of the standard available from NASA [nssdca.gsfc.nasa.gov, subdirectory FITS]
- originated from International Astronomical Union (IAU) 1982
- formalized as standard by: NASA 1990
- general data format
- designed for portability across platforms rather than software
- used primarily for space and ground-based image data
- data usually consists of N-dimensional raster images [N <= 999]
- data organization is sequential; N-dimensional array; column-major order (FORTRAN style)
- header is ASCII
- unlimited grey-scale
- uncompressed
- no limits on image size
- supports multiple images / file
- numerical format: two's complement, big-endian
- available across most platforms
- data may be ASCII or binary
- typically consists of 2D array of numbers
- supports comments
- "image" may be grey-scale, pseudocolour, surface plot, contour plot (Q: why not true colour?)
- data values can be offset, scaled, or rotated; the units used can be named
- documentation information includes things like: author, date, observation-date, equinox, history (of data acquisition/processing), instrument used, objects observed, observer, telescope used
Header: 80 column lines (developed when cards were the common input / mass storage device); all unused bytes must be set to blank
col 1-8 = upper case keyword
      9 = '='
     10 = <blank>
  11-80 = <value>
/ comments may appear after a slash anywhere from column 11 onward
"Records" are multiples of 2880 bytes (36 X 80 chars)
Sample Header

The header of a basic FITS image file might appear as follows (the first two lines are for positional information only and are not included in the FITS file):

         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890123456
SIMPLE  =                    T
BITPIX  =                    8 / 8 bits per pixel; REQUIRED
NAXIS   =                    2 / Table is a 2D matrix; REQUIRED
NAXIS1  =                  168 / Width of table row in bytes; REQUIRED
NAXIS2  =                    5 / Number of rows in table; REQUIRED
DATE    = '09/17/93'
ORIGIN  = 'O''Reilly & Associates' / Publisher
AUTHOR  = 'James D. Murray' / Creator
REFERENC= 'Graphics File Formats' / Where referenced
COMMENT = 'Sample FITS header'
END
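
The card layout described above (keyword in columns 1-8, '=' in column 9, value from column 11, comment after a slash) can be sketched in C. This is a minimal illustration rather than a complete FITS reader. The names fits_parse_card, fits_padded_size, and trim_copy are hypothetical, and the sketch assumes that a slash inside a quoted string (as in the DATE card) belongs to the value, not to a comment.

```c
#include <string.h>

#define FITS_CARD_LEN   80    /* one header line ("card") */
#define FITS_RECORD_LEN 2880  /* 36 cards x 80 characters */

/* Copy n characters of src into dst, dropping leading/trailing blanks. */
static void trim_copy(char *dst, const char *src, size_t n)
{
    while (n > 0 && src[0] == ' ') { src++; n--; }
    while (n > 0 && src[n - 1] == ' ') n--;
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Split one card into keyword, value, and comment (hypothetical helper).
   Each output buffer must hold FITS_CARD_LEN + 1 characters.
   Returns 1 for a "KEYWORD = value" card, 0 for keyword-only cards (END). */
int fits_parse_card(const char *card, char *key, char *value, char *comment)
{
    size_t len = strlen(card), i, slash;
    int in_quote = 0;

    if (len > FITS_CARD_LEN) len = FITS_CARD_LEN;
    trim_copy(key, card, len < 8 ? len : 8);
    value[0] = comment[0] = '\0';
    if (len < 10 || card[8] != '=') return 0;

    /* Find the first slash from column 11 on that is outside quotes. */
    slash = len;
    for (i = 10; i < len; i++) {
        if (card[i] == '\'') in_quote = !in_quote;
        else if (card[i] == '/' && !in_quote) { slash = i; break; }
    }
    trim_copy(value, card + 10, slash - 10);
    if (slash < len) trim_copy(comment, card + slash + 1, len - slash - 1);
    return 1;
}

/* Round a byte count up to a whole number of 2880-byte records. */
long fits_padded_size(long nbytes)
{
    return (nbytes + FITS_RECORD_LEN - 1) / FITS_RECORD_LEN * FITS_RECORD_LEN;
}
```

The eleven cards of the sample header occupy 880 bytes, so a writer would pad the header with blanks up to one full 2880-byte record.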

BUFR-GRIB
Meteorological Data
BUFR - (Binary Universal Form for the Representation of Meteorological Data) mostly observations
GRIB - (GRIdded Binary) mostly calculation results and simulations
- uncompressed
- fixed point data (with multipliers and offsets)
- very complex

GIF (Graphics Interchange Format) (***)
Because Murray & VanRyper said it so much better than I...
"GIF (Graphics Interchange Format) is a creation of CompuServe and is used to store multiple bitmap images in a single file for exchange between platforms and systems. In terms of number of files in existence, GIF is perhaps the most widely used format for storing multibit graphics and image data. Even a quick peek into the graphics file section of most BBSs and file archives seems to prove this true. Many of these are high-quality images of people, landscapes, cars, astrophotographs, and anthropometric gynoidal data (you guess what that is). Shareware libraries and BBSs are filled with megabytes of GIF images. "
- Stream based - consists of series of data packets (blocks) along with protocol information
- as result, GIFs must be read as a continuous stream of data
- blocks and sub-blocks can be located anywhere in the file
- each data sub-block starts with a byte count (1-255)
- image data is always LZW compressed (LZW is patented by Unisys, not CompuServe, so software that reads or writes GIF files needs a licence)
- 2 revisions: GIF87a and GIF89a, one of which must make up the first 6 bytes of the file

What follows are just a few of the bits in the file....
Header:
typedef struct _GifHeader
{
// Header
BYTE Signature[3];    /* Header Signature (always "GIF") */
BYTE Version[3];      /* GIF format version("87a" or "89a") */
// Logical Screen Descriptor
WORD ScreenWidth;     /* Width of Display Screen in Pixels */
WORD ScreenHeight;    /* Height of Display Screen in Pixels */
BYTE Packed;          /* Screen and Color Map Information */
BYTE BackgroundColor; /* Background Color Index */
BYTE AspectRatio;     /* Pixel Aspect Ratio */
} GIFHEAD;
Description of "Packed" ==>
Bits 0-2 Size of the Global Color Table (stored as N; the table holds 2^(N+1) entries, so "111" means 256 entries, i.e. 8-bit pixel depth)
Bit 3 Color Table Sort Flag (1 = sorted from "most important" [i.e. most frequent] to least)
Bits 4-6 Color Resolution (bits per primary colour of the original palette, minus one)
Bit 7 Global Color Table Flag (1 = a global table follows; 0 = no global table)
typedef struct _GifColorTable
{
BYTE Red;   /* Red Color Element */
BYTE Green; /* Green Color Element */
BYTE Blue;  /* Blue Color Element */
} GIFCOLORTABLE;

ColorTableSize = 3L * (1L << (SizeOfGlobalColorTable + 1));
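
As a rough sketch of how a reader might use these fields, the C fragment below checks the 6-byte signature and unpacks the Packed byte into its four parts, then applies the ColorTableSize formula. The type GifScreenInfo and the functions gif_check_signature and gif_unpack_screen are hypothetical names introduced only for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Fields unpacked from the Logical Screen Descriptor's Packed byte. */
typedef struct {
    int  has_global_table;  /* bit 7 */
    int  color_resolution;  /* bits 4-6, plus one */
    int  sorted;            /* bit 3 */
    int  size_field;        /* bits 0-2, as stored */
    long table_bytes;       /* 3 * 2^(size_field + 1), or 0 if no table */
} GifScreenInfo;

/* Return 1 if the first 6 bytes are "GIF87a" or "GIF89a". */
int gif_check_signature(const uint8_t *buf)
{
    return memcmp(buf, "GIF87a", 6) == 0 || memcmp(buf, "GIF89a", 6) == 0;
}

/* Split the Packed byte into its bit fields. */
GifScreenInfo gif_unpack_screen(uint8_t packed)
{
    GifScreenInfo info;
    info.has_global_table = (packed >> 7) & 0x01;
    info.color_resolution = ((packed >> 4) & 0x07) + 1;
    info.sorted           = (packed >> 3) & 0x01;
    info.size_field       = packed & 0x07;
    info.table_bytes      = info.has_global_table
                          ? 3L * (1L << (info.size_field + 1))
                          : 0L;
    return info;
}
```

For a Packed value of F7h the sketch reports a global table of 256 entries, which occupies 768 bytes.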
The Local Image Descriptor appears before each section of image data and has the following structure:

typedef struct _GifImageDescriptor
{
BYTE Separator; /* Image Descriptor identifier */
WORD Left;      /* X position of image on the display */
WORD Top;       /* Y position of image on the display */
WORD Width;     /* Width of the image in pixels */
WORD Height;    /* Height of the image in pixels */
BYTE Packed;    /* Image and Color Table Data Information */
} GIFIMGDESC;


JPEG (Joint Photographic Experts Group) (***)
- JPEG actually refers to a standards organization, a data compression method, and a file format
- JPEG spec does not specify an interchange format; that's what JFIF is for
- file is byte-stream, 16-bit, big-endian (or streams of blocks)
- first two bytes: FFh D8h
- can do 24-bit
- quite complex
- lossy compression (ALWAYS*)
- designed for photographs
- not good for images w/ large areas of single colour: doesn't compress well and tends to produce artifacts
- uses Discrete Cosine Transform to compress
- tends to preserve colour intensity but discards slight colour changes
- final step is Huffman coding
- does not support multiple images/file
Although JFIF files do not possess a formally-defined header, the SOI and JFIF APP0 markers taken together act as a header in the following marker segment structure:

typedef struct _JFIFHeader
{
BYTE SOI[2]; /* 00h Start of Image Marker */
BYTE APP0[2]; /* 02h Application Use Marker */
BYTE Length[2]; /* 04h Length of APP0 Field */
BYTE Identifier[5]; /* 06h "JFIF" (zero terminated) Id String */
BYTE Version[2]; /* 0Bh JFIF Format Revision */
BYTE Units; /* 0Dh Units used for Resolution */
BYTE Xdensity[2]; /* 0Eh Horizontal Resolution */
BYTE Ydensity[2]; /* 10h Vertical Resolution */
BYTE XThumbnail; /* 12h Horizontal Pixel Count */
BYTE YThumbnail; /* 13h Vertical Pixel Count */
} JFIFHEAD;



SOI is the start of image marker and always contains the marker code values FFh D8h.

APP0 is the Application marker and always contains the marker code values FFh E0h.

Length is the size of the JFIF (APP0) marker segment, including the size of the Length field itself and any thumbnail data contained in the APP0 segment. Because of this, the value of Length equals 16 + 3 * XThumbnail * YThumbnail.

Identifier contains the values 4Ah 46h 49h 46h 00h (JFIF) and is used to identify the code stream as conforming to the JFIF specification.

Version identifies the version of the JFIF specification, with the first byte containing the major revision number and the second byte containing the minor revision number. For version 1.02, the values of the Version field are 01h 02h; older files contain 01h 00h or 01h 01h.

Units, Xdensity, and Ydensity identify the unit of measurement used to describe the image resolution. Units may be 01h for dots per inch, 02h for dots per centimeter, or 00h for none (use measurement as pixel aspect ratio). Xdensity and Ydensity are the horizontal and vertical resolution of the image data, respectively. If the Units field value is 00h, the Xdensity and Ydensity fields will contain the pixel aspect ratio (Xdensity : Ydensity) rather than the image resolution. Because non-square pixels are discouraged for portability reasons, the Xdensity and Ydensity values normally equal 1 when the Units value is 0.

XThumbnail and YThumbnail give the dimensions of the thumbnail image included in the JFIF APP0 marker. If no thumbnail image is included in the marker, then these fields contain 0. A thumbnail image is a smaller representation of the image stored in the main JPEG data stream (some people call it an icon or preview image). The thumbnail data itself consists of an array of XThumbnail * YThumbnail pixel values, where each pixel value occupies three bytes and contains a 24-bit RGB value (stored in the order R,G,B). No compression is performed on the thumbnail image.

Storing a thumbnail image in the JFIF APP0 marker is now discouraged, though it is still supported for backward compatibility. Version 1.02 of JFIF defines extension markers that allow thumbnail images to be stored separately from the identification marker. This method is more flexible, because multiple thumbnail formats are permitted and because multiple thumbnail images of different sizes could be included in a file. Version 1.02 allows color-mapped thumbnails (one byte per pixel plus a 256-entry colormap) and JPEG-compressed thumbnails.
*New proposed standard: JPEG 2000 includes lossless compression - see: http://www.jpeg.org/JPEG2000.htm
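
The field descriptions for the JFIF header can be turned into a small validity check. The sketch below is illustrative only: JfifInfo, be16, and jfif_parse_header are hypothetical names, the 20 bytes are read from a memory buffer rather than a file, and two-byte fields are read big-endian as the notes on the JPEG byte stream indicate.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned length;                   /* APP0 Length field */
    int      version_major, version_minor;
    int      units;                    /* 0 = aspect ratio, 1 = dpi, 2 = dpcm */
    unsigned xdensity, ydensity;
    unsigned xthumbnail, ythumbnail;
} JfifInfo;

/* Read a 16-bit big-endian value. */
static unsigned be16(const uint8_t *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

/* Returns 1 if buf starts with a consistent SOI + JFIF APP0 header. */
int jfif_parse_header(const uint8_t *buf, size_t n, JfifInfo *out)
{
    static const uint8_t id[5] = { 'J', 'F', 'I', 'F', 0 };

    if (n < 20) return 0;
    if (buf[0] != 0xFF || buf[1] != 0xD8) return 0;   /* SOI  */
    if (buf[2] != 0xFF || buf[3] != 0xE0) return 0;   /* APP0 */
    if (memcmp(buf + 6, id, 5) != 0) return 0;        /* "JFIF\0" */

    out->length        = be16(buf + 4);
    out->version_major = buf[11];
    out->version_minor = buf[12];
    out->units         = buf[13];
    out->xdensity      = be16(buf + 14);
    out->ydensity      = be16(buf + 16);
    out->xthumbnail    = buf[18];
    out->ythumbnail    = buf[19];

    /* Length must equal 16 + 3 * XThumbnail * YThumbnail. */
    return out->length == 16u + 3u * out->xthumbnail * out->ythumbnail;
}
```

A file with no embedded thumbnail therefore has a Length of exactly 16 (00h 10h).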


PNG (Thanks again to: Murray & VanRyper )
PNG and GIF89a share the following features:
Format organized as a data stream
Lossless image data compression
Storage of index-mapped images containing up to 256 colors
Progressive display of interlaced image data
Transparent key color supported
Ability to store public and private user-defined data
Independent from hardware and operating system
The following GIF features have been improved upon in PNG:
Legally unencumbered method of data compression
Faster progressive display interlacing scheme
Greater extensibility for storing user-defined data

The following PNG features are not found in GIF:
Storage of truecolor images of up to 48 bits per pixel
Storage of gray-scale images of up to 16 bits per pixel
Full alpha channel
Gamma indicator
CRC method of data stream corruption detection
Standard toolkit for implementing PNG readers and writers
Standard set of benchmark images for testing PNG readers

The following GIF features are not found in PNG v1.0:
Capability of storing multiple images
Support of storage of animation sequences
Payment of a licensing fee required to sell software that reads or writes the GIF file format

File Format:
Signature:
typedef struct _PngSignature
{
BYTE Signature[8]; /* Identifier (always 89504E470D0A1A0Ah) */
} PNGSIGNATURE;


IHDR Chunk
typedef struct _IHDRChunk
{
DWORD Width;      /* Width of image in pixels */
DWORD Height;     /* Height of image in pixels */
BYTE BitDepth;    /* Bits per pixel or per sample */
BYTE ColorType;   /* Color interpretation indicator */
BYTE Compression; /* Compression type indicator */
BYTE Filter;      /* Filter type indicator */
BYTE Interlace;   /* Type of interlacing scheme used */
} IHDRCHUNK;

PLTE Chunk
typedef struct _PLTEChunkEntry
{
BYTE Red;   /* Red component (0 = black, 255 = maximum) */
BYTE Green; /* Green component (0 = black, 255 = maximum) */
BYTE Blue;  /* Blue component (0 = black, 255 = maximum) */
} PLTECHUNKENTRY;
PLTECHUNKENTRY PLTEChunk[];

IDAT Chunk

IEND Chunk

typedef struct _PngChunk
{
DWORD DataLength; /* Size of Data field in bytes */
DWORD Type;       /* Code identifying the type of chunk */
BYTE Data[];      /* The actual data stored by the chunk */
DWORD Crc;        /* CRC-32 value of the Type and Data fields */
} PNGCHUNK;
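
To show how the signature and the chunk layout fit together, here is a minimal C sketch that checks the 8-byte signature and reads the DataLength and Type of one chunk from a memory buffer. The function names are hypothetical. The sketch relies on PNG storing multi-byte integers in big-endian (network) byte order, which the structures above do not state explicitly, and it does not verify the CRC.

```c
#include <stdint.h>
#include <string.h>

static const uint8_t PNG_SIGNATURE[8] =
    { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

/* Return 1 if buf begins with the PNG signature. */
int png_check_signature(const uint8_t *buf, size_t n)
{
    return n >= 8 && memcmp(buf, PNG_SIGNATURE, 8) == 0;
}

/* Read a 32-bit big-endian value. */
static uint32_t png_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read DataLength and Type from the start of a chunk.
   type receives the 4-character code plus a terminating NUL.
   Returns 1 on success, 0 if fewer than 8 bytes are available. */
int png_read_chunk_header(const uint8_t *buf, size_t n,
                          uint32_t *length, char type[5])
{
    if (n < 8) return 0;
    *length = png_be32(buf);
    memcpy(type, buf + 4, 4);
    type[4] = '\0';
    return 1;
}
```

A reader can skip any chunk it does not understand by advancing 12 + DataLength bytes (length, type, data, CRC). The IHDR structure shown earlier accounts for a DataLength of 13.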

MPEG
- designed for sound and motion video on CD and DAT
- image max size = 4095 X 4095 X 30 frames/sec

(Thanks again to: Murray & VanRyper )
MPEG uses two types of compression methods to encode video data: interframe and intraframe encoding. Interframe encoding is based upon both predictive coding and interpolative coding techniques, as described below.

When capturing frames at a rapid rate (typically 30 frames/second for real time video) there will be a lot of identical data contained in any two or more adjacent frames. If a motion compression method is aware of this "temporal redundancy," as many audio and video compression methods are, then it need not encode the entire frame of data, as is done via intraframe encoding. Instead, only the differences (deltas) in information between the frames are encoded. This results in greater compression ratios, with far less data needing to be encoded. This type of interframe encoding is called predictive encoding.

A further reduction in data size may be achieved by the use of bi-directional prediction. Differential predictive encoding encodes only the differences between the current frame and the previous frame. Bi-directional prediction encodes the current frame based on the differences between the current, previous, and next frame of the video data. This type of interframe encoding is called motion-compensated interpolative encoding.

To support both interframe and intraframe encoding, an MPEG data stream contains three types of coded frames:


I-frames (intraframe encoded)

P-frames (predictive encoded)

B-frames (bi-directional encoded)


An I-frame contains a single frame of video data that does not rely on the information in any other frame to be encoded or decoded. Each MPEG data stream starts with an I-frame.

A P-frame is constructed by predicting the difference between the current frame and closest preceding I- or P-frame. A B-frame is constructed from the two closest I- or P-frames. The B-frame must be positioned between these I- or P-frames.

A typical sequence of frames in an MPEG stream might look like this:

IBBPBBPBBPBBIBBPBBPBBPBBI

In theory, the number of B-frames that may occur between any two I- and P-frames is unlimited. In practice, however, there are typically twelve P- and B-frames occurring between each I-frame. One I-frame will occur approximately every 0.4 seconds of video runtime.

Remember that the MPEG data is not decoded and displayed in the order that the frames appear within the stream. Because B-frames rely on two reference frames for prediction, both reference frames need to be decoded first from the bitstream, even though the display order may have a B-frame in between the two reference frames.

In the previous example, the I-frame is decoded first. But, before the two B-frames can be decoded, the P-frame must be decoded, and stored in memory with the I-frame. Only then may the two B-frames be decoded from the information found in the decoded I- and P-frames. Assume, in this example, that you are at the start of the MPEG data stream. The first ten frames are stored in the sequence IBBPBBPBBP (0123456789), but are decoded in the sequence:

IPBBPBBPBB (0312645978)

and finally are displayed in the sequence:

IBBPBBPBBP (0123456789)
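
The reordering in this example follows a simple rule: each I- or P-frame is decoded as soon as it is reached, and the B-frames that precede it in display order are decoded immediately afterwards. A minimal C sketch of that rule, with the hypothetical function name mpeg_decode_order, might look like this. It works on a string of frame-type letters only and does no actual decoding.

```c
#include <string.h>

/* Given frame types in display order (e.g. "IBBPBBPBBP"), write the
   display indexes in decode order into order[] and return how many
   were written. order[] must have room for strlen(types) entries.
   B-frames at the very end with no following reference frame cannot
   be decoded yet, so this sketch leaves them out. */
size_t mpeg_decode_order(const char *types, size_t *order)
{
    size_t n = strlen(types);
    size_t out = 0;
    size_t b_start = 0;   /* first B-frame still waiting */
    size_t i, j;

    for (i = 0; i < n; i++) {
        if (types[i] == 'B')
            continue;                 /* waits for the next I or P */
        order[out++] = i;             /* reference frame first */
        for (j = b_start; j < i; j++)
            order[out++] = j;         /* then the waiting B-frames */
        b_start = i + 1;
    }
    return out;
}
```

Applied to "IBBPBBPBBP", the sketch produces the index sequence 0 3 1 2 6 4 5 9 7 8, matching the decode order given in the example.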

Once an I-, P-, or B-frame is constructed, it is compressed using a DCT compression method similar to JPEG. Where interframe encoding reduces temporal redundancy (data identical over time), the DCT-encoding reduces spatial redundancy (data correlated within a given space). Both the temporal and the spatial encoding information are stored within the MPEG data stream.


From: Wilson Woo <wilson00@HK.Super.NET>
To: submit@wotsit.demon.co.uk
Subject: MPEG Video

THIS TEXT CONTAINS ONLY MPEG VIDEO HEADER INFO - BY WILSON WOO
It's only what I know. Please feel free to update it.

Below is information got from someone.

/*****************************************************************/

Sequence Header

This contains information related to one or more "group-of-pictures"

Size      Data                       Details
===================================================================
4 bytes   Sequence header code       In hex: 000001B3
12 bits   Horizontal size            In pixels
12 bits   Vertical size              In pixels
4 bits    Pel aspect ratio           See below
4 bits    Picture rate               See below
18 bits   Bit rate                   In units of 400 bits/sec
1 bit     Marker bit                 Always 1
10 bits   VBV buffer size            Minimum buffer needed to decode this
                                     sequence of pictures; in 16KB units
1 bit     Constrained parameter flag
1 bit     Load intra quantizer       0: false; 1: true (matrix follows)
          matrix
64 bytes  Intra quantizer matrix     Optional
1 bit     Load nonintra quantizer    0: false; 1: true (matrix follows)
          matrix
64 bytes  Nonintra quantizer matrix  Optional
-         Sequence extension data    Optional
-         User data                  Optional application-dependent data
===================================================================

Aspect ratios are defined by a code which represents the height and
width of the video image.
Picture rates are also defined by a code that represents the number
of pictures that may be displayed each second.

Each group of pictures has a header that contains one "I picture"
and zero or more B and P pictures. The header is concerned with
the time synchronisation for the first picture in this group, and
the closeness of the previous group to this one.

/*****************************************************************/

For picture rate:
1 = 23.976 frames/sec
2 = 24
3 = 25
4 = 29.97
5 = 30
6 = 50
7 = 59.94
8 = 60

Here is an example. Below is a hex dump of the first 256 bytes of
the first video frame of TEST.MPG from XingMPEG.

00 00 01 B3 16 00 F0 C4 02 A3 20 A5 10 12 12 14
14 14 16 16 16 16 18 18 19 18 18 1A 1B 1B 1B 1B
1A 1C 1D 1E 1E 1E 1D 1C 1E 1F 20 21 21 20 1F 1E
21 23 23 24 23 23 21 25 26 27 27 26 25 29 2A 2A
2A 29 2D 2D 2D 2D 30 31 30 34 34 38 16 00 F0 C4
00 00 01 B8 00 08 00 00 00 00 01 00 00 0A 72 00
00 00 01 01 13 F9 50 02 BC B2 B8 BE 68 8B A4 9F
C5 B5 CA 00 56 76 39 65 F2 30 8B A6 9D 50 69 E7
DA FE 13 CF B7 FF 8F F4 CE 7B FA 0E F0 66 AE 1C
5D E7 00 C8 0A 92 B9 29 3C 21 23 F1 D6 40 13 06
F0 10 10 C6 27 80 A0 34 E1 C8 E4 0F 74 91 DA C4
03 A0 DC 03 12 60 18 49 27 1D D4 BC 67 0E 54 8C
96 FC 5D C0 06 E0 1A 72 11 7C 9A 8D C9 45 89 6D
CD C4 0B 63 DC 90 18 24 00 EC 84 90 18 10 C9 3B
1E A7 60 3C 9D 74 80 76 05 0B 02 81 A9 29 39 68
53 8F 59 F1 BF 93 FB A0 04 01 BC B0 CE 18 E1 25

Sequence header = (Hex) 00 00 01 B3
Horizontal size = 0x160 = 352
Vertical size = 0x0F0 = 240
Pel aspect ratio = 0xC = code 12
Picture rate = 4 = 29.97 frames/sec
Marker bit = 1
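
The first few fields of this worked example can be extracted with a handful of shifts and masks. The C sketch below is illustrative only: MpegSeqInfo and mpeg_parse_sequence_header are hypothetical names, and it assumes the picture-rate code occupies the 4 bits immediately after the pel aspect ratio, which is what the example (byte C4 giving a rate code of 4) implies. Fields after the picture rate are not parsed.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int    width;        /* horizontal size in pixels */
    int    height;       /* vertical size in pixels */
    int    aspect_code;  /* pel aspect ratio code */
    int    rate_code;    /* picture rate code */
    double fps;          /* frames/sec for codes 1-8, else 0 */
} MpegSeqInfo;

/* Returns 1 if buf starts with a sequence header code (00 00 01 B3). */
int mpeg_parse_sequence_header(const uint8_t *b, size_t n, MpegSeqInfo *out)
{
    /* Picture rate codes 1..8 from the list above; index 0 is unused. */
    static const double rates[9] =
        { 0.0, 23.976, 24.0, 25.0, 29.97, 30.0, 50.0, 59.94, 60.0 };

    if (n < 8) return 0;
    if (b[0] != 0x00 || b[1] != 0x00 || b[2] != 0x01 || b[3] != 0xB3)
        return 0;

    out->width       = (b[4] << 4) | (b[5] >> 4);     /* 12 bits */
    out->height      = ((b[5] & 0x0F) << 8) | b[6];   /* 12 bits */
    out->aspect_code = b[7] >> 4;                     /* 4 bits  */
    out->rate_code   = b[7] & 0x0F;                   /* 4 bits  */
    out->fps = (out->rate_code >= 1 && out->rate_code <= 8)
             ? rates[out->rate_code] : 0.0;
    return 1;
}
```

Fed the first eight bytes of the dump (00 00 01 B3 16 00 F0 C4), the sketch yields a 352 x 240 picture with picture rate code 4.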

QuickTime (Thanks again to: Murray & VanRyper )
A QuickTime movie may be stored as a disk file or may be encoded on a DAT or a CD-ROM. Playback of audio and video data is quick, and the audio and video output at least matches the quality of a VCR-taped program.

The QuickTime format allows the storage of multiple tracks of audio and video data. Multiple audio tracks may be used to store the narration for a movie in several different languages. Multiple video tracks may be used to change the video output based on the user responses to an interactive multimedia application. QuickTime movies may also contain a preview, which is a five-second sequence of audio and video data from the movie, and a poster, which is a single frame displayed from the movie data. Both previews and posters are used to quickly identify a movie and its contents.

QuickTime movies are normally structured for the Macintosh environment. However, it is possible to store QuickTime movies in an interchange format, which allows time-based information to be exchanged between the Macintosh and other platforms. This ability allows many multimedia applications that run under non-Macintosh environments, such as Microsoft Windows, the capability of recording and playing back QuickTime movies.

The Movie Toolbox defines six different compression methods that may be used in a QuickTime movie. All of the compression methods used, except for JPEG (Joint Photographic Experts Group, described in Chapter 9, Data Compression), are proprietary to Apple Computer and are mentioned only briefly below.


The Photo Compressor uses the JPEG compression method to compress single-frame images. Continuous-tone images with a pixel depth of eight to 24 bits compressed are the optimal source images for the photo compressor.

The Video Compressor is a lossy, motion-video compression method, which uses both spatial and temporal compression techniques and has a very fast decompression time. The video compressor is for use with 24-bit, continuous-tone video images.

The Compact Video Compressor is a lossy, motion-video compression method which is for use with 16- and 24-bit continuous-tone video images. The Compact Video Compressor offers higher image quality, greater compression ratios, and a faster playback speed than is possible when using the Video Compressor, but it requires much more time to perform the initial compression of the video information.

The Animation Compressor uses a motion-video compression method to compress computer-generated and animation sequences. This compressor uses a run-length algorithm which operates on images of any pixel depth and may be selected to perform lossy or lossless compression. The lossy option offers greater data compression ratios at the expense of image quality. This compressor produces high compression ratios at the expense of a slower decompression speed.

The Graphics Compressor employs a compression algorithm that is used to encode 8-bit still images and image sequences. This compressor produces lower compression ratios, but is able to decompress the image data very quickly. This method is used to encode sequences that will be stored on slower devices, such as CD-ROMs.

The Raw Compressor is simply a conversion program that increases (pads) or reduces (decimates) the number of bits in a pixel. A 32-bit image is reduced to a 24-bit image by stripping off the alpha channel bits. A 16-bit image is decimated to an 8-bit image by throwing away the eight least significant bits of each pixel. A 4-bit image is padded out to an 8-bit image by adding four bits to each pixel. The Raw Compressor is used most for preprocessing image data to an appropriate pixel depth before it is encoded by another compressor.


Audio data in QuickTime movie files is digitally encoded into 8-bit samples. A sample is an amplitude value represented by the signed integer range of -128 to 127, with 0 representing silence (two's-complement sound encoding), or an unsigned integer range of 0 to 255, with 128 representing silence (offset-binary sound encoding). Samples stored using the Audio Interchange File Format (AIFF) use the two's-complement encoding method, while samples stored directly in a movie's sound media resource are offset-binary encoded.
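
The two 8-bit sample encodings differ only by an offset of 128, so converting between them is a one-line operation. The following C sketch uses hypothetical function names and is meant purely to illustrate the relationship between the two encodings.

```c
#include <stdint.h>

/* Offset-binary (0..255, 128 = silence) to two's complement (-128..127). */
int8_t offset_binary_to_twos(uint8_t sample)
{
    return (int8_t)((int)sample - 128);
}

/* Two's complement (-128..127, 0 = silence) to offset-binary (0..255). */
uint8_t twos_to_offset_binary(int8_t sample)
{
    return (uint8_t)((int)sample + 128);
}
```

Silence maps from 128 to 0 in one direction and from 0 to 128 in the other.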

Using a C syntax-like notation, you can see the nested structure of atoms within a QuickTime movie file:

struct _MovieDirectory
{
    struct _MovieHeaderAtom;
    struct _ClippingAtom
    {
        struct _ClippingRegionAtom;
    }
    struct _TrackDirectory
    {
        struct _TrackHeaderAtom;
        struct _ClippingAtom
        {
            struct _ClippingRegionAtom;
        }
        struct _EditsAtom
        {
            struct _EditListAtom;
        }
        struct _MediaDirectory
        {
            struct _MediaHeaderAtom;
            struct _MediaHandlerAtom;
            struct _MediaInfoAtom
            {
                struct _VideoMediaInfoAtom
                {
                }
                struct _SoundMediaInfoAtom
                {
                    struct _SoundMediaInfoHeaderAtom
                    {
                        struct _SoundMediaInfoHeaderAtom;
                    }
                    struct _HandlerAtom;
                    struct _DataReferenceAtom;
                    struct _SampleTableAtom;
                }
            }
        }
        struct _UserDataAtom;
    }
    struct _UserDataAtom
    {
        struct _MoviesUserData
        {
        }
    }
}

MIDI
Please forgive me for using so much stuff from the book, but time is short and they've done it so well.... (Thanks again to: Murray & VanRyper ). If you are into this sort of stuff, I'd strongly recommend you get a copy for yourself.
Musical Instrument Digital Interface (MIDI) is an industry standard for representing sound in a binary format. MIDI is not an audio format, however. It does not store actual digitally sampled sounds. Instead, MIDI stores a description of sounds, in much the same way that a vector image format stores a description of an image and not image data itself.

Sound in MIDI data is stored as a series of control messages. Each message describes a sound event using terms such as pitch, duration, and volume. When these control messages are sent to a MIDI-compatible device (the MIDI standard also defines the interconnecting hardware used by MIDI devices and the communications protocol used to interchange the control information) the information in the message is interpreted and reproduced by the device.

MIDI data may be compressed, just like any other binary data, and does not require special compression algorithms in the way that audio data does.


MP3
- uses 2 compression techniques: 1 lossy and 1 not
- the lossy one uses a perceptual codec (it throws away stuff we can't hear anyway)
ASYMMETRIC IN THE EXTREME (encoding is far more work than decoding)
bitrate vs samplerate (bitrate = amount of data stored per second) (samplerate = number of readings per second)
1. break signal into frames (typically a fraction of a second)
2. Analyse the signal to get the "spectral energy distribution", i.e. looking at the entire spectrum, decide how to distribute the bits for this signal. Break it into sub-bands: each one can allocate bits differently
3. Knowing the encoding bitrate, figure out the maximum number of bits that can be allocated to each frame - this determines how much we can keep and how much we must throw away
4. Compare the frequency spreads against the known psychoacoustic models of human hearing - this tells us what we can keep and what must be turfed.
5. compress the result using Huffman coding
6. Reassemble the frames (with a header for each)
Psychoacoustics:
Masking effects:
Auditory Masking: simultaneous - similar sounds tend to mask each other (e.g. 1 kHz vs 1.1 kHz is harder to distinguish than 1 kHz vs 1.4 kHz). If we keep both signals we use twice the space, but in the first case we will only be able to hear one of them.
Temporal Masking: loud vs quiet close together in time
- we tend to have trouble distinguishing between the two - we can probably toss out the quiet one
- need to know how far apart in time they need to be before we can tell (with pure tones it turns out to be about 5 milliseconds)
- both pre- and post- masking occurs
Can also throw out anything outside the normal range of human hearing.
Also: we know that we need to place the speakers in our house correctly for the best sound. The placement of the sub-woofer doesn't seem to matter though - they can go anywhere. This is because we have trouble locating sounds at the extremes of our audible range. MP3 can make use of this in stereo: it can use a "shared" track - since the location of this sound is irrelevant - these can be shared between the left and right channels (so there is no need to store it twice). We can thus store a "middle" channel which is the sum of the Left and Right channels - now we have an opportunity to store the rest as differences.
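
The sum-and-difference idea can be sketched in a few lines of C. This shows only the arithmetic of a "middle" (sum) channel and a difference channel on integer samples. The names MidSide, ms_encode, and ms_decode are hypothetical, and a real MP3 encoder's joint-stereo handling involves scaling and decisions that are not shown here.

```c
typedef struct {
    int mid;   /* left + right */
    int side;  /* left - right */
} MidSide;

/* Replace a left/right sample pair with its sum and difference. */
MidSide ms_encode(int left, int right)
{
    MidSide ms;
    ms.mid  = left + right;
    ms.side = left - right;
    return ms;
}

/* Recover the original pair: mid + side = 2*left, mid - side = 2*right. */
void ms_decode(MidSide ms, int *left, int *right)
{
    *left  = (ms.mid + ms.side) / 2;
    *right = (ms.mid - ms.side) / 2;
}
```

When the two channels are nearly identical the difference values stay close to zero, which is what lets them be stored in fewer bits.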
All this must fit into the specified bitrate - typical is 128 Kbps
If the bitrate is too low we need to throw more out; if it's high, quality is better but the file is bigger.
Note: bitrate is total; so 128Kbps stereo is actually better than 2 X 64 Kbps mono
- because how the sound is distributed across the channels will allow bits to be allocated differently, thus preserving more info. Also: 1 channel may only be using 40% (say) of the bits in one frame; that means the other channel could 'borrow' some of them.
Bitrate can be constant or variable: CBR (constant bit rate) is simpler, but most music isn't very constant. With a variable bitrate (VBR) the encoder gets to set it. WARNING: different encoders measure this differently. Some measure "quality" (so higher numbers are better) while others measure distortion (so lower numbers mean less distortion, which is better). High quality == low distortion.
Streamed audio is often sent out at 1/2 or 1/4 CD-rate.
finally: some frames are so complex that they can't be encoded properly while others will have "space" left-over in the frame. Longer ones can and do use the left-over space of shorter ones.
CD & DAT usually use 16 bits; MP3 may end up using only 4-6 bits.
The header for each frame takes up 32 bits. Each frame holds 1,152 samples (this is constant), so at the CD sample rate of 44.1 kHz we get about 38 frames/sec: of 128,000 bits per second, only about 1,225 bits are header information.
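
The header-overhead arithmetic can be checked with a short C sketch. It assumes 1,152 samples per frame and a 32-bit header per frame. The 44.1 kHz sample rate used in the usage note is an assumption (the CD rate), and the function names are hypothetical.

```c
#define MP3_SAMPLES_PER_FRAME 1152  /* samples carried by one frame */
#define MP3_HEADER_BITS       32    /* header size per frame */

/* Frames produced per second at a given sample rate. */
double mp3_frames_per_second(double sample_rate_hz)
{
    return sample_rate_hz / MP3_SAMPLES_PER_FRAME;
}

/* Header bits spent per second at a given sample rate. */
double mp3_header_bits_per_second(double sample_rate_hz)
{
    return mp3_frames_per_second(sample_rate_hz) * MP3_HEADER_BITS;
}

/* Fraction of the total bitrate that goes to frame headers. */
double mp3_header_fraction(double sample_rate_hz, double bitrate_bps)
{
    return mp3_header_bits_per_second(sample_rate_hz) / bitrate_bps;
}
```

At 44,100 samples/sec this gives roughly 38.3 frames/sec, so headers take up a little under 1% of a 128,000 bit/sec stream.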
