-
When a sound is digitized, some information is ALWAYS lost.
- In order to recognize a particular wave, the sample rate must be at least 2X that wave's frequency. [Why? Nyquist Sampling Theorem: for lossless digitization, the sample rate must be at least twice the maximum frequency present in the signal.]
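To see what goes wrong below the 2X limit, here is a minimal sketch in Python. The function names and the 7 Hz / 3 Hz / 10 Hz figures are made up for illustration and are not part of the notes. Sampled at 10 samples/second, a 7 Hz wave produces exactly the same numbers as a 3 Hz wave, so the two cannot be told apart afterwards.

```python
import math

def nyquist_rate(max_freq_hz):
    """Minimum sample rate for a given maximum frequency (2X rule)."""
    return 2 * max_freq_hz

def sample_cosine(freq_hz, sample_rate_hz, n_samples):
    """Take n_samples snapshots of a cosine wave at regular intervals."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

# Hypothetical numbers: 10 samples/second is enough for 3 Hz (needs 6),
# but too slow for 7 Hz (needs 14).
too_fast_wave = sample_cosine(7, 10, 20)
slow_wave = sample_cosine(3, 10, 20)

# Compare the two lists of samples, allowing for floating-point rounding.
indistinguishable = all(math.isclose(a, b, abs_tol=1e-9)
                        for a, b in zip(too_fast_wave, slow_wave))

print(nyquist_rate(22000))   # minimum rate for a 22,000 Hz maximum
print(indistinguishable)
```

The 7 Hz wave has "disguised" itself as a 3 Hz wave; this effect is usually called aliasing.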
-
- Samples (snapshots; measurements) are taken at regular intervals. Sample rate is measured in Hertz (samples/second). This means if the highest frequency is 22,000 Hertz, we will need to sample at a minimum of 44,000 samples/sec. Why did we choose 44,100?
At 44,100 Hz we end up with 44,100 readings (samples) per second: in other words, 44,100 NUMBERS for each second of sound. If we have stereo (2 channels), we must double this (88,200). Then we must decide how many bits to allow for each sample. What are the implications of choosing an 8-bit sample depth? How many distinct sample values can we represent? Must they be continuous (in a row)?
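The arithmetic above can be sketched in a few lines of Python. The function names are illustrative assumptions, and the 16-bit case is added only for comparison.

```python
def numbers_per_second(sample_rate_hz, channels):
    """How many sample values are produced for each second of sound."""
    return sample_rate_hz * channels

def distinct_values(bits_per_sample):
    """How many different values one sample can take."""
    return 2 ** bits_per_sample

def bytes_per_second(sample_rate_hz, channels, bits_per_sample):
    """Uncompressed storage needed for one second of sound."""
    return sample_rate_hz * channels * bits_per_sample // 8

print(numbers_per_second(44100, 2))    # stereo doubles the mono count
print(distinct_values(8))              # 8-bit sample depth
print(distinct_values(16))             # 16-bit sample depth
print(bytes_per_second(44100, 2, 16))  # 16-bit stereo, uncompressed
```

Each extra bit doubles the number of values a sample can take, so the step from 8 to 16 bits is far bigger than "twice as good".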
There are some different classifications of sound (gee, just like images - maybe classifying things is a common strategy).
- Computer-Generated (synthesized): cartoon-like?
- Natural Source:
- music
- voice
- everything else
- What might be an advantage of distinguishing between these?
-
- Typical voice = 500 Hz - 2,000 Hz (compare to the 20 - 20,000 Hz range of human hearing)
- Humans are MOST sensitive at 600-6000 Hz (Coincidence? I think NOT.)
- For voice encoding we need a smaller range of frequencies than for music. A typical telephone samples at 8 kHz: 8,000 samples/second * 8 bits/sample = 64,000 bits/second, i.e. 64 Kbits/sec (Windows Media Player defaults to 64 Kbits/sec).
- Question: Why is it often hard to distinguish between "s" and "f" on the phone?
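A small Python sketch of the telephone numbers, using hypothetical helper names. It applies the 2X rule in reverse: given a sample rate, what is the highest frequency that can still be captured, and does a given band fit under it?

```python
def bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Bits produced per second of sound."""
    return sample_rate_hz * bits_per_sample * channels

def highest_capturable_frequency(sample_rate_hz):
    """The 2X rule turned around: half the sample rate."""
    return sample_rate_hz / 2

def band_fits(low_hz, high_hz, sample_rate_hz):
    """True if the top of the band is within reach of the sample rate."""
    return high_hz <= highest_capturable_frequency(sample_rate_hz)

PHONE_RATE_HZ = 8000

print(bit_rate(PHONE_RATE_HZ, 8))                   # telephone bit rate
print(highest_capturable_frequency(PHONE_RATE_HZ))  # ceiling on the phone
print(band_fits(500, 2000, PHONE_RATE_HZ))          # typical voice band
print(band_fits(20, 20000, PHONE_RATE_HZ))          # full hearing range
```

Anything in a sound that lies above the ceiling simply does not make it through the phone line, which is worth keeping in mind for the question above.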
-
- Two sides to speech:
- Speech generation: need to be able to represent and manipulate in a particular way.
- Speech Analysis: different constraints / problems:
- verification: this is me
- identification: who is this
- recognition: what is said
- understanding: how it is said
-
- Look:
- Graphics (image generation) is to Vision (image analysis) what speech & sound generation is to speech & sound analysis. In generation we are interested in how humans like the result; in analysis we are interested in doing measurements & calculations to extract new information (?)
-
- Normally, samples are given in 8 or 16 bits per sample. In some sense, the # of bits relates to the "step" size between distinct sound levels: more bits means smaller steps.
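One way to picture the "step" size is the sketch below. It assumes amplitudes are scaled to the range -1.0 to 1.0 and uses a simple uniform quantizer; both are illustrative choices, not something the notes specify. Each measurement is snapped to the middle of the step it falls in, so the rounding error is never more than half a step.

```python
def step_size(bits_per_sample, full_scale=2.0):
    """Width of one step when the range -1.0..1.0 is cut into 2**bits steps."""
    return full_scale / (2 ** bits_per_sample)

def quantize(x, bits_per_sample):
    """Snap an amplitude in -1.0..1.0 to the center of its step."""
    step = step_size(bits_per_sample)
    levels = 2 ** bits_per_sample
    index = int((x + 1.0) / step)    # which step the value falls in
    index = min(index, levels - 1)   # keep x == 1.0 inside the top step
    return -1.0 + (index + 0.5) * step

amplitude = 0.123  # a made-up measurement
for bits in (8, 16):
    stored = quantize(amplitude, bits)
    print(bits, step_size(bits), abs(stored - amplitude))
```

The difference between the stored value and the true measurement is the information lost to the limited number of bits.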
-