Speech homework (fundamentals of speech processing)
Problem 1 (audio storage)
Suppose a 1-hour audio file has an 8 kHz sampling rate and uses 16 bits per sample. How much disk space is needed to store it without compression?
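
As a way to check the arithmetic, uncompressed size is duration times sampling rate times bytes per sample. The sketch below assumes a single (mono) channel and ignores any file header; neither is stated in the problem, and the function name `pcm_size_bytes` is only illustrative.

```python
def pcm_size_bytes(duration_s, sample_rate_hz, bits_per_sample, channels=1):
    """Size of raw, uncompressed PCM audio in bytes.

    Assumptions (not given in the problem): mono audio by default,
    and no container/header overhead.
    """
    total_samples = duration_s * sample_rate_hz * channels
    return total_samples * bits_per_sample // 8


size = pcm_size_bytes(duration_s=3600, sample_rate_hz=8000, bits_per_sample=16)
print(f"{size} bytes = {size / 1e6:.1f} MB")
```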
Problem 2 (framing)
To process this 1-hour audio file, we typically divide it into frames (small, overlapping chunks). If the frame width is 30 ms and the frame shift is 10 ms, how many frames will be generated from the file? How many audio samples are there in each frame?
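
One possible way to count frames is sketched below. It assumes that only complete frames are kept and that the signal is not padded at either end; tools that pad the signal will report a slightly different count. The name `frame_layout` is hypothetical.

```python
def frame_layout(duration_s, sample_rate_hz, frame_ms, shift_ms):
    """Return (number_of_frames, samples_per_frame).

    Assumption: no padding, so a frame is counted only if it fits
    entirely inside the signal.
    """
    total_samples = duration_s * sample_rate_hz
    frame_len = sample_rate_hz * frame_ms // 1000   # samples per frame
    shift = sample_rate_hz * shift_ms // 1000       # samples between frame starts
    if total_samples < frame_len:
        return 0, frame_len
    num_frames = 1 + (total_samples - frame_len) // shift
    return num_frames, frame_len


num_frames, frame_len = frame_layout(3600, 8000, frame_ms=30, shift_ms=10)
print(num_frames, "frames of", frame_len, "samples each")
```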
Problem 3 (feature storage)
If each frame of the speech signal is represented by a 13-dimensional cepstral feature vector stored in 32-bit floating point, how much disk space is needed to store these features without compression?
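
The feature size follows the same pattern: number of frames times vector dimension times bytes per value. In the sketch below the frame count of 1000 is only a placeholder to show the call; substitute the count obtained in Problem 2. The name `feature_size_bytes` is illustrative.

```python
def feature_size_bytes(num_frames, dim, bits_per_value):
    """Size in bytes of num_frames feature vectors of length dim, uncompressed."""
    return num_frames * dim * bits_per_value // 8


# Placeholder frame count, not the answer to Problem 2.
example = feature_size_bytes(num_frames=1000, dim=13, bits_per_value=32)
print(example, "bytes for 1000 frames")
```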
Problem 4 (spectrogram)
Record yourself saying the utterance "deep learning for computer vision speech and language". Plot its spectrogram (preferably in black and white, similar to lecture slides 28 and 29).
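
A minimal sketch of one way to compute and plot a spectrogram with NumPy and Matplotlib follows. The 30 ms window and 10 ms shift are borrowed from Problem 2, while the Hamming window, the dB scaling, and the function names are assumptions of this sketch rather than requirements of the assignment. A synthetic tone stands in for the recording.

```python
import numpy as np


def log_spectrogram(signal, sample_rate, frame_ms=30, shift_ms=10):
    """Log-magnitude spectrogram in dB, shape (num_frames, num_freq_bins)."""
    signal = np.asarray(signal, dtype=float)
    frame_len = sample_rate * frame_ms // 1000
    shift = sample_rate * shift_ms // 1000
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    num_frames = 1 + (len(signal) - frame_len) // shift
    window = np.hamming(frame_len)  # assumed window; others work too
    frames = np.stack([
        signal[i * shift: i * shift + frame_len] * window
        for i in range(num_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return 20.0 * np.log10(magnitude + 1e-10)  # small offset avoids log(0)


def plot_spectrogram(log_spec, sample_rate, shift_ms=10):
    """Show the spectrogram in grayscale (dark = high energy)."""
    import matplotlib.pyplot as plt  # imported here so the analysis runs without it
    duration_s = log_spec.shape[0] * shift_ms / 1000
    plt.imshow(log_spec.T, origin="lower", aspect="auto", cmap="gray_r",
               extent=[0, duration_s, 0, sample_rate / 2])
    plt.xlabel("Time (s)")
    plt.ylabel("Frequency (Hz)")
    plt.show()


# Stand-in for a recording: one second of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
spec = log_spectrogram(np.sin(2 * np.pi * 1000 * t), sample_rate)
```

For a real recording, replace the synthetic tone with the samples read from your audio file and call `plot_spectrogram(spec, sample_rate)`.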
Problem 5 (pitch/formant)
Record yourself producing the sound "/a:/" for about 1 second, then plot its waveform and spectrum. Roughly estimate your pitch and the first 3 formants for this sound (similar to lecture slide 27).
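
Pitch can be read off the waveform as the inverse of the repetition period. As a cross-check, the sketch below estimates it from the autocorrelation peak. This is one common approach and not necessarily the one intended by the lecture; the 60 to 400 Hz search range and the name `estimate_pitch_hz` are assumptions. It does not estimate formants, which still need to be read from the peaks of the spectral envelope.

```python
import math


def estimate_pitch_hz(samples, sample_rate, fmin=60.0, fmax=400.0):
    """Rough pitch estimate: the lag with the largest autocorrelation.

    Assumption: the true pitch lies between fmin and fmax.
    """
    mean = sum(samples) / len(samples)
    x = [s - mean for s in samples]          # remove DC offset
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(x) - 1)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself delayed by `lag` samples.
        score = sum(a * b for a, b in zip(x, x[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Stand-in for a recorded vowel: 0.2 s of a 100 Hz tone with two harmonics.
sr = 8000
harmonics = ((1, 1.0), (2, 0.5), (3, 0.25))
tone = [
    sum(a * math.sin(2 * math.pi * 100 * k * n / sr) for k, a in harmonics)
    for n in range(1600)
]
print(f"Estimated pitch: {estimate_pitch_hz(tone, sr):.1f} Hz")
```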