• N.B. The majority of this page was lifted from Professor Mark S. Csele's Embedded Systems Design Page at Niagara College.
WAV files are probably the simplest of the common formats for storing audio
samples. Unlike MPEG and other compressed formats, WAVs store samples
"in the raw", where no pre-processing is required other than formatting
of the data.
The following information was derived from several sources, including some on the
internet which no longer exist. Because WAV is somewhat of a proprietary Microsoft
format, some elements here were determined empirically, and
so some details may remain sketchy. From what I've heard, the best source
for information is the File Formats Handbook by Gunter Born (1995, ITP Boston).
The WAV file itself consists of three "chunks" of information: the RIFF chunk, which
identifies the file as a WAV file; the FORMAT chunk, which identifies parameters
such as sample rate; and the DATA chunk, which contains the actual data (samples).
Each chunk breaks down as follows:
RIFF Chunk (12 bytes in length total)
Absolute Byte Offset | Relative Byte Offset | Description
0 - 3 | 0 - 3 | "RIFF" (ASCII Characters)
4 - 7 | 4 - 7 | Total Length Of Package To Follow (Binary, little endian)
8 - 11 | 8 - 11 | "WAVE" (ASCII Characters)
FORMAT Chunk (24 bytes in length total)
Absolute Byte Offset | Relative Byte Offset | Description
12 - 15 | 0 - 3 | "fmt " (ASCII Characters; the fourth character is a space, 0x20)
16 - 19 | 4 - 7 | Length Of FORMAT Chunk To Follow (Binary, 0x10 for plain PCM)
20 - 21 | 8 - 9 | Format Tag (0x01 for uncompressed PCM)
22 - 23 | 10 - 11 | Channel Numbers (0x01=Mono, 0x02=Stereo)
24 - 27 | 12 - 15 | Sample Rate (Binary, in Hz)
28 - 31 | 16 - 19 | Bytes Per Second
32 - 33 | 20 - 21 | Bytes Per Sample (all channels together): 1=8 bit Mono, 2=8 bit Stereo or 16 bit Mono, 4=16 bit Stereo
34 - 35 | 22 - 23 | Bits Per Sample
DATA Chunk
Absolute Byte Offset | Relative Byte Offset | Description
36 - 39 | 0 - 3 | "data" (ASCII Characters)
40 - 43 | 4 - 7 | Length Of Data To Follow
44 - EOF | 8 - EOF | Data (Samples)
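The three tables above can be read in one pass with Python's struct module. The following is only a minimal sketch: the function name parse_canonical_header and the dictionary keys are made up for illustration, and it assumes the plain 44-byte layout shown here, with the DATA chunk starting directly at offset 36.

```python
import struct

# Canonical 44-byte header: "<" = little endian, 4s = four raw bytes,
# I = unsigned 32-bit integer, H = unsigned 16-bit integer.
HEADER_LAYOUT = "<4sI4s4sIHHIIHH4sI"


def parse_canonical_header(header):
    """Decode the first 44 bytes of a plain PCM WAV file (hypothetical helper)."""
    if len(header) < struct.calcsize(HEADER_LAYOUT):
        raise ValueError("need at least 44 bytes of header")
    (riff_id, riff_length, wave_id,
     fmt_id, fmt_length, format_tag, channels,
     sample_rate, bytes_per_second, bytes_per_sample, bits_per_sample,
     data_id, data_length) = struct.unpack_from(HEADER_LAYOUT, header)
    if riff_id != b"RIFF" or wave_id != b"WAVE" or fmt_id != b"fmt ":
        raise ValueError("not a RIFF/WAVE header")
    if data_id != b"data":
        # Assumption: no extra chunks sit between FORMAT and DATA.
        raise ValueError("DATA chunk does not start at offset 36")
    return {
        "riff_length": riff_length,
        "fmt_length": fmt_length,
        "format_tag": format_tag,
        "channels": channels,
        "sample_rate": sample_rate,
        "bytes_per_second": bytes_per_second,
        "bytes_per_sample": bytes_per_sample,
        "bits_per_sample": bits_per_sample,
        "data_length": data_length,
    }
```

To try it on a real file, read the first 44 bytes in binary mode, e.g. open(name, "rb").read(44), and pass them in. Files that carry extra chunks before "data" will be rejected by this sketch rather than misread.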
The easiest approach to this file format might be to look at an actual WAV file
to see how data is stored. In this case, we examine DING.WAV, which is standard with
all Windows packages. DING.WAV is an 8-bit, mono, 22.050 kHz WAV file of 11,598 bytes
in length. Let's begin by looking at the header of the file (using DEBUG).
246E:0100  52 49 46 46 46 2D 00 00-57 41 56 45 66 6D 74 20   RIFFF-..WAVEfmt 
246E:0110  10 00 00 00 01 00 01 00-22 56 00 00 22 56 00 00   ........"V.."V..
246E:0120  01 00 08 00 64 61 74 61-22 2D 00 00 80 80 80 80   ....data"-......
246E:0130  80 80 80 80 80 80 80 80-80 80 80 80 80 80 80 80   ................
246E:0140  80 80 80 80 80 80 80 80-80 80 80 80 80 80 80 80   ................
As expected, the file begins with the ASCII characters "RIFF", which open the
RIFF chunk. The next four bytes (46 2D 00 00, stored little endian) tell us the length
is 0x2D46 bytes (11590 bytes in decimal), which is the length of the entire file minus
the 8 bytes for the "RIFF" and length fields (11598 - 11590 = 8 bytes).
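The little-endian arithmetic behind that length can be checked in a few lines. This is just an illustrative sketch; the variable names are arbitrary and the four bytes are copied from the DEBUG dump of DING.WAV above.

```python
# Bytes 4-7 of DING.WAV, in the order they appear in the file.
length_field = bytes([0x46, 0x2D, 0x00, 0x00])

# Little endian: the first byte is the least significant one.
by_hand = 0x46 + (0x2D << 8) + (0x00 << 16) + (0x00 << 24)

# The same conversion using the built-in helper.
built_in = int.from_bytes(length_field, "little")

# Add back the 8 bytes taken by "RIFF" and the length field itself.
file_size = built_in + 8
```

Here by_hand and built_in both come out to 0x2D46 (11590), and file_size matches the 11,598 bytes quoted for the file.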
The ASCII characters for "WAVE" and "fmt " follow. Next (line 2 above) we find
the value 0x00000010 in the first 4 bytes (length of format chunk: 0x10 for a plain PCM file).
The next four bytes are 0x0001 (the format tag for uncompressed PCM) and 0x0001 (a mono WAV, one channel used).
Since this is an 8-bit mono WAV, the sample rate and the bytes/second are the same at
0x00005622 or 22,050 in decimal. For a 16-bit stereo WAV the bytes/sec would be
4 times the sample rate. The next 2 bytes show the number of bytes per sample to
be 0x0001 (8-bit mono) and the 2 bytes after that show the number of bits per sample to be 0x0008.
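The relationship between sample rate, bytes per sample and bytes/second described above can be written out as a small sketch. The function names are hypothetical, and the code assumes the bits per sample value is a multiple of 8, as it is for the 8-bit and 16-bit files discussed here.

```python
def bytes_per_sample(channels, bits_per_sample):
    """Bytes needed for one sample across all channels (header offset 32)."""
    return channels * bits_per_sample // 8


def bytes_per_second(sample_rate, channels, bits_per_sample):
    """Value expected in the Bytes Per Second field (header offset 28)."""
    return sample_rate * bytes_per_sample(channels, bits_per_sample)


# DING.WAV: 8-bit mono at 22,050 Hz, so bytes/second equals the sample rate.
ding_rate = bytes_per_second(22050, 1, 8)

# 16-bit stereo at 44,100 Hz: four bytes per sample, so 4 times the sample rate.
cd_rate = bytes_per_second(44100, 2, 16)
```

A parser could compare these computed values against the header fields as a cheap sanity check before trusting the rest of the file.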
Finally, the ASCII characters for "data" appear, followed by 0x00002D22 (11,554 decimal),
which is the number of bytes of data to follow (actual samples). Each 8-bit sample is a
value from 0x00 to 0xFF. In the example above, 0x80 would represent "0" or silence
on the output, since the DAC used to play back samples is a bipolar device (i.e. a value of
0x00 would output a negative voltage and a value of 0xFF would output a positive voltage at
the output of the DAC on the sound card).
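To make the "0x80 is silence" convention concrete, here is a minimal sketch that re-centres 8-bit samples around zero. The function names are invented for this example, and the scaling to a -1.0..1.0 range is one common choice rather than anything the file format prescribes.

```python
def unsigned8_to_signed(sample):
    """Map an 8-bit WAV sample (0x00..0xFF) to a signed value (-128..127)."""
    if not 0 <= sample <= 0xFF:
        raise ValueError("8-bit samples must lie in 0x00..0xFF")
    return sample - 0x80  # 0x80 is the midpoint, i.e. silence


def unsigned8_to_float(sample):
    """Scale an 8-bit WAV sample to roughly -1.0..1.0 (assumed convention)."""
    return unsigned8_to_signed(sample) / 128.0


# The run of 0x80 bytes at the start of DING.WAV decodes to silence.
leading_samples = [unsigned8_to_signed(b) for b in bytes([0x80] * 4)]
```

With this mapping 0x00 becomes the most negative value and 0xFF the most positive, matching the bipolar DAC behaviour described above.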
Note that there are extensions to the basic WAV format which may be supported
in newer systems -- for example, if you look at DING.WAV in Windows '95 you'll
see some extra bytes added after the format chunk before the "data" area -- but
the basic format remains the same.
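Because extra bytes can appear between the format chunk and the "data" area, a reader that hard-codes offset 36 may fail on such files. One way around this, sketched below, is to walk the chunks one by one using each chunk's own length field. The name find_chunk is hypothetical, and the code relies on the general RIFF rule that a chunk with an odd length is followed by one pad byte.

```python
import struct


def find_chunk(blob, wanted):
    """Return (payload_offset, payload_length) of chunk `wanted`, or None."""
    if blob[0:4] != b"RIFF" or blob[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # first chunk starts right after "RIFF", length, "WAVE"
    while pos + 8 <= len(blob):
        # Every chunk starts with a 4-byte ID and a little-endian length.
        chunk_id, length = struct.unpack_from("<4sI", blob, pos)
        if chunk_id == wanted:
            return pos + 8, length
        # Skip the payload, plus one pad byte when the length is odd.
        pos += 8 + length + (length & 1)
    return None
```

For example, find_chunk(contents, b"data") gives the position and size of the samples no matter how many unknown chunks come first, and find_chunk(contents, b"fmt ") does the same for the format fields.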
As a final example consider the header for the following WAV file recorded at 44,100 samples
per second in 16-bit stereo.
246E:0100  52 49 46 46 2C 48 00 00-57 41 56 45 66 6D 74 20   RIFF,H..WAVEfmt 
246E:0110  10 00 00 00 01 00 02 00-44 AC 00 00 10 B1 02 00   ........D.......
246E:0120  04 00 10 00 64 61 74 61-00 48 00 00 00 00 00 00   ....data.H......
246E:0130  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
Again we find all the expected structures. Note that the sample rate is 0xAC44
(44,100 as an unsigned int in decimal) and the bytes/second is 4 times that figure
(0x0002B110, or 176,400 in decimal) since this is a 16-bit WAV (* 2) and is stereo
(again * 2). The Channel Numbers field is also found to be 0x02 here, the bytes per
sample is 0x04 and the bits per sample is 0x10 (16 decimal).