WAV File Format Description


• N.B. The majority of this page was lifted from Professor Mark S. Csele's Embedded Systems Design Page at Niagara College.

WAV files are probably the simplest of the common formats for storing audio samples. Unlike MPEG and other compressed formats, WAVs store samples "in the raw": no pre-processing is required other than formatting the data.

The following information was derived from several sources, including some on the internet that no longer exist. Because WAV is somewhat of a proprietary Microsoft format, some elements here were determined empirically, and so some details may remain sketchy. From what I've heard, the best source for information is the File Formats Handbook by Gunter Born (1995, ITP Boston).

The WAV file itself consists of three "chunks" of information: the RIFF chunk, which identifies the file as a WAV file; the FORMAT chunk, which identifies parameters such as the sample rate; and the DATA chunk, which contains the actual data (samples).

Each chunk breaks down as follows:

RIFF Chunk (12 bytes in length total)
Absolute Byte Offset   Relative Byte Offset   Description
0 - 3                  0 - 3                  "RIFF" (ASCII Characters)
4 - 7                  4 - 7                  Total Length Of Package To Follow (Binary, little endian)
8 - 11                 8 - 11                 "WAVE" (ASCII Characters)

FORMAT Chunk (24 bytes in length total)
Absolute Byte Offset   Relative Byte Offset   Description
12 - 15                0 - 3                  "fmt " (ASCII Characters; the fourth character is a space)
16 - 19                4 - 7                  Length Of FORMAT Chunk (Binary, always 0x10)
20 - 21                8 - 9                  Always 0x01 (format tag; 0x01 indicates uncompressed PCM)
22 - 23                10 - 11                Channel Numbers (0x01=Mono, 0x02=Stereo)
24 - 27                12 - 15                Sample Rate (Binary, in Hz)
28 - 31                16 - 19                Bytes Per Second
32 - 33                20 - 21                Bytes Per Sample: 1=8 bit Mono, 2=8 bit Stereo or 16 bit Mono, 4=16 bit Stereo
34 - 35                22 - 23                Bits Per Sample

DATA Chunk
Absolute Byte Offset   Relative Byte Offset   Description
36 - 39                0 - 3                  "data" (ASCII Characters)
40 - 43                4 - 7                  Length Of Data To Follow
44 - EOF               8 - EOF                Data (Samples)
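
As a rough illustration of the three chunk layouts above, the 44 header bytes can be decoded in one step with Python's struct module. This is only a sketch: the helper name parse_wav_header, the dictionary keys, and the 8000 Hz test header are made up for this example, and it assumes the file follows the basic three-chunk layout exactly, with no extra chunks between the FORMAT and DATA chunks.

```python
import struct

# Layout of the 44-byte header described in the tables above.
# "<" = little endian, no padding; 4s = 4 ASCII bytes,
# I = 32-bit unsigned integer, H = 16-bit unsigned integer.
HEADER_FORMAT = "<4sI4s4sIHHIIHH4sI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 12 + 24 + 8 = 44 bytes


def parse_wav_header(header_bytes):
    """Decode a basic 44-byte WAV header into a dict (hypothetical helper)."""
    if len(header_bytes) < HEADER_SIZE:
        raise ValueError("need at least 44 bytes of header")
    (riff_id, riff_length, wave_id,
     fmt_id, fmt_length, format_tag, channels,
     sample_rate, bytes_per_second, bytes_per_sample, bits_per_sample,
     data_id, data_length) = struct.unpack(HEADER_FORMAT, header_bytes[:HEADER_SIZE])
    if riff_id != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    if fmt_id != b"fmt " or data_id != b"data":
        raise ValueError("header does not follow the basic three-chunk layout")
    return {
        "riff_length": riff_length,
        "fmt_length": fmt_length,
        "format_tag": format_tag,
        "channels": channels,
        "sample_rate": sample_rate,
        "bytes_per_second": bytes_per_second,
        "bytes_per_sample": bytes_per_sample,
        "bits_per_sample": bits_per_sample,
        "data_length": data_length,
    }


# Made-up test header: 8-bit mono at 8000 Hz with 4 bytes of sample data.
# The RIFF length is the file length (44 + 4) minus the first 8 bytes.
example_header = struct.pack(
    HEADER_FORMAT,
    b"RIFF", 40, b"WAVE",
    b"fmt ", 0x10, 0x01, 0x01, 8000, 8000, 1, 8,
    b"data", 4,
)
info = parse_wav_header(example_header)
print(info)
```

Unpacking every field with a single format string keeps the code visibly parallel to the offset tables; a file with additional chunks would need a chunk-by-chunk reader instead.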

The easiest approach to this file format might be to look at an actual WAV file to see how data is stored. In this case, we examine DING.WAV, which is standard with all Windows packages. DING.WAV is an 8-bit, mono, 22.05 kHz WAV file 11,598 bytes in length. Let's begin by looking at the header of the file (using DEBUG).

246E:0100  52 49 46 46 46 2D 00 00-57 41 56 45 66 6D 74 20   RIFFF-..WAVEfmt
246E:0110  10 00 00 00 01 00 01 00-22 56 00 00 22 56 00 00   ........"V.."V..
246E:0120  01 00 08 00 64 61 74 61-22 2D 00 00 80 80 80 80   ....data"-......
246E:0130  80 80 80 80 80 80 80 80-80 80 80 80 80 80 80 80   ................
246E:0140  80 80 80 80 80 80 80 80-80 80 80 80 80 80 80 80   ................

As expected, the file begins with the ASCII characters "RIFF" identifying it as a WAV file. The next four bytes tell us the length is 0x2D46 bytes (11590 bytes in decimal) which is the length of the entire file minus the 8 bytes for the "RIFF" and length (11598 - 11590 = 8 bytes).
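
The length arithmetic can be checked with a few lines of Python. The four bytes are stored least significant byte first, so 46 2D 00 00 in the dump reads as 0x00002D46. The variable names here are only illustrative.

```python
# Bytes 4-7 of the DING.WAV dump, in the order they appear in the file.
length_bytes = bytes([0x46, 0x2D, 0x00, 0x00])

# Little endian: the first byte is the least significant one.
riff_length = int.from_bytes(length_bytes, "little")

# The stored length excludes the 4-byte "RIFF" tag and the 4-byte length itself.
file_length = riff_length + 8

print(hex(riff_length), riff_length, file_length)  # 0x2d46 11590 11598
```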

The ASCII characters for "WAVE" and "fmt " follow. Next (line 2 above) we find the value 0x00000010 in the first 4 bytes (length of format chunk: always constant at 0x10). The next four bytes are 0x0001 (Always) and 0x0001 (A mono WAV, one channel used).

Since this is an 8-bit mono WAV, the sample rate and the bytes/second are the same at 0x00005622 or 22,050 in decimal. For a 16-bit stereo WAV the bytes/sec would be 4 times the sample rate. The next 2 bytes show the number of bytes per sample to be 0x0001 (8-bit mono) and the following 2 bytes show the number of bits per sample to be 0x0008.

Finally, the ASCII characters for "data" appear, followed by 0x00002D22 (11,554 decimal), which is the number of bytes of data to follow (actual samples). Each sample is a value from 0x00 to 0xFF. In the example above 0x80 would represent "0" or silence on the output, since the DAC used to play back samples is a bipolar device (i.e. a value of 0x00 would output a negative voltage and a value of 0xFF would output a positive voltage at the output of the DAC on the sound card).
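
To make the "0x80 is silence" point concrete, the following sketch re-centres 8-bit samples around zero and also checks that the 44-byte header plus the data length accounts for the whole file. The function name unsigned8_to_signed is hypothetical, and the mapping to -128..127 is simply one convenient way to view the values, not something the file format itself prescribes.

```python
def unsigned8_to_signed(sample_bytes):
    """Map 8-bit unsigned samples (0x00..0xFF) to signed values (-128..127)."""
    # 0x80 is the midpoint, so subtracting it puts silence at 0.
    return [b - 0x80 for b in sample_bytes]


# Lowest value, silence, and highest value.
levels = unsigned8_to_signed(bytes([0x00, 0x80, 0xFF]))
print(levels)  # [-128, 0, 127]

# Data length from bytes 40-43 of the DING.WAV dump (little endian).
data_length = int.from_bytes(bytes([0x22, 0x2D, 0x00, 0x00]), "little")
total_length = 44 + data_length  # 44 header bytes precede the samples
print(data_length, total_length)  # 11554 11598
```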

Note that there are extensions to the basic WAV format which may be supported in newer systems -- for example if you look at DING.WAV in Windows '95 you'll see some extra bytes added after the format chunk before the "data" area -- but the basic format remains the same.
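
One way to cope with such extra bytes is to stop relying on fixed offsets and walk the file chunk by chunk. The sketch below assumes the extra bytes are themselves organised as chunks (a four-character ID followed by a 32-bit little-endian length), which is how the RIFF container generally works, and that odd-length chunks are followed by one padding byte. The names iter_chunks and make_chunk, and the "xtra" chunk ID, are invented for this example.

```python
import io
import struct


def iter_chunks(stream):
    """Yield (chunk_id, payload) for each chunk after the 12-byte RIFF chunk."""
    riff_header = stream.read(12)
    if len(riff_header) < 12:
        raise ValueError("file too short for a RIFF header")
    riff_id, _riff_length, wave_id = struct.unpack("<4sI4s", riff_header)
    if riff_id != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    while True:
        chunk_header = stream.read(8)
        if len(chunk_header) < 8:
            break  # end of file
        chunk_id, length = struct.unpack("<4sI", chunk_header)
        payload = stream.read(length)
        yield chunk_id, payload
        if length % 2:
            stream.read(1)  # assumed RIFF rule: skip the pad byte after an odd-length chunk


def make_chunk(chunk_id, payload):
    """Build a chunk for the made-up test file below (hypothetical helper)."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad


# Made-up test file: a format chunk, an unknown 3-byte "xtra" chunk, then data.
fmt_payload = struct.pack("<HHIIHH", 0x01, 0x01, 22050, 22050, 1, 8)
body = (b"WAVE"
        + make_chunk(b"fmt ", fmt_payload)
        + make_chunk(b"xtra", b"abc")
        + make_chunk(b"data", bytes([0x80] * 4)))
wav_bytes = b"RIFF" + struct.pack("<I", len(body)) + body

chunks = dict(iter_chunks(io.BytesIO(wav_bytes)))
print(list(chunks))
print(chunks[b"data"])
```

A reader built this way looks up the "fmt " and "data" chunks by ID and ignores anything it does not recognise, so the extra bytes do not shift the fields it cares about.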

As a final example consider the header for the following WAV file recorded at 44,100 samples per second in 16-bit stereo.

246E:0100  52 49 46 46 2C 48 00 00-57 41 56 45 66 6D 74 20   RIFF,H..WAVEfmt 
246E:0110  10 00 00 00 01 00 02 00-44 AC 00 00 10 B1 02 00   ........D.......
246E:0120  04 00 10 00 64 61 74 61-00 48 00 00 00 00 00 00   ....data.H......
246E:0130  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................

Again we find all the expected structures. Note that the sample rate is 0xAC44 (44,100 as an unsigned int in decimal) and the bytes/second is 4 times that figure since this is a 16-bit WAV (* 2) and is stereo (again * 2). The Channel Numbers field is also found to be 0x02 here and the bits per sample is 0x10 (16 decimal).
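
The relationship between the sample rate, channel count, bits per sample, and the two byte-count fields can be written out as a short calculation. The function names are illustrative only, and the formulas assume uncompressed samples whose size is a whole number of bytes.

```python
def bytes_per_sample(channels, bits_per_sample):
    """Bytes for one sample across all channels (the 'Bytes Per Sample' field)."""
    return channels * bits_per_sample // 8


def bytes_per_second(sample_rate, channels, bits_per_sample):
    """Value expected in the 'Bytes Per Second' field."""
    return sample_rate * bytes_per_sample(channels, bits_per_sample)


# DING.WAV: 22,050 Hz, mono, 8-bit.
ding_rate = bytes_per_second(22050, 1, 8)
print(ding_rate, hex(ding_rate))  # 22050 0x5622

# Second example: 44,100 Hz, stereo, 16-bit.
cd_rate = bytes_per_second(44100, 2, 16)
print(cd_rate, hex(cd_rate))  # 176400 0x2b110
```

The second result matches the bytes 10 B1 02 00 in the dump, which read as 0x0002B110 once the little-endian byte order is reversed.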