wezm.net/content/technical/2008/04/mp3-decoder-libraries-compared.html
2010-03-12 07:30:27 +11:00

For my current software project I need to decode MP3 files in order to produce an <a href="http://images.google.com/images?q=audio+waveform">audio waveform</a>. The decoding doesn't need to be overly accurate, as the decoded samples will be displayed, not played. However, it does need to be fast, as a typical use case for the application will be MP3 files of around 100 MB (full-length CDs). The application is for Mac OS X, although the results of my testing below could be useful for other platforms.
Assisted by a code sample from Apple, I wrote an initial version of the decoder that read the source MP3 file and wrote the raw linear PCM data out to a file. I did this using the <a href="http://developer.apple.com/documentation/MusicAudio/Conceptual/CoreAudioOverview/index.html">Core Audio</a> framework built into Mac OS X. Once the program was working I tested it against some sample files and came to the conclusion that 4 seconds to decode a 3-minute track was great, but over 100 seconds for a full-length CD was not so great.
I did some searching and came up with two other libraries that seemed well suited to the task of MP3 decoding: <a href="http://www.mpg123.de/">mpg123</a> (libmpg123) and <a href="http://www.underbit.com/products/mad/"><abbr title="MPEG Audio Decoder">MAD</abbr></a> (libmad). mpg123 claimed to be very fast; MAD claimed to be very accurate.
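To illustrate why display needs less accuracy than playback, here is a minimal sketch of one common way to turn decoded samples into a waveform: reduce the samples to one minimum/maximum pair per pixel column. The <code>peak</code> type and <code>reduce_to_peaks</code> function are hypothetical names for illustration, and the sketch assumes mono 16-bit samples already in memory; it is not the code used in the application.

```c
#include <stddef.h>
#include <stdint.h>

/* Lowest and highest sample seen in one pixel column (hypothetical type). */
typedef struct {
    int16_t min;
    int16_t max;
} peak;

/* Reduces `count` mono 16-bit samples to at most `columns` min/max pairs.
   `out` must have room for `columns` entries.
   Returns the number of columns actually written. */
static size_t reduce_to_peaks(const int16_t *samples, size_t count,
                              peak *out, size_t columns)
{
    if (count == 0 || columns == 0) {
        return 0;
    }
    if (columns > count) {
        columns = count; /* never produce an empty column */
    }
    for (size_t c = 0; c < columns; c++) {
        /* 64-bit intermediate avoids overflow on 32-bit size_t. */
        size_t start = (size_t)((uint64_t)c * count / columns);
        size_t end = (size_t)((uint64_t)(c + 1) * count / columns);
        int16_t lo = samples[start];
        int16_t hi = samples[start];
        for (size_t i = start + 1; i < end; i++) {
            if (samples[i] < lo) lo = samples[i];
            if (samples[i] > hi) hi = samples[i];
        }
        out[c].min = lo;
        out[c].max = hi;
    }
    return columns;
}
```

Each column then becomes a vertical line from <code>min</code> to <code>max</code>, so small per-sample decoding errors are unlikely to be visible once thousands of samples are collapsed into one column.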
<h3>Methodology</h3>
I built the two additional libraries with the default configuration options, except for libmad, to which I added the <code>--enable-speed</code> option. With the help of example code I made programs out of each that were comparable to the first version for Core Audio, that is, MP3 file in, 16-bit linear PCM audio samples out.
To provide a benchmark I wrote a script that would run each of the three programs against a source MP3 file. Each program reported the elapsed time (via <a href="http://developer.apple.com/documentation/Darwin/Reference/ManPages/man3/time.3.html">time(3)</a>) and the processor time (via <a href="http://developer.apple.com/documentation/Darwin/Reference/ManPages/man3/clock.3.html">clock(3)</a>) when it finished decoding. The programs were run one after another on the source file 10 times. Their PCM output was written to a new file for each invocation.
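The measurement described above can be sketched as follows. This is an illustrative reconstruction, not the original benchmark code: <code>run_timing</code>, <code>time_run</code> and <code>busy_work</code> are hypothetical names, and <code>busy_work</code> merely stands in for a decoder. Note that time(3) has one-second resolution, which is consistent with the averaged elapsed times in the results all being multiples of 0.1 over 10 runs.

```c
#include <time.h>

/* Elapsed (wall-clock) and processor time for one run, both in seconds. */
typedef struct {
    double elapsed_s;
    double cpu_s;
} run_timing;

/* Times a single call to work(arg) using time(3) and clock(3).
   time(3) only counts whole seconds, so the elapsed figure is coarse. */
static run_timing time_run(void (*work)(void *), void *arg)
{
    time_t wall_start = time(NULL);
    clock_t cpu_start = clock();

    work(arg);

    clock_t cpu_end = clock();
    time_t wall_end = time(NULL);

    run_timing t;
    t.elapsed_s = difftime(wall_end, wall_start);
    t.cpu_s = (double)(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    return t;
}

/* Hypothetical stand-in for a decoder: burns a little CPU time. */
static void busy_work(void *arg)
{
    volatile unsigned long *counter = arg;
    for (unsigned long i = 0; i < 5000000UL; i++) {
        *counter += i;
    }
}
```

Calling <code>time_run(busy_work, &amp;counter)</code> with an <code>unsigned long counter</code> returns both figures for that run. Processor time excludes time spent waiting on disk I/O, which is one reason the elapsed figures in the results are higher.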
<!--more-->
<h3>Environment</h3>
The tests were performed on my dual 1.8 GHz Power Mac G5 with 2 GB RAM running Mac OS X 10.5.2. I didn't make any special attempt to quit all other running programs while I ran the tests, but iTunes was paused and nothing was doing anything significant in the background. I also didn't do anything on the computer while the tests were running.
<h3>Results</h3>
The results of the tests for my sample MP3s are below. All times are in seconds. The individual times were averaged over the 10 runs for each library. The standard deviation of the processor time is also included to give an indication of how consistent the decoding time was.
<h4>Small File</h4>
<!-- 1-08 Pushin.mp3 -->
<table class="left_headers">
<tr>
<th>Size</th><td>4,296,251 bytes</td>
</tr>
<tr>
<th>Bitrate</th><td>160 kbps</td>
</tr>
<tr>
<th>Channels</th><td>2 (Joint Stereo)</td>
</tr>
<tr>
<th>Length</th><td>3:34</td>
</tr>
</table>
<table class="top_headers">
<tr>
<th>Library</th>
<th>Average Elapsed Time</th>
<th>Average Processor Time</th>
<th>Processor Time Std. Deviation</th>
</tr>
<tr>
<td>mpg123</td>
<td>2.30</td>
<td>1.71</td>
<td>0.95</td>
</tr>
<tr>
<td>Core Audio</td>
<td>4.10</td>
<td>3.63</td>
<td>0.27</td>
</tr>
<tr>
<td>mad</td>
<td>4.80</td>
<td>4.42</td>
<td>0.02</td>
</tr>
</table>
<h4>Medium Mono File</h4>
<!-- Sensation Black 2007 Warm Up Mix.mp3 -->
<table class="left_headers">
<tr>
<th>Size</th><td>83,091,456 bytes</td>
</tr>
<tr>
<th>Bitrate</th><td>320 kbps</td>
</tr>
<tr>
<th>Channels</th><td>1 (Mono)</td>
</tr>
<tr>
<th>Length</th><td>34:37</td>
</tr>
</table>
<table class="top_headers">
<tr>
<th>Library</th>
<th>Average Elapsed Time</th>
<th>Average Processor Time</th>
<th>Processor Time Std. Deviation</th>
</tr>
<tr>
<td>mpg123</td>
<td>11.10</td>
<td>9.40</td>
<td>0.03</td>
</tr>
<tr>
<td>mad</td>
<td>26.90</td>
<td>24.60</td>
<td>0.03</td>
</tr>
<tr>
<td>Core Audio</td>
<td>33.60</td>
<td>30.33</td>
<td>1.18</td>
</tr>
</table>
<h4>Large File</h4>
<!-- 1-01 Clubbers Guide 2007 - CD 1.mp3 -->
<table class="left_headers">
<tr>
<th>Size</th><td>126,083,072 bytes</td>
</tr>
<tr>
<th>Bitrate</th><td>224 kbps</td>
</tr>
<tr>
<th>Channels</th><td>2 (Joint Stereo)</td>
</tr>
<tr>
<th>Length</th><td>1:15:02</td>
</tr>
</table>
<table class="top_headers">
<tr>
<th>Library</th>
<th>Average Elapsed Time</th>
<th>Average Processor Time</th>
<th>Processor Time Std. Deviation</th>
</tr>
<tr>
<td>mpg123</td>
<td>37.00</td>
<td>32.21</td>
<td>0.19</td>
</tr>
<tr>
<td>Core Audio</td>
<td>84.00</td>
<td>78.34</td>
<td>0.34</td>
</tr>
<tr>
<td>mad</td>
<td>100.10</td>
<td>94.44</td>
<td>0.15</td>
</tr>
</table>
<h4>Large VBR File</h4>
<!-- 1-01 Clubbers Guide to 2007 (AU Edition).mp3 -->
<table class="left_headers">
<tr>
<th>Size</th><td>123,028,672 bytes</td>
</tr>
<tr>
<th>Bitrate</th><td>210 kbps (VBR)</td>
</tr>
<tr>
<th>Channels</th><td>2 (Joint Stereo)</td>
</tr>
<tr>
<th>Length</th><td>1:17:46</td>
</tr>
</table>
<table class="top_headers">
<tr>
<th>Library</th>
<th>Average Elapsed Time</th>
<th>Average Processor Time</th>
<th>Processor Time Std. Deviation</th>
</tr>
<tr>
<td>mpg123</td>
<td>37.90</td>
<td>32.96</td>
<td>0.13</td>
</tr>
<tr>
<td>Core Audio</td>
<td>86.40</td>
<td>80.52</td>
<td>0.18</td>
</tr>
<tr>
<td>mad</td>
<td>104.30</td>
<td>98.43</td>
<td>0.10</td>
</tr>
</table>
<br />
<strong>Note:</strong> All files had a 44100 Hz sample rate.
<h3>Conclusion</h3>
The results speak for themselves: mpg123 lives up to its claim of being high performance. In all the tests it was roughly two to three times as fast as the other two libraries, a very impressive result. It's certainly the library I'll be using. Of note, it also has quite a nice API. Of the three, Core Audio has the most tedious API, but it is low level and capable of quite a bit more than what I was using it for.
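The speed-up can be checked directly from the average processor times in the tables above. The following sketch uses hypothetical names (<code>cpu_times</code>, <code>speedup</code>, <code>smallest_speedup</code>) and simply divides each competitor's time by mpg123's time for the same file.

```c
#include <stddef.h>

/* Average processor times in seconds, copied from the results tables. */
typedef struct {
    const char *file;
    double mpg123_s;
    double core_audio_s;
    double mad_s;
} cpu_times;

static const cpu_times results[] = {
    {"Small",       1.71,  3.63,  4.42},
    {"Medium Mono", 9.40,  30.33, 24.60},
    {"Large",       32.21, 78.34, 94.44},
    {"Large VBR",   32.96, 80.52, 98.43},
};

/* How many times faster mpg123 was than another library on one file. */
static double speedup(double other_s, double mpg123_s)
{
    return other_s / mpg123_s;
}

/* Smallest speed-up of mpg123 over either competitor across all rows. */
static double smallest_speedup(const cpu_times *rows, size_t n)
{
    double smallest = 0.0;
    for (size_t i = 0; i < n; i++) {
        double vs_core_audio = speedup(rows[i].core_audio_s, rows[i].mpg123_s);
        double vs_mad = speedup(rows[i].mad_s, rows[i].mpg123_s);
        double row_min = vs_core_audio < vs_mad ? vs_core_audio : vs_mad;
        if (i == 0 || row_min < smallest) {
            smallest = row_min;
        }
    }
    return smallest;
}
```

On processor time the ratios range from about 2.1 (Core Audio on the small file) to about 3.2 (Core Audio on the medium mono file).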