Sampling Sound in Win32

Sampling sound in Win32 is relatively simple, once you know how to deal with asynchronous input. Here's how it works: Windows tells your sound card to start sampling the input from the microphone and to store the samples in a client buffer. Once the buffer is full, Windows notifies the client, which is supposed to process the bufferful of data. There are several notification options. The simplest (but pretty much useless) one is for the client to keep polling a flag until Windows changes its value. This, of course, eats up a lot of CPU time that could be spent doing useful work.

The second option is to get notifications in the form of Windows messages.

The third, and in my opinion the best, is a solution available only in Win32. The client creates a separate thread that is suspended, waiting on an event. The event is triggered by Windows when the samples are ready. The client thread then wakes up, does the necessary work, and goes back to sleep to wait for the next event.
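If you'd like to see the shape of this scheme without any Windows machinery, here is a rough sketch in portable C++. The names (Worker, BufferReady) are mine, and std::condition_variable stands in for the Win32 event; treat it as an analogue, not the real code.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Portable analogue of the event-driven sampling scheme
class Worker
{
public:
    Worker ()
        : _isDying (false), _signaled (false), _processed (0),
          _thread (&Worker::Run, this)
    {}
    ~Worker ()
    {
        {
            std::lock_guard<std::mutex> lock (_mutex);
            _isDying = true;
            _signaled = true;   // wake the thread so it can notice _isDying
        }
        _cond.notify_one ();
        _thread.join ();
    }
    // Producer side: a buffer of samples is ready for processing
    void BufferReady ()
    {
        {
            std::lock_guard<std::mutex> lock (_mutex);
            _signaled = true;
        }
        _cond.notify_one ();
    }
    int Processed () const { return _processed.load (); }
private:
    void Run ()     // the captive thread: wait, work, go back to sleep
    {
        for (;;)
        {
            std::unique_lock<std::mutex> lock (_mutex);
            _cond.wait (lock, [this]{ return _signaled; });
            _signaled = false;
            if (_isDying)
                return;
            ++_processed;       // stands in for the FFT and the graphing
        }
    }
    std::mutex              _mutex;
    std::condition_variable _cond;
    bool                    _isDying;
    bool                    _signaled;
    std::atomic<int>        _processed;
    std::thread             _thread;    // declared last: starts after the rest
};
```

The real thing replaces the condition variable with a Win32 event handed to the multimedia subsystem, but the control flow of the captive thread is the same.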

I will describe the multithreaded solution. If you haven't gone through my tutorial on threads, events, and active objects, now's the time to read it and come back.

Now that you're back, with the knowledge about active objects fresh in your mind, let me explain how to create a concrete active object that is responsible for processing sound samples. It waits for the buffer, calculates the Fourier transform of the samples and graphs the results. Since its main purpose is to display the results, I called it Painter. It doesn't talk directly to the Windows multimedia subsystem. That's the duty of another object, the Recorder.

Notice the two View objects passed to the Painter's constructor. These objects encapsulate the two panes used for displaying the sample data and its FFT. The sampling parameters are passed to the constructor as initial settings and to the ReInit method as new settings. The FFT transformer and Recorder are hidden inside smart pointers PtrFft and PtrRecorder. Finally, the Painter object contains two synchronization objects—a mutex to ensure serialized access to Painter's data and an event that is used for synchronization with the multimedia subsystem (that's the event we'll be waiting on). Notice that, as explained in the ActiveObject tutorial, the constructor of Painter has to call _thread.Resume () to start the execution of the captive thread.

class Painter: public ActiveObject
{
public:
    Painter (
        HWND hwnd,
        ViewWave & viewWave,
        ViewFreq & viewFreq,
        int samplesPerBuf,
        int samplesPerSec,
        int fftPoints);

    BOOL ReInit (
        int samplesPerBuf,
        int samplesPerSec,
        int fftPoints,
        int bitsPerSample);

    BOOL Start ();
    void Stop ();

    int HzToPoint (int hz)
    {
        Lock lock (_mutex);
        return _pFftTransformer->HzToPoint (hz);
    }

    int Points ()
    {
        Lock lock (_mutex);
        return _pFftTransformer->Points ();
    }

private:
    // Active object overrides
    void InitThread () {}
    void Run ();
    void FlushThread ();

    void LokWaveInData ();

private:
    ViewWave   & _viewWave;
    ViewFreq   & _viewFreq;

    int         _samplesPerBuf;
    int         _samplesPerSecond;
    int         _fftPoints;
    int         _bitsPerSample;

    HWND        _hwnd;

    Mutex       _mutex;
    Event       _event;

    PtrRecorder _pRecorder;
    PtrFft      _pFftTransformer;
};

Painter::Painter (
        HWND hwnd,
        ViewWave & viewWave,
        ViewFreq & viewFreq,
        int samplesPerBuf,
        int samplesPerSec,
        int fftPoints)
:   _hwnd (hwnd),
    _viewWave (viewWave),
    _viewFreq (viewFreq),
    _samplesPerBuf (samplesPerBuf),
    _samplesPerSecond (samplesPerSec),
    _fftPoints (fftPoints),
    _bitsPerSample (16),
    _pRecorder (samplesPerBuf, samplesPerSec),
    _pFftTransformer (fftPoints, samplesPerSec)
{
    _thread.Resume ();
}

The Painter's implementation of ActiveObject::Run is pretty typical. It has an "infinite" loop which is exited only when the _isDying flag is set by the Kill method. The thread waits on the event until it is released—in our case by the multimedia subsystem. We check the state of the buffer and call LokWaveInData under the lock. By convention, the methods that are called under the lock have a prefix Lok (this is not Hungarian—the prefix has nothing to do with types).

The FlushThread method releases the event, so that the thread has the opportunity to run and check the _isDying flag. It's all pretty standard ActiveObject stuff.

void Painter::Run ()
{
    for (;;)
    {
        _event.Wait ();
        if (_isDying)
            return;

        Lock lock (_mutex);
        if (_pRecorder->IsBufferDone ())
            LokWaveInData ();
    }
}

void Painter::FlushThread ()
{
    _event.Release ();
}

When the wave data is in, we create an iterator—sort of like a tape deck loaded with the "tape" of samples. After the iterator is created we immediately notify the recorder that we are done with the buffer. We then copy the data into the FFT transformer, transform it and update the two views.

void Painter::LokWaveInData ()
{
    SampleIter iter (_pRecorder.GetAccess());
    // Quickly release the buffer
    if (!_pRecorder->BufferDone ())
        return;

    _pFftTransformer->CopyIn (iter);
    _pFftTransformer->Transform();
    _viewFreq.Update (_pFftTransformer.GetAccess());
    _viewWave.Update (_pFftTransformer.GetAccess());
}


Here's what the ViewWave does with the new data. It creates the canvas object (see canvas tutorial), clears the pane's rectangle by overpainting it black and puts the data into the polyline object. Before painting the polyline, it attaches the green pen (see pens tutorial) to the canvas. Notice that painting is not done in response to the WM_PAINT message. It is done every time new data is available.

void ViewWave::Update (Fft const & fftTransformer)
{
    UpdateCanvas canvas (Hwnd ());
    ClientRect rect (Hwnd ());
    canvas.ClearBlack(rect);
    int cMaxPoints = min (fftTransformer.Points(), 
                          _poly.Points());
    for (int i = 0; i < cMaxPoints; ++i)
    {
        int s = fftTransformer.Tape(i) / 512
              + (rect.bottom - 1) / 2;
        if (i >= rect.right)
        {
            _poly.Add (i, rect.right - 1,
                       (rect.bottom - 1) / 2);
        }
        else
        {
            if ( s < 0 )
                _poly.Add (i, i, 0);
            else if (s >= rect.bottom)
                _poly.Add (i, i, rect.bottom - 1);
            else
                _poly.Add (i, i, s);
        }
    }
    PenHolder pen (canvas, _penGreen);
    _poly.Paint (canvas, cMaxPoints);
}

Recorder is the object that bridges the gap between the client code and the multimedia subsystem. It maintains a circular queue of buffers and WaveHeaders and keeps passing them to Windows. You see, when Windows wakes our thread to process data from one buffer, it has to have another buffer ready to be filled with incoming samples. The sound card can't wait while we are calculating the FFT and drawing the graphs. It keeps spitting out samples at a constant rate.

In our example, the Recorder keeps a pool of eight buffers (probably overkill) and most of them spend most of their time under the control of the multimedia subsystem. Actually, all eight buffers are allocated in one chunk of memory; it's the WaveHeaders that point to the appropriate areas of this chunk.
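That layout can be sketched in portable C++. Header below is a minimal stand-in for WAVEHDR, and SetUpPool is my name for work the Recorder constructor and Start do inline:

```cpp
#include <cassert>

const int NUM_BUF = 8;      // size of the buffer pool

// Minimal stand-in for WAVEHDR: just the fields the layout needs
struct Header
{
    char * lpData;
    int    dwBufferLength;
};

// Allocate one contiguous chunk and point each header at its slice,
// the way the Recorder does. The caller owns the returned chunk.
char * SetUpPool (Header header [NUM_BUF], int cbBuf)
{
    char * pBuf = new char [cbBuf * NUM_BUF];
    for (int i = 0; i < NUM_BUF; ++i)
    {
        header [i].lpData = & pBuf [i * cbBuf];
        header [i].dwBufferLength = cbBuf;
    }
    return pBuf;
}
```

Adjacent buffers end up exactly cbBuf bytes apart, which is what makes a single delete [] sufficient in the destructor.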

Take note of the fact that the method GetSample is defined as pure virtual. That's because the way samples are stored in the buffer depends on the number of bits per sample. GetSample will be implemented differently in different subclasses of the Recorder that we'll see in a moment.

Although we are getting closer to Windows API, there is still one more layer, the WaveInDevice, to protect us from calling it directly.

class Recorder
{
    friend class SampleIter;
    enum { NUM_BUF = 8 };
public:
    Recorder(
        int cSamples,
        int cSamplePerSec,
        int nChannels,
        int bitsPerSample);

    ~Recorder();
    BOOL    Start (Event & event);
    void    Stop ();
    BOOL    BufferDone ();

    BOOL    IsBufferDone () const
    {
        return _header [_iBuf].IsDone ();
    }

    BOOL    IsStarted () const { return _isStarted; }
    int     SampleCount () const { return _cSamples; }
    int     BitsPerSample () const { return _bitsPerSample; }
    int     SamplesPerSecond () const { return _cSamplePerSec; }
protected:
    virtual int GetSample (char *pBuf, int i) const = 0;
    char * GetData () const { return _header [_iBuf].lpData; }

    BOOL            _isStarted;

    WaveInDevice    _waveInDevice;
    int             _cSamplePerSec;     // sampling frequency
    int             _cSamples;          // samples per buffer
    int             _nChannels;
    int             _bitsPerSample;
    int             _cbSampleSize;      // bytes per sample

    int             _cbBuf;             // bytes per buffer
    int             _iBuf;              // current buffer #
    char           *_pBuf;              // pool of buffers
    WaveHeader      _header [NUM_BUF];  // pool of headers
};

Recorder::Recorder (
    int cSamples,
    int cSamplePerSec,
    int nChannels,
    int bitsPerSample)
: _isStarted (FALSE),
  _cSamplePerSec (cSamplePerSec),
  _cSamples (cSamples),
  _nChannels (nChannels),
  _bitsPerSample (bitsPerSample),
  _cbSampleSize (nChannels * bitsPerSample/8),
  _cbBuf (cSamples * nChannels * bitsPerSample/8),
  _iBuf (0)
{
    _pBuf = new char [_cbBuf * NUM_BUF];
}

Recorder::~Recorder ()
{
    Stop();
    delete []_pBuf;
}

To start the recorder, we first initialize the data structure WaveFormat, which contains the parameters of our recording: the number of channels, the number of samples per second (the sampling frequency), and the number of bits per sample. We then check whether the format is supported by the Windows sound input device. The WAVE_MAPPER device corresponds to Windows' built-in mixer. Open the Windows Recording Control to select the input you want displayed in the Frequency Analyzer.

Next, we open the sound input device to record data in a given format. We pass it the event that is to be used for asynchronous communication. That's the event the multimedia subsystem will trigger whenever a new buffer full of data is ready. And that's the event our captive thread is waiting on inside the Run method.

We initialize WaveHeaders one by one by attaching data buffers and calling the device to prepare them (whatever that means). We send all but one buffer to the device, so that it can make use of them to store a continuous stream of samples. We leave the last buffer unprepared, so that it's ready for recycling when the first buffer arrives with data. Finally, we tell the device to start recording.

BOOL Recorder::Start (Event & event)
{
    WaveFormat format (
        _nChannels,
        _cSamplePerSec,
        _bitsPerSample );

    if (!format.isInSupported(WAVE_MAPPER))
    {
        MessageBox (0, "Format not supported",
                       "Recorder", MB_OK);
        return FALSE;
    }

    _waveInDevice.Open (WAVE_MAPPER, format, event);
    if (!_waveInDevice.Ok())
    {
        char buf[164];
        if (_waveInDevice.isInUse())
        {
            strcpy (buf,
                    "Another application is recording audio. "
                    "Stop recording "
                    "with this other application "
                    "and then try again.");
        }
        else
            _waveInDevice.GetErrorText (buf, sizeof (buf));
        MessageBox (0, buf, "Recorder", MB_OK);
        return FALSE;
    }

    // Don't initialize the last buffer
    // It will be initialized in the
    // first call to BufferDone
    for ( int i = 0; i < NUM_BUF - 1; i++ )
    {
        _header[i].lpData = & _pBuf [i * _cbBuf];
        _header[i].dwBufferLength = _cbBuf;
        _header[i].dwFlags = 0;
        _header[i].dwLoops = 0;

        _waveInDevice.Prepare (& _header[i]);

        _waveInDevice.SendBuffer (& _header [i]);
    }
    _isStarted = TRUE;
    _iBuf = 0;
    _waveInDevice.Start();
    return TRUE;
}

BufferDone is called whenever our captive thread is woken up to process data from the buffer. The recorder keeps track of which buffer is current and unprepares it (whatever that means). Then it takes the previous buffer from the circular queue, recycles it, and sends it back to the device. The buffer we are currently operating on will be recycled the same way the next time BufferDone is called.

Notice what happens the first time around. The current buffer index _iBuf is zero. The previous buffer will therefore have index -1 or, after adjusting for the circularity of our queue, NUM_BUF - 1. It so happens that this is exactly the index of the buffer we left uninitialized in the Start method. We initialize it now and send it to the device. While we are processing buffer 0, the device is already busy filling buffer number 1, which becomes our current buffer after _iBuf is incremented. In fact, we gave the device seven buffers, which it will fill one by one even if we don't call BufferDone on time. These are the emergency buffers, to be used in case our thread has to wait for its time slice longer than usual.

BOOL Recorder::BufferDone ()
{
    Assert (IsBufferDone ());
    _waveInDevice.UnPrepare (& _header [_iBuf]);
    int prevBuf = _iBuf - 1;
    if (prevBuf < 0)
        prevBuf = NUM_BUF - 1;

    // Next buffer to be filled
    _iBuf++;
    if ( _iBuf == NUM_BUF )
        _iBuf = 0;

    _header[prevBuf].lpData = & _pBuf [prevBuf * _cbBuf];
    _header[prevBuf].dwBufferLength = _cbBuf;
    _header[prevBuf].dwFlags = 0;
    _header[prevBuf].dwLoops = 0;
    _waveInDevice.Prepare (& _header [prevBuf]);

    _waveInDevice.SendBuffer (& _header [prevBuf]);
    return TRUE;
}

void Recorder::Stop ()
{
    _isStarted = FALSE;
    _waveInDevice.Reset ();
    _waveInDevice.Close ();
}

Below are two subclasses of Recorder with their own implementations of GetSample. The first one is used for mono recordings with the accuracy of 8 bits per sample and the second one for mono recordings using 16 bits per sample.

Each byte in an 8-bit recording corresponds to one sample. There are only 256 possible values of a sample, and they are biased, so that no signal at all corresponds to all samples being equal to 128. We subtract the bias from the sample and multiply the result by 64 to scale it up toward the range of a 16-bit recording.

In a 16-bit recording, each pair of bytes stores a signed number corresponding to a single sample. We cast the buffer to a pointer to short and simply access it as an array of (signed 16-bit) shorts.

Similarly, for an 8-bit stereo recording we could return the average of the two channels:

    (pBuf[2*i] + pBuf[2*i+1] - 2 * 128) * 32

and for 16-bit stereo:

    ( ((short *) pBuf)[2*i] + ((short *) pBuf)[2*i+1] ) / 2

Of course, you could also build a stereo iterator that would separately return the left-channel and the right-channel samples.

class RecorderM8: public Recorder  // 8 bit mono
{
public:
    RecorderM8 (int cSamples, int cSamplesPerSec)
    : Recorder (cSamples, cSamplesPerSec, 1, 8) {}
protected:
    int GetSample (char *pBuf, int i) const
    {
        return ((unsigned char)pBuf[i] - 128) * 64;
    }
};

class RecorderM16: public Recorder  // 16 bit mono
{
public:
    RecorderM16 (int cSamples, int cSamplesPerSec)
    : Recorder (cSamples, cSamplesPerSec,  1, 16) {}
protected:
    int GetSample (char *pBuf, int i) const
    {
        return ((short *) pBuf)[i];
    }
};
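To check the decoding arithmetic in isolation, here are portable versions of the two GetSample bodies. The memcpy in the 16-bit variant is my substitution for the raw cast; it sidesteps alignment and aliasing issues while returning the same values:

```cpp
#include <cassert>
#include <cstring>

// 8-bit samples: unsigned, biased around 128, scaled by 64
int DecodeM8 (const char * pBuf, int i)
{
    return ((unsigned char) pBuf [i] - 128) * 64;
}

// 16-bit samples: signed shorts stored directly in the buffer
int DecodeM16 (const char * pBuf, int i)
{
    short s;
    std::memcpy (&s, pBuf + i * sizeof (short), sizeof (short));
    return s;
}
```

Note that silence (a byte equal to 128) decodes to zero, exactly what the FFT expects.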

We went through the trouble of subclassing the Recorder so that we can use one universal iterator to access any recording. The iterator just calls the virtual method of the recorder to decode a given sample. Of course, if you wanted to restrict your recordings to 16-bit only, you could build the decoding into the iterator and forget about subclassing the Recorder. It would save you one virtual call per sample. When I did the profiling of the Frequency Analyzer, I found out that the time it took to decode the samples was infinitesimal in comparison with all the complex number multiplications done in the FFT. So I didn't bother optimizing that part.

class SampleIter
{
public:
    SampleIter (Recorder const & recorder);
    BOOL AtEnd () const { return _iCur == _iEnd;}
    void Advance () { _iCur++; }
    void Rewind () { _iCur = _iEnd - _recorder.SampleCount(); }
    int  GetSample () const 
    {
        return _recorder.GetSample(_pBuffer, _iCur);
    }
    int  Count () const { return _recorder.SampleCount(); }
private:
    char       *_pBuffer;
    Recorder const & _recorder;
    int         _iCur;
    int         _iEnd;
};

// Call BufferDone after creating the iterator
SampleIter::SampleIter (Recorder const & recorder)
: _recorder (recorder), _iCur(0)
{
    _pBuffer = recorder.GetData ();
    _iEnd = recorder.SampleCount();
}
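The iterate-and-decode pattern, with the virtual call the profiler found so cheap, looks like this in isolation (stand-in classes of my own, not the Frequency Analyzer sources):

```cpp
#include <cassert>

// The recorder role: a virtual decoder the iterator defers to
struct Decoder
{
    virtual ~Decoder () {}
    virtual int GetSample (const char * pBuf, int i) const = 0;
};

struct DecoderM8: Decoder   // same arithmetic as RecorderM8
{
    int GetSample (const char * pBuf, int i) const
    {
        return ((unsigned char) pBuf [i] - 128) * 64;
    }
};

// The client role: walk the buffer sample by sample, making one
// virtual call per sample, the way SampleIter's clients do
int SumSamples (const Decoder & dec, const char * pBuf, int count)
{
    int sum = 0;
    for (int i = 0; i < count; ++i)
        sum += dec.GetSample (pBuf, i);
    return sum;
}
```

One virtual call per sample is the price of keeping the iterator format-agnostic; as the article notes, it is negligible next to the FFT.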

Finally we are getting down to Windows API. Class WaveFormat is a simple encapsulation of structure WAVEFORMATEX used by Windows. It has a method to check whether a given format is supported by a given input device. Similarly, WaveHeader is a thin veneer over Windows' WAVEHDR.

class WaveFormat: public WAVEFORMATEX
{
public:
    WaveFormat (
        WORD    nCh, // number of channels (mono, stereo)
        DWORD   nSampleRate, // sample rate
        WORD    BitsPerSample)
    {
        wFormatTag = WAVE_FORMAT_PCM;
        nChannels = nCh;
        nSamplesPerSec = nSampleRate;
        nAvgBytesPerSec = nSampleRate * nCh * BitsPerSample/8;
        nBlockAlign = nChannels * BitsPerSample/8;
        wBitsPerSample = BitsPerSample;
        cbSize = 0;
    }

    BOOL isInSupported (UINT idDev)
    {
        MMRESULT result = waveInOpen
            (0, idDev, this, 0, 0, WAVE_FORMAT_QUERY);
        return result == MMSYSERR_NOERROR;
    }
};
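A quick sanity check of the derived-field arithmetic, with a plain struct standing in for WAVEFORMATEX so it runs anywhere:

```cpp
#include <cassert>

// Portable stand-in for the PCM fields of WAVEFORMATEX, computing
// the derived fields the same way WaveFormat's constructor does
struct PcmFormat
{
    PcmFormat (int nCh, int nSampleRate, int bitsPerSample)
    {
        nChannels = nCh;
        nSamplesPerSec = nSampleRate;
        wBitsPerSample = bitsPerSample;
        nBlockAlign = nCh * bitsPerSample / 8;
        nAvgBytesPerSec = nSampleRate * nBlockAlign;
    }
    int nChannels;
    int nSamplesPerSec;
    int nAvgBytesPerSec;
    int nBlockAlign;
    int wBitsPerSample;
};
```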

class WaveHeader: public WAVEHDR
{
public:
    BOOL IsDone () const { return dwFlags & WHDR_DONE; }
};

Class WaveInDevice wraps the group of Windows API functions that deal with sound input devices. These functions all share the prefix waveIn, which naturally suggests grouping them into a single class. Notice how we call waveInOpen with a handle to an event and the flag CALLBACK_EVENT, which tells the multimedia subsystem that we expect to be notified by the triggering of this event.

class WaveInDevice
{
public:
    WaveInDevice ();
    WaveInDevice (UINT idDev, WaveFormat & format, Event & event);
    ~WaveInDevice ();
    BOOL    Open (UINT idDev, WaveFormat & format, Event & event);
    void    Reset ();
    BOOL    Close ();
    void    Prepare (WaveHeader * pHeader);
    void    UnPrepare (WaveHeader * pHeader);
    void    SendBuffer (WaveHeader * pHeader);
    BOOL    Ok () { return _status == MMSYSERR_NOERROR; }
    void    Start () { waveInStart(_handle); }
    void    Stop () { waveInStop(_handle); }
    BOOL    isInUse () { return _status == MMSYSERR_ALLOCATED; }
    UINT    GetError () { return _status; }
    void    GetErrorText (char* buf, int len);
private:
    HWAVEIN     _handle;
    MMRESULT    _status;
};

inline WaveInDevice::WaveInDevice ()
{
    _status = MMSYSERR_BADDEVICEID;
}

inline WaveInDevice::WaveInDevice (
    UINT idDev, WaveFormat & format, Event & event)
{
    Open (idDev, format, event);
}

inline WaveInDevice::~WaveInDevice ()
{
    if (Ok())
    {
        waveInReset (_handle);
        waveInClose (_handle);
    }
}

inline BOOL WaveInDevice::Open (
    UINT idDev, WaveFormat & format, Event & event)
{
    _status = waveInOpen (
        & _handle,
        idDev,
        & format,
        (DWORD_PTR) (HANDLE) event,
        0, // callback instance data
        CALLBACK_EVENT);

    return Ok();
}

inline void WaveInDevice::Reset ()
{
    if (Ok())
        waveInReset (_handle);
}

inline BOOL WaveInDevice::Close ()
{
    if ( Ok() && waveInClose (_handle) == 0)
    {
        _status = MMSYSERR_BADDEVICEID;
        return TRUE;
    }
    else
        return FALSE;
}

inline void WaveInDevice::Prepare (WaveHeader * pHeader)
{
    waveInPrepareHeader (_handle, pHeader, sizeof(WAVEHDR));
}

inline void WaveInDevice::SendBuffer (WaveHeader * pHeader)
{
    waveInAddBuffer (_handle, pHeader, sizeof(WAVEHDR));
}

inline void WaveInDevice::UnPrepare (WaveHeader * pHeader)
{
    waveInUnprepareHeader (_handle, pHeader, sizeof(WAVEHDR));
}

inline void WaveInDevice::GetErrorText (char* buf, int len)
{
    waveInGetErrorText (_status, buf, len);
}

There's one more thing: the program has to make sure that the computer has a sound card. This is done by calling waveInGetNumDevs and then checking the capabilities of the appropriate device (in our case device 0, the mike).

if (waveInGetNumDevs() == 0)
    throw WinException ("No sound card installed !");

WAVEINCAPS waveInCaps;
if (waveInGetDevCaps (0, 
                  &waveInCaps, 
                  sizeof(WAVEINCAPS))
               != MMSYSERR_NOERROR)
{
    throw WinException ("Cannot determine "
                        "sound card capabilities !");
}

// waveInCaps.dwFormats contains information 
// about available wave formats.

Now you have all the information to build your own app that could, for instance, display the waveform of the sound in a little window, or measure its intensity by summing up the last 100 samples, etc. Your time and imagination will be the only limiting factors.