Using Dirac with FMOD to Change Pitch and Speed of Audio in Real Time

This tutorial outlines a way to use Dirac to change pitch and speed independently of each other using FMOD to process and play back sound in real time. The example presented here uses FMOD’s streaming sound API but can be easily adapted for offline processing and other use cases as well.

Introduction

Over the past couple of weeks I have been contacted several times by people using the excellent FMOD library to take care of the audio part in their games and various other projects, asking me whether it was possible to use Dirac with FMOD. After testing and profiling a couple of ways to do it I came up with the following code, which I intend to make part of the regular Dirac distribution in one of the upcoming versions.

Where to Start From

The code used in our DiracFMOD project is based on the “dsp_custom” C++ example that ships with FMOD, so you might consider modifying that particular example in order to achieve the least amount of code changes.

Setting up the Project

First, we need to add the Dirac library and header file to the project. You can either just use the free DiracLE library for this, or contact me to get a free evaluation version of our Pro package which supports unlimited sample rates and an unlimited number of audio channels per instance. In any event, you will need to add the relevant library file, as well as Dirac.h. Note that if you use DiracLE, the project will refuse to work with audio files that contain more than 1 channel of audio.

Adding Accelerate on the Mac

If you, like me, are using a Mac you also need to add the Accelerate framework to your FMOD project in order for it to link without errors.

Modifying main.cpp

Once you’re done reconfiguring the project you need to make some changes to the code in your main.cpp file. To make it easier, you can get the entire modified main.cpp from this link.

The Challenge

Dirac can change the length of the sound, speeding a recording up or slowing it down. In fact, this is the main purpose of the Dirac library. This means that the length (speed) of the input stream can be different from the length of the output stream, so the input needs to be independent of the output with regard to the number of frames that we need to process on each call.

If, for instance, we stretch a sound to twice its original length, we read only half as many frames from the input as we generate at the output. However, FMOD doesn’t provide a way to request fewer input frames than it produces output frames, so we need to get creative. FMOD does have a way to insert an arbitrary callback into its processing chain via the System call addDSP(), which is exactly what we need in order to use Dirac.

Also, FMOD offers a way to read an arbitrary portion of a sound from file using the Sound calls lock() and unlock(). Their use may seem a bit awkward as they are dealing with bytes and not frames and require all sorts of pointers to things, and they read the sound in raw PCM format and not in the IEEE754 single precision float format that Dirac requires. But a good engineer makes things he wants from things he can get, so we will use them anyway.

On the other hand, the good news is that FMOD does all the decoding and little/big endian byte swapping for us (with the exception of 32-bit AIFF/WAV files, which don’t seem to be handled properly), so the only information we need to deal with is word length (how many bits per sample) and the number of channels. We don’t have to deal with low-level format/decoding issues. This is great!

So in order to “untie” our input and output streams we will produce the number of processed frames that FMOD requests for playback by calling DiracProcess(). On the input side, we will use Dirac’s data provider callback in combination with lock() and unlock() to obtain a piece of the input sound for processing.

In effect, the callback inserted into FMOD’s processing chain using addDSP() will call DiracProcess() to copy the processed sound into the output stream that is being played back in real time. DiracProcess() itself will fetch data from the original sound at an arbitrary rate, i.e. whenever it needs it to do its magic.
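To illustrate the output side, a minimal sketch of such a DSP read callback might look like the following. The callback signature is the one used in FMOD’s dsp_custom example; the name diracDSPCallback and the file-scope dirac pointer are assumptions for this sketch, the Dirac instance is assumed to be reachable from here (we create it later on), and since we will create an interleaved Dirac instance we call the interleaved processing function (check Dirac.h for the exact name in your version):

FMOD_RESULT F_CALLBACK diracDSPCallback(FMOD_DSP_STATE *dsp_state, float *inbuffer, float *outbuffer,
    unsigned int length, int inchannels, int outchannels)
{
    // the incoming signal is ignored; Dirac reads the file itself via its data provider callback
    (void)inbuffer;

    // ask Dirac for exactly as many frames as FMOD wants to play back right now
    // (this sketch assumes the file's channel count matches the output channel count)
    DiracProcessInterleaved(outbuffer, (long)length, dirac);

    return FMOD_OK;
}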

The resulting processing graph looks like this: FMOD’s playback engine pulls processed frames from our custom DSP callback, the DSP callback pulls them from DiracProcess(), and Dirac in turn pulls raw frames from the input sound via its data provider callback.

How it Works

In a nutshell, we’re doing the following in our FMOD project in order to use Dirac:

First, we set up the FMOD sound system by calling FMOD::System_Create() and system->init(). These calls allocate the FMOD sound system and configure it so that it uses the recommended number of channels.
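In code, that setup might look roughly like this (a sketch along the lines of the FMOD examples; the channel count of 32 and the init flags are assumptions, not requirements):

FMOD::System *system = NULL;
FMOD_RESULT result;
int sampleRate = 0;

result = FMOD::System_Create(&system); // allocate the FMOD system object
ERRCHECK(result);
result = system->init(32, FMOD_INIT_NORMAL, NULL); // up to 32 channels, default init flags
ERRCHECK(result);

Once we’ve done that we call…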

result = system->getSoftwareFormat(&sampleRate, NULL, NULL, NULL, NULL, NULL);
ERRCHECK(result);

…in order to determine the sample rate that the playback system uses. Note that passing NULL causes FMOD to ignore the other parameters. A better approach would be to use the file’s sample rate to create the Dirac instance, but I have been unable to find a call that returns the sample rate of a Sound object in FMOD, so the system sample rate will have to do for now.

Note that unlike FMOD we do not use a resampler in our code, which means that if the file’s sample rate doesn’t match the system sample rate the result will be out of tune. We could use Dirac to correct for this if we wanted to, but since there doesn’t seem to be a recommended way to get the file’s sample rate we decided not to include that in our example code to make it more readable.

Now, a call to…

result = system->createSound(SOUNDFILE_PATH, FMOD_SOFTWARE | FMOD_LOOP_NORMAL, 0, &sound);
ERRCHECK(result);

…creates our sound object. SOUNDFILE_PATH points to the relevant file (due to the paths being different on Mac and Windows we use a macro for this). Once this is done, we can create and initialize our Dirac instance.

In order for the Dirac data provider callback to work we need to obtain and store a set of values that help describe what the data in our file looks like. Usually we would set up some member variables within our class for this, but since we’re within main() we don’t have a class context and therefore need to use a struct to keep these variables together. By using a struct we have a single pointer that we can pass to Dirac by which we can later access the individual variables.

So this is our struct:

typedef struct {
 unsigned int sReadPosition, sFileNumFrames;
 int sNumChannels, sNumBits;
 FMOD::Sound *sSound;
} userDataStruct;

It contains the current read position in our input file and the total length of the file in frames, so we can loop playback and wrap around at the correct position. It also contains the number of channels in the file and the number of bits per sample, along with a pointer to the FMOD sound object that takes care of our input file.

Before creating Dirac, we fill this struct with the necessary information:

userDataStruct state;
state.sReadPosition = 0;
state.sSound = sound;
result = sound->getLength(&state.sFileNumFrames, FMOD_TIMEUNIT_PCM);
ERRCHECK(result);
result = sound->getFormat(NULL, NULL, &state.sNumChannels, &state.sNumBits);
ERRCHECK(result);

Finally, we can create our Dirac instance (we use the interleaved version, as FMOD provides sound data in that format), kick off processing by adding our custom DSP callback using addDSP(), and call playSound() to start playback.

void *dirac = DiracCreateInterleaved(kDiracLambdaPreview, kDiracQualityPreview,
 state.sNumChannels, sampleRate,
 &diracDataProviderCallback, (void*)&state);

Note that we pass our state struct to Dirac when it is created so we can access it in our diracDataProviderCallback() later.
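For reference, registering the custom DSP and starting playback might look roughly like this. This is a sketch modelled on FMOD’s dsp_custom example, using the system and sound objects created earlier; the names diracDSPCallback, mydsp and channel are assumptions for this sketch:

FMOD_DSP_DESCRIPTION dspdesc;
FMOD::DSP *mydsp = NULL;
FMOD::Channel *channel = NULL;

memset(&dspdesc, 0, sizeof(FMOD_DSP_DESCRIPTION));
strcpy(dspdesc.name, "Dirac DSP unit"); // name shown by FMOD's debugging tools
dspdesc.channels = 0;                   // 0 = process at the mixer's channel count
dspdesc.read = diracDSPCallback;        // our callback that calls into Dirac

result = system->createDSP(&dspdesc, &mydsp); // wrap the callback in a DSP unit
ERRCHECK(result);
result = system->addDSP(mydsp, 0);            // insert it into the system's DSP chain
ERRCHECK(result);

result = system->playSound(FMOD_CHANNEL_FREE, sound, false, &channel); // start playback
ERRCHECK(result);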

FMOD will now play back the file from disk. If bypass is off, our custom DSP callback gets called, which in turn requests a certain amount of audio data from Dirac. Dirac generates that data, and in turn reads data from the input sound whenever it needs it to create the time-stretched output. In diracDataProviderCallback() we use lock() and unlock() to access the raw sound data (via a call to readFromSound()), and convert it to the float format required by Dirac by calling intToFloat().
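Put together, a stripped-down version of that data provider might look like the following sketch. It assumes 16-bit PCM for the float conversion to keep the example short (the full main.cpp handles other word lengths via intToFloat()), and it ignores lock()’s second pointer, which is only needed when the locked region wraps around:

long diracDataProviderCallback(float *chdata, long numFrames, void *userData)
{
    userDataStruct *state = (userDataStruct *)userData;
    if (!chdata || !state) return 0;

    unsigned int bytesPerFrame = state->sNumChannels * (state->sNumBits / 8);
    void *ptr1 = NULL, *ptr2 = NULL;
    unsigned int len1 = 0, len2 = 0;

    // lock() works on raw bytes, so convert our frame-based read position and count
    FMOD_RESULT result = state->sSound->lock(state->sReadPosition * bytesPerFrame,
        (unsigned int)numFrames * bytesPerFrame, &ptr1, &ptr2, &len1, &len2);
    if (result != FMOD_OK) return 0;

    long framesRead = (long)(len1 / bytesPerFrame);

    // convert the raw integer samples to the interleaved IEEE754 floats Dirac expects
    // (16-bit only in this sketch; see intToFloat() in the full main.cpp)
    short *src = (short *)ptr1;
    for (long n = 0; n < framesRead * state->sNumChannels; n++)
        chdata[n] = (float)src[n] / 32768.f;

    state->sSound->unlock(ptr1, ptr2, len1, len2);

    // advance the read position and wrap around at the end of the file to loop
    state->sReadPosition += framesRead;
    if (state->sReadPosition >= state->sFileNumFrames)
        state->sReadPosition = 0;

    return framesRead;
}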

Note that this lets you play back sounds in any of the sound formats that FMOD supports. However, with some formats FMOD’s getLength() seems to return values that don’t match the actual length of the file. You might need to work around this, or ask FMOD support for help in determining the correct length for files in MP3 and IMA ADPCM format.

I hope you enjoyed this tutorial on Dirac and FMOD. If you have any questions on this tutorial or the code please feel free to contact me via our contact form at any time.

About Bernsee
Stephan Bernsee is the founder and one of the authors / developers working at the DSP Dimension.
