PART 2: In the concluding part of this short series, our intrepid MIDI musicians turn their attention to the creation of sound effects and the final mix... By PAUL D LEHRMAN with STEVE OLENICK. This is the last article in a two‑part series.
Last month we looked at the beginnings of the audio production for an ambitious three‑part television documentary called Blood and Iron: The Story of the German War Machine. I told you how I bid for the project as a medium‑budget all‑MIDI production, using sequencers and samplers for both the music and effects tracks. I did the music tracks myself, mixing them into mono with a bit of stereo reverb, in my home studio on a Kurzweil K2000 and a handful of other synth modules. For the effects tracks, I sub‑contracted sound designer Steve Olenick, also a K2000 owner.
The Sound Effects
Steve realised early on in the project that efficiency would be of the utmost importance if Blood and Iron was to be finished on schedule. With close to three hours of silent black‑and‑white footage and stills, much of which required effects, this would be a massive undertaking. Flexibility was also a major goal: since no one was going to hear all the audio elements (effects, music, narration, dialogue, and sync sound) together until the final mix, the ability to tweak individual events during the mix — without having to go back to the original sources — would be essential.
Although the traditional way of creating layered effects tracks is with multitrack tape or multiple reels of magnetic film, Steve and I wanted to try to create the tracks in a MIDI environment. The Kurzweil sampler could play up to 24 effects 'tracks' (i.e. voices) at a time, and by controlling them with a sequencer, last‑minute changes in timing, level, or even sound could be made without disturbing anything else.
Steve and I met on several occasions with New York producers Robert Ross and Herb Krosney, to discuss the creative approach to take for the effects. How many layers of effects would we use? Where would we leave out the effects and let the music or voice‑over alone carry the scene? How realistic should the effects be? To do a multi‑layered effects track with the depth of a feature film was impossible in the time available — not to mention the fact that this would have taken us way over budget. On the other hand, we didn't want to take the typical war program approach of simply putting in spot effects for foreground events and leaving the rest silent. Ambience, we felt, was crucial to setting this series apart from the standard documentary relying on black‑and‑white silent footage.
We also wanted to create an ebb‑and‑flow effect, to help keep the viewer's attention. The contrast between scenes with effects and those without, if handled properly, could have a strong emotional impact. A scene of people being put in trains to be shipped to concentration camps, for example, would be much more effective without jangling train wheels or crowd sounds — the ominousness of the event would be better conveyed with music and narration alone. In addition, to create a consistent stylistic approach, only motion picture scenes would have effects under them — still pictures, no matter how ingeniously the camera moved over them, would have only music and voice‑overs.
The next step was to find, listen to, log, and categorise the sounds. To help with this phase (and, as it turned out, later phases as well) Steve hired David Williams, a radio and video editor/producer who had worked with Steve recently on a large multimedia project. They culled sounds from CD libraries, LP libraries, sampler disks, and also sounds provided by the series' producers from some of their un‑edited footage that had sync sound.
Using a sampler for the effects gave a lot more 'bang‑per‑megabyte' than a hard‑disk recording system would. In a sampler, sounds can be modified and disguised using pitch changes, filters, loops, and envelopes, so that one good ambience, for example, can be used many different times without sounding repetitious. The downside is that the sampling process itself takes time — the same amount of time it might take to record directly to tape. But the added flexibility, plus the fact that any samples gathered can be stored permanently in the sampler's native format for use on future projects, more than makes up for the extra effort.
All of the chosen effects (and there were several hundred) were copied to DAT, and their locations on the tape logged into a custom spreadsheet Steve created in Microsoft Works. The spreadsheet could then be studied and re‑arranged to put the sounds in the order in which we wanted to sample them. A second spreadsheet was created that displayed sample names, DAT locations, original source locations (which CD or LP, which track, and so on), sample‑key assignments, root notes, keymap names, and program names.
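For readers curious about the logic, the second spreadsheet can be sketched as a small record structure. The field names below follow the ones the article lists; the example rows and the sort-by-DAT-location step are illustrative of re‑arranging the sheet into sampling order, not taken from the actual production files.

```python
from dataclasses import dataclass

@dataclass
class EffectEntry:
    """One row of the sample log (fields follow the article's list)."""
    sample_name: str
    dat_location: str   # timecode position on the DAT compilation
    source: str         # which CD or LP, and which track, it came from
    key: int            # MIDI note the sample is assigned to
    root_note: int
    keymap: str
    program: str

# A couple of illustrative rows (names are invented)
log = [
    EffectEntry("Train Wheels", "00:12:34", "SFX CD 3, trk 12", 60, 60, "Trains", "Rail Yard"),
    EffectEntry("Factory Amb",  "00:02:10", "SFX LP 1, side A", 48, 48, "Ambience", "Industrial"),
]

# Re-arrange the sheet into the order we want to sample the sounds
sampling_order = sorted(log, key=lambda e: e.dat_location)
print([e.sample_name for e in sampling_order])
```

Sorting on the DAT location field plays the same role as re‑arranging the Works spreadsheet: the tape only has to be wound through once during the sampling sessions.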
The Samplers
The major instrument used in the process was a Kurzweil K2000S sampler. Steve's K2000 had 40Mb of RAM available, and he calculated that he could put one episode's worth of sounds into the machine at a time, loading other sounds in for each show. His external storage medium was a 128Mb removable magneto‑optical drive.
At the time of production, the Kurzweil had a very clumsy operating system, which only allowed entire banks of samples, keymaps, and programs to be loaded or saved to disk at a time, and made moving samples around in memory, or loading individual samples or keymaps into an existing program, impossible. (Version 3 of the operating system is now shipping, and the file handling has been vastly improved.) Thus we decided to do the original sampling on my Roland S750, since its file structure was far more flexible, and its disks are readable — surprisingly well — by the K2000. This way we could save files from the Roland at each step of the way on disk, as well as saving the finished banks from the K2000.
Hooking the magneto‑optical drive up to both samplers worked beautifully — the drive itself, with both samplers plugged in simultaneously, became the network between the samplers. Some experimentation was needed to determine in which order the various units should be turned on, so that they wouldn't crash (and to discover that the cartridge had to be ejected after powering on, and then re‑inserted), but once that was over the system worked consistently.
Sounds from the various sources were sampled into the S750 and saved as individual samples and as keymaps on a Roland‑formatted cartridge. Those files were then opened by the K2000 and loaded into the Kurzweil's RAM. In the K2000, the samples and keymaps were arranged in programs and banks, and then a Kurzweil‑formatted cartridge was put into the drive and the banks were saved to it.
Besides the S750's better file structure, there were two other advantages to this method. Firstly, the S750 allows the use of a full‑screen colour monitor (the K2000 only has an LCD display), and this made the sampling, initial editing, and mapping much faster. Secondly, the samples could all be recorded on the Roland at 22.05kHz, which stretched the available RAM considerably. The K2000 can play and edit samples at that rate, but not record them.
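The RAM saving is simple arithmetic: at 16‑bit mono, halving the sample rate doubles the number of seconds that fit in memory. A quick sketch (the 40Mb figure is from the article; the 16‑bit mono assumption is ours):

```python
def seconds_in_ram(ram_bytes, rate_hz, bytes_per_sample=2, channels=1):
    """How many seconds of audio fit in a given amount of sample RAM."""
    return ram_bytes / (rate_hz * bytes_per_sample * channels)

RAM = 40 * 1024 * 1024  # the K2000's 40Mb of sample memory

full_rate = seconds_in_ram(RAM, 44100)  # roughly 8 minutes of audio
half_rate = seconds_in_ram(RAM, 22050)  # roughly 16 minutes: double the material
print(round(full_rate), round(half_rate))
```

For documentary ambiences, whose bandwidth is limited anyway by the era of the source recordings, the trade was an easy one to make.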
Once the source DAT tapes and the sampling spreadsheet were done, the sampling began. This was a fairly mechanical process. Some creative decisions were made during sampling, but for the most part it was a simple matter of cueing up the DAT tape, naming a sample in the Roland, setting level, recording the sample, and assigning it to a note range. Each keymap took up one octave.
The Roland keymap was then imported into the K2000 as a 'layer'. Three layers, each one octave in range and located adjacent to each other on the keyboard, made up one Kurzweil program. The mappings were set up to be as intuitive as possible, since the effects were actually going to be played from the keyboard.
For the most part, Steve and David felt that the sounds were too bright and had too much presence to accompany the older footage. Rather than filtering or re‑sampling them, however, Steve used the Kurzweil's extensive real‑time DSP facilities to give them the right 'dark' qualities. Having worked so hard to collect the samples at high fidelity, Steve reasoned that they would make an excellent library for future use. Why ruin them when the next documentary might be able to take advantage of them at full bandwidth?
The spreadsheet was then printed out and taped up above the keyboard. Little by little, we became familiar with the notes to play for given sounds and never had to scratch our heads in confusion (well, hardly ever).
Sequencing
The next creative step was placing the sound effects into Performer. The sequencer was set up to slave to the 3/4‑inch video tape from the producers, which had been striped with timecode and had window burn inserted. Each scene was then run repeatedly, and effects were played into the sequencer on the Kurzweil keyboard in real time. Say, for example, a factory scene needed an ambience track running through it: on the sequencer, the 'Industrial Ambience' track (see Figure 2) is placed in record‑enable mode, sounds are auditioned on the keyboard against picture, and once the decisions are made, the scene is run again, with the sequencer recording the keyboard.
Any mistakes could be edited immediately. An effect could be moved, shortened or lengthened, pitch‑shifted, or have its volume (velocity) changed. In many cases a known amount of backwards shift (Steve's reaction time) was applied to the track right after recording. Next, to add specific sounds of machinery to accompany visual events, Steve would record‑enable the 'Industrial Spots' track, play a couple of spot effects in, and then adjust their placement. This continued until we were satisfied with the whole scene.
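The reaction‑time correction is the kind of thing a sequencer's regional edit commands do in one operation. A minimal sketch of the idea, assuming events stored as (time in milliseconds, note, velocity) and an illustrative 180ms reaction time:

```python
def compensate_latency(events, shift_ms):
    """Shift every recorded event earlier by a fixed amount
    (the player's reaction time), clamping at zero."""
    return [(max(0, t - shift_ms), note, vel) for (t, note, vel) in events]

# A take played ~180 ms late against picture
take = [(1180, 60, 100), (4230, 62, 90)]
print(compensate_latency(take, 180))  # → [(1000, 60, 100), (4050, 62, 90)]
```

Applying a single known offset to the whole track is far quicker than nudging each effect into place by eye, and any stragglers can still be adjusted individually afterwards.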
On occasion, none of the samples we had recorded were right for a particular visual event, and so a new sound had to be found, sampled and placed directly into a keymap in the K2000. Having anticipated this, however, we had left vacant slots in keymaps and merely had to assign the sample to one of those slots. We did run out of sample memory at a couple of points, but we were able to find unused samples that could be deleted, or loops that could be shortened to make more RAM available. When all of the shows were done, we realised we could fit the sounds for all three of them into the 40Mb of RAM available.
I had grouped the instrumental families in the music tracks to my Kurzweil's four outputs by assigning individual MIDI channels to specific outputs. Steve assigned his Kurzweil patches to the outputs the same way. The first group of four patches, on the first four MIDI channels, went to the left 'A' output of the K2000. The next group went to right 'A', the third group to left 'B', and the final group to right 'B'. We were still up in the air as to exactly how the final mix would be accomplished, but this scheme would give us maximum flexibility, regardless of whether we were going to tape, to a hard‑disk system, or directly to the video master. Each group could have its own track or fader, all carefully documented on the spreadsheet, and that would greatly facilitate the mixing process.
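The channel‑to‑output scheme is a straightforward mapping: four consecutive MIDI channels per physical output, in the order given above. A sketch (the output names follow the article's 'A' and 'B' pairs):

```python
# Channels 1-4 → Left A, 5-8 → Right A, 9-12 → Left B, 13-16 → Right B
OUTPUTS = ["Left A", "Right A", "Left B", "Right B"]

def output_for_channel(channel):
    """Map a MIDI channel (1-16) to one of the K2000's four outputs."""
    return OUTPUTS[(channel - 1) // 4]

print(output_for_channel(1), output_for_channel(7), output_for_channel(16))
```

Because both K2000s used the same grouping, each group of patches arrived at the mix on a predictable fader, whatever the destination medium turned out to be.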
Getting It Together
Originally, we fantasised about doing the final post‑production of Blood and Iron without ever going to tape. The posting was to take place at This Way Productions, a New York studio owned by Dan Caccavo, with whom I have had a long association. Besides the music and effects, other audio elements that had to be dealt with were narration, dialogue (mostly historical quotes read by actors), and the sync sound accompanying some of the original footage. We first proposed putting everything into Dan's Digidesign Pro Tools system, recording the music and effects in Boston with a Sound Tools system and bringing the disks to New York, but we realised we would need 12 tracks (his system had only eight) and significantly more disk space than he had available.
The next idea was to put only the voices and sync sound into Pro Tools, and do the music and sound effects directly from the two K2000s (and the rest of my MIDI gear) as a 'virtual multitrack' mix in Dan's studio. A sequencer's lock‑up time is usually much shorter than a tape deck's, which was a big plus. We would need to bring two more Macs (to add to the one running Pro Tools, and the one running Dan's console automation), one for the samplers and the other for the music, with everything locked to timecode.
As we got closer to finishing the audio, however, we had to reject this concept, for a number of reasons. First, I realised that mixing the music at the same time Dan was mixing the rest of the show would make the sessions chaotic, to put it mildly. Second, Cindy Kaplan‑Rooney, the film editor on two of the three episodes, expressed considerable fear that something costly might go wrong. "I know it's supposed to work," she said, "and I trust you guys, but I've never seen it work, and so I'm nervous." Since we had never really seen it work on this kind of scale either, we weren't going to argue with her.
Steve had the most compelling reason: in the stop‑and‑go process of mixing, there would inevitably be times when a long sampled sound was supposed to start just before the point where we were rolling tape. In that case, one of two things would happen: either the sound would not play, or if the sequencer's 'note‑chasing' feature was on, the sound would start from the beginning as soon as we rolled tape. Thus we would be hearing either the wrong part of the sound, or a loop that was out of sync, which meant it would be impossible to hear the mix as it really was unless we always played each segment of the program all the way from the beginning.
Adding to the unease was the fact that, due to changes in the production schedule, Steve would not be able to come to New York for the mixing session. If there were some kind of MIDI glitch and an effect ended up missing or in the wrong place, there was a strong likelihood no one would notice it — until, of course, the show was on the air. David, the associate sound designer, would be there for the mix, helping make sure the balances were correct, but as he was not intimately familiar with the sequences, he might well miss any MIDI problems.
So we decided to use multitrack tape. Our first idea was to use multiple timecode DATs, but this would take a lot of extra time, and the cost of hiring four or more extra decks in New York (Dan was already hiring one for printing the final audio) would be prohibitive. We investigated hiring an Alesis ADAT, but since none of us had used one at the time, we were a bit nervous about the complexities of slaving it to timecode (using the outboard Big Remote Control). We were also slightly put off by the format's tape‑length limitation: only about 40 minutes can be recorded on a cassette, which meant that each show would have to be broken up into at least two reels, and finding break points in all the shows that worked for both effects and music might not be easy.
We ended up with a Tascam DA88, which had an SY88 sync card already installed. An all‑in‑one unit, it promised hassle‑free synchronisation, and records over 100 minutes on each Hi‑8 cassette. We found one for hire at a local music store for less than the Alesis system would have cost, and we snapped it up. It turned out to be an excellent choice.
Getting started on the DA88 was not quite as easy as we would have liked. Each new tape used in the deck has to be 'formatted', which involves running it once from end to end. Fortunately, the machine can also generate a SMPTE timecode stripe, starting at any frame number you like, on an invisible track while it formats. (It can also record audio at the same time, but this seemed unnecessarily risky.) Unfortunately, the manual does a miserable job of describing the procedure — but after a few tries we managed to get it right.
After the timecode was recorded, I recorded the music, locking my sequencer to the DA88's SMPTE. Although I had combined my shorter sequences into longer ones, there were still some very fast transitions that needed to be made between sequences, and even some overlaps. Resisting the temptation to test the machine's digital punch‑in capability, I used alternating pairs of tracks: 1 and 2 for the first cue, 3 and 4 for the next, and so on. In the mixing studio, the faders could simply be left on for all four tracks, because the unused tracks at any time were truly silent. As I was mixing the music, Cindy called from New York and told me where the commercial breaks were going to fall in the American broadcast version. Where I could, I recorded a second version of the cues surrounding the breaks on the second pair of tracks, with appropriate fade‑outs and ins. All three shows were printed to tape in three days.
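The alternating‑pairs scheme can be sketched in a few lines: even‑numbered cues go to tracks 1 and 2, odd‑numbered cues to tracks 3 and 4, so adjacent cues never share tracks and fast transitions or overlaps need no punch‑ins. (The cue names below are invented for illustration.)

```python
def tracks_for_cue(cue_index):
    """Alternate stereo cues between track pairs (1,2) and (3,4)
    so consecutive cues can butt up against or overlap each other."""
    return (1, 2) if cue_index % 2 == 0 else (3, 4)

for i, name in enumerate(["Main Title", "Factory Montage", "Aftermath"]):
    print(name, "->", tracks_for_cue(i))
```

Since the unused pair is genuinely silent at any moment, all four faders can simply stay up at the mix.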
Then the DA88 was carried across town to Steve's studio. He plugged in his Mac and K2000 and laid down the effects directly from the sampler's outputs onto the four remaining tracks of tape, with no reverb or other processing. Co‑producer Herb Krosney flew up from New York for the effects‑laying session. This would be the first time that any of us would hear the music and effects combined. Steve summed all his outputs into mono for monitoring, and adjusted the levels going to the tape as if he was doing a mix — even though they were going onto separate channels on tape.
Herb had a few small suggestions, but essentially he was extremely pleased with what he heard (as were we). The effects for all of the shows went down in one long afternoon. Steve's track sheets now reflected what was on the tape, and they came down with us to New York, so that Dan could make use of them.
Final Mix
At This Way Productions, located in a hiply refurbished old warehouse in the bustling and trendy part of downtown Manhattan known as SoHo, all of the audio tracks were put together. The narration and dialogue voice‑over tracks had been recorded elsewhere, and placed on timecode DAT. Dan loaded each segment, with its associated start time, into his Pro Tools system. The sync sound segments were loaded in the same way.
A JVC 3/4‑inch U‑Matic video player supplied the master picture and timecode. The Tascam DA88, containing all of the music and effects tracks, was put into slave mode, as was Pro Tools. An Adams Smith Zeta 3 synchroniser distributed the transport commands and timecode, and also converted it to MIDI Time Code for yet another Macintosh — the one running the studio's MegaMix automation system installed in its Soundcraft 1600 desk.
In attendance were Dan and his assistant engineer Bill Kreth, co‑producer Robert Ross, editor Cindy Kaplan‑Rooney, associate sound designer David Williams, and myself. We were all confident that we would be out of there in three days. Alas, it was not to be. Part of the slow progress could be blamed on the degree of flexibility we had built into the process, and the inevitable "Let's try it this way!" and "Let's keep on it until it's perfect!" syndromes which this flexibility fostered. For example, since none of us had heard a complete mix before, there were a few spots where the music or effects 'stepped on' a voice‑over, or vice versa. A discussion would ensue over which element should be brought up, or moved in time, and sometimes several different solutions were tried. Consensus was not difficult to achieve — everyone usually agreed readily when the right solution was found — but such creativity takes time. What was most surprising to all of us was how well everything fitted together, even though we had all been working essentially in isolation.
A far more serious obstacle arose when we discovered that many of the voice‑over segments had nasty little 'clicks' at the beginning. Dan worked out that this had occurred when the studio who had placed the cues on DAT inserted (without being asked) a very fast, but not very good, noise gate into the signal chain. Some of the clicks could be eliminated by putting a volume slope on the cue in Pro Tools, while others were better served with a fast automation move. The process was not difficult, but it was extremely tedious.
The DA88 performed flawlessly, locking up in a few seconds just about every time the video‑tape started. The automation also worked perfectly, and Pro Tools — especially in Dan's extremely knowledgeable hands — proved itself the ideal tool for dialogue editing.
After the third day, I had to get back home to Boston, and two days later David did the same. The mix continued well into a sixth day, but the Boston contingent did not worry unduly about not being there: between Cindy's, Robert's, and Danny's excellent ears and sensibilities — as well as Steve's comprehensive cue sheets — we considered the project to be in good hands.
On the sixth day, Dan produced the music‑and‑effects‑only version ('M+E'). Producing this version wasn't just a question of eliminating the voices and leaving the other levels where they were, nor did it mean creating a mix that put the music and effects out front and pretended there were no voices at all — there was a middle ground he had to find. But since everything was automated, it was a straightforward matter to tweak the automation files so that the vocals were turned off, and the other levels brought up a modest amount and re‑balanced. On the seventh day, everybody rested!
The first copies went out to the networks in the US and Europe who had originally sponsored the series. The next ones went to a distribution company who were hoping to sell the series to other networks around the world. As of this writing, the show has aired on the Discovery Channel in Europe [in May 1994; there is also a possibility that the show will be repeated], while the French broadcast and American cablecast are scheduled for early 1995.
Robert Ross and myself recently spoke on the phone, and he paid us what might be the ultimate compliment from a producer: "I wish I had allowed more time in the film for the music and sound to come forward," he said. "They often did a better job of setting the scene than the narration did."
We can't wait for Krossfire's next project. Hopefully it will be something a little more cheery. Maybe something about Canada's long‑standing successful multi‑lingual culture?...no, that won't work; how about the joys of mutual‑fund investing in post‑Communist Russia?...no, not that either; well, how about...
Sequenced Sound Effects: Beyond Tape Editing
Using a sequencer for effects offered us a wide variety of editing functions. We could move events around using the graphic screen display, or type in new time‑code locations in a list, or highlight groups of events and use regional commands to adjust parameters. If a sound was in the right place, but it was the wrong sound, we could click on the note and drag it up or down in the graphic display until the correct sound was heard. The placement could be locked during this drag, so that only the note changed, not the time‑code location.
One of the banes of sound editors' lives has always been film editors re‑cutting a reel at the last minute and sending it over to the sound room for re‑synchronisation. In a MIDI environment, this is no problem: we could simply select all of the events after an edit, and move them backwards or forwards by the appropriate number of frames in a single operation. No razor blades, destructive punching, or losing tiny little bits of tape on the floor. The editors in New York could make all the changes they wanted, and we could adjust the effects tracks in literally a couple of minutes.
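That single re‑synchronisation operation can be sketched as follows; events are assumed to be stored as (frame number, effect name), and the 30fps figure and edit point are illustrative (the principle is identical at 25fps):

```python
FPS = 30  # assuming an NTSC-style frame rate; 25 fps works the same way

def resync_after_edit(events, edit_frame, delta_frames):
    """Move every event at or after the edit point by delta_frames
    (positive = later, negative = earlier); earlier events are untouched."""
    return [
        (f + delta_frames if f >= edit_frame else f, name)
        for (f, name) in events
    ]

# The editors cut 12 frames out of the picture at frame 5000
track = [(4800, "ambience in"), (5100, "train whistle"), (6020, "door slam")]
print(resync_after_edit(track, 5000, -12))
```

Everything before the cut stays put, everything after it slides as a block: exactly the 'select and nudge' operation described above, with no razor blades in sight.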
Blood & Iron Background
Blood & Iron is a three‑part series exploring the origins of Germany's aggressive military history, which has been closely associated with its industrial development. Programme One focuses on the factors which made Germany into a forceful presence on the armaments stage, leading to the First World War. Programme Two looks at the build‑up to World War II and at how the nation's aeronautical genius was harnessed to ensure air supremacy. Programme Three explores Germany's immediate post‑Second World War history.
Blood & Iron was shown on satellite's Discovery Channel from Wednesday 18th May. There is a possibility that it will be repeated in the future, so keep an eye on your TV listings magazine if you have satellite.