ACE Studio is an AI‑powered virtual vocalist that’s frighteningly close to the real thing.
While virtual drummers, bassists, pianists or guitarists have already passed the point where they can generate a believable impression of a human player, virtual vocals have, for understandable reasons, proved more challenging. Vocaloid has been a long‑standing option for a more synthetic style. However, as I experienced when reviewing Dreamtonics’ Synthesizer V in the March 2023 issue of SOS, advances in AI mean that things have started to move very rapidly when it comes to more natural‑sounding results. Synthesizer V is not the only product riding this wave, though: ACE Studio is also generating a lot of interest. So, if you could use a virtual session singer within your own music production workflow, should you be booking ACE Studio for an audition?
ACE Concepts
ACE Studio operates as a standalone application. However, the ACE Bridge plug‑in (AU and VST3) can be loaded within a suitable DAW, and this provides both sync and audio routing from ACE Studio back to your DAW. The package currently offers 40+ AI‑based vocalists, each with their own singing style. The AI engine supports Chinese, Japanese and English for all vocalists, but each vocal database is ‘native’ in one of these languages. There are currently five native English singers: Lien, Bianca, David, Naples and Sidney.
The ACE Bridge plug‑in allows you to easily sync ACE Studio with your DAW.
Transport and menu bar aside, the UI offers three main sections: the Singer Library, Arrangement View and Clip View. Located far left is the Singer Library, containing a list of available singers. The other two views dominate the rest of the display. At the top, the Arrangement View provides an overview of the various Audio Tracks (for example, to house an instrumental backing track) and Singer Tracks within the current project. Singer Tracks are essentially hosts for the MIDI‑based clips upon which the voice synthesis is based. The contents of individual clips (audio or MIDI) can be shown in the lower Clip View panel, and it’s here that you’ll find the key tools for editing notes, adding lyrics, customising the pitch and automating the singer’s dynamics/character (via a parameter panel at the very base of the display).
Two additional panels, the Track Control Panel and the Mixer, can be popped open as required using the buttons located top left. The latter is self‑explanatory (and shown in one of the screenshots) while I’ll say more about the former below.
A couple of other practical points are worth noting. First, ACE Studio is currently only available via a subscription model. An active subscription gives you access to all of the available singers. Second, and in contrast to the difficulties that might arise in using AI‑generated vocals from some online sources (where the copyright of the AI’s training materials may cause issues), the AI vocals you create with ACE Studio are completely licence‑free to use within any commercial context. A few singers are exceptions; these are highlighted within the Singer Library panel, and an online licensing process is available for them.
Studio Workflow
Used as a standalone application, the most obvious workflow involves importing some sort of audio backing track for your song project to use as context for any vocal parts you wish to generate. Once you have constructed whatever vocal parts you need — and these can involve multiple Singer Tracks — those vocals can be exported as audio files to use within your DAW for mixing/arranging alongside the rest of the instruments within the project.
Having selected an AI singer for a Singer Track, you can manually enter notes with the various note editing tools or import a MIDI file (for example, if you have already written an initial melody line within your DAW). At present, it doesn’t seem possible to record MIDI data directly into ACE Studio. There is a third option for creating your initial ‘MIDI with lyrics’ data. It’s included within the software but currently classed as a beta feature. It is, however, something quite special and I’ll come back to it later.
Whether you have added new notes within the Clip View, or edited existing ones (pitch, lyrics or other parameter adjustments), when you initiate playback, ACE Studio has to pre‑render any changes. A technical point is worth noting here: that rendering is done in the cloud via ACE Studio’s own servers. That obviously requires an active Internet connection (and an active subscription). The process is fairly rapid but it does involve a short wait each time the vocal is reprocessed.
There are some really nice touches with the note editing process itself. For example, within a note, shading indicates the portion of each note’s length set aside for either consonant or vowel sounds, and you can drag with the mouse to adjust this to manipulate the way a word or phoneme is sounded. You can, of course, also edit the phonemes themselves (if you zoom in close enough, these are shown above each note). You can create/edit a user pitch curve if you want to adjust that generated by the AI engine. The mouse‑based graphical editing tools for pitch modulation and vibrato are particularly well designed and incredibly easy to use.
ACE Studio’s pitch modulation and vibrato editing tools are very well designed.
Regardless of which virtual singer is used, you get six parameters (Breath, Air, Falsetto, Tension, Energy and Formant) that you can automate via the parameter panel at the bottom of the Clip View. Breath allows you to add breaths into the performance for added realism, while Formant lets you adjust the ‘gender’ of the voice. Used subtly, Formant can provide a useful shift in character if required. The other four parameters allow you to add various types of expression or character to the performance to squeeze as much ‘human’ out of the engine as you possibly can.
Three other workflow features are also worth noting. First, ACE Studio offers a very effective Vocal Double feature. This automatically creates two additional versions of the selected track on two new Singer Tracks. These will have timing and pitch variations applied and you can adjust the tightness of the double tracking before generating the new tracks. It works really well and it’s impressively easy to use. After generating the doubles, you can, of course, then change the singer used or adjust the pitch to create harmony parts.
Second, the Track Control Panel I mentioned earlier provides the very interesting Voice Mix feature. Essentially, this allows you to blend multiple Singers together to create a composite voice, with control over the level of Style and Timbre characteristics used from each voice. This means you can create unique voice styles to suit your needs and then save your creations as a new voice preset.
Third, and this is the special beta feature I mentioned earlier, there is an alternative workflow for creating the MIDI note/lyric data for a Singer Track. If you import an audio file that contains a guide vocal (cleanly recorded without reverb or delay effects or vocal harmonies), you can drag it onto a Singer Track. ACE Studio will then analyse it and convert it into a MIDI clip, add the required MIDI notes at the appropriate pitch, and attempt to add the lyrics from the audio. If you are happy to record a scratch demo of your vocal, even if your own performance is not fit for public consumption, this workflow can be a massive time saver compared with manually entering notes and lyrics, even if some subsequent editing is still required. And, of course, your demo vocal can then be sung by any of the 40+ AI singers (male or female), and pitch, lyrics and delivery style can all be adjusted to suit the song’s needs.
The ability to automatically convert an audio vocal into an AI vocal is an impressive workflow option, and it’s quite remarkable to see how well it works.
Get Real
The Voice Mix panel lets you create unique blends for the included AI singers.
So, the feature list is impressive, but how good are the actual results? Well, I’ll qualify my comments with an acknowledgement that, despite working with ACE Studio over a couple of weeks, I’d still describe myself as a new user. Like Synth V, this is an application that can produce impressive results almost instantly, but experience will bring further incremental improvements. That qualifier made, how far might we go up the quality scale? Well, if you need some backing vocal parts or vocal hooks to place within an electronic dance track, ACE Studio can deliver. Equally, if you need to create a fully realised demo vocal for a song project (for example, so that your actual human session singer has a better idea of what they are shooting for), again, ACE Studio can do the business.
With enough experience to fully capitalise on all the features the synthesis engine offers, I think you could also pull off a lead vocal (including any harmonies) for a contemporary EDM or busy pop track and many listeners would simply not realise it was a computer‑generated vocal. It might be more difficult to do in a bare‑bones piano/vocal ballad, where the voice is more exposed.
In all cases, while the vocal might sound technically proficient, instilling it with all the subtle details of emotion that a really good singer brings to a performance will be a challenge. But, goodness me, it’s still impressive stuff; ACE Studio’s vocals will be able to pass the test in plenty of potential use‑case scenarios. And, as a taster of what’s possible, I’ve created a few short audio examples, which you can audition on the SOS website, at https://sosm.ag/ace-studio-audio.
Singing Competition
A comparison with Dreamtonics’ Synthesizer V is inevitable. The two products attempt broadly the same task, and in terms of their UIs, the workflows have many things in common. There is a lot that could be said here so I’ll confine myself to the most obvious of pros and cons. For instance, when it comes to manual pitch editing, I think the approach adopted within ACE Studio currently has the edge, particularly when it comes to manually editing vibrato. Equally, the option for 40+ singers straight out of the box, and the interesting Voice Mix feature, will undoubtedly appeal to some potential users. However, for English language vocals (I’m not qualified to comment on those of the other languages supported), I do think Synth V’s voice synthesis is currently superior and provides better pronunciation, leading to a more realistic end result. I also prefer Synth V’s voice‑specific ‘vocal modes’ system to control the expression and dynamics, giving each of the virtual singers a very individual character.
There are also some technical differences between the two. ACE Studio is based upon a subscription model and uses cloud‑based rendering, whereas Synth V is a one‑time payment (with each AI voice also being a separate purchase) and all rendering is done locally. Incidentally, the automatic a cappella‑to‑MIDI conversion process is also a feature that Dreamtonics have added in the current public beta of Synth V, and it works very well.
It’s hard to imagine that the obvious competition between the two products is not pushing both development teams forward. The resulting pace of development is an obvious upside for existing users of either product. However, as a downside, it might make a purchase decision more difficult for potential new users, as some feature leapfrogging will undoubtedly occur over the next year or two through their competing release cycles. Watch this space...
Holding All The ACEs
Whether you like the concept of a computer‑generated lead vocal or not, products like ACE Studio and Synthesizer V represent truly groundbreaking technology. Bells and whistles of their respective workflows aside, it’s worth reminding ourselves just how remarkable it is that AI vocal synthesis can even get close to sounding human. If you already have a virtual band of musicians within your studio computer, the thought of adding a virtual vocalist might be a tempting one. Personally, I think these products are now at a level where that’s a realistic proposition and, in the right context, these tools have very practical applications.
Thankfully, you can find out for yourself whether you agree with me as both ACE Studio and Synth V provide options to try before you buy. In the case of ACE Studio, that’s via a free trial period, and I’d encourage any potential user to take advantage of that to fully explore what’s possible. Be prepared to invest a little time though; experience and experimentation are required to get the best from the engine. I know this final sentence might sound a little weird... but virtual singers are now a real thing.
Pros
- Capable of genuinely useful results.
- Impressive a cappella to virtual vocal conversion process.
- Very slick pitch, modulation and vibrato editing.
Cons
- Subscription model and cloud‑based rendering require an Internet connection at all times.
Summary
Vocal synthesis is making remarkable progress. In experienced hands, ACE Studio is capable of generating genuinely useful results.