Bark, the Open Source Text To Speech AI

 


When you think of text to speech in AI terms, the first company that may come to mind is Eleven Labs, as the quality of their product literally speaks for itself. If you are looking for an open source tool, then Bark, by Suno, may be of interest.

On Hacker News, one of the founders of Suno said this of Bark: 'At Suno we work on audio foundation models, creating speech, music, sounds effects etc….

Text to speech was a natural playground for us to share with the community and get some feedback. Given that this model is a full GPT model, the text input is merely a guidance and the model can technically create any audio from scratch even without input text, aka hallucinations or audio continuation. 

When used as a TTS model, it’s very different from the awesome high quality TTS models already available. It produces a wider range of audio – that could be a high quality studio recording of an actor or the same text leading to two people shouting in an argument at a noisy bar.'

Bark is already available on Hugging Face (which I'm due to write a blog piece on; the to-do list is growing), which makes it easier to try out.

The GitHub description states:

'Similar to Vall-E and some other amazing work in the field, Bark uses GPT-style models to generate audio from scratch. Different from Vall-E, the initial text prompt is embedded into high-level semantic tokens without the use of phonemes. It can therefore generalize to arbitrary instructions beyond speech that occur in the training data, such as music lyrics, sound effects or other non-speech sounds. A subsequent second model is used to convert the generated semantic tokens into audio codec tokens to generate the full waveform. To enable the community to use Bark via public code we used the fantastic EnCodec codec from Facebook to act as an audio representation.'
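To make the pipeline in that description a little more concrete, here is a toy sketch of its shape: text goes straight to semantic tokens (no phoneme step), a second stage turns those into audio codec tokens, and a decoder turns codec tokens into a waveform. This is not Bark's code. Every function name, vocabulary size and constant below is a hypothetical stand-in, and the simple arithmetic takes the place of what in Bark are trained GPT-style models and the EnCodec decoder.

```python
import math
from typing import List

# All sizes below are made-up illustration values, not Bark's real configuration.
SEMANTIC_VOCAB = 1000     # hypothetical semantic token vocabulary
CODEC_VOCAB = 1024        # hypothetical codec token vocabulary
N_CODEBOOKS = 2           # hypothetical number of codec codebooks per frame
SAMPLES_PER_FRAME = 4     # hypothetical audio samples produced per frame


def text_to_semantic(text: str) -> List[int]:
    """Stage 1 stand-in: raw text bytes -> semantic tokens, with no phoneme step.

    In Bark this is a trained GPT-style model; here it is a deterministic hash.
    """
    return [(b * 31 + i) % SEMANTIC_VOCAB for i, b in enumerate(text.encode("utf-8"))]


def semantic_to_codec(semantic: List[int]) -> List[List[int]]:
    """Stage 2 stand-in: each semantic token -> one frame of codec tokens."""
    return [
        [(tok * (k + 7)) % CODEC_VOCAB for k in range(N_CODEBOOKS)]
        for tok in semantic
    ]


def codec_to_waveform(frames: List[List[int]]) -> List[float]:
    """Decoder stand-in: each codec frame -> a few samples in [-1, 1]."""
    wave: List[float] = []
    for frame in frames:
        # Normalise the frame to [0, 1], then use it as an amplitude in [-1, 1].
        level = sum(frame) / (N_CODEBOOKS * (CODEC_VOCAB - 1))
        for n in range(SAMPLES_PER_FRAME):
            wave.append((2 * level - 1) * math.sin(2 * math.pi * n / SAMPLES_PER_FRAME))
    return wave


def synthesize(text: str) -> List[float]:
    """Chain the stages: text -> semantic tokens -> codec tokens -> waveform."""
    return codec_to_waveform(semantic_to_codec(text_to_semantic(text)))


if __name__ == "__main__":
    samples = synthesize("Hello [laughs]")
    print(len(samples), "samples generated")
```

The point of the sketch is only the data flow. Because the first stage works on raw text rather than phonemes, nothing in the pipeline assumes the input is ordinary speech, which is what lets the real model handle lyrics or sound effects when its training data covers them.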
