Is Automatic Speech Recognition (ASR) ready for primetime? Our annual report dives deep into the performance of leading ASR engines for captioning & transcription. Download it for FREE & gain insights on accuracy, limitations, and the future of voice-to-text: https://bit.ly/3RRBiCV. #ASR #Accessibility
3Play Media’s Post
-
How good is Automatic Speech Recognition (ASR) technology... for real? 3Play Media's annual State of ASR Report is here! This in-depth analysis explores how ASR engines perform for captioning and transcription. Download the free report to read through our findings and insights on the latest advancements of top ASR engines. #ASR #StateOfASR #accessibility #captioning #transcription #3PlayMedia https://lnkd.in/e_MawqF6
2024 State of ASR Report | 3Play Media
go.3playmedia.com
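ASR accuracy reports like this one are typically grounded in word error rate (WER): the word-level edit distance between a reference transcript and the engine's output, divided by the reference length. A minimal sketch of that standard computation (the function is illustrative, not taken from the report):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

One dropped word against a six-word reference gives a WER of 1/6 (about 16.7%), which is the scale on which engines in reports like this are usually compared.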
-
Ever wonder how generative AI is shaping the race to the best ASR? Wonder no more! 3Play Media just released our annual “State of ASR” report where we put the best head to head in an unbiased way. This is real data… no funny business to force yourself in the top right quadrant. We spend _a lot_ of effort to make sure we get this right. Check it out! #ASR #GenerativeAI IBM OpenAI Speechmatics AssemblyAI Microsoft Rev
-
AudioChatLlama: Towards General-Purpose Speech Abilities for LLMs. A work from Meta AI on building a multimodal (audio-text) model that enables voice interaction for LLM tasks. They start with instruction-tuned LLaMA 2 and extend its text capabilities to the speech domain without loss of text-based capabilities. They show that a multimodal audio-text model created this way outperforms ASR+LLM systems where the ASR and the LLM are separate models. https://lnkd.in/gu3CCbRC
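The cascaded-versus-end-to-end distinction the post describes can be sketched with toy stand-ins (every name below is hypothetical, not from the paper): a cascaded ASR→LLM pipeline discards paralinguistic cues at the text boundary, while an AudioChatLlama-style model conditions the LLM on speech features directly.

```python
# Toy stand-ins for the two architectures (illustration only; real systems
# use a trained speech encoder feeding an instruction-tuned LLM).

def toy_asr(audio: dict) -> str:
    """ASR keeps only the words; tone/prosody is lost at this boundary."""
    return audio["words"]

def toy_llm(text: str) -> str:
    return f"reply to: {text}"

def toy_audio_encoder(audio: dict) -> tuple:
    """Maps speech into the LLM's input space, keeping paralinguistic cues."""
    return (audio["words"], audio["prosody"])

def toy_audio_llm(features: tuple) -> str:
    words, prosody = features
    return f"reply to: {words} (tone: {prosody})"

def cascaded(audio: dict) -> str:
    """Separate ASR + LLM: the LLM only ever sees text."""
    return toy_llm(toy_asr(audio))

def end_to_end(audio: dict) -> str:
    """One model conditioned on audio features, AudioChatLlama-style."""
    return toy_audio_llm(toy_audio_encoder(audio))
```

Run both on an utterance like `{"words": "I'm fine", "prosody": "sarcastic"}` and only the end-to-end path can react to the tone, which is the intuition behind the reported gap over cascaded systems.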
-
My presentation on using artificial intelligence to convert audio or video into text.
-
There are six official UN languages and many more non-UN languages. Can AI handle them all? It's all for money 💰🤑💸.

As to the AI hype, consider some of AI's risks and limitations. AI consumes electricity heavily; in our experience, AI has doubled the electricity bills of a C-level friend of ours. 😔 😥 😿 See a warning about LLMs and AI from the German government: https://lnkd.in/gMDaGDij

A newly released UN report said that AI ONLY benefits a small number of states, companies, and individuals, i.e., a few humans are making big money by using AI to harm many people.

AI is based on math models, and models must be verified and validated (V&V) before we can trust them. Using math models to simulate a physical process started with the Manhattan Project in WWII for nuclear bomb design. Then, in the 1960s, came C4ISR, which is today's AI, whose original missions were just breaching enemy security, dis-/mis-information, cognitive manipulation, deception, surveillance, detection for kill, etc.

AI can do many things, but NOT everything, and at least NOT what we are doing: using our intellectual property (IP), a body of copyrighted multilingual metadata, for data analytics. Without metadata, NO data can be found or retrieved, even by AI. https://lnkd.in/g-aJFnXR
Break new ground in #speechrecognition with new Parakeet ASR models. These state-of-the-art ASR models, developed in collaboration with Suno, transcribe spoken English with exceptional accuracy. Get started today. https://nvda.ws/4aK9Lut
Pushing the Boundaries of Speech Recognition with NVIDIA NeMo Parakeet ASR Models | NVIDIA Technical Blog
-
Cybersecurity Analyst | Bug Hunter | Pentester | RedHat
Real-Time Voice Cloning #VoiceCloning #AIVoice #Cloning Real-Time Voice Cloning (https://lnkd.in/dRBZ7xK3) is an open-source tool for real-time voice cloning. It can "learn" someone's voice from a 5-second recording of speech and then use the "learned" voice to say anything. The program is equipped with modern encoders that reproduce the voice from a 5-second audio file, then converts the result into speech. A simple interface lets you configure the encoder, synthesizer, and vocoder to your preferences, enabling efficient cloning of any voice by adjusting the necessary parameters. Detailed guide: https://lnkd.in/dAFK_CmA
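The encoder → synthesizer → vocoder pipeline the post describes can be sketched numerically. This is a toy illustration only: the real tool uses a trained neural network at each stage, and every function body below is a stand-in.

```python
import numpy as np

def speaker_encoder(frames: np.ndarray) -> np.ndarray:
    """Variable-length audio frames -> fixed-size speaker embedding.
    (Toy: mean-pool and L2-normalize; the real encoder is a trained network.)"""
    emb = frames.mean(axis=0)
    return emb / np.linalg.norm(emb)

def synthesizer(text: str, speaker_emb: np.ndarray) -> np.ndarray:
    """Text + speaker embedding -> spectrogram-like frames, with every
    frame conditioned on the speaker embedding (toy)."""
    n_frames = 5 * len(text)                     # rough length from the text
    content = np.zeros((n_frames, speaker_emb.size))
    return content + speaker_emb                 # broadcast conditioning

def vocoder(spectrogram: np.ndarray) -> np.ndarray:
    """Spectrogram frames -> waveform samples (toy: flatten)."""
    return spectrogram.ravel()

# Clone from a short recording, then "say" new text in that voice:
recording = np.random.default_rng(0).standard_normal((500, 8))
voice = speaker_encoder(recording)
wave = vocoder(synthesizer("hello", voice))
```

The key design point this mirrors is that the speaker embedding is fixed-size regardless of recording length, which is what lets a 5-second clip stand in for a voice.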
-
𝐎𝐍𝐄-𝐏𝐄𝐀𝐂𝐄 is a general representation model across vision, audio, and language modalities. Without using any vision or language pretrained model for initialization, ONE-PEACE achieves leading results in vision, audio, audio-language, and vision-language tasks. Furthermore, ONE-PEACE possesses a strong emergent zero-shot retrieval capability, enabling it to align modalities that are not paired in the training data. code: https://lnkd.in/gCNGzq5e paper: https://lnkd.in/g337QgFp
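Zero-shot cross-modal retrieval of the kind ONE-PEACE reports reduces to nearest-neighbor search in a shared embedding space. A minimal sketch using cosine similarity (the embeddings below are made up; a real setup would take them from the model's modality encoders):

```python
import numpy as np

def retrieve(query: np.ndarray, gallery: np.ndarray) -> int:
    """Index of the gallery embedding most similar to the query (cosine)."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return int(np.argmax(g @ q))

# Toy shared space: an audio query retrieves the matching image embedding,
# even if audio-image pairs were never seen together during training.
audio_query = np.array([0.9, 0.1, 0.0])
image_gallery = np.array([
    [0.0, 1.0, 0.0],   # unrelated image
    [1.0, 0.2, 0.1],   # image matching the query
    [0.0, 0.0, 1.0],   # unrelated image
])
```

Because similarity is computed in one shared space, any modality pair can be compared this way, which is what "emergent zero-shot retrieval" amounts to operationally.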
-
GPT-4 Omni (GPT-4o) can reason across audio, vision, and text in real time with authentic emotion. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, similar to human response time in conversation. https://lnkd.in/gm-eFHU4
-
https://lnkd.in/ee7tTyqA Hear the AI-enhanced noise cancellation feature of the Versity 97 in action! #AI #Versity97 #noisecancellation #bestinclass #criticalconnectivity
Spectralink Versity 97 Series Call Quality Audio
https://www.youtube.com/
-
Fractional AI Officer, Founder @ 🤫 hushh, ex-Salesforce Principal Data Scientist/Engineer, Advisor for UW Continuing Education and AI/Aeronautical/Health startups.
This year's ICML best paper is all about video generation *without* using Stable Diffusion: VideoPoet: A Large Language Model for Zero-Shot Video Generation https://lnkd.in/grpUkZHS https://lnkd.in/g9k2ezgU
arxiv.org