You have probably figured out that I create my content as a podcast and also as a series of blog posts. I have found over the nearly 15 years of podcasting that I don’t speak well as a solo speaker without a script. While I can sound articulate when speaking on mic to another person, when I’m alone my speech is filled with “ums” and “ahs”. I even tried writing just bullet points to remind me of what I wanted to say, but I found that I was writing full sentences and then editing them down to bullet points. I realized that there are those who would rather read than listen, so why not just give them the full text?
But this process is certainly not for everyone. In fact, I don’t know anyone else who does it this way. Over the years though, I’ve gotten lots of positive feedback for providing the script of my shows. I’ve heard from all kinds of listeners, from people with severely limited bandwidth to a guy who is deaf/blind and reads the show on his Braille display.
So what if you’ve got the audio content and want a transcript of what was said? Even if you’re not a podcaster, what if you have meetings where you’d like the full text for later reference? Transcription services are enormously expensive and take a lot of time because it’s a really annoying thing to do by hand. For years people have been trying to create automated ways to do it, and I think Otter.ai might finally deliver on that dream.
Remember the review Andy Dolph did about QLab for timing audio and video, and he talked about music timing to go with a Christmas train? When he sent that in, he only sent the audio. I wrote back and asked him whether he had a script for it so I could make a blog post, and he said that he hadn’t written one, but he’d get one. A few minutes later he sent me a nearly perfect transcription of his recording.
Andy explained that the tool he used was a speech-to-text app from Otter.ai. Now before I tell you a single thing about this tool, hold onto your hats for the price. Otter.ai is FREE for 600 minutes per month. That’s crazy pants. The premium plan is only $100/year for 6000 minutes per month. Premium users get more options than free users and I’ll highlight a few as we get into it.
How Well Does it Work?
Now that you know Otter.ai is inexpensive, you’re probably thinking it’s not very good. This is where you’d be wrong. I made a short recording for the Mac Geek Gab asking Dave and John a question, so I used that recording as a test of Otter.ai. I dropped the m4a file into the web app at Otter.ai. It immediately started processing the file and a few minutes later I was notified that the transcript was complete.
The resultant text was 239 words, and other than when I stumbled over a word, and a couple of very technical words, it only made 3 mistakes. That is a 98.7% accuracy rate. What’s your accuracy rate when typing? Now I know if you were just rambling into a microphone you wouldn’t be as articulate as the practiced monologue I submitted, but still, that’s pretty amazing.
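If you want to check my math, here’s a quick Python sketch of how that number falls out. The function name is just something I made up for illustration, and it assumes accuracy simply means the share of words that came out right.

```python
def accuracy_percent(total_words: int, mistakes: int) -> float:
    """Return the share of correctly transcribed words as a percentage."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100 * (1 - mistakes / total_words)


# 239 words in the transcript, 3 of them wrong
print(f"{accuracy_percent(239, 3):.1f}%")  # prints 98.7%
```

Note that this only counts wrong words. It says nothing about punctuation or capitalization.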
I made a video for the shownotes (which would make no sense to play for the audio podcast) where you can see Otter.ai reading back the transcript while you listen to me talk. You can see why it made a few mistakes here and there, like writing “P list” when I said plists. In the view I show in the video, you can listen and edit at the same time. They say that if you do edit, that helps the engine learn, which is cool. You can slow down the speech and speed it back up with keystrokes which is really useful too.
Creating video subtitles
I started thinking about how expensive it is for Don McAllister to have closed captions for the hearing impaired on his video tutorials in ScreenCastsOnline, and wondered if this might work for his production workflow. There’s not just the cost, but the turnaround speed also affects the price he pays for human transcription. He’s worked it into a parallel path pretty much, but what if he only had to wait an hour instead of days to get a transcript? If this works, it could be a boon for making YouTube content accessible as well.
For closed captions, what you need is called an SRT file (which stands for SubRip Subtitle file). An SRT file is not just text, it’s text with timestamps so you can just drop it into the video file and it syncs up.
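To give you a feel for what “text with timestamps” means, here’s a small Python sketch that builds SRT-style text. Each caption is a numbered block with a start and end time written as hours:minutes:seconds,milliseconds, followed by the caption text and a blank line. The function names and the sample captions are my own inventions for illustration. This isn’t Otter.ai’s code, and their export may differ in details like line breaks.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm form used in SRT files."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues):
    """Build SRT text from (start_seconds, end_seconds, caption) tuples."""
    blocks = []
    for number, (start, end, text) in enumerate(cues, start=1):
        times = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{number}\n{times}\n{text}\n")
    # A blank line separates one caption block from the next
    return "\n".join(blocks)


# Made-up captions, purely for illustration
example = [
    (0.0, 2.5, "Hello and welcome."),
    (2.5, 6.0, "Today we look at transcripts."),
]
print(to_srt(example))
```

Because every block carries its own start and end time, the video app knows exactly when each caption should appear and disappear.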
Otter.ai calls your recording/transcripts Conversations. If you select a conversation you have options to export. Free users can export to the clipboard or to a plain text file, but that’s it. The premium users not only get the 6000 min/month, they also get expanded export options including DOCX, PDF, and the SRT file you need for video subtitles. I guarantee you that would be well worth the price for Don if the quality is good enough.
I tested Otter.ai on one of my ScreenCastsOnline video tutorials by exporting the audio out of ScreenFlow. I was too lazy to listen to all 8734 words to watch for mistakes, but in the first 900 words, it made 6 mistakes, or 99.3% accuracy. And I have to point out that every single one of those mistakes was when I did not clearly say the word. I know from context what I said but I could easily see a human transcribing it exactly as Otter.ai did. I’m not counting words it couldn’t know, or capitalization it wouldn’t know about in my 99.3% accuracy rating. I also found that punctuation accuracy varied depending on the style of speech that I gave it.
I am a complete novice at subtitles, but I really wanted to see how this could work for video subtitles, especially if I could help bossman Don. You can pay month-to-month with Otter.ai (which is great if you only occasionally need the advanced capabilities) so I ponied up $10 to do an experiment. I exported my shiny new transcript from Otter.ai to an SRT file on my disk, using the default options. I poked around in the menus of ScreenFlow for how to import a caption file and how to get it to show up onscreen.
When I imported the SRT file, it only put in the first 23 seconds. I was baffled as to why it didn’t import the entire 47 minutes of text. I did a ton of searching online on both the ScreenFlow web pages and on Otter.ai’s site but could find no clues. I went back to the export options for SRT files and found things about adding line breaks automatically, max number of lines and a few other things. I fiddled with those dials but didn’t solve the problem.
I looked at the imported text inside the caption field in ScreenFlow and discovered that there was a LOT more than 23 seconds worth of transcript packed into the caption for those 23 seconds. I found the end of the text it had packed into 23 seconds, and looked back at the transcript in Otter.ai. The giant segment of text stopped at the exact same point in the transcript where Otter.ai had made a paragraph break. Aha! I must have to get rid of the paragraph breaks! So I removed them and tried the import again. Now the entire 47 minutes worth of transcript was crammed into that 23-second video segment.
I shot off a note to Otter.ai tech support, confessing that I was a newb and went to bed. Their help system said to expect help in a day or two. In the morning, with a fresh brain, I looked at ScreenFlow again and immediately figured out what was wrong. When creating a video in ScreenFlow, you do a LOT of editing, so the video and audio tracks get all chopped up into short segments. 23 seconds was the length of the first audio segment. Clearly importing captions puts all of them into the first audio segment it finds. I grabbed the entire audio track in ScreenFlow and deleted it, and then imported the audio track I’d exported in the first place to upload to Otter.ai to get the transcript. Now I had one single 47-minute audio clip.
I imported the SRT file one more time, and it worked perfectly! I now have a 47-minute video tutorial that is accessible to the hearing impaired. And the total cost was $10. I don’t know what the best rates are, but I just saw a tweet from The Verge that transcription service Rev (which Don uses) is raising their price to $1.25/min, so this one 47 min video would have cost about $59 from them. And for my $10, I still have 5953 minutes of audio I can transcribe this month.
I’ve shot a render of the video off to Don and the team and I can’t wait to hear if this is as big of a deal as I think it is. They’re excited to do their own tests, and they’re going to compare Rev to Otter.ai to see how well it stacks up.
There’s an app for that
Ok, back to normal people and how they might want to use Otter.ai.
You don’t have to submit a pre-recorded audio file to Otter.ai. You can just talk directly to the web app via your internal or big-girl microphone, and ramble out your thoughts. When I tried this method, it seemed to have a lot more trouble figuring out when to put a period vs a comma than it did on my practiced speech. It made zero textual mistakes in about a minute of speech, which is a lot better than I can type!
While you’re babbling into your mic, Otter.ai will be transcribing in real time. It’s kind of distracting so I tried not to watch. When you’re done recording, there will be a little notice telling you it’s processing and then you’ll be able to see your transcript and listen along and edit as I described earlier.
I was poking around in the help files and discovered that they have apps for Android and iOS. The app can do pretty much everything that the web app can do, and maybe even more. You can edit, export and even create a cute little tag cloud.
They suggest that you can put your phone in the middle of the room at a meeting and let Otter.ai create a transcript for you. I’m not sure how well that would work, because they also suggest that your transcription accuracy will go up if you can get closer to the mic. They suggest other use cases for the tool, like simply recording your ideas with it so you get a text transcription.
Think about this. How many people have tried to record their elderly parents or friends telling their stories of childhood that they wanted to be preserved? With Otter.ai you could get the stories not just in their own voices but a text transcript as well. Maybe you want to record your own memoirs and want to think it out extemporaneously first and edit the text later. There are so many cool possibilities with what you can do with Otter.ai. And remember, this kind of transcription costs you zero dollars.
Sharing & Organization
You can share your conversations with other Otter.ai users. Pat Dengler and I tested sharing Conversations with each other and they simply showed up in a folder called Shared with Me. Surprisingly, I was able to edit her transcription. I could also just add little comments to the text which seems a bit more polite. Once you’ve added someone to a shared project, you can change their privileges from edit to comment to simply view.
When you share Conversations with someone, it creates a Group automatically. I tapped on the Group for Pat, and in there I could see not only what she had shared with me, but the Conversation I had shared with her.
Otter.ai has organization tools as well. You can create named folders and move any Conversation into them. I can see things getting messy in Otter.ai pretty quickly without these nice organization capabilities.
I flipped back and forth between my Mac on the Otter.ai web app, and my iPhone and iPads on their dedicated iOS apps and it worked perfectly.
Bottom Line
I’m super excited about Otter.ai and what it could mean to accessibility. Everyone wants to do the right thing and provide transcripts but until now it was prohibitively expensive. The ability to transcribe meetings, interviews and even your own notes for little to no cost is absolutely fantastic.
If I’ve gotten you excited about Otter.ai, I’ve included a referral link in the show notes. You get a free 1-month premium pass (and I do too) if you use this link: https://Otter.ai/referrals/YZU2LAAP.