Hi, this is Jill from the Northwoods.
I told you recently how I was using AI to help me make my podcast show notes better. Again, I’m not using it to write my content. I really want to do that myself.
But instead, I was using AI to get good show notes and get some good hot topics for social media.
But I failed to mention, I think, the very first step in all of this. I’m using Grammarly Go to help me write the show notes, which are summaries of my podcast.
The first thing I need is a good transcription.
At first, I tried other services, and they started reducing my confidence that I was any good at podcasting at all. When I saw the transcription, the words were quite different from what I actually said.
Then back in September of 2022, Auphonic, which is the app that I use (and Allison uses) to help process the audio file, added a new feature. Auphonic smooths the audio out when it comes to noise levels and loudness levels, and it makes the whole podcast better.
They came out with integration with Whisper ASR (Automatic Speech Recognition) to do audio transcripts. This means that when you activate that, you get a transcript, you get a subtitle file in what is called the VTT format, you get an HTML file if you want to put your transcript on your website, or you get a JSON file.
The JSON file allows you to exchange that data with other types of websites and applications too. That did a great job, and it did a little bit better than the apps I was trying to use.
Trying to clean up a 3,000-word transcription that’s not very good isn’t going to help me very much. I really need that good, solid transcript to start this whole AI process. But Whisper with Auphonic did a great job and it got me a long way.
Then I needed to go through and start doing my older podcasts. I have 148 of them under my belt.
Auphonic gives you two free hours of transcription time a month, and putting all my old podcasts into their app would have used up a lot of my time.
So I was looking for some other solution that I could do so that I could start dropping those audio files into something and get good transcripts out of it.
It turns out that OpenAI created the Whisper software, and a widely used port of it, whisper.cpp, was produced by a fellow named Georgi Gerganov. According to OpenAI, they used 680,000 hours of audio to initially train this model to do transcription, which is part of this whole AI movement for sure. And back in September of 2022, OpenAI released the whole Whisper code as open source.
It can be found on GitHub, and ever since it was released as open source, people have been using it to make software applications because it’s so good.
Whisper Transcription in the Mac App Store
I found an app for the Mac called Whisper Transcription. There are a few of them out there. I looked at some others and the other ones didn’t seem to have as much functionality, and tried to charge you a lot more money for something that’s open source, but also didn’t have the attention that this software has.
Interesting to me is that the overall score it gets in the Mac App Store is 3, but I noticed that the trend overall is much better. When I first got it earlier this year, some things just didn’t work. It had a batch process, but that batch process didn’t work. I’ll tell you that since May, there have been six updates to this project. Every couple of days, it seems there’s a new update and something else got fixed. This app is getting a lot better really quickly. The developer is taking notes from the people who are using it and making it much better.
And after some testing, I found out it did a great job, just as good as what Auphonic was giving me. Makes sense since they’re using the same base code and now just adapting it to be better applications.
This particular app has a lot of good features. You can drag and drop files into it. You can determine what level of transcription you want to get.
There are what are called the tiny (English), base, small, medium, and large models. They each basically take more time the larger the model gets, but they also get more accurate.
For my podcast, which lasts about 20 minutes, it takes about five minutes to get the transcript. For me, accuracy is more important, but if you need a very quick transcription, you could also use these smaller models.
The app itself indicates that all the transcription is done locally on your machine, so it’s not being sent up to the cloud and being used in some other way.
It has some good functionality where once you get the transcript back, you can search for words, you can listen to the audio as it plays alongside the transcription so you can make sure that it’s accurate, and you can also do find and replace to edit your document so it’s even more accurate once you’ve listened to it.
And now, since the batch process has been fixed, you can also put a lot of files in there and it just chugs away and gives you those transcripts.
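The trade-off behind moving the back catalog to a local app can be sketched with a little arithmetic. This is only a rough estimate using the approximate figures quoted in this review, and it assumes every older episode runs about 20 minutes and transcribes locally in about five minutes, which may not hold for all 148 of them.

```python
# Back-of-the-envelope estimate. Every input is an approximate figure
# quoted in the review; the per-episode numbers are assumptions applied
# uniformly to the whole back catalog.
EPISODES = 148                  # older podcasts to transcribe
MINUTES_PER_EPISODE = 20        # assumed typical episode length
FREE_HOURS_PER_MONTH = 2        # free transcription allowance mentioned above
LOCAL_MINUTES_PER_EPISODE = 5   # assumed local transcription time per episode


def backlog_estimate(episodes, minutes_per_episode,
                     free_hours_per_month, local_minutes_per_episode):
    """Return (hours of audio, months on the free allowance, hours locally)."""
    audio_hours = episodes * minutes_per_episode / 60
    months_on_free_allowance = audio_hours / free_hours_per_month
    local_hours = episodes * local_minutes_per_episode / 60
    return audio_hours, months_on_free_allowance, local_hours


audio_hours, months, local_hours = backlog_estimate(
    EPISODES, MINUTES_PER_EPISODE,
    FREE_HOURS_PER_MONTH, LOCAL_MINUTES_PER_EPISODE,
)
print(f"Audio to transcribe: about {audio_hours:.1f} hours")
print(f"On the free allowance alone: about {months:.1f} months")
print(f"Batch transcribing locally: about {local_hours:.1f} hours")
```

Under those assumptions the backlog is roughly 49 hours of audio, which is about two years of the free monthly allowance versus roughly half a day of local batch processing.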
The file types you can upload into Whisper Transcription are MP3, WAV files, M4A files, MP4 files, and MOV files.
And you can export it in a variety of different ways.
And one of the biggest reasons now that I switched to Whisper Transcription for every podcast, not just the old ones, is because it allows you to export into one giant paragraph.
With Auphonic, I got timestamps on individual lines or lines without timestamps at all, but each section of the words I’m saying was put on a separate line. This squishes everything into one giant paragraph. And it does a pretty good job of determining where sentences begin and end.
I had some trouble with it in other software I tried, but that might be my Captain Kirk-style talking too. Really pretty great.
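For anyone who only has a timestamped VTT file, the "one giant paragraph" idea can be approximated with a short script. This is a minimal sketch, not how Whisper Transcription does it internally: the function name `vtt_to_paragraph` and the sample cues are made up for illustration, and it assumes a simple, well-formed WebVTT file.

```python
import re

# Matches a WebVTT cue timing line such as "00:00:01.000 --> 00:00:04.000".
# The hours part is optional in WebVTT.
TIMING = re.compile(
    r"^(?:\d+:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(?:\d+:)?\d{2}:\d{2}\.\d{3}"
)


def vtt_to_paragraph(vtt_text: str) -> str:
    """Collapse the cue text of a WebVTT transcript into a single paragraph."""
    pieces = []
    normalized = vtt_text.replace("\r\n", "\n").strip()
    # Blocks in a WebVTT file are separated by blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        # Skip the file header and any comment or styling blocks.
        if lines[0].startswith(("WEBVTT", "NOTE", "STYLE", "REGION")):
            continue
        # The spoken text is everything after the timing line; an optional
        # cue identifier may come before it and is ignored.
        for i, ln in enumerate(lines):
            if TIMING.match(ln):
                pieces.extend(lines[i + 1:])
                break
    return " ".join(pieces)


# Hypothetical sample cues, for illustration only.
SAMPLE = """WEBVTT

00:00:00.000 --> 00:00:02.500
Hi, this is Jill

00:00:02.500 --> 00:00:04.000
from the Northwoods.
"""

print(vtt_to_paragraph(SAMPLE))
```

The script only joins the cue text with spaces. It does not try to work out where sentences begin and end, which is the part the app handles well.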
The free version of Whisper Transcription lets you use the tiny and base models. They’re very quick. They’re pretty darn accurate. I ran both of them and they did a good job.
There are in-app purchases for this app. A one-year subscription was $9.99, which gave me the Pro features, which we’ll talk about in a moment, to make the app even better.
If you buy Whisper Pro, which is $27.99, you get it for life.
If you’re deciding whether or not you want to try the Pro features, I started out with just doing the single year and decided I loved it so much I got the Lifetime Pro membership.
Your in-app purchase also includes Family Sharing.
You can get the full transcript, which is all the sentences mushed together in a single paragraph; the SRT, VTT, and CSV outputs; and sentences, which are just each individual segment of what you’re saying, without the timestamps.
It even will do multiple speaker paragraphs, so if you’re doing another podcast with someone else, you can upload the two voices and try to split it out.
I haven’t used it, but I believe Allison has, and I’m not sure what kind of job it did for her.
You can export to HTML, PDF, a Word document, or something called a DOTE file, which stands for Distributed Open Transcription Environment. It looks like a file type that is just open source. If you’re looking for the timestamps and your text, you can get an SRT file, a VTT file, a CSV file, and a PDF export.
All of those include the timestamp along with the words you’re saying.
If you’re hoping to get the transcriptions without the timestamps, you can use the full transcription. That’s where it’s all squished together in a single paragraph. There’s the sentences export, which puts every segment of what you’re saying on its own line, though they’re not necessarily sentences.
There’s the speaker paragraphs export, which is for when you have more than one person talking.
There’s also the DOTE export, which I said is an open source format; DOCX, which is the Word export; and the Whisper export, which puts it into a .whisper file, which is proprietary for this application. Again, it depends on what you’re doing with the software. The HTML, PDF, DOTE, and Word DOCX exports are part of the Pro package.
It will even allow you to transcribe while you’re recording a podcast or while you’re doing a Zoom meeting. I haven’t done that yet. That sounds pretty helpful if you do need transcripts for your meetings.
I can tell that it chugs away on my M1 Mac and I know that Mac is really powerful, so when it starts using up a lot of resources, I can tell this is an intensive process.
If you need something that needs transcribing, whether you’re using it for AI or not, I hope it helps.
If you have any questions about it, please feel free to look me up on Allison’s Slack channel or her website and put a comment in the blog article related to this particular review.
Thanks so much, have a great week!
Allison here with a couple of comments. Jill turned me onto this software and I discovered a couple more things. In the Mac App Store, it’s called Whisper Transcription, but when you download it, it will be called MacWhisper.
I’m glad Jill detailed some of the export options because I actually dislike the massive shmushed paragraph export that she really likes. I prefer the “export paragraphs” option which makes small, easy-to-read paragraphs. When I see a giant paragraph my brain shuts down completely! It’s great that MacWhisper, AKA Whisper Transcription, has options on export, options for pricing, and even a free model.
I ran the app through my VoiceOver tests, and I didn’t find anything that was inaccessible. The flow was a little bit odd, like on the editing page the progress bar on playback revealed itself before the play button itself, but that could easily be my lack of skill with VoiceOver. One button wasn’t labeled, which visually was two opposing arrows, and it turned out to be a search and replace tool. Other than that it seemed to work well with VoiceOver. Remember, I’m a high-novice level user of VoiceOver so give it a free trial first if you’re interested.
In a very “meta” moment, I actually used Whisper Transcription to transcribe this article of Jill’s. That made for a great experiment in seeing how good it was at transcribing Jill’s voice. It didn’t make many outright mistakes, even with the small model that comes free with the service. I’d give Whisper Transcription a high review as well.
Ha, I had to read this through to the end, because I quickly suspected that the Jill portion was a transcription of her audio – and I was right! If you’re a regular listener of the Nosillacast you will have heard Jill many times, and the text had some of her signature audio cues: short sentences, putting “for sure” on the end of a statement, “really pretty great” as a statement, stuff like that. Don’t get me wrong: I love Jill’s speaking style! It’s just that I could hear her voice clear as a bell while reading the article. Thanks, Jill, for a great review that I will definitely pass along to my friends and family.
That’s wonderful, Kurt. Really made me smile to read this.