The biggest surprise of Google’s Pixel event is a transcription app. Here’s how it works
While Google’s latest smartphone, the Pixel 4, got top billing at a major launch event in New York on Tuesday, the unveiling of an AI-enhanced recording and transcribing app was perhaps the biggest surprise of the day.
Recorder is meant for recording meetings, music, lectures, and such. It can recognize and transcribe what you’re saying in real time and identify other types of sound, like music and applause. The recordings can be searched for specific words. For example, you can search for “rainbow” and receive results that show where that word was uttered in each recording.
Recorder will come with the new Pixel phone, part of Google’s flagship handset line for showing off the latest features of its Android operating system. That phone starts at $799, or $100 more than the baseline iPhone 11 model, and ships October 24. Starting in December, Google will roll Recorder out to older Pixel phone models as well.
While Recorder may sound like a fairly simple app, Sherry Lin, the product manager for Recorder, told CNN Business it wasn’t easy to make its speedy transcription work without killing the phone’s battery life. Google had to figure out how to pack into the handset a lot of AI that’s usually tucked away on a remote server.
“Honestly, when we started out we weren’t sure if we could even ship,” Lin said in an interview Tuesday.
As countless journalists and college students know, there are plenty of apps for recording audio on your smartphone, and some of them, such as Otter.ai, use AI to turn chatter into transcripts, allowing you to do things like search the resulting recordings. Typically, if you want to do more than simply record a conversation, you’ll need an internet connection. That’s because much of the AI involved in analyzing and transcribing, say, a riveting lecture about the Hegelian dialectic tends to run on a faraway server rather than on your smartphone.
To show off how Recorder does this work on the phone, Sabrina Ellis, vice president of product management at Google, noted Tuesday during an on-stage demo of the app that the phone was in airplane mode.
Lin said the reasons for keeping all of Recorder’s operations on the handset are twofold: to help protect the user’s privacy by keeping the audio and related text on the phone, and to allow speech to be converted into text more quickly than if the audio first had to take a trip to and from a remote server.
Making the app usable on a phone was tricky, however, in part because it relies on multiple pieces of AI that can run down the handset’s battery and bog down its main processor. These include an AI model that is specifically aimed at transcription (a retrained, retooled version of the model that powers Google Assistant), one that works on search, one for inserting punctuation into transcriptions, and one meant for classifying sounds other than speech.
Lin said that when she and her team started working on the app in earnest in March, the transcription model — the app’s biggest chunk of AI — drained the phone’s battery life in less than half an hour and made it heat up.
“We were like, ‘We’ll never ship unless we ship an air conditioning unit with the thing,’” she joked.
Early on, the software also froze the phone and was simply too large to send to consumers via Google Play, the company’s online app store.
To shrink the AI behind the app, Lin said the team “pruned” the transcription model and trained it to capture long-form speech and ignore background noise. The training was done, essentially, by feeding the AI lengthy recordings of things like meetings, interviews, and lectures from YouTube.
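Lin did not describe how the pruning was done. For readers curious about the general idea, here is a minimal sketch of one common approach, magnitude pruning, in which the weights closest to zero are dropped on the assumption that they contribute least to the model’s output. The function names, the sample weights, and the 50% sparsity level are all made up for illustration; nothing here reflects Google’s actual code or settings.

```python
def prune_by_magnitude(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to 0.0.

    `sparsity` is the fraction of weights to zero out (0.0 keeps everything).
    This is an illustrative helper, not any library's API.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_to_zero = int(len(weights) * sparsity)
    # Rank positions from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_to_zero])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


def count_nonzero(weights):
    """Count the weights that survived pruning."""
    return sum(1 for w in weights if w != 0.0)


# Hypothetical weights for a single tiny layer.
layer = [0.82, -0.03, 0.41, 0.007, -0.65, 0.12, -0.002, 0.29]
pruned = prune_by_magnitude(layer, sparsity=0.5)
print(pruned)
print(count_nonzero(layer), "->", count_nonzero(pruned))
```

A model with many zeroed weights can be stored and run more cheaply, which is the kind of saving that matters for download size and battery life. In practice, a pruned model is usually retrained afterward to recover accuracy.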
Lin said the app doesn’t use remote workers to listen to any user recordings, a longstanding industry practice with virtual assistants that has been changing in the wake of media scrutiny regarding privacy concerns. (An exception might be if a user reports a bug, such as a strange static sound, and gives explicit permission for the company to listen to a recording, she said.)
According to Lin, the app defaults to saving all the recordings and transcriptions on the phone, and the data is subject to standard Android device encryption. The company can’t see any recording-related data unless you choose to export it to a Google product such as Google Drive or Gmail, she said.
One thing the Recorder team is now working on is figuring out who is speaking when there’s more than one voice on a recording, Lin said. Currently, the app transcribes all audio as though a single person is talking, and she wants to figure out how to segment the transcribed speech by speaker.
“It’s one of those things where it’s so easy for humans to do and so hard for a computer system to tell,” she said.