Hello,
I'm a Swift newbie and building an app that helps with dictation for MacOS. Here is the overview of how the project is setup.
- SpeechRecognizerManager (ObservableObject Class): Handles all things to do with speech recognition - Permissions, Start transcribing, Stop transcribing and has a variable
transcribedText
which outputs in real-time as the user is talking.
- ListeningButton (Swift UI): Creates a button that handles all cases of listening using SpeechRecognizerManager. Permissions (onAppear), Start & Stop transcribing (On Toggle), Button's UI changes (On Toggle). It has two binding variables:
isListening
& transcribedText
. The goal for these variables is to always keep the parent view updated with the latest status.
- ContentView (Swift UI): I plan to call the ListeningButton to various views, testing with first view currently. I have access to both
isListening
& transcribedText
. However, I only get access to transcribedText
when the listening is stopped.
Requirement: Get access to transcribedText
in the parent view as the text is transcribed, in real-time.
Possible Solutions: Below are the solutions I can think of as a newbie. I'd like to know if there are any other solutions or one of these are recommended.
- Somehow get access to
SpeechRecognizerManager.transcribedText
in the parent view. I don't know if it is possible and how so. (Update: I figured out how to do this based on @dbtl88's correction of GPT-4 coding skills :) I still need to know if this is the best way or not.)
- Change the ListeningButton UI to constantly (using while isListening: true) set
transcribedText = SpeechRecognizerManager.transcribedText
. Is this recommended? Feels like a "hack", but I'm not sure.
- Make the Real-time streaming of
transcribedText
as part of the ListeningButton UI. I don't really prefer this as it limits the UI functionality in the Parent Views.
- Any other better way?
ListeningButton:
import SwiftUI
struct ListeningButton: View {
@StateObject var speechRecognizerManager: SpeechRecognizerManager = SpeechRecognizerManager()
// State variable to track hover status
@State private var isHovering : Bool = false
@State private var blink : Bool = false
@Binding var isListening : Bool
@Binding var transcribedText: String
let blinkTimer = Timer.publish(every: 0.7, on: .main, in: .common).autoconnect() // Timer to trigger the blink
var body: some View {
Button(action: toggleListening) {
Image(systemName: "mic")
.imageScale(.large)
.foregroundColor(isHovering ? .green : (colorScheme == .dark ? .black : .white)) // Adapt foreground color & change color on hover
.opacity(blink ? 0.1 : 1.0) // Blink effect by changing opacity
.padding() // Add padding to increase the circular area
.background(self.isListening ? Color.red : Color.blue) // Red when recording
.clipShape(Circle()) // Clip the background to a circle shape
.onHover { hovering in
isHovering = hovering
}
}
.onReceive(blinkTimer) { _ in
if isListening {
withAnimation(.easeInOut(duration: 0.5)) {
blink.toggle()
}
}
}
.onAppear {
requestSpeechRecognitionPermission()
blink = false
}
.onDisappear {
blinkTimer.upstream.connect().cancel() // Cancel the timer when the view disappears
}
}
private func toggleListening() {
if isListening {
speechRecognizerManager.stopTranscribing()
blinkTimer.upstream.connect().cancel() // Stop blinking when not listening
blink = false
transcribedText = speechRecognizerManager.transcribedText
} else {
speechRecognizerManager.startTranscribing()
transcribedText = speechRecognizerManager.transcribedText
blinkTimer.upstream.connect() // Start blinking when listening
}
isListening.toggle() // Update the listening state
}
private func requestSpeechRecognitionPermission() {
speechRecognizerManager.requestSpeechAuthorization()
}
}
Relevant code from the ContentView:
import SwiftUI
struct ContentView: View {
@State private var transcribedText: String = ""
@State private var isListening : Bool = false
var body: some View {
ScrollView {
ZStack (alignment:.bottomTrailing) {
ListeningButton(isListening: $isListening, transcribedText: $transcribedText)
VStack() {
Text(transcribedText)
.padding()
Text(isListening ? "Listening" : "Not Listening")
.padding()
}
}
}
Relevant code of SpeechRecognizerManager:
class SpeechRecognizerManager: ObservableObject {
...
@Published var transcribedText: String = ""
...
if let result = result {
DispatchQueue.main.async {
self?.transcribedText = result.bestTranscription.formattedString
}
isFinal = result.isFinal
}
Thanks!