I'm trying to get AudioKit to pipe the microphone to Google's Speech-to-Text API as seen here, but I'm not entirely sure how to go about it.

To prepare the audio for the Speech-to-Text engine, you need to set up the encoding and pass it through as chunks. In the example Google uses, they use Apple's AVFoundation, but I would like to use AudioKit so I can do some pre-processing such as cutting off low amplitudes, etc.
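For reference, the format I'm targeting is 16 kHz, 16-bit signed linear PCM, mono. Expressed directly as an AVAudioFormat it would look roughly like this (just a sketch equivalent to the ASBD below; the variable name is mine, not from Google's sample):

import AVFoundation

// 16 kHz, 16-bit signed integer samples, single channel
let targetFormat = AVAudioFormat(commonFormat: .pcmFormatInt16,
                                 sampleRate: 16_000,
                                 channels: 1,
                                 interleaved: true)!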
I believe the right way to do this is with a Tap:
First, I should match the format with:
var asbd = AudioStreamBasicDescription()
asbd.mSampleRate = 16000.0
asbd.mFormatID = kAudioFormatLinearPCM
asbd.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked
asbd.mBytesPerPacket = 2
asbd.mFramesPerPacket = 1
asbd.mBytesPerFrame = 2
asbd.mChannelsPerFrame = 1
asbd.mBitsPerChannel = 16
AudioKit.format = AVAudioFormat(streamDescription: &asbd)!
And then create a tap such as:
open class TestTap {
    internal let bufferSize: UInt32 = 1_024

    @objc public init(_ input: AKNode?) {
        input?.avAudioNode.installTap(onBus: 0, bufferSize: bufferSize, format: AudioKit.format) { buffer, _ in
            // do work here
        }
    }
}
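For what it's worth, I attach the tap roughly like this (a minimal sketch; mic is assumed to be an AKMicrophone in my signal chain):

let mic = AKMicrophone()
let tap = TestTap(mic)   // installs the tap on the mic's underlying AVAudioNode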
However, I can't figure out the right way to feed this data to the Google Speech-to-Text API via the method streamAudioData while using AudioKit, but perhaps I'm going about this the wrong way?
UPDATE:
I've created a Tap as follows:
open class TestTap {
    internal var audioData = NSMutableData()
    internal let bufferSize: UInt32 = 1_024

    func toData(buffer: AVAudioPCMBuffer) -> NSData {
        let channelCount = 2 // given PCMBuffer channel count is
        let channels = UnsafeBufferPointer(start: buffer.floatChannelData, count: channelCount)
        return NSData(bytes: channels[0],
                      length: Int(buffer.frameCapacity * buffer.format.streamDescription.pointee.mBytesPerFrame))
    }

    @objc public init(_ input: AKNode?) {
        input?.avAudioNode.installTap(onBus: 0, bufferSize: bufferSize, format: AudioKit.format) { buffer, _ in
            self.audioData.append(self.toData(buffer: buffer) as Data)

            // We recommend sending samples in 100ms chunks (from Google)
            let chunkSize: Int /* bytes/chunk */ = Int(0.1 /* seconds/chunk */
                * AudioKit.format.sampleRate /* samples/second */
                * 2 /* bytes/sample */ )

            if self.audioData.length > chunkSize {
                SpeechRecognitionService
                    .sharedInstance
                    .streamAudioData(self.audioData,
                                     completion: { response, error in
                                        if let error = error {
                                            print("ERROR: \(error.localizedDescription)")
                                            SpeechRecognitionService.sharedInstance.stopStreaming()
                                        } else if let response = response {
                                            print(response)
                                        }
                    })
                self.audioData = NSMutableData()
            }
        }
    }
}
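As a sanity check on that chunk size: at 16 kHz, 16-bit (2 bytes per sample) mono, 100 ms works out to 3,200 bytes, assuming AudioKit.format.sampleRate really is 16,000 at this point:

// 100 ms of 16 kHz, 16-bit mono audio
let chunkSeconds = 0.1
let bytesPerSample = 2.0
let expectedChunkSize = Int(chunkSeconds * 16_000.0 * bytesPerSample) // 3200 bytes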
In viewDidLoad:, I'm setting up AudioKit with:
AKSettings.sampleRate = 16_000
AKSettings.bufferLength = .shortest
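The rest of the setup looks roughly like this (a sketch under my assumptions; the mic node and silent booster output are illustrative, and TestTap is the class from above):

// inside viewDidLoad (AudioKit imported at the top of the file)
let mic = AKMicrophone()                    // microphone input node
let tap = TestTap(mic)                      // install the streaming tap on the mic
AudioKit.output = AKBooster(mic, gain: 0)   // silent output keeps the graph rendering
do {
    try AudioKit.start()                    // plain AudioKit.start() on older AudioKit versions
} catch {
    print("AudioKit failed to start: \(error)")
}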
However, Google is complaining:
ERROR: Audio data is being streamed too fast. Please stream audio data approximately in real time.
I've tried changing multiple parameters, such as the chunk size, to no avail.
I found the solution here: the buffers coming out of the tap need to be converted down to 16 kHz with an AVAudioConverter before being streamed. My final code for the Tap is:
open class GoogleSpeechToTextStreamingTap {

    internal var converter: AVAudioConverter!

    @objc public init(_ input: AKNode?, sampleRate: Double = 16000.0) {
        let format = AVAudioFormat(commonFormat: AVAudioCommonFormat.pcmFormatInt16,
                                   sampleRate: sampleRate,
                                   channels: 1,
                                   interleaved: false)!

        self.converter = AVAudioConverter(from: AudioKit.format, to: format)
        self.converter?.sampleRateConverterAlgorithm = AVSampleRateConverterAlgorithm_Normal
        self.converter?.sampleRateConverterQuality = .max

        let sampleRateRatio = AKSettings.sampleRate / sampleRate
        let inputBufferSize = 4410 // 100ms of 44.1K = 4410 samples.

        input?.avAudioNode.installTap(onBus: 0, bufferSize: AVAudioFrameCount(inputBufferSize), format: nil) { buffer, time in

            let capacity = Int(Double(buffer.frameCapacity) / sampleRateRatio)
            let bufferPCM16 = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: AVAudioFrameCount(capacity))!

            var error: NSError? = nil
            self.converter?.convert(to: bufferPCM16, error: &error) { inNumPackets, outStatus in
                outStatus.pointee = AVAudioConverterInputStatus.haveData
                return buffer
            }

            let channel = UnsafeBufferPointer(start: bufferPCM16.int16ChannelData!, count: 1)
            let data = Data(bytes: channel[0], count: capacity * 2)

            SpeechRecognitionService
                .sharedInstance
                .streamAudioData(data,
                                 completion: { response, error in
                                    if let error = error {
                                        print("ERROR: \(error.localizedDescription)")
                                        SpeechRecognitionService.sharedInstance.stopStreaming()
                                    } else if let response = response {
                                        print(response)
                                    }
                })
        }
    }
}
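And, for completeness, this is roughly how I hook the final tap up (again just a sketch; the mic node, the silent booster output, and leaving the session at the default 44.1 kHz are my assumptions):

// inside viewDidLoad - AKSettings.sampleRate stays at the default 44.1 kHz,
// the tap itself converts down to 16 kHz
let mic = AKMicrophone()
let tap = GoogleSpeechToTextStreamingTap(mic, sampleRate: 16_000.0)
AudioKit.output = AKBooster(mic, gain: 0)   // silent output so the mic keeps rendering
do {
    try AudioKit.start()
} catch {
    print("AudioKit failed to start: \(error)")
}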