I’m trying to generate and collect data using Azure’s speech to text code. I want to generate timestamps, reduce redundancies in the output, and export to Excel. The code below runs with no errors:
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace NEST
{
    internal class NewBaseType
    {
        static async Task Main(string[] args)
        {
            // Creates an instance of a speech config with specified subscription key and region.
            // Replace with your own subscription key and service region (e.g., "westus").
            var config = SpeechConfig.FromSubscription("subscriptionkey", "region");

            // Generates timestamps
            config.OutputFormat = OutputFormat.Detailed;
            config.RequestWordLevelTimestamps();

            //calls the audio file
            using (var audioInput = AudioConfig.FromWavFileInput("C:/Users/MichaelSchwartz/source/repos/AI-102-Process-Speech-master/transcribe_speech_to_text/media/narration.wav"))
            // Creates a speech recognizer from microphone.
            using (var recognizer = new SpeechRecognizer(config, audioInput))
            {
                // Subscribes to events.
                recognizer.Recognizing += (s, e) =>
                {
                    Console.WriteLine($"RECOGNIZING: Text={e.Result.Text}");
                };

                recognizer.Recognized += (s, e) =>
                {
                    var result = e.Result;
                    Console.WriteLine($"Reason: {result.Reason.ToString()}");
                    if (result.Reason == ResultReason.RecognizedSpeech)
                    {
                        Console.WriteLine($"Final result: Text: {result.Text}.");
                    }
                };

                recognizer.Canceled += (s, e) =>
                {
                    Console.WriteLine($"\n Canceled. Reason: {e.Reason.ToString()}, CanceledReason: {e.Reason}");
                };

                recognizer.SessionStarted += (s, e) =>
                {
                    Console.WriteLine("\n Session started event.");
                };

                recognizer.SessionStopped += (s, e) =>
                {
                    Console.WriteLine("\n Session stopped event.");
                };

                recognizer.Recognized += (s, e) =>
                {
                    var j = e.Result.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult);
                };

                // Starts continuous recognition.
                // Uses StopContinuousRecognitionAsync() to stop recognition.
                await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

                do
                {
                    Console.WriteLine("Press Enter to stop");
                } while (Console.ReadKey().Key != ConsoleKey.Enter);

                // Stops recognition.
                await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
            }
        }
    }
}
When I run it, I don’t see timestamp data. How do I generate timestamp data?
Also, is there a way to remove redundancies in the output? Example:
RECOGNIZING: Text=the
RECOGNIZING: Text=the speech
RECOGNIZING: Text=the speech translation
RECOGNIZING: Text=the speech translation API
RECOGNIZING: Text=the speech translation API transcribes
RECOGNIZING: Text=the speech translation API transcribes audio
I just want the final result. Is there a way to remove the “RECOGNIZING:” data from the output while preserving accuracy? Thanks in advance!
Answer
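The Recognizing event only reports interim hypotheses, and the final text arrives through the Recognized event, so removing the interim output does not affect accuracy. To remove the "RECOGNIZING:" lines, just delete this statement: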
recognizer.Recognizing += (s, e) => { Console.WriteLine($"RECOGNIZING: Text={e.Result.Text}"); };
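I didn’t see where you export the result and timestamps to Excel. Your second Recognized handler already reads the detailed JSON into a variable but never prints it, which is why no timestamps appear. You could use this code once you have the SpeechRecognitionResult object (for example, inside the Recognized handler):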
var json = result.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult);
Console.WriteLine(json);
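As for getting the timestamps into Excel, here is a rough sketch of one way to do it. It is not something from your code or from the SDK: the names WordTiming, WordTimestamps, Parse and ToCsv are made up for illustration. It assumes the detailed JSON contains an NBest array whose first entry is the top-ranked alternative and carries a Words list with Word, Offset and Duration fields, where Offset and Duration are in 100-nanosecond ticks. It writes CSV rather than a native .xlsx file, on the assumption that opening a CSV in Excel is good enough.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.Json;

// Hypothetical helper type: one recognized word with its start time and length.
public record WordTiming(string Word, TimeSpan Offset, TimeSpan Duration);

// Hypothetical helper class, not part of the Speech SDK.
public static class WordTimestamps
{
    // Parses the detailed JSON of one recognized phrase and returns the word
    // timings from the first NBest entry (assumed to be the best alternative).
    public static List<WordTiming> Parse(string json)
    {
        var words = new List<WordTiming>();
        using JsonDocument doc = JsonDocument.Parse(json);

        if (!doc.RootElement.TryGetProperty("NBest", out JsonElement nbest) ||
            nbest.ValueKind != JsonValueKind.Array ||
            nbest.GetArrayLength() == 0)
        {
            return words; // nothing recognized in this result
        }

        if (!nbest[0].TryGetProperty("Words", out JsonElement wordArray) ||
            wordArray.ValueKind != JsonValueKind.Array)
        {
            return words; // no word-level timestamps in this result
        }

        foreach (JsonElement w in wordArray.EnumerateArray())
        {
            // Assumption: Offset and Duration are 100-nanosecond ticks.
            words.Add(new WordTiming(
                w.GetProperty("Word").GetString() ?? "",
                TimeSpan.FromTicks(w.GetProperty("Offset").GetInt64()),
                TimeSpan.FromTicks(w.GetProperty("Duration").GetInt64())));
        }
        return words;
    }

    // Builds CSV text with a header row and one row per word.
    public static string ToCsv(IEnumerable<WordTiming> words)
    {
        var sb = new StringBuilder("Word,StartSeconds,DurationSeconds\n");
        foreach (WordTiming w in words)
        {
            // Quote the word and double any embedded quotes.
            string escaped = "\"" + w.Word.Replace("\"", "\"\"") + "\"";
            sb.Append(escaped).Append(',')
              .Append(w.Offset.TotalSeconds.ToString("F2", CultureInfo.InvariantCulture)).Append(',')
              .Append(w.Duration.TotalSeconds.ToString("F2", CultureInfo.InvariantCulture)).Append('\n');
        }
        return sb.ToString();
    }
}
```

Inside the Recognized handler you could call WordTimestamps.Parse(json) on each final result, add the returned items to one shared list, and after StopContinuousRecognitionAsync() write WordTimestamps.ToCsv(list) to a file with File.WriteAllText. Excel should then open that file directly.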