象牙窟: Speech Recognition


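Friday, November 27, 2009

Developing Speech Programs with SAPI on Windows (Part 4): Speech Synthesis

Earlier posts described how to use SAPI for speech recognition. By comparison,
speech synthesis (text-to-speech, TTS) is much simpler.

As before, there is just one .hpp file and one .cpp file.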


// SpeechSynthesizer.hpp
#ifndef _SPEECHSYNTHESIZER_HPP
#define _SPEECHSYNTHESIZER_HPP

// sphelper.h triggers C4995 (deprecated function) warnings; silence them
#pragma warning(disable:4995)

#include <sapi.h>
#include <sphelper.h>
#include <string>
#include <cstring>


using namespace std;

// Thin wrapper around a SAPI text-to-speech voice
class SpeechSynthesizer
{
public:
	SpeechSynthesizer();
	void initSAPI();

	// voice selection
	void setSpeakerMike();
	void setSpeakerSam();
	void setSpeakerMary();

	// voice parameters
	void setVoiceRate(int rate);
	void setVoiceVolume(int vol);
	void setVoicePitch(int pitch);

	// output target: a WAV file or the default audio device
	void setVoiceOutputWavFile(string filename);
	void setVoiceOutputDefault();

	// speaking
	void speak(string str);
	void readTextFile(string filename);


// protected:
	CComPtr<ISpVoice> cpVoice;
	int _pitch;
};

#endif
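
The header keeps a _pitch member, and ISpVoice offers SetRate() and SetVolume() but no pitch setter. One plausible reading (an assumption, since the .cpp is not shown here) is that speak() wraps the text in SAPI's TTS XML pitch tag. A minimal sketch of that wrapping follows; wrapWithPitch is a hypothetical helper, not part of the original class.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not from the original post): wraps text in the
// SAPI 5 TTS XML <pitch> tag. The absmiddle attribute is documented as
// ranging from -10 to 10, so out-of-range values are clamped.
// Assumption: the text contains no XML special characters, so escaping
// is omitted for brevity.
std::wstring wrapWithPitch(const std::wstring& text, int pitch)
{
	const int clamped = std::max(-10, std::min(10, pitch));
	return L"<pitch absmiddle=\"" + std::to_wstring(clamped) + L"\">" +
	       text + L"</pitch>";
}
```

The resulting string would then be passed to ISpVoice::Speak together with a flag such as SPF_IS_XML, so that the tag is parsed instead of being read aloud.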

Sunday, November 1, 2009

Developing Speech Programs with SAPI on Windows (Part 3): Core Class Structure

Next, an introduction to the SpeechRecognizer class.

First, note its member functions void thrRun(void*), void activate(), and
void deactivate(), and its member variable bool _active. Together they mean
that I want this object to keep running while the program executes (see the
post "A Running Object").

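At run time I call activate() to start the SpeechRecognizer object, and from
then on it keeps listening for user commands. When it hears a defined
command, it performs the corresponding action.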

Then come the additional member functions and member variables needed to
use SAPI.

First is initSAPI():



void SpeechRecognizer::initSAPI()
{
	// Initialize COM first
	if(FAILED(::CoInitialize(NULL))){
		exitError(TEXT("init COM Failed!")); 
	}
	
	HRESULT hr;
	// Create the speech recognition engine
	hr = _cpEngine.CoCreateInstance(CLSID_SpSharedRecognizer);
	if(FAILED(hr)){
		cleanupSAPI();
		exitError(TEXT("_cpEngine.CoCreateInstance"));
	}

	// Create the recognition context
	hr = _cpEngine->CreateRecoContext( &_cpRecoCtxt );
	if(FAILED(hr)){
		cleanupSAPI();
		exitError(TEXT("_cpEngine->CreateRecoContext"));
	}

	// Choose how the program is notified when a recognition event occurs.
	// Most documentation uses Win32 window messages,
	// but I don't want to be tied to the Win32 API's window framework,
	// so I use SetNotifyWin32Event instead
	hr = _cpRecoCtxt->SetNotifyWin32Event();
	if(FAILED(hr)){
		cleanupSAPI();
		exitError(TEXT("_cpRecoCtxt->SetNotifyWin32Event"));
	}

	// SAPI defines its own events, such as recognition succeeded,
	// recognition failed, sound started, sound ended, and so on.
	// Tell SAPI which events we are interested in
	hr = _cpRecoCtxt->SetInterest(SPFEI(SPEI_RECOGNITION), SPFEI(SPEI_RECOGNITION));
	if(FAILED(hr)){
		cleanupSAPI();
		exitError(TEXT("_cpRecoCtxt->SetInterest"));
	}

	// Create the grammar object
	hr = _cpRecoCtxt->CreateGrammar(0, &_cpGrammar);
	if(FAILED(hr)){
		cleanupSAPI();
		exitError(TEXT("_cpRecoCtxt->CreateGrammar"));
	}
	
	// Load the predefined grammar file, an xml file
	hr = _cpGrammar->LoadCmdFromFile(L"test.xml", SPLO_DYNAMIC);
	if(FAILED(hr)){
		cleanupSAPI();
		exitError(TEXT("_cpGrammar->LoadCmdFromFile"));
	}

	// Set rules to active, we are now listening for commands
	hr = _cpGrammar->SetRuleState(NULL, NULL, SPRS_ACTIVE);
	if(FAILED(hr)){
		cleanupSAPI();
		exitError(TEXT("_cpGrammar->SetRuleState"));
	}
}

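Once SAPI is initialized and configured, the next step is to define how the
program reacts to each recognition result. That is done in executeCommand(),
which is easier to follow alongside the grammar xml file.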


void SpeechRecognizer::executeCommand(ISpPhrase *pPhrase)
{
	SPPHRASE* pElements;
	if(SUCCEEDED(pPhrase->GetPhrase(&pElements))){
		// Rule.ulId is the rule ID defined in the grammar file
		switch(pElements->Rule.ulId){
			case 1:
				cout << "robot" << endl;
				break;
			case 2:
				cout << "hello" << endl;
				break;
			default:
				cout << "The action of RULE " << pElements->Rule.ulId << " is not defined." << endl;
				break;
		}
		// GetPhrase() allocates the structure; the caller must free it
		::CoTaskMemFree(pElements);
	}
}


 
	
	
	
	
	 
[Grammar xml file: the XML markup did not survive extraction. The phrases it contained were: robot, hello, renbot, Chien Hao.]

As you might guess, after a successful recognition the result ends up in
executeCommand(). Rule.ulId is the ID defined in the grammar file.
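
As the number of rules grows, the switch on the rule ID can be replaced by a lookup table. The following is a sketch of that alternative, not the author's code: CommandTable is an illustrative name, and the handlers return text instead of printing so the behavior is easy to check.

```cpp
#include <functional>
#include <map>
#include <string>

// Table-driven dispatch on a rule ID (illustrative alternative to a switch).
class CommandTable
{
public:
	typedef std::function<std::string()> Handler;

	// Register (or replace) the handler for one rule ID.
	void add(unsigned long ruleId, Handler handler)
	{
		_handlers[ruleId] = handler;
	}

	// Runs the handler registered for ruleId, or reports an undefined rule.
	std::string execute(unsigned long ruleId) const
	{
		auto it = _handlers.find(ruleId);
		if (it == _handlers.end()) {
			return "The action of RULE " + std::to_string(ruleId) +
			       " is not defined.";
		}
		return it->second();
	}

private:
	std::map<unsigned long, Handler> _handlers;
};
```

In this arrangement, executeCommand() would only need to read Rule.ulId and pass it to execute(); adding a command means adding one table entry and one grammar rule.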

As its name suggests, cleanupSAPI() releases the COM and SAPI resources
before the program exits, so I won't go into detail here.

Finally, the crucial thrRun():


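void SpeechRecognizer::thrRun(void* ptr)
{
	SpeechRecognizer* pSR = (SpeechRecognizer*)ptr;

	HRESULT hr;
	while(pSR->_active){
		// Wait for a recognition result: returns at once when one arrives,
		// otherwise returns after the 100 ms timeout
		hr = pSR->_cpRecoCtxt->WaitForNotifyEvent(100);

		// S_OK means an event was signalled; S_FALSE means timeout
		if(hr == S_OK){
			// Drain the event queue; for each recognition,
			// call executeCommand() with the result as the argument
			CSpEvent event;
			while(event.GetFrom(pSR->_cpRecoCtxt) == S_OK){
				if(event.eEventId == SPEI_RECOGNITION){
					pSR->executeCommand(event.RecoResult());
				}
			}
		}
	}
	_endthread();
}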

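Having read the code above, the basic flow can be summarized as:
1. Initialize COM and SAPI
2. Start the object and wait for a Win32 event
3. When an event arrives, call executeCommand() with the result as its argument
4. In executeCommand(), act according to the recognition result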
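
Step 2 relies on a wait that gives up after a timeout, so the loop can re-check its run flag between events. A portable sketch of that pattern follows, under the assumption that a condition variable is an acceptable stand-in for the Win32 event; EventSource, post() and waitFor() are illustrative names, with waitFor() playing the role of WaitForNotifyEvent(100).

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>

// Portable stand-in for a notification source (illustrative names).
class EventSource
{
public:
	// Called by the producer: queue a rule ID and wake one waiter.
	void post(int ruleId)
	{
		{
			std::lock_guard<std::mutex> lock(_mutex);
			_events.push(ruleId);
		}
		_ready.notify_one();
	}

	// Returns true and fills ruleId if an event is available within
	// timeoutMs; returns false on timeout so the caller can re-check
	// its run flag and wait again.
	bool waitFor(int timeoutMs, int& ruleId)
	{
		std::unique_lock<std::mutex> lock(_mutex);
		if (!_ready.wait_for(lock, std::chrono::milliseconds(timeoutMs),
		                     [this] { return !_events.empty(); })) {
			return false;
		}
		ruleId = _events.front();
		_events.pop();
		return true;
	}

private:
	std::mutex _mutex;
	std::condition_variable _ready;
	std::queue<int> _events;
};
```

A worker loop built on this would call waitFor() repeatedly while its flag is set and dispatch each returned rule ID, which mirrors steps 2 to 4.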

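Saturday, October 24, 2009

Developing Speech Programs with SAPI on Windows (Part 2): Code Overview

I have a quirk when coding: I like minimalism. So when I write a software
module, I try to stick to one .cpp file and one .hpp file.

Let me show the code first.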


// SpeechRecognizer.hpp
#ifndef _SPEECHRECOGNIZER_HPP
#define _SPEECHRECOGNIZER_HPP

#include <sphelper.h>
#include <windows.h>

class SpeechRecognizer
{
public:
	// running-object control
	void activate();
	void deactivate();
	static void thrRun(void*);	// thread entry point; receives `this`

	// SAPI setup, teardown, and command handling
	void initSAPI();
	void cleanupSAPI();
	void executeCommand(ISpPhrase *pPhrase);
	void exitError(LPTSTR lpszFunction);
//protected:
	// for running
	bool _active;

	// for SAPI
	CComPtr<ISpRecoContext> _cpRecoCtxt;
	CComPtr<ISpRecoGrammar> _cpGrammar;
	CComPtr<ISpRecognizer> _cpEngine;
};
#endif



// SpeechRecognizer.cpp

#include "SpeechRecognizer.hpp"
#include <process.h>	// _beginthread, _endthread
#include <strsafe.h>	// StringCchPrintf
#include <iostream>

using namespace std;

void SpeechRecognizer::activate()
{
	initSAPI();
	_active = true;
	_beginthread(SpeechRecognizer::thrRun, 0, this);
}

void SpeechRecognizer::deactivate()
{
	_active = false;
	cleanupSAPI();
}

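void SpeechRecognizer::thrRun(void* ptr)
{
	SpeechRecognizer* pSR = (SpeechRecognizer*)ptr;

	HRESULT hr;
	while(pSR->_active){
		// returns S_OK when an event is signalled, S_FALSE after 100 ms
		hr = pSR->_cpRecoCtxt->WaitForNotifyEvent(100);
		if(hr == S_OK){
			// drain the queue and handle each recognition event
			CSpEvent event;
			while(event.GetFrom(pSR->_cpRecoCtxt) == S_OK){
				if(event.eEventId == SPEI_RECOGNITION){
					pSR->executeCommand(event.RecoResult());
				}
			}
		}
	}
	_endthread();
}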

void SpeechRecognizer::initSAPI()
{
	if(FAILED(::CoInitialize(NULL))){ exitError(TEXT("init COM Failed!")); }
	
	HRESULT hr;
	
	// create a recognition engine
	hr = _cpEngine.CoCreateInstance(CLSID_SpSharedRecognizer);
	if(FAILED(hr)){ cleanupSAPI(); exitError(TEXT("_cpEngine.CoCreateInstance")); }

	// create the command recognition context
	hr = _cpEngine->CreateRecoContext( &_cpRecoCtxt );
	if(FAILED(hr)){ cleanupSAPI(); exitError(TEXT("_cpEngine->CreateRecoContext")); }

	// Ask SR to signal a Win32 event (rather than post a window message)
	// when something happens
	hr = _cpRecoCtxt->SetNotifyWin32Event();
	if(FAILED(hr)){ cleanupSAPI(); exitError(TEXT("_cpRecoCtxt->SetNotifyWin32Event")); }

	// Tell SR what types of events interest us.  Here we only care about command
	// recognition.
	hr = _cpRecoCtxt->SetInterest(SPFEI(SPEI_RECOGNITION), SPFEI(SPEI_RECOGNITION));
	if(FAILED(hr)){ cleanupSAPI(); exitError(TEXT("_cpRecoCtxt->SetInterest")); }

	// Create the grammar object, then load the command grammar from test.xml
	hr = _cpRecoCtxt->CreateGrammar(0, &_cpGrammar);
	if(FAILED(hr)){ cleanupSAPI(); exitError(TEXT("_cpRecoCtxt->CreateGrammar")); }

	hr = _cpGrammar->LoadCmdFromFile(L"test.xml", SPLO_DYNAMIC);
	if(FAILED(hr)){ cleanupSAPI(); exitError(TEXT("_cpGrammar->LoadCmdFromFile")); }

	// Set rules to active, we are now listening for commands
	hr = _cpGrammar->SetRuleState(NULL, NULL, SPRS_ACTIVE);
	if(FAILED(hr)){ cleanupSAPI(); exitError(TEXT("_cpGrammar->SetRuleState")); }
}

void SpeechRecognizer::cleanupSAPI()
{
	// Release grammar, if loaded
    if (_cpGrammar){
		_cpGrammar.Release();
    }
    // Release recognition context, if created
    if (_cpRecoCtxt){
        _cpRecoCtxt->SetNotifySink(NULL);
        _cpRecoCtxt.Release();
    }
    // Release recognition engine instance, if created
	if (_cpEngine){
		_cpEngine.Release();
	}
	CoUninitialize();
}

void SpeechRecognizer::executeCommand(ISpPhrase *pPhrase)
{
	SPPHRASE* pElements;
	if(SUCCEEDED(pPhrase->GetPhrase(&pElements))){
		// Rule.ulId is the rule ID defined in the grammar file
		switch(pElements->Rule.ulId){
			case 1:
				cout << "robot" << endl;
				break;
			case 2:
				cout << "hello" << endl;
				break;
			default:
				break;
		}
		// GetPhrase() allocates the structure; the caller must free it
		::CoTaskMemFree(pElements);
	}
}

void SpeechRecognizer::exitError(LPTSTR lpszFunction)
{
    // Retrieve the system error message for the last-error code
    LPVOID lpMsgBuf;
    LPVOID lpDisplayBuf;
    DWORD dw = GetLastError(); 
    FormatMessage(
        FORMAT_MESSAGE_ALLOCATE_BUFFER | 
        FORMAT_MESSAGE_FROM_SYSTEM |
        FORMAT_MESSAGE_IGNORE_INSERTS,
        NULL,
        dw,
        MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
        (LPTSTR) &lpMsgBuf,
        0, NULL );
    // Display the error message and exit the process
    lpDisplayBuf = (LPVOID)LocalAlloc(LMEM_ZEROINIT, 
        (lstrlen((LPCTSTR)lpMsgBuf) + lstrlen((LPCTSTR)lpszFunction) + 40) * sizeof(TCHAR)); 
    StringCchPrintf((LPTSTR)lpDisplayBuf, 
        LocalSize(lpDisplayBuf) / sizeof(TCHAR),
        TEXT("%s failed with error %d: %s"), 
        lpszFunction, dw, lpMsgBuf); 
    MessageBox(NULL, (LPCTSTR)lpDisplayBuf, TEXT("Error"), MB_OK); 

    LocalFree(lpMsgBuf);
    LocalFree(lpDisplayBuf);
    ExitProcess(dw); 
}

Unlike ordinary programs, SAPI needs a "grammar" before it can actually recognize speech.

The grammar is usually supplied in a separate .xml file.




[Grammar xml file: the XML markup did not survive extraction. The phrases it contained were: robot, hello.]
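
For orientation, a command grammar in the SAPI 5 XML text-grammar format pairs each rule ID with a phrase. The helper below generates such a file's contents. It is a sketch under assumptions, not the author's test.xml: buildGrammarXml is a hypothetical name, LANGID 409 assumes US English, and each rule is assumed to carry a numeric ID and a single phrase.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: builds a small command grammar in the SAPI 5 XML
// text-grammar format, one top-level rule per (rule ID, phrase) pair.
// Assumption: phrases need no XML escaping.
std::string buildGrammarXml(
	const std::vector<std::pair<unsigned, std::string> >& rules)
{
	std::string xml = "<GRAMMAR LANGID=\"409\">\n";
	for (const auto& rule : rules) {
		xml += "\t<RULE ID=\"" + std::to_string(rule.first) +
		       "\" TOPLEVEL=\"ACTIVE\">\n";
		xml += "\t\t<P>" + rule.second + "</P>\n";
		xml += "\t</RULE>\n";
	}
	xml += "</GRAMMAR>\n";
	return xml;
}
```

With the pairs (1, "robot") and (2, "hello"), the rule IDs line up with the case labels in executeCommand().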
