diff --git a/doc/simon/index.docbook b/doc/simon/index.docbook index e54fc9f1..50045b4a 100644 --- a/doc/simon/index.docbook +++ b/doc/simon/index.docbook @@ -1,4267 +1,4267 @@ Simon"> ]> The &kmyapplication; Handbook 2008-2012 Peter Grasch 2012-12-14 0.4 &FDLNotice; Peter H. Grasch peter.grasch@bedahr.org &kmyapplication; is an open source speech recognition solution. KDE kdeutils Kapp Simon recognition speech voice command control scenarios acoustic accessibility Introduction &kmyapplication; is the main front end for the Simon open source speech recognition solution. It is a Simond client and provides a graphical user interface for managing the speech model and the commands. Moreover, Simon can execute all sorts of commands based on the input it receives from the server: Simond. In contrast to existing commercial offerings, Simon provides a unique do-it-yourself approach to speech recognition. Instead of predefined, pre-trained speech models, Simon does not ship with any model whatsoever. Instead, it provides an easy to use end-user interface to create language and acoustic models from scratch. Additionally the end-user can easily download created use cases from other users and share his / her own. The current release can be used to set up command-and-control solutions especially suitable for disabled people. However, because of the amount of training necessary, continuous, free dictation is neither supported nor reasonable with current versions of Simon. Because of its architecture, the same version of Simon can be used with all languages and dialects. One can even mix languages within one model if necessary. Overview Architecture The main recognition architecture of Simon consists of three applications. &kmyapplication;This is the main graphical interface.It acts as a client to the Simond server. SimondThe recognition server. KSimondA graphical front-end for Simond. These three components form a real client / server solution for the recognition. 
That means that there is one server (Simond) for one or more clients (&kmyapplication;; this application). KSimond is just a front-end for Simond which means it adds no functionality to the system but rather provides a way to interact with Simond graphically. In addition to Simon, Simond and KSimond, other, more specialized applications are also part of this integrated Simon distribution. SamProvides more in-depth control over your speech model and allows you to test the acoustic model. SSC / SSCdThese two applications can be used to collect large amounts of speech samples from different people more easily. AfarasThis simple utility allows users to quickly check large corpora of speech data for erroneous samples. Please refer to the individual handbooks of those applications for more details. Architecture &kmyapplication; is used to create and maintain a representation of your pronunciation and language. This representation is then sent to the server Simond which compiles it into a usable speech model. &kmyapplication; then records sound from the microphone and transmits it to the server which runs the recognition on the received input stream. Simond sends the recognition result back to the client (&kmyapplication;). &kmyapplication; then uses this recognition result to execute commands like opening programs, following links, &etc; Simond identifies its connections with a user / password combination which is completely independent of the underlying operating system and its users. By default a standard user is set up in both Simon and Simond so the typical use case of one Simond server per Simon client will work out of the box. Every Simon client logs onto the server with a user / password combination which identifies a unique user and thus a unique speech model. Every user maintains his own speech model but may use it from different computers (different physical Simon instances) simply by accessing the same Simond server. 
One Simond instance can of course also serve multiple users. If you want to open up the server to the Internet or use multiple users on one server, you will have to configure Simond. Please see the Simond manual for details. Required Resources for a Working &kmyapplication; Setup For background information about speech models, please refer to the Speech Recognition: Background section. To get Simon to recognize speech and react to it you need to set up a speech model. Speech models describe how your voice sounds, what words exist, how they sound and what word combination (sentences or structures) exist. A speech model basically consists of two parts: Language model: Describes all existing words and what sentences are grammatically correct Acoustic model: Describes how words sound You need both these components to get Simon to recognize your voice. In Simon, the language model will be created from your active scenarios and the acoustic model will be either built solely through your voice recordings (training) or with the help of a base model. Scenarios One scenario makes up one complete use case of Simon. To control Firefox, for example, the user just installs the Firefox scenario. In other words, scenarios tell Simon what words and phrases to listen for and what to do when they are recognized. Because scenarios do not contain information about how these words and phrases actually sound, they can be shared and exchanged between different Simon users without problems. To accommodate this community based repository pool, a category for Simon scenarios has been created on kde-files.org where the scenarios, which are just simple text files (XML format), can be exchanged easily. In most cases scenarios are tailored to work best with a specific base model to avoid issues with the phoneme set. For information on how to use scenarios in Simon, please refer to the Scenario section in the Use Simon chapter. 
Acoustic model As mentioned above, you need an acoustic model to activate Simon. You can either create your own or use and even adapt a base model. Base models are already generated, most often speaker-independent, acoustic models that can be used with Simon. The following table shows what is required, depending on your Simon configuration:

Ways to an acoustic model   Training required   Base model required   Model creation backend required
Static base model           No                  Yes                   No
Adapted base model          Yes                 Yes                   Yes
User-generated model        Yes                 No                    Yes
Backends Simon uses external software to build acoustic models and to recognize speech. Usually, these backends can be split into two distinct components: The "model compiler" or "model generation" backend used to create or adapt acoustic models and the "recognizer" used to recognize speech with the help of these models. Not all operation modes of Simon will require a model compiler backend. Please refer to the next section about details on when this is the case. Two different backends are supported: Julius / HTK Models will be created with the HTK. Julius will be used as recognizer. To use this backend, please make sure that you have an up-to-date version of both these tools installed. CMU SPHINX This backend, also often simply referred to as "SPHINX backend", uses the PocketSphinx recognizer and the SphinxTrain model generation backend. Please refer to the CMU SPHINX website for more details. The CMU SPHINX backend requires that Simon is built with the optional SPHINX support. If you have not compiled Simon from source, please refer to your distribution for more information. If you are using base models, Simon will automatically select the appropriate backend for you. However, if you want to build your own models from scratch (user-generated model, see below) and have a certain preference, please refer to the Simond configuration for more information. Base models created for one backend are not compatible with any other backend. Please refer to the compatibility matrix for details. Types of base models There are three types of base models: Static base model Adapted base model User-generated model For information on how to use base models in Simon, please refer to the Base Models section in the Use Simon chapter. Static base model Static base models simply use a pre-compiled acoustic model without modifying it. Any training data collected through Simon will not be used to improve the recognition accuracy. 
This type of model does not require the model creation backend to be installed. Adapted base model By adapting a pre-compiled acoustic model to your voice, you can improve its accuracy. Collected training data will be compiled into an adaptation matrix which will then be applied to the selected base model. This type of model does require the model creation backend to be installed. User-generated model When using user-generated models, the user is responsible for training his own model. No base model will be used. The training data will be used to compile your own acoustic model allowing you to create a system which directly reflects your voice. This type of model does require the model creation backend to be installed. Requirements To build, adapt or use acoustic models of different types, certain software needs to be installed.

Base model requirements   CMU SPHINX                  Julius / HTK
Static base model         PocketSphinx                Julius
Adapted base model        SphinxTrain, PocketSphinx   HTK, Julius
User-generated model      SphinxTrain, PocketSphinx   HTK, Julius
All four tools, HTK, Julius, PocketSphinx and SphinxTrain, can safely be installed at the same time. SPHINX support in Simon must be enabled during compile time and might not be available on your platform. Please refer to your distribution. The Simon Windows installer includes Julius, PocketSphinx and SphinxTrain but not the HTK. Please refer to the installation section for information on how to install it should you find the need for it.
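As a quick way to see which of these backend tools are already available on a system, one could check the PATH for their command-line binaries. The binary names below (julius, HVite, HCopy, pocketsphinx_continuous, sphinxtrain) are typical defaults and may differ between distributions; this is an illustrative sketch, not part of Simon.

```python
# Sketch: check which model creation / recognition backends appear to be
# installed by looking for their typical binaries on the PATH.
# Binary names are assumptions and may differ on your distribution.
import shutil

BACKEND_TOOLS = {
    "Julius": ["julius"],
    "HTK": ["HVite", "HCopy"],
    "PocketSphinx": ["pocketsphinx_continuous"],
    "SphinxTrain": ["sphinxtrain"],
}

def installed_backends():
    """Map each backend to True if all of its binaries were found."""
    return {
        name: all(shutil.which(binary) is not None for binary in binaries)
        for name, binaries in BACKEND_TOOLS.items()
    }

status = installed_backends()
# status is e.g. {'Julius': True, 'HTK': False, ...} depending on the system
```

Since all four toolchains can coexist, such a check only tells you what is available, not which backend Simon will actually use; that is decided by the base model or the Simond configuration as described above.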
Where to get base models Simon base models are packaged as .sbm files. If you happen to have raw model files for your backend, you can package them into a compatible SBM container within Simon. Please refer to the speech model configuration for details. Not all SBM models may work for you. Please refer to the model backends section for details. For an up-to-date list of available base models, please refer to the list in our online wiki. Phoneme set issues In order for base models to work, both your scenarios and your base model need to use the same set of phonemes. In practice, this often just means that you need to match scenarios to your base model. The name of &kmyapplication; base models will most likely start with a tag like "[EN/VF/JHTK]". Try to download scenarios that start with the same tag. You cannot use scenarios designed for a different phoneme set (a different base model). If Simon detects this error, it will try to disable affected words by removing them from the created speech model. These words will be marked with a red background in the vocabulary of the scenario. To re-enable them, transcribe them with the proper phoneme set or use a user-generated model. Hint If you design a new scenario it is therefore a good idea to use the dictionary that was used to create the base model as the shadow dictionary. This way Simon will automatically suggest the correct phonemes when adding words.
Using &kmyapplication;: Typical user The following sections will describe how to use Simon. First run wizard On the first start of Simon, this assistant will guide you through the initial configuration of &kmyapplication;. First run: Welcome The configuration consists of five easy steps which are outlined below. You can skip each step and even the whole wizard if you want to - in that case, the system will be set up with default values. However, please note that without any configuration, there won't be any recognition. Scenarios In this step you can add or download scenarios. First run: Scenarios To download scenarios from the online repository, select OpenDownload to open the download dialog pictured below. First run: Get scenarios Especially for new users it is recommended to try some scenarios first to see how the system works before diving into configuring it exactly for your use case. After completing the assistant, you can change the scenario configuration with the use of the scenario management dialog. If you are planning to use a base model, make sure that you download matching scenarios. Base models In this step you can set up Simon to use base models. First run: Base models Again, you can download base models from an online repository through Open modelDownload. First run: Base model download To use a user-generated model, select Do not use a base model. After completing or aborting the first run wizard you can change configuration options defined here in the Simon configuration. Server Internally, Simon is a server / client application. If you want to take advantage of a network based installation, you can provide the server address here. First run: Server The default configuration is sufficient for a normal installation and will assume that you use a local Simond server that will be started automatically and stopped with Simon. 
After completing or aborting the first run wizard you can change configuration options defined here in the server configuration. Sound configuration Because Simon recognizes sound from one or more microphones, you have to tell Simon which devices you want to use for recognition and training. First run: Sound configuration Simon can use one or more input- and output devices for different tasks. You can find more information about Simon's multiple device capabilities in the Simon sound configuration section. If you don't have at least one working input device for recognition, you will not be able to activate Simon. After completing or aborting the first run wizard you can change configuration options defined here in the sound configuration. Volume calibration For Simon to work correctly, you need to configure your microphones volume to a sensible level. First run: Volume calibration For more details on this, please see the general section about Volume Calibration. The Simon Main Window A screenshot of &kmyapplication; Screenshot The Simon main window is split into four logical sections. On the top left, you can see the scenario section, to its right you find the training section, on the bottom left is the acoustic model and finally, on the right of that, the recognition section. The Simon main window can be hidden at any time by clicking on the Simon logo in the system tray (usually next to the system clock in the task bar) which will minimize Simon to the tray. Click it again to show the main window again. Main window: Scenarios A list of scenarios shows the currently loaded scenarios. You can manage this selection by clicking Manage scenarios which will open the scenario management dialog. To modify a scenario, select it from the list and open it by pressing Open "<name>". Main window: Training This section shows all training texts from all currently active scenarios. Selecting a training text will highlight the parent scenario in the scenario section. 
You can start to train the recognition by selecting a text and clicking on Start training. Please note that, depending on your selected model type, training may or may not improve your recognition accuracy. The acoustic model section (see below) in the &kmyapplication; main menu tells you if training will have an effect for your specific configuration. For more information, please refer to the base model section for background information on this subject. The gathered training corpus can be managed by selecting Manage training data which will open the sample management dialog. To help build a general, open speech corpus, please consider contributing your training corpus to the Voxforge project by selecting FileContribute samples to bring up the sample upload assistant. Main window: Acoustic model Here, &kmyapplication; shows information about the currently used base- and active model. Select Configure acoustic model to configure the base model. Main window: Recognition This section shows information about the recognition status. If &kmyapplication; is connected to the server, you can activate and deactivate the recognition by toggling the Activate button. If this control element is not available, make sure you are connected by selecting FileConnect from &kmyapplication;s menu. An integrated volume calibration widget monitors the configured recognition devices. The sound setup can be modified by selecting Configure audio to bring up the sound configuration. Scenarios This section describes how to import and remove scenarios to your Simon configuration. For general information about scenarios, please refer to the background chapter. If you want to create, edit or export scenarios, please refer to the advanced usage section. To modify your scenario configuration, first open the scenario management dialog by pressing Manage scenarios in the Simon main window. 
Manage scenarios To activate or deactivate a scenario you can use the arrow buttons between the two lists or simply double click the option you want to load / unload. More information about individual scenarios can be found in the tooltips of the list items. Import Scenario Scenarios can be imported from a local file in Simon's XML scenario file format but can also be directly downloaded and imported from the internet. - When downloading scenarios, the list of scenarios is retrieved from Simon Scenarios subsection of the OpenDesktop site KDE-files.org. + When downloading scenarios, the list of scenarios is retrieved from Simon Scenarios subsection of the OpenDesktop site KDE Store. Download scenarios If you create a scenario that might be valuable for other Simon users, please consider uploading it to this online repository to help others. Delete Scenario To delete a scenario, select the scenario and click the Delete button. Because scenarios are synchronized with the recognition server, you can restore deleted scenarios through the model synchronization backup. Recordings If you are using user-generated or adapted models, Simon builds its acoustic model based on transcribed samples of the user's voice. Because of this, the recorded samples are of vital importance for the recognition performance. Volume It is important that you check your microphone volume before recording any samples. Simon Calibration The current version of &kmyapplication; includes a simple way of ensuring that your volume is configured correctly. Simon Volume Calibration By default the volume calibration is displayed before starting any recording in &kmyapplication;. To calibrate, simply read the displayed text. The calibration will monitor the current volume and tell you to either raise or lower it, but you have to do that manually in your system's audio mixer. During calibration, try to talk normally. Don't yell but don't be overly quiet either. 
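The calibration logic described above (monitor the current level and advise the user to raise or lower the volume) can be sketched as follows. The thresholds here are invented for illustration and are not Simon's actual values, which are configurable in the settings.

```python
# Toy sketch of a volume calibration check. `peak` is the normalized
# peak amplitude of the last recorded chunk (0.0 .. 1.0). The `low` and
# `high` thresholds are illustrative assumptions, not Simon's defaults.

def calibration_advice(peak, low=0.3, high=0.95):
    """Return a human-readable verdict for the given peak level."""
    if peak >= high:
        # Near full scale: the microphone is about to clip.
        return "too loud - lower the volume in your system's audio mixer"
    if peak <= low:
        return "too quiet - raise the volume in your system's audio mixer"
    return "ok"

assert calibration_advice(0.5) == "ok"
```

As the handbook notes, the advice can only be acted on manually: the program observes the level, but the gain itself is changed in the operating system's mixer.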
Take into account that you should generally use the same volume setting for all your training and for the recognition too. You might speak a little bit louder (unconsciously) when you are upset or at another time of the day, so try to raise your voice a little bit to anticipate this. It is much better to have slightly quieter samples than to start clipping. In the &kmyapplication; settings, both the text displayed and the levels considered correct can be changed. If you leave the text empty, the default text will be displayed. In the options you can also deactivate the calibration completely. See the training section for more details. Audacity Calibration Alternatively you can use an audio editing tool like the free Audacity to monitor the recording volume. Too quiet: Volume: Too quiet Too loud: Volume: Too loud Perfect volume: Perfect volume Silence To help Simon with the automatic segmentation it is recommended to leave about one or two seconds of silence on the recording before and after reading the prompted text. Current Simon versions include a graphical notice on when to speak during recording. The message will tell the user to wait for about half a second: Please wait ... before telling the user to speak: Please speak This method of visual feedback proved especially valuable when recording with people who cannot read the prompted text for themselves and therefore need someone to tell them what they have to say. The colorful visual cue tells them when to start repeating what the facilitator said without the need for unreliable hand gestures. Content Generally we recommend recording roughly the same sentences that Simon should recognize later. (Obviously that does not apply to massive sample acquisitions where other properties like phonetic balance are more important.) Care should be taken to avoid recordings like One One One made just to quickly ramp up the recognition rate. 
Such recordings often decrease recognition performance because the pronunciation differs greatly from saying the word in isolation. Microphone For Simon to work well, a high quality microphone is recommended. However, even relatively cheap headsets (around 30 Euros) achieve very good results - orders of magnitude better than internal microphones. For maximum compatibility we recommend USB headsets as they usually support the necessary sample rate of 16 kHz, are very well supported on both Microsoft Windows and GNU/Linux and normally don't require special, proprietary drivers to operate. Sample Quality Assurance Simon will check each recording against certain criteria to ensure that the recorded samples are not erroneous or of poor quality. If Simon detects a problematic sample, it will warn the user to re-record the sample. Currently, Simon checks the following criteria: Sample peak volume If the volume is too loud and the microphone started to clip (Clipping on wikipedia), Simon will display a warning message urging the user to lower the microphone volume. Signal to noise ratio (SNR) Simon will automatically determine the signal to noise ratio of each recording. If the ratio is below a configurable threshold, a warning message will be displayed. The default value of 2300 % means that for Simon to accept a sample as correctly recorded, the peak volume has to be 23 times louder than the noise baseline (lowest average over 50 ms). A low ratio is often the result of a very low quality microphone, high levels of ambient noise or a low microphone gain coupled with a microphone boost option in the system mixer. SNR warning message triggered by an empty sample. This information dialog is displayed when clicking on the More information button on the recording widget. Empty sample triggering the SNR warning Contribute Samples The base models that can be used with Simon to augment or replace training are built from other people's speech samples. 
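Returning to the SNR check from the Sample Quality Assurance section above: it can be illustrated with a small sketch. The sketch treats the noise baseline as the quietest 50 ms window average and expresses the peak relative to it in percent, matching the 2300 % default threshold described there. The function name and the use of non-overlapping windows are simplifying assumptions for illustration, not Simon's actual implementation.

```python
# Illustrative sketch of a peak-to-noise-baseline check as described in
# the Sample Quality Assurance section. Not Simon's actual code.

def snr_percent(samples, frame_len):
    """Peak amplitude relative to the quietest frame average, in percent.

    `samples` are absolute amplitude values; `frame_len` is the number
    of samples in 50 ms (e.g. 800 at 16 kHz). Windows are
    non-overlapping for simplicity.
    """
    peak = max(samples)
    # Noise baseline: the lowest average over any 50 ms window.
    noise = min(
        sum(samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    )
    if noise == 0:
        return float("inf")
    return 100.0 * peak / noise

# A sample passes the default 2300 % threshold if its peak is at least
# 23 times the noise floor:
quiet = [2] * 800
loud = [2] * 700 + [100] * 100
assert snr_percent(quiet + loud + quiet, 800) >= 2300
```

An "empty" recording (noise only, no speech peak) yields a ratio close to 100 %, far below the threshold, which is why silence triggers the SNR warning shown in the screenshot above.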
In order to create high quality base models, a large number of training samples is necessary. If you trained your local Simon installation, you gathered valuable voice samples that could improve the quality of the general model. Through &kmyapplication;'s "Contribute Samples" dialog you can upload those recordings to help the Voxforge project create high quality open source base models. Contributing Samples: Connect After connecting to the server, &kmyapplication; will ask for some basic meta-information. This metadata contains no personal information. Instead, it will later be used to group together samples of similar speaker groups to build more accurate acoustic models. Contributing Samples: Provide information The duration of the upload process itself will depend on your internet connection. Generally speaking, this only transmits relatively little data because the audio samples collected by &kmyapplication; are generally very small: around 0.1 MB per sample. Contributing Samples: Upload Manage training data To view and modify your personal training corpus, you can access the training data management dialog by selecting Manage training data in the &kmyapplication; main window or the training section of any opened scenario. Manage training samples Modifying samples To listen to or re-record a sample, select it from the list and select Open Sample. Modifying a training sample In this dialog you can also modify the sample's group after it was recorded. If you remove the opened sample and do not re-record it, &kmyapplication; will offer to remove it from the corpus. Removing a training sample Clear training data After a confirmation dialog, this will remove all personal training data of the user. Importing Training Samples Using the import training data field you can import training samples gathered with previous Simon versions or through manual training. This feature is very specific. 
Please use it with caution and make sure that you know exactly what you are doing before you continue. You can either provide a separate prompts file or let Simon extract the transcriptions from the filenames. When using prompts based transcriptions your prompts file (UTF-8) needs to contain lines of the following content: [filename] [content]. Filenames are without file extensions and the content has to be uppercase. For example: demo_2007_03_20 DEMO to import the file demo_2007_03_20.wav containing the spoken word Demo. Because prompts files do not contain a file extension, Simon will try wav, mp3, ogg and flac (in that order). If one of those match, no other extension will be tested and only the first file will be imported (in contrast to file based transcription where all files would be imported). When using file based transcriptions, a file called this_is_a_test.wav must contain This is a test and nothing else. Numbers and special characters (., -,...) in the filename are ignored and stripped. Files recorded by Simon 0.2 will follow this naming scheme so you can safely import them using the file name extraction method. Files generated by previous Simon versions should not be imported using this function but you can use the prompts based import for that. Imported files and their transcription are then added to the training corpus. To import a folder containing training samples just select the folder to import and depending on your import type also the prompts file. Import training data wizard The folder will be scanned recursively. This means that the given folder and all its subfolders will be searched for .wav, .flac, .mp3 and .ogg files. All files found will be imported. When importing the sound files, all configured post processing filters are applied. If you import anything other than WAV files you are responsible for decoding them during the import process (for example through post processing filters) or the model creation will fail. 
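The two transcription methods described above can be sketched as follows. The function names are hypothetical; the sketch only mirrors the documented rules: prompts lines of the form [filename] [CONTENT], and file-based transcriptions derived from the file name with digits and special characters stripped.

```python
# Illustrative sketch of the two import transcription methods described
# above; names and details are assumptions, not Simon's actual code.
import os
import re

# Extensions tried for prompts-based import, in the documented order;
# only the first match per prompts entry is imported.
EXTENSIONS = [".wav", ".mp3", ".ogg", ".flac"]

def parse_prompts(text):
    """Parse a UTF-8 prompts file: one '[filename] [CONTENT]' per line."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, content = line.partition(" ")
        entries[name] = content  # content is expected to be uppercase
    return entries

def transcription_from_filename(path):
    """File-based import: derive the transcription from the file name.

    'this_is_a_test.wav' becomes 'THIS IS A TEST'; digits and special
    characters are stripped, as described above.
    """
    stem = os.path.splitext(os.path.basename(path))[0]
    words = re.sub(r"[^A-Za-z_]", "", stem).split("_")
    return " ".join(w.upper() for w in words if w)

assert parse_prompts("demo_2007_03_20 DEMO") == {"demo_2007_03_20": "DEMO"}
assert transcription_from_filename("this_is_a_test.wav") == "THIS IS A TEST"
```

Note how the prompts-based variant carries the transcription out of band, which is why it is the right choice for files (such as those from older Simon versions) whose names do not encode their content.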
Configuration Simon was designed with high configurability in mind. Because of this, there are plentiful parameters that can be fine-tuned to your specific requirements. You can access Simon's configuration dialog through the application's main menu: SettingsConfigure Simon.... General Configuration The general configuration page lists some basic settings. If you want to show the first run assistant again, deselect Disable configuration wizard. General Configuration Please note that the option to start Simon at login will work on both Microsoft Windows and when you are using KDE's Plasma on Linux. Support for other desktop environments like Gnome, XFCE, &etc; might require manually placing Simon in the session autostart (please refer to the respective manuals of your desktop environment). When the option to start Simon minimized is selected, Simon will minimize to the system tray immediately after starting. Deselecting the option to warn when there are problems with samples deactivates the sample quality assurance. Recordings Simon uses fairly sophisticated internal sound processing to enable complex multi-device setups. Device Configuration The sound device configuration allows you to choose which sound device(s) to use, configure them and define additional recording parameters. Use the Refresh devices button if you have plugged in additional sound devices since you started Simon. Sound device configuration: General Most of the time you will want to use 1 channel and 16kHz (which is also the default) because the recognition only works on mono input and works best at 16kHz (8kHz and 22kHz being other viable options). Some low-cost sound cards might not support this particular mode in which case you can enable automatic resampling in the device's advanced configuration. Only change the channel and the samplerate if you really know what you are doing. Otherwise the recognition will most likely not work. 
Sound device configuration: Advanced options You can use Simon with more than one sound device at the same time. Use Add device to add a new device to the configuration and Remove device to remove it from your configuration. The first device in your sound setup cannot be removed. For each device you can determine what it should be used for: training or recognition (the latter is only applicable to input devices). If you use more than one device for training, you will create multiple sound files for each utterance. When using multiple devices for recognition, each one feeds a separate sound input stream to the server, resulting in recognition results for each stream. If you use multiple output devices, the playback of the training samples will play on all configured audio devices. When using different sample rates for your input devices, the output will only play on matching output devices. If, for example, you have one input device configured to use 16kHz and the other to use 48kHz, the playback of samples generated by the first one will only play on 16kHz outputs, the other one only on 48kHz devices. In the device's advanced configuration, you can also define the sample group tag of the produced training samples and set activation context conditions. If you set up this device to be used for recognition and (any of) its activation requirements are not met, the device will not record. This can be used to augment or even replace the traditional voice activity detection with context information. For example, add a face detection condition to the recording device's activation requirements to only enable the recognition when you're looking at the webcam. Voice Activity Detection The recognition is done on the Simond server. See the architecture section for more details. The sound stream is not continuous but is segmented by the Simon client. This is done by something called voice activity detection. 
Voice activity detection Here you can configure this segmentation through the following parameters: Cutoff level Everything below this level is considered silence (background noise). Head margin The input is cached for the duration of the head margin before it is considered a real sample. During this whole time the input level needs to stay above the cutoff level. Tail margin After the recording went below the cutoff level, Simon will wait for as long as the tail margin to consider the current recording a finished sample. Skip samples shorter than Samples that are shorter than this value are not considered for recognition (coughs, &etc;). Training settings Training settings When the option Default to power training is selected, Simon will, when training, automatically start and stop the recording when displaying and hiding (respectively) the recording prompt. This option only sets the default; the user can change it at any time before beginning a training session. The configurable font here refers to the text that is recorded to train the acoustic model (through explicit training or when adding a word). This option was introduced after we worked with a few clients with spastic disabilities. While we used the mouse to control Simon during the training, they had to read what was on the screen. At first this was very problematic as the regular font size is relatively small and they had trouble making out what to read. This is why we made the font and the font size of the recording prompt configurable. Here you can also define the required signal to noise ratio for Simon to consider a training sample to be correct. See the Sample Quality Assurance section for more details. On this configuration page you can also set the parameters for the volume calibration. It can be deactivated for both the add word dialog and the training wizard by unchecking the group box itself. The calibration itself uses the voice activity detection to score your sound configuration. 
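The four segmentation parameters from the Voice Activity Detection section above can be illustrated with a small frame-based sketch. This is a simplified illustration of the idea, not Simon's implementation; it operates on per-frame loudness values and treats the margins and minimum length as frame counts.

```python
# Minimal sketch of voice activity detection with a cutoff level, head
# margin, tail margin and minimum sample length, as described above.
# Illustration only; not Simon's actual segmentation code.

def segment(levels, cutoff, head_margin, tail_margin, min_length):
    """Split a stream of frame levels into (start, end) speech samples."""
    segments = []
    start = None   # index where the level first rose above the cutoff
    quiet = 0      # consecutive below-cutoff frames inside a segment
    active = False
    # Pad with silence so a trailing segment is always closed.
    for i, level in enumerate(levels + [0] * (tail_margin + 1)):
        if level > cutoff:
            if start is None:
                start = i
            quiet = 0
            # Promote to a real sample only after the head margin.
            if not active and i - start + 1 >= head_margin:
                active = True
        else:
            if active:
                quiet += 1
                if quiet > tail_margin:
                    end = i - quiet + 1
                    # Skip samples shorter than the minimum (coughs etc.).
                    if end - start >= min_length:
                        segments.append((start, end))
                    start, active, quiet = None, False, 0
            else:
                start, quiet = None, 0
    return segments

# One utterance of 6 loud frames; the 1-frame blip at index 2 never
# survives the head margin and is dropped.
levels = [0, 0, 5, 0, 0, 9, 9, 9, 9, 9, 9, 0, 0, 0]
assert segment(levels, cutoff=3, head_margin=2, tail_margin=2, min_length=3) == [(5, 11)]
```

The head margin suppresses short spikes before a segment starts, while the "skip samples shorter than" threshold discards short segments after the fact; the sketch shows that these are two distinct filters.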
The prompted text can be configured by entering text in the input field below. If the field is empty, a default text will be used. Postprocessing All recorded (training) and imported (through the import training data wizard) samples can be processed using a series of postprocessing commands. Postprocessing chains are an advanced feature and should not be needed by the average user. Sound Configuration: Postprocessing The postprocessing commands can be seen as a chain of filters through which the recordings have to pass. Using these filters one could, for example, define commands to suppress background noise in the training data or to normalize the recordings. Given a program process_audio which takes the input and output files as its arguments (⪚: process_audio in.wav out.wav), the postprocessing command would be: process_audio %1 %2. The two placeholders %1 and %2 will be replaced by the input filename and the output filename respectively. The switch to apply filters to recordings recorded with Simon enables the postprocessing chains for samples recorded during the training (including the initial training while adding a word). If you do not select this switch, the postprocessing commands are only applied to imported samples (through the import training data wizard). Context Every sample recorded with Simon is assigned a sample group. When creating the acoustic model from the training samples, Simon can take the current situation into account to only use a subset of all gathered training data. Sound Configuration: Context For example, in a system where multiple, very different speakers use one shared setup, context conditions can be set up to automatically build separate models for the users depending on the current situation. 
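How such a chain of postprocessing commands expands the %1 / %2 placeholders can be sketched as follows. Only the placeholder convention is taken from the text above; the helper function and the intermediate step-file naming are hypothetical:

```python
def build_commands(chain, infile):
    """Expand a postprocessing chain for one recording.

    Each command template uses %1 for its input file and %2 for its output
    file; filters run in order, each consuming the previous one's output.
    Returns the concrete command lines and the final output filename.
    """
    commands = []
    current = infile
    for i, template in enumerate(chain, start=1):
        # Illustrative naming scheme for intermediate files
        outfile = "%s.step%d.wav" % (infile.rsplit(".", 1)[0], i)
        commands.append(template.replace("%1", current).replace("%2", outfile))
        current = outfile
    return commands, current
```

With the chain ["process_audio %1 %2", "normalize %1 %2"] and the input rec.wav, the first filter reads rec.wav and the second reads the first one's output, mirroring the filter-chain behavior described above.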
The above screenshot, for example, shows a setup where, given that all samples of "peter" were tagged "peters_samples" and all samples of "mathias" were tagged "mathias_samples" (refer to the device configuration for more information on how to set up sample groups), the active acoustic model will only contain the current user's own samples as long as the file /home/bedahr/.username contains either "peter" or "mathias". Another example use case would be to switch to a more noise-resistant acoustic model when the user starts playing music. Speech Model Here you can adjust the parameters of the speech model. Base model You can optionally use base models to reduce or skip the training or to avoid installing a model creation backend. Please refer to the general base model section for more details about base models. Base model configuration To use a user-generated model, select Do not use a base model. To use a static base model, select a base model and do not select Adapt base model using training samples. To instead use an adapted base model, check Adapt base model using training samples after selecting a base model. Simon base models are packaged in .sbm files. To add base models to the selection, you can either import local models (Open modelImport), download them from an online repository (Open modelDownload) or create new ones from raw files (Open modelCreate from model files). Importing a base model from the internet If you have raw model files produced by either supported model creation backend, you can package them into an SBM container for use with &kmyapplication;. Creating base models from raw files You can also export your currently active model by selecting Export active model. The exported SBM file will contain your full acoustic model (ignoring the current context) and can be shared with other Simon users. Training data This section allows you to configure the training samples. 
Training data configuration The sample rate set here is the target sample rate of the acoustic model. It is independent of the recording sample rate, and it is the responsibility of the user to ensure that the samples are actually made available in that format (usually by recording in that exact sample rate or by defining postprocessing commands that resample the files; see the sound configuration section for more details). Usually either 16kHz or 8kHz models are built / used. 16kHz models will have higher accuracy than 8kHz models. Going higher than 16kHz is not recommended, as it is very CPU-intensive and in practice will probably not result in higher recognition rates. Moreover, the path to the training samples can be adjusted. However, be sure that the previously gathered training samples are also moved to the new location. If you use automatic synchronization, Simond would alternatively also provide Simon with the missing samples, but copying them manually is still recommended for performance reasons. Language Profile In the language profile section you can select a previously built or downloaded language profile to aid with the transcription of new words. Language profile configuration Model Extensions Here you can configure the base URL that is going to be used for the automatic BOMP import. The default points to the copy on the Simon listens server. Model Extensions Recognition Here you can configure the recognition and the model synchronization with the Simond server. Server Using the server configuration you can set the parameters of the connection to Simond. General The Simon main application connects to the Simond server (see the architecture section for more information). Configure Server: General To identify individual users of the system (one Simond server can of course serve multiple Simon clients), Simon and Simond use user accounts. Every user has their own speech model. The username / password combination given here is used to log in to Simond. 
If Simond does not know the username or the password is incorrect, the connection will fail. See the Simond manual on how to set up users for Simond. The recognition itself - which is done by the server - might not be available at all times. For example, it is not possible to start the recognition as long as the user does not yet have a compiled acoustic and language model, which has to be created first (during synchronization, when all the ingredients - vocabulary, grammar, training - are present). Using the option to start the recognition automatically once it is available, Simon will request to start the recognition as soon as it receives the information that it is ready (all required components are available). Using the Connect to server on startup option, Simon will automatically start the connection to the configured Simond servers after it has finished loading the user interface. Network Simon connects to Simond using TCP/IP. Configure Server: Network As of now (Simon 0.4), encryption is not yet supported. The timeout setting specifies how long Simon will wait for a first reply when contacting the hosts. If you are on a very slow network and/or use connect on start on a very slow machine, you may want to increase this value if you keep getting timeout errors that can be resolved by trying again repeatedly. Simon can be configured to use more than one Simond server. This is very useful if, for example, you are going to use Simon on a laptop which connects to a different server depending on where you are. You could, for example, add the server you use when you are at home and the server used when you are at work. When connecting, Simon will try to connect to each of the servers (in order) until it finds one server that accepts the connection. To add a server, just enter the host name or IP address and the port (separated by :) or use the dialog that appears when you select the blue arrow next to the input field. 
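The fallback behavior described above - trying each configured host:port entry in order until one server accepts the connection - can be sketched like this. The function names are made up for illustration, and this is not Simon's actual networking code:

```python
import socket

def parse_server(entry):
    """Split a "host:port" server entry as entered in the network settings.

    rpartition splits on the last colon, so a host part containing colons
    is still handled sensibly. Raises ValueError if the port is missing.
    """
    host, _, port = entry.rpartition(":")
    return host, int(port)

def connect_first_available(servers, timeout=3.0):
    """Try each configured server in order; return a connected socket to
    the first one that accepts the connection, or None if all fail.
    The timeout parameter mirrors Simon's configurable connection timeout.
    """
    for entry in servers:
        host, port = parse_server(entry)
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError:
            continue
    return None
```

Servers are attempted strictly in configuration order, which matches the laptop use case above: the home server is tried first and the work server acts as a fallback (or vice versa, depending on the order you configure).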
Synchronization and Model Backup Here you can configure the model synchronization and restore older versions of your speech model. Synchronization and Model Backup Simon creates the speech input files which are then compiled and used by the Simond server (see the architecture section for more details). The process of sending the speech input files, compiling them and receiving the compiled versions is called synchronization. Only after the speech model is synchronized do the changes take effect and a new restore point is set. This is why, by default, Simon will always synchronize the model with the server when it changes. This is called Automatic Synchronization and is the recommended setting. However, if you want more control, you can instruct Simon to ask you before starting the synchronization after the model has changed or to rely on manual synchronization altogether. When selecting manual synchronization, you have to use the ActionsSynchronize menu item of the Simon main window every time you want to compile the speech model. The Simon server will maintain a copy of the last five iterations of the model files. However, this only includes the source files (the vocabulary, grammar, &etc;) - not the compiled model. The compiled model will be regenerated from the restored source files automatically. After you have connected to the server, you can select one of the available models and restore it by clicking on Choose Model. Actions In the actions configuration you can configure the reactions to recognition results. Recognition The recognition of Simon computes not only the most likely result but rather the top ten results. Each of the results is assigned a confidence score between 0 and 1 (where 1 means 100% sure). Using the Minimum confidence option you can set a minimum confidence for recognition results to be considered valid. 
If more than one recognition result is rated higher than the minimum confidence score, Simon will show a popup listing the most likely options for you to choose from. Did you mean...? This popup can be disabled using the Display selection popup for ambiguous results check box. Dialog font Many plugins of Simon have a graphical user interface. The fonts of these interfaces can be configured here, centrally and independently of the system's font settings. Input number with custom font Lists Here you can find the global list element configuration. This serves as a template for new scenarios but is also directly used for the popup for ambiguous recognition results. Text-to-speech Some parts of &kmyapplication;, most notably the dialog command plugin, employ text-to-speech (or "TTS") to read text aloud. Backends Multiple external TTS solutions can be used to allow &kmyapplication; to talk. Multiple backends can be enabled at the same time and will be queried in the configured order until one is found that can synthesize the requested message. The following backends are available: RecordingsInstead of an engine that converts arbitrary text into speech, text snippets can be pre-recorded and will simply be played back. JovieUses the Jovie TTS system. This requires a valid Jovie set-up. WebserviceThe webservice backend can be used to talk to any TTS engine that has a web front-end that returns .wav files. TTS: backend selection Recordings Instead of using an external TTS engine, you can also record yourself or other speakers reading the texts aloud. Simon can then play back these pre-recorded snippets when they are requested from its text-to-speech engine. These recorded sound bites are organized into "sets" of different speakers, which can also be imported and exported to share them with other &kmyapplication; users. TTS: recordings Webservice Through the webservice backend, &kmyapplication; can use web-based TTS engines like MARY. TTS: webservice You can provide any URL. 
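The minimum-confidence handling described earlier (accept a single confident result, reject everything below the threshold, or show the "Did you mean...?" popup for ambiguous results) can be sketched as follows. The function and the 0.7 default threshold are illustrative assumptions, not Simon's actual code:

```python
def interpret(results, min_confidence=0.7):
    """Apply a minimum-confidence rule to a recognition result list.

    `results` is a list of (text, confidence) pairs, confidence in [0, 1].
    Returns ("reject", []), ("accept", [best]) or ("ask", candidates);
    the last case corresponds to the "Did you mean...?" selection popup.
    """
    candidates = [r for r in results if r[1] >= min_confidence]
    if not candidates:
        return "reject", []
    candidates.sort(key=lambda r: r[1], reverse=True)
    if len(candidates) == 1:
        return "accept", candidates
    return "ask", candidates
```

Raising the minimum confidence makes the "reject" case more common (fewer false activations), while lowering it makes the ambiguity popup more likely.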
&kmyapplication; will replace any instance of "%1" within the configured URL with the text to synthesize. The backend expects the queried webservice to return a .wav file that will be streamed and output through &kmyapplication;'s sound layer - respecting the sound device configuration. Social desktop Scenarios can be uploaded and downloaded from within Simon. For this we use KDE's social desktop facilities and our own category for Simon scenarios on kde-files.org. If you already have an account on opendesktop.org you can enter the credentials there. If you don't, you can register directly in the configuration module. The registration is of course free of charge. Webcam configuration In Webcam configuration, you can configure the frames per second (fps) and select the webcam to use when multiple webcams are connected to your system. Webcam configuration Frames per second is the rate at which the webcam produces unique consecutive images called frames. An fps value between 5 and 15 is optimal for proper performance. Advanced: Adjusting the recognition parameters manually Simon is targeted towards end-users. Its interface is designed to allow even users without any background in speech technology to design their own language and acoustic models by providing reasonable default values for simple uses. In special cases (severe speech impairments, for example), special configuration might be needed. This is why the raw configuration files for the recognition are also respected by Simon and can of course be modified to suit your needs. Julius There are basically two parts of the Julius configuration that can be adjusted: adin.jconfThis is the Simon client's configuration of the sound stream sent from Simon to Simond. This file is directly read by the adinstreamer. Simon ships with a default adin.jconf without any special parameters. 
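Returning briefly to the webservice TTS backend described earlier: the %1 substitution in the configured URL can be sketched like this. The MARY-style example URL below is only an illustrative shape, not an official endpoint, and the helper function is hypothetical:

```python
from urllib.parse import quote

def build_tts_url(template, text):
    """Fill a configured webservice URL template: every "%1" is replaced
    with the (URL-encoded) text to synthesize. The webservice is then
    expected to answer with a .wav file.
    """
    return template.replace("%1", quote(text))
```

Encoding the text with quote() keeps spaces and special characters from breaking the resulting URL.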
You can change this system-wide configuration, which will affect all users if there are different user accounts on your machine who all use Simon. To change the configuration of just one of those users, copy the file to the user path (see below) and edit this copy. julius.jconfThis is a configuration of the Simond server and directly influences the recognition. This file is parsed by libjulius and libsent directly. Simond ships with a default julius.jconf. Whenever a new user is added to the Simond database, Simond will automatically copy this system-wide configuration to the new user. After that the user is of course free to change it, but it won't affect the other users. This way the template (the system-wide configuration) can be changed without affecting existing users. The path to the Julius configuration files will depend on your platform:

Julius Configuration Files

adin.jconf (system)
  Microsoft Windows: (installation path)\share\apps\simon\adin.jconf
  GNU/Linux: `kde4-config --prefix`/share/apps/simon/adin.jconf

adin.jconf (user)
  Microsoft Windows: %appdata%\.kde\share\apps\simon\adin.jconf
  GNU/Linux: ~/.kde/share/apps/simon/adin.jconf

julius.jconf (template)
  Microsoft Windows: (installation path)\share\apps\simond\default.jconf
  GNU/Linux: `kde4-config --prefix`/share/apps/simond/default.jconf

julius.jconf (user)
  Microsoft Windows: %appdata%\.kde\share\apps\simond\models\(user)\active\julius.jconf
  GNU/Linux: ~/.kde/share/apps/simond/models/(user)/active/julius.jconf
Advanced: Creating new scenarios with &kmyapplication; The following chapter is aimed at more experienced users who want to design their own scenarios. For general usage instructions, please refer to the chapter Using &kmyapplication;: Typical user. Introduction To add a new scenario, you first create a new scenario "shell" by adding a new scenario object and then open it in the &kmyapplication; main window. To instead modify an existing scenario, you of course just have to open it. A &kmyapplication; scenario contains the following components: Vocabulary Grammar Training texts Context Commands Before describing how to configure these elements in Simon, the next section provides background information that will help you understand the basic principles of speech modelling. This fundamental knowledge is necessary to design sensible scenarios. Speech recognition: background Before explaining exactly how you can create new scenarios with Simon, this section introduces some fundamental basics of speech recognition in general. Speech recognition systems take voice input (often from a microphone) and try to translate it into written text. To do that, they rely on statistical representations of the human voice. To put it in simple terms: the computer learns how words - or, more correctly, the sounds that make up those words - sound. A speech model consists of two distinct parts: Language Model Acoustic Model Language Model The language model defines the vocabulary and the grammar you want to use. Vocabulary The vocabulary defines which words the recognition process should recognize. Every word you want to be able to use with Simon should be contained in your vocabulary. One entry in the vocabulary defines exactly one word. 
In contrast to the common use of the word word, in Simon a word means one unique combination of the following: Wordname(The written word itself) Category(Grammatical category; for example: Noun, Verb, &etc;) Pronunciation(How the word is pronounced; Simon accepts any kind of phonetic notation as long as it does not use special characters or numbers) That means that plurals or even different cases are different words to Simon. This is an important design decision to allow more control when using a sophisticated grammar. In general, it is advisable to keep your vocabulary as sleek as possible. The more words, the higher the chance that Simon might misunderstand you. Example vocabulary (please note that the categories here are deliberately set to Noun / Verb to aid understanding; please refer to the grammar section for why this might not be the best idea):

Sample Vocabulary

Word      Category  Pronunciation
Computer  Noun      k ax m p y uw t er
Internet  Noun      ih n t er n eh t
Mail      Noun      m ey l
close     Verb      k l ow s
Active Dictionary The vocabulary used for the recognition is referred to as the active dictionary or active vocabulary. Shadow Dictionary As said above, the user should keep the vocabulary / dictionary as lean as possible. However, as a word in your vocabulary also has to carry information about its pronunciation, it is good to have a large dictionary in which you can look up the pronunciation and other characteristics of words. Simon provides this functionality. We refer to this large reference dictionary as the shadow dictionary. The shadow dictionary is not created by the user but can be imported from various sources. As Simon is a multi-language solution, we do not ship shadow dictionaries with Simon. However, it is very easy to import them yourself using the import dictionary wizard. This is described in the Import Dictionary section. Language profile In addition to a shadow dictionary, Simon can use a language profile to provide help with transcribing words. A language profile consists of rules describing how words are pronounced in the target language. It can be likened to the way humans can often pronounce a word they have never heard just because they know some implicit "pronunciation rules" of the language. Just as with humans, this process is not perfect but can provide a solid starting ground. This automatic deduction of a phoneme transcription from a written word is called "grapheme to phoneme conversion". Simon requires the Sequitur G2P grapheme to phoneme converter to be installed and set up for language profiles to work. If you have selected a pre-built language profile or built your own, Simon will automatically transcribe new words with it when they are not found in your shadow dictionary.
Grammar The grammar defines which combinations of words are correct. Let's look at an example: You want to use Simon to launch programs and close those windows when you are done. You would like to use the following commands: Computer, Internet to open a browser; Computer, Mail to open a mail client; Computer, close to close the current window. Following English grammar, your vocabulary would contain the following:

Sample Vocabulary

Word      Category
Computer  Noun
Internet  Noun
Mail      Noun
close     Verb
To allow the sentences defined above, Simon would need the following grammar: Noun Noun for sentences like Computer Internet; Noun Verb for sentences like Computer close. While this would work, it would also allow the combinations Computer Computer, Internet Computer, Internet Internet, &etc; which are obviously bogus. To improve the recognition accuracy, we can try to create a grammar that better reflects what we are trying to do with Simon. It is important to remember that you define your own language when using Simon. That means that you are not bound to grammar rules that exist in whatever language you want to use Simon with. For a simple command and control use case it is, for example, advisable to invent new grammatical rules to eliminate the differences between commands imposed by grammatical information that is not relevant for this use case. In the example above it is not relevant that close is a verb or that Computer and Internet are nouns. Instead, why not define them as something that better reflects what we want them to be:

Improved Sample Vocabulary

Word      Category
Computer  Trigger
Internet  Command
Mail      Command
close     Command
Now we change the grammar to the following: Trigger Command This allows all the combinations described above. However, it also limits the possibilities to exactly those three sentences. Especially in larger models, a well-thought-out grammar and vocabulary can make a huge difference in recognition results.
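The effect of such a category-based grammar can be illustrated with a small sketch that enumerates every sentence a grammar allows. This is a toy model of the idea, not Simon's model-compilation code; the function name and data shapes are assumptions:

```python
def valid_sentences(grammar, vocabulary):
    """Enumerate the sentences a category-based grammar allows.

    `grammar` is a list of category sequences (e.g. ["Trigger", "Command"]);
    `vocabulary` maps each word to its category.
    """
    # Group words by their category
    by_category = {}
    for word, category in vocabulary.items():
        by_category.setdefault(category, []).append(word)
    sentences = []
    for structure in grammar:
        # Expand each category slot with every word of that category
        expansions = [[]]
        for category in structure:
            expansions = [s + [w] for s in expansions
                          for w in sorted(by_category.get(category, []))]
        sentences.extend(" ".join(s) for s in expansions)
    return sentences
```

With the improved vocabulary above and the single structure Trigger Command, this yields exactly the three intended sentences and nothing else - in particular, bogus combinations like Computer Computer are no longer possible.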
Acoustic Model The acoustic model represents your pronunciation in a machine-readable format. Let's look at the following sample vocabulary:

Sample Vocabulary

Word      Category  Pronunciation
Computer  Noun      k ax m p y uw t er
Internet  Noun      ih n t er n eh t
Mail      Noun      m ey l
close     Verb      k l ow s
The pronunciation of each word is composed of individual sounds which are separated by spaces. For example, the word close consists of the following sounds: k l ow s The acoustic model uses the fact that spoken words are composed of sounds much like written words are composed of letters. Using this knowledge, we can segment words into sounds (represented by the pronunciation) and reassemble them during recognition. These building blocks are called phonemes. Because the acoustic model actually represents how you speak the phonemes of the words, training material is shared among all words that use the same phonemes. That means that if you add the word clothes to the language model, your acoustic model already has an idea how the clo part is going to sound, as the two words share the same phonemes (k, l, ow) at the beginning. To train the acoustic model (in other words, to tell it how you pronounce the phonemes) you have to train words from your language model. That means that Simon displays a word which you read out loud. Because the word is listed in your vocabulary, Simon already knows what phonemes it contains and can thus learn from your pronunciation of the word.
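The sharing of training material between words with common phonemes can be illustrated with a small sketch. The helper is purely illustrative (the clothes transcription is an assumption based on the k, l, ow example above):

```python
def shared_prefix_phonemes(pron_a, pron_b):
    """Count how many leading phonemes two pronunciations share.

    Pronunciations are space-separated phoneme strings, exactly as they
    appear in the vocabulary (e.g. "k l ow s").
    """
    a, b = pron_a.split(), pron_b.split()
    n = 0
    for pa, pb in zip(a, b):
        if pa != pb:
            break
        n += 1
    return n
```

Comparing close ("k l ow s") with a plausible transcription of clothes shows three shared leading phonemes, which is exactly the training material the two words would share at the start.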
Scenarios This section extends the previous one about basic scenario management and tells you how to create, edit and export scenarios. Manage scenarios Scenario hierarchies You can create scenario hierarchies by dragging and dropping active scenarios on top of each other. Manage scenarios: Scenario hierarchies Scenario hierarchies serve two purposes: The context system respects scenario hierarchies: If the parent scenario gets deactivated, all child scenarios will become deactivated as well. If you attempt to export a scenario that has children, &kmyapplication; will allow you to export them in a joint scenario package. This way, you can share multiple logically co-dependent scenarios (⪚ one "Office" scenario that contains sub-scenarios for "Word", "Excel", etc.). Adding a new Scenario To add a new scenario, select the Add button. A new dialog will be displayed. Add scenario When creating a new scenario, please give it a descriptive name. For the later upload on KDE files we kindly ask you to follow a certain naming scheme, although this is of course not a requirement: [<language>/<base model>] <name>. If, for example, you create a scenario in English that works with the Voxforge base model and controls Mozilla Firefox, this becomes: [EN/VF] Firefox. If your scenario is not specifically tailored to one phoneme set (base model), just omit the second tag like this: [EN] Firefox. The scenario version is just an incremental version number that makes it easier to distinguish between different revisions of a scenario. If your scenario needs a specific feature of Simon (for example because you use a new plugin), you can define minimum and maximum version numbers of Simon here. The license of your scenario can be set through the drop-down. You can of course also enter an arbitrary license text directly in the input field. You can then add your name (or alias) to the list of scenario authors. There you will also be asked for contact information. 
This field is purely provided as a convenient way to contact a scenario author for changes, problems, fanmail &etc; If you don't feel comfortable providing your email address, you can simply enter a dash - denoting that you are not willing to divulge this information. Edit Scenario To edit scenarios, just select Edit from the Manage scenarios dialog. The dialog works exactly the same as the add scenario dialog. Export Scenario Scenarios can be exported to a local file in Simon's XML scenario file format and directly uploaded to the Simon Scenarios subsection of the OpenDesktop site KDE-files.org. To upload to OpenDesktop sites, you need an account on the site. Registration is very easy and of course free of charge. You can upload new content directly from within Simon (Export > Publish). Upload scenario wizard: 1 of 4 Upload scenario wizard: 2 of 4 Upload scenario wizard: 3 of 4 Upload scenario wizard: 4 of 4 To use this functionality, simply enter your account credentials in the social desktop configuration of the Simon configuration. Vocabulary The vocabulary module defines the set of words of the scenario. Simon's Vocabulary By default, the active vocabulary is shown. To display the shadow vocabulary, select the tab Shadow Vocabulary. Every word states its recognition rate, which at the moment is just a counter of how often the word has been recorded (alone or together with other words). Shadow Vocabulary Adding Words To add new words to the active vocabulary, use the add word wizard. Adding words to Simon is basically a two-step procedure: Defining the word Initial training Defining the Word First, the user is asked which word to add. Select the word to add When the user proceeds to the next page, Simon automatically tries to find as much information about the word in the shadow dictionary as possible. If the word is listed in the shadow dictionary, Simon automatically fills out all the needed fields (Category and Pronunciation). 
Fields automatically filled out by the Shadow Dictionary All suggestions from the shadow dictionary are listed in the table Similar words. By default only exact word matches are shown. However, this can be changed by checking the Include similar words check box below the suggestion table. Using similar words you can quickly deduce the correct pronunciation of the word you are actually trying to add. See below for details. Of course this really depends on your shadow dictionary. If the shadow dictionary does not contain the word you are trying to add, the required fields have to be filled out manually. Some dictionaries that can be imported with Simon (SPHINX, HTK) do not differentiate between upper and lower case. Suggestions based on those dictionaries will always be uppercase. You are of course free to change these suggestions to the correct case. Some dictionaries that can be imported with Simon (SPHINX, PLS and HTK) provide no grammatical information at all. These will assign all words to the category Unknown. You should change this to something appropriate when adding those words. Manually Selecting a Category The category of a word is the grammatical category the word belongs to. This might be Noun, Verb or a completely new category like Command. For more information see the grammar section. The list contains all categories used in your active and your shadow lexicon as well as in your grammar. You can add new categories to the drop-down menu by using the green plus sign next to it. Manually Providing the Phonetic Transcription The pronunciation is a bit trickier. Simon does not require a specific type of phonetic notation, so you are free to use any method as long as it uses only ASCII characters and no numbers. However, if you want to use a shadow dictionary and want to use it to its full potential, you should use the same phonetics as the shadow dictionary. 
If you do not know how to transcribe a word yourself, you can easily use your shadow dictionary to help you with the transcription - even if the word is not listed in it. Let's say we want to add the word Firefox (to launch firefox), which is of course not listed in our shadow dictionary. (I imported the English voxforge HTK lexicon available from voxforge as a shadow dictionary.) Firefox is not listed in our shadow dictionary, so we do not get any suggestion at all. Adding an unknown word However, we know that firefox sounds like fire and fox put together. So let's just open the vocabulary (you can keep the wizard open) by selecting Vocabulary from your Simon main toolbar. Switch to the shadow vocabulary by clicking on the tab Shadow Vocabulary. Use the Filter box above the list to search for Fire: Adding an unknown word: Search for the Pronunciation We can see that the word Fire is transcribed as f ay r. Now filter for fox instead of Fire and we can see that Fox is transcribed as f ao k s. We can assume that firefox should be transcribed as f ay r f ao k s. Using this approach of deducing the pronunciation from parts of the word has the distinct advantage that we not only get a high quality transcription but also automatically use the same phoneme set as the other words which were correctly pulled out of the shadow dictionary. We can now enter the pronunciation and change the category to something appropriate. Completely defined word Training the Word To complete the wizard we can now train the word twice. If you don't want to do this or, for example, use a static base model, you can skip these two pages. Because you are about to record some training samples, Simon will display the volume calibration to make sure that your microphone is set up correctly. For more information please refer to the volume calibration section. Simon will try to prompt you for real-world examples. 
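The manual deduction performed above (looking up fire and fox in the shadow dictionary and concatenating their pronunciations) can be sketched as a small helper. This is an illustration of the manual procedure, not a Simon feature; the function and dictionary shape are assumptions:

```python
def deduce_pronunciation(parts, shadow):
    """Deduce a pronunciation for an unknown word from known sub-words.

    `shadow` maps lowercase words to space-separated phoneme strings;
    `parts` names the sub-words the user recognized inside the new word
    (e.g. "fire" + "fox"). Returns None if any part is missing, in which
    case the transcription has to be provided fully by hand.
    """
    phonemes = []
    for part in parts:
        pron = shadow.get(part.lower())
        if pron is None:
            return None
        phonemes.append(pron)
    return " ".join(phonemes)
```

Because every sub-word comes from the same shadow dictionary, the result automatically uses the same phoneme set as the rest of the vocabulary - the main advantage the text above points out.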
To do that, Simon will automatically fetch grammar structures using the category of the word and substitute the generic categories with example words from your active lexicon. For example: You have the grammar structure Trigger Command and the word Computer of the category Trigger in your vocabulary. You then add a new word Firefox of the category Command. Simon will now automatically prompt you for Computer Firefox as it is - according to your grammar - a valid sentence. If Simon is unable to find appropriate sentences using the word (&ie;: no grammar, not enough words in your active lexicon, &etc;) it will just prompt you for the word alone. Although Simon ensures that the automatically generated examples are valid, you can always override its suggestions. Just switch to the Examples tab on the Define Word page. Editing word examples You are free to change those examples to anything you like. You can even go so far as to use words that are not yet in your active lexicon, as long as you add them before you synchronize the model, although this is not recommended. All that is left is to record the examples. Recording Make sure you follow the guidelines listed in the recording section. Editing a word To edit a word, simply select it in the vocabulary and click on Edit. Edit word There you can change the name, category and pronunciation of the selected word. Removing a word To remove a word from your language model, select it in the vocabulary view and click on Remove. Recording The dialog offers four choices: Move the word to the Unused category. Because you (hopefully) don't use the category Unused in your grammar, the word will no longer be considered for recognition. In fact, it will be removed from the active vocabulary before compiling the model because no grammar sentence references it. If you want to use the category Unused in your grammar, you can of course use a different category for unused words. Just set the category through the Edit word dialog. 
To use the word again, just set the right category again. No data will be lost. Move the word to the shadow lexicon This will remove the selected word from the active lexicon (and thus from the recognition) but will keep a copy in the shadow vocabulary. All the recordings containing the word will be preserved. To use the word again, add it to the active vocabulary again. When adding a new word with the same name, the values of the moved word will be suggested to you. Therefore, no data will be lost. Delete the word but keep the samples Removes the word completely but keeps the associated samples. Whenever you add another word with the same name the samples will be re-associated. Be careful with this option as the new word you add might be transcribed differently and this difference cannot be taken into account automatically (Simon will then try to force the new transcription on the old recordings during the model compilation). Do not use this option if the samples you recorded for this word were erroneous. Remove the word completely Just remove the word. All the recordings containing the word will be removed too. This option leaves no trace of either the word itself or the associated samples. Because samples are global (not assigned to scenarios), even samples recorded during training sessions of other scenarios might be removed as well if they contain the word. Use this option carefully. Special Training Please see the special training section in the training section. Importing a Dictionary Simon provides the functionality to import large dictionaries as a reference. This reference dictionary is called the shadow dictionary. When the user adds a new word to the model, he has to define the following characteristics: Wordname Category Phonetic definition These characteristics are taken out of the shadow dictionary if it contains the word in question.
A large, high-quality shadow dictionary can thus help the user to easily add new words to the model without keeping track of the phoneme set or - in many cases - even let him forget that the phonetic transcription is needed at all. Import dictionary: Introduction Since version 0.3 you can also import dictionaries directly to the active dictionary. This option is mostly there to make it easier to move to Simon from custom solutions and to encourage importing of older models (for example one used with Simon 0.2). You will almost never want to import a very large dictionary as the active dictionary. You can find a list of available dictionaries that work with Simon on the Simon wiki. Simon is able to import five different types of dictionaries: HADIFIX HTK PLS SPHINX Julius HADIFIX Dictionary Simon can import HADIFIX dictionaries. One example of a HADIFIX dictionary is the German HADIFIX BOMP. HADIFIX dictionaries provide both categories and pronunciation. Due to a special exemption in their license the Simon listens team is proud to be able to offer you the option to download the excellent HADIFIX BOMP directly from within Simon. Import dictionary: Automatic BOMP import Using the automatic BOMP import you can, after providing a name and email address for the team of the University of Bonn, directly download and import the dictionary from the Simon listens server. HTK Dictionary Simon can import HTK lexica. One example of a HTK lexicon is the English Voxforge dictionary. HTK dictionaries provide pronunciation information but no categories. All words will be assigned to the category Unknown. PLS Dictionary Simon can import PLS dictionaries. One example of a PLS dictionary is the German GPL dictionary from Voxforge. PLS dictionaries provide pronunciation information but no categories. All words will be assigned to the category Unknown. SPHINX Dictionary Simon can import SPHINX dictionaries. One example of a SPHINX dictionary is this dictionary for Mexican Spanish.
SPHINX dictionaries provide pronunciation information but no categories. All words will be assigned to the category Unknown. Julius Dictionary Simon can import Julius vocabularies. One example of Julius vocabularies are the word lists of Simon 0.2. Julius dictionaries provide pronunciation information as well as category information. Create language profile Here, you can build a language profile from your shadow dictionary. Create language profile After selecting Create profile, &kmyapplication; will analyze your current shadow dictionary and try to deduce the transcription rules from it. This is generally a very lengthy process and can, depending on the size of your shadow dictionary, take up to several hours. The created profile will be selected automatically after the process completes. Grammar Simon provides an easy to use text based interface to change the grammar. You can simply list all the allowed sentences (without any punctuation marks, obviously) as described above. Grammar When selecting a sentence on the left, the right pane will automatically show possible real sentences built with the words of your vocabulary. The example section will list at most 35 examples so if more than that number of sentences match the selected grammar entry, the list might not be complete. Import a Grammar In addition to simply entering your desired grammar sentence by sentence, Simon is able to automatically deduce allowed grammar structures by reading plain text using the Import Grammar wizard. Import Grammar Simon can read and import text files but also provides an input field if you want to simply type the text into Simon. Say we have a vocabulary like in the general section above: Improved Sample Vocabulary Word Category Computer Trigger Internet Command Mail Command close Command
We want Simon to recognize the sentence Computer Internet!. So we either enter the text using the Import text option or create a simple text file with this content Computer Internet! (any punctuation mark would work) and save it as simongrammar.txt to use the Import files option. Import Grammar: Enter text Import Grammar: Text file Import Grammar: Select files Simon will then read the entered text or all the given text files (in this case the only given text file is simongrammar.txt) and look up every single word in both the active and the shadow dictionary (the definition in the active dictionary takes precedence if the word is available in both). It will then replace the word with its category. In our example this means that Simon would find the sentence Computer Internet. Simon would find out that Computer is of the category Trigger and Internet of the category Command. Because of this Simon would learn that Trigger Command is a valid sentence and add it to its grammar. The import automatically segments the input text by punctuation marks (., -, !, &etc;) so any natural text should work. The importer will automatically merge duplicate sentence structures (even across different files) and add multiple sentences (all possible combinations) when a word has multiple categories assigned to it. The import will ignore sentences where one or more words could not be found in the language model unless you tick the Also import unknown sentences check box in which case those words are replaced with Unknown.
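The import steps above (segment by punctuation, look up each word with the active dictionary taking precedence, merge duplicate structures, optionally import unknown words) can be sketched roughly like this. The dictionaries and the function name are assumptions for the example; this mirrors the documented behavior, not Simon's actual code.

```python
# Rough sketch of the grammar import: split text into sentences, map
# words to categories and collect the unique category structures.
import re

active_dictionary = {"computer": "Trigger", "internet": "Command"}
shadow_dictionary = {"mail": "Command"}

def import_grammar(text, import_unknown=False):
    structures = []
    for sentence in re.split(r"[.!?\-]+", text):
        words = sentence.split()
        if not words:
            continue
        structure = []
        for word in words:
            key = word.lower()
            # the active dictionary takes precedence over the shadow one
            category = active_dictionary.get(key) or shadow_dictionary.get(key)
            if category is None:
                if not import_unknown:
                    structure = None  # skip sentences with unknown words
                    break
                category = "Unknown"
            structure.append(category)
        if structure and structure not in structures:  # merge duplicates
            structures.append(structure)
    return structures
```

Feeding in "Computer Internet!" yields the single structure Trigger Command, just as described above; a sentence containing an unknown word is dropped unless the unknown-sentences option is enabled.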
Renaming Categories The rename category wizard allows you to rename categories in your active vocabulary, your shadow dictionary and your grammar. Rename Category Merging Categories The merge category wizard allows you to merge two categories into one new category in your active vocabulary, your shadow dictionary and your grammar. Merge Category This functionality is especially useful if you want to simplify your grammar structures.
Training Using the Training module, you can improve your acoustic model. The interface lists all installed training texts in a table with three columns: NameA descriptive name for the text. PagesThe number of pages the text consists of. Each page represents one recording. Recognition RateAnalogous to the vocabulary; represents how likely Simon will recognize the words (higher is better). The recognition rate of the training text is the average recognition rate of all the words in the text. Training To improve the acoustic model - and thus the recognition rate - you have to record training texts. This means that Simon essentially gets the two parts it needs: Samples of your speech Transcriptions of those samples The active dictionary is used to transcribe the words (mapping them from the actual word to its phonetic transcription) that make up the text so every word contained in the training text you want to read (train) has to be contained in your active dictionary. Simon will warn you if this is not the case and provide you with the possibility to add all the missing words in one go. Training: Warning about missing words The procedure is the same as if you were adding a single word but the wizard will prompt you for details and recordings for all the missing words automatically. This procedure can be aborted at any time and Simon will provide both a way to add the already completely defined words and to undo all changes done so far. Once the user has added all the missing words he is prompted for, the changes to the active dictionary / vocabulary are saved and the training of the previously selected text starts automatically. The training (reading) of the training text works exactly the same as the initial training when adding a new word. Make sure you follow the guidelines listed in the recording section.
Training in progress Storage Directories Training texts are stored in two different locations: Linux: ~/.kde/share/apps/simon/texts Windows: %appdata%\.kde\share\apps\simon\texts The texts of the current user. Can be deleted and added with Simon (see below). Linux: `kde4-config --prefix`/share/apps/simon/texts Windows: (install folder)\share\apps\simon\texts System-wide texts. They will appear on every user account using Simon on this machine and cannot be deleted from within Simon because of the obvious permission restrictions on system-wide files. This folder can be used by system administrators to provide a common set of training texts for all the users on one system. The XML files (one for each text) can simply be moved from one location to the other but this will most likely require admin privileges. Adding Texts Import-training-texts-wizard The add texts wizard provides a simple way to add new training texts to Simon. When importing text files, Simon will automatically try to recognize individual sentences and split the text into appropriate pages (recordings). The algorithm treats text between normal punctuation (., !, ?, ..., ",...) and line breaks as sentences. Each sentence will be on its own page. Simon supports two different sources for new training texts. Add training texts Import-training-texts-wizard: Add training texts Simply enter the training text in an input field. Local text files Import-training-texts-wizard: Local text files Simon can import normal text files to use them as training texts. On-The-Fly Training In addition to training texts, Simon also allows you to train individual words or word combinations from your dictionary on the fly. This feature is located in the vocabulary menu of Simon. Special Training: Selecting the Words Select the words to train from the vocabulary on the left and simply drag them to the selection list on the right (you could also select them in the table on the left and add them by clicking Add to Training).
Start the training by selecting Train selected words. The training itself is exactly the same as if it were a pre-composed training text. Special Training: Training the Words If there are more than 9 words to train Simon will automatically split the text evenly across multiple pages. Of course you are free to add words from the shadow lexicon to the list of words to train but Simon will prompt you to add the words before the training starts just like it would if you trained a text that contains unknown words (see above). Context &kmyapplication; includes a context layer that lets Simon automatically adjust its configuration depending on its context. For example, you could set up &kmyapplication; to only allow commands like "New tab" if Mozilla Firefox is running and is the currently active window. There are three major areas that contextual information can influence: Scenario selection Sample groups Active microphones Scenario selection Scenarios can specify to only be active during certain contextual situations. If these situations are not met, &kmyapplication; will temporarily deactivate the affected scenario. Scenario context The local context conditions of this scenario are shown in the list of Activation Requirements and can be added, edited and deleted through the respective buttons. The context conditions respect a possible hierarchy of scenarios: The activation requirements of all direct or indirect parent scenarios also apply to the child scenario(s). This condition "inheritance" is shown on the right side. The &kmyapplication; main window also shows a list of currently used scenarios. Scenarios that are deactivated because of their activation requirements (context conditions) are listed in light gray and italic. The screenshot below, for example, shows a temporarily deactivated Amarok scenario.
Identifying deactivated scenarios The same visual hints (gray, italic font for unmet activation criteria) also apply to the individual context conditions in the context menu. Sample groups Every sample recorded with Simon is assigned a sample group. Sample groups can be configured to only be used for the building of the acoustic models if certain contextual conditions are met. If this is not the case, all samples tagged with the deactivated sample group will be temporarily removed from the training corpus. For more information, an example use-case and instructions on how to work with sample groups, please refer to the section on sample groups. Context conditions In Simon, context is monitored through a set of context condition plugins. In general, context conditions are combined through an "and" association. For example, if the activation of a resource is bound by two conditions A and B, it will only be activated if both A and B see their conditions met. To instead model alternatives ("A or B or both"), use an Or Condition Association. All conditions can optionally be inverted. Inverting a condition means that it will evaluate to true if it would otherwise evaluate to false and vice versa. Active window True if the title of the currently active foreground window matches the provided window title. Active window condition D-Bus The D-Bus condition plugin allows you to monitor third-party applications that export state information on D-Bus. The monitored application needs to provide two things: a signal to notify of changes and a method that returns the current state. D-Bus condition The screenshot above, for example, configures a D-Bus condition that will evaluate to true while the music player "Tomahawk" is playing and to false otherwise. Face detection The face detection condition will evaluate to true if &kmyapplication;'s vision layer has identified a person sitting in front of the configured webcam.
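The combination rules described above ("and" over all conditions, optional inversion, and an "or" association as a child condition) can be sketched in a few lines of pure Python. The class and function names are illustrative assumptions, not Simon's API.

```python
# Sketch of the condition logic: every condition must be met ("and"),
# conditions can be inverted, and an "or" association is itself a
# condition that is met when any of its children is met.

class Condition:
    def __init__(self, satisfied, inverted=False):
        self._satisfied = satisfied  # callable returning True/False
        self._inverted = inverted

    def evaluate(self):
        value = self._satisfied()
        return not value if self._inverted else value

class OrAssociation(Condition):
    def __init__(self, children, inverted=False):
        super().__init__(lambda: any(c.evaluate() for c in children), inverted)

def resource_active(conditions):
    """A resource is activated only if every condition evaluates to true."""
    return all(c.evaluate() for c in conditions)
```

Wrapping two conditions A and B in an OrAssociation and binding a resource to that single meta-condition gives exactly the "A or B or both" behavior the text describes.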
Face detection condition File content This condition plugin will return true if the given file contains the provided content. The file will be monitored for changes. File content condition Lip detection The lip detection condition will evaluate to true if &kmyapplication;'s vision layer has identified a person sitting in front of the configured webcam who is speaking (moving his or her lips). Lip detection condition The lip detection training will try to determine the optimal detection sensitivity by monitoring your lip movements. For better accuracy of the lip detection condition, stop training when the sensitivity value on the slider becomes almost constant. Or condition association The or condition association allows you to configure a meta-condition that reports to be satisfied as soon as one or more of its child conditions evaluates to true. Or condition associations can have an arbitrary number of child conditions that may themselves be or condition associations. Or condition association Process opened Is satisfied if there is a running process with the provided executable name. Process opened condition Commands When Simon is active and recognizes something, the recognition result is given to the loaded command plug-ins (in order) for processing. Simon's Command Dialog The command system can be compared with a group of factory workers. Each one of them knows how to perform one task (⪚ Karl knows how to start a program and Joe knows how to open a folder, &etc;). Whenever Simon recognizes something it is given to Karl who then checks if this instruction is meant for him. If he doesn't know what to do with it, it is handed over to Joe and so on. If none of the loaded plugins knows how to process the input it is ignored. The order in which the recognition result is given to the individual commands (people) is configurable in the command options (Commands > Manage plugins).
Simon's Action Configuration Each plugin can be associated with a trigger. Using triggers, the responsibility of each plugin can easily be divided. Using the factory workers abstraction from above it could be compared to stating the name of the person you want to process your request. So instead of Open my home folder you say Joe, open my home folder and Joe (the plugin responsible for opening folders) will instantly know that the request is meant for him. In practice you could have commands like the executable command Firefox to open the popular browser and the place command Google to open the web search engine. If you assign the trigger Start to the executable plugin and the trigger Open to the place command you would have to say Start Firefox (instead of just Firefox if you don't use a trigger for the executable plugin) and Open Google to open the search engine (instead of just Google). Triggers are of course no requirement and you can easily use Simon without defining any plugin triggers (although many plugins come with a default trigger of Computer which you would have to remove). But even if you use just one trigger for all your commands (like Computer to say Computer, Firefox and Computer, Google) it has the advantage of greatly limiting the number of false positives. Simon's command dialog displays the complete phrase associated with a command in the upper right corner of the command configuration. You can load multiple instances of one plugin even in one scenario. Each instance can of course also have a different plugin trigger. Each Command has a name (which will trigger its invocation), an icon and more fields depending on the type of the plugin (see below). Some command plugins might provide a configuration of the plugin itself (not the commands it contains). These configuration pages will be plugged directly into the action configuration dialog (below the General menu item) when you load the associated plugin.
Plugins that provide a graphical user interface (like for example the input number command plugin) can be configured by configuring their Voice commands. You can, for example, change the associated word that will trigger a button, but also change the displayed icon, &etc; If you remove all voice interface commands from a graphical element, the element will be hidden automatically. Voice interface commands are added just like normal commands through the command configuration. Configure voice interface commands To add a new interface command to a function, just select the action you want to associate with a command, click Create from Action template and adapt the resulting command to your needs. Some plugins (for example the desktop grid or the calculator) might also provide a menu item in the Actions menu. Command plugged into main window Scenarios can optionally define one command that will immediately be run when the scenario is initialized. If you require more than one command to run automatically, consider the use of a composite command. Autorun command Command triggers can contain placeholders in the form of "%<index>", referring to any one word, or "%%<index>", describing one or more left-out words. For example the recognition result "Next window" will be matched by the triggers "Next %1", "Next %%1" and "%%1" but not by the triggers "%1", "Next window %1" and "%%1 Next window". Executable Commands Executable commands are associated with an executable file (Program) which is started when the command is invoked. Executable Commands Arguments to the commands are supported. If either the path to the executable or the parameters contain spaces they must be wrapped in quotes. Given the executable file C:\Program Files\Mozilla Firefox\firefox.exe and the local html file C:\test file.html the correct line for the Executable would be: "C:\Program Files\Mozilla Firefox\firefox.exe" "C:\test file.html". The working folder defines where the process should be launched from.
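The trigger placeholder matching described earlier ("%<index>" for exactly one word, "%%<index>" for one or more words) can be sketched with regular expressions. This mirrors the documented behavior; it is not Simon's actual implementation, and the function name is an assumption.

```python
# Sketch of placeholder matching: translate a trigger with %<index> and
# %%<index> placeholders into an anchored regular expression.
import re

def trigger_matches(trigger, result):
    pattern_parts = []
    for token in trigger.split():
        if token.startswith("%%"):
            pattern_parts.append(r"\S+(?: \S+)*")   # one or more words
        elif token.startswith("%"):
            pattern_parts.append(r"\S+")            # exactly one word
        else:
            pattern_parts.append(re.escape(token))  # literal word
    pattern = "^" + " ".join(pattern_parts) + "$"
    return re.match(pattern, result) is not None
```

Checked against the example in the text: "Next window" matches "Next %1", "Next %%1" and "%%1", but not "%1", "Next window %1" or "%%1 Next window".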
Given the working folder C:\folder, the command "C:\Program Files\Mozilla Firefox\firefox.exe" file.html would cause Firefox to search for the file C:\folder\file.html. The working folder usually does not need to be set and can be left blank most of the time. Importing Programs For even easier configuration Simon provides an import dialog which allows you to select programs directly from the KDE menu. This option is not available on Microsoft Windows. Import Programs The dialog will list all programs that have an entry in your KDE menu in their respective category. Sub-categories are not supported and are thus listed on the same level as top-level categories. Just select the program you wish to start with Simon and press Ok. The correct values for the executable and the working folder as well as an appropriate command name and description will automatically be filled out for you. Place Commands With place commands you can allow Simon to open any given URL. Because Simon just hands the address over to the platform's URL handler, special protocols like remote:/ (on &Linux;/&kde;) or even &kde;'s Web-Shortcuts are supported. Instead of folders, files can also be set as the command's URL which will cause the file to be opened with the application associated with it when the command is invoked. Places To associate a specific URL with the command you can manually enter it in the URL field (select Manual first) or import it with the import place wizard. Importing Places The import place dialog allows you to easily create the correct URL for the command. To add a local folder, select Local Place and choose the folder or file with the file selector. Import Places: Local To add a remote URL (HTTP, FTP, &etc;) choose Remote URL. Import Places: Remote Please note that for URLs with authentication information the password will be stored in clear text. Shortcut Commands Using shortcut commands the user can associate commands with key combinations.
The command will simulate keyboard input to trigger shortcuts like &Ctrl;C or &Alt;F4. The plugin can press, release or press and release the configured key combination. Defining Shortcut Commands To select the shortcut you wish to simulate just toggle the shortcut button and press the key combination on your keyboard. Simon will capture the shortcut and associate it with the command. Due to technical limitations there are several shortcuts on Microsoft Windows that cannot be captured by Simon (this includes ⪚ &Ctrl;&Alt;Del and &Alt;F4). These special shortcuts can be selected from a list below the aforementioned shortcut button. This selection box is not visible in the screenshot above as the list is only displayed in the Microsoft Windows version of Simon. Text-Macro Commands Using text-macro commands, the user can associate text with a command. When the command is invoked, the associated text will be written by simulating keystrokes. Defining Text-Macro Commands List Commands The list command is designed to combine multiple commands (all types of commands are supported) into one list. The user can then select the n-th entry by saying the associated number (1-9). This is very useful to limit the amount of training required and provides the possibility to keep the vocabulary to a minimum. Defining List Commands List commands are especially useful when using commands with difficult triggers or commands that can be grouped under a general theme. A typical example would be a command Startmenu to present a list of programs to launch. That way the specific executable commands can still retain very descriptive names (like OpenOffice.org Writer 3.1) without the user having to include these words in his vocabulary and consider them in the grammar just to trigger them. Commands of different types can of course be mixed. List Command Display When invoked, the command will display the list centered on the screen. The list will automatically expand to accommodate its items.
Defining List Commands The user can invoke the commands contained in the list by simply saying their associated number (In this example: One to launch Mozilla Firefox). While a list command is active (displayed), all input that is not directed at the list itself (other commands, &etc;) will be rejected. The process can be canceled by pressing the Cancel button or by saying Cancel. If there are more than 9 items Simon will add Next and Back options to the list (Zero will be associated with Back and Nine with Next). List Command with many entries Configuring list elements By default the list command uses the following trigger words. To use list commands to their full potential, make sure that your language and acoustic model contains and allows for the following sentences: Zero One Two Three Four Five Six Seven Eight Nine Cancel Of course you can also configure these words in your Simon configuration: Commands > Manage plugins > General > Lists for the scenario wide list configuration. Settings > Configure Simon... > Actions > Lists for the global configuration. When creating a new scenario, the scenario configuration will be initialized with a copy of this list configuration. List commands are internally also used by other plugins like for example the desktop grid. The configuration of the triggers also affects their displayed lists. Composite Commands Composite commands allow the user to group multiple commands into a sequence. When invoked the commands will be executed in order. Delays between commands can be inserted. Composite commands can also work as "transparent wrappers" by selecting Pass recognition result through to other commands. In that case, the recognition result will be treated as "unprocessed" even if the composite command was executed. For example, suppose you have a command to turn on the light in one scenario. 
In addition to turning on the light, you now want to add some kind of reporting to the activity by invoking a script through a program plugin. You could then set up a reporting scenario that contains a transparent composite command with the same trigger as the command to turn on the light and make sure that this scenario is placed before the original one in the scenario list. You can then activate and deactivate the reporting simply by loading and unloading this scenario. Defining Composite Commands Using the composite command the user can compose complex macros. The screenshot above - for example - does the following: Start Kopete (Executable Command) Wait 2000ms for Kopete to be started Type Mathias (Text-Macro Command) which will select Mathias in my contact list Press Enter (Shortcut Command) Wait 1000ms for the chat window to appear Write Hi! (Text-Macro Command); the text associated with this command contains a newline at the end so that the message will be sent. Press &Alt;F4 (Shortcut Command) to close the chat window Press &Alt;F4 (Shortcut Command) to close the Kopete main window Desktop grid The desktop grid allows the user to control his mouse with his voice. The Desktopgrid The desktop grid divides the screen into nine parts which are numbered from 1-9. Saying one of these numbers will again divide the selected field into 9 fields, again numbered from 1-9, &etc; This is repeated three times; after the fourth selection the desktop grid will be closed and Simon will click in the middle of the selected area. The exact click action is configurable but defaults to asking the user. Therefore you will be presented with a list of possible click modes. When selecting Drag and Drop, the desktop grid will be displayed again to select the drop point. Desktopgrid: Click selection While the desktop grid is active (displayed), all input that is not directed at the desktop grid itself (other commands, &etc;) will be rejected. Say Cancel at any time to abort the process.
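The recursive subdivision described above can be sketched as a small calculation: each spoken digit (1-9, numbered row by row) narrows the active rectangle to one of nine cells, and the final click lands in the middle of the last cell. The screen size, digit sequence and function name below are example assumptions.

```python
# Sketch of the desktop grid selection: repeatedly shrink the active
# rectangle to the chosen ninth, then click in the middle of it.

def grid_click_point(width, height, digits):
    x, y, w, h = 0.0, 0.0, float(width), float(height)
    for digit in digits:                 # digits 1-9, numbered row by row
        row, col = divmod(digit - 1, 3)
        w, h = w / 3, h / 3              # each step shrinks the cell to a ninth
        x, y = x + col * w, y + row * h
    return (x + w / 2, y + h / 2)        # click in the middle of the area
```

On a 900x900 screen, for example, saying Five once selects the central cell, whose middle is the screen center (450, 450).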
The desktop grid plugin registers a configuration screen right in the command configuration when it is loaded. Configuring the Desktopgrid The trigger that invokes the desktop grid is of course completely configurable. Moreover the user can use real or fake transparency. If your graphical environment allows for compositing effects (desktop effects) then you can safely use real transparency which will make the desktop grid transparent. If your platform does not support compositing Simon will simulate transparency by taking a screenshot of the screen before displaying the desktop grid and display that picture behind the desktop grid. If the desktop grid is configured to use real transparency and the system does not support compositing it will display a solid gray background. However, nearly all up-to-date systems will support compositing (real transparency). This includes: Microsoft Windows 2000 or higher (XP, Vista, 7) GNU/Linux using a composite manager like Compiz, KWin4, xcompmgr, &etc; By default the desktop grid uses numbers to select the individual fields. To use the desktop grid, make sure that your language and acoustic model contains and allows for the following sentences: One Two Three Four Five Six Seven Eight Nine Cancel To configure these triggers, just configure the commands associated with the plugin. Desktopgrid: Configuring list elements Input Number Using the input-number plugin the user can input large numbers easily. Using the Dictation or the Text-Macro plugin one could associate the numbers with their digits and use that as input method. However, to input larger numbers there are two ways that both have significant disadvantages: Adding the words eleven, twelve, &etc; While this seems like the most elegant solution as it would enable the user to say fivehundredseventytwo we can easily see that it would be quite a problem to add all these words - let alone train them. What about twothousandninehundredtwo? Where to stop? 
Spell out the number using the individual digits While this is not as elegant as stating the complete number it is much more practical. However, many applications (like the great mouseless browsing Firefox add-on) rely on the user to input large numbers without too much time passing between the individual keystrokes (mouseless browsing, for example, will wait exactly 500ms by default before it considers the input of the number complete). So if you want to enter 52 you would first say Five (pause) Two. Because of the needed pause, the application (like the mouseless browsing plugin) would consider the input of Five complete. The input number plugin - when triggered - presents a calculator-like interface for inputting a number. The input can be corrected by saying Back. It features a decimal point accessible by saying Comma. When saying Ok the number will be typed out. As all the voice input and correction is handled by the plugin itself, the application that finally receives the input will only see a couple of milliseconds between the individual digits. Input Number Plugin While the input number plugin is active (the user is currently inputting a number), all input that is not directed at the input number plugin (other commands, &etc;) will be rejected. Say Cancel at any time to abort the process. As no command instances can be created for this plugin, it is not listed in the New Command dialog. However, the input number plugin registers a configuration screen right in the command configuration when it is loaded. Input Number Plugin The trigger defines the word or phrase that will trigger the display of the interface. By default the input number plugin uses numbers to select the individual digits and a couple of control words.
To use the input number plugin, make sure that your language and acoustic model contain and allow for the following sentences: Zero, One, Two, Three, Four, Five, Six, Seven, Eight, Nine, Back, Comma, Ok, Cancel.

To configure these triggers, just configure the commands associated with the plugin.

Input Number Plugin

Dictation

The dictation plugin writes out the recognition result it receives using simulated keystrokes. Assuming you didn't define a trigger for the dictation plugin, it will accept all recognition results and simply write them out. The written input will be considered processed and thus not be relayed to other plugins. This means that if you loaded the dictation plugin and defined no trigger for it, all plugins below it in the Selected Plug-Ins list in the command configuration will never receive any input.

As no command instances can be created for this plugin, it is not listed in the New Command dialog.

The dictation plugin can be configured to append text to each recognition result - for example, to add a space after each recognized word.

Configure dictation

Artificial Intelligence

The Artificial Intelligence plugin is a just-for-fun plugin that emulates a human conversation. Using the text-to-speech system, the computer can talk with the user.

The plugin uses AIML (Artificial Intelligence Markup Language) files for the actual intelligence. Most AIML sets should be supported. The popular A. L. I. C. E. bot and a German version are known to work and are shipped with the plugin.

AI Plugin

The plugin registers a configuration screen in the command configuration menu where you can choose which AIML set to load. Simon will look for AIML sets in the following folders:

GNU/Linux: `kde4-config --prefix`/share/apps/ai/aimls/
Microsoft Windows: [installation folder (C:\Program Files\simon 0.2\ by default)]\share\apps\ai\aimls\

To add a new set, just create a new folder with a descriptive name and copy the .aiml files into it.
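As a sketch of what such a .aiml file contains (the pattern and response here are made-up examples, not part of a shipped set), a minimal AIML category maps a matched input pattern to a response:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<aiml version="1.0.1">
  <!-- One category: if the input matches the pattern, answer with the template -->
  <category>
    <pattern>HELLO SIMON</pattern>
    <template>Hello! How are you today?</template>
  </category>
</aiml>
```

A set is simply a folder of such files; the bot picks the category whose pattern best matches the recognized input.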
To adjust your bot's personality, have a look at the bot.xml and vars.xml files in the following folders:

GNU/Linux: `kde4-config --prefix`/share/apps/ai/util/
Microsoft Windows: [installation folder (C:\Program Files\simon 0.2\ by default)]\share\apps\ai\util\

As no command instances can be created for this plugin, it is not listed in the New Command dialog.

It is recommended not to assign a trigger to this plugin, to give the conversation a more natural feel.

Calculator

The calculator plugin is a simple, voice controlled calculator.

Calculator plugin

The calculator extends the Input Number plugin with additional features. When the plugin is loaded, a configuration screen is added to the plugin configuration.

Calculator plugin: Configuration

There you can also configure the control mode of the calculator. Setting the mode to anything other than Full calculator will hide options from the displayed widget.

Calculator plugin: Minimal

However, unlike simply removing the associated commands from the functions, the hidden controls will still react to the configured voice commands.

When selecting Ok, the calculator will by default ask you what to do with the generated result. You can, for example, output the calculation, the result, both, &etc; Instead of selecting this from the displayed list each time after pressing the Ok button, the behavior can also be fixed in the configuration options.

Calculator plugin: Output mode selection

Filter

Using the filter plugin, you can prevent recognition results from being passed on to further command plugins. With this plugin you can, for example, disable the recognition by voice.

The filter command plugin registers a configuration screen in the command configuration where you can change which results should be filtered.

Filter plugin: Configuration

The pattern is a regular expression that will be evaluated each time a recognition result reaches the plugin for processing.
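The filtering itself boils down to a simple regular-expression match. As an illustrative sketch (not Simon's actual code; the pattern and results below are made up), the logic looks roughly like this:

```python
import re

def filter_results(pattern: str, results: list[str]) -> list[str]:
    """Return only the results that do NOT match the filter pattern."""
    regex = re.compile(pattern)
    return [r for r in results if not regex.match(r)]

# A catch-all pattern like ".*" swallows every result ...
print(filter_results(".*", ["open firefox", "two"]))       # []
# ... while a narrower pattern only filters matching results.
print(filter_results("open .*", ["open firefox", "two"]))  # ['two']
```

With the default catch-all pattern, an active filter therefore blocks everything, which is what makes it suitable for temporarily disabling voice control.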
The plugin also registers voice interface commands for activating and deactivating the filter. In total, the filter therefore has three states:

Inactive: The default state. All recognition results are passed through.

Half-active (if Two stage activation is selected): If the next command is the "Deactivate filter" command, the filter enters the "Inactive" state. If, however, the next result is something else and Relay results in stage one of two stage activation is selected, this result is passed on to other plugins. The filter resets to "Active" afterwards.

Active: When activated, the filter swallows all results that match the configured pattern. By default this means that every result Simon recognizes is caught by the filter and therefore not relayed to any of the plugins following the filter plugin. If Two stage activation is enabled and the filter plugin receives the command to enter the "Inactive" state directly, this command is ignored. In other words: if two stage activation is enabled, the filter can only be disabled by going through the intermediate stage.

Pronunciation Training

The pronunciation training, when combined with a good static base model, can be a powerful tool to improve your pronunciation of a new language.

Pronunciation training

Essentially, the plugin prompts you to say specific words. The recognition then recognizes your pronunciation of each word and compares it to your speech model - which should be a base model of native speakers for this to work correctly. Simon then displays the recognition rate (how similar your version was to the stored base model). The closer to the native speaker, the higher the score.

The plugin adds an entry to your Commands menu to launch the pronunciation training dialog.

The training itself consists of multiple pages. Each page contains one word fetched from your active vocabulary.
The words are identified by a category, which needs to be selected in the command configuration before starting the training.

Pronunciation training: Configuration

Keyboard

The keyboard plugin displays a virtual, voice controlled keyboard.

Keyboard

The keyboard consists of multiple tabs, each possibly containing many keys. Tabs and keys are collected in sets. You can select sets in the configuration, but also create new ones from scratch in the keyboard command configuration.

Keyboard: Configuration

Keys are usually mapped to single characters but can also hold longer texts and even shortcuts. Because of this, keyboard sets can contain special keys like a select all key or a Password key (typing your password).

Next to the tabs that hold the keys of your set, the keyboard may also show special keys like &Ctrl;, &Shift;, &etc; Those keys are provided as voice interface commands and are displayed regardless of which tab of the set is currently active. As with all voice triggers, removing the associated command hides the buttons as well.

Moreover, the keyboard provides a numpad that can be shown by selecting the appropriate option in the keyboard configuration.

Keyboard: Keypad

Next to the number keys and the delete key for the number input field (Number backspace), the numpad provides two options for what to do with the entered number. When selecting Write number, the entered number is written out using simulated key presses. Selecting Select number tries to find a key or tab in the currently active set that has this number as a trigger. This way you can control a complete keyboard using only numbers.

Keyboard: Number based

The keys on the numpad are configurable voice interface commands.

Dialog

The dialog plugin enables users to engage in a scripted dialog with &kmyapplication;.

Dialog design

&kmyapplication; treats dialogs as a succession of different states. Each state can have a text and several associated options.
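The state/option model can be pictured as a small state machine: each state has a text plus a set of option triggers, and a recognized trigger moves the dialog to another state. This is only a conceptual sketch with made-up state names, not Simon's implementation:

```python
# Conceptual sketch: each state has a text and options mapping
# a spoken trigger to the next state (None ends the dialog).
dialog = {
    "greeting": {
        "text": "Do you want to hear the weather?",
        "options": {"Yes": "weather", "No": None},
    },
    "weather": {
        "text": "It is sunny today.",
        "options": {"Back": "greeting", "Ok": None},
    },
}

def step(state: str, recognized: str):
    """Return the next state for a recognized option trigger."""
    options = dialog[state]["options"]
    # Results that are not an option of the current state are ignored.
    if recognized not in options:
        return state
    return options[recognized]

print(step("greeting", "Yes"))  # weather
print(step("weather", "Ok"))    # None
```

Everything that follows in this section - text variants, bound values, avatars - attaches to the states and options of such a graph.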
Dialog design

Dialogs can have more than one text variant - one of which will be randomly picked when the dialog is displayed. This can help to make dialogs feel more natural by providing several alternative formulations. The texts can use bound values and template options.

Dialog design: Options

Dialog options encapsulate the logic of the conversation. They are the active components of the dialog.

Dialog design: Adding an option

Similar to commands, dialog options have a name (trigger) that, when recognized while the dialog is active and in the option's parent state, will cause this option to activate. Alternatively, options can also be configured to trigger automatically after a set time period. This time is relative to when the state is entered.

Dialog options, when shown through the graphical output module, can display an arbitrary text (which will most likely be equivalent to the trigger but doesn't have to be) and, optionally, an icon. If the text-to-speech output module is used, the text (not the trigger) will be read aloud unless this is disabled by selecting the Silent option.

Every state can also optionally have an avatar that will be displayed when using the graphical output module.

Dialog design: Avatar

Dialog: Bound values

The text of dialog states can contain variables - so called "bound values" - that will be filled in at runtime. For example, the dialog text "This is a $variable$" would replace "$variable$" with the result of a bound value called "variable".

Bound values

There are four types of bound values:

Static

Bound values: Static

Static bound values will always resolve to the same text. They are useful to provide configuration options to be filled in to personalize the dialog (&eg; the name of the user).

QtScript

Bound values: QtScript

QtScript bound values resolve to the result of the entered QtScript code.
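Since QtScript is an ECMAScript (JavaScript) dialect, such a bound value can contain ordinary JavaScript expressions. As an illustrative sketch (the greeting logic is made up, not a shipped example), a bound value that resolves to a time-of-day greeting might look like this:

```javascript
// Resolves to "Good morning" before noon and "Good evening" after.
function greeting(hour) {
    return hour < 12 ? "Good morning" : "Good evening";
}

// The result of the evaluated script becomes the bound value's text.
greeting(new Date().getHours());
```

A dialog text like "$greeting$, how can I help you?" would then change with the time of day.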
Command arguments

Bound values: Command arguments

If the dialog trigger command (the &kmyapplication; command that initiates the dialog) uses placeholders, they can be accessed through command argument bound values. The Argument number refers to the index of the placeholder you want to access. For example, if your dialog is started with the command "Call %1" and "name" is a command argument bound value, then launching the dialog by recognizing "Call Peter" will turn the dialog text "Are you sure you want to call $name$?" into "Are you sure you want to call Peter?".

Plasma data engine

Bound values: Plasma data engine

This type of bound value can readily access a wide array of high-level information through Plasma data engines.

Template options

Dialog texts can further be parametrized through template options.

Template options

These boolean values switch between different or optional text snippets. For example, the template option "formal" above would change the dialog text "Would you please {{formal}}be quiet{{elseformal}}shut up{{endformal}}" to "Would you please be quiet" or "Would you please shut up", depending on whether the template option is set to true or false. The else-path can be omitted if it is not required (&eg; "Would you {{formal}}please {{endformal}}be quiet").

Avatars

Every state can potentially show a different avatar. These images can range from the picture of a (simulated) speaker to an image of something topically appropriate.

Dialog: Avatars

To use an avatar, first add it here and later define where to use it in the dialog design section.

Output

Dialogs can be displayed graphically, use text-to-speech or combine both approaches.

Dialog: Output

The Separator to options will be spoken between the dialog text and the current state's options (if there are any). If the state has no options, or all of them are configured to be silent, the separator will not be spoken.
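The {{name}}...{{elsename}}...{{endname}} substitution used by template options can be sketched in a few lines. This is an illustrative approximation of the behavior described above, not Simon's actual parser:

```python
import re

def expand(text: str, options: dict[str, bool]) -> str:
    """Expand {{name}}...{{elsename}}...{{endname}} template blocks."""
    for name, enabled in options.items():
        # Capture the if-part and the (optional) else-part of each block.
        pattern = re.compile(
            r"\{\{" + name + r"\}\}(.*?)"
            r"(?:\{\{else" + name + r"\}\}(.*?))?"
            r"\{\{end" + name + r"\}\}"
        )
        text = pattern.sub(
            lambda m: m.group(1) if enabled else (m.group(2) or ""), text
        )
    return text

text = "Would you please {{formal}}be quiet{{elseformal}}shut up{{endformal}}"
print(expand(text, {"formal": True}))   # Would you please be quiet
print(expand(text, {"formal": False}))  # Would you please shut up
```

The `or ""` handles the case where the else-path was omitted, as in the "Would you {{formal}}please {{endformal}}be quiet" example.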
The option to listen to the whole announcement again is triggered by saying one of the configured Repeat on triggers. Additionally, the text-to-speech output can optionally be configured to repeat the listing of the available options (including the configured separator) when the user says a command that does not match any of the available dialog options.

Akonadi

The Akonadi plugin allows &kmyapplication; to plug into KDE's PIM infrastructure.

Akonadi command configuration

The plugin fulfills two major purposes:

Execute Simon commands at scheduled times: The Akonadi plugin can monitor a specific collection (calendar) and react to entries whose summary starts with a specific prefix. By default, this prefix is "[simon-command]", meaning that events of the form "[simon-command] <plugin name>//<command name>" will trigger the appropriate &kmyapplication; command at the "start time" of the event. The plugin and command names are the same as those shown in the command dialog and do not necessarily need to reference commands in the same scenario as the Akonadi plugin instance.

Show reminders for events in the given calendar: If configured to do so, the Akonadi plugin can show reminders for calendar events with a set alarm flag. These reminders will be shown through the &kmyapplication; dialog engine.

D-Bus

With the D-Bus command plugin, Simon can call exported methods of third-party applications directly. The screenshot below, for example, calls the "Pause" method of the MPRIS interface of the Tomahawk music player.

D-Bus command

JSON

Similar to the D-Bus command plugin, the JSON plugin also allows Simon to contact third-party applications to invoke functionality directly (instead of simulating user activity).

JSON command

VRPN

With the VRPN command plugin, Simon can act as a VRPN server and export voice controlled buttons.
VRPN plugin configuration

The plugin configuration allows you to set the port the server should operate on and to define an arbitrary list of buttons. Each of these button objects will have exactly one "button" (in VRPN, a button device may theoretically have more than one clickable item).

After setting up the buttons, you can configure Simon commands to act on them. You can set the commands to either Press & Release (consecutively), Press, Release or Toggle the button they manipulate.

VRPN command configuration

For example, the command shown in the screenshot above would press and release ("click") the VRPN button at index 0 of the button device accessible as "ButtonB@localhost".
Questions and Answers

In an effort to keep this section always up-to-date, it is available at our online wiki.

Credits and License

&kmyapplication;

Program copyright 2006-2009 Peter Grasch peter.grasch@bedahr.org, Phillip Goriup, Tschernegg Susanne, Bettina Sturmann, Martin Gigerl

Documentation Copyright © 2009 Peter Grasch peter.grasch@bedahr.org

&underFDL; &underGPL;

Installation

Please see our wiki for install instructions.

&documentation.index;