MAKING PLANS FOR NIGEL: Defining interfaces between computational representations of linguistic structure and output systems: Adding intonation, punctuation and typography systems to the PENMAN system.
5.0 Outline

This chapter is similar in function to the last, but it is about phonology instead of graphology. The phonological interfacial code was introduced in chapter two (it and Halliday’s notation for intonation and rhythm appear in Appendix II). The main section (5.1) describes the output system in much more general terms than the description of GEKKO. There is also a brief discussion (5.2) of the broader implications for work on PENMAN.

5.1 The output program: how it works

It is important to note that the discussion below is not a critical survey of the literature on intonation; it is purely descriptive of the output system, which is part of the text-to-speech system being developed by King and Vonwiller. The recent literature on intonation has been drawn on extensively in the development of this speech synthesiser, but mainly for its empirical research, not for its theoretical models. A good summary of the recent literature, its main themes and theories, is Silverman (1987).

For the purposes of this thesis it is not important to know exactly what the synthesiser does. In fact, it follows from the interface principles that the internal organisation of the synthesiser should not have to be known, and that it must be possible to hook up to different synthesisers. The motivation for the interfacial code has come from work on grammar and meaning; the process of converting that code into sound, by contrast, can be thought of in purely pragmatic terms. As long as the machine works, it is unimportant how it does so. The design of the speech synthesiser is not motivated systemically, so it will not be discussed on the same terms as other components of the NIGEL system.

There is no room here to describe the output program in detail; what follows is a brief summary of the process that will be used to convert into sound code that contains an orthographic string together with specifications of its Tonality, Tonicity and Tone, and of which words are content words.
The first step is to work out the rhythm of the utterance; the next is to locate the tonic syllable; then the correct sort of tone contour is chosen from a table and “stretched” to fit the tone group. These steps are not enough, however, to generate natural-sounding speech: it is necessary to take account of a number of syntagmatic aspects of the actual speech signal. The most important of these syntagmatic aspects are discussed below, after a summary of the way that the paradigmatic variables (tonicity, tonality and so on) are synthesised together.

The simplest way of assigning rhythm to an utterance is to insert a foot boundary before every content word. Silverman (1987) writes that such an approach results in “careful sounding” speech. This can be illustrated using one of Halliday’s examples. In the interfacial code used here the text would look like this (it is the second sentence with which we are concerned; the first is provided for context):

5.1 In this job, Anne, we’re working with silver. //Now Csilver [N Cneeds to have Clove]//.

In Halliday’s notation, with ‘/’ representing a foot-boundary, and the word containing the tonic syllable in bold-face, 5.1 becomes 5.2:

5.2 //^ Now/ silver / needs to have /love//

A foot boundary has been placed in front of each content word, and the tonic is the last stressed syllable in the New element. However, this process cannot generate the correct rhythm in all cases. As was noted in chapter two, the information structure is somewhat indeterminate (it is often not clear where the New element starts), but there are some cases in which Given / New structure conditions rhythm. Halliday provides two versions of 5.1 to illustrate this.

5.3 (a) I’ll tell you about silver. // it [N Cneeds to have Clove]//.
        Phonetically: // ^ it / needs to have / love //
    (b) I’ll tell you what silver needs to have. // it Cneeds to have [NEW Clove]//
        Phonetically: // ^ it needs to have / love //
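The simple rule described above (a foot boundary before every content word, with the tonic on the last content word of the New element) can be sketched in a few lines of Python. This is only an illustration, not the King and Vonwiller program: the function names are hypothetical, the parser assumes that every token beginning with “C” is a marked content word and that “[N” or “[NEW” opens the New element, and the tonic is located at word level rather than syllable level.

```python
def parse_tone_group(code):
    """Turn one tone group of interfacial code into (word, is_content, is_new) triples.

    Assumptions (not from the source program): a leading "C" marks a content
    word, "[N" or "[NEW" opens the New element and "]" closes it.
    """
    body = code.strip().rstrip(".").strip().strip("/")
    words, in_new = [], False
    for token in body.split():
        if token in ("[N", "[NEW"):
            in_new = True
            continue
        closes_new = token.endswith("]")
        token = token.rstrip("]")
        is_content = token.startswith("C") and len(token) > 1
        word = token[1:] if is_content else token
        words.append((word, is_content, in_new))
        if closes_new:
            in_new = False
    return words


def naive_feet(words):
    """Start a new foot before every content word (the simple rule only)."""
    feet = [[]]
    for word, is_content, _ in words:
        if is_content:
            feet.append([])
        feet[-1].append(word)
    return feet


def tonic_word(words):
    """Return the last content word inside the New element, or None."""
    candidates = [w for w, is_content, is_new in words if is_content and is_new]
    return candidates[-1] if candidates else None


def render(feet):
    """Write the feet in Halliday-style notation; "^" marks a silent beat."""
    parts = []
    for i, foot in enumerate(feet):
        if i == 0:
            if foot:  # unstressed words before the first content word
                parts.append("^ " + " ".join(foot))
        else:
            parts.append(" ".join(foot))
    return "// " + " / ".join(parts) + " //"


example_51 = parse_tone_group("//Now Csilver [N Cneeds to have Clove]//.")
print(render(naive_feet(example_51)))  # // ^ Now / silver / needs to have / love //
print(tonic_word(example_51))          # love
```

Applied to 5.3 (b), the same rule still returns // ^ it / needs to have / love //, which is the rhythm of (a) and not the one Halliday gives for (b).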
So, in cases like example 5.3, an algorithm that simply places foot boundaries before content words would fail to capture the distinction between (a), for which the algorithm would work, and (b), for which it would not, because there the content word “needs” does not get a foot boundary. The more sophisticated algorithm, capable of making this distinction, is still under development.

Once the rhythm of an utterance and the word containing the tonic syllable have been worked out, the next step is to take an intonation contour from a table of intonation contours and to fit it to the utterance. This is done by a process of scaling: the contour is stretched so that it ‘fits’.

As in the rest of the treatment of phonology, pre- and post-tonics are backgrounded in this chapter. Pre- and post-tonics are assigned to each non-tonic foot. These are generated somewhat like the major intonation contour. In Vonwiller’s system appropriate contours are assigned and may be internally randomised, so that the output does not become monotonous. It would be reasonable to expect a very general system to be able to choose whether it ‘wanted’ to sound monotonous or to sound animated. For the time being the meaningful specification of pre- and post-tonics is not supported.

Syntagmatic organisation

Silverman points out (1989, 5.3) that a simple linear fitting of intonation contours would result in unnatural-sounding speech, so it is necessary to shape the contour so that it conforms to the syntagmatic, phonetic rules of intonation. A list of some of these syntagmatic factors appears below; the list is by no means exhaustive, but it is representative of the major syntagmatic factors that the output program will take into account.
The approach to producing intonation taken here is to treat it as an engineering problem. This program is driven from above and from without; the meaningful input to the program and its meaning-making success are the important aspects.

So far this thesis has sketched the interfacial codes that will be used for phonological and graphological text. The focus has been different for graphology and phonology: for graphology, the graphological system itself and the typographic output system have received close attention, while the grammatical systems that are realised in graphology have not been discussed. The discussion of phonology has touched on grammatical systems, but has been less specific about the operation of the output program.

5.2 Looking ahead

In contrast to the treatment of graphology, which concentrated on describing all aspects of typographic text, the treatment of phonology has been rather narrower. One dimension of this narrowness has been the rank-scale: phonological organisation into units larger than the tone group was not considered. One such larger unit is the prosodic paragraph discussed by Silverman (1987). Another is the turn, a unit in the organisation of dialogue. It is expected that the basic notation for prosodic information developed for graphology can be applied to phonology, to allow NIGEL to specify such larger phonological units. It is not clear whether the indexing system for crossed brackets will be needed in adapting the notation to phonology, but it seems likely that the indices will be needed if verbal art is to be considered. For instance, sounded poetry can explicitly mark textual organisation and metrical organisation using the phonology, in the way that the graphological categories of line and sentence can ‘cross’, as in example 4.4.
Silverman’s work (1987) on synthesising prosodic paragraphs shows that the prosodic paragraph - a series of tone-groups which are grouped together with declination in tone, and marked off by a final drop in tone - is a unit. It serves to group text in much the same way as typographic paragraphs do. The prosodic paragraph must consist of a number of whole tone-groups, so a simple bracketing without any indexing seems appropriate, and we might expect NIGEL to produce text of the form [\PP\ // IU // IU // IU // IU //], a number of information units grouped together, where PP stands for prosodic paragraph. (Work by Brazil (1985) on Key, which is more like the musical notion of key-signature than Halliday’s grammatical Key, provides a framework for this aspect of intonation.)

There has been no discussion in this thesis so far of dialogue. Systems like NIGEL are, however, frequently used interactively, for instance in allowing people to access data-bases using natural language. It has already been noted above that NIGEL lacks a suitable interaction base to allow the generation of co-operative texts, but there are a number of systemic models of dialogic discourse (see Martin 1990 for a discussion). One such system is Martin’s, which suggests the units dialogue, exchange and turn as a rank-scale.

Whatever model is adopted for NIGEL, it is expected that, just as for the typographic specification of whole documents, it should be possible to specify certain aspects of the phonetic realisation of phonology for long spans of text. An example of this would be specifications for the selection of male as opposed to female voice (or the other way around), and high volume, for the whole length of telephone dialogues. In the future, it might be expected that voice quality, loudness and accent could be chosen in an informed way, in much the same way as fonts and styles are chosen through graphology.