MAKING PLANS FOR NIGEL: Defining interfaces between computational representations of linguistic structure and output systems: Adding intonation, punctuation and typography systems to the PENMAN system. 

Chapter 1 : Introduction

1.0  What the thesis is about

This thesis describes the design of a mechanism by which the meaningful choices of a computational systemic English grammar, the NIGEL system (Mann & Matthiessen 1983/5.a,b; Bateman & Matthiessen 1990; Penman documentation), are to be converted into both speaking and writing, as sound and image.  The aim is to make the designs as general as possible.  In particular, the focus is on intonation, punctuation and typography and how they can be specified in a formal way.  As well, attention will be paid to how these might be meaningfully controlled by a computational English language generator.

This introduction is largely background: it is necessary to first explain the basic terminology, and the computational system itself.  The form of the thesis reflects the actual path by which work has proceeded.  The first task was to understand the architecture of the computational system and its relation to linguistic theory before proceeding to develop systems that extended the capabilities of the NIGEL system (names of computer programs are in all-caps throughout).

The computational system in question is the ISI (Information Sciences Institute of the University of Southern California) PENMAN system, which is a suite of computer programs.  The PENMAN system has a large systemic grammar, called NIGEL, as well as semantic systems and a number of support tools for developing and testing systemic grammars.  The NIGEL grammar is a generating grammar; it is not a parser (see Kasper 1988).  The current version of NIGEL has no provision for phonological output; its output is graphic text.  The current mechanism by which NIGEL text is graphically realised - the way in which it is punctuated - is theoretically less than adequate; its design is pragmatically dictated, rather than motivated by systemic theory.  So, for the purposes of this project the old punctuation system was redesigned.

In the thesis the terms graphology and graphetics have special meanings.  Graphology is not used to refer to the study of handwriting, as practised by forensic scientists; rather, it is taken as being directly analogous to phonology.  Graphetics is to graphology as phonetics is to phonology (cf Crystal & Davy 1969, Cummings & Simmons 1983).  A special sort of graphetic system is to be used: the computer will output text typographically; handwriting is another example of a graphetic system.

This typographic output is in the form of a text-description language, which describes both the letters and symbols of a text and its spatial organisation.  The letters and symbols can be specified in a number of sizes, fonts, and styles.  Spatial organisation consists of such aspects of a text as the way paragraphs are justified, the way that paragraphs are marked off, and where the typographic text is positioned on the page.

The text description language used in the preliminary stages of this project is called Rich Text Format (RTF) (Word 4 manual).  RTF was chosen because it can be displayed or printed by the readily available and widely used Microsoft WORD program for Apple Macintosh computers. It is a trivial computational task to convert between various text description languages, so the choice of RTF has no particular implications.
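
The kind of text description involved can be illustrated with a minimal sketch in Python.  It is not part of PENMAN: the function names are hypothetical, only a handful of basic RTF control words are used (\rtf1, \ansi, \fonttbl, \f, \fs, \b, \i and \par), and only plain ASCII text is assumed.  Each run of text carries its own size and style, and runs are grouped into paragraphs.

```python
def escape_rtf(text: str) -> str:
    """Escape the three characters that RTF treats as control syntax."""
    return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def styled_run(text: str, bold: bool = False, italic: bool = False,
               size_pt: int = 12) -> str:
    """Wrap one run of text in an RTF group carrying its character formatting."""
    controls = [f"\\fs{size_pt * 2}"]  # RTF font sizes are given in half-points
    if bold:
        controls.append("\\b")
    if italic:
        controls.append("\\i")
    # Formatting set inside a group does not leak out of the closing brace.
    return "{" + "".join(controls) + " " + escape_rtf(text) + "}"


def rtf_document(paragraphs: list[list[str]]) -> str:
    """Join runs into paragraphs and paragraphs into a minimal RTF document."""
    header = "{\\rtf1\\ansi{\\fonttbl{\\f0 Times;}}\\f0 "
    body = "\\par ".join("".join(runs) for runs in paragraphs)
    return header + body + "}"


if __name__ == "__main__":
    print(rtf_document([[styled_run("This is a "), styled_run("cat", italic=True)]]))
```

Because the description is just a string assembled from a small vocabulary of control words, retargeting the sketch to another text description language would mean swapping that vocabulary rather than changing the structure.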

The typographic system that will be used is very general, but it is merely a subset of a ‘universal’ graphetic system.  For this reason the term typography, rather than graphetics, is used to refer to the output from the NIGEL program.

The phonological output will be provided by a commercial speech synthesiser, built by LSI (Loughborough Sound Images) and used by Assoc. Prof. R. King and Dr. J. Vonwiller from the Electrical Engineering department at the University of Sydney.

1.1  Why systemic theory?

The choice of systemic theory for this work was automatic: the NIGEL grammar is a systemic grammar.  Having chosen to work on the system, there is no choice but to utilise the theory which informed its design.  Not surprisingly, systemic theory is well suited to the work described here.  What is surprising is that NIGEL has been allowed to develop along lines which are not perfectly suited to implementing some parts of systemic linguistic theory: this ‘surprise’ will be explained later.  It is important to remember that it is the theory that drives the work on this thesis, not the architecture of the NIGEL system.

The development of NIGEL is summarized by Matthiessen (1988):

The Nigel grammar is a systemic-functional grammar for text generation, developed at the Information Sciences Institute (ISI)/ USC. It is essentially that type of systemic grammar outlined in Halliday (1969).  (Davey’s (1978) systemic generation grammar for his Proteus system followed Hudson’s (1971) systemic grammar more closely).  The first version of the Nigel grammar was developed at UC Irvine by Mark James and Michael Halliday; it was then taken over as part of the Penman project at ISI, led by Bill Mann (see eg. Matthiessen 1981, 1983.a,b; Mann 1983).  One version of the grammar was used by Terry Patten (1988) in his SLANG system, which also incorporates a systemic semantic level of the type proposed by Halliday (1973).

Systemic functional grammars and similar systems have been very widely used in text generation - together with McDonald’s MUMBLE, NIGEL is the most widely used generation grammar system.  Winograd (1983: ch.6), writing on “feature and function grammars”, shows how Functional Unification Grammar, Lexical Functional Grammar and their relatives resemble the kind of Systemic Functional Grammar (SFG) that had been developed by Halliday (eg. 1967; 1969) by the late 1960s.  In Shieber (1986) the similarities between these grammars are recognised: they are all “unification” based (see also Kasper 1988).

The line between theory and application is not clearly drawn in systemics. Fawcett (1988) puts it that:

The fact that we do not have a priesthood of theoreticians implies that we are all, if only potentially, theoreticians. By this I mean that every time we try to use a piece of systemic descriptive apparatus and we find that it doesn’t quite do the job as we feel it should - we become theoreticians.

This aspect of systemic theory has caused it to advance by a process of evolution, rather than by revolution (Matthiessen 1989).  Continuing in this tradition, the work in phonology and graphology presented here will not generate its own new theory: it will look to previous accounts and use a process of adaptation and refinement, rather than one of replacement.  By adopting this approach new theoretical developments will fill gaps not yet filled by past work; the idea is not to ‘reinvent the wheel’.

1.2  Basic principles of systemics

There are some basic notions of systemic theory that are referred to in this thesis; they will be mentioned below.  However, there is no space to give the reasons for their importance.  Good introductions to the material below can be found in Matthiessen (1988), Martin (1990), Halliday (1978, 1985.a), Halliday & Martin (1981) and Butler (1985).

Stratification

The work presented here uses a model of English which consists of four strata.  There is some debate about the number of strata (cf Matthiessen 1989) but there is no space to discuss it here.  Matthiessen summarises the relationship between strata:

Strata are related symbolically; patterns at one stratum are re-expressed or realised by patterns at another stratum.  The strata are ordered in relation to one another so that higher strata are realized by lower ones (Halliday, 1961 and many other places) for example, grammar is realized by phonology. (Matthiessen (1989))

The four strata are, from ‘top’ to ‘bottom’: semantics; grammar; phonology/graphology; and phonetics/graphetics.  Figure 1.1 is a map of this stratificational architecture.  The widening of the picture at the top represents the way that language interacts with context; the narrowing towards phonetics shows the way that the diverse and infinite ‘world’ is transduced by language into a finite system of sounds.  The double-line boundary between grammar and phonology represents the arbitrariness of the realisational relationship that applies across that boundary.  In contrast to this, grammar is not arbitrary in relation to semantics (Halliday 1961, Matthiessen 1989).  The break between phonology and phonetics represents the boundary between language and expression systems.  This boundary will be highlighted throughout the thesis.

FIGURE 1.1
The stratal model of language.

Meanings are expressed by realisations of context in semantics, semantics in grammar, grammar in phonology or graphology, and graphology or phonology in graphetics or phonetics.

Meta-functions

Most linguistic work has concentrated on ideational meaning (cf Halliday, 1977).  Ideational meaning has two subtypes: experiential, expressed through transitivity in the clause - the representation of events, their participants and circumstances; and logical, which organises relationships between clauses.

Systemic Functional Linguistics does not concentrate only on ideation - propositions about who did what to what, where, when and how - and the logical relations between those statements.  There are two other functional modes of meaning taken into account - the interpersonal and the textual.  Interpersonal resources set up social relations between interactants in linguistic events, speakers and hearers.  Textual resources allow information to be managed, to build discourse.

The computational perspective

From a linguistic, theoretical viewpoint computational grammars offer certain advantages: they force grammarians to express their grammars explicitly and formally; and they enable large scale testing, with much larger amounts of data than could be treated by hand.  In general, computational grammars help in finding out if theoretical and descriptive notions actually work.  The NIGEL grammar in particular has been useful to systemic linguistics in developing more rigorous grammars, and in formalising the nature of the systemic mode of description itself.

The work in this thesis continues in the tradition of simultaneous practical design and theoretical refinement.  On one hand, the practical side of the work is in making the NIGEL system talk, and in making it write more fluently - taking more control over the ‘look’ of the final written product.  On the other hand, theoretical conceptualisations of the nature of written and spoken language have been given high priority.  Theory has guided the design of the grammatical systems, and of the computational environment which supports them.  Thus the theoretical side of the work has supplied a guiding hand to the descriptive side - and the results of the practical side of the work are expected to generate modifications to the grammar.  Ultimately this may even entail modifications to the systemic theory itself.

1.3  Interface: a definition

The term interface is not used here to refer to the arena of interaction of a person and a machine.  Interface here refers to the arena of interaction between two (linguistic) strata.  One of these strata is a grammar - it can be either a theoretical grammar or a computational, automatic grammar - while the other stratum is a physical mechanism which produces some sort of sound or graphic output - again the system in question could be either a theoretical model or a computer system.  The interface in focus here is represented by the break between the phonology / graphology stratum and the phonetic / graphetic stratum in Figure 1.1.

The term interface has been used in a similar sense within stratificational theory, by Herrick (1983), to refer to the arena of the interaction of two strata.  Bateman and Mann (1988) also use the term: “we will develop an interface between abstract specifications of intonational resources and particular control parameters for text-to-speech systems”.  Their interface is more specialised than the one under consideration here, but the underlying principle is the same.  The “abstract specifications of intonational resources” on the grammatical face interface with “control parameters for text-to-speech systems” on the mechanical face.

The interface between the ‘emic’ and ‘etic’ strata is not the only interface in NIGEL: there is an interface between the lexico-grammar and semantic strata which is already implemented (cf figure 1.2 below).

1.4   An overview of the Penman system

The Penman system was developed at the Information Sciences Institute (ISI), the University of Southern California (USC). The version discussed here runs on an Apple Macintosh II computer.  At the heart of the system is a systemic grammar of English called NIGEL.

The text generation system consists of the following parts: the grammar; the ‘environment’; and the traversal algorithm.  These three parts make up a suite of computer programs.  They can be controlled both directly by users and automatically by large databases.

How it works

It is necessary to understand some aspects of the way in which the NIGEL grammar works in order to follow the argument of this thesis.  The following description is based on a paper by Matthiessen (1988), Nigel - A Systemic Functional Generation Grammar.

The key to PENMAN’s operation is the environment.  The environment can be seen as consisting of three bases of support for the generation process - the ideation base or ‘knowledge’ base, the interaction base or ‘user model’ and the text base or ‘discourse model’.  The latter two of these are particularly relevant to intonational meaning.

As one would expect from the metaphor, NIGEL in some sense ‘exists in’ its environment.  NIGEL can ‘communicate’ with the environment through the chooser and inquiry interface outlined below.

The generation process can conveniently be broken down into two parts: (i) the process of global text generation, which involves text planning, information management and keeping track of interactants; and (ii) more local text generation, which is driven by NIGEL making demands on the environment.   Figure 1.2 is a conceptual map of Penman; it can be thought of as a look ‘into’ the vase model.  The bold text and lines in the diagram locate the two processes outlined above, while the rest of the diagram (not in bold) shows the resources of the PENMAN system.

Figure 1.2
A conceptual map of PENMAN locating resources and processes with respect to the stratal model.

  1. Global generation

    The organisation of clauses into whole texts is handled by the global generation process - where the initiative is taken by the environment. For the purposes of this thesis the most important part of the generation process is the ‘local’ generation, which generates clauses.  Here the generation process is guided by the organisation of the grammar.

  2. Local generation

    This description will refer to Figure 1.3, which is a collection of Matthiessen’s (1988) diagrams. It shows a fragment of the NIGEL grammar, and the “chooser and inquiry” interface between it and NIGEL’s environment.  The stratal boundary between the grammar and the semantic stratum is shown on the diagram with a bold horizontal line.

Figure 1.3
Part of the grammar driven text-generation system.

The grammar as a resource

The grammar is a resource. It is the piece of the system that can be used for either generation or parsing. The resource is modular with respect to the procedural part of the program.  That is to say, it is possible to change the grammar without changing the generation program and vice versa.  The grammar is essentially a system network with associated realisation statements, a fragment of which is shown in Figure 1.3 - the fragment is from the interpersonal part of the grammar that generates the mood structure of a clause.

The traversal and choosers

A system network is somewhat like a flow chart: it is a directed graph. The path taken through the graph is conditioned by decisions taken at various nodes in the graph.  The basic unit in a network is the system.  For example, the dotted rectangle labelled system on the diagram encloses one system.  This system, with the square bracket notation, represents an “either/or” choice between the two output features, indicative and imperative; the feature clause is an entry condition to this system.  (Throughout the thesis, when features are referred to in the text they will be in the same sans serif font used in the diagrams (Avant Garde).)

The traversal program will pick one of these output features depending on what the PENMAN system is to say.  The decision is made by a system of inquiries and choosers.  Each system has a chooser associated with it.  The chooser consists of one or more inquiries, which it presents to NIGEL’s environment to obtain the information needed to make an appropriate choice.  The inquiries in Figure 1.3 are shown with English glosses, but the actual implemented inquiries are couched in the terms of PENMAN’s semantic systems (see eg. Nebel & Sondheimer 1986).  So, the small-scale generation of texts involves NIGEL making demands on the environment.
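
The relationship between a chooser, its inquiries and the environment can be sketched in Python as follows.  This is a minimal illustration under stated assumptions, not the PENMAN implementation: the environment is reduced to a lookup table, the inquiry names are invented English glosses, and the pairing of a declarative / interrogative system with the feature indicative is assumed for the example.

```python
from typing import Callable, Dict

# The environment is modelled as a function from an inquiry to an answer.
Environment = Callable[[str], str]


def make_environment(facts: Dict[str, str]) -> Environment:
    """Build a toy environment that answers inquiries from a fixed table."""
    def answer(inquiry: str) -> str:
        return facts[inquiry]
    return answer


def mood_type_chooser(ask: Environment) -> str:
    """Chooser for the indicative / imperative system: one inquiry decides."""
    if ask("is-the-speaker-commanding?") == "yes":
        return "imperative"
    return "indicative"


def indicative_type_chooser(ask: Environment) -> str:
    """Chooser for an assumed declarative / interrogative system."""
    if ask("is-the-speaker-requesting-information?") == "yes":
        return "interrogative"
    return "declarative"


if __name__ == "__main__":
    env = make_environment({
        "is-the-speaker-commanding?": "no",
        "is-the-speaker-requesting-information?": "yes",
    })
    print(mood_type_chooser(env), indicative_type_chooser(env))
```

The point of the arrangement is that the grammar never inspects the environment directly: each chooser sees only the answers to the inquiries it presents.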

The entry condition to the indicative / imperative system is a simple one: the only required feature is clause. There is also a complex entry condition on the diagram; this is an “or” entry condition: either declarative or imperative features are sufficient to enter the tagging system.  The other type of entry condition, not seen here, is the “and” type, which requires a number of features to have been chosen before the system can be entered.  There is a complete key to systemic notation in Appendix I.
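
Entry conditions and their effect on traversal can be sketched in Python.  The sketch is illustrative only: the class and function names are hypothetical, the output features tagged and untagged are assumed for the tagging system, and the choice at each system is supplied by a caller-provided function standing in for a chooser.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass(frozen=True)
class System:
    name: str
    entry: Tuple[str, ...]      # features named in the entry condition
    mode: str                   # "or": any one suffices; "and": all are required
    outputs: Tuple[str, ...]    # the either/or output features

    def enterable(self, chosen: Set[str]) -> bool:
        """True when the features chosen so far satisfy the entry condition."""
        test = any if self.mode == "or" else all
        return test(feature in chosen for feature in self.entry)


def traverse(systems: List[System], pick: Callable[[System], str],
             start: str) -> List[str]:
    """Enter each enterable system once, until no further system can be entered."""
    chosen: List[str] = [start]
    entered: Set[str] = set()
    progress = True
    while progress:
        progress = False
        for system in systems:
            if system.name not in entered and system.enterable(set(chosen)):
                feature = pick(system)
                if feature not in system.outputs:
                    raise ValueError(f"{feature!r} is not an output of {system.name}")
                chosen.append(feature)
                entered.add(system.name)
                progress = True
    return chosen


NETWORK = [
    System("mood-type", ("clause",), "or", ("indicative", "imperative")),
    System("indicative-type", ("indicative",), "or", ("declarative", "interrogative")),
    System("tagging", ("declarative", "imperative"), "or", ("tagged", "untagged")),
]

if __name__ == "__main__":
    choices = {"mood-type": "imperative", "tagging": "untagged"}
    print(traverse(NETWORK, lambda system: choices[system.name], "clause"))
```

In the usage shown, the imperative path reaches the tagging system without ever entering the indicative-type system, which is the behaviour the “or” entry condition describes.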

Realisation and the blackboard

Each time a choice is made in traversal any realisation statements associated with a chosen feature are executed.  The structure described by realisation statements is built up in a part of the program called the blackboard.  Realisation statements appear in the boxes under features in the diagram.  Each of these describes some aspect of the clause, either syntactic or lexical, being generated.  Take for example the feature indicative; in this case the structure is syntactic - indicative has a realisation statement that adds a subject slot to the utterance, and says that the Mood of the clause will consist (initially) of a subject.  The blackboard in Figure 1.3 shows a blackboard for a clause such as:

1.1
This    | is     | a cat   | isn’t     | it
Subject | Finite |         | Tagfinite | Tagsubj.
Mood             | Residue | Moodtag

Another possible path through the network, choosing the features clause, indicative, interrogative and yes/no would generate the blackboard structure Mood(Finite^Subject); meaning that the Mood consists of a Finite element and a Subject, in that order. This is the mood structure of the following:

1.2
can    | I       | help you
Finite | Subject |
Mood             | Residue
All three meta-functions build structure in the blackboard: as well as the interpersonal structure of mood, there is the transitivity structure, which shows the roles of participants, circumstances, and the process-type of the clause; and the theme/rheme structure, which shows the textual structure of the clause.  A full discussion of these aspects of clause structure can be found in Halliday (1985.a).
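
The way realisation statements build structure on the blackboard can be sketched in Python.  Only the interpersonal (mood) structure is modelled.  The source states that indicative adds a Subject to the Mood; which features insert the Finite and fix the ordering is an assumption made for this example, and the class and function names are hypothetical.

```python
from typing import Callable, Dict, List


class Blackboard:
    """Holds the function structure built up while features are chosen."""

    def __init__(self) -> None:
        self.parts: Dict[str, List[str]] = {}   # e.g. {"Mood": ["Subject"]}

    def insert(self, whole: str, function: str) -> None:
        """Add a function slot to the end of a larger unit."""
        self.parts.setdefault(whole, []).append(function)

    def order(self, whole: str, first: str, second: str) -> None:
        """Place `first` immediately before `second` within `whole`."""
        slots = self.parts[whole]
        slots.remove(first)
        slots.insert(slots.index(second), first)

    def show(self, whole: str) -> str:
        """Render one unit in the Mood(Finite^Subject) style of notation."""
        return f"{whole}({'^'.join(self.parts[whole])})"


# Realisation statements attached to features (an assumed assignment).
REALISATIONS: Dict[str, List[Callable[[Blackboard], None]]] = {
    "indicative": [lambda bb: bb.insert("Mood", "Subject")],
    "interrogative": [lambda bb: bb.insert("Mood", "Finite")],
    "yes/no": [lambda bb: bb.order("Mood", "Finite", "Subject")],
}


def realise(features: List[str]) -> Blackboard:
    """Execute the realisation statements of each chosen feature, in order."""
    blackboard = Blackboard()
    for feature in features:
        for statement in REALISATIONS.get(feature, []):
            statement(blackboard)
    return blackboard


if __name__ == "__main__":
    path = ["clause", "indicative", "interrogative", "yes/no"]
    print(realise(path).show("Mood"))
```

Run on the path clause, indicative, interrogative, yes/no, the sketch yields Mood(Finite^Subject), matching the structure given for example 1.2.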

When the structure, complete with lexical items, has been generated it must be output. If the architecture of the system is to be strictly stratal then this should be done by a process of realisation.  The architecture of the old NIGEL system is not stratal, but it will be made so in the new version developed here; the description of this process of realisation is the subject of chapters two and three.

The traversal program repeatedly cycles through the grammatical network, starting at clause level, generating the gross structure of a clause, and then re-entering the network until everything is exhaustively described.

The placing of the interface with respect to NIGEL

The new output programs will be located, in terms of the vase picture, in the phonetic/graphetic stratum.  The nature of the realisational relationship between the ‘emic’ and the ‘etic’ strata will be clarified below in section 1.5.

Important differences with other theories

It is important not to make the mistake of thinking of systemic grammars as big fancy phrase structure rules (Matthiessen 1988). The mode of organisation of NIGEL is as a paradigmatic resource - it is organised according to meaning rather than syntagmatic distribution.

The grammar is more accurately described as a lexico-grammar. Perhaps a more accurate description still would be morpho-lexico-grammar, as the lexicon and morphology are both handled in the grammar itself.  Thus systemic theory conflates parts which are separated and modularised in other theories.  Generative linguistics separates out morphology and lexicon from syntactic grammar (cf Horrocks 1987 ch.1).  By contrast, the systemic approach is to treat morphology as the lowest-ranking grammar and lexis as the most delicate grammar, and to specify structural fragments as realisations in paradigmatic contexts defined by the system network, through realisation statements and the blackboard (cf Halliday 1961; Matthiessen 1988).

1.5  Design parameters and principles for the interfaces

There are three primary desirable attributes for an interface between a grammar and its substantive realisation as sound or image, listed below.

  1. Stratal Realisational Principle (SRP): a strict stratificational model should be adhered to.  This principle means that the output module should make no meanings of its own, rather it should realise graphological structure, and that the output system itself should play no part in the building of lexico-grammatical or graphological structure.  In line with this principle, the strategy taken in this project was not to build a whole new meaning making resource ‘below’ or ‘beside’ the existing grammar; instead the original NIGEL lexico-grammar was extended.

  2. Stratal opacity principle (SOP): the interface should be opaque.  The output program should be able to function without ‘looking back’ into the grammar and the grammar should not be able to look into the output program (nor should they need to do such ‘looking’).  All information should be passed in a predictable way.

  3. Stratal unidirectionality principle (SUP): the interface should be one-way - from the grammar to the output module.  The output module should not need to communicate anything to the grammar.

If an interface satisfies these principles then the ‘lower’ side of the interface, the synthesising process, should be describable as a procedural algorithm - it will not need to interact with other strata. All of the input to the module comes from the interface, so it should be possible to write a computer program to do the output. The output modules are referred to as output programs, to emphasise the deterministic nature of the speech- and typography-synthesising processes.
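
What the three principles imply for an output program can be sketched in Python.  The sketch is a hypothetical illustration, not the interface designed in later chapters: the feature names (sentence-initial, question, statement) and the class and function names are invented for the example.  The interface specification is an immutable value handed down from the grammar, and the output program is a function of that value alone.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Piece:
    """One span of wording plus the graphological features chosen for it."""
    wording: str
    features: Tuple[str, ...] = ()


def output_program(pieces: Tuple[Piece, ...]) -> str:
    """Realise an interface specification as text, with no access to the grammar."""
    rendered = []
    for piece in pieces:
        text = piece.wording
        if "sentence-initial" in piece.features:
            text = text[:1].upper() + text[1:]
        # The choice between marks was made upstream; it is only realised here.
        if "question" in piece.features:
            text += "?"
        elif "statement" in piece.features:
            text += "."
        rendered.append(text)
    return " ".join(rendered)


if __name__ == "__main__":
    spec = (Piece("can I help you", ("sentence-initial", "question")),)
    print(output_program(spec))
```

The function takes no reference to a grammar or blackboard (opacity), returns nothing to them (unidirectionality), and decides nothing that was not already specified in its input (realisation without meaning-making).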

These three principles have great practical value. They describe a maximally modular system. Their value lies in two areas.

  1. From a computational point of view modularity allows development of various components independently: modular systems are much more tractable than non-modular ones.

  2. From a systemic-theoretical viewpoint, modularity is equally desirable. Systemicists have been building modular theories since Halliday (1961), for many of the same reasons that computer systems designers build modular programs.  Indeed, no linguistic work could proceed without some sort of modular approach.  The systemic modularity is many-dimensional and functionally motivated; this design follows the theoretical modularity of stratification outlined above.

Clean interfaces

We might call an interface relationship that satisfies the first two points (SRP and SOP) above a “clean interface”.  The metaphor of cleanliness is used to give a sense of the high value to be placed on such interfaces.  Interfaces which are not clean tend to make for confusion and difficulty in engineering: it is very difficult to work with a system whose modules interact in an unconstrained way.  So, to the systems engineer, interfacial relationships which are not clean are ugly.

In the existing system, text output is interpreted from the NIGEL program’s internal grammatical representation after the program has finished working out the function structure of a nascent utterance.  This situation is illustrated in Figure 1.4; there is no clear distinction between the traversal program and an output module.  The output program is part of the grammatical traversal. The interface, complete with punctuation algorithm, was added as an afterthought, without much theoretical guidance (Matthiessen, personal communication).

The old NIGEL output system could not be described as a clean interface since it violates the first two ‘clean’ principles.  Firstly, the interface is not between strata; rather it is part of the grammatical traversal program.  Secondly, it is not opaque - the punctuating process ‘digs around’ in the grammar and the blackboard to find the information that is output as punctuation (thus violating the SOP). For example, the grammar is interrogated to establish whether a clause is followed by another in a clause complex, and whether it is interrogative.  While it is essentially one way - with the punctuation system not affecting the process of grammatical traversal - thus satisfying the SUP, the old output system is not clean.  In fact it is poorly documented and difficult to understand, modify and maintain.  This thesis outlines the general form of the interface that is to replace it.

FIGURE 1.4
‘Before & after’ pictures of NIGEL’s output interface.

There is a one-to-one correspondence between any given internal lexico-grammatical structure and a derived punctuation.  There are no systems in the lexico-grammar itself which are sensitive to the way something is to be punctuated - thus the meaning-making potential of punctuation is not being exploited.  This may seem a trivial point, but it has theoretical repercussions which are far reaching.  Punctuation is discussed in chapter three.

If a phonological output program for NIGEL which worked the same way as the current text output program were to be added, then there would be a similar one-to-one mapping: each lexico-grammatical product would produce one phonetic product.  NIGEL is a grammar of written English, and as such it does not produce those grammatical structures which can be directly realized as phonological prosodies.  A phonological output program added on in this way would have two deficiencies: a lack of the relevant lexico-grammatical structures; and a ‘dirty’ interface.

This thesis describes the work that has been done in extending the NIGEL lexico-grammar by adding both a phonological and a graphological stratum, and designing a clean interface between the lexico-grammar and an output program.

Should there be one interface or two?

There will be two interfaces, specialised for the two modes of output. The two will both share some of the extensions to the grammar - this is because both phonology and graphology express some of the same structure generated by phono-grammatical systems, notably the wording.  Examples of the similarities and differences are plentiful later in the thesis.  The picture of the proposed ‘clean’ NIGEL architecture is in Figure 1.4.

1.6  Prosody

Two of the main concerns of this thesis, intonation and punctuation, both organise language prosodically. In the terms of this thesis prosody is any supra-segmental mode of linguistic organisation, an approach that is in the tradition of Firth (1948) and Halliday; Palmer (1970), commenting on the work of J.R. Firth, puts it that (paraphrased): prosody is anything that is not segmental.

The approach taken here differs from the concern of the generative phonology school with supra-segmental organisation: there are some aspects of syntagmatic organisation which are common to a number of segmental units, e.g. Tone (Kenstowicz & Kisseberth 1979 provide an introduction).  The auto-segmental phonology school tends to pay a lot of attention to the way that prosodies are tied to segmental units, and their phonology has a segmentation in the supra-segmental tier.  In contrast to this, the model adopted here takes a more pragmatic (in the non-linguistic sense) approach to supra-segmental organisation.  There is no theoretic emphasis on the way in which prosodies are ‘tied’ to segmental strings; that is left as an engineering problem.  All of the rule-based procedural phonology is left to that component of the system which does not make any meanings - the output program.

For the purposes of this thesis the notion of prosody needs to include writing as well as speaking.  Firth (1948) described the question mark as a prosody: although it is realised by a particle the question really belongs (typically) to a whole sentence.  It is possible to conceive of other syntagmatic ways of realising this prosodic ‘question-ness’, for example one could draw blue circles around questions - and red and green circles respectively around sentences which would have been marked with full-stops or exclamation points (cf. the Spanish ‘upside-down’ question mark which comes before questions, in addition to the “?” that comes at the end).
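
The separation between a prosody and the way it happens to be realised can be sketched in Python.  The convention names and the function are hypothetical; the sketch shows only that ‘question-ness’ is a property of the whole piece, while the marks that express it are a matter of convention.

```python
from typing import Callable, Dict

# Each convention maps a whole piece to its marked form (names are hypothetical).
QUESTION_CONVENTIONS: Dict[str, Callable[[str], str]] = {
    "final-mark": lambda piece: piece + "?",
    "bracketing-marks": lambda piece: "¿" + piece + "?",
}


def realise_question(piece: str, convention: str) -> str:
    """Realise the 'question' prosody over a whole piece under one convention."""
    try:
        mark = QUESTION_CONVENTIONS[convention]
    except KeyError:
        raise ValueError(f"unknown convention: {convention!r}") from None
    return mark(piece)


if __name__ == "__main__":
    print(realise_question("can I help you", "final-mark"))
    print(realise_question("puedo ayudarte", "bracketing-marks"))
```

Adding a further convention, such as a coloured enclosure, would mean adding one entry to the table; the specification that a piece is a question would not change.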

Firth (1948) used the term piece to refer to text spans of any length.  Thus one can talk of letters, words, syllables, phonemes or sentences as pieces of text.  The term will be used in this thesis in the same way, for spans of both written and spoken language.

1.7  Generality

It was stated above that the interfaces to phonological and graphological output engines are to be as general as possible.  The intention is to design interfaces which are capable of the greatest possible extension.  A very general graphological interface might be designed for the realisation of lexico-grammatical structure in different fonts for example, while a very general phonological interface might be able to specify voice quality.  Neither of these possibilities is to be realised in the immediate future - but it is considered important not to build the system in such a way that it could not handle these things at a later date.

There are two dimensions along which the project is not designed to be general. Neither (i) cross-linguistic generality nor (ii) cross-theoretical generality is claimed for this work.

  1. There is on-going work on a German version of NIGEL at GMD/IPSI, Germany, and within the multi-language generation project at The University of Sydney work has started on Chinese and Japanese NIGEL grammars.  In the mid 1980s a NIGEL-compatible grammar of Japanese was developed by John Bateman and others at Kyoto University in Japan (see Bateman et al, 1987, Bateman & Matthiessen, in press). Work will start on other languages soon. We are now in a position to extend the work on interfaces described in this thesis to these projects.

  2. There is no generality sought across different linguistic theories. There are some things built into the interface which are specific to a Hallidayan English grammar. An example of this is the Given/New structure which is described in the next chapter.

Paralinguistic phenomena

While it is well known that punctuation is linguistic, there are other modes of graphetic organisation, which have not been treated linguistically.  Halliday (1985.b p.30) outlines the difference between prosodic linguistic and paralinguistic features for spoken language:

The difference between… [prosodic and paralinguistic features] …is as follows.  Prosodic features are part of the linguistic system; they carry systematic contrasts in meaning, just like other resources in the grammar, and what distinguishes them from these other resources is that they spread across extended portions of speech, like an intonation contour, for example.  Paralinguistic features also extend over stretches of varying length; but they are not systematic - they are not part of the grammar, but rather additional variations by which the speaker signals the import of what he is saying.

This is a global perspective on what is systemic and what is not - it is impossible to systematise paralinguistic features in a very general way.  Halliday lists such features as “tambre (breathy, creaky etc. voice qualities), tempo, loudness, facial and bodily gestures” as paralinguistic.  However, from a local perspective these may be quite systematic.

A richest possible phonology

People make use of ‘paralinguistic’ features in a systemic way: all of the paralinguistic aspects of speaking listed above can be used systematically.  However, in a global sense there is no ‘standard’ grammar which would adequately explain the way that, for example, variations in tempo are used by speakers.  A very general phonological component in a generation system might allow such control by operating in a locally systemic way.

A richest possible graphology

The above quote comparing the prosodic and paralinguistic comes from a section in Halliday (1985.b) entitled “What writing leaves out”, the implication being that there are fewer prosodic aspects of writing than there are of speaking.  However, Halliday does go on to describe the prosodic function of punctuation.  This thesis will explain how punctuation can be produced by the NIGEL system in the same way as intonation.  It will try to ‘push back’ the boundary between the prosodic and the paralinguistic.  Paralinguistic aspects of written text include its spatial organisation and the type-faces that it is printed in; since these can be shown to be systematic they can be treated as prosodic.  The model developed here allows these locally-systemic prosodies to be incorporated in the generation system.

The major source of examples in this thesis is one text, the liner notes to a compact disc - James Brown’s In the Jungle Groove.  It is reproduced in figure 1.5.  In this text major shifts in the discourse structure are signalled with large capitals at the beginning of paragraphs.  As well as the large-scale typographic distinctions, there are some systematic distinctions made between sub-sentence units: proper names have three different realisations, with song names in italics, album names in all-caps, and other names in the standard initial capitalisation.

While there is some literature on the function of paralinguistic graphetic features, it tends to be quite anecdotal.  Crystal & Davy (1969) talk about typography and punctuation in terms of genre; Cummings and Simmons (1983) look at the graphetic organisation of literature; see also a number of articles in “Visible Language” issues XX 1 and XXXIII 1, notably Berry (1989).  For example, type-face choice is often cited as systematic, but it is typically described very vaguely (cf. Hofstadter 1983, Knuth 1982).  The NIGEL system provides a good place to start developing and testing models of typographic systems, capable of signalling what Halliday (1985.b p.30, quoted above) described as “the import of what he is saying”.  Although typography is a paralinguistic matter in a global sense, it will be shown in chapter three that it can be seen as a prosodic linguistic feature in a local sense: that is, typography can be described systematically for groups of texts, and the description of typography can be done in the same terms as the description of punctuation.

The recent introduction of typographic systems to small computers, effectively giving users the physical resources of a typesetter and an art-department, has been a significant inspiration to this work.  These typographic systems allow specification of typography in formal codes.  The codes essentially provide a discrete state interface between typography and users.  Such an interface makes it possible to design a very general graphological stratum which can, to return to the Jungle Groove example, specify that all song names are to be realised in italics and that major shifts in a text are to be marked off with a large capital letter.
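
The kind of discrete typographic code described here can be sketched in a few lines.  The following Python fragment is an illustration only, not the interface developed in this thesis: the category labels (song, album, other), the tag syntax and the function names realise_name and mark_shift are all assumptions made for the example.  It maps the three realisations of proper names in the Jungle Groove text, and the large initial capital that marks a major shift, onto discrete codes.

```python
# Hypothetical sketch of a discrete typographic code.
# The tags <i>...</i> and <big>...</big> are assumed stand-ins for whatever
# formal codes a particular typesetting system actually accepts.

def realise_name(text: str, kind: str) -> str:
    """Return a proper name marked up according to its (assumed) category."""
    if kind == "song":
        return f"<i>{text}</i>"      # song names: italics
    if kind == "album":
        return text.upper()          # album names: all-caps
    if kind == "other":
        # other names: standard initial capitalisation of each word
        return " ".join(w[:1].upper() + w[1:] for w in text.split())
    raise ValueError(f"unknown name category: {kind}")


def mark_shift(paragraph: str) -> str:
    """Mark a major discourse shift with a large capital on the first letter."""
    return f"<big>{paragraph[:1].upper()}</big>{paragraph[1:]}"


if __name__ == "__main__":
    print(realise_name("In the Jungle Groove", "album"))
    print(realise_name("funky drummer", "song"))
    print(mark_shift("the band plays on"))
```

The point of the sketch is that every typographic choice is a selection from a small closed set, which is what allows a graphological stratum to specify it.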

The non-discrete and complicated nature of acoustic phonetics does not lend itself to computational control in such a way; indeed, it is very hard to engineer understandable speech synthesis with even the rudiments of intonation.  This means that the graphological systems in NIGEL can be considerably in advance of the phonological.  While speech synthesis technology is a long way from allowing us to specify voice quality and volume in a discrete language, we can specify font-choice, size, and case.  There is unfortunately not enough room in this thesis to dwell on the ways that choices of font and voice quality are used systematically; the focus is on how those choices may be represented.

There are some modes of graphetic organisation which will remain in the domain of the paralinguistic: the examples given in figure 1.6 show an advertisement, in which one letter appears in a different type-face to the rest of its word, and two versions of Carroll’s “The Mouse’s Tale”.  In both cases an intuitive understanding of the meaning of the graphetic organisation is readily available (the iconicity of the tail, and the ‘difference’ of the letter “i”), but there is no underlying system by which any poem, or any word, could receive the same graphetic realisation.  This is in contrast to the In the Jungle Groove text, in which there is a graphetic, typographic system.

Figure 1.5
The Liner-notes to James Brown’s “In the Jungle Groove”.

Figure 1.6
Three examples of paralinguistic graphetic organisation

1.8  Other similar work

Bateman and Mann (1988) proposed a project similar to this one, looking at “the abstract specification of intonation”; the most obvious difference is the attention paid by this thesis to graphology as well as phonology.  The proposal was not implemented, due to lack of funding.  Silverman (1987) provides a good survey of more general work on intonation in speech synthesis.

1.9  Outline of the thesis

In summary of sections 1.1 to 1.8, this thesis could be described in terms of a problem and its solution.  The problem is essentially a three-part lack in the old PENMAN system:

  1. The old version of NIGEL has an inadequate system architecture; that is, it does not have a clean interface between graphology and an output program.

  2. In its original form NIGEL has no phonological output system.

  3. The old NIGEL system also lacks some lexico-grammatical systems.  There are no systems for making meanings through anything but wording, no systems for meaningful intonation, and no systems for the meaningful manipulation of punctuation or typography.

Restatements of these general problems, and their solutions, are embedded in the rest of the thesis.

The second chapter discusses Hallidayian phonology, and what is involved in adding phonology to the lexico-grammar, as well as looking briefly at the speech synthesiser and specifying the form that phonological output will take.  Chapter three follows with a description of the graphology.  There was no model that could simply be adopted, so the graphological work is my own synthesis of the work of Halliday (1985.b) and Waller (1980) with standard systemic theory.  The discussion in that chapter concentrates on describing only the graphological system, without looking at the lexico-grammar.

Chapters four and five discuss the interfaces themselves, graphological and phonological respectively.  Both interfaces use the same representational formalism.  The focus in the graphological chapter is on the interfacial code itself, and on actual examples of typography produced from it.  The focus in chapter five is on an overview of the speech synthesiser.  The conclusion (chapter six) presents the agenda for further development, in coding new parts of NIGEL and some theoretical considerations.

All three meta-functions build structure in the blackboard: as well as the interpersonal structure of mood, there is the transitivity structure, which shows the roles of participants, circumstances, and the process-type of the clause; and the theme/rheme structure, which shows the textual structure of the clause.  A full discussion of these aspects of clause structure can be found in Halliday (1985.a).

When the structure, complete with lexical items, has been generated it must be output. If the architecture of the system is to be strictly stratal then this should be done by a process of realisation.  The architecture of the old NIGEL system is not stratal, but it will be made so in the new version developed here; the description of this process of realisation is the subject of chapters two and three.

The traversal program repeatedly cycles through the grammatical network, starting at clause level, generating the gross structure of a clause, and then re-entering the network until everything is exhaustively described.

The placing of the interface with respect to NIGEL

The new output programs will be located, in terms of the vase picture, in the phonetic/graphetic stratum.  The nature of the realisational relationship between the ‘emic’ and the ‘etic’ strata will be clarified below in section 1.5.

Important differences with other theories

It is important not to make the mistake of thinking of systemic grammars as big fancy phrase structure rules (Matthiessen 1988). The mode of organisation of NIGEL is as a paradigmatic resource - it is organised according to meaning rather than syntagmatic distribution.

The grammar is more accurately described as a lexico-grammar. Perhaps a more accurate description still would be morpho-lexico-grammar, as the lexicon and morphology are both handled in the grammar itself.  Thus systemic theory conflates parts which are separated and modularised in other theories.  Generative linguistics separates out morphology and lexicon from syntactic grammar (cf. Horrocks 1987 ch.1).  By contrast, the systemic approach is to treat morphology as the lowest-ranking grammar and lexis as the most delicate grammar, and to specify structural fragments as realisations in paradigmatic contexts defined by the system network, through realisation statements and the blackboard (cf. Halliday 1961; Matthiessen 1988).

1.5  Design parameters and principles for the interfaces

There are three primary desirable attributes for an interface between a grammar and its substantive realisation as sound or image, listed below.

  1. Stratal Realisational Principle (SRP): a strict stratificational model should be adhered to.  This principle means that the output module should make no meanings of its own; rather, it should realise graphological structure, and the output system itself should play no part in the building of lexico-grammatical or graphological structure.  In line with this principle, the strategy taken in this project was not to build a whole new meaning-making resource ‘below’ or ‘beside’ the existing grammar; instead the original NIGEL lexico-grammar was extended.

  2. Stratal opacity principle (SOP): the interface should be opaque.  The output program should be able to function without ‘looking back’ into the grammar and the grammar should not be able to look into the output program (nor should they need to do such ‘looking’).  All information should be passed in a predictable way.

  3. Stratal unidirectionality principle (SUP): the interface should be one-way - from the grammar to the output module.  The output module should not need to communicate anything to the grammar.

If an interface satisfies these principles then the ‘lower’ side of the interface, the synthesising process, should be describable as a procedural algorithm - it will not need to interact with other strata. All of the input to the module comes from the interface, so it should be possible to write a computer program to do the output. The output modules are referred to as output programs, to emphasise the deterministic nature of the speech- and typography-synthesising processes.
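
The three principles can be illustrated with a minimal sketch.  The Python fragment below is not the interface designed in this thesis; the record type Piece, the feature names (statement, question, continuing) and the function output_program are hypothetical names chosen for the example.  It shows an output program that receives everything it needs through the interface, realises features it is handed rather than deciding them, and returns nothing to the grammar.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Assumed graphological boundary features and their graphetic realisations.
MARKS = {"statement": ".", "question": "?", "continuing": ","}


@dataclass(frozen=True)  # frozen: the output side cannot write back (SUP)
class Piece:
    """A piece of text plus the boundary feature already chosen by the grammar."""
    words: Tuple[str, ...]
    boundary: str  # one of the keys of MARKS


def output_program(pieces: Iterable[Piece]) -> str:
    """Deterministically realise pieces as a punctuated string.

    The function reads only the records passed to it (SOP) and chooses no
    features of its own; it only maps features to marks (SRP).
    """
    rendered = [" ".join(p.words) + MARKS[p.boundary] for p in pieces]
    text = " ".join(rendered)
    return text[:1].upper() + text[1:]  # sentence-initial capital


if __name__ == "__main__":
    print(output_program([Piece(("can", "I", "help", "you"), "question")]))
```

Because the function depends only on its argument, the same list of pieces always yields the same string, which is the sense in which the output side is a deterministic program.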

These three principles have great practical value. They describe a maximally modular system. Their value lies in two areas.

  1. From a computational point of view modularity allows development of various components independently: modular systems are much more tractable than non-modular ones.

  2. From a systemic-theoretical viewpoint, modularity is equally desirable. Systemicists have been building modular theories since Halliday (1961), for many of the same reasons that computer systems designers build modular programs.  Indeed, no linguistic work could proceed without some sort of modular approach.  The systemic modularity is many-dimensional and functionally motivated; this design follows the theoretical modularity of stratification outlined above.

Clean interfaces

We might call an interface relationship that satisfies the first two points (SRP and SOP) above a “clean interface”.  The metaphor of cleanliness is used to give a sense of the high value to be placed on such interfaces.  Interfaces which are not clean tend to make for confusion and difficulty in engineering: it is very difficult to work with a system whose modules interact in an unconstrained way.  So, to the systems engineer, interfacial relationships which are not clean are ugly.

In the old system, text output is interpreted from the NIGEL program’s internal grammatical representation after the program has finished working out the function structure of a nascent utterance.  This situation is illustrated in figure 1.4; there is no clear distinction between the traversal program and an output module.  The output program is part of the grammatical traversal. The interface, complete with punctuation algorithm, was added as an afterthought, without much theoretical guidance (Matthiessen, personal communication).

The old NIGEL output system could not be described as a clean interface since it violates the first two ‘clean’ principles.  Firstly, the interface is not between strata; rather, it is part of the grammatical traversal program.  Secondly, it is not opaque - the punctuating process ‘digs around’ in the grammar and the blackboard to find the information that is output as punctuation (thus violating the SOP). For example, the grammar is interrogated to establish whether a clause is followed by another in a clause complex, and whether it is interrogative.  While it is essentially one way - with the punctuation system not affecting the process of grammatical traversal - thus satisfying the SUP, the old output system is not clean.  In fact it is poorly documented and difficult to understand, modify and maintain.  This thesis outlines the general form of the interface that is to replace it.

FIGURE 1.4
‘Before & after’ pictures of NIGEL’s output interface.

There is a one-to-one correspondence between any given internal lexico-grammatical structure and a derived punctuation.  There are no systems in the lexico-grammar itself which are sensitive to the way something is to be punctuated - thus the meaning-making potential of punctuation is not being exploited.  This may seem a trivial point, but it has theoretical repercussions which are far reaching.  Punctuation is discussed in chapter three.

If a phonological output program for NIGEL which worked the same way as the current text output program were to be added, then there would be a similar one-to-one mapping: each lexico-grammatical product would produce one phonetic product.  NIGEL is a grammar of written English, and as such it does not produce those grammatical structures which can be directly realised as phonological prosodies.  A phonological output program added on in this way would have two deficiencies: a lack of the relevant lexico-grammatical structures; and a ‘dirty’ interface.

This thesis describes the work that has been done in extending the NIGEL lexico-grammar by adding both a phonological and a graphological stratum, and designing a clean interface between the lexico-grammar and an output program.

Should there be one interface or two?

There will be two interfaces, specialised for the two modes of output. The two will share some of the extensions to the grammar - this is because both phonology and graphology express some of the same structure generated by phono-grammatical systems, notably the wording.  Examples of the similarities and differences are plentiful later in the thesis.  The picture of the proposed ‘clean’ NIGEL architecture is in figure 1.4.

1.6  Prosody

Two of the main concerns of this thesis, intonation and punctuation, both organise language prosodically. In the terms of this thesis prosody is any supra-segmental mode of linguistic organisation, an approach that is in the tradition of Firth (1948) and Halliday; Palmer (1970), commenting on the work of J.R. Firth, puts it that (paraphrased) prosody is anything that is not segmental.

The approach taken here differs from the concern of the generative phonology school with supra-segmental organisation: there are some aspects of syntagmatic organisation which are common to a number of segmental units, e.g. tone (Kenstowicz & Kisseberth 1979 provide an introduction).  The auto-segmental phonology school tends to pay a lot of attention to the way that prosodies are tied to segmental units, and their phonology has a segmentation in the supra-segmental tier.  In contrast to this, the model adopted here takes a more pragmatic (in the non-linguistic sense) approach to supra-segmental organisation.  There is no theoretical emphasis on the way in which prosodies are ‘tied’ to segmental strings; that is left as an engineering problem.  All of the rule-based procedural phonology is left to that component of the system which does not make any meanings - the output program.

For the purposes of this thesis the notion of prosody needs to include writing as well as speaking.  Firth (1948) described the question mark as a prosody: although it is realised by a particle the question really belongs (typically) to a whole sentence.  It is possible to conceive of other syntagmatic ways of realising this prosodic ‘question-ness’, for example one could draw blue circles around questions - and red and green circles respectively around sentences which would have been marked with full-stops or exclamation points (cf. the Spanish ‘upside-down’ question mark which comes before questions, in addition to the “?” that comes at the end).

This       is        a cat      isn’t       it
Subject    Finite               Tagfinite   Tagsubj.
Mood                 Residue    Moodtag

Another possible path through the network, choosing the features clause, indicative, interrogative and yes/no, would generate the blackboard structure Mood(Finite^Subject), meaning that the Mood consists of a Finite element and a Subject, in that order. This is the mood structure of the following:

1.2

can        I          help you
Finite     Subject
Mood                  Residue
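
The relation between a path through the network and the resulting order of Mood elements can be sketched as a simple lookup.  The Python fragment below is a toy illustration and not NIGEL’s realisation machinery: the table MOOD_ORDER and the function names are hypothetical, and the declarative entry is an assumption based on the Subject-before-Finite order seen in the tagged example above.

```python
# Hypothetical table from selected features to the order of Mood functions.
# In the text, Finite^Subject means "Finite followed by Subject".
MOOD_ORDER = {
    frozenset({"clause", "indicative", "declarative"}): ("Subject", "Finite"),
    frozenset({"clause", "indicative", "interrogative", "yes/no"}): ("Finite", "Subject"),
}


def mood_structure(features):
    """Return the ordered Mood functions for a set of selected features."""
    return MOOD_ORDER[frozenset(features)]


def realise_mood(features, fillers):
    """Order the words filling Subject and Finite as the features dictate."""
    return [fillers[function] for function in mood_structure(features)]


if __name__ == "__main__":
    path = ["clause", "indicative", "interrogative", "yes/no"]
    print(realise_mood(path, {"Subject": "I", "Finite": "can"}))
```

The same fillers are ordered differently depending only on which features were selected, which is the sense in which structure is a realisation of paradigmatic choice.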

1.5  Design parameters and principles for the interfaces

There are three primary desirable attributes for an interface between a grammar and its substantive realisation as sound or image, listed below.

  1. Stratal Realisational Principle (SRP): a strict stratificational model should be adhered to.  This principle means that the output module should make no meanings of its own, rather it should realise graphological structure, and that the output system itself should play no part in the building of lexico-grammatical or graphological structure.  In line with this principle, the strategy taken in this project was not to build a whole new meaning making resource ‘below’ or ‘beside’ the existing grammar; instead the original NIGEL lexico-grammar was extended.

  2. Stratal opacity principle (SOP): the interface should be opaque.  The output program should be able to function without ‘looking back’ into the grammar and the grammar should not be able to look into the output program (nor should they need to do such ‘looking’).  All information should be passed in a predictable way.

  3. Stratal unidirectionality principle (SUP): the interface should be one-way - from the grammar to the output module.  The output module should not need to communicate anything to the grammar.

If an interface satisfies these principles then the ‘lower’ side of the interface, the synthesising process, should be describable as a procedural algorithm - it will not need to interact with other strata. All of the input to the module comes from the interface, so it should be possible to write a computer program to do the output. The output modules are referred to as output programs, to emphasise the deterministic nature of the speech- and typography-synthesising processes.
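
One way of picturing the three principles in program terms is given in the Python sketch below. It is a minimal illustration under stated assumptions, not the interface developed in this thesis: the names Piece, OutputSpec and render_text, and the feature labels, are hypothetical. The grammar hands over an immutable specification (one-way, as the SUP requires); the output program sees nothing but that specification (SOP); and it makes no choices of its own, only realising the features it is given (SRP).

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Piece:
    """One span of wording plus the (hypothetical) graphological
    features the grammar has already chosen for it."""
    wording: str
    features: Tuple[str, ...] = ()

@dataclass(frozen=True)
class OutputSpec:
    """Everything that crosses the interface. It is immutable, so the
    output program cannot send anything back through it."""
    pieces: Tuple[Piece, ...]

def render_text(spec: OutputSpec) -> str:
    """Output program: a deterministic function of the specification alone.
    It consults neither the grammar nor the blackboard."""
    marks = {"question": "?", "statement": ".", "exclamation": "!"}
    out = []
    for piece in spec.pieces:
        text = piece.wording
        for feature in piece.features:
            text += marks.get(feature, "")   # unknown features are ignored
        out.append(text)
    return " ".join(out)

spec = OutputSpec(pieces=(Piece("can I help you", ("question",)),))
print(render_text(spec))   # can I help you?
```

Because render_text depends only on its argument, the same specification always yields the same output, which is the sense in which the output modules are called programs.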

These three principles have great practical value. They describe a maximally modular system. Their value lies in two areas.

  1. From a computational point of view modularity allows development of various components independently: modular systems are much more tractable than non-modular ones.

  2. From a systemic-theoretical viewpoint, modularity is equally desirable. Systemicists have been building modular theories since Halliday (1961), for many of the same reasons that computer systems designers build modular programs.  Indeed, no linguistic work could proceed without some sort of modular approach.  The systemic modularity is many-dimensional and functionally motivated; this design follows the theoretical modularity of stratification outlined above.

Clean interfaces

We might call an interface relationship that satisfies the first two points (SRP and SOP) above a “clean interface”.  The metaphor of cleanliness is used to give a sense of the high value to be placed on such interfaces.  Interfaces which are not clean tend to make for confusion and difficulty in engineering: it is very difficult to work with a system whose modules interact in an unconstrained way.  So, to the systems engineer, interfacial relationships which are not clean are ugly.

In the old NIGEL system, text output is interpreted from the program’s internal grammatical representation after the program has finished working out the function structure of a nascent utterance.  This situation is illustrated in figure 1.4; there is no clear distinction between the traversal program and an output module.  The output program is part of the grammatical traversal. The interface, complete with punctuation algorithm, was added as an afterthought, without much theoretical guidance (Matthiessen, personal communication).

The old NIGEL output system could not be described as a clean interface since it violates the first two ‘clean’ principles.  Firstly, the interface is not between strata; rather it is part of the grammatical traversal program (thus violating the SRP).  Secondly, it is not opaque - the punctuating process ‘digs around’ in the grammar and the blackboard to find the information that is output as punctuation (thus violating the SOP). For example, the grammar is interrogated to establish whether a clause is followed by another in a clause complex, and whether it is interrogative.  Although it is essentially one-way, with the punctuation system not affecting the process of grammatical traversal (thus satisfying the SUP), the old output system is not clean.  In fact it is poorly documented and difficult to understand, modify and maintain.  This thesis outlines the general form of the interface that is to replace it.

FIGURE 1.4
‘Before & after’ pictures of NIGEL’s output interface.

There is a one-to-one correspondence between any given internal lexico-grammatical structure and a derived punctuation.  There are no systems in the lexico-grammar itself which are sensitive to the way something is to be punctuated - thus the meaning-making potential of punctuation is not being exploited.  This may seem a trivial point, but it has theoretical repercussions which are far reaching.  Punctuation is discussed in chapter three.

If a phonological output program for NIGEL which worked the same way as the current text output program were to be added, then there would be a similar one-to-one mapping: each lexico-grammatical product would produce one phonetic product.  NIGEL is a grammar of written English, and as such it does not produce those grammatical structures which can be directly realised as phonological prosodies.  A phonological output program added on in this way would have two deficiencies: a lack of the relevant lexico-grammatical structures; and a ‘dirty’ interface.

This thesis describes the work that has been done in extending the NIGEL lexico-grammar by adding both a phonological and a graphological stratum, and designing a clean interface between the lexico-grammar and an output program.

Should there be one interface or two?

There will be two interfaces, specialised for the two modes of output. The two will share some of the extensions to the grammar - this is because both phonology and graphology express some of the same structure generated by phono-grammatical systems, notably the wording.  Examples of the similarities and differences are plentiful later in the thesis.  The picture of the proposed ‘clean’ NIGEL architecture is in Figure 1.4.

1.6  Prosody

Two of the main concerns of this thesis, intonation and punctuation, both organise language prosodically. In the terms of this thesis prosody is any supra-segmental mode of linguistic organisation, an approach that is in the tradition of Firth (1948) and Halliday; Palmer (1970), commenting on the work of J.R. Firth, puts it that (paraphrased) prosody is anything that is not segmental.

The approach taken here differs from the generative phonology school’s concern with supra-segmental organisation, in which some aspects of syntagmatic organisation are common to a number of segmental units, e.g. tone (Kenstowicz & Kisseberth 1979 provide an introduction).  The auto-segmental phonology school tends to pay a lot of attention to the way that prosodies are tied to segmental units, and their phonology has a segmentation in the supra-segmental tier.  In contrast to this, the model adopted here takes a more pragmatic (in the non-linguistic sense) approach to supra-segmental organisation.  There is no theoretical emphasis on the way in which prosodies are ‘tied’ to segmental strings; that is left as an engineering problem.  All of the rule-based procedural phonology is left to that component of the system which does not make any meanings - the output program.

For the purposes of this thesis the notion of prosody needs to include writing as well as speaking.  Firth (1948) described the question mark as a prosody: although it is realised by a particle the question really belongs (typically) to a whole sentence.  It is possible to conceive of other syntagmatic ways of realising this prosodic ‘question-ness’, for example one could draw blue circles around questions - and red and green circles respectively around sentences which would have been marked with full-stops or exclamation points (cf. the Spanish ‘upside-down’ question mark which comes before questions, in addition to the “?” that comes at the end).
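
The point that ‘question-ness’ belongs to the whole sentence, and can be realised syntagmatically in more than one way, can be illustrated with a small Python sketch. The table of conventions and the name realise are hypothetical, chosen only for illustration; the prosodic feature is stated once for the sentence, and each convention decides where particles are placed.

```python
# Each convention maps a sentence-level prosodic feature to the marks
# placed before and after the wording. The table is illustrative only.
CONVENTIONS = {
    "english": {"question": ("", "?"), "statement": ("", "."), "exclamation": ("", "!")},
    "spanish": {"question": ("¿", "?"), "statement": ("", "."), "exclamation": ("¡", "!")},
}

def realise(wording: str, prosody: str, convention: str = "english") -> str:
    """Realise a prosody that belongs to the whole sentence as
    particles at its edges."""
    before, after = CONVENTIONS[convention][prosody]
    return f"{before}{wording}{after}"

print(realise("Can I help you", "question"))              # Can I help you?
print(realise("Can I help you", "question", "spanish"))   # ¿Can I help you?
```

The wording and the prosodic feature are identical in both calls; only the realisation convention differs.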

Firth (1948) used the term piece to refer to text spans of any length.  Thus one can talk of letters, words, syllables, phonemes or sentences as pieces of text.  The term will be used in this thesis in the same way, for spans of both written and spoken language.

1.7  Generality

It was stated above that the interfaces to phonological and graphological output engines are to be as general as possible.  The intention is to design interfaces which are capable of the greatest possible extension.  A very general graphological interface might be designed for the realisation of lexico-grammatical structure in different fonts for example, while a very general phonological interface might be able to specify voice quality.  Neither of these possibilities is to be realised in the immediate future - but it is considered important not to build the system in such a way that it could not handle these things at a later date.

There are two dimensions along which the project is not designed to be general. Neither (i) cross-linguistic generality nor (ii) cross-theoretical generality is claimed for this work.

  1. There is on-going work on a German version of NIGEL at GMD/IPSI, Germany, and within the multi-language generation project at The University of Sydney work has started on Chinese and Japanese NIGEL grammars.  In the mid 1980s a NIGEL-compatible grammar of Japanese was developed by John Bateman and others at Kyoto University in Japan (see Bateman et al. 1987; Bateman & Matthiessen, in press). Work will start on other languages soon. We are now in a position to extend the work on interfaces described in this thesis to these projects.

  2. There is no generality sought across different linguistic theories. There are some things built into the interface which are specific to a Hallidayian English grammar. An example of this is the Given/New structure which is described in the next chapter.

Paralinguistic phenomena

While it is well known that punctuation is linguistic, there are other modes of graphetic organisation which have not been treated linguistically.  Halliday (1985.b p.30) outlines the difference between prosodic linguistic and paralinguistic features for spoken language:

The difference between… [prosodic and paralinguistic features] …is as follows.  Prosodic features are part of the linguistic system; they carry systematic contrasts in meaning, just like other resources in the grammar, and what distinguishes them from these other resources is that they spread across extended portions of speech, like an intonation contour, for example.  Paralinguistic features also extend over stretches of varying length; but they are not systematic - they are not part of the grammar, but rather additional variations by which the speaker signals the import of what he is saying.

This is a global perspective on what is systemic and what is not - it is impossible to systematise paralinguistic features in a very general way.  Halliday lists such features as “tambre (breathy, creaky etc. voice qualities), tempo, loudness, facial and bodily gestures” as paralinguistic.  However, from a local perspective these may be quite systematic.

A richest possible phonology

People make use of ‘paralinguistic’ features in a systemic way: all of the paralinguistic aspects of speaking listed above can be used systematically.  However, in a global sense there is no ‘standard’ grammar which would adequately explain the way that, for example, variations in tempo are used by speakers.  A very general phonological component in a generation system might allow such control by operating in a locally systemic way.

A richest possible graphology

The above quote comparing the prosodic and paralinguistic comes from a section in Halliday (1985.b) entitled “What writing leaves out”, the implication being that there are fewer prosodic aspects of writing than there are of speaking.  However, Halliday does go on to describe the prosodic function of punctuation.  This thesis will explain how punctuation can be produced by the NIGEL system in the same way as intonation.  It will try to ‘push back’ the boundary between the prosodic and the paralinguistic.  Paralinguistic aspects of written text include its spatial organisation and the type-faces that it is printed in.  Since these can be shown to be systematic they can be treated as prosodic; the model developed here allows these locally-systemic prosodies to be incorporated in the generation system.

The major source of examples in this thesis is one text, the liner notes to a compact disc - James Brown’s In the Jungle Groove.  It  is reproduced in figure 1.5.  In this text major shifts in the discourse structure are signalled with large capitals at the beginning of paragraphs.  As well as the large-scale typographic distinctions, there are some systematic distinctions made between sub-sentence units: proper names have three different realisations, song names are in italics, album names in all-caps, and other names have the standard initial capitalisation.

While there is some literature on the function of paralinguistic graphetic features it tends to be quite anecdotal.  Crystal & Davy (1969) talk about typography and punctuation in terms of genre; Cummings and Simmons (1983) look at the graphetic organisation of literature; see also a number of articles in “Visible Language” issues  XX 1 and XXXIII 1, notably Berry (1989).  For example, type-face choice is often cited as systematic, but it is typically described very vaguely (cf Hofstadter 1983, Knuth 1982).  The NIGEL system provides a good place to start developing and testing models of typographic systems, capable of signalling what Halliday (1985.b p.30 quoted above) described as  “the import of what he is saying”.  Although typography is a paralinguistic matter in a global sense it will be shown in chapter three that it can be seen as a prosodic linguistic feature in a local sense: that is, typography can be described systematically for groups of texts, and the description of typography can be done in the same terms as the description of punctuation.

The recent introduction of typographic systems to small computers, effectively giving users the physical resources of a typesetter and an art-department, has been a significant inspiration to this work.  These typographic systems allow specification of typography in formal codes.  The codes essentially provide a discrete state interface between typography and users.  Such an interface makes it possible to design a very general graphological stratum which can, to return to the Jungle Groove example, specify that all song names are to be realised in italics and that major shifts in a text are to be marked off with a large capital letter.
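
A discrete typographic specification of this kind can be sketched as a table from categories to realisations. The Python fragment below is an illustration under assumptions: the category labels, the name typeset, the use of an HTML-style italic code, and the sample wording are all hypothetical; only the three naming conventions themselves come from the liner notes.

```python
# Hypothetical graphological categories and their typographic realisations,
# following the conventions observed in the liner notes. The HTML-style
# <i> code is an assumption chosen for illustration.
REALISATIONS = {
    "song-name": lambda s: f"<i>{s}</i>",   # italics
    "album-name": lambda s: s.upper(),      # all-caps
    "other-name": lambda s: s,              # initial capitals, left as written
}

def typeset(pieces):
    """Turn (wording, category) pairs into a coded string.
    Pieces with no category (None) pass through unchanged."""
    out = []
    for wording, category in pieces:
        realise = REALISATIONS.get(category, lambda s: s)
        out.append(realise(wording))
    return " ".join(out)

sample = [
    ("A Song Title", "song-name"),          # placeholder, not from the text
    ("appears on", None),
    ("In the Jungle Groove", "album-name"),
    ("by", None),
    ("James Brown", "other-name"),
]
print(typeset(sample))
# <i>A Song Title</i> appears on IN THE JUNGLE GROOVE by James Brown
```

The choice of category is made before typeset is called; the function itself only looks up a realisation, so changing the house style means changing the table and nothing else.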

The non-discrete and complicated nature of acoustic phonetics does not lend itself to computational control in such a way; indeed it is very hard to engineer understandable speech synthesis with even the rudiments of intonation.  This means that the graphological systems in NIGEL can be, to some extent, well in advance of the phonological ones.  While speech synthesis technology is a long way from allowing us to specify voice quality and volume in a discrete language, we can specify font-choice, size, and case.  There is unfortunately not enough room in this thesis to dwell on the ways that choices of font and voice quality are used systematically; the focus is on how those choices may be represented.

There are some modes of graphetic organisation which will remain in the domain of the paralinguistic: the examples given in figure 1.6 show an advertisement, in which one letter appears in a different type-face to the rest of its word; and two versions of Carroll’s “The Mouse’s Tale”.  In both of these the meaning of the graphetic organisation is intuitively obvious (the iconicity of the tail, and the ‘difference’ of the letter “i”), but there is no underlying system by which any poem or any word could receive the same graphetic realisation.  This is in contrast to the In The Jungle Groove text, in which there is a graphetic, typographic system.

Figure 1.5
The Liner-notes to James Brown’s “In the Jungle Groove”.

Figure 1.6
Three examples of paralinguistic graphetic organisation

1.8  Other similar work

Bateman and Mann (1988) proposed a project similar to this one, looking at “the abstract specification of intonation”; the most obvious difference is the attention paid by this thesis to graphology as well as phonology.  The proposal was not implemented, due to lack of funding.  Silverman (1987) provides a good survey of more general work on intonation in speech synthesis.

1.9  Outline of the thesis

In summary of sections 1.1 to 1.8, this thesis could be described in terms of a problem and its solution.  The problem is essentially a three-part lack in the old PENMAN system:

  1. The old version of NIGEL has an inadequate system architecture; that is, it does not have a clean interface between graphology and an output program.

  2. In its original form NIGEL has no phonological output system.

  3. The old NIGEL system also lacks some lexico-grammatical systems.  There are no systems for making meanings through anything but wording, no systems for meaningful intonation, and no systems for the meaningful manipulation of punctuation or typography.

Restatements of these general problems, and their solutions, are embedded in the rest of the thesis.

The second chapter discusses Hallidayian phonology, and what is involved in adding phonology to the lexico-grammar, as well as looking briefly at the speech synthesiser and specifying the form that phonological output will take.  Chapter three follows with a description of the graphology.  There was no model that could simply be adopted, so the graphological work is my own synthesis of the work of Halliday (1985.b) and Waller (1980) with standard systemic theory.  The discussion here concentrates on describing only the graphological system, without looking at the lexico-grammar.

Chapters four and five discuss the interfaces themselves, graphological and phonological respectively.  Both interfaces use the same representational formalism.  The focus in the graphological chapter is on the interfacial code itself, and on actual examples of typography produced from it.  The focus in chapter five is on an overview of the speech synthesiser.  The conclusion (chapter six) presents the agenda for further development, in coding new parts of NIGEL and some theoretical considerations.


All three meta-functions build structure in the blackboard: as well as the interpersonal structure of mood, there is the transitivity structure, which shows the roles of participants, circumstances, and the process-type of the clause; and the theme/rheme structure, which shows the textual structure of the clause.  A full discussion of these aspects of clause structure can be found in Halliday (1985.a).

When the structure, complete with lexical items, has been generated it must be output. If the architecture of the system is to be strictly stratal then this should be done by a process of realisation.  The architecture of the old NIGEL system is not stratal, but it will be made so in the new version developed here; the description of this process of realisation is the subject of chapters two and three.

The traversal program repeatedly cycles through the grammatical network, starting at clause level, generating the gross structure of a clause, and then re-entering the network until everything is exhaustively described.

The placing of the interface with respect to NIGEL

The new output programs will be located, in terms of the vase picture, in the phonetic/graphetic stratum.  The nature of the realisational relationship between the ‘emic’ and the ‘etic’ strata will be clarified below in section 1.5.

Important differences with other theories

It is important not to make the mistake of thinking of systemic grammars as big fancy phrase structure rules (Matthiessen 1988). The mode of organisation of NIGEL is as a paradigmatic resource - it is organised according to meaning rather than syntagmatic distribution.

The grammar is more accurately described as a lexico-grammar. Perhaps a more accurate description still would be morpho-lexico-grammar, as the lexicon and morphology are both handled in the grammar itself.  Thus systemic theory conflates parts which are separated and modularised in other theories.  Generative linguistics separates out morphology and lexicon from syntactic grammar (cf Horrocks 1987 ch.1).  By contrast, systemic approach is to treat morphology as lowest-ranking grammar and lexis the most delicate grammar, and to specify structural fragments as realisations in paradigmatic contexts defined by the system network, through realisation statements and the blackboard (cf Halliday 1961; Matthiessen 1988).

1.5  Design parameters and principles for the interfaces

There are three primary desirable attributes for an interface between a grammar and its substantive realisation as sound or image, listed below.

  1. Stratal Realisational Principle (SRP): a strict stratificational model should be adhered to.  This principle means that the output module should make no meanings of its own, rather it should realise graphological structure, and that the output system itself should play no part in the building of lexico-grammatical or graphological structure.  In line with this principle, the strategy taken in this project was not to build a whole new meaning making resource ‘below’ or ‘beside’ the existing grammar; instead the original NIGEL lexico-grammar was extended.

  2. Stratal opacity principle (SOP): the interface should be opaque.  The output program should be able to function without ‘looking back’ into the grammar and the grammar should not be able to look into output program (nor should they need to do such ‘looking’).  All information should be passed in a predictable way.

  3. Stratal unidirectionality principle (SUP): the interface should be one-way - from the grammar to the output module.  The output module should not need to communicate anything to the grammar.

If an interface satisfies these principles then the ‘lower’ side of the interface, the synthesising process, should be describable as a procedural algorithm - it will not need to interact with other strata. All of the input to the module comes from the interface, so it should possible to write a computer program to do the output. The output modules are referred to as output programs, to emphasise the deterministic nature of the speech- and typography-synthesising processes.

These three principles have great practical value. They describe a maximally modular system. Their value lies in two areas.

  1. From a computational point of view modularity allows development of various components independently: modular systems are much more tractable than non-modular ones.

  2. From a systemic-theoretical viewpoint, modularity is equally desirable. Systemicists have been building modular theories since Halliday (1961), for many of the same reasons that computer systems designers build modular programs.  Indeed, no linguistic work could proceed without some sort of modular approach.  The systemic modularity is many-dimensional and functionally motivated; this design follows the theoretical modularity of stratification outlined above.

Clean interfaces

We might call an interface relationship that satisfies the first two points (SRP and SOP) above a “clean interface”.  The metaphor of cleanliness is used to give a sense of the high value to be placed on such interfaces.  Interfaces which are not clean tend to make for confusion and difficulty in engineering: it is very difficult to work with a system whose modules interact in an unconstrained way.  So, to the systems engineer, interfacial relationships which are not clean are ugly.

Text output is interpreted from the NIGEL program’s internal grammatical representation after the program has finished working out the function structure of a nascent utterance.  This situation is illustrated in figure 1.4; there is no clear distinction between the traversal program and an output module.  The output program is part of the grammatical traversal. The interface, complete with punctuation algorithm was added as an afterthought, without much theoretical guidance (Matthiessen, personal communication).

The old NIGEL output system could not be described as a clean interface since it violates the first two ‘clean’ principles.  Firstly the interface is not between strata, rather it is part of the grammatical traversal program.  Secondly, it is not opaque - the punctuating process ‘digs around’ in the grammar and the blackboard to find the information that is output as punctuation (thus violating the SOP). For example; the grammar is interrogated to establish whether a clause is followed by another in a clause complex, and whether it is interrogative.  While it is essentially one way - with the punctuation system not affecting the process of grammatical traversal - thus satisfying the SUP, the old output system is not clean.  In fact it is poorly documented and difficult to understand, modify and maintain.  This thesis outlines the general form of the interface that is to replace it.

graphics2

FIGURE 1.4
‘Before & after’ pictures of NIGEL’s output interface.

There is a one-to-one correspondence between any given internal lexico-grammatical structure and a derived punctuation.  There are no systems in the lexico-grammar itself which are sensitive to the way something is to be punctuated - thus the meaning-making potential of punctuation is not being exploited.  This may seem a trivial point, but it has theoretical repercussions which are far reaching.  Punctuation is discussed in chapter three.

If a phonological output program for NIGEL which worked the same way as the current text output program were to be added, then there would be a similar one-to-one mapping: each lexico-grammatical product would produce one phonetic product.  NIGEL is a grammar of written English, and as such it does not produce those grammatical structures which can be directly realized as phonological prosodies.  A phonological output program added on in this way would have two deficiencies: a lack of the relevant lexico-grammatical structures; and a ‘dirty’ interface.

This thesis describes the work that has been done in extending the NIGEL lexico-grammar by adding both a phonological and a graphological stratum, and designing a clean interface between the lexico-grammar and an output program.

Should there be one interface or two?

There will be two interfaces, specialised for the two modes of output. The two will both share some of the extensions to the grammar - this is because both phonology and graphology express some of the same structure generated by phono-grammatical systems, notably the wording.  Examples of the similarities and differences are plentiful later in the thesis.  The picture of the proposed ‘clean’ NIGEL architecture is in Figure 1.4.

1.6  Prosody

Two of the main concerns of this thesis, intonation and punctuation, both organise language prosodically. In the terms of this thesis prosody is any supra-segmental mode of linguistic organisation, approach that is in the tradition of Firth (1948) and Halliday; Palmer (1970) commentating on the work of J.R. Firth puts it that (paraphrased): prosody is anything that is not segmental.  

The approach taken here differs from the concern of the generative phonology school with supra-segmental organisation: there are some aspects of syntagmatic organisation which are common to a number of segmental units, e.g. Tone (Kenstowicz & Kisseberth 1979 provide an introduction).  The auto-segmental phonology school tends to pay a lot of attention to the way that prosodies are tied to segmental units, and their phonology has a segmentation in the supra-segmental tier.  In contrast to this, the model adopted here takes a more pragmatic (in the non-linguistic sense) approach to supra-segmental organisation.  There is no theoretic emphasis on the way in which prosodies are ‘tied’ to segmental strings; that is left as an engineering problem.  All of the rule-based procedural phonology is left to that component of the system which does not make any meanings - the output program.

For the purposes of this thesis the notion of prosody needs to include writing as well as speaking.  Firth (1948) described the question mark as a prosody: although it is realised by a particle the question really belongs (typically) to a whole sentence.  It is possible to conceive of other syntagmatic ways of realising this prosodic ‘question-ness’, for example one could draw blue circles around questions - and red and green circles respectively around sentences which would have been marked with full-stops or exclamation points (cf. the Spanish ‘upside-down’ question mark which comes before questions, in addition to the “?” that comes at the end).

Firth (1948) used the term piece to refer to text spans of any length.  Thus one can talk of letters, words, syllables, phonemes or sentences as pieces of text.  The term will be used in this thesis in the same way, for spans of both written and spoken language.

1.7  Generality

It was stated above that the interfaces to phonological and graphological output engines are to be as general as possible.  The intention is to design interfaces which are capable of the greatest possible extension.  A very general graphological interface might be designed for the realisation of lexico-grammatical structure in different fonts for example, while a very general phonological interface might be able to specify voice quality.  Neither of these possibilities is to be realised in the immediate future - but it is considered important not to build the system in such a way that it could not handle these things at a later date.

There are two dimensions along which the project is not designed to be general. Neither (i) cross-linguistic generality nor (ii) cross-theoretical generality is claimed for this work.

  1. There is on-going work on a German version of NIGEL at GMD/IPSI, Germany and within the multi-language generation project at The University of Sydney work has started on Chinese and Japanese NIGEL grammars.  In the mid 1980s a NIGEL compatible grammar of Japanese was developed by John Bateman and others at Kyoto University in Japan (see Bateman et al, 1987, Bateman & Matthiessen, in press). Work will start on other languages soon. We are now in a position to extend the work on interfaces described in this thesis to these projects.

  2. There is no generality sought across different linguistic theories. There are some things built into the interface which are specific to a Hallidayian English grammar. An example of this is the Given/New structure which is described in the next chapter.

Paralinguistic phenomena

While it is well known that punctuation is linguistic, there are other modes of graphetic organisation, which have not been treated linguistically.  Halliday (1985.b p.30) outlines the difference between prosodic linguistic and paralinguistic features for spoken language:

The difference between… [prosodic and paralinguistic features] …is as follows.  Prosodic features are part of the linguistic system; they carry systematic contrasts in meaning, just like other resources in the grammar, and what distinguishes them from these other resources is that they spread across extended portions of speech, like an intonation contour, for example.  Paralinguistic features also extend over stretches of varying length; but they are not systematic - they are not part of the grammar, but rather additional variations by which the speaker signals the import of what he is saying.

This is a global perspective on what is systemic and what is not - it is impossible to systematise paralinguistic features in a very general way.  Halliday lists such features as “timbre (breathy, creaky etc. voice qualities), tempo, loudness, facial and bodily gestures” as paralinguistic.  However, from a local perspective these may be quite systematic.

A richest possible phonology

People make use of ‘paralinguistic’ features in a systemic way: all of the paralinguistic aspects of speaking listed above can be used systematically.  However, in a global sense there is no ‘standard’ grammar which would adequately explain the way that, for example, variations in tempo are used by speakers.  A very general phonological component in a generation system might allow such control by operating in a locally systemic way.

A richest possible graphology

The above quote comparing the prosodic and the paralinguistic comes from a section in Halliday (1985.b) entitled “What writing leaves out”, the implication being that there are fewer prosodic aspects of writing than there are of speaking.  However, Halliday does go on to describe the prosodic function of punctuation.  This thesis will explain how punctuation can be produced by the NIGEL system in the same way as intonation.  It will try to ‘push back’ the boundary between the prosodic and the paralinguistic.  Paralinguistic aspects of written text include its spatial organisation and the type-faces that it is printed in; since these can be shown to be systematic, they can be treated as prosodic.  The model developed here allows these locally-systemic prosodies to be incorporated in the generation system.

The major source of examples in this thesis is one text, the liner notes to a compact disc - James Brown’s In the Jungle Groove.  It is reproduced in figure 1.5.  In this text major shifts in the discourse structure are signalled with large capitals at the beginning of paragraphs.  As well as the large-scale typographic distinctions, there are some systematic distinctions made between sub-sentence units: proper names have three different realisations.  Song names are in italics, album names are in all-caps, and other names have the standard initial capitalisation.

While there is some literature on the function of paralinguistic graphetic features, it tends to be quite anecdotal.  Crystal & Davy (1969) talk about typography and punctuation in terms of genre; Cummings and Simmons (1983) look at the graphetic organisation of literature; see also a number of articles in “Visible Language” issues XX 1 and XXXIII 1, notably Berry (1989).  For example, type-face choice is often cited as systematic, but it is typically described very vaguely (cf Hofstadter 1983, Knuth 1982).  The NIGEL system provides a good place to start developing and testing models of typographic systems, capable of signalling what Halliday (1985.b p.30 quoted above) described as “the import of what he is saying”.  Although typography is a paralinguistic matter in a global sense, it will be shown in chapter three that it can be seen as a prosodic linguistic feature in a local sense: that is, typography can be described systematically for groups of texts, and the description of typography can be done in the same terms as the description of punctuation.

The recent introduction of typographic systems to small computers, effectively giving users the physical resources of a typesetter and an art-department, has been a significant inspiration to this work.  These typographic systems allow specification of typography in formal codes.  The codes essentially provide a discrete state interface between typography and users.  Such an interface makes it possible to design a very general graphological stratum which can, to return to the Jungle Groove example, specify that all song names are to be realised in italics and that major shifts in a text are to be marked off with a large capital letter.
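The kind of discrete-state specification described above can be illustrated with a small sketch.  It is not the interface developed in this thesis, nor any part of PENMAN: the category labels, the tag names (italic, dropcap) and the function names are all hypothetical, chosen only to show how the distinctions observed in the Jungle Groove text could be stated as a finite set of codes.

```python
# A minimal sketch, assuming hypothetical category labels and markup tags.
# None of these names come from PENMAN or NIGEL.

# Each category of name selects exactly one typographic realisation.
REALISATIONS = {
    "song-name": lambda s: f"<italic>{s}</italic>",  # song names in italics
    "album-name": lambda s: s.upper(),               # album names in all-caps
    "other-name": lambda s: s,                       # initial capitals kept as written
}


def realise(units):
    """Realise a sequence of (text, category) pairs as one marked-up string.

    A category of None, or any unlisted category, leaves the text unchanged.
    """
    rendered = []
    for text, category in units:
        render = REALISATIONS.get(category, lambda s: s)
        rendered.append(render(text))
    return " ".join(rendered)


def mark_major_shift(paragraph):
    """Mark a major shift in the text with a large initial capital."""
    if not paragraph:
        return paragraph
    return f"<dropcap>{paragraph[0]}</dropcap>{paragraph[1:]}"
```

Because each category maps to exactly one realisation, the typography of a text is fully determined by a small table of choices, which is what makes the graphological side amenable to formal control.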

The non-discrete and complicated nature of acoustic phonetics does not lend itself to computational control in such a way; instead it is very hard to engineer understandable speech synthesis with even the rudiments of intonation.  This means that the graphological systems in NIGEL can be well in advance of the phonological systems.  While speech synthesis technology is a long way from allowing us to specify voice quality and volume in a discrete language, we can specify font-choice, size, and case.  There is unfortunately not enough room in this thesis to dwell on the ways that choices of font and voice quality are used systematically; the focus is on how those choices may be represented.

There are some modes of graphetic organisation which will remain in the domain of the paralinguistic.  The examples given in figure 1.6 show an advertisement, in which one letter appears in a different type-face to the rest of its word, and two versions of Carroll’s “The Mouse’s Tale”.  In both of these an intuitive understanding of the meaning of the graphetic organisation is readily available (the iconicity of the tail, and the ‘difference’ of the letter “i”), but there is no underlying system by which any poem, or any word, could receive the same graphetic realisation.  This is in contrast to the In the Jungle Groove text, in which there is a graphetic, typographic system.

Figure 1.5
The Liner-notes to James Brown’s “In the Jungle Groove”.

Figure 1.6
Three examples of paralinguistic graphetic organisation

1.8  Other similar work

Bateman and Mann (1988) proposed a project similar to this one, looking at “the abstract specification of intonation”.  The most obvious difference is the attention paid by this thesis to graphology as well as phonology.  The proposal was not implemented, due to lack of funding.  Silverman (1987) provides a good survey of more general work on intonation in speech synthesis.

1.9  Outline of the thesis

In summary of sections 1.1 to 1.8, this thesis could be described in terms of a problem and its solution.  The problem is essentially a three-part lack in the old PENMAN system:

  1. The old version of NIGEL has inadequate system architecture; that is, it does not have a clean interface between graphology and an output program.

  2. In its original form NIGEL has no phonological output system.

  3. The old NIGEL system also lacks some lexico-grammatical systems.  There are no systems for making meanings through anything but wording, no systems for meaningful intonation, and no systems for the meaningful manipulation of punctuation or typography.

Restatements of these general problems, and their solutions, are embedded in the rest of the thesis.

The second chapter discusses Hallidayian phonology, and what is involved in adding phonology to the lexico-grammar, as well as looking briefly at the speech synthesiser and specifying the form that phonological output will take.  Chapter three follows with a description of the graphology.  There was no model that could simply be adopted, so the graphological work is my own synthesis of Halliday’s (1985.b) and Waller’s (1980) work with standard systemic theory.  The discussion here concentrates on describing only the graphological system, without looking at the lexico-grammar.

Chapters four and five discuss the interfaces themselves, graphological and phonological respectively.  Both interfaces use the same representational formalism.  The focus in the graphological chapter is on the interfacial code itself, and on actual examples of typography produced from it.  The focus in chapter five is on an overview of the speech synthesiser.  The conclusion (chapter six) presents the agenda for further development, both in coding new parts of NIGEL and in some theoretical considerations.