14.4.3 A Three-way Alignment

The preceding encoding of the alignment of parallel passages from two texts requires that those texts and the alignment all be part of the same SGML document. If the texts are in separate documents, then additional <xptr> elements must be supplied, as discussed in section 14.2. These external pointers may appear anywhere within the document, but if they are created solely for use in encoding links, they may for convenience be grouped within the <linkGrp> (or other grouping element that uses them for linking).
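
As a minimal sketch of such grouping, suppose that the two texts are held in separate documents declared as the external entities engText and latText. Both entity names are hypothetical, as are the identifiers used here. The external pointers and the link that uses them might then be grouped as follows:

<linkGrp type=alignment>
  <!-- engText and latText are assumed entity names, used for illustration only -->
  <xptr id=xe1 doc=engText from='id (E1)'>
  <xptr id=xl1 doc=latText from='id (L1)'>
  <link targets='xe1 xl1'>
</linkGrp>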

To demonstrate this facility, we consider how we might encode the alignments in an extract from Comenius' Orbis Sensualium Pictus.

[Figure: the page from the Orbis pictus of Comenius which is discussed in the text.]

Each topic covered in this work has three parts: a picture, a prose text in Latin describing the topic, and a carefully aligned translation of the Latin into English, German or some other vernacular. Key terms in the two texts are typographically distinct, and are linked to the picture by numbers, which appear in the two texts and within the picture as well. [see note 84]

 

First, we present the text portions. The English and Latin portions have been encoded as distinct <div> elements. Identifiers have been attached to each typographic line, but no other encoding added, to simplify the example.

<!-- English text -->

<div id=E98 lang=EN><head>The Study</head>
<p><seg id=E9801>The Study</seg>
<seg id=E9802>is a place</seg>
<seg id=E9803>where a Student,</seg>
<seg id=E9804>a part from men,</seg>
<seg id=E9805>sitteth alone,</seg>
<seg id=E9806>addicted to his Studies,</seg>
<seg id=E9807>whilst he readeth</seg>
<seg id=E9808>Books,</seg>
<!-- ... -->
</div>

<!-- Latin text -->

<div id=L98 lang=LA><head>Muséum</head>
<p><seg id=L9801>Museum</seg>
<seg id=L9802>est locus</seg>
<seg id=L9803>ubi Studiosus,</seg>
<seg id=L9804>secretus ab hominibus,</seg>
<seg id=L9805>solus sedet,</seg>
<seg id=L9806>Studiis deditus,</seg>
<seg id=L9807>dum lectitat</seg>
<seg id=L9808>Libros,</seg>
<!-- ... -->
</div>

 

Next we assume that we have stored a digitized image of the picture itself in some external entity we will call com98 (for further discussion of the handling of external images and graphics, see section 22.3). We further assume that we can address portions of this image as a two-dimensional co-ordinate space. The SPACE location method of the <xptr> element (discussed in section 6.6 above) can now be used to point to the whole picture and to two portions of it, one containing the picture of a student and the other of a book, as follows:

<xptr n='1' id=p981 doc=com98>
<xptr n='2' id=p982 doc=com98 from='space (2d) (75 5) (133 75)'>
<xptr n='3' id=p983 doc=com98 from='space (2d) (55 42) (90 60)'>

Note that each external pointer has its own unique identifier in addition to the n attribute; the latter holds the visible label (or `explainer') used for this image portion in the original.

 

As printed, the text exhibits three kinds of alignment.

  1. The English and Latin portions are printed in two parallel columns, with corresponding phrases (represented above by <seg> elements) more or less next to each other.
  2. Particular words or phrases are marked as terms in the two languages by a change of rendition: the English text, which otherwise uses black letter type throughout, has the words `The Study', `a Student', `Studies', and `Books' in a roman font; in the Latin text, which is printed in roman, the corresponding words (`Museum', `Studiosus', `Studiis', and `Libros') are all in italic.
  3. Numbered labels appear within the text portions, linking keywords to each other and to sections of the picture. These labels, which have been left out of the above encoding, are attached to the first, third, and last segments quoted above in each language, and also appear (rather indistinctly) within the picture itself. If it is desired to transcribe them in the text, they might be encoded as <ref> elements, <anchor> elements, or <xptr>s to the picture; the number itself would be transcribed as the value of the n attribute (or as the content of the <ref>).
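
For instance, assuming that the label `2' is printed immediately after the word `Student' (its exact position on the page is an assumption made for this illustration), it might be transcribed as a <ref> whose content is the number and whose target is the corresponding portion of the picture:

<!-- position of the label within the line is assumed -->
<seg id=E9803>where a Student, <ref target=p982>2</ref></seg>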

 

The first kind of alignment might be represented by using the corresp attribute on the <seg> element. The second kind might be represented by using the <gloss> and <term> mechanism described in section 6.3.4. The third kind of alignment might be represented using pointers embedded within the texts, although this would involve some duplication. We choose, however, to use the <link> element, since this provides an efficient way of representing the three-way alignment between English, Latin and picture without redundancy.

<linkGrp type=alignment>
  <link targets='E9801 L9801 p981'>
  <link targets='E9802 L9802     '>
  <link targets='E9803 L9803 p982'>
  <link targets='E9804 L9804     '>
  <link targets='E9805 L9805     '>
  <link targets='E9806 L9806     '>
  <link targets='E9807 L9807     '>
  <link targets='E9808 L9808 p983'>
</linkGrp>
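
By way of comparison, the first kind of alignment alone might be sketched with the corresp attribute mentioned above. The following fragment shows only the second segment in each language; it is an illustration, not part of the encoding adopted here:

<seg id=E9802 corresp=L9802>is a place</seg>
<!-- ... -->
<seg id=L9802 corresp=E9802>est locus</seg>

If the correspondence is recorded in both directions, as here, each pairing is stated twice, and the picture must still be linked by some other means.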

 

This map, of course, only aligns whole segments and image portions, since these are the only parts of our encoding which bear identifiers and can therefore be pointed to. To add to it the alignment between the typographically distinct words mentioned above, new elements must be defined, either within the text itself or externally by using the extended pointer mechanism. Encoding these word pairs as <term> and <gloss>, although intuitively obvious, requires a non-trivial decision as to whether the Latin text is glossing the English, or vice versa. Tagging all the marked words as <term> avoids the difficult decision, but might be thought by some encoders to convey the wrong information about the words in question. Simply tagging them as additional embedded <seg> elements with identifiers that can be aligned like the others is also a possibility. All of these options require the addition of further markup to the text. This may pose no problems, or it may be infeasible (e.g. if the text is held on a read-only medium). If it is not feasible to add more markup to the original text, the extended pointer mechanism is likely to be the best choice. For example, to indicate that the words `Studies' and `Studiis' correspond, two external pointers might be defined and aligned as follows:

        <xptr id=xt981  from='id (E9806) token (4)'>
        <xptr id=xt982  from='id (L9806) token (1)'>
        <link targets='xt981 xt982'>
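
Where the text can be modified, the same correspondence might instead be encoded with embedded <seg> elements, as suggested above. In this sketch the identifiers E9806a and L9806a are hypothetical:

<!-- E9806a and L9806a are assumed identifiers, added for illustration -->
<seg id=E9806>addicted to his <seg id=E9806a>Studies</seg>,</seg>
<!-- ... -->
<seg id=L9806><seg id=L9806a>Studiis</seg> deditus,</seg>
<!-- ... -->
<link targets='E9806a L9806a'>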