[MEI-L] syllable connectors

Thu Jul 10 23:38:53 CEST 2014

Hi Craig,

Let’s get rid of the (relatively) easy stuff first --

1. The position of a syllable within a word is handled by syl/@wordpos, where the value can be ‘i’, ‘m’, or ‘t’.  There probably should also be a value for ‘complete word’, ‘w’ perhaps?

2. There was a typo in the elision example.  It should’ve been --

<lyrics xmlns="http://www.music-encoding.org/ns/mei">
  <verse>
    <syl>Dios</syl>
    <syl con="elided">que˘</syl>
    <syl>al</syl>
    <syl>mun-</syl>
    <syl>do</syl>
  </verse>
</lyrics>

3. Since @n is restricted to a single NMTOKEN, use verse/@label to capture multiple verse numbers:

<lyrics xmlns="http://www.music-encoding.org/ns/mei">
  <verse label="1.,2.,6">
    <syl>Dios</syl>
    <syl con="elided">que˘</syl>
    <syl>al</syl>
    <syl>mun-</syl>
    <syl>do</syl>
  </verse>
</lyrics>
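
For anyone who later needs the individual verse numbers back out of @label, here is a rough sketch in Python. The function name parse_verse_label is made up for illustration, and the sketch assumes the label is a comma-separated list of numbers with optional trailing periods, as in the example above; other label styles would need their own handling.

```python
import re

def parse_verse_label(label: str) -> list[int]:
    """Split a label such as "1.,2.,6" into a list of verse numbers.

    Assumption: the label is comma-separated and each part contains
    one run of digits; parts without digits are skipped.
    """
    numbers = []
    for part in label.split(","):
        match = re.search(r"\d+", part)
        if match:
            numbers.append(int(match.group()))
    return numbers

print(parse_verse_label("1.,2.,6"))  # [1, 2, 6]
```

The original string stays in @label untouched, so nothing about the way the source writes the numbers is lost.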

Now, for the real “meat”.  We can’t forget that one of the functions of MEI, if not the primary one, is to represent existing documents.  The reality is that real-world documents are going to contain errors and inconsistencies that have to be dealt with.  If MEI could mandate the “correct” way of doing things, as SCORE can, for example that divisions between syllables are always indicated by either a dash or an underscore, then life would be much easier.  What we need, however, is to accommodate multiple purposes -- recording what the document says, what it *ought to say*, and what it means (when we’re interested in that sort of thing).

What other XML representations typically do (and what MEI does in some other places) is to record what the document says as the content of elements and any necessary annotations of that content as embedded elements and/or attributes.  My proposed changes are an attempt to do that with <syl>.  In the case of the syllable “que˘” above, the elision marker is “in the text” of the document.  I would prefer to leave it there and record the fact that this syllable is elided with the next in the @con attribute.  Sure, the connector can be moved to an attribute, but there are several problems with that approach:

a. the ability to say anything about the visual aspect of the connector, for example that it’s bold, is lost;
b. omissions or errors can’t be explicitly indicated, for example that a connector isn’t present but ought to be or is there when it shouldn’t be;
c. it’s difficult (but not impossible) to handle multiple values when, for example, multiple connectors are present (erroneously or not), as in ‘que-˘’.  The rules of SCORE and modern notation aren’t universally followed.  ☺

In other words, the markup <syl con="separated">wan-</syl> contains the same data as <syl con="d">wan</syl>.  The difference is simply where the data is.  BUT, as I indicated in the list above, the latter is much more restrictive regarding what data can appear.  Unless @con is also allowed to contain unrestricted CDATA, it will never be able to accommodate all the symbols that have been (or even could be) used to connect syllables.

I recognize that you’d like the data to be as regular as possible and so want to place the connector(s) in an attribute.  But what would you do with (c)?  Splitting the data by putting ‘que’ in <syl> and ‘-˘’ in @con (if such a thing were allowed) could still result in “something [going] wrong in some obscure case”, whether the splitting is done by a human or by a software agent (Mr. OMR).  In my opinion, it’s less dangerous to leave the data together in just one place.

I agree with you that putting "wan- - - - - - - - - - -" inside <syl> is not likely to be useful for (re-)rendering notation from the markup (and should probably be processed to be simply "wan-"), but it does preserve information about the original document.  Again, it’s less dangerous (and less work) to just leave it in place than to create the markup <syl con="d">wan</syl>, especially if you’re not interested in (re-)rendering or reconstructing the text (i.e., prose), which might be the case if the creation of the markup is done in stages.

Under no circumstances is this change (putting the connector in the character data of <syl>) meant to take the place of intelligent, dynamic rendering using SCORE or any other processor.  For example, even when a processor encounters "wan- - - - - - - - - - -" it shouldn’t render this literally.  The size and number of dashes, underscores, etc. should be automagically calculated.  But this data could be useful to those dealing strictly with an image of the notation, and so shouldn’t be discarded.
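
To make the "don't render it literally" point concrete, here is a minimal sketch in Python of how a processor might collapse a run of repeated connectors before handing the syllable to a renderer. The function name compress_connectors is hypothetical, and the sketch assumes that only the ASCII hyphen and underscore count as repeatable connectors; real sources may well use other characters.

```python
import re

# Assumption: only "-" and "_" are treated as repeatable connectors.
# The pattern matches a trailing run of the same connector, with
# optional whitespace before, between, and after the repetitions.
_CONNECTOR_RUN = re.compile(r"\s*([-_])(?:\s*\1)+\s*$")

def compress_connectors(text: str) -> str:
    """Collapse a trailing run such as "- - - -" to a single connector."""
    return _CONNECTOR_RUN.sub(r"\1", text)

print(compress_connectors("wan- - - - - - - - - - -"))  # wan-
print(compress_connectors("wan-"))                      # wan-
```

The transcription inside <syl> is left as it is; the compressed form is only computed at rendering time, so the information about the source is not thrown away.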

--
p.

From: mei-l [mailto:mei-l-bounces+pdr4h=virginia.edu at lists.uni-paderborn.de] On Behalf Of Craig Sapp
Sent: Wednesday, July 09, 2014 5:39 PM
To: Music Encoding Initiative
Subject: Re: [MEI-L] syllable connectors

Hi Perry,

<<But now I believe it would be better to use @con to record the *function* of the connector and put the lyric transcription/visual rendition *inside* the syllable element itself as is done in many other places in MEI. >>
<< Repetitions of a connector, "wan - - - - - - - - - - -" for example, would be allowed inside <syl> so that no data is lost (well, except for the location of each dash), but could be compressed to a single dash for presentation purposes.>>

I am not too keen on placing the visual aspect of the lyrics text inside of <syl> CDATA since it is mixing the underlying prose content with its graphical presentation in the music.  The <syl> character data should only contain the prose of the text.  If the text extracted from the music should include "-  -  -  -" after the word "wan", then it should be in the <syl> character data; otherwise, it should not.

For text extraction from lyrics, I would want to know if the <syl> data is at the start, middle or end of word, so that I can extract the data segmented by words instead of syllables by adding spaces or not between the <syl> character data (primarily for searching purposes, but also for displaying as regular prose/verse).  It would be preferable if I do not have to delete any characters from the <syl> data when reconstructing the prose, since something will go wrong in some obscure case.
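
As a rough sketch of the extraction described above, the following Python joins <syl> content into words using syl/@wordpos. It assumes the values 'i' (initial), 'm' (medial) and 't' (terminal), treats a syllable without @wordpos as a complete word, and assumes the <syl> character data holds only the prose (no connector characters). The function name verse_to_prose and the sample markup are made up for illustration.

```python
import xml.etree.ElementTree as ET

MEI_NS = "{http://www.music-encoding.org/ns/mei}"

def verse_to_prose(verse: ET.Element) -> str:
    """Join the syllables of one <verse> into space-separated words."""
    words = []
    current = ""
    for syl in verse.iter(MEI_NS + "syl"):
        current += "".join(syl.itertext()).strip()
        # 'i' and 'm' mean the word continues on the next syllable;
        # 't' or a missing attribute (assumed: whole word) ends it.
        if syl.get("wordpos") not in ("i", "m"):
            words.append(current)
            current = ""
    if current:  # unterminated word at the end of the verse
        words.append(current)
    return " ".join(words)

sample = """
<lyrics xmlns="http://www.music-encoding.org/ns/mei">
  <verse>
    <syl wordpos="i">wan</syl>
    <syl wordpos="t">ton</syl>
    <syl wordpos="i">a</syl>
    <syl wordpos="m">ban</syl>
    <syl wordpos="t">don</syl>
  </verse>
</lyrics>
"""
root = ET.fromstring(sample)
print(verse_to_prose(root.find(MEI_NS + "verse")))  # wanton abandon
```

No characters have to be deleted from the <syl> data here; the word boundaries come entirely from the attribute.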

When two syllables of a word are separated by a long distance between two notes in a graphical score, multiple dashes are used.  If the layout of the music changes, then the single/multiple dash display should change (automatically).  Hard-encoding the single/multiple dash distinction is not very useful for manipulating the layout unless you are intent on encoding the static layout of a specific edition.

As an aside: I often come across the reverse case when two syllables are too close to comfortably be separated by a hyphen, the hyphen should be dropped and the two syllables should be displayed as a single word.  I do not know any notation editor which handles this case, and I have to do it manually when necessary (attaching the word to a single note, and leaving the next note without a syllable).

SCORE always uses a dashed line between two syllables, and multiple dashes appear automatically as the line is extended.
The number, size and distance between the dashes is controllable on this line.  In other words SCORE does not use a character-encoding of a hyphen to display the word separators.  The same goes for word extenders which are not literally a sequence of underscores.

<<The text inside <syl> can be processed (using regular expression matching) to create any output needed, for example, the text "as-is" with hyphenated words (e.g., "wan-ton a-ban-don") or "joined-up" in a more poetic style (e.g., "wanton abandon"). >>

This is how the Humdrum representation for lyrics works.  In general it works well, but there are complications.  In particular, when there is a hyphen between two syllables in prose, you need a way of indicating that it should remain.  I don't come across that much in lyrics, but I would encode the word "long-term" as two syllables, "long--" and "-term", with the double hyphen indicating that when the lyrics are extracted from the music, the final prose should include a hyphen between those two syllables.  Such a system should be spelled out.  This system works well in 7-bit ASCII data, but I wonder what will happen if someone uses strange or inconsistent Unicode hyphen characters.  Also, this would not work well if a graphic-like display is used: "wan - - - - - - - - - - " could be compensated for in a regular expression, but only after discovering that someone was doing such a thing in the data, and it would make the regular expression for removing the extended hyphen quite complicated.

Another complication when extracting prose text is how I am to detect an elision character in the CDATA since, as you have pointed out, so many of them occur in Unicode. :-)  This seems to make a case for a functional elision tag which contains an optional attribute for how it should be rendered as character(s) for separating two syllables.
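
To illustrate the detection problem, here is a minimal sketch in Python that separates elision marks from the prose in a syllable's text. The set of marks is an assumption and certainly incomplete: it covers the breve, caron, circumflex (ASCII and modifier-letter forms), tilde, and the three SMuFL elision code points U+E550 to U+E552 mentioned elsewhere in this thread. The function name split_elision is hypothetical.

```python
# Assumed, incomplete set of characters that may mark an elision.
ELISION_MARKS = frozenset(
    "\u02d8"   # BREVE
    "\u02c7"   # CARON
    "\u02c6"   # MODIFIER LETTER CIRCUMFLEX ACCENT
    "^"        # CIRCUMFLEX ACCENT (ASCII)
    "~"        # TILDE (ASCII)
    "\ue550\ue551\ue552"  # SMuFL narrow / normal / wide elision
)

def split_elision(text: str) -> tuple[str, str]:
    """Return (prose, marks): the text without elision marks, and the marks found."""
    prose = "".join(ch for ch in text if ch not in ELISION_MARKS)
    marks = "".join(ch for ch in text if ch in ELISION_MARKS)
    return prose, marks

print(split_elision("que\u02d8"))  # ('que', '˘')
```

Any character missing from the set silently ends up in the prose, which is exactly the kind of obscure failure a functional attribute or tag would avoid.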

I don't understand this encoding; could you explain it in more detail?

<lyrics xmlns="http://www.music-encoding.org/ns/mei">
  <verse>
    <syl>Dios</syl>
    <syl con="elided">que˘al</syl>
    <syl>mun-</syl>
    <syl>do</syl>
  </verse>
</lyrics>

I would expect that the syl/@con attribute describes how the current syllable connects to the following syllable, not an internal connector:

<lyrics xmlns="http://www.music-encoding.org/ns/mei">
  <verse>
    <syl>Dios</syl>
    <syl con="elided">que</syl>
    <syl>al</syl>
    <syl>mun-</syl>
    <syl>do</syl>
  </verse>
</lyrics>

Also remember that a few months ago we were having problems representing verse numbers (in rondeaux), such as "1.,2.,6" for indicating that the line of music is for the 1st, 2nd and 6th verses.  How should this be encoded?  In most music editors, this has to be treated as regular text with a space elision before the first syllable in the lyrics.

-=+Craig

On 9 July 2014 11:50, Roland, Perry D. (pdr4h) <pdr4h at eservices.virginia.edu> wrote:

Hi everybody,

I knew that eventually someone would trip over this.  :-)  And that we'd need to fix it.

This is another one of those places where the original purpose/form of MEI conflicts with later developments.  One of the original purposes of syl/@con was to allow a hand-encoder to mark the *function* of a syllable connector simply by indicating what they saw -- if the score contained a dash, the encoder would write <syl con="d"> and so on.  Another was to make it easier to convert existing representations into MEI.  For me, however, the main point was to get at the function of the individual syllable.

But now I believe it would be better to use @con to record the *function* of the connector and put the lyric transcription/visual rendition *inside* the syllable element itself as is done in many other places in MEI.  Consider for a moment --

"wan", "ton", and "wanton" are all English words.  The difference between the word "wan" followed by the word "ton" and the single word "wanton" divided syllabically is all in the connectors between the syllables.  For example:

<lyrics xmlns="http://www.music-encoding.org/ns/mei">
  <verse>
    <syl>wan-</syl>
    <syl>ton</syl>
  </verse>
  <verse>
    <syl>wan</syl>
    <syl>ton</syl>
  </verse>
</lyrics>

Of course, in the following markup, because a connector is absent the difference is not discernible:

<lyrics xmlns="http://www.music-encoding.org/ns/mei">
  <verse>
    <syl>wan</syl>
    <syl>ton</syl>
  </verse>
  <verse>
    <syl>wan</syl>
    <syl>ton</syl>
  </verse>
</lyrics>

But, if we allow @con to have a value of "none", we're really no better off because we still don't know which visual connector *ought* to be present or what its (supposed) purpose is.  The following is still semantically indistinguishable from the preceding example because the orthography of the word "wan" and that of the first syllable of "wanton" (without its hyphen) are the same thing:

<lyrics xmlns="http://www.music-encoding.org/ns/mei">
  <verse>
    <syl con="none">wan</syl>
    <syl>ton</syl>
  </verse>
  <verse>
    <syl>wan</syl>
    <syl>ton</syl>
  </verse>
</lyrics>

But, to record which connector *should* be present we can use <supplied>:

<lyrics xmlns="http://www.music-encoding.org/ns/mei">
  <verse>
    <syl>wan<supplied>-</supplied></syl>
    <syl>ton</syl>
  </verse>
  <verse>
    <syl>wan</syl>
    <syl>ton</syl>
  </verse>
</lyrics>

Or use <gap> to record a missing connector without supplying one:

<lyrics xmlns="http://www.music-encoding.org/ns/mei">
  <verse>
    <syl>wan<gap reason="missing hyphen"/></syl>
    <syl>ton</syl>
  </verse>
  <verse>
    <syl>wan</syl>
    <syl>ton</syl>
  </verse>
</lyrics>

Having put the connector *inside* <syl>, @con can be used to record the function of the connector:

<lyrics xmlns="http://www.music-encoding.org/ns/mei">
  <verse>
    <syl con="separated">wan<supplied>-</supplied></syl>
    <syl>ton</syl>
  </verse>
  <verse>
    <syl>wan</syl>
    <syl>ton</syl>
  </verse>
</lyrics>

Actually, I think I prefer @con to record info *about the syllable* since it's an attribute *of* the syllable.  The new values for @con (or for a new attribute if we want to keep @con around but deprecate it) could be "separated", "extended", "elided", and "unknown".  But I could be persuaded otherwise.

This also works in the (hopefully) more usual case when the connector is present but our favorite naïve encoder (Mr. OMR) can't (or doesn't want to) determine the function of the connector:

<lyrics xmlns="http://www.music-encoding.org/ns/mei">
  <verse>
    <syl>wan-</syl>
    <syl>ton</syl>
  </verse>
  <verse>
    <syl>wan</syl>
    <syl>ton</syl>
  </verse>
</lyrics>

It would be better to have this info, of course, because depending on the rhythm of the vocal line and the prevailing notational style, a dash can be used for both separation and extension.  For example, when the first syllable is to be sung on multiple notes the markup could be:

<lyrics xmlns="http://www.music-encoding.org/ns/mei">
  <verse>
    <syl con="extended">wan-</syl>
    <syl>ton</syl>
  </verse>
  <verse>
    <syl>wan</syl>
    <syl>ton</syl>
  </verse>
</lyrics>

In fact, there could be (and often are) multiple dashes filling the space between the first and last notes of the melisma or just one depending on the source document or on the rendering processor (when the MEI is to be rendered).  The same thing occurs with the underscore separator.

But this kind of many-visual-representations-to-one-function situation is particularly acute when it comes to elision.  Various symbols have been used to indicate syllable elision -- breve, inverted breve, caron, circumflex, and tilde just to name a few.  The following example indicates an elision of "que" and "al":

<lyrics xmlns="http://www.music-encoding.org/ns/mei">
  <verse>
    <syl>Dios</syl>
    <syl con="elided">que˘al</syl>
    <syl>mun-</syl>
    <syl>do</syl>
  </verse>
</lyrics>

But so does this:

<lyrics xmlns="http://www.music-encoding.org/ns/mei">
  <verse>
    <syl>Dios</syl>
    <syl con="elided">que^al</syl>
    <syl>mun-</syl>
    <syl>do</syl>
  </verse>
</lyrics>

One could encounter any number of visual renditions indicating elision. And one should be able to use any appropriate Unicode or SMuFL code point for the connector.  SMuFL provides

      U+E550
      lyricsElisionNarrow
      Narrow elision

      U+E551
      lyricsElision
      Elision

      U+E552
      lyricsElisionWide
      Wide elision

      U+E553
      lyricsHyphenBaseline
      Baseline hyphen

      U+E554
      lyricsHyphenBaselineNonBreaking
      Non-breaking baseline hyphen

     (See the attached image or http://www.smufl.org/version/latest/range/lyrics/ for visual examples)

The text inside <syl> can be processed (using regular expression matching) to create any output needed, for example, the text "as-is" with hyphenated words (e.g., "wan-ton a-ban-don") or "joined-up" in a more poetic style (e.g., "wanton abandon").  Repetitions of a connector, "wan - - - - - - - - - - -" for example, would be allowed inside <syl> so that no data is lost (well, except for the location of each dash), but could be compressed to a single dash for presentation purposes.
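
As a minimal sketch of that kind of processing, the following Python produces both outputs from a list of syllable strings. It assumes that a trailing ASCII hyphen is the only word-internal connector and that no word contains a real hyphen of its own (a genuine hyphen would be removed in the "joined-up" form); the function name render_lyrics is made up for illustration.

```python
import re

def render_lyrics(syllables: list[str], joined: bool = True) -> str:
    """Turn syllables such as ["wan-", "ton"] into running text.

    joined=True  gives "joined-up" prose, e.g. "wanton"
    joined=False keeps the hyphenation, e.g. "wan-ton"
    """
    text = " ".join(syllables)
    # Assumption: a hyphen followed by whitespace marks a break inside a word.
    return re.sub(r"-\s+", "" if joined else "-", text)

syllables = ["wan-", "ton", "a-", "ban-", "don"]
print(render_lyrics(syllables, joined=False))  # wan-ton a-ban-don
print(render_lyrics(syllables, joined=True))   # wanton abandon
```

Both forms are derived from the same <syl> content, so the choice between them is left to whoever consumes the data.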

I believe this will work better than the old system (it's clearer, no info is lost), but I'd like to hear other viewpoints.

--
p.

> -----Original Message-----
> From: mei-l [mailto:mei-l-bounces at lists.uni-paderborn.de] On Behalf Of
> Christine Siegert
> Sent: Friday, July 04, 2014 9:58 AM
> To: Music Encoding Initiative
> Subject: Re: [MEI-L] syllable connectors
>
> Dear Johannes, dear list,
> The Sarti project agrees, too.
> All the best,
> Christine
>
>
> Prof. Dr. Christine Siegert
> Universität der Künste Berlin
> Fakultät Musik, Musikwissenschaft
> Fasanenstraße 1B
> D-10623 Berlin
>
> Tel.: +49 (0)30 3185 2318
> siegert at udk-berlin.de
> -----Original Message-----
> From: Karen McAulay
> Sent: Friday, July 04, 2014 12:16 PM
> To: Music Encoding Initiative
> Subject: Re: [MEI-L] syllable connectors
>
> Yes!
>
> Best wishes
> Karen
>
> Dr. Karen McAulay
> Music and Academic Services Librarian
> +44 (0)141 270 8267 (direct)
> K.McAulay at rcs.ac.uk
> -----Original Message-----
> From: mei-l [mailto:mei-l-bounces at lists.uni-paderborn.de] On Behalf Of
> Johannes Kepper
> Sent: 04 July 2014 10:56
> To: Music Encoding Initiative
> Subject: [MEI-L] syllable connectors
>
> Dear MEI-Listeners,
>
> While doing some manual coding of vocal music, we ran across a situation where
> the layout of the printed score did not allow us to put in any separator (well,
> better: connector) between two syllables of a word. The current list of
> allowed connectors does not have an explicit option of "no connector at
> all". Do we all agree that there should be one?
>
> Best,
> Johannes
>
> _______________________________________________
> mei-l mailing list
> mei-l at lists.uni-paderborn.de<mailto:mei-l at lists.uni-paderborn.de>
> https://lists.uni-paderborn.de/mailman/listinfo/mei-l
