Essay on automated music engraving: 1.4 Building software

[ << Music engraving ]	[Top][Contents][Index]	[ Literature list >> ]
[ < Getting things right ]	[ Up : Music engraving ]	[ Music representation > ]

1.4 Building software

This section describes some of the programming decisions that we made when designing LilyPond.

[ << Music engraving ]	[Top][Contents][Index]	[ Literature list >> ]
[ < Building software ]	[ Up : Building software ]	[ What symbols to engrave? > ]

Music representation

Ideally, the input format for any high-level formatting system is an abstract description of the content. In this case, that would be the music itself. This poses a formidable problem: how can we define what music really is? Instead of trying to find an answer, we have reversed the question. We write a program capable of producing sheet music, and adjust the format to be as lean as possible. When the format can no longer be trimmed down, by definition we are left with content itself. Our program serves as a formal definition of a music document.

The syntax is also the user-interface for LilyPond, hence it is easy to type:

{
  c'4 d'8
}

to create a quarter note on middle C (C1) and an eighth note on the D above middle C (D1).

On a microscopic scale, such syntax is easy to use. On a larger scale, syntax also needs structure. How else can you enter complex pieces like symphonies and operas? The structure is formed by the concept of music expressions: by combining small fragments of music into larger ones, more complex music can be expressed. For example

f'4

Simultaneous notes can be constructed by enclosing them with << and >>:

<<c4 d4 e4>>

This expression is put in sequence by enclosing it in curly braces { … }:

{ f4 <<c4 d4 e4>> }

The above is also an expression, and so it may be combined again with another simultaneous expression (a half note) using <<, \\, and >>:

<< g2 \\ { f4 <<c4 d4 e4>> } >>

Such recursive structures can be specified neatly and formally in a context-free grammar. The parsing code is also generated from this grammar. In other words, the syntax of LilyPond is clearly and unambiguously defined.

User-interfaces and syntax are what people see and deal with most. They are partly a matter of taste, and also the subject of much discussion. Although discussions on taste do have their merit, they are not very productive. In the larger picture of LilyPond, the importance of input syntax is small: inventing neat syntax is easy, while writing decent formatting code is much harder. This is also illustrated by the line-counts for the respective components: parsing and representation take up less than 10% of the source code.

When designing the structures used in LilyPond, we made some different decisions than are apparent in other software. Consider the hierarchical nature of music notation:

In this case, there are pitches grouped into chords that belong to measures, which belong to staves. This resembles a tidy structure of nested boxes:

Unfortunately, the structure is tidy because it is based on some excessively restrictive assumptions. This becomes apparent if we consider a more complicated musical example:

In this example, staves start and stop at will, voices jump around between staves, and the staves have different time signatures. Many software packages would struggle with reproducing this example because they are built on the nested box structure. With LilyPond, on the other hand, we have tried to keep the input format and the structure as flexible as possible.

[ << Music engraving ]	[Top][Contents][Index]	[ Literature list >> ]
[ < Music representation ]	[ Up : Building software ]	[ Flexible architecture > ]

What symbols to engrave?

The formatting process decides where to place symbols. However, this can only be done once it is decided what symbols should be printed – in other words, what notation to use.

Common music notation is a system of recording music that has evolved over the past 1000 years. The form that is now in common use dates from the early Renaissance. Although the basic form (i.e., note heads on a 5-line staff) has not changed, the details still evolve to express the innovations of contemporary notation. Hence, common music notation encompasses some 500 years of music. Its applications range from monophonic melodies to monstrous counterpoints for a large orchestra.

How can we get a grip on such a seven-headed beast, and force it into the confines of a computer program? Our solution is to break up the problem of notation (as opposed to engraving, i.e., typography) into digestible and programmable chunks: every type of symbol is handled by a separate module, a so-called plug-in. Each plug-in is completely modular and independent, so each can be developed and improved separately. Such plug-ins are called engravers, by analogy with craftsmen who translate musical ideas to graphic symbols.

In the following example, we start out with a plug-in for note heads, the Note_heads_engraver.

Then a Staff_symbol_engraver adds the staff,

the Clef_engraver defines a reference point for the staff,

and the Stem_engraver adds stems.

The Stem_engraver is notified of any note head coming along. Every time one (or more, for a chord) note head is seen, a stem object is created and connected to the note head. By adding engravers for beams, slurs, accents, accidentals, bar lines, time signature, and key signature, we get a complete piece of notation.

This system works well for monophonic music, but what about polyphony? In polyphonic notation, many voices can share a staff.

In this situation, the accidentals and staff are shared, but the stems, slurs, beams, etc., are private to each voice. Hence, engravers should be grouped. The engravers for note heads, stems, slurs, etc., go into a group called ‘Voice context’, while the engravers for key, accidental, bar, etc., go into a group called ‘Staff context’. In the case of polyphony, a single Staff context contains more than one Voice context. Similarly, multiple Staff contexts can be put into a single Score context. The Score context is the top level notation context.

Flexible architecture

When we started, we wrote the LilyPond program entirely in the C++ programming language; the program’s functionality was set in stone by the developers. That proved to be unsatisfactory for a number of reasons:

When LilyPond makes mistakes, users need to override formatting decisions. Therefore, the user must have access to the formatting engine. Hence, rules and settings cannot be fixed by us at compile-time but must be accessible for users at run-time.
Engraving is a matter of visual judgment, and therefore a matter of taste. As knowledgeable as we are, users can disagree with our personal decisions. Therefore, the definitions of typographical style must also be accessible to the user.
Finally, we continually refine the formatting algorithms, so we need a flexible approach to rules. The C++ language forces a certain method of grouping rules that cannot readily be applied to formatting music notation.

These problems have been addressed by integrating an interpreter for the Scheme programming language and rewriting parts of LilyPond in Scheme. The current formatting architecture is built around the notion of graphical objects, described by Scheme variables and functions. This architecture encompasses formatting rules, typographical style and individual formatting decisions. The user has direct access to most of these controls.

Scheme variables control layout decisions. For example, many graphical objects have a direction variable that encodes the choice between up and down (or left and right). Here you see two chords, with accents and arpeggios. In the first chord, the graphical objects have all directions down (or left). The second chord has all directions up (right).

The process of formatting a score consists of reading and writing the variables of graphical objects. Some variables have a preset value. For example, the thickness of many lines – a characteristic of typographical style – is a variable with a preset value. You are free to alter this value, giving your score a different typographical impression.

Formatting rules are also preset variables: each object has variables containing procedures. These procedures perform the actual formatting, and by substituting different ones, we can change the appearance of objects. In the following example, the rule governing which note head objects are used to produce the note head symbol is changed during the music fragment.

[ << Music engraving ]	[Top][Contents][Index]	[ Literature list >> ]
[ < What symbols to engrave? ]	[ Up : Building software ]	[ Putting LilyPond to work > ]

1.4 Building software

Music representation

What symbols to engrave?

See also

Flexible architecture

LilyPond — Essay on automated music engraving v2.24.4 (stable-branch).