AGMA SCHWA

SCA++ Syntax

This page documents the syntax used by SCA++.

Overview

SCA++ uses three input boxes, one for phoneme classes, one for rules (= sound changes), and one for the words that the changes should be applied to.

General Remarks

Whitespace in class definitions and rules is ignored; for example,

a/b/_
a /                 b/    _

both of these rules are equivalent.

In the examples in this guide, remarks enclosed in brackets (), unless otherwise noted, are comments and serve only documentation purposes. They are not part of the syntax of SCA++.

Cheat Sheet

Basic Usage

t / d / _                 (ta > da)
p, t, kʷ / b, d, gʷ / _   (pa, ta, kʷa > ba, da, gʷa)
p, t, kʷ / b / _          (pa, ta, kʷa > ba, ba, ba)

Environment

t / d / _a   (ta, te > da, te)  
a / i / t_   (ta, da > ti, da)

Word boundaries

d / t / _#   (da, ad > da, at)  
d / t / #_   (da, ad > ta, ad)

Optional and negated elements

d / t / _(a)r   (der, dar, dr > der, tar, tr)  
d / t / _~[a]   (da, de > da, te)

Wildcards

d / t / _*r   (dr, dar, d桜r > dr, tar, t桜r)

Rules

There are five types of rules, each of which uses a slightly different syntax.

Each rule must be on a single line.

Substitution Rules

Substitution rules have an input and output. Their syntax is:

input/output/context

NOTE: The first / can also be a > if that’s what you prefer. Semantically, there is no difference between the two.

Examples:

p / b / vowel_vowel         (p > b between vowels)
pp, tt, kk / p, t, k /_#    (word-final voiceless stops degeminate)

Input

The input of a substitution rule consists of comma-separated characters, classes, and combinations thereof.

Examples:

a               (matches the character `a`)        
abc             (matches the characters `abc` in a row)              
a, b, c         (matches either `a`, or `b`, or `c`)          
ad, bd, cd      (matches either `ad`, or `bd`, or `cd`)     
{ a, b, c }d    (first matches either `a`, `b`, or `c`, and then `d`)
{ a, b, c }d, f (matches either the same as the previous line, or a single `f`) 

Note that lines 4 and 5 are equivalent: they match the same sequences and are merely written differently.

The wildcard operator * may be used in place of a character and matches any one character.

Output

The output of a substitution rule is just like the input, except that:

  1. The output must contain the same number of elements as the input:

    a, b, c / e, f, g / _             (OK)
    { a, b, { c }} / e, { f, g } / _  (OK, classes are flattened)
    {a, b, c}d / abcd, fegh, i / _    (OK, how complex each element is doesn't matter)
    a, b / e, f, g / _                (Error: too many output elements)
    a, b, c / e, f / _                (Error: not enough output elements)

    The only exception to this is when you have exactly one output element, in which case you can have as many inputs as you like:

       a, b, c / d / _   (OK, `a`, `b`, and `c` all become `d`)

  2. The wildcard operator may not be used in the output (because what would that even mean?).

  3. Percentages can be used to introduce irregularity: an element may be replaced with a class that contains percentage-qualified elements:

    a, b, c / e, f, %{ 20%g, 40%h, r } / _

    More on percentages later on.
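The element-count rule above can be sketched in Python. This is only an illustration, not SCA++'s actual implementation: the function name `pair_elements` is hypothetical, and both lists are assumed to have been flattened already.

```python
def pair_elements(inputs, outputs):
    """Pair each input element with the output element that replaces it.

    Hypothetical helper: both arguments are assumed to be flat lists of strings.
    """
    # Exception to the rule: a single output element applies to every input.
    if len(outputs) == 1:
        return [(element, outputs[0]) for element in inputs]
    if len(outputs) > len(inputs):
        raise ValueError("too many output elements")
    if len(outputs) < len(inputs):
        raise ValueError("not enough output elements")
    # Otherwise the elements are matched up position by position.
    return list(zip(inputs, outputs))


print(pair_elements(["a", "b", "c"], ["e", "f", "g"]))
print(pair_elements(["a", "b", "c"], ["d"]))
```

Checking the counts before pairing means a malformed rule is rejected as a whole rather than being applied partially.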

Context

The context is the same for all rules and determines what must precede or follow the input for a rule to apply to it. The syntax of the context is as follows:

  1. The context must contain exactly one underscore:

    a / b / _    (OK)
    a / b / ___  (OK, multiple consecutive underscores are treated as one)
    a / b /      (Error: no underscore in context)
    a / b / _c_  (Error: multiple non-consecutive underscores in context)  

    A context containing only an underscore means that the input will be replaced with the output wherever it occurs. For example, the rule

    a / b / _

    means ‘every occurrence of a is replaced with b’.

  2. The underscore may be preceded or followed by characters and classes. These indicate that the input must be preceded or followed by those characters and classes for a substitution to take place. The characters and classes that are part of the context themselves are not replaced. For example, the rule

    a / b / c_d

    means ‘a becomes b between c and d, but c and d remain unchanged’.

  3. A # sign at the very beginning or end of whatever comes before or after the underscore indicates a word boundary. For example:

    a / b / #_   (OK, a > b at the beginning of a word)
    a / b / _#   (OK, a > b at the end of a word)
    a / b / #_#  (OK, a > b if the word is ‘a’)
    a / b / _c#  (OK, a > b if followed by ‘c’ at the end of a word)
    a / b / _#c  (Error: characters after the end of a word are not allowed)
    a / b / c#_  (Error: characters before the beginning of a word are not allowed)

  4. The ~[] operator negates an element. This means a rule only applies if the word does not contain that element at that position.

    a / b / ~[c]_        (OK, a > b unless preceded by ‘c’)
    a / b / _~[c]        (OK, a > b unless followed by ‘c’)
    a / b / _~[{c, d}]e  (OK, a > b unless followed by ‘ce’, or ‘de’)
    a / b / ~[]_         (Error, ‘~[]’ must contain an element)

  5. Brackets () may be used to indicate optional elements, which may, but need not, be present:

    a / b / _(c)e       (OK, a > b before ‘e’ or ‘ce’)
    a / b / _({c, d})e  (OK, a > b before ‘e’, or ‘ce’, or ‘de’)
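One way to picture how a context restricts a rule is to translate it into a regular expression with lookbehind and lookahead assertions, so that the context is checked but never replaced. The Python sketch below is an assumption-laden illustration, not how SCA++ itself works: `apply_rule` is a hypothetical name, and only literal characters and # boundaries are handled (no classes, negation, optional elements, or wildcards).

```python
import re


def apply_rule(word, source, target, before="", after=""):
    """Replace `source` with `target` where the literal context matches.

    `before` and `after` are the parts of the context to the left and right
    of the underscore; a leading or trailing '#' marks a word boundary.
    """
    # A '#' at the start of `before` anchors the context to the word start.
    left = ("^" if before.startswith("#") else "") + re.escape(before.lstrip("#"))
    # A '#' at the end of `after` anchors the context to the word end.
    right = re.escape(after.rstrip("#")) + ("$" if after.endswith("#") else "")

    pattern = re.escape(source)
    if left:
        pattern = f"(?<={left})" + pattern   # context before: checked, not consumed
    if right:
        pattern = pattern + f"(?={right})"   # context after: checked, not consumed
    # A function is used as the replacement so `target` is inserted literally.
    return re.sub(pattern, lambda match: target, word)


print(apply_rule("cad", "a", "b", before="c", after="d"))  # a / b / c_d
print(apply_rule("ada", "a", "b", before="#"))             # a / b / #_
```

Because the context lives in zero-width assertions, ‘c’ and ‘d’ in the first call are left unchanged, which mirrors the behaviour described in item 2.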

Epenthesis Rules

Epenthesis rules are just like substitution rules, except that they have no input, and their output may contain only one element:

/ a / b_       (OK, insert ‘a’ after every ‘b’)
/ e / #_s      (OK, insert ‘e’ before word-initial ‘s’)
/ a / _        (OK, insert ‘a’ absolutely everywhere (not recommended))
/ a, b / _     (Error: the output of an epenthesis rule may contain only one element)
/ {a, b}c / _  (Error: same as previous line, since this expands to `ac, bc`)

Deletion Rules

Deletion rules are the opposite of epenthesis rules: they have no output. However, their input may consist of more than one element:

a // b_    (OK, yeet ‘a’ after ‘b’)
e // _#    (OK, yeet word-final ‘e’)
a // _     (OK, yeet ‘a’ everywhere)
a, e // _  (OK, yeet ‘e’ and ‘a’ everywhere)
// _       (Error, empty deletion rule)

Again, whitespace doesn’t matter, so whether you use // or / / here is up to you.

Metathesis Rules

Metathesis rules are identified by their ‘output’ consisting of &. A metathesis rule reverses each input element. Diacritics remain attached to the preceding character:

st / & / _       (OK, ‘st’ becomes ‘ts’)
st, zd / & / _#  (OK, ‘st’ and ‘zd’ become ‘ts’ and ‘dz’ word-finally)
ɑ̃n̩e / & / s_     (OK, ‘ɑ̃n̩e’ becomes ‘en̩ɑ̃’ after ‘s’)
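Keeping diacritics attached while reversing can be sketched as follows. This is a minimal Python illustration under one assumption: a ‘diacritic’ is taken to be any Unicode combining character, as reported by the standard `unicodedata.combining` function. The helper names are hypothetical, and SCA++ may decide this differently.

```python
import unicodedata


def split_clusters(text):
    """Split text into base characters, each with its combining marks attached."""
    clusters = []
    for char in text:
        # A combining mark is glued onto the cluster of the preceding character.
        if clusters and unicodedata.combining(char):
            clusters[-1] += char
        else:
            clusters.append(char)
    return clusters


def metathesize(element):
    """Reverse an element cluster by cluster instead of code point by code point."""
    return "".join(reversed(split_clusters(element)))


print(metathesize("st"))  # prints "ts"
```

Reversing the raw code points instead would move each diacritic onto the wrong base character, which is why the element is split into clusters first.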

Reduplication Rules

Reduplication rules are identified by their ‘output’ consisting of one or more + signs. Each input element is followed by n additional copies of itself, where n is the number of + signs:

p, t, k / + / #_  (OK, geminate word-initial ‘p’, ‘t’, ‘k’)
s / ++++ / _      (OK, ‘s’ becomes ‘sssss’)
st / + / _        (OK, ‘st’ becomes ‘stst’)
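The examples above show that each + sign adds one copy on top of the original element. A minimal Python sketch of that arithmetic, with `reduplicate` as a hypothetical name:

```python
def reduplicate(element, pluses):
    """Return the element followed by one extra copy per '+' sign."""
    if not pluses or set(pluses) != {"+"}:
        raise ValueError("the output must consist of one or more '+' signs")
    # The original element plus len(pluses) additional copies.
    return element * (len(pluses) + 1)


print(reduplicate("s", "++++"))  # prints "sssss"
print(reduplicate("st", "+"))    # prints "stst"
```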

Classes

Classes can be defined in the ‘Classes’ input box, in which case they are assigned a name and can be referred to by that name in rules and following definitions.

The syntax for a class definition is as follows:

class-name = { characters }

The class name consists of one or more characters and may contain any character that doesn’t have special meaning (like # or /).

The characters inside the class definition are sequences of characters that are separated by commas. You can also define classes in terms of other classes:

front               = { i, e }
back                = { u, o }
vowels              = { front, back }

In the example above, the classes front and back in the definition of vowels are expanded right then and there, yielding { i, e, u, o }.

Using Classes in Rules

Classes denote alternatives and normally simply expand to the elements they contain. The following are all equivalent:

{a, b, c}
{{a, b, c}}
{{a, b}, c}
{a, {b, c}}
{{a}, b, c}
{a, {b, {c}}}
{ {{a}}, {{{{ b, {{c}} }}}} }
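The flattening that makes these forms equivalent can be sketched in Python by modelling a class as a list whose items are either strings or nested lists. The representation and the name `flatten` are assumptions made for illustration only.

```python
def flatten(cls):
    """Flatten arbitrarily nested classes into a single list of elements."""
    flat = []
    for element in cls:
        if isinstance(element, list):
            # A nested class contributes its own elements in place.
            flat.extend(flatten(element))
        else:
            flat.append(element)
    return flat


print(flatten([["a", "b"], "c"]))       # {{a, b}, c}
print(flatten(["a", ["b", ["c"]]]))     # {a, {b, {c}}}
```

Both calls print `['a', 'b', 'c']`, matching the claim that nesting depth makes no difference.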

As we have done multiple times already, we can also use classes directly in a rule without assigning them a name first. For example, assuming vowels is defined as above, the rules below are equivalent:

ai, ae, au, ao / a, b, c, d / _
a{vowels} / a, b, c, d / _
a{i, e, u, o} / a, b, c, d / _ 
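The expansion of a sequence such as a{i, e, u, o} into its alternatives can be pictured as a product over the positions of the sequence. In the Python sketch below, each position is a list of alternatives; this representation and the name `expand` are assumptions for illustration.

```python
from itertools import product


def expand(parts):
    """Expand a sequence of alternatives into every concrete string it matches."""
    # Each part is a list of alternatives; a plain character is a one-item list.
    return ["".join(combo) for combo in product(*parts)]


print(expand([["a"], ["i", "e", "u", "o"]]))  # a{i, e, u, o}
print(expand([["a", "b", "c"], ["d"]]))       # {a, b, c}d
```

The first call yields `ai, ae, au, ao`, the same four elements as the first rule written out by hand.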

IMPORTANT: A class name must be separated from adjacent ordinary characters (that is, characters without special meaning, unlike { or /) by an extra pair of {}. If you were to write avowels rather than a{vowels}, SCA++ would interpret avowels either as the name of a class or, since we haven’t defined any class with that name, as the character sequence a v o w e l s.

For example, assuming we have the following class definitions:

FS  = { a, b }
SR  = { b, c }
FSR = { o, p, q }

We can use them as follows:

FSR    (equivalent to ‘{ o, p, q }’)
{FS}R  (equivalent to ‘{ a, b }R’)
F{SR}  (equivalent to ‘F{ b, c }’)

Definition Order

Class definitions are processed top to bottom. The following is valid, but does not do what you might think it does:

vowels       = { front, back }
front        = { i, e }
back         = { u, o }
vowels-or-q  = { vowels, q }       

In this case, vowels is defined in terms of front and back, but front and back are not defined yet and are just treated as the character sequences f r o n t and b a c k. The vowels class is thus equivalent to {f, r, o, n, t, b, a, c, k}.

This is because a class definition is expanded as soon as it is encountered. Here’s another example. Consider the definition of vowels-or-q above. In it, we’re using the vowels class, which we defined in terms of front and back.

However, front and back in the definition of vowels will always have the meaning that they had at the time vowels was defined. This means that vowels-or-q is NOT defined as { i, e, u, o, q }, but rather as {f, r, o, n, t, b, a, c, k, q}.

This behaviour is necessary, because otherwise, the following might lead to complications:

a = { b }
b = { a }

If forward references to classes were allowed, this would lead to problems: in the example above, we would be defining a in terms of b, which we’re defining in terms of a, which we’re defining in terms of b, and so on. It would never stop.

This is why class definitions are processed in order. Doing so solves this problem: In the example above, the class a is defined as being a class containing only the character b. And the class b is then defined to be the same as the class a.
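The effect of expanding each definition as soon as it is encountered can be sketched in Python. This is a simplified model, not SCA++'s parser: `process_definitions` is a hypothetical name, nested braces and operators are not handled, and a name that is not defined yet is simply kept as literal text.

```python
def process_definitions(lines):
    """Process class definitions top to bottom, expanding references eagerly."""
    classes = {}
    for line in lines:
        name, _, body = line.partition("=")
        elements = [el.strip() for el in body.strip().strip("{}").split(",")]
        expanded = []
        for element in elements:
            # Only classes defined on EARLIER lines are known at this point;
            # anything else is treated as literal text (a simplification).
            expanded.extend(classes.get(element, [element]))
        classes[name.strip()] = expanded
    return classes


print(process_definitions(["a = { b }", "b = { a }"]))
```

For the two mutually referential definitions, this prints `{'a': ['b'], 'b': ['b']}`: a contains only the character b, and b ends up the same as a, so no endless loop can arise.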

Operators

Because classes are very similar to sets, we can apply set-theoretical operations to them to construct new classes.

The Difference Operator

The binary ~ operator is used to construct new classes by removing characters from a class. Its left-hand side should be a class, but its right-hand side may be either a class or simply a character. Assuming FS, SR, and FSR are defined like so:

FS  = { a, b }
SR  = { b, c }
FSR = { o, p, q }

We then get:

FSR~o       (Equivalent to ‘{ p, q }’)
FSR~d       (No effect since ‘FSR’ doesn't contain ‘d’; same as ‘FSR’)
FSR~{o, p}  (Equivalent to ‘{q}’)
FS~SR       (Equivalent to ‘{a}’)

This is called the ‘difference’ operator because it computes the set difference between two classes.
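As an illustration, the results listed above can be reproduced with an order-preserving set difference in Python. The name `difference` is hypothetical, classes are modelled as lists of strings, and a single character on the right-hand side is passed as a one-element list.

```python
def difference(left, right):
    """Return the elements of `left` that do not occur in `right`."""
    removed = set(right)
    # Keeping a list (not a set) preserves the order of the original class.
    return [element for element in left if element not in removed]


FS = ["a", "b"]
SR = ["b", "c"]
FSR = ["o", "p", "q"]

print(difference(FSR, ["o"]))       # FSR~o
print(difference(FSR, ["o", "p"]))  # FSR~{o, p}
print(difference(FS, SR))           # FS~SR
```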

Other operators

A detailed explanation of all of these will be provided in the near future.

The * operator computes the cartesian product of two classes.
The + operator concatenates classes element by element.
The | operator computes the union of two classes.
The & operator computes the intersection of two classes.
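Until that explanation is available, the following Python sketch shows one plausible reading of the four operators. Everything beyond the one-line descriptions above is an assumption: the function names are hypothetical, and details such as element order, duplicate handling, and what happens when classes of unequal length are concatenated may differ in SCA++.

```python
def cartesian_product(left, right):
    """Assumed reading of '*': every left element joined with every right element."""
    return [l + r for l in left for r in right]


def concatenate(left, right):
    """Assumed reading of '+': the first elements joined, the second joined, etc."""
    return [l + r for l, r in zip(left, right)]


def union(left, right):
    """Assumed reading of '|': all elements of both classes, without duplicates."""
    return left + [element for element in right if element not in left]


def intersection(left, right):
    """Assumed reading of '&': the elements that occur in both classes."""
    return [element for element in left if element in right]


FS = ["a", "b"]
SR = ["b", "c"]

print(cartesian_product(FS, SR))
print(concatenate(FS, SR))
print(union(FS, SR))
print(intersection(FS, SR))
```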

Grammar Specification

This section is intended as a formal specification of the syntax of SCA++. You probably want to skip it if you’re not a programmer.

Terminals are in all-caps and are not further elaborated on in here. See lib/parser.hh for a list of all tokens, which more or less correspond to the terminals.

<rule> ::= <substitution-rule>
         | <epenthesis-rule>
         | <deletion-rule>
         | <metathesis-rule>
         | <reduplication-rule>
         
<class-def>  ::= TEXT [ "=" ] <simple-el>

<substitution-rule>  ::= <input> SEPARATOR <output>    <context>
<epenthesis-rule>    ::=         SEPARATOR <output>    <context>
<deletion-rule>      ::= <input> SEPARATOR             <context>
<metathesis-rule>    ::= <input> SEPARATOR "&"         <context>
<reduplication-rule> ::= <input> SEPARATOR "+" { "+" } <context>

<input>      ::= <input-els>  { "," <input-els> }
<input-els>  ::= { <input-el> }+
<input-el>   ::= <simple-el> | "*"

<output>     ::= <output-els> { "," <output-els> }
<output-els> ::= { <output-el> }+
<output-el>  ::= <percent-alternatives> | <simple-el>
          
<context>    ::= SEPARATOR [ <ctx-els> ] { USCORE }+ [ <ctx-els> ] EOL
<ctx-els>    ::= <decorated-els> 

<decorated-els> ::= { <decorated-el> }+
<decorated-el>  ::= <simple-el> 
                  | <boundaries> 
                  |     "("  <decorated-els> ")"
                  | "~" "["  <decorated-els> "]"
<boundaries>    ::= { "#" | "$" }

<percent-alternatives> ::= PERCENTAGE <percent-class>           
<percent-class>        ::= "{" <percent-list> "}"
<percent-list>         ::= <percent-els> { "," <percent-els> }
<percent-els>          ::= [ PERCENTAGE ] { <percent-el> }+
<percent-el>           ::= ( TEXT | <percent-class> )  
                             
<simple-el>            ::= TEXT | <simple-el-class>
<simple-el-class>      ::= <simple-el-class-lit> { <set-op> <simple-el-rhs> }
<simple-el-rhs>        ::= <simple-el-class-lit> | TEXT
<simple-el-class-lit>  ::= CLASS-NAME | "{" <simple-el-list> "}"
<simple-el-list>       ::= <simple-els> { "," <simple-els> }
<simple-els>           ::= { <simple-el> }+
<set-op>               ::= "~" | "&" | "*" | "|"

For more information, see the YouTube channel Agma Schwa