<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd"[
<!ENTITY info-references SYSTEM "informative-references.xml">
<!ENTITY norm-references SYSTEM "normative-references.xml">
<!ENTITY grammar SYSTEM "sm-escaped.abnf">
<!ENTITY tcpSM SYSTEM "TCP.sm">
<!ENTITY eppSM SYSTEM "EPP.sm">
<!ENTITY dccpSM SYSTEM "DCCP.sm">
]>
<rfc ipr="full3978" docName="draft-bortzmeyer-language-state-machines-02-BETA" category="std">
<?rfc toc="yes"?>
<?rfc strict="yes"?> 
<front>
<title abbrev="Cosmogol">Cosmogol: a language to describe finite state machines</title>
<author fullname="Stephane Bortzmeyer" initials="S.B." surname="Bortzmeyer">
<organization>AFNIC</organization>
<address><postal><street>Immeuble International</street><code>78181</code><city>Saint-Quentin-en-Yvelines</city><country>France</country></postal> <phone>+33 1 39 30 83 46</phone><email>bortzmeyer+ietf@nic.fr</email><uri>http://www.afnic.fr/</uri></address>
</author>
<date month="November" year="2006"/>

<abstract><t>Several RFCs contain a state machine to describe a
protocol. There is no standard way of describing such a machine, the
most common way being an ASCII-art diagram. This document specifies
another solution: a domain-specific language for finite state
machines. It allows state machine descriptions to be automatically
checked and may be translated into other formats. Its purpose is to
provide a stable reference for RFCs which use this
mini-language.</t></abstract>

</front>

<middle>

<section title="Requirements notation">
            <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
            "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
            and "OPTIONAL" in this document are to be interpreted as
            described in <xref target="RFC2119"/>.</t>
</section>

<section title="Introduction" anchor="introduction">

<t>One can find finite state machines, for instance, in RFC 793 <xref
target="RFC0793"/> or RFC 4340 <xref target="RFC4340"/>. The Guide for
Internet Standards Writers <xref target="RFC2360"/>, in 2.12
"Notational conventions" and 3.3 "State machines description", lists
several ways to describe them but does not recommend one. Unlike
grammars, which are typically specified with <xref
target="RFC4234">ABNF</xref>, state machines have no standard
description language. RFCs typically use figures, lists of transitions, or tables.</t>

<t>Figures (whether in ASCII-art, in Unicode-art, in SVG, in GIF or
whatever) are:
<list style="symbols">
<t>impossible to analyze automatically (for instance to check if they
are deterministic),</t>
<t>not readable if the state machine is large.</t>
</list>
</t>

<t>Another issue, and one which has generated a lot of discussion, is the
"need" to allow something more than US-ASCII (and some people require
even more than raw text) in the RFCs.  A common "use case" is this need
to specify state machines through drawings. Drawings are not the only
way, and not even the best way, to do so; the choice made here is to
use an ASCII-based language, thus requiring no change in the format of
the RFCs.</t>

<t>Informal natural language text is not perfect either, because it is
impossible to analyze automatically (for instance to check whether the
state machine it describes is complete).</t>

<t>Tables are also a possible solution (if the machine is
finite). They are fine for automatic processing but very bad for
presentation to humans, especially if they are large. Most people find
them too low-level.</t>

<t>To conclude, let us note that RFC 4006 <xref target="RFC4006"/>
uses a list of tuples, each tuple being a transition. Although the
(informal) syntax it uses is not parsable by a program, the idea
behind it is close to the Cosmogol language.</t>

<t>Cosmogol does not intend to be a protocol specification language:
it focuses on the structural description of the state machine. Many
dynamic aspects of the protocol, such as timing, are therefore out of
scope for this specification.</t>

</section>

<section title="Terminology">

<t>TODO: because of the state of this document, some choices are not
final. Every time you see the word ALTERNATIVE in uppercase, it means
several possible choices are listed.</t>

<t>The terminology of state machines is not perfectly standard. We use
here the words:
<list style="symbols">
<t>state,</t>
<t>message, the condition of a transition,</t>
<t>action, performed after the transition.</t>
</list></t>

<t>TODO: timers. David MENTRE &lt;mentre@tcl.ite.mee.com&gt; suggests
making timers explicit by separating them from normal messages, with
the timeout value as a parameter. He quotes RFC 3261, p. 128. The
<xref target="introduction">introduction</xref> explains why Cosmogol
focuses on the structure of the state machine rather than on a complete
specification of the protocol.</t>

<t>The Cosmogol language contains declarations, assignments and transitions. A
declaration announces that a name will be used for either a message, a
state or an action. An assignment binds a value to a variable.</t>

<t>A transition is described by the name of the message, the names of
the current and next states and an optional action. Transitions are the
heart of the Cosmogol language: in Cosmogol, a state machine is a list of
transitions.</t>

<t>A processor is a program that processes Cosmogol files. It can be
validating or not. Any processor MUST check the syntax of the file. A
validating processor MUST perform the checks described in <xref
target="semantics"/>.</t>

<t>In addition to the checks, a processor MAY perform other tasks such
as translating to another format, for instance <xref
target="graphviz">Graphviz</xref>.</t>
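
<t>As an illustration only, the following Python sketch shows how such a
translation to Graphviz could look. It is not the code of any existing
processor: the function name to_dot, the graph name "machine" and the
representation of transitions as (state, message, next state) triples
are assumptions made for this example.</t>
<figure>
<artwork><![CDATA[
```python
def to_dot(transitions, name="machine"):
    """Render (state, message, next_state) triples as a Graphviz digraph.

    Hypothetical helper: each transition becomes one labelled edge.
    """
    lines = ["digraph %s {" % name]
    for state, message, next_state in transitions:
        lines.append('  "%s" -> "%s" [label="%s"];'
                     % (state, next_state, message))
    lines.append("}")
    return "\n".join(lines)


# One transition, "Waiting: timeout -> Start", turned into DOT text.
dot = to_dot([("Waiting", "timeout", "Start")])
print(dot)
```
]]></artwork>
</figure>
<t>The resulting text can then be given to a Graphviz layout program to
produce a drawing of the state machine.</t>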

<t>TODO: some way to modularize state machines? For instance, X.509
checking is described by several state machines.</t>

</section>

<section title="Grammar">

<t>Here is the grammar of Cosmogol, using <xref
target="RFC4234">ABNF</xref>. The core rules defined in appendix B of
<xref target="RFC4234">that RFC</xref> are assumed.</t>
<figure>
<artwork type="abnf">
&grammar;
</artwork>
</figure>

<!-- TODO: the current grammar authorizes empty machines. Should we
     assign them a meaning? -->

<t>TODO: Julian Reschke <eref
target="http://www1.ietf.org/mail-archive/web/cosmogol/current/msg00007.html">suggests
also having an alternative XML syntax</eref>, with the same
terminology and model.</t>
</section>

<section title="Semantics" anchor="semantics">
<t>A validating processor MUST perform all these checks.</t>

<t>Every message, state and action MUST be declared. The possible
values for the right side of a
declaration are:
<list style="symbols">
<t>MESSAGE</t>
<t>STATE</t>
<t>ACTION</t>
</list>
</t>

<!-- TODO: clearly says that messages and states are in different name spaces -->

<t>The order between statements (transitions, declarations and
assignments) has no meaning. For instance, the declaration of a
message can take place after its use in a transition.</t>

<t>All names are case-sensitive. ALTERNATIVE: make them
case-insensitive, which is possible since everything is in
US-ASCII.</t>

<t>TODO: should we document naming *conventions*, such as "States in
uppercase, messages capitalized"? Another good convention would be
one for timers (see RFC 3261).</t>
<!-- Mohsen Souissi said yes, as a recommendation only. -->

<t>Assignments are only possible to pre-defined variables. No
assignment is mandatory. The variables are:
<list style="symbols">
<t>Title (used for some displays)</t>
<t anchor="initial-state">Initial <!-- Mohsen Souissi prefers "Idle"
-->(to indicate the initial state; if this variable is assigned, every
state MUST be reachable, possibly indirectly, from the initial
state)</t>
<t>Final (to indicate the final state; if this variable is assigned,
this final state MUST be reachable, possibly indirectly, from every
state)</t>
<!-- Do note it is useless for RFC but common otherwise -->
</list>
</t>
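
<t>The two reachability rules attached to "Initial" and "Final" can be
sketched as follows in Python. This is only one possible way to
implement them, not the method of a particular processor. The function
names, the representation of transitions as (state, message, next
state) triples, the message names "open" and "close" and the state
"Orphan" are all assumptions made for this example.</t>
<figure>
<artwork><![CDATA[
```python
from collections import deque


def reachable(start, edges):
    """Return the set of states reachable from start (start included)."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in edges.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def unreachable_from_initial(initial, states, transitions):
    """States that cannot be reached from the initial state."""
    edges = {}
    for src, _message, dst in transitions:
        edges.setdefault(src, set()).add(dst)
    return set(states) - reachable(initial, edges)


def cannot_reach_final(final, states, transitions):
    """States from which the final state cannot be reached."""
    reverse = {}  # same graph with every edge reversed
    for src, _message, dst in transitions:
        reverse.setdefault(dst, set()).add(src)
    return set(states) - reachable(final, reverse)


# Hypothetical machine: "Orphan" has no incoming transition.
states = {"Start", "Waiting", "End", "Orphan"}
transitions = [
    ("Start", "open", "Waiting"),
    ("Waiting", "close", "End"),
    ("Orphan", "close", "End"),
]
bad_initial = unreachable_from_initial("Start", states, transitions)
bad_final = cannot_reach_final("End", states, transitions)
```
]]></artwork>
</figure>
<t>In this sketch a validating processor would report an error whenever
one of the two returned sets is not empty. Searching the reversed graph
from the final state avoids running one search per state.</t>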

<t>ALTERNATIVE: allow variables that are not pre-defined? Or force them
to be prefixed by "x-"? Or allow a "simpler than RFC" way to add more
pre-defined variables? An IANA registry? Examples of variables which
may be useful soon: revision number, date, author and other typical
meta-data.</t>

<t>When several current states are indicated, they must be
interpreted as a set: for every member of the set, the message leads
to the next state. The same applies when there are several messages. This
allows some grouping of similar transitions. So, the
following state machine:
<figure>
<artwork>
Waiting, End: timeout, user-cancel, atomic-war -> Start;
</artwork>
</figure>
is to be interpreted as completely equivalent to:
<figure>
<artwork>
Waiting: timeout -> Start;
End: timeout -> Start;
Waiting: user-cancel -> Start;
End: user-cancel -> Start;
Waiting: atomic-war -> Start;
End: atomic-war -> Start;
</artwork>
</figure></t>
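
<t>The expansion rule above can be sketched in a few lines of
Python. This is a minimal illustration, not a Cosmogol parser: it only
handles the transition form shown in the example (no action), it
assumes that names never contain the characters ",", ":" or the
sequence "->", and the function name expand is hypothetical.</t>
<figure>
<artwork><![CDATA[
```python
from itertools import product


def expand(transition):
    """Expand one grouped transition into (state, message, next) triples."""
    body = transition.strip().rstrip(";")
    left, next_state = body.split("->")
    states, messages = left.split(":")
    state_names = [s.strip() for s in states.split(",")]
    message_names = [m.strip() for m in messages.split(",")]
    # Messages vary slowest, which reproduces the order of the
    # expanded listing given in the text.
    return [(s, m, next_state.strip())
            for m, s in product(message_names, state_names)]


triples = expand(
    "Waiting, End: timeout, user-cancel, atomic-war -> Start;")
for state, message, next_state in triples:
    print("%s: %s -> %s;" % (state, message, next_state))
```
]]></artwork>
</figure>
<t>Since the order between statements has no meaning, the order of the
produced triples is only a matter of presentation.</t>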

<t>The state machine MUST be deterministic, that is, for every pair
(current state, message), there must be no more than one output (next
state and optional action).</t>
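
<t>A determinism check can be sketched as follows in Python, assuming
that grouped transitions have already been expanded into (state,
message, next state, action) tuples, with None standing for a missing
action. The function name find_conflicts and this representation are
assumptions made for the example.</t>
<figure>
<artwork><![CDATA[
```python
def find_conflicts(transitions):
    """Return the (state, message) keys that have several outputs."""
    outputs = {}
    for state, message, next_state, action in transitions:
        key = (state, message)
        outputs.setdefault(key, set()).add((next_state, action))
    return sorted(key for key, outs in outputs.items() if len(outs) > 1)


machine = [
    ("Waiting", "timeout", "Start", None),
    ("Waiting", "timeout", "End", None),  # conflicts with the line above
    ("End", "timeout", "Start", None),
]
conflicts = find_conflicts(machine)
```
]]></artwork>
</figure>
<t>In this sketch, a transition repeated identically is not counted as a
conflict, because the outputs are collected in a set. Whether a
processor should also reject such duplicates is a separate choice.</t>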

<t>Besides the "Initial" variable mentioned <xref
target="initial-state">above</xref>, a processor may provide a means for
the user to declare (possibly on the command line) a state as the start
of the machine. The processor may then check that every other state is
reachable from this state, as if it were declared as "Initial". The same
applies to the "Final" state.</t>

<t>A processor may provide a flag to require that the state machine is
complete, that is, that every transition is explicitly listed.</t>
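
<t>One possible reading of completeness is that every combination of a
declared state and a declared message has a listed transition. Under
that assumption, which this document does not settle, the check could
be sketched as follows in Python. The function name
missing_transitions and the triple representation are hypothetical.</t>
<figure>
<artwork><![CDATA[
```python
def missing_transitions(states, messages, transitions):
    """Return the (state, message) combinations with no transition."""
    listed = {(state, message) for state, message, _next in transitions}
    return sorted((s, m) for s in states for m in messages
                  if (s, m) not in listed)


states = ["Waiting", "End"]
messages = ["timeout", "user-cancel"]
transitions = [
    ("Waiting", "timeout", "End"),
    ("End", "timeout", "Waiting"),
    ("Waiting", "user-cancel", "End"),
]
missing = missing_transitions(states, messages, transitions)
```
]]></artwork>
</figure>
<t>Here the only missing combination is the message "user-cancel"
received in the state "End", which is what a processor run with such a
flag would report.</t>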

</section>

<section title="Internationalisation considerations">
<t>The character set of the language is US-ASCII only, for conformance
with <xref target="RFC2026"/>, section 2.1. This reflects the fact
that RFCs must be written in English (TODO: something which does not
seem to be documented anywhere). ALTERNATIVE: Julian Reschke would <eref
target="http://www1.ietf.org/mail-archive/web/cosmogol/current/msg00007.html">prefer
UTF-8</eref>, partly because the IETF may lift the current restrictions
at some point in time.</t>
</section>

<section title="IANA Considerations">
<t>None. TODO: Julian Reschke <eref target="http://www1.ietf.org/mail-archive/web/cosmogol/current/msg00007.html">suggests registering a MIME type</eref></t>
</section>

<section title="Security Considerations">
<t>Implementors of state machines are warned to pay attention
to the default case, the one for which there is no explicitly
listed transition.</t>
<t>ALTERNATIVE: force every transition to be declared. This is
believed to be too demanding for large state machines.</t>
</section>

</middle>

<back>
<references title='Normative References'>
&norm-references;
</references>
<references title='Informative References'>
&info-references;
<reference anchor="graphviz" target="http://www.graphviz.org/">
<front>
<title abbrev='Graphviz'>Graphviz, Graph Visualization
Software</title>
<author><organization>AT&amp;T Research</organization></author>
<date month="December" year="2004"/> <!-- Actually, it is the date it
was released under a free software licence -->
</front>
</reference>
<reference anchor="smc" target="http://smc.sourceforge.net/">
<front>
<title abbrev="SMC">The State Machine Compiler</title>
<author surname="Rapp" initials="C.R." fullname="Charles W. Rapp"><organization>Rapp</organization><address><email>rapp@acm.org</email></address></author>
<date month="January" year="2000"/>
</front>
</reference>
<reference anchor="ragel" target="http://www.cs.queensu.ca/home/thurston/ragel/">
<front>
<title abbrev="Ragel">Ragel State Machine Compiler</title>
<author surname="Thurston" initials="A.T." fullname="Adrian D. Thurston">
<organization>Queen's University</organization>
<address><email>thurston@cs.queensu.ca</email><uri>http://www.cs.queensu.ca/home/thurston/</uri></address>
</author>
<date year="2006" month="August"/>
</front>
</reference>

<reference anchor="fsmlang"
	   target="http://fsmlang.sourceforge.net/">
<front>
<title abbrev="FSMLang">FSMLang</title>
<author><organization></organization><address><email>ringwinner@sourceforge.net</email></address></author>
<date month="September" year="2006"/> <!-- Dummy data -->
</front>
</reference>

<reference anchor="graph-easy"
	   target="http://search.cpan.org/~tels/Graph-Easy/">
<front>
<title abbrev="Graph::Easy">Graph::Easy</title>
<author surname="Tels"><organization></organization><address><email>nospam-abuse@bloodgate.com</email></address></author>
<date month="March" year="2006"/> 
</front>
</reference>
</references>

<section title="Examples">

<t>The TCP state machine, from RFC 793 <xref target="RFC0793"/>.</t>
<figure>
<artwork>
&tcpSM;
</artwork>
</figure>

<t>The EPP state machine, from RFC 3730 <xref target="RFC3730"/>.</t>
<figure>
<artwork>
&eppSM;
</artwork>
</figure>

<t>The DCCP state machine, from RFC 4340 <xref target="RFC4340"/>.</t>
<figure>
<artwork>
&dccpSM;
</artwork>
</figure>

</section>

<section title="First implementation">
<t>The first implementation of the Cosmogol language can be found at
<eref target="http://www.cosmogol.fr/"/>. It is a processor which is
able to check state machines specified in Cosmogol and to translate
them into Graphviz.</t>
</section>

<section title="Related work">
<t>All of the following are interesting back-ends for a Cosmogol processor:
<list style="symbols">
<t>Graphviz <xref target="graphviz"/> is a widely-used language to
describe graphs. It has been <eref
target="http://www.linux.com/article.pl?sid=05/11/08/2018216">used for
state machines such as TCP</eref>. But it is more presentation-oriented:
a Graphviz file cannot be restricted to just the description of the
machine. Consequently, there are currently no tools to check properties
such as determinism.</t>
<t>The Perl module Graph::Easy <xref target="graph-easy"/> shares most
of the aims of Graphviz. It is also oriented towards presentation.</t>
<t>SMC <xref target="smc"/>, Ragel <xref target="ragel"/> and FSMLang
<xref target="fsmlang"/> are more oriented towards
code generation.</t>
</list>
</t>
</section>

<section title="Changes">
<section title="Changes from -01">
<t><list style="symbols">
<t>TODO</t></list></t>
</section>
<section title="Changes from -00">
<t><list style="symbols">
<t>The syntax of a transition is different: the current-state is now
the first item, and not the message. There was a clear consensus among
the reviewers on this change.</t>
<t>Several messages are now allowed in a transition, to indicate a set
of messages. Same thing for the current state.</t>
<t>Several bug fixes in the grammar.</t>
</list></t>
</section>
</section>

<section title="Acknowledgements">
<t>Significant contributions have been made by Pierre Beyssac,
Emmanuel Chantreau, Frank Ellermann, Kim Minh Kaplan, Thomas Quinot,  Bertrand Petit,
Phil Regnauld, Mohsen Souissi and Olivier Ricou.
</t>
</section>

</back>

</rfc>