CIDM Best Practices 2017 | September 11-13, Burlington, VT

Stilo is pleased to support the CIDM Best Practices conference - the premier annual conference for managers of information development, whether for user and product support or for training. Discover how organizations are pursuing dynamic publishing, content management, social media, enterprise-wide information creation through wikis and blogs, and much more. Now in its 19th year, the Best Practices conference will be held in Burlington, Vermont.

Visit the conference website to find out more and to register!  (Register on or before July 4th and save $170!)

CMS/DITA Europe 2017 | October 30-31, Berlin, Germany

Stilo is pleased to once again be one of the sponsors at this year’s Content Management/DITA Europe conference, being held for the first time in Berlin, Germany, from October 30-31.

Join colleagues from around the world for two days of career-empowering knowledge, practices, networking, and practical solutions. With insights from over 30 speakers, an exhibit hall packed with the best content management solutions, and the very best in industry networking, CMS/DITA Europe is designed with your specific needs in mind—providing strategies you can use immediately as you create and manage technical content, and expanding your professional network with information development experts from around the world.

Whether you’re a Novice or a Master, you’ll find sessions that provide inspiration and practical skills, covering topics such as:

  • Implementing a component Content Management System
  • Adopting DITA, Markdown, or other XML structures
  • Effectively reusing content
  • Creating corporate taxonomies
  • Incorporating social media strategies
  • Managing teams
  • Collaborating across the enterprise

Those new to content management and/or DITA will find guidance for starting their journey, while for experts, our program offers ways to continue pushing the boundaries. Join us for a chance to share what you already know and find out about things you don’t!

Stop by our booth in the expo hall and ask us for a demo of Migrate, our cloud XML conversion service that enables technical authoring teams to convert content from source formats including FrameMaker and Word to XML DITA. Or ask for a demo of AuthorBridge, our web-based XML editor that provides subject matter experts with a Guided & Fluid authoring experience without requiring any knowledge of XML or DITA.

We're very pleased to announce that Stilo's Patrick Baker, VP Development & Professional Services, has been selected to make the following presentations at CMS/DITA Europe 2017 - be sure to add them to your schedule!

Monday, October 30, 2017 | 10:40 - 11:20am
DITA: Start small, grow big using open source tools

Abstract | You’re considering using DITA and would like to try it out without incurring significant upfront costs, while also keeping your options open longer-term. Where do you start? How will you approach the challenges of content creation, content management, and publishing your content? There are in fact plenty of options. The good news is that XML and DITA are open standards. This has led to a healthy ecosystem of quality commercial and interoperable open source tools that do away with vendor lock-in and keep operating costs down. We will discuss the three challenges and show an example of how an end-to-end solution can be built on Git and other open source tools. In fact, the result may be better than you’d expect.

Monday, October 30, 2017 | 4:20 - 5:00pm | Technology Test Kitchen
Recipe for success: Guiding SMEs to make contributions in DITA - Just add AuthorBridge

Abstract | Join our Technology Test Kitchen to try out AuthorBridge, Stilo’s web based XML editor that enables SMEs to easily create structured content without requiring any knowledge of DITA or its complexities. Its unique architecture provides a Guided + Fluid authoring experience that sets it apart from other XML editors.

Experience level:

  • No knowledge of DITA or XML is required

What you’ll need to bring:

  • All you need to bring is your laptop! We will provide sample content
  • If you prefer, you can bring your own DITA content to work with

What you’ll take away:

  • The knowledge of how to edit and review existing DITA topics and to quickly create new ones in a collaborative environment between SMEs and technical authors
  • A free 30-day trial of AuthorBridge, which you can continue to use following the conference


Find out more and register for CMS/DITA Europe 2017

Exercise of share options

8 June 2017

Issue of new shares / Exercise of Share Options

The Board of Stilo International plc (“Stilo” or the “Company”) (LSE:STL), the AIM quoted software and cloud services company, announces that it has today issued 75,000 new ordinary shares of 1p each in the Company (“Ordinary Shares”) following notification of the exercise of share options by an employee. The exercise price of the new shares is 1.5 pence per share.

Application has been made for the 75,000 Ordinary Shares to be admitted to trading on AIM and it is expected that admission will take place on 14th June 2017.

The Ordinary Shares will rank pari passu with the existing shares of the Company.  Following allotment of the Ordinary Shares, the total issued share capital of the Company will be 113,843,470 ordinary shares.

For the purposes of the Financial Conduct Authority’s Disclosure and Transparency Rules (“DTRs”), the issued ordinary share capital of Stilo following this allotment will consist of 113,843,470 ordinary shares with voting rights attached (one vote per share). There are no shares held in treasury.  This total voting rights figure may be used by shareholders as the denominator for the calculation by which they will determine whether they are required to notify their interest in, or a change to their interest in, Stilo under the DTRs.


Stilo International plc
Les Burnham, Chief Executive T +44 1793 441 444
Liam O’Donoghue, Company Secretary T +44 20 7583 8304

SPARK Advisory Partners Limited (Nominated Adviser)
Neil Baldwin T +44 203 368 3554
Mark Brady  T +44 203 368 3551

SI Capital (Broker)
Andy Thacker
Nick Emerson
T +44 1483 413500

Result of AGM 18 May 2017

18 May 2017

Stilo International plc (AIM:STL), the AIM quoted software and cloud services company, announces that all resolutions proposed at its AGM held earlier today were duly passed without amendment.



Chairman’s AGM statement 18 May 2017

18 May 2017

Stilo International plc (“Stilo” or the “Company”) (AIM:STL) is holding its Annual General Meeting later today. The Company provides software tools and cloud services that help organisations create and process content in XML format, so that it can be more easily stored, managed, re-used, translated and published to multiple print and digital channels.

At the meeting Chairman David Ashman will make the following statement:

“Following the launch of AuthorBridge v2.0 in February 2017, we have been receiving very encouraging feedback from trial users. However, there are still some important aspects of development that need to be undertaken over the coming months and this continues to be a high priority activity for the Company.

As a consequence, AuthorBridge is not expected to contribute significantly to sales revenues in 2017. Rather, our goal in 2017 is to implement AuthorBridge at a number of key customer sites and in so doing, provide a solid foundation upon which we can build future business.

Otherwise, the market for Migrate DITA conversion services and OmniMark software remains steady, and overall Company trading is in line with management expectations.

The Company remains un-geared, and cash balances at 30 April 2017 stood at £1,560,000 (31 December 2016: £1,466,000). Current levels of cash will serve to fund additional development, sales and marketing efforts as we look to grow our portfolio of solutions and enter new market sectors. It will also be used to assist with potential acquisitions, whilst providing an appropriate financial reserve for the business. Ongoing, it is the Board’s intention to maintain a progressive dividend policy with scope for special one-off dividends as may be deemed appropriate from time to time.

Subject to approval by shareholders at the meeting, a final dividend for the year ended 31 December 2016 of 0.05 pence per Ordinary Share will be paid on 23 May 2017 to those shareholders on the register at 21 April 2017.”



OmniMark design principles

  • The streaming paradigm
  • Rules-based programming
  • Hierarchical markup parsing model
  • Powerful pattern matching
  • Referents

The streaming paradigm

The streaming paradigm is an approach to programming that concentrates on describing the process to be applied to a piece of data, and on processing data directly as it streams from one location to another. In the streaming model, the use of data structures to model input data is eliminated, and the use of data structures to model output is greatly reduced. This keeps to a minimum the use of system resources when processing large volumes of data. As a side-effect, because the processing requirements are consistent, system performance on larger data sets can be predicted with a great deal of accuracy. A program will run with equal success on a 2 kilobyte file or a 2 gigabyte file.
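The approach can be illustrated outside OmniMark as well. Here is a minimal Python sketch of the same idea (illustrative only, not OmniMark code): data is transformed as it flows past, so memory use stays constant whether the input is 2 kilobytes or 2 gigabytes.

```python
def upcase_words(chunks):
    """Process a stream of text chunks one at a time, never holding
    the whole input in memory."""
    for chunk in chunks:
        yield chunk.upper()

# The same code works on a short list or on a file iterated lazily.
result = "".join(upcase_words(["hello ", "world"]))
```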

OmniMark has an abstracted streaming model which allows a stream to be attached to different sources of input and output — files, databases and messages — with a minimum of effort. This abstraction also allows code processing the content itself to be dissociated completely from the problems of managing or even knowing about details of the input or output type, with obvious productivity and code simplification benefits.

An application may have multiple input streams open to permit data integration. Multiple output streams may also be used to feed different targets. For instance a complex application may be taking a stream fed from a file, integrating that input with a stream fed by accessing a database and outputting the data to multiple systems (potentially in different formats).

Rules-based programming

OmniMark incorporates a declarative scripting language: an application is constructed of rules for dealing with events, which are triggered by the recognition of patterns in the data coming into the program from a stream. In dealing with content, the individual pieces of content are well known; the order in which they occur is not. This arbitrariness is one of content’s basic properties, and rules are the best mechanism for dealing with it. OmniMark’s rules may be triggered by data events generated by two types of built-in processors: the pattern processor and the markup processor. The markup processor is tightly coupled with markup parsing.

The two processors may be used in conjunction on a single piece of content to create powerful hybrid applications, where XML is being processed and complex pattern matching is applied to the content within the markup (the text in the elements). The pattern processor may also be used ahead of the markup processor to prepare content for parsing, converting non-XML into XML for instance. The output stream of the pattern processor is fed in as the input stream of the markup processor.

All of these features are implemented in an elegant framework, resulting in applications built from well-delineated functional code blocks, in terms of both readability and actual functionality, and thus in applications that are easy to maintain.
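As a rough analogy (a hypothetical Python sketch, not OmniMark), a rules-based processor can be thought of as a table of pattern/action pairs: whichever rule matches the incoming data fires, regardless of the order in which content arrives.

```python
import re

# Each rule pairs a pattern with an action; the first rule whose
# pattern matches at the current position fires.
rules = [
    (re.compile(r"\d+"),    lambda m: "NUM(%s)" % m.group()),
    (re.compile(r"[a-z]+"), lambda m: "WORD(%s)" % m.group()),
]

def process(stream):
    out, pos = [], 0
    while pos < len(stream):
        for pattern, action in rules:
            m = pattern.match(stream, pos)
            if m:
                out.append(action(m))
                pos = m.end()
                break
        else:
            pos += 1  # no rule matched this character: skip it
    return "".join(out)
```

Because each rule is self-contained, new kinds of content can be handled by adding a rule rather than restructuring control flow.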

Hierarchical markup parsing model

Many people concerned with XML will be familiar with, or will at least have heard of, SAX and DOM models for processing. SAX is an event-based model and DOM is tree-based. OmniMark employs a third model – hierarchical. Like SAX, OmniMark leverages an event-based model, but where SAX would generate three events for an element (the start, the content and the end) OmniMark generates only one, treating the occurrence of the whole element as a single event to activate a rule. Since elements can be nested, a hierarchy of activated rules will be created, modeling the structure of the content. This simple model is easy to understand and the resulting process flow is clear, concise and thus easy to maintain.
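The difference can be illustrated with Python's standard library (an analogy only, not OmniMark): `iterparse`'s "end" events fire once per complete element, approximating a one-event-per-element model, since an element's content is fully available when its event fires.

```python
import xml.etree.ElementTree as ET
from io import BytesIO

doc = b"<doc><title>Hi</title><p>Body</p></doc>"
events = []
# "end" fires once per element, after its content is complete:
# one event per whole element rather than separate start/content/end.
for _, elem in ET.iterparse(BytesIO(doc), events=("end",)):
    events.append((elem.tag, elem.text))
```

Note that nested elements complete before their parents, so the events arrive innermost-first, mirroring the hierarchy of the content.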

The event-based parsing model fits neatly with the streaming approach to processing content, with the markup processor receiving a stream of data and triggering events as the elements are found, without needing to buffer or decompose the input. Therefore this model supports the design considerations for OmniMark of remaining scalable and performant when processing massive data sets or receiving high volumes of content.

In conjunction with the triggering of the rules, the markup processor maintains the current element context for the set of elements being processed at any instant. This allows the application to query and make decisions based on data about that context including the attribute values associated with the elements being processed and their parents. This mechanism has been shown to handle the vast majority of content processing requirements. However, by using other features of OmniMark this may be augmented should it be necessary – for instance a tree of all elements accessed may be constructed in the application for later manipulation.

Powerful pattern matching

The OmniMark pattern processor implements a pattern matching language that is both powerful and easy to use. Based upon an optimized regular expression mechanism, it has many other features, including:

  • Maintaining context. The same pattern may have different meanings in different contexts. Therefore context needs to be maintained to allow different rules to fire in different situations.
  • The ability to lookahead for patterns without actually processing the values. This allows program flow to be changed, before the pattern is reached, to allow the pattern to be processed in the right context.
  • Complex pattern matching procedures (i.e., independently-called functions). This allows sophisticated pattern matching to be encapsulated and reused.
  • Nested pattern matching (matching a pattern within a pattern).

The pattern processor activates the associated rule when a defined pattern has been matched. These features are encapsulated in a language that is very English-like, making application functionality clear and easy to comprehend, which simplifies both development and maintenance.


Referents

Often the order in which content is received as input is not the order in which it is required for output. OmniMark’s patented referent mechanism allows a placeholder to be inserted in the output stream and its value supplied later when it is available. The streaming mechanism handles the buffering of output containing unresolved placeholders. The whole referent mechanism may be scoped and nested so that buffering is kept to a minimum. Code that is processing content needs no knowledge of the mechanism; a referent is just like any other target. The major benefit of this mechanism is that it maintains the efficiency of the streaming model while enabling powerful re-ordering functionality that would otherwise be severely constrained. Referents are a key innovation within the OmniMark language and it is one reason why OmniMark is so successful at blending power and performance.
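A toy Python sketch of the placeholder idea (illustrative only; OmniMark's actual referent mechanism is streaming and scoped, which this sketch is not):

```python
class Referents:
    """Emit output containing named placeholders and supply their
    values later, once they become available."""
    def __init__(self):
        self.parts, self.values = [], {}
    def write(self, text):
        self.parts.append(("text", text))
    def referent(self, name):
        self.parts.append(("ref", name))   # value not yet known
    def resolve(self, name, value):
        self.values[name] = value
    def result(self):
        return "".join(t if kind == "text" else self.values[t]
                       for kind, t in self.parts)

out = Referents()
out.write("Total: ")
out.referent("total")       # placeholder inserted in output order
out.write(" items")
out.resolve("total", "42")  # value supplied later
```

The code producing the surrounding output never needs to know whether "total" was available at the time it wrote; only unresolved spans need buffering.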

Breaking lines on your own with OmniMark

By Jacques Légaré, Senior Software Developer and Mario Blažević, Senior Software Developer
1. Motivation

OmniMark has had line-breaking functionality built-in ever since the XTRAN days. This functionality can be used to provide rudimentary text formatting capabilities. The language-level support for line-breaking is described quite thoroughly in the language documentation.

OmniMark’s language-level line-breaking support is very simple to use, and aptly supports the use-case where all the output of a program needs to be similarly formatted. Where the performance is less stellar, however, is when line-breaking needs to be activated and deactivated on a fine-grained level. The reason for this is simple: when line-breaking is disabled (say, using the h modifier), OmniMark cannot predict when it might be reactivated. As a result, it still needs to compute possible line-breaking points, just in case. As efficient as OmniMark might be, this can cause a significant reduction in performance, sometimes by as much as 15%.

As of version 8.1.0, the OmniMark compiler can detect some cases where line-breaking functionality is not being used, and thereby optimize the resulting compiled program to bypass line-breaking computations. However, the compiler cannot make this determination in general: it is an undecidable problem. For instance, consider the following somewhat contrived example:

replacement-break " " "%n"

process
   do sgml-parse document scan #main-input
      output "%c"
   done

element #implied
   output "%c"

element "b"
   local stream s
   set s with (break-width 72) to "%c"
   output s

Note that line-breaking is only activated in the element rule for b, and so line-breaking will only be activated if the input file contains an element b. The OmniMark compiler cannot be expected to predict what the input files might contain when the program is executed!

Another issue with OmniMark’s built-in line-breaking is that it does not play well with referents. Specifically, consider the following program:

replacement-break " " "%n"

process
   local stream s
   open s as buffer with break-width 32 to 32
   using output as s
      do xml-parse scan #main-input
         output "%c"
      done
   close s

element #implied
   output "%c" || "." ||* 64

This program puts a hard limit of 32 characters on the maximum length of lines output to s. When this program is executed, a run-time error is triggered in the body of the element rule, where we attempt to output 64 periods. On the other hand, consider the following similar program:

replacement-break " " "%n"

process
   local stream s
   open s as buffer with (referents-allowed & break-width 32 to 32)
   using output as s
      do xml-parse scan #main-input
         output "%c"
      done
   close s
   set referent "a" to "." ||* 64
   output s

element #implied
   output "%c" || referent "a"

This program accomplishes virtually the same task, but instead uses a referent to output the string of periods. In this case, no run-time error is triggered: the line-breaking constraints have been silently violated.

Because of these issues, it is better to use OmniMark’s built-in line-breaking only when necessary, and otherwise to implement line-breaking using other language constructs.

The remainder of this article discusses how to simulate line-breaking on PCDATA using string sink functions.

2. String sink functions

A string sink function is a function that can be used as the destination for strings. In a very real sense, a string sink function is the complement of a string source function, which is used as the source of strings. While a string source function outputs its strings to #current-output, a string sink function reads its strings from #current-input.

A string sink function is defined much like any other function in OmniMark, the only difference being that the return type is string sink. For example:

define string sink function dev-null as void #current-input

This is a poor man’s #suppress, soaking up anything written to it.

A string sink function can have any of the properties normally used to define functions in OmniMark: for example, it can be overloaded, dynamic, etc. The argument list of a string sink function is unrestricted. However, in the body of a string sink function, #current-output is unattached. The form of OmniMark’s pattern matching and markup parsing capabilities makes string sink functions particularly convenient for writing filters: taking their #current-input, processing it in some fashion, and writing the result out to some destination. Since #current-output is unattached inside the function, we need to pass the destination as an argument. For this, we use a value string sink argument. For example, a string sink function that indents its input by a given amount might be written:

define string sink function
   indent (value integer     i,
           value string sink s)
as
   using output as s
      output " " ||* i
   repeat scan #current-input
   match "%n"
      output "%n" || " " ||* i
   match any-text* => t
      output t

The function indent could then be used like any other string sink:

; ...
using output as indent (5, #current-output)
   do sgml-parse document scan #current-input
      output "%c"
   done

(The ability to pass #current-output as a value string sink argument is new in OmniMark 8.1.0.)

You can find out more about string sink functions in the language documentation.
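For readers more familiar with mainstream languages, a rough Python analogue of a string sink (illustrative only, not OmniMark): an object that receives text through write() and forwards a filtered copy to a destination, here mimicking the indenting filter.

```python
import io

class IndentSink:
    """Receives text via write() and forwards an indented copy
    to a destination sink (any object with a write() method)."""
    def __init__(self, width, destination):
        self.pad = " " * width
        self.dest = destination
        self.at_line_start = True
    def write(self, text):
        for ch in text:
            if self.at_line_start:
                self.dest.write(self.pad)   # indent each new line
                self.at_line_start = False
            self.dest.write(ch)
            if ch == "\n":
                self.at_line_start = True

buf = io.StringIO()
IndentSink(5, buf).write("a\nb\n")
```

As in OmniMark, the destination is passed in as an argument, so the filter itself neither knows nor cares where its output finally lands.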

3.  Line-breaking in OmniMark

We can use a pair of string sink functions to simulate to some extent OmniMark’s built-in line-breaking functionality. The benefit of this approach is that it impacts the program’s performance only where it is used.

3.1. Simulating insertion-break

To simulate the effect of insertion-break on PCDATA we need to scan the input and grab as many characters as we can up to a specified width. If we encounter a newline in the process, we stop scanning. Otherwise, we output the characters we found, and append a line-breaking sequence provided by the user.

define string sink function
 insertion-break       value string      insertion
 width value integer     target-width
 into value string sink destination

We can start by sanitizing our arguments:

assert insertion matches any ** "%n"
   message "The insertion string %"" || insertion
        || "%" does not contain a newline character %"%%n%"."

This assertion is not strictly necessary. However, OmniMark insists that the line-breaking sequence contain a line-end character, and so we do the same.

We can grab a sufficient number of characters from #current-input by using OmniMark’s counted occurrence pattern operator:

using output as destination
repeat scan #current-input
match any-text{1 to target-width} => l (lookahead "%n" => n)?
   output l

The use of lookahead at the end of the pattern allows us to verify if a %n is upcoming: we should only output the line-breaking sequence if the characters we grabbed are not followed by a %n.

   output insertion unless n is specified
match "%n"
   output "%n"

We can then use this to break the text output from the markup parser: for example,

using output as insertion-break "%n" width 20 into #current-output
   do sgml-parse document scan #main-input
      output "%c"
   done
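The logic can be mirrored in Python (a simplified sketch; its behaviour at the very end of input is not guaranteed to match the OmniMark version exactly):

```python
def insertion_break(text, insertion="\n", width=20):
    """Insert a line-breaking sequence after every `width` characters,
    leaving existing newlines untouched."""
    out = []
    for line in text.split("\n"):
        # break each line into runs of at most `width` characters
        chunks = [line[i:i + width] for i in range(0, len(line), width)] or [""]
        out.append(insertion.join(chunks))
    return "\n".join(out)
```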

3.2. Simulating replacement-break

Simulating insertion-break on PCDATA is straightforward, because it can insert a line-breaking sequence wherever it sees fit. On the other hand, replacement-break is slightly more complex, since it must scan its input for a breakable point. For clarity, the characters between two breakable points will be referred to as words; if the breakable points are defined by the space character, they are effectively words.

define string sink function
   replacement-break value string      replacement
   width             value integer     target-width
   to                value integer     max-width    optional
   at                value string      original     optional initial { " " }
   into              value string sink destination

The argument original is used to specify the character that delimits words; the argument is optional, as a space character seems like a reasonable default. target-width specifies the desired width of the line. max-width, if specified, gives the absolute maximum acceptable line width; if a line cannot be broken within this margin, an error is thrown. Finally, the argument replacement gives the line-breaking sequence.

As before, we start by ensuring our arguments have reasonable values:

assert length of original = 1
   message "Expecting a single character string,"
        || " but received %"" || original || "%"."

assert replacement matches any ** "%n"
   message "The replacement string %"" || replacement
        || "%" does not contain a newline character %"%%n%"."

The second assertion is repeated from above, for the same reasons as earlier: OmniMark insists that the replacement string contain a newline, and so we will do the same. The first assertion insists that breakable points be defined by a single character; again, this is a carry-over from OmniMark’s implementation.

For replacement-break, the pattern is very different from that of insertion-break: in that case, we could consume everything with a single pattern, using a counted occurrence. This does not suffice with replacement-break: rather, we have to consume words until we reach target-width.

using output as destination
   local stream line initial { "" }

The stream line will be used to accumulate text from one iteration to another.

repeat scan #current-input
match ((original => replaceable)? any-text
       ** lookahead (original | "%n" | value-end)) => t

The pattern in the match clause picks up individual words. If the line length is still below target-width, we can simply append the word to the current line and continue with the next iteration:

do when length of line + length of t < target-width
 set line with append to t

If this is not the case, we can output the text we have accumulated thus far, so long as it does not surpass max-width:

else when max-width isnt specified
        | length of line < max-width
   output line
   output replacement
   set line to t drop original?
      when replaceable is specified

If all else fails, we could not find an acceptable breakable point in the line: OmniMark throws an error in this case, so we will do the same.

else
   not-reached message "Exceeded maximum line width"
        || " of %d(max-width) characters.%n"
        || "The line is %"" || line || "%".%n"
done

Our string sink function needs a few more lines to be complete. For one, our previous pattern does not consume any %n that it might encounter. In this case, we should flush the accumulated text and append a %n:

match "%n"
   output line || "%n"
   set line to ""

Lastly, when the repeat scan loop finishes, there may be some text left over in line, which needs to be emitted:

output line

Just as was the case previously in Section 3.1, “Simulating insertion-break”, we can use our function to break text output from the markup parser: for example,

using output as replacement-break "%n" width 10 to 15 into #main-output
   do sgml-parse document scan #main-input
      output "%c"
   done

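For comparison, here is the same greedy word-wrapping logic in Python (a simplified sketch that omits the max-width error handling described above):

```python
def replacement_break(text, replacement="\n", width=10, sep=" "):
    """Accumulate words until the target width would be exceeded,
    then replace the separator with the break sequence."""
    broken = []
    for raw in text.split("\n"):       # existing newlines are kept
        lines, line = [], ""
        for word in raw.split(sep):
            candidate = word if not line else line + sep + word
            if len(candidate) < width or not line:
                # still below the target width (or a single over-long
                # word, which this sketch accepts rather than erroring)
                line = candidate
            else:
                lines.append(line)     # flush and start a new line
                line = word
        lines.append(line)
        broken.append(replacement.join(lines))
    return "\n".join(broken)
```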
4.  Going further

We demonstrated in Section 1, “Motivation” that referents and line-breaking did not play well together: in fact, a referent could be used to silently violate the constraints stated by a break-width declaration. In the case of our string sink simulations, referents are a non-issue: a referent cannot be written to an internal string sink function, which effectively closes the loophole.

OmniMark’s built-in line-breaking functionality can be manipulated using the special sequences %[ and %]: by embedding one of these in a string that is output to a stream, we can activate or deactivate (respectively) line-breaking. The easiest way of achieving this effect with our string sink functions would be to add a read-only switch argument called, say, enabled, viz.

define string sink function
   insertion-break value string      insertion
   width           value integer     target-width
   enabled         read-only switch  enabled      optional
   into            value string sink destination

and similarly for replacement-break. We could then use the value of this shelf item to dictate whether the functions should actively break their input lines, or pass them through unmodified.

Breaking lines using string sink functions in this fashion is really only the beginning. For instance, we could envision a few simple modifications to replacement-break that would allow it to fill paragraphs instead of breaking lines: it would attempt to fill out a block of text so that all the lines are of similar lengths.

The code for this article is available for download.

Pattern matching in OmniMark

Writing better patterns addresses two goals: making the patterns do more for you, and getting them to run fast. The former is always more important than the latter — there’s no point having a program running fast if it isn’t doing what you want it to — but they are not incompatible, and very often more effective patterns will run faster than less effective ones, because they are more to the point.

There are three main principles discussed in this paper:

  • Fail Fast — write patterns that don’t waste time,
  • Succeed Slow — write patterns that do their job efficiently, and
  • Divide and Conquer — build patterns to cover all the cases.
Fail Fast, Succeed Slow

All OmniMark programmers, at one time or another, ask themselves if their “find” rules could or should be running faster. Most of the time it doesn’t matter. If you’re not waiting around for your OmniMark program to run, there isn’t a problem. But if your thumbs are getting tired of twiddling, it might be time to take a look at your find rules.

Perhaps the most important thing that should be kept in mind about using find rules is that they spend most of their time failing to produce results. At most, only one find rule will end up capturing a piece of text, so all the rules ahead of it are going to fail. This leads to the first principle of writing find rules: “fail fast” — or, spend as little time as possible on find rules that are failing.

Most find rules already fail quite fast: most of the time, the character being looked at doesn’t match the first character of the find rule, and OmniMark takes advantage of this in the way it chooses which find rules to try. One thing that you should avoid, if you can, is a find rule that starts with any and usually fails later in the pattern (i.e. an any match that isn’t a “catch-all” at the bottom of the program).

The second principle of writing find rules is “succeed slow” — once you are in a find rule, or in a repeat scan match, pick up as much with it as you can. It is often the case in word processor formats, for example, that commands come in bunches. So pick up bunches, not just single commands. This cuts down the number of find rules that are performed.

With these two principles in mind, let’s visit an old friend:

find any

This little rule ends up sitting at the bottom of many an OmniMark program. It can sit by itself as above, in which case it means “please throw away anything you haven’t yet recognized.” Or it can do something simple, as in the following, where it copies the otherwise unrecognized character to the current output:

find any => one-character
   output one-character

In both cases, this little rule usually provides an excellent opportunity to speed up the OmniMark program. First of all, on the “fail fast, succeed slow” principles, it makes sense to pick up as long a run of characters as possible that will not be recognized by any other rule.

What these characters are depends very much on other find rules at work. For example, if other rules recognized text starting with any of the seven-bit ASCII graphic characters, then a rule such as:

find [\ " " to "~"]+

will “fail fast,” if the character being looked at is a seven-bit ASCII graphic, as well as “succeed slow,” because it will spend the time it takes to find all the following characters. And the nice thing about such a rule is that it can be put anywhere in the program — it’s not getting in the way of other find rules.

Just be careful to pick up any leftover characters that are sometimes, but not always, picked up by other find rules. Leaving the good old “find any” rule in the program, after the “gobbler”, does this; or the two can be combined as follows:

find [\ " " to "~"]+ | any

Note that the any doesn’t have a “+” sign. If it did, it would consume the rest of your input, without giving the other find rules a chance to examine it.

Finally, remember that OmniMark copies characters that are unrecognized by any find rule to the current output, and does it in a very efficient way. So if you simply want all otherwise unrecognized characters copied, don’t do the copying yourself — not doing anything is the best way to fail.

Alternatively, if you want to discard all characters not otherwise captured by find rules, you should make your default output be #suppress, and explicitly output to #main-output or wherever the output is going. Doing so makes unmatched characters go to #suppress — efficiently discarding them.

Divide and Conquer

A common need in pattern matching is to pick up everything up to, but not including, a known closing delimiter. For example, a match part that picks up the text in a quoted literal can do so as follows, with a quote character ending the picked-up text:

match "%"" [\ "%""]* => text "%""

For other than single-character closing delimiters, a simple “any except” character set doesn’t do the job. The following shows why:

match "--" [\ "-"]* => text "--"

The problem is that a single “-”, not part of a “--”, will terminate the matching done by the “any except” but won’t be matched by the final “--”. That’s where the lookahead used in the upto macro comes into play: it makes sure that the whole of the terminating delimiter is present.

Here’s a common solution to this problem, using the handy lookahead:

match "--" ((lookahead not "--") any)* "--"

What’s going on here is that the pattern repeatedly “looks ahead” to make sure that the terminating condition has not yet been met, and if it hasn’t, consumes another character. When the termination condition is found, the “*” loop exits and the final “--” is matched.

The lookahead formulation does a good job of picking up text. But for those wanting to “fail fast, succeed slow”, it’s really unsatisfactory, because it examines every character in the text twice — once in the lookahead to ensure it’s not part of the closing delimiter, and a second time in the any gobbler. A better approach is one that “divides and conquers” — examining only once any character that isn’t a “-” using the “any except” form, and only doing a lookahead when a “-” is encountered:

match "--" ([\ "-"]+ | "-" lookahead not "-")* => text "--"
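
In context, a pattern like this is typically one alternative in a repeat scan. Here is a sketch (the scanned value comment-source is hypothetical):

repeat scan comment-source
match "--" ([\ "-"]+ | "-" lookahead not "-")* => text "--"
   output text
match any
   ; skip characters that aren't part of a comment
again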

The divide and conquer approach to writing patterns comes in handy even when lookahead isn’t required. For example, the following match part picks up a C-like string, gobbling everything but quotes and “\” quickly, and handling “\” separately (so that a quote can be put in the text using “\””):

match "%"" ([\ "%"\"]+ | "\" any)* => text "%""

When the patterns start to become large and more complex, divide and conquer is the real winner. Here’s how to write a divide and conquer pattern in general:

  1. For each character that is only sometimes matched, based on the character or sequence of characters that may precede it (such as characters preceded by “\” in C-like strings), construct an alternative for all possibilities starting with the first character of the preceding characters. For example:
    "\" any ; for escaped characters in C-like strings
    "%" any ; for escaped characters in OmniMark strings
  2. For each character that is only sometimes matched, based on the character or sequence of characters that may follow it (such as the “-” possibly followed by another “-” in XML-like comments), construct an alternative that matches that character with a “lookahead” or “lookahead not” that excludes the non-matching cases. For example:
    "<" lookahead not [letter | "/!?"] ; for illegal uses of "<" in XML
  3. Pick out all the characters that must always be matched, but are not matched by one of the previously constructed alternatives, and match them using a character set matcher with a “+” on it. For example:
    [" " to "~" \ "%"\"]+ ; for what's allowed "as is" in C strings
    [\ "%"%%"]+ ; for what's allowed "as is" in OmniMark strings
  4. Take the partial patterns from the first three steps, connect them with “|” (or), putting the most likely alternatives first (for speed only). For example:
    ([" " to "~" \ "%"\"]+ | "\" any) ; for C string text
  5. Append an “*” (repeat zero or more times) or a “+” (repeat one or more times) to the connected partial patterns, depending on whether or not the text as a whole can consist of zero characters. The result will look something like:
    ([" " to "~" \ "%"\"]+ | "\" any)*
  6. Recursively apply divide and conquer to any of the constructed alternatives that itself matches delimited text.
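
Assembling the fragments from these steps gives a complete match part for C-like string text (a sketch; it combines the step 4 alternatives with the quote delimiters used earlier):

match "%"" ([" " to "~" \ "%"\"]+ | "\" any)* => text "%""
   output text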

As an example of divide and conquer, here’s a find rule that matches an XML start tag:

find "<" (letter [letter | digit | ".-_"]*) => element-name
   ([\ "%"'<>/"]+ |
      "%"" [\ "%""]* "%"" |
      "'" [\ "'"]* "'")* => attributes
   ("/" => empty-element)? ">"?

The core of the pattern (that matches the attributes) is a three-way alternative that picks up everything except a quote or apostrophe, and then tries each of the two types of quoted text. (Quoted text generally needs to be specially recognized because it can contain things that would be recognized as delimiters outside of quotes.)
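
A sketch of how the captured parts might be used, repeating the rule with a simple action body (the output format is arbitrary):

find "<" (letter [letter | digit | ".-_"]*) => element-name
   ([\ "%"'<>/"]+ |
      "%"" [\ "%""]* "%"" |
      "'" [\ "'"]* "'")* => attributes
   ("/" => empty-element)? ">"?
   output "element: "
   output element-name
   output "%n"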

** and ++

If you’ve used the “**” and “++” pattern operators introduced in OmniMark version 6, you may be wondering where they fit into all of this. What “**” and “++” do is take the principles discussed above and apply them in a number of common and useful cases.

“**” and “++” take a (preceding) character set and (following) pattern, and match everything within that character set up to and including the pattern. (They differ only in that “++” fails if it does not encounter at least one character prior to the pattern.) For example, a convenient way of writing the XML comment matcher shown earlier in this paper is:

match "--" any** "--"

This is certainly shorter than the previous formulations. More importantly, it is easier to read and it runs efficiently.
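
The difference between the two operators shows up with empty content. For quoted literals, for instance (a sketch):

match "%"" any** "%""   ; matches an empty literal ""
match "%"" any++ "%""   ; fails on "", since at least one character must precede the closing quote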

OmniMark uses the divide and conquer principle on the pattern following “**” or “++” and builds a loop that only stops when it needs to look further, in the same way that the divide and conquer rewrite did. It saves the programmer the trouble, and is able to do things that can be hard for the programmer.

Although they’re very useful, and should get a lot of use, “**” and “++” don’t deal with all pattern matching problems. It’s good for the programmer to understand the principles described in this paper when they have to be applied explicitly.

Migrating legacy OmniMark programs

OmniMark has undergone a tremendous evolution from its earliest days as a very simple rule-based SGML scripting language, to its current state as a general-purpose programming language with modern software-engineering features.

During the course of this evolution, great pains have been taken to maintain backwards compatibility. Still, some changes to the core language have been necessary. The requirements of a general-purpose programming language, suitable for engineering complex high-performance systems, are very different from those of a simple narrowly-targeted scripting language.

Even so, many older programs do not need to be modified to work with current versions of OmniMark. The only programs that do need modification are those that are written with a version 2 (v2) coding style.

Most programs that do require modification can be updated automatically in a few seconds, using the migration script provided with this article. A few will require additional hand-editing, generally taking just a few minutes more.

This article will walk you through the migration process, and help you troubleshoot those few programs that require further modification.

This process can be run with any version of OmniMark from version 6 to the current version, and the results will be compatible with all of these releases.

You can download the conversion scripts mentioned here in zip format for Windows. These scripts are provided in source code form to make it easier for you to customize them for your particular code base.

This article assumes the use of an OmniMark Studio. In versions 8 and above, this will be OmniMark Studio for Eclipse. In version 7, you might have either the Studio for Eclipse or the standalone Studio. In version 6, you will be using the standalone Studio. The programs provided will work in all of these versions, though the procedures for running them vary slightly: the steps for Studio for Eclipse are listed first, and the steps for standalone Studio follow. If you have a lot of files to convert, you may wish to compile the migration programs so that they can be run from a batch script. See the OmniMark Studio documentation for more information on compiling programs, and the OmniMark Engine documentation for information about running compiled programs.

If you are working with OmniMark version 6 in a Unix environment, you will have to compile the migration programs and use an OmniMark Engine to execute them.

Running the Migration Program

This step will take a few seconds for each file you need to migrate.

Procedure for OmniMark Studio for Eclipse

Run the to-six.xom program to upgrade the syntax.

  1. Unzip the program files into a suitable directory.
  2. In OmniMark Studio for Eclipse, create a project for running the migration.
    1. From the File menu, choose New -> Project.
    2. Expand the OmniMark option and choose OmniMark Project.
    3. Enter a name for your project.
    4. Uncheck “Use default” and navigate to the directory where you have placed your files.
  3. Open to-six.xom
  4. Create a launch configuration for the program to-six.xom by pulling down the “Run” menu and clicking on “Run as OmniMark Program”. Don’t worry about the error messages.
  5. In the Parameters tab, under Arguments, enter
    1. “include=” to specify the directory containing any include files used by the program you are upgrading, for example include=C:\MyPrograms\OmniMark\xin
    2. the path to the program you want to migrate
    3. the path to the output destination. Don’t overwrite your input file. Either give the new file a different name or place it in a different directory
      -of newfile.xom
    4. the path to a log file to capture any error messages
      -log logfile.xom
    5. If you are using OmniMark version 8 or newer, you may want to add the argument -warning-ignore deprecated here, to avoid seeing numerous warnings about obsolete syntax. (The obsolete syntax has been retained in this program so that it continues to work in versions 6 and 7.)
  6. Click Run at the bottom of the launch configuration screen.
  7. Examine the result.
    1. You will find that all of the lines have been moved over by a few spaces. Whenever the program changes a line, it inserts the original line in front, as a comment beginning with “;pre-6”. Spaces are added to the front of the rest of the lines to keep everything lined up. (These extra spaces and comment lines will be removed in a later step.)
    2. You should examine the changes to make sure that the new lines are correct. You may examine the changes by searching for the text “;pre-6”.
    3. Open the log file to examine the error messages. If you see any errors, you will have to correct the code by hand. There should be very few (if any) errors. You may see warnings. Warnings about deprecated syntax can be ignored. For other warnings, you should examine the referenced lines in the output file and make sure that the code is correct. If not, you should correct the output file by hand.

Procedure for standalone Studio (OmniMark 6)

  1. Run the to-six.xop project to upgrade the syntax.
    1. In OmniMark Studio, open the project to-six.xop.
    2. Select the input file, the output file, and the log file by editing the project options. If the program you are migrating includes files from a different directory, you also specify the include path at this step.
      1. Specify the directories to be searched for include files, if any. Click on the “Arguments” tab. Type “include=” followed by the name of a directory that contains include files. (Do not leave spaces around the “=” sign.) Do this once for each directory that contains include files. For example, to search the folder “C:\MyPrograms\OmniMark\xin” enter:
        include=C:\MyPrograms\OmniMark\xin
      2. Click on the “Arguments” tab, browse to the include or program file that you want to migrate, and add it to the argument list.
      3. Click on the “Output” tab, and type in the name of the output file. This will be a new version of your original include or program file, compatible with OmniMark 6 and above. Do not overwrite your input file. Either give your output file a new name or place it in a different folder.
      4. Also in the “Output” tab, type in the name of a log file to capture any error messages.
    3. Save the project.
    4. Pull down the “Run” menu, and select “Execute Project”.
    5. When you see “Hit <ENTER> to continue.”, press the Enter key to return to the Studio.
  2. Examine the result to make sure it is correct.
    1. Open the output file. You will find that all of the lines have been moved over by a few spaces. Whenever the program changes a line, it inserts the original line in front, as a comment beginning with “;pre-6”. Spaces are added to the front of the rest of the lines to keep everything lined up. (These extra spaces and comment lines will be removed in a later step.)
    2. Examine the changes to make sure that the new lines are correct. You may find the changes by searching for the text “;pre-6”.
    3. Open the log file to examine the error messages.

If you see any warnings, you should examine the referenced lines in the output file and make sure that the code is correct. If not, you should correct the output file by hand.

Once you have finished converting your program files and your include files, you should try running the programs with the newer version of OmniMark.

Procedure for OmniMark Studio for Eclipse

  1. In OmniMark Studio for Eclipse, bring your updated program into Eclipse as before.
  2. Select the program in the navigator window and choose Run… from the Run menu to open a launch configuration. Specify the options you want, and click “Run”.

Procedure for standalone Studio (OmniMark 6)

Create a project file for each command-line:

  1. In OmniMark Studio, open your program file.
  2. Pull down the File menu and select “Create Project File” (or click on the “Create new project” button on the toolbar).
  3. Pull down the Edit menu and select “Project Options”. Use this dialog box to fill in the information that you had specified on the command-line.
  4. Pull down the Run menu and select “Start Debugging” to run the project (or press the “Start debugging” button on the debug toolbar). If your program takes too long to run in debug mode, you can pull down the Run menu and select “Execute Project” instead. But first, make sure you save your project, your program, and your include files. (To maximize speed, “Execute Project” reads files from disk, not from the Studio buffers.)
  5. See the OmniMark Studio documentation for more information on OmniMark projects.

At this point you should have very few syntax errors if any. Correct any of the remaining errors by hand, and then run your programs again. Make sure they produce the same results as your old programs under your old version of OmniMark. If you have any problems, or just want to understand this process better, see Appendix C: What the Migration Process Does.

Part 4: Cleaning Up

At this point, you have successfully upgraded your programs to work with the newest versions of OmniMark. Now you probably want to get rid of the comment lines added during this process. This step will take a few seconds for each file you are upgrading.

Procedure for OmniMark Studio for Eclipse:

  1. Open clean.xom in OmniMark Studio for Eclipse.
  2. In the Launch Configuration, specify the output file from to-six.xom as the input, and save the output to a new file.
  3. Run the program.

Procedure for standalone Studio (OmniMark 6)

  1. Open the project clean-six.xop.
  2. Edit the project options:
    1. The input file should be the output file from to-six.xop.
    2. Save the output to a new file.
  3. Save the project file.
  4. Pull down the “Run” menu, and select “Execute Project”.
  5. When you see “Hit <ENTER> to continue.”, press the enter key to return to the Studio.

You can compare this output file to your original (pre-OmniMark 6) file with any line-by-line comparison utility. You will see that the only changes are the ones necessary to upgrade the syntax.

Appendix A: Quoted Variable Names

Prior to 5.3, OmniMark allowed you to quote your variable names if you preceded them with a herald (or a type-specific keyword like active or increment).

Without heralding, quoted variable names are indistinguishable from quoted text strings. For this reason, this feature was dropped in version 5.3.

OmniMark 7 re-introduces quoted names to support prefixing of symbolic operators. OmniMark from version 7 on uses a different syntax for quoted names so that they cannot be confused with text strings. In these versions, a quoted name must be wrapped in either #"…" or #'…'.

Programs which use quoted variable names will be automatically migrated to the OmniMark 7 syntax. These programs will be compatible with all newer releases, but they will not be compatible with OmniMark 6 without hand modification.
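
For example, here is an old heralded quoted name alongside its OmniMark 7 equivalent (a sketch; the name my-var is hypothetical):

; pre-5.3 style, with a herald:
set stream "my-var" to "some value"

; OmniMark 7 and later:
set #"my-var" to "some value"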

Appendix B: Hiding Keywords

Another potential problem is that a global variable declaration can “hide” a keyword. If your program did not have any variable declarations, then a variable reference always had to be preceded by a herald or a keyword that acted like a herald. So OmniMark was always able to tell the difference between your variables and language keywords.

From OmniMark V3 to OmniMark 5.2, you could use or omit the herald, as you wished, as long as you declared all of your variables. From OmniMark 5.3 on, variables were always referenced without a type herald.

When the variable name is the same as a keyword, OmniMark sometimes can’t tell which you mean. If this occurs, you may get an error message like:

     omnimark --
     OmniMark Error xxxx on line 1179 in file my-prog-1.xom:
     Syntax Error.
     ...  The keyword 'SDATA' wasn't recognized because there is a
     variable, function, or opaque type with the same name.

You can fix these types of errors quickly by doing a search and replace. Make sure you change only the variable references, though, and not the keywords too.
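
For example, assuming a program that declares a variable named after the SDATA keyword (a sketch):

; before: this declaration hides the SDATA keyword
global counter sdata

; after: renamed, with all references updated to match
global counter sdata-count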

Appendix C: What the Upgrade Process Does

This section briefly describes some of the transformations that the upgrade program (to-six) does.

Global Variable Declarations and Translation Types

The first step of the migration process determines whether global variables must be declared, and if so, generates them.

It does this by reading the program file, and all of the files that it includes. If there are no global variable declarations already, but variable references are detected, then a list of global variables will be generated and inserted at the beginning of the program file.

Global variable declarations are not generated for include files, because they would duplicate the ones generated for the main programs.

At this time, the keyword “down-translate” will also be placed at the top of the program if it is needed.

Pattern Assignments and Comparisons

One thing the program does is correct the use of the equals symbol (=).

Before OmniMark V3, the “=” symbol was only used for pattern assignment. V3 introduced a new symbol for pattern assignment (=>) and used “=” for comparisons. However, the use of “=” for pattern assignment was still supported for backwards compatibility.

Needless to say, you shouldn’t use the same symbol to mean different things. Since version 5.3, OmniMark issues warning messages wherever the “=” symbol is used for pattern assignment, with a view towards eventually removing this use from the language. The migration program contains a function that looks for solitary “=” symbols in your program and converts them either to “is equal” (the old form of the equality comparison) or to “=>” (the new form of the pattern assignment operator). Either way, the ambiguity is eliminated at this stage. The “is equal” construct will be changed back to “=” in a later phase.

Heralds and Mods

Removing type heralds is the final and most extensive part of the process. This is done by another function in the migration program.

In addition to removing heralds, this step also replaces some deprecated constructs with their modern equivalents. This includes:

  • Changing “set counter” and “reset” to “set”
  • Changing “set buffer” and “set stream” to “set”
  • Removing “counter”, “stream”, and “switch” everywhere except in variable declarations
  • Converting the “and” form of variable declarations to a sequence of declarations. OmniMark allows syntax like:
    local switch x and counter y

    This is converted to:

    local switch x
    local counter y
  • Converting the verbose forms of comparisons (“is/isnt equal”, “is/isnt greater-than”, “is/isnt less-than”) to the symbolic forms
  • In some contexts, converting the shelf names “sgml” and “output” to “#markup-parser” and “#main-output”
  • Removing the heralds “pattern” and “another”

Quoted Variable Names Again

You may find that this step results in messages like:

WARNING: Quoted variable name (stream "my-var") -
  replacing with v7 syntax.

That means that "my-var" may be a quoted variable name here. The variable will be changed to use the OmniMark 7 syntax for quoting names (#"my-var").

You may wish to examine the modified lines of code, and make sure that each really is a variable reference. If you are migrating to OmniMark 6, you will have to remove the “#” character and the quotes.

When you are migrating to OmniMark 6, make sure that the unquoted name is legal. It must begin with a letter or a character whose numeric value is between 128 and 255, and the subsequent characters must be either one of those, a digit, or a period (.), hyphen (-), or underscore (_). Any other characters must either be replaced or removed.

You will have to be careful with quoted variable names inside of macros. The sequence “%@(…)” in a quoted variable name means that a macro argument is being spliced into the name at that point.

Using macro arguments to build variable names was one way of simulating structures in early OmniMark programs. Now, a better way is simply to use keys to simulate field referencing.

In any event, you cannot use macros this way in OmniMark 6. The best way to correct this is to pass in the complete list of variable names that the macro operates on, instead of just passing in a piece of a variable name.

Duplicate Variable Names

Finally, with the removal of heralds, there is one other problem area that needs to be dealt with.

In most languages, when you define a variable in a local scope with the same name as a variable in the outer scope, the inner variable hides the outer one. In OmniMark, prior to 5.3, you could still reference the outer variable by heralding it, provided it had a different type than the inner one.

Usually, the only time a name is reused in a program is when one of the variables has a very short lifespan, only being used to capture a value and transfer it to the final destination variable, and the programmer uses the same name because it’s easy:

     find digit+ => id ":" any-text+ => value "%n"
       local counter id
       set id to pattern id

This can be easily corrected by changing the name of the pattern variable.
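
For example, renaming the pattern variable in the rule above (captured-id is an arbitrary new name):

     find digit+ => captured-id ":" any-text+ => value "%n"
       local counter id
       set id to pattern captured-id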

The migration program also contains a function that can detect some of these variable name reuses, and attempts to warn about variables declared with the same name as another variable visible in the same scope.

These checks are heuristic, and can be fooled by macros, or by declaring the variables in one file, and using them in another. However, these checks should find many of the common cases.

How to prepare your content for conversion to DITA

Presented by Helen St. Denis, Conversion Services Manager | Stilo

So, the decision is made to implement DITA and the content audit is done; now you need to get your content into DITA. What do you really need to do to your content before you start the conversion process? Maybe not as much as you think!

In this webinar we discussed …

  • What is the most useful thing to do pre-conversion?
  • What kind of things influence what else you might want to do pre-conversion?
  • What are the common trouble areas in the different source formats?
  • What is best left for post-conversion?

View recording (registration required)


Meet the presenter

Helen St. Denis

Helen originally joined Stilo as a technical editor in the documentation team, and now works closely with Stilo Migrate customers, helping to analyse their legacy content and configure appropriate mapping rules. She also provides Migrate customer training and support.
Over a period of several years, Helen has helped Migrate customers to convert tens of thousands of pages of content to DITA and custom XML.

Helen holds a BA in English from St. Francis Xavier University in Antigonish, Nova Scotia, and has pursued graduate studies at Queen’s University in Kingston, Ontario.