Developing SGML/XML content processing applications

OmniMark is a proven technology, designed from the ground up to efficiently address the full range of challenges encountered when building enterprise content processing applications. The great advantage of OmniMark is that it is a single tool that can process content in all its forms, and can do so in a way that shortens development times, reduces resource demands and maintains high-speed performance even when confronted with volumes that cripple other tools.

Content processing challenges

Content processing comes with a very specific range of challenges. Some of these challenges are quite unlike those that are encountered in other branches of data processing and, if these challenges are to be handled successfully, the appropriate tools and techniques will need to be applied. While it may be tempting to ignore these differences and to apply alternatives instead, the history of content processing investments would seem to indicate, not surprisingly, that this strategy frequently encounters serious problems. Some of the challenges that should be given due attention when entering the content processing domain include those associated with managing context, complexity, change and cost.

Managing context

Content has meaning in context. This is true at the highest and lowest levels and means that, at the lowest levels of manipulating content, merely matching patterns or altering markup structure is not enough. The key is to recognize and process patterns in context. OmniMark’s unique hierarchical processing model ensures that the developer is in control of the context at every moment, even in the most complex of nested constructs. Add to this OmniMark’s ability to combine markup parsing with pattern matching in a single seamless operation, and OmniMark provides a truly universal content processing solution. From the perspective of the user of content products, at the highest level, the results can be very compelling: the right content provided in the right format at the right time. When the power of OmniMark is being leveraged, managing context is not left to chance.
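The idea of matching patterns in context can be illustrated outside OmniMark. The following is a minimal Python sketch, not OmniMark code and not a description of how OmniMark works internally. It assumes a hypothetical part-number pattern ("PN-" followed by digits) and a hypothetical rule that text inside a "code" element must be ignored. The function name, the pattern and the sample document are all invented for illustration. A stack of open element names stands in for the current context, and the pattern is applied only where that context allows it.

```python
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern, for illustration only: "PN-" followed by digits.
PART_NUMBER = re.compile(r"PN-\d+")

def part_numbers_in_context(xml_text, skip=frozenset({"code"})):
    """Return (path, part_number) pairs, ignoring text inside skipped elements."""
    found = []
    stack = []  # tags of the elements that are currently open, outermost first
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            continue
        # At "end", the element's own text and its children's tails are complete.
        if not skip.intersection(stack):
            path = "/".join(stack)
            pieces = [elem.text or ""] + [child.tail or "" for child in elem]
            for piece in pieces:
                found.extend((path, number) for number in PART_NUMBER.findall(piece))
        stack.pop()
    return found

sample = (
    "<doc><para>Order PN-1001 today.</para>"
    "<code>PN-9999</code>"
    "<note>See <b>PN-2002</b> and PN-3003.</note></doc>"
)
for path, number in part_numbers_in_context(sample):
    print(path, number)
```

The match inside the "code" element is never reported, because the context check runs before the pattern is applied. Results are reported as each element closes, so they are grouped by element rather than listed in strict document order, and this sketch keeps the parsed tree in memory rather than streaming it.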

Managing complexity

Issues with complexity affect content processing in many ways. Complexity is, in reality, a defining quality of content as it is intrinsically unpredictable in numerous ways – scale, composition, and uses. A truly effective tool would provide answers for handling each of these uncertainties. This is where many patchwork solutions fade away, and where OmniMark shines.

Addressing the challenges of content scale

Issues of scale affect content processing in two major ways: content processing applications must deal efficiently both with instances of great individual size and with very large volumes of instances. And on a bad day, you will be confronted by both simultaneously. The size of the data sets found in content processing operations is virtually unmatched in data processing. Of course, many application types encounter large data sets in the form of very large collections of discrete data points. Only in content processing, however, does one regularly encounter individual items of data that are megabytes or even gigabytes in size. When it comes to volume, content processing events frequently have a direct relationship to business transactions, meaning that in many industries, content instances can proliferate at a prodigious rate. Because it is tied to business transactions, content processing must achieve and maintain demanding levels of reliability and performance when handling these volumes, something that patchwork solutions simply cannot do.

OmniMark’s streaming architecture is uniquely suited to efficiently processing data of all sizes and volumes without concerns arising about expanding resource requirements. And because this streaming architecture is fully integrated with OmniMark’s context handling facilities, no change of algorithm is required to handle exceptionally large data sets, whether measured by size or quantity.

Addressing the challenges of content composition

Content processing also encounters problems that grow rapidly with the increasing variety of content composition. New forms and sources of content are constantly being acquired, and delivery demands grow ever more sophisticated and immediate. The natural and proper response to this dynamic complexity is to divide and conquer, breaking the solution into individual components that execute sequentially and that each address one dimension of the content composition. Implementing such solutions, however, can be expensive at execution time, with large amounts of time and resources being consumed in data buffering. OmniMark solves this problem by allowing developers to chain together individually specialized and reusable content processing components in a fully streaming fashion with no buffering of data. The twin objectives of productivity and performance are, in this way, kept in balance.
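The general shape of such a chain can be sketched with Python generators. This is an analogy under stated assumptions, not OmniMark code: the stage names (read_lines, drop_blank, collapse_spaces, number) and the sample input are hypothetical. Each stage consumes one item at a time from the stage before it and passes its result straight on, so no stage holds the whole data set.

```python
import io

def read_lines(stream):
    """Yield lines from a text stream one at a time, without the newline."""
    for line in stream:
        yield line.rstrip("\n")

def drop_blank(lines):
    """Pass through only lines that contain non-whitespace characters."""
    for line in lines:
        if line.strip():
            yield line

def collapse_spaces(lines):
    """Trim each line and reduce runs of whitespace to single spaces."""
    for line in lines:
        yield " ".join(line.split())

def number(lines):
    """Prefix each line with its position in the output."""
    for n, line in enumerate(lines, start=1):
        yield f"{n}: {line}"

def pipeline(stream):
    """Chain the stages; nothing is read until the result is iterated."""
    return number(collapse_spaces(drop_blank(read_lines(stream))))

source = io.StringIO("alpha   beta\n\n  gamma\n")
result = list(pipeline(source))
print(result)
```

Because each stage is an ordinary function over an iterator, stages can be reordered, reused or replaced independently, and memory use stays bounded by the size of one line rather than the size of the input.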

There are other dimensions to the challenge of content composition, namely the integration of numerous data types and the emergence of the requirement to apply processes across a heterogeneous data set in order to achieve a single, finished outcome. In these circumstances, there is a pressing need to be able to perform a wide range of operations on the content, including numeric computations and binary data manipulations. Just as the simple mixture of text and markup processing is inadequate to really process content, so it is also important that developers have access to a complete environment capable of working with all possible content forms. That integrated environment is OmniMark.

Addressing the challenges of content uses

The final dimension of complexity requiring management is the variety of uses to which content is being put within the modern enterprise. A content processing application can be required to feed products of numerous descriptions to people working in numerous environments. For some, print products will serve best. For others, it will be syndicated dissemination to mobile devices. More recently, and increasingly commonly, the consumer of the content products will be other applications, which in turn engage people performing their functions. Some content deployment scenarios are scheduled while others are initiated purely on-demand. The latter of these is the more challenging and, unfortunately for some, the more common. When on-demand requests for content are made, the requirements have been set, in a highly contextual manner, and the user is awaiting a response. This is the wrong time for performance to become an issue and it is the wrong time to simply serve up a large, undifferentiated mass of content. Meeting the complexities associated with keeping pace with the on-demand infrastructure can simply be too much for patchwork approaches to content processing. It is OmniMark that can scale to meet these demands.

Managing change

Content processing applications are never static. They evolve constantly as new needs appear and new sources of content are brought on-line. High value content is subject to a continuous process of adaptation as it is updated to reflect external changes. Managing change is a matter of precision in content processing: being able to propagate changes quickly, accurately and confidently. In order for this to be possible, a number of elements must be aligned. Developers must be able to work in a highly productive way so that new requirements can be quickly isolated, validated, demonstrated and put into production. Content managers must be able to orchestrate all of the activities associated with adapting content, and to do so with a maximum use of automation so that valuable people are engaged to do only that which is absolutely necessary. Where change takes the form, as it often does, of adding options that do not negate previous content or services, the process is one that continuously adds to the complexity of the content and the associated processing environment. All of these occurrences contribute to content’s natural tendency towards redundancy, which inevitably undermines productivity.

OmniMark’s unrivalled ability to process content of all forms makes it the tool of choice for addressing all of the challenges associated with managing dynamic content. OmniMark is especially proficient at normalizing content, that is, isolating and removing redundancy. This is a critical capability if controlling the lifecycle cost of an evolving content repository is important. OmniMark also delivers a variety of additional capabilities, such as capturing and maintaining metadata, that are essential tools for the effective management of change.

Managing cost

Uniquely suited to handling the challenges of managing context, complexity and change, OmniMark provides exceptional performance and productivity for content processing solutions. Together, these features address key business drivers within content-intensive industries: cost-effectiveness and return on investment.

Digital publishing solutions developed using OmniMark exhibit a remarkable ability to contain the costs associated with creating, maintaining and publishing content. By providing effective capabilities for processing and managing the core features of content, OmniMark addresses the sources of cost increases at their root. As an example, normalization processes implemented using OmniMark can eliminate significant amounts of redundant content, and this has been shown, in many environments, to be the single largest factor in realizing savings. It should be noted that performing normalization well requires the type of content processing precision that only OmniMark can deliver.

The other side of the cost equation is dominated by returns on investment. The investments in question here are those made in the content assets upon which organizations typically rely. These investments will dwarf those made in the technical infrastructure put in place to process and manage the content. Returns on these often substantial content investments come from the successful delivery of the content products needed by the organization and its stakeholders. Increasing the number, variety and precision of these products in a manner that more effectively satisfies the originating business needs is how investments in content achieve a return. For this to happen, the processes operating on the content must be able to easily facilitate the rapid development and deployment of new content products to meet new requirements as they surface. This is where, once again, OmniMark demonstrates its worth. So effective is OmniMark in processing content to realize new products that it is one of the very few technology investments that not only pays for itself, but makes the ongoing investments in the content itself a source of continuous returns.
