WEBINAR | Automatically Convert Content To DITA XML & Deduplicate Exact Topics
June 23, 2021 @ 11:00am EDT
Our Presenter:
TJ Dhaliwal | Technical Sales Product Specialist
Migrate is a well-established cloud service that enables technical authoring teams to convert content to DITA XML from source formats including HTML, Word, FrameMaker, RoboHelp, InDesign, DocBook, Flare, AuthorIt, MindTouch, Excel, XML and SGML. In version 4.0, we added the capability to identify and eliminate redundant topics as part of the conversion process.
In this webinar, we will quickly demonstrate how to automatically convert your content to DITA 1.3 and how this content can be automatically deduplicated using Migrate’s latest feature – Exact Topic Deduplication.
Facilitate Translation with Planning
By Peter Fournier
“Plan early, plan thoroughly”
Planning for translation
Joann Hackos, Grande Dame of DITA, claimed in 2011 that just switching to DITA could save 20% on translation costs. A competent Content Management System could easily save another 20% (Joann Hackos’ Keynote on DITA and Translation Management).
If you need translation or localization, DITA seems like a slam dunk decision. Might as well get started with DITA in English and develop the translation part later, yes?
No, don’t do that. You really must plan to include translation in your DITA workflow from the start; otherwise you may fail to deliver the cost savings promised when the DITA migration project began.
Basic issues
The translation and localization industry has migrated to its own XML standards. There are several:
- XLIFF (XML Localization Interchange File Format)
- TMX (Translation Memory eXchange)
- TBX (TermBase eXchange)
- SRX (Segmentation Rules eXchange)
- ITS (Internationalization Tag Set)
- and no doubt others …
All of these standards play well with DITA but can also play very badly depending on the tools, workflow, CMS or CCMS and outside suppliers you will use in your implementation of DITA + translation.
This diagram shows some of the complexities involved in an integrated workflow.
Without translation the diagram looks more like this:
So, is this complexity of translation good news or bad news? It’s excellent news with proper planning. What are the steps required to achieve excellent results?
Step One: It’s been done before …
DITA plus translation has been done before. Seek out others who have already implemented the transition. You will find that there are many ways to approach the problem. Some of the ways will be similar to your situation, others won’t be, but the more people you talk to the more ideas you’ll have to work with.
Step Two: Talk to translation companies
Before talking with a translation company, you need to prepare a sample of your content, in DITA, that represents the full scope of the content you will want translated. To keep all conversations on track you likely want to use the same sample you use for testing Stilo Migrate and OptimizeR.
Talk to several companies. You will find each one has different procedures, tools, and content specialties. You are looking for the company with the best fit with DITA, your content, and the capacity for the expected translation load. To minimize the technology load at your end you will most likely want to deal with a company that can handle DITA as the input to translation workflow.
However, some translation companies require XLIFF files as input to the process. If XLIFF is required ask the company about handling the conversion from DITA to XLIFF and back again. If you must generate XLIFF internally there are tools, like Fluenta, that may enable the conversion in both directions reliably.
One of the features you will want to explore with these suppliers is “translation memory”. Modern translation software remembers segments it has already translated, so unchanged content does not need to be translated again. However, this memory can be complicated. Can I send the DITA files for an 800-page book to the supplier and expect them to manage the translation memory? Some CCMSs and CMSs can send just the changed files. Does that fit in with the translation supplier’s workflow?
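Whether a CCMS or a supplier does it, “send only what changed” comes down to comparing the current files with what was sent last time. Here is a minimal sketch of that idea. It assumes topics are `.dita` files under one folder and that a hypothetical JSON manifest records a SHA-256 hash per file at each handoff. Real translation memory works at the segment level, so this file-level check only decides what to send. It does not decide what the supplier re-translates.

```python
import hashlib
import json
from pathlib import Path


def hash_topics(root: Path) -> dict[str, str]:
    """Map each .dita file (path relative to root) to a SHA-256 of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*.dita"))
    }


def changed_since_last_handoff(root: Path, manifest: Path) -> list[str]:
    """Return topics that are new or modified since the manifest was last written."""
    previous = json.loads(manifest.read_text()) if manifest.exists() else {}
    current = hash_topics(root)
    return [path for path, digest in current.items() if previous.get(path) != digest]


def record_handoff(root: Path, manifest: Path) -> None:
    """Save the current hashes so the next comparison starts from this handoff."""
    manifest.write_text(json.dumps(hash_topics(root), indent=2))
```

Call `record_handoff` right after a package goes out to the supplier. The next `changed_since_last_handoff` call will then report only the edits made since that handoff.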
Step Three: Talk to Component Content Management System (CCMS) and Content Management System (CMS) companies
If you are already using a CCMS or CMS you may find the system is already optimized for efficient translation from a DITA source.
If you are in the market for a CMS or CCMS pay close attention to how the system will help you manage translation. Some systems can deal directly with translation companies over the internet.
In the pilot phase of a migration to DITA plus translation you likely don’t need a CMS or CCMS right away; you can use the file system instead. Using the file system in the early stages has two main advantages. The first is that it’s easier to conceptualize how all the pieces fit together before introducing a database system — it aids in learning. This will be a benefit when selecting a CCMS or CMS. Second, it makes interaction with outside suppliers easier; ZIP a folder with your content and send it out for review and discussion.
Here’s a typical folder structure for DITA in the filesystem:
Step Four: Translation companies have important things to say about writing
Translation may be necessary but is never easy. Ask your translation supplier(s) for advice on how to write English content that makes good translation possible. Grammar varies dramatically from one language to the next. For example, in French tables are feminine but floors are masculine. In English, tables and floors have no gender. That’s a trivial example. Apparently Finnish has 15 grammatical cases for nouns. This thread, Product names and reuse: a very serious anti-pattern when translating documents, on the OASIS site gives an excellent summary of some of the problems related to DITA reuse. In other words, authoring, at the most basic level, can make translation more expensive. Be sure to explore this issue with potential translation suppliers.
Step Five: Translation companies have important things to say about DITA
Reuse, especially the more advanced reuse options in DITA, can cause problems and expense during translation. CONREF, DITAVAL, CONKEYREF, HREF all have special caveats when interacting with translation companies. It’s important to understand these limitations and opportunities before committing to a flavor of DITA or a specific translation partner.
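Before those conversations, it can help to know how heavily your sample relies on these mechanisms. The sketch below uses Python's standard `xml.etree.ElementTree` and works on a folder of DITA files. It counts `conref`, `conkeyref` and `keyref` attributes, along with the conditional-processing attributes that DITAVAL files filter on. It also counts `href` values that point at other DITA files.

Several points are simplifying assumptions:
- The attribute selection is illustrative, not exhaustive.
- The "ends in `.dita` or `.ditamap`" test for `href` is a rough heuristic.
- Keys and DITAVAL conditions are not resolved.
- Files that rely on entities defined only in an external DTD will fail to parse.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path

# Reuse-related attributes to count (an illustrative selection, not exhaustive).
REUSE_ATTRS = ("conref", "conkeyref", "keyref")
# Conditional-processing attributes that DITAVAL filtering acts on.
PROFILING_ATTRS = ("audience", "platform", "product", "props", "otherprops")


def reuse_inventory(root: Path) -> Counter:
    """Count reuse and profiling attributes across .dita and .ditamap files."""
    counts = Counter()
    for path in sorted(root.rglob("*")):
        if path.suffix not in (".dita", ".ditamap"):
            continue
        for elem in ET.parse(path).iter():
            for attr in REUSE_ATTRS + PROFILING_ATTRS:
                if attr in elem.attrib:
                    counts[attr] += 1
            # Heuristic: an href whose target (before any #fragment) is a DITA file.
            href = elem.get("href", "")
            if href.split("#")[0].endswith((".dita", ".ditamap")):
                counts["href (to DITA file)"] += 1
    return counts
```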
Step Six: Start small
Back in step two I recommended developing a representative sample, in DITA, of your content. Use and refine that sample in all your dealings with suppliers. Having all the conversations based on the same sample will make the final choices much easier.
Starting small also has the benefit of making mistakes easier to fix. Just like Stilo Migrate’s ability to let you iteratively approach the perfect migration from MS Word, say, to DITA, a good sample of content will allow you to manage the iterative approach to a complete DITA plus translation system in your company.
Conclusion
Converting from a legacy authoring platform to DITA is one thing; converting to DITA plus translation is entirely different. Suddenly you are optimizing not only for internal requirements but for internal and external requirements simultaneously. The more planning you can do for both, the better. It may add six months to the planning phase of a transition to DITA but will pay off big time in avoided costs and maximized efficiency.
Not recommendations but good sources …
The following are the best sources of general information on translation I’ve found this week. Please don’t take these links or other links above as recommendations! They’re just good information I’ve found.
How To Translate DITA Projects [Step-By-Step Guide]
About the Author
Peter Fournier has extensive experience in the BNR/Nortel documentation space. In the late 1980s and early 1990s he studied the feasibility of moving the technical documentation to SGML. He later developed, with his team, an advanced online help system for Network Management and other software produced by Nortel. The core of the online help software was based on SGML principles of containerization but had only five or six base elements and a lot of attributes. It was engineered to be compatible with SGML, so the group had no trouble generating valid XML when the draft XML standard appeared in late 1996 or early 1997. In 2005 he discovered, with great joy, DITA XML. He introduced DITA to JDSU (now Lumentum) in 2008 and served as DITA manager and technical prime until 2018. Between 2010 and 2014 he also found time to launch a startup and develop software that helps groups of 1 to 20 people get into DITA and manage all the background complexity, including publication. As of 2021 he’s back in the DITA space and loving the Stilo philosophy of making highly complex transformation software easily accessible to customers.
Simplifying Complex Conversions to DITA XML
By Peter Fournier
DITA XML conversion projects can fail simply because they get too complicated. Major stumbling blocks leading to complexity include:
- Too many styles, sometimes running into the hundreds in large document suites.
- Badly applied styles, for example a “Heading 1” followed by a “Heading 4”.
- A “Normal” paragraph manually formatted to look like something else.
- Inconsistent content such as multiple ways to label procedures, for example “To clean XYZ” and “Cleaning XYZ”.
Because of these and many other problems, it has been very common to clean up a suite before converting to DITA XML, and often to do post-conversion cleanup as well. Pre- and post-conversion cleanup can be the most time-consuming, complex, hard-to-manage and expensive part of a conversion project.
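Some of these problems can be detected mechanically before conversion. The sketch below flags two of them:
- skipped heading levels;
- short bold “Normal” paragraphs that are probably manual headings.

It works over a hypothetical in-memory list of paragraphs; reading styles out of a real .docx file is left out. The "bold and at most eight words" heuristic is an assumption you would tune to your own documents.

```python
import re
from dataclasses import dataclass


@dataclass
class Para:
    style: str          # paragraph style name, e.g. "Heading 1" or "Normal"
    text: str
    bold: bool = False  # True if the whole paragraph carries direct (manual) bold


def audit_styles(paras: list[Para], max_heading_words: int = 8) -> list[str]:
    """Flag skipped heading levels and short bold "Normal" paragraphs."""
    issues = []
    last_level = 0
    for i, para in enumerate(paras):
        match = re.fullmatch(r"Heading (\d)", para.style)
        if match:
            level = int(match.group(1))
            if level > last_level + 1:
                issues.append(f"para {i}: heading level jumps from {last_level} to {level}")
            last_level = level
        elif para.style == "Normal" and para.bold and len(para.text.split()) <= max_heading_words:
            issues.append(f"para {i}: short bold 'Normal' paragraph may be a manual heading")
    return issues
```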
Enter Stilo’s Migrate software. For instance, suppose we have a list in a document that looks like this:
In Microsoft Word, or any other WYSIWYG editor, all the program knows is that
- this text contains a heading (maybe),
- paragraphs, some of which are in a numbered list,
- and formatting to make the text look the way it does.
Migrate assists in the automated conversion to DITA XML by helping you create conversion rules and enhance the final DITA XML output. For example, you can create:
- A rule that detects that the first word in the heading ends in “ing”, indicating it is probably a task heading. The heading and the text that follows it will be placed in a “task” topic, with the heading paragraph placed in a “title” element.
- However, sometimes a heading isn’t really a heading; it just looks like one because it’s a manually formatted “Normal” paragraph. That’s OK. You can create a rule that detects the manual formatting and then applies the first rule above.
- A rule that adds a registered trademark symbol to the first occurrence of the word “QuickTrace”.
- A rule that detects that some paragraphs are numbered and that these should be wrapped in “steps/step/cmd” elements.
- A rule that detects that the word following the word “Click” is likely a UI element. The word following “Click” will be wrapped in a “uicontrol” span.
- However, sometimes the word “Click” is followed by a word that is in turn followed by a “→” character in which case the whole string with embedded arrows will be wrapped in “menucascade/uicontrol” elements.
- A rule that detects that a paragraph before the actual steps should be wrapped in a “context” element.
- A rule that detects that the last paragraph is indented at the same level as the numbered list and is not followed by another list item. Therefore, it should be wrapped in a “result/p” element.
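Migrate's own rule syntax is not reproduced here. To show what rules like these do, the sketch below expresses four of them as plain Python, using the standard `xml.etree.ElementTree` module:
- the “ing” heading test;
- numbered paragraphs to steps/step/cmd;
- “Click” to uicontrol or menucascade;
- context and result placement.

The rules run over a hypothetical list of `(text, is_numbered)` paragraphs. The function names and the `task1` id are assumptions for illustration. The result rule is also simplified: every unnumbered paragraph after the first step goes into `result`, rather than checking indentation.

```python
import re
import xml.etree.ElementTree as ET

ARROW = "\u2192"  # the "→" character that separates menu items in the source
# "Click" followed by one word, or by a chain of words joined with arrows.
CLICK = re.compile(rf"\bClick (\w+(?:\s*{ARROW}\s*\w+)*)")


def is_task_heading(heading: str) -> bool:
    """Rule: a heading whose first word ends in "ing" is probably a task."""
    words = heading.split()
    return bool(words) and words[0].lower().endswith("ing")


def add_inline(parent: ET.Element, text: str) -> None:
    """Rule: wrap the word after "Click" in uicontrol, or an arrow chain in menucascade."""
    match = CLICK.search(text)
    if match is None:
        parent.text = text
        return
    parent.text = text[: match.start(1)]
    labels = [label.strip() for label in match.group(1).split(ARROW)]
    if len(labels) > 1:
        target = ET.SubElement(parent, "menucascade")
        for label in labels:
            ET.SubElement(target, "uicontrol").text = label
    else:
        target = ET.SubElement(parent, "uicontrol")
        target.text = labels[0]
    # The arrows are not stored in the XML; a publishing step can add them back.
    target.tail = text[match.end(1):]


def build_task(heading: str, paras: list[tuple[str, bool]]) -> ET.Element:
    """Build a DITA task from a heading and (text, is_numbered) paragraphs."""
    task = ET.Element("task", id="task1")  # hypothetical id
    ET.SubElement(task, "title").text = heading
    body = ET.SubElement(task, "taskbody")
    before, steps, after = [], [], []
    for text, numbered in paras:
        if numbered:
            steps.append(text)  # Rule: numbered paragraphs become steps
        elif steps:
            after.append(text)  # Simplified rule: unnumbered text after the first step is result
        else:
            before.append(text)  # Rule: text before the steps is context
    if before:
        context = ET.SubElement(body, "context")
        for text in before:
            add_inline(ET.SubElement(context, "p"), text)
    if steps:
        steps_el = ET.SubElement(body, "steps")
        for text in steps:
            add_inline(ET.SubElement(ET.SubElement(steps_el, "step"), "cmd"), text)
    if after:
        result = ET.SubElement(body, "result")
        for text in after:
            add_inline(ET.SubElement(result, "p"), text)
    return task
```

A caller would check `is_task_heading` first and fall back to a concept or generic topic when it returns `False`.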
The final result of the conversion to DITA will look like this:
So, starting with just formatted paragraphs, you can create enriched, valid DITA XML, as you can see in the picture, and reduce, perhaps eliminate, pre- and post-conversion cleanup.
To get a better idea of just how enriched the DITA XML really is here is the file that generated the improved output:
Note that the “→” character isn’t present in the XML, it’s added as part of generating the output.
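The same separation of content and presentation fits in a few lines. The XML stores only the sequence of `uicontrol` elements, and the rendering step supplies the arrow. This is an illustrative stand-in for what a DITA publishing pipeline does, not the code of any particular publishing tool. The separator is an assumption you could change.

```python
import xml.etree.ElementTree as ET


def render_menucascade(xml_text: str, separator: str = " \u2192 ") -> str:
    """Render a <menucascade> as plain text, inserting the arrow between <uicontrol>
    children; the arrow itself never appears in the XML."""
    cascade = ET.fromstring(xml_text)
    return separator.join((ui.text or "").strip() for ui in cascade.findall("uicontrol"))
```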
The next time Migrate encounters a task formatted this way it will know what to do. That makes your conversion project simpler, more accurate and faster as you go along.
LavaCon 2021 | October 24-27, Virtual
We will be exhibiting virtually at LavaCon 2021, the 19th annual content strategy conference, which is scheduled for 24–27 October.
LavaCon started in Hawaii (hence the name) to help organizations generate revenue and reduce costs using state-of-the-art content technologies.
However, LavaCon is more than just a conference. It’s a gathering place where content professionals share best practices and lessons learned, network with peers, and build professional relationships that will last for years to come.
“LavaCon is my favorite content conference—it’s the most cutting edge and best mix of people.”
JB, Director of Content, Juniper Networks
The 2021 Featured Speakers include:
- SHEILA O’HARA, Principal Content Design Manager, Microsoft
- STACEY KING GORDON, UX Content Strategy Manager, Google
- MARA POMETTI, Lead AI Content Strategist, IBM
- KAT PARK, Designer, NASA Jet Propulsion Laboratory, Caltech
- RACHEL ROUMELIOTIS, VP of Content Strategy, O’Reilly Media
- EMMELYN WANG, Global Business Development Leader, Amazon Web Services
- EESHITA GROVER, Director, Marketing, Cisco Systems
- STEFAN GENTZ, Senior Worldwide Evangelist, Adobe TCS
- ERICA JORGENSEN, Senior Content Manager, Microsoft
- NOZ URBINA, Omnichannel Content Designer, Urbina Consulting
- SELENE DE LA CRUZ, Content Design Director, Mastercard
- COLIN BUDD, Global Design Strategist, IBM
- KAREN BROTHERS, Content Management Specialist, 3M
- JENNIFER KEMPER, Director, Content Strategy, AmerisourceBergen
- MARLI MESIBOV, Lead Content Strategist, Verily Life Sciences
- STACEY GORSKI, Director, Medical Excellence & Strategic Projects, Sanofi Pasteur
- DAVID DYLAN THOMAS, Author, Design for Cognitive Bias
- PHYLISE BANNER, Associate Director, Instructional Design, Emeritus
- DR. CLARK SHAH-NELSON, Assistant Dean, Instructional Design and Technology, University of Maryland
- NOEL WURST, Sr. Manager, Communications, SmartBear
Visit the conference website for further information and to register.
Controlling Your Evolution To DITA XML
By Peter Fournier
So, you’re convinced that moving your documentation to DITA XML is a great idea. The move will increase productivity, enable reuse and allow multichannel publishing to many platforms. It will also future-proof your authoring and publishing – no more getting captured by tool vendors! Yeah!
But converting your existing documents to DITA can be a daunting task. Documentation is likely the messiest data ever created. It’s very common to discover that all those MS Word documents that should be moved to DITA are not identical from a data point of view: different documents use different styles, very often headings are actually “Normal” but manually formatted to look like a heading, a numbered list can be a procedure in one place or just a list in another. From a structured authoring point of view, source documents written in WYSIWYG tools are not your friend.
Because of the complexity of source documents, companies moving to DITA usually choose to get some help moving from legacy authoring tools into DITA. There are several routes to choose from:
- Over-the-wall conversion
Send your documents to a company specializing in conversion. They take your documents, convert them, then send them back for QA. The QA usually involves several conversion and refinement cycles before achieving acceptable results.
- In-house conversion with the help of consultants
This can deliver better results faster, if only because it facilitates QA, conversation, discussion and exception handling. Unfortunately, when the consultants leave they take with them the knowledge of exactly how the work was done. You lose precious conversion experience that could have been used in your next project.
- In-house conversion using off-the-shelf tools
This can be an excellent solution since you retain the knowledge of how to convert documents to DITA, which makes the next 1,000 pages of conversion easier. Unfortunately, off-the-shelf tools often require expensive cleanup of the source documents and are not sophisticated enough to do a complete conversion, including advanced DITA features, all in one go.
Taking control with Stilo’s Migrate
Migrate smooths the path from messy WYSIWYG documents to advanced DITA XML. Here’s how.
- Migrate has the same advantages as over-the-wall conversion
You upload your documents to Migrate in the cloud and convert them there with help from Migrate/DITA experts. They will assist you with initial conversions, and you always pay a fixed rate per thousand words per document. Subsequent cycles through the conversion process are free, in-house, and fast – refine and QA as much as you need, for free.
- Migrate has the same advantages as working with consultants
Stilo’s business model is not about charging per hour like traditional conversion consultants. Instead, Stilo’s business model is helping you become your own conversion expert while always being available in the background to help with difficult problems. There is a consulting fee to be paid, but it’s not a long-term contract for a fixed amount – you only pay Stilo for the consulting you need to get past the sticky bits.
- Migrate has the same advantages as using off-the-shelf tools
Migrate has more depth and capability built-in than any other off-the-shelf tool. This minimizes the amount of cleanup required before conversion and enables conversion to DITA with advanced features. Also, Migrate captures the process for converting documents in “rule sets”. Any one rule set can be applied to any number of documents. So, documents from a specific group or year in your company might require different conversion rules than documents from another group or year. That’s fine. Rule sets allow you to permanently capture all of the learning and knowledge about conversion as you go along.
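The rule-set idea maps naturally onto a lookup with fallbacks. Here is a minimal sketch, assuming rule sets are keyed by originating group and year. The `RuleSet` and `RuleSetRegistry` classes are hypothetical and say nothing about how Migrate actually stores or selects rule sets.

```python
from dataclasses import dataclass, field


@dataclass
class RuleSet:
    name: str
    rules: list[str] = field(default_factory=list)  # rule descriptions, for illustration only


@dataclass
class RuleSetRegistry:
    default: RuleSet
    by_group: dict[str, RuleSet] = field(default_factory=dict)
    by_group_year: dict[tuple[str, int], RuleSet] = field(default_factory=dict)

    def for_document(self, group: str, year: int) -> RuleSet:
        """Most specific match wins: (group, year), then group, then the default."""
        if (group, year) in self.by_group_year:
            return self.by_group_year[(group, year)]
        return self.by_group.get(group, self.default)
```

Because every document resolves to exactly one rule set, improving that rule set improves the conversion of every document that maps to it.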
Control achieved!
Stilo’s Migrate is unique in the industry. It is finely tuned, based on extensive experience over 30 years, to satisfy real business objectives. Among them are:
- Proof of concept before commitment
Migrate allows companies to start their migration to DITA with small, bite-sized projects done for minimal cost. If the project is successful, companies can expand to larger document suites without starting a new project or dealing with a new company. Rule sets capture experience in-house, so what is learned in converting 30 pages can be applied and refined in the next 100 pages, the next 1,000 pages, and the next 10,000 pages or more. Migrate scales very well within typical budget processes. This is budget control.
- Project management
Stilo’s Migrate model fits in well with standard software project management. Over the last 1,000 pages converted, did we need more or less consulting from Stilo experts? Over the last 1,000 pages, did we have to create new rules at a higher or lower rate than in the previous 1,000 pages? These measures are very similar to standard software measures such as defects per thousand lines of code. This is project control.
- Source coverage
You have to convert source documents saved in DocBook, HTML, HTML5, DOCX and MIF. Can Stilo handle all these sources? Yes, of course. This is technology control.
- Learning and knowledge capture
Are we climbing the corporate knowledge ramp? Can we see our way to no longer needing hand-holding from Stilo? Of course. That is the Stilo business model. This is corporate business intelligence control.
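The two project-control questions in the list above reduce to rates per 1,000 pages, in the same spirit as defects per thousand lines of code. Here is a minimal sketch, assuming you log pages converted, consulting requests and new rules for each batch. The `Batch` fields are hypothetical. A falling trend across batches is the signal that the team is climbing the knowledge ramp.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    pages: int
    consulting_requests: int  # times the team needed help from Stilo experts
    new_rules: int            # rules added to the rule set during this batch


def per_thousand_pages(count: int, pages: int) -> float:
    """Normalize a count to a rate per 1,000 pages."""
    return 1000 * count / pages


def trend(batches: list[Batch]) -> list[tuple[float, float]]:
    """(consulting requests, new rules) per 1,000 pages for each batch, in order."""
    return [
        (per_thousand_pages(b.consulting_requests, b.pages),
         per_thousand_pages(b.new_rules, b.pages))
        for b in batches
    ]
```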
Control comes in many flavors and at different levels in a company. Stilo’s Migrate helps achieve control at every level.
How to Convert SGML to XML
- Take our self-study “Introduction to OmniMark” training course (materials and labs available upon request at https://www.stilo.com/omnimark-training/) which will teach you to:
  - work effectively with the core capabilities of OmniMark
  - use OmniMark’s powerful pattern-matching capabilities for text processing and converting content into XML/SGML
  - use OmniMark’s markup parser to validate XML/SGML markup
  - use OmniMark to process and enrich XML/SGML content prior to publication
- Write code to create an SGML to XML data conversion program
- Debug your program
- Run your program using OmniMark
- Retrieve the output results from your SGML to XML conversion program
How To Handle Graphics When Transitioning To DITA XML
If using Microsoft Word:
- Extract image & XML data
- Understand where images are stored & how they are named
- Learn to use captions or alt text to create filenames
- Confirm caption markup appears in XML
- Mock cropping effects using properly sized & positioned viewboxes
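The first and third Word bullets can be sketched with the Python standard library alone. A .docx file is a ZIP package whose embedded images normally sit under `word/media/` with generic names such as `image1.png`. That is why captions or alt text make better filenames.

The sketch assumes you have already matched each image to its caption (for example from the document XML), which it does not do. The slug rules and the 60-character limit are arbitrary choices.

```python
import re
import unicodedata
import zipfile


def list_docx_images(docx_path: str) -> list[str]:
    """Return the embedded image parts of a .docx package (a ZIP archive)."""
    with zipfile.ZipFile(docx_path) as package:
        return sorted(name for name in package.namelist() if name.startswith("word/media/"))


def filename_from_caption(caption: str, ext: str, max_len: int = 60) -> str:
    """Turn a caption or alt text into a readable, filesystem-safe image filename."""
    # Drop a leading "Figure 3:" label, which would go stale if figures are renumbered.
    caption = re.sub(r"^(figure|fig\.)\s*\d+[:.]?\s*", "", caption.strip(), flags=re.IGNORECASE)
    # Fold accented letters to ASCII, then keep only lowercase letters, digits and hyphens.
    ascii_text = unicodedata.normalize("NFKD", caption).encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    return f"{slug[:max_len].rstrip('-') or 'image'}.{ext}"
```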
If using FrameMaker:
- Convert all graphics to SVGs
- Determine if the SVG will have linked or embedded images
- If embedding images, convert the image to a supported format
- Recreate images with callouts by parsing the .mif file for graphical information
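The embed-or-link decision in the second and third FrameMaker bullets can be illustrated with a raster image wrapped in an SVG `<image>` element. This minimal sketch uses only the Python standard library. Embedding inlines the bytes as a base64 `data:` URI, while linking leaves a path that must travel with the SVG.

The sketch makes a few assumptions:
- PNG and JPEG are the formats your output pipeline supports.
- It uses `xlink:href` for compatibility with older SVG renderers; SVG 2 also accepts a plain `href`.
- It does not attempt the MIF parsing needed for callouts.

```python
import base64
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("", SVG_NS)
ET.register_namespace("xlink", XLINK_NS)

# Raster formats assumed to be safe to embed; anything else must be converted first.
MIME_TYPES = {".png": "image/png", ".jpg": "image/jpeg", ".jpeg": "image/jpeg"}


def wrap_raster_in_svg(data: bytes, suffix: str, width: int, height: int,
                       embed: bool = True, href: str = "") -> str:
    """Wrap a raster image in an SVG, either embedded as a data URI or linked by href."""
    svg = ET.Element(f"{{{SVG_NS}}}svg", width=str(width), height=str(height),
                     viewBox=f"0 0 {width} {height}")
    if embed:
        mime = MIME_TYPES.get(suffix.lower())
        if mime is None:
            raise ValueError(f"convert {suffix} to PNG or JPEG before embedding")
        ref = f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"
    else:
        ref = href  # linked image: the referenced file must ship alongside the SVG
    image = ET.SubElement(svg, f"{{{SVG_NS}}}image", width=str(width), height=str(height))
    image.set(f"{{{XLINK_NS}}}href", ref)
    return ET.tostring(svg, encoding="unicode")
```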
In many cases, moving to DITA can be an overwhelming task, especially if your content is image-heavy. So learning how to properly handle images will not only save you time and energy, but also provide reassurance that your images are correctly transitioned. In this talk, we will look at many scenarios that you may encounter when moving your images over to DITA.
Free Conversion Offer
Upload your sample document (20-30 pages) and we will convert it to DITA free of charge!
We review the conversion results with you, and let you retain the output for your own testing purposes.
Press Release - Migrate 4.0 Introduces Exact-Match DITA Deduplication
Stilo International Announces Official Release of Migrate 4.0 and Introduces Exact-Match DITA Deduplication
OTTAWA, Ontario, March 8, 2021 – In an effort to maximize DITA reuse potential, Stilo (https://www.stilo.com) adds exact-match deduplication to the latest release of its well-established cloud service.
Migrating documentation to DITA has many benefits, chief among them being the opportunity to reduce cost and improve quality by reusing content. A documentation set might have reuse potential as high as 50%; however, finding good reuse candidates can be difficult, especially when dealing with thousands of topics. Hence the need for automation.
The first step in the deduplication process is for the user to take the newly created DITA files and put them into a collection. At this point, the automated system compares each topic against the others in that collection. Once redundancies are identified, the system selects the canonical version, automatically eliminates redundant copies, and then updates topics and associated maps.
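The three stages described above (compare, pick a canonical version, repoint references) can be sketched with the Python standard library. This is not Migrate's implementation. It makes these assumptions:
- Topics are held as a hypothetical path-to-XML dictionary.
- Two topics are exact duplicates when they match after dropping the root `id`, collapsing whitespace and sorting attributes. This is a rougher normalization than a production tool would use.
- The alphabetically first path is the canonical copy, which is an arbitrary rule.
- Only `href` values in maps are rewritten, and only when they match the dictionary keys.

It does not delete files or update `conref` and `xref` links inside topics.

```python
import hashlib
import re
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Optional


def _squash(text: Optional[str]) -> Optional[str]:
    """Collapse whitespace runs; whitespace-only text (indentation) becomes None."""
    if text is None:
        return None
    squashed = re.sub(r"\s+", " ", text)
    return None if squashed == " " else squashed


def topic_fingerprint(topic_xml: str) -> str:
    """SHA-256 of a normalized topic: root id dropped, whitespace and attribute order normalized."""
    root = ET.fromstring(topic_xml)
    root.attrib.pop("id", None)  # copies of a topic usually differ only by id
    for elem in root.iter():
        elem.text, elem.tail = _squash(elem.text), _squash(elem.tail)
        items = sorted(elem.attrib.items())
        elem.attrib.clear()
        elem.attrib.update(items)
    return hashlib.sha256(ET.tostring(root, encoding="unicode").encode("utf-8")).hexdigest()


def deduplicate(topics: dict[str, str]) -> dict[str, str]:
    """Map each redundant topic path to the canonical path of its duplicate group."""
    groups = defaultdict(list)
    for path, xml_text in topics.items():
        groups[topic_fingerprint(xml_text)].append(path)
    redirects = {}
    for paths in groups.values():
        canonical, *redundant = sorted(paths)  # assumption: first path alphabetically wins
        for path in redundant:
            redirects[path] = canonical
    return redirects


def update_map(map_xml: str, redirects: dict[str, str]) -> str:
    """Repoint every href in a map from a redundant topic to its canonical copy."""
    root = ET.fromstring(map_xml)
    for elem in root.iter():
        href = elem.get("href")
        if href in redirects:
            elem.set("href", redirects[href])
    return ET.tostring(root, encoding="unicode")
```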
“Up until today, our Migrate conversion platform has focused on converting legacy content, such as HTML, Word, and FrameMaker, to DITA XML,” says Bryan Tipper, CEO of Stilo. “With this new deduplication functionality, Migrate can now identify and eliminate redundant topics as part of that conversion process, and utilize the reuse mechanisms that were intended by the DITA architecture.”
About Stilo International
Stilo develops tools to help organizations automate the conversion of content to XML and build XML content processing components integral to enterprise-level publishing solutions. Operating from offices in the UK and Canada, Stilo supports commercial publishers, technology companies and government agencies around the world in their pursuit of structured content. For more information, visit https://www.stilo.com/about.
Media Contact
Bryan Tipper
+1 (613) 745 4242
[email protected]
Related Links
https://www.stilo.com/2020/09/29/automatic-exact-topic-deduplication/
https://www.stilo.com/migrate-dita/
https://cdn.stilo.com/wp-content/uploads/2021/03/Migrate-4-Exact-Match-Deduplication.jpg
SOURCE: Stilo International
2020 Accounts Now Available
We are pleased to announce that the 2020 Accounts are now available. The password has been sent via email to all shareholders who have registered with us on our website here.
If you are an existing shareholder and would like to otherwise request a copy of the accounts, or have follow-up questions that you would like to raise, then please contact us at [email protected].
IXIASOFT Announces the Acquisition of AuthorBridge from Stilo International
18 January 2021
IXIASOFT announces today that it has acquired AuthorBridge from Stilo International as per the agreed press release:
IXIASOFT addresses a growing market of non-DITA authors, while reinforcing its leading position in the global CCMS market
IXIASOFT, a leading DITA CCMS software company based in Montreal Canada, announces today that it has acquired AuthorBridge from Stilo International, a UK-based provider of software tools helping organizations automate the conversion of content to XML.
Developed in collaboration with IBM, AuthorBridge is a DITA-based web editing tool providing SMEs with a guided and fluid authoring environment. AuthorBridge is specifically designed for users with no knowledge of DITA or XML. This tool has helped organizations to efficiently implement authoring for professionals in marketing, engineering, training, and support.
Increased time-to-market pressures have led organizations to rely on various internal resources to produce high-quality content, and this trend has increased the number of non-DITA authors in the CCMS market. The addition of AuthorBridge to IXIASOFT’s product suite will allow it to offer supplementary solutions to better address this new market segment, while strengthening its global position in the CCMS marketplace.
IXIASOFT will continue to offer advanced editing capabilities for DITA experts through its current product integration with the leading XML editor, Oxygen.
“We are pleased to add AuthorBridge to our IXIASOFT product line. This is a great opportunity for us to grow our product offerings and further address a segment of non-DITA experts that need to contribute their knowledge quickly and easily,” says Eric Bergeron, CEO at IXIASOFT. “And with our CCMS moving toward a web-based application to offer authors an enhanced user experience, this acquisition is aligned with our overall vision to provide comprehensive and user-friendly CCMS products to the techcomm industry.”
“AuthorBridge was developed to offer an intuitive authoring experience for subject matter experts with little to no knowledge of XML,” says Bryan Tipper, CEO at Stilo. “We are incredibly proud of its market acceptance, but have realized it would be best leveraged if offered through a complete CCMS solution. We are very pleased that IXIASOFT has decided to continue with its product development, and look forward to its future success.”
About IXIASOFT
Founded in 1998, IXIASOFT is a trusted global leader in the XML content management software industry. Its signature product IXIASOFT CCMS is an award-winning, end-to-end component content management solution (CCMS) that has been deployed by industry leaders like Mastercard, Ericsson, Komatsu, Omron, Qualcomm, and SAP®. For more information, visit: http://www.ixiasoft.com.
About Stilo International
Stilo develops tools to help organizations automate the conversion of content to XML and build XML content processing components integral to enterprise-level publishing solutions. Operating from offices in the UK and Canada, we support commercial publishers, technology companies and government agencies around the world in their pursuit of structured content. For more information, visit https://www.stilo.com/about.
Note to Stilo shareholders
The transaction was executed December 22nd 2020 and a public announcement date of January 18th 2021 was agreed with IXIASOFT. The financial consideration received will be included in our 2020 accounts.
ENQUIRIES
Stilo International Limited
If you are an existing shareholder and have follow-up questions that you would like to raise, then please contact us at [email protected].