If you’re new to DITA conversion projects or you’re planning to convert content soon, give some thought to all the pieces that go into a successful conversion. A well-thought-out content strategy will guide you through the conversion process, but if you haven’t considered all the minutiae, this article will help fill in the blanks.

Review the conversion

Review the converted content and check the following:

  • Did you capture all the content you wanted, including conditional content, variables, and document details (subtitles, document dates and identifiers, version numbers, etc.)?
  • The content was chunked correctly for your needs. Were sections used where you wanted topics or vice versa?
  • Did tables convert correctly? Is all information there, including items like vertical text (as attributes)?
  • Are there any validity problems? Validate the files, then publish to XHTML with the DITA Open Toolkit (for example, from an XML editor like oXygen) to confirm that you can publish without errors.
  • Are the file names the ones you want to use going forward? Is the structure the one you want to keep? If you make changes, make sure you modify all references as well or you’ll have broken paths. If you are using a quality CCMS, consider making these changes after you have uploaded your files (no path changes necessary).
  • Did you leave the chaff behind? If you still have any content that is redundant, useless, or outdated, remove it now. (Tip: Ideally, you should do the bulk of this work before you convert.)

Don’t assume anything. Having eyes on your converted content (all of it) is going to save you endless frustration down the road. Even the best conversion doesn’t always result in exactly what you need or want.
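
Part of the broken-path check described above can be automated. The following Python sketch walks a folder of converted files and reports every local href or conref whose target file is missing. The function name find_broken_refs and the list of file extensions are assumptions made for illustration. The sketch also assumes that your files parse without their DTDs (for example, that they contain no undeclared named entities), which may not hold for your content.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse
import xml.etree.ElementTree as ET

# Assumed extensions for converted DITA content; adjust to your file set.
DITA_SUFFIXES = {".dita", ".ditamap", ".xml"}


def find_broken_refs(root_dir):
    """Return (file, reference) pairs whose local target file does not exist."""
    root = Path(root_dir)
    broken = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in DITA_SUFFIXES:
            continue
        for elem in ET.parse(path).iter():
            for attr in ("href", "conref"):
                value = elem.get(attr)
                if not value:
                    continue
                # Skip web links and anything explicitly marked external.
                if elem.get("scope") == "external" or urlparse(value).scheme:
                    continue
                # Drop the "#topicid/elementid" fragment to get the file part.
                target = unquote(value.split("#", 1)[0])
                if not target:
                    continue  # same-file reference such as "#topic/step1"
                if not (path.parent / target).exists():
                    broken.append((path.relative_to(root).as_posix(), value))
    return broken
```

Calling find_broken_refs on the folder that holds your map returns an empty list when every local reference resolves. A script like this only supplements the validation and test publishing described above; it does not check that the IDs inside a target file exist.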

Gather metrics

If you haven’t already, make sure you have “before” metrics from your legacy tools and processes: how long it took to author new content, update content with changes, fix bugs, review content (peer review, tech editor review, SME review, QA review), translate content, and publish content. Remember that your legacy metrics will likely be based on books or chapters and that your new metrics will be based on topics (and maps for publishing). This means you need an idea of how many “topics” would have made up a chapter or book in legacy content so that you aren’t stuck comparing apples to oranges. Start tracking the steps below as well so that you know, for example, how long it takes to implement your re-use strategy.
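
To make the apples-to-apples comparison concrete, here is a small Python sketch of the arithmetic. The function names and all of the numbers are illustrative assumptions, not measurements from a real project.

```python
def hours_per_topic(hours_per_unit, topics_per_unit):
    """Convert a legacy per-book or per-chapter figure into a per-topic figure."""
    if topics_per_unit <= 0:
        raise ValueError("topics_per_unit must be positive")
    return hours_per_unit / topics_per_unit


def percent_change(before, after):
    """Percentage change from the legacy figure; negative means less time."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before * 100


# Illustrative numbers only: a legacy chapter that took 40 hours to author
# and would map to roughly 16 topics.
legacy = hours_per_topic(40, 16)      # 2.5 hours per topic
# Compare against a hypothetical 2.0 hours per topic measured after conversion.
change = percent_change(legacy, 2.0)
print(f"{legacy:.1f} h/topic before, {change:+.0f}% change")
```

The same two functions work for review, translation, and publishing times, as long as the “before” figure is first divided by the number of topics the legacy unit would have contained.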

Identify and fix content that needs work

Inevitably, the content you convert will need some work once you get it back. Although you can and should do the bulk of your re-writing before you convert, once you see content in topics, you might realize that you have some key areas to clean up. Concentrate on the big ones, including:

  • Ensure you have the right topic type for the content. If your content is a procedure, then it should be in a task topic. If it is meant for quick look up (tables, lists, alphabetized or organized), then it should be a reference topic. If the content explains what something is, how it works, or why the user should care about it, then it’s a concept topic. Those are broad distinctions, but make sure the content fits the topic type.
  • Do your tasks focus on user goals and is your content minimal? If your tasks cover functionality (using widget A, customizing widget B, etc.), then you likely suffer from badly focused content. One of the biggest benefits of DITA is topic-based writing and minimalism, both of which push you to write content that users actually need to get their daily work done. Take the time now to identify, at a minimum, what needs work and, ideally, to re-write the best candidates for improvement. It’s important to do this part before you work on your re-use strategy because it will give you all sorts of ideas on how you can re-use content.

Load into a CCMS (optional but usually recommended)

Even with only one writer and no translations, if you have a decent amount of re-use and any workflows to adhere to, a CCMS (Component Content Management System) can pay for itself in less than three years through increased efficiency in a thousand different ways. The payback period is much shorter if you have multiple authors, standard amounts of re-use, and translate into one or more languages.

Load your content into your CCMS before you do deeper work on it, because once it is in there, it’s easier to find and update content, apply re-use strategies like keys, conditions, and conrefs, and generally work with your content. It also gives you the opportunity to get to know the ins and outs of the CCMS. If you haven’t yet purchased a CCMS but are shopping around for one, you can use your newly converted content as part of a demo or trial. Simply ask the vendor to include your content.

Although most CCMSs are intelligent enough for you to point to a ditamap and grab all associated files, there are always some files that are not referenced in your map but need to be uploaded, managed, and versioned, including but not limited to:

  • Source files for graphics (Visio, Photoshop, SnagIt, etc.)
  • Legacy materials in PDF, HTML, etc. if you want to keep copies of them
  • Videos (including source files and supporting files)
  • Graphics that may not be used directly but that you want to keep because they are valuable assets and may be used in the future (not necessarily in the documentation)
  • Engineering documents
  • Strategy documents
  • Anything else that needs to be accessed by multiple authors, versioned, and/or never lost

CCMSs store these as BLOBs (Binary Large Objects), so add the appropriate metadata to these files so that authors, editors, and managers can find and filter them.

Apply re-use strategy

Your re-use strategy could be something simple like inserting keys and keyrefs to automatically pull in glossary terms or something more complex like using conkeyrefs to pull in conditional elements as needed. At a minimum, you’ll probably want to use conrefs for frequently repeated content, like software/hardware states, warnings or cautions, content that must be standardized across most or all documents, menu options, definitions, and anything that you’d like to update in one place and use wherever needed.

Your re-use strategy should be planned out in advance as part of your larger content strategy, but once you receive content back from conversion, you need to actually implement it. The more re-use you can get out of your content, the better it is in the long term because it means huge savings in updating content, reviewing content, and translating content, not to mention confidence that you will never have contradictory information. Keep in mind, though, that extensive re-use takes time to implement. It’s a huge cost-saving step that, yes, costs some initial effort but will bring forward your return on investment by months, if not years. So put in the effort to apply your re-use strategy before authors start working on content.
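
One way to find candidates for conrefs is to look for text that is repeated word for word across topics. The Python sketch below is a rough starting point under stated assumptions: the function name conref_candidates, the choice of elements to compare (note, p, li), and the .dita extension are all illustrative, and the files are assumed to parse without their DTDs.

```python
from collections import defaultdict
from pathlib import Path
import xml.etree.ElementTree as ET


def normalized_text(elem):
    """Collapse all text inside an element into one whitespace-normalized string."""
    return " ".join("".join(elem.itertext()).split())


def conref_candidates(root_dir, tags=("note", "p", "li"), min_files=2):
    """Map (tag, text) to the sorted list of files that contain that exact text."""
    root = Path(root_dir)
    seen = defaultdict(set)
    for path in sorted(root.rglob("*.dita")):
        for elem in ET.parse(path).iter():
            if elem.tag in tags:
                text = normalized_text(elem)
                if text:
                    seen[(elem.tag, text)].add(path.relative_to(root).as_posix())
    # Keep only text that appears in at least min_files different files.
    return {
        key: sorted(files)
        for key, files in seen.items()
        if len(files) >= min_files
    }
```

Exact matching misses near-duplicates such as the same warning with slightly different wording, which are often the most valuable content to standardize, so treat the result as a list of leads rather than a complete inventory.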

Apply metadata strategy

Metadata is like putting strings on your XML elements so you can make them dance. Like your re-use strategy, your metadata strategy is primarily planned out as part of your content strategy. Some metadata will be applied automatically during conversion (like conditional attributes) if it existed in the content submitted for conversion, but you’ll also need to introduce some new metadata. Metadata lets you introduce and manage:

  • Conditional content to provide custom outputs to users on specific platforms, with specific expertise, or with certain combinations of products.
  • Publishing controls like vertical text for table headings or customizing the content for mobile outputs.
  • SEO keywords to improve findability of topics by search engines.
  • Topic information like author, product, embedded help ID, version, topic status (like draft, in review, final, published, archived), content contributors, etc. If you need to track it, then use metadata.
  • Taxonomy so users can browse by subject.
  • Categories/keywords for media, including graphics and videos (this helps make them findable by authors too).
  • Pretty much anything else you can imagine that is in addition to rather than strictly inside the readable content.
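
Because some conditional metadata arrives with the conversion, it can help to take stock of what is already there before you add more. The Python sketch below counts the values used in the common DITA filtering attributes (audience, platform, product, props, otherprops). The function name inventory_conditions and the .dita extension are assumptions for illustration, and the files are assumed to parse without their DTDs.

```python
from collections import Counter
from pathlib import Path
import xml.etree.ElementTree as ET

# Common DITA filtering attributes; extend this tuple if you use others.
CONDITION_ATTRS = ("audience", "platform", "product", "props", "otherprops")


def inventory_conditions(root_dir):
    """Count how often each (attribute, value) pair appears across all topics."""
    counts = Counter()
    for path in sorted(Path(root_dir).rglob("*.dita")):
        for elem in ET.parse(path).iter():
            for attr in CONDITION_ATTRS:
                # Filtering attributes can hold several space-separated values.
                for value in elem.get(attr, "").split():
                    counts[(attr, value)] += 1
    return counts
```

Values that appear only once or twice are worth a second look, since they are often typos or leftovers from the legacy source rather than conditions you intend to keep.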

Work on publishing

A depressing number of people converting for the first time are dismayed and shocked to find out that the publishing aspect of a DITA project is a completely separate and additional effort over and above the conversion and (usually) the use of a CCMS. Remember that separating content from formatting (as with any conversion to XML) means that you then must put some initial effort into creating the look and feel that you want in each desired output. It’s a big one-time effort (with occasional updates thereafter) that new DITA-ites understandably often overlook. Whether you publish using the DITA Open Toolkit, a publishing tool/engine like Adobe’s Publishing Server, a third-party website like MindTouch or Fluid Topics, a CCMS add-on, or a homegrown solution, the publishing aspect of a DITA conversion project takes planning, effort, and testing. As with most projects, the more planning and testing you do, the less work it will be. There are many aspects to publishing, but at a high level, it can be split into two distinct parts:

  • Converting XML to the output formats you need (HTML, ePub, PDF, etc.)
  • Delivering the content to the users (how will users access and experience the content?)

The second bullet can involve a lot of work, including building a website, designing a search algorithm, designing the interface, introducing ratings and commenting systems, integrating with Support or Knowledge Bases, implementing user access, allowing users to submit their own content, designing support for adaptive content, adding accessibility features, and much more. This is the single most forgotten aspect of the content lifecycle and, arguably, one of the most important. Throwing a PDF or tri-pane help at your users no longer meets user demands, so we need to step up and design the entire content experience.
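
The first part, converting XML to output formats, is the easier one to script. As a minimal sketch, assuming you publish with the DITA Open Toolkit and that its dita command is on your PATH, the following Python wrapper runs one build per output format. The function names, the out folder, and the map name in the usage note are hypothetical.

```python
import subprocess


def build_dita_command(input_map, output_format, output_dir, dita_cmd="dita"):
    """Build the DITA-OT command line for one map and one output format."""
    return [
        dita_cmd,
        f"--input={input_map}",
        f"--format={output_format}",
        f"--output={output_dir}",
    ]


def publish(input_map, formats, out_root="out"):
    """Run one DITA-OT build per format; raises CalledProcessError on failure."""
    for fmt in formats:
        cmd = build_dita_command(input_map, fmt, f"{out_root}/{fmt}")
        # check=True stops at the first failing build instead of continuing.
        subprocess.run(cmd, check=True)
```

For example, publish("guide.ditamap", ["html5", "pdf"]) would build both outputs into separate folders. This covers only the mechanics of running a build; the look and feel of each output still has to be designed and tested separately.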


Although conversion feels like the end of a DITA project, there’s a lot of work to be done once you get your XML. Your content strategy will be your guide throughout this process and can help you avoid making costly errors or forgetting to plan for important parts of the project.