WEBINAR | Migration to DITA – The Atmel / Microchip Story

Morten Haaker, Staff Business System Analyst | Microchip Technology
Introduced by Les Burnham, CEO | Stilo International
Broadcast Tuesday, September 26, 2017

Are you thinking about moving to DITA? In addition to learning DITA, tool and vendor selection, creation of stylesheets, change management and other challenges, you may be faced with having to convert thousands of pages of detailed technical information from an unstructured format to DITA.

This was the task Atmel’s Morten Haaker faced back in 2013 when the company embarked upon its DITA migration project. Morten and his team successfully converted hundreds of Atmel’s complex technical documents (semiconductor-specific information and datasheets of more than 2000 pages) from unstructured FrameMaker to the DITA standard using Stilo’s Migrate cloud XML content conversion service.

But the conversion project didn’t end there. In April 2016, Atmel was acquired by Microchip Technology, a leading provider of microcontroller, mixed-signal, analog and flash-IP solutions, and Morten was further tasked with migrating Microchip’s FrameMaker documents to DITA.

Join us, and Morten, for this Stilo DITA Knowledge Series webinar to learn how they approached the issue of content conversion as an integral part of their DITA implementation and how they managed the conversion of Microchip’s technical documentation following the acquisition.

» View recording (registration required)


Presenter bio | Morten is a Staff Business System Analyst at Microchip Technology responsible for the technical documentation project. Located in Norway, Morten has been with Atmel/Microchip for more than 15 years. He holds a Master of Computer Science from the Norwegian University of Science and Technology and heads up the Nordic SDL Knowledge Center User Group.

WEBINAR | Migrate cloud XML content conversion service DEMO

Broadcast July 12, 2017

Join Stilo’s Conversion Services Manager, Helen St. Denis, for a demo of the Migrate cloud XML content conversion service. Migrate is a unique cloud service that enables technical authoring teams to automate the conversion of content from various source formats including XML/SGML/HTML, FrameMaker, Word, Author-it, InDesign, RoboHelp and DocBook to DITA and custom XML. It provides greater control over conversion quality, immediate turnaround times and operates on a low-cost, pay-as-you-use basis.

See why Migrate is the conversion service of choice for organizations including Altera, Cisco, Dell, EMC, Extreme Networks, IBM, Qualcomm, Teradata, Varian Medical Systems, Webtrends and many more.


ARTICLE | DITA and minimalism

Minimalism from a technical writing and training perspective was first investigated and proposed in the late 1970s by John Carroll and colleagues at the IBM Watson Research Center. It has since evolved and been extended by a variety of stakeholders.
The link between DITA and minimalism (the IBM connection notwithstanding) is not exactly carved in stone but the two complement each other like macaroni and cheese. The macaroni (DITA) provides the infrastructure and model that you need to support the cheese sauce (the minimalist content).

JoAnn Hackos’ Four Principles of Minimalism are a useful starting point. They are:

  • Principle One: Focus on an action-oriented approach
  • Principle Two: Ensure you understand the users’ world
  • Principle Three: Recognize the importance of troubleshooting information
  • Principle Four: Ensure that users can find the information they need

However, I would change things up a bit and stress some different points. Minimalism, when applied to technical writing, should result in content that is:

  • Based on knowledge of the users
  • Usable
  • Minimal
  • Appropriate
  • Findable

Based on Knowledge of the Users

Understanding your users is the underlying requisite for applying all other facets of minimalism. Without knowing how they are using the product, how they are accessing the content, and what their daily goals are with your product (and a thousand other factors), you aren’t going to be able to correctly apply the other facets of minimalism.


Usable

Write task-oriented topics that are focused on business goals rather than product functionality.

This means you need to know your users well enough to understand those goals, including their professional and educational backgrounds, the other tools they have at their disposal, and a host of other information. User personas can be a powerful tool here.

The action-oriented approach is important but, more specifically, you should be writing procedural information (tasks). One absolutely vital piece of any task is a detailed, goal-based context for the task. Done well, this context is an essential component of the learning process. The context is the “why” that helps users situate themselves, so it must be written for and about their goals when using your product. They use that context to take their understanding of your product to the next level. The task’s focus should always be on the user. This focus is often neglected, usually through either ignorance or time constraints, but the steps of a particular task are almost immaterial when compared to the context.


Minimal

Powerful, usable content is clear and simple. In this sense, minimalism means removing words that add no value; words or phrases that are long, ambiguous, convoluted; and content that is simply not required.

Short, simple sentences are easier to read and provide a basis for better translation. Topics that have only essential information are more easily parsed and will be read with more attention. If you limit yourself to essential words, then every word will be valuable to end users.

This facet of minimalism can often be done as part of an editing pass, either by you, a peer reviewer, or an editor, or better yet, all three. Remember that writing fewer, clearer words takes more work, not less.


Appropriate

The careful selection and placement of every word you write should always be on your mind, from the planning stages through to the editing stages of your content. For content to be appropriate, it has to be the right information in the right place, support error recovery, and be formatted correctly.

  • Provide the right information in the right place to support users in their goals.

A pre-requisite to a task that is placed between steps four and five is a good example of content being in an inappropriate location. Always move a pre-requisite up before the context (in its valid DITA location) for consistency and because no one wants to get part-way through a task only to realize that they should have done something important before even beginning. Similarly, if you need to prevent a common error from occurring between one step and another, then put the warning right there instead of in a separate location or topic. Best practices, mentioned in the right place, can save your users the hassle of having to troubleshoot later on.
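As a minimal sketch of this placement, the following task puts the pre-requisite in a <prereq> element ahead of the <context> and attaches a caution to the step where the error would occur. The topic ID, the product behavior, and all of the wording are hypothetical and only illustrate the structure.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd">
<!-- Hypothetical example: the ID, product, and wording are invented for illustration. -->
<task id="restore_backup">
  <title>Restoring a backup</title>
  <shortdesc>Restore a saved backup to recover your project data after a failed update.</shortdesc>
  <taskbody>
    <!-- The pre-requisite comes first, before the context and the steps. -->
    <prereq>You must be logged in with administrative privileges.</prereq>
    <!-- The context gives the goal-based "why" for the task. -->
    <context>Restoring replaces the current project data with the data saved in the backup file.</context>
    <steps>
      <step>
        <cmd>Open the <uicontrol>Backups</uicontrol> page.</cmd>
      </step>
      <step>
        <!-- The caution sits on the step where the error would otherwise happen. -->
        <note type="caution">Close all open projects first. Unsaved changes are lost during a restore.</note>
        <cmd>Select a backup file and click <uicontrol>Restore</uicontrol>.</cmd>
        <stepresult>A confirmation message lists the restored projects.</stepresult>
      </step>
    </steps>
  </taskbody>
</task>
```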

  • Write detailed error recovery using DITA 1.3’s new troubleshooting topic and elements.

Many users will turn to the documentation only when they have run into a problem and need to look for a solution, so make sure you write exactly what they are looking for. Troubleshooting, if it is concise and can’t be separated out from the context of its related task, should be kept inline using any of the three new troubleshooting-related elements: task troubleshooting (<tasktroubleshooting>), step troubleshooting (<steptroubleshooting>), or the troubleshooting note type (<note type="trouble">). When there’s too much information or the troubleshooting content really should stand on its own (and be searched for as a discrete object), it should be written using the new troubleshooting topic. For any troubleshooting information (whether it’s a topic or inline), your goal is to provide error recovery by identifying a condition, a cause, and a remedy. The new troubleshooting topic also allows you to provide multiple cause and remedy pairs for a single condition to cover more complex cases.
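The condition, cause, and remedy structure might look something like the following sketch of a DITA 1.3 troubleshooting topic, with two cause and remedy pairs for one condition. The scenario, the ID, and the wording are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE troubleshooting PUBLIC "-//OASIS//DTD DITA Troubleshooting//EN" "troubleshooting.dtd">
<!-- Hypothetical example: the scenario and wording are invented for illustration. -->
<troubleshooting id="restore_fails">
  <title>Restore fails with a permissions error</title>
  <troublebody>
    <!-- The condition describes what the user sees. -->
    <condition>
      <p>When you click <uicontrol>Restore</uicontrol>, the restore stops and a permissions error is displayed.</p>
    </condition>
    <!-- Each troubleSolution pairs one cause with its remedy. -->
    <troubleSolution>
      <cause>
        <p>You are not logged in with administrative privileges.</p>
      </cause>
      <remedy>
        <steps>
          <step><cmd>Log out, then log in with an administrator account.</cmd></step>
          <step><cmd>Run the restore again.</cmd></step>
        </steps>
      </remedy>
    </troubleSolution>
    <troubleSolution>
      <cause>
        <p>The backup file is stored in a read-only folder.</p>
      </cause>
      <remedy>
        <steps>
          <step><cmd>Copy the backup file to a folder that you can write to.</cmd></step>
          <step><cmd>Select the copied file and run the restore again.</cmd></step>
        </steps>
      </remedy>
    </troubleSolution>
  </troublebody>
</troubleshooting>
```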

  • Provide supporting information in the right format.

Although much of your minimalist content will be in task topics (and thus formatted as ordered lists), your supporting information should use unordered lists, graphics, and tables depending on the type of information being conveyed. When users will need to scan and look up specific details quickly, you’ll use a table. Graphics will be most helpful when trying to convey an underlying or specific structure or flow. Unordered lists are important for listing parallel items that need to be noted.


Findable

All of this targeted, minimalist, formatted, supportive content is going to be completely wasted if it’s not also easy to find.

There are two levels to findability. At the detailed level, it means using short, targeted topics that are the appropriate type (task, concept, reference, troubleshooting, and glossary).

At the macroscopic level, findability means robust search mechanisms including faceted search, role-based content, and filtering. Whether your content strategy calls for building something custom, leveraging something in house, or buying a tool (that can take DITA source content and publish responsive Web-based content with these findability features built right in) is up to your requirements, budget, and timelines.


One of the most valuable changes you can make to your DITA content is applying minimalism to that content, both before and after your DITA conversion. Take the opportunity when learning DITA to learn how to apply minimalism to your content. It will not only improve your content and make your users far happier (leading to customer satisfaction improvements that will positively affect the entire company), it will also allow you to further define and refine your entire content strategy.

Further reading

JoAnn Hackos: Minimalism Updated 2012

OASIS DITA Adoption Feature Article on Troubleshooting in DITA 1.3: DITA 1.3 Feature Article: Using DITA 1.3 Troubleshooting

ARTICLE | Overcoming writer resistance to DITA adoption

I could easily have titled this article “Why your writers are suddenly freaking out,” because that is exactly what happens on teams that adopt DITA—there is always some degree of writer resistance. Even the best writers experience a moment of doubt when contemplating such a complete overhaul of their writing tools and standards. The trick, as someone implementing DITA, is to manage those fears wisely, even preempting them when possible, so that writer resistance becomes writer acceptance.

Successful DITA adoption relies on three pillars of strength: a solid content strategy, a detailed and realistic project plan, and a useful change management plan. That last pillar, the change management plan, is most often forgotten or left out and that’s where you’ll plan your strategy for overcoming inevitable writer resistance.

Why writers resist

Although many writers understand and embrace the changes that DITA introduces into their day-to-day writing, when they’re actually faced with adopting DITA, it can be a scary prospect. There are some key reasons for resistance.

Lack of understanding of DITA best practices

Even if authors understand the basics of DITA, it’s not always clear how to do things the best way. This murky beginning can daunt even the most seasoned writers. In fact, the more senior the writer, the more likely they are to be bothered by this gap in their skillset.

Job loss

Many authors fear that their team will be downsized because DITA promises to be a more efficient way of creating and managing content. Although companies rarely downsize after adopting DITA, it is still a fear that needs to be managed because it can drive a very stubborn resistance; people feel like they are fighting for their livelihoods.

New tools

You are asking writers who have been using FrameMaker, Word, InDesign, and, in some cases, text editors to adopt a completely new writing tool, one that doesn’t necessarily reflect the look and feel of the end product they are creating, which is very confusing at first. Piled on top of that, you are also asking them to learn a few hundred elements, attributes, and probably a CCMS as well. There’s no doubt about it, there is a lot to learn all at once.

New way of writing

The move from chapter-based or narrative writing to topic-based writing is a huge change for most writers. This is actually the change that writers struggle with most in the first 6 months.

Many authors have been doing things the same way for so many years that they’re uncertain if they’ll be able to adapt to a new way of writing, one that focusses more on modular writing and user goals. Often, this change in the way they write exposes a gap in their understanding of the end users and how they’re using the product. Although this gap is usually not their fault, it certainly hinders creating quality DITA content and reflects poorly on the writer.

Forms of resistance

There are generally two kinds of resistance you’ll run into: active and passive. You might see some authors using both forms while others will choose one over the other. The key to a smooth writer transition is to anticipate both forms and deal with them separately.

There’s no shame in being either one of these types of resisters. DITA is a big change and it’s up to management to ensure that the writers have all the knowledge (whether it’s tools, training, or communication) to ensure that change is as easy as possible.

Active resistance

Active resistance includes being vocal about doubting the adoption process, personnel, tools, or management handling of the adoption. Often, these writers will derail meetings or training with side issues—and you’ll suddenly find yourself mired in meetings that never accomplish what you need them to. Everyone’s work starts suffering and tempers run hot.

They might also take their disgruntlement to fellow team members and try to gang up on a manager, raising doubts and concerns that, although real to them, are usually unfounded. If you get a group of writers entering your office one day to “talk to you about this DITA thing,” then you know you’ve neglected to address the concerns of your active resisters.

Passive resistance

Passive resistance includes not using the tools, performing tag abuse (using the wrong tags to get a specific result), not chunking content into topics, and generally not implementing the content strategy that is laid out for them. In some cases, you’ll find that an author has simply never made the transition from the old tools—they are sneakily still using them.

Passive resisters aren’t trying to rock the boat, but they definitely need help to make the change successfully.


A change management strategy is your plan for deciding how you’ll introduce the changes that DITA requires. Part of your change management plan should be specifically targeted towards identifying likely resistance from your team and how you’ll address each issue.

  • Active resisters: These are actually the easiest resisters to deal with because they are so easily identified. The best way to work with active resisters is to get them involved with the planning and content strategy. Active resistance also usually means that your writer lacks training or doesn’t understand that training will eventually cover all their concerns. Getting your active resister to create or review the training plan (what kind of information do we need at which point in the adoption) often circumvents their fears.
  • Passive resisters: Although passive resistance is harder to identify, you can plan for ways to either prevent it or catch it immediately. Feedback combined with adequate and timely training will solve this problem.

Don’t forget to proactively address your writers’ reasons for resistance in your plan. When you manage change, use the following tools to help address their underlying fears and keep your authors moving forward.

Communications plan

This plan should include how and when you’ll communicate the goals and progress of your DITA adoption. Use this plan to specifically address the issue of downsizing or job loss. Most companies don’t even think about downsizing because they are about to invest time and money into upgrading the skills of those same resources. However, authors still need to hear from management that their jobs are as secure as they have ever been.

The message you want to convey (at key points and on a regular basis) is that you are training them to be more valuable tech writers with two major goals: easier content lifecycles (happier writers and reviewers with more time to write usable, focused content) and more usable content (happier end users).

Don’t forget that DITA adoption also means new roles will be available, like content strategist, tools maintenance, and publishing expert, to name just a few. DITA adoption opens doors to all sorts of opportunities. And even if the company is downsizing, your writers will leave after being trained in DITA, making their options for a prospective job that much better.

This is also the time to let them know that their yearly performance reviews will include DITA adoption goals. For example, you can include each document converted to DITA or each new document written in DITA as a point on their performance review. You can even gamify this process and have writers compete for the most DITA-related points, which will convert to real dollars at the end of the year. If you give writers a target to shoot for with their DITA adoption, you’ll be pleased with the results.

Training plan

The right training for the right people at the right time in the adoption can prevent problems for both types of resisters. Authors need several kinds of training to get up to speed quickly but smoothly. Very early on, start with DITA fundamentals and then follow closely with topic-based writing, minimalism, tools training, process training, and DITA best practices.

Your content strategist and your publishing specialist will need training in DITA content strategy, metadata, and XSLT/DITA Open Toolkit (or equivalent, depending on your choices) very early on.

It’s a good idea to plan for ongoing training for the first four years in addition to formal training during the initial adoption phase. This ongoing training can be more informal and internal where authors and content strategists can share tips, tricks, and best practices among team members. You should also plan for key people to attend DITA-related conferences so they can get exposure to a wider world of what is possible with DITA.

Some companies have hired a consultant to bring an internal resource up to speed for the role of content strategist as a sort of ongoing, as-needed training from an expert.

Whatever your decisions for promoting, shifting, or even hiring your resources, an effective training plan will preempt a lot of resistance.

Feedback conduits

Authors need a way to have someone check their tagging and writing for the first 4-6 months to give them essential feedback on how to do things better or which pitfalls to avoid. This is where you’ll catch most of your passive resisters, but to do so, it’s essential that someone track who is and who is not getting their content checked.

Also provide a way for authors to ask questions of someone who knows the right answers. If you can do this in a way where they don’t need to risk looking or feeling dumb to their manager or peers, then it will be more of a success. A wiki page or an anonymous Q&A drop box works well, with the answers coming from someone with DITA experience (usually a content strategist, either full time or consulting).


Overall, writer resistance is a normal part of a DITA adoption project and can (and should) be planned for just like every other aspect of the project. Identify the root causes for your team and plan for how and when you’ll address those concerns for both active and passive resisters.

ARTICLE | DITA reuse and conversion together

When you are considering converting content from Word or unstructured FrameMaker (or other unstructured formats) into DITA, one of the things you want to consider before you start converting is your reuse strategy.

Why reuse and conversion together?

Your reuse strategy can be partially implemented as part of the conversion process, which means that you can automate some of the work. The highly automated nature of conversion is the perfect opportunity to sneak in some reuse automation at the same time. If you know what your reuse goals are, you can save a lot of manual effort by using the conversion process to automatically and programmatically add some reuse mechanisms to your content.

At its core, a DITA conversion is the process of mapping your content and formatting to DITA elements and attributes. If you opt to ignore certain formats or objects, your conversion process essentially “flattens” your content and you lose a great opportunity to automate.

For example, if you neglect to map variables to a specific element in DITA, then those variables are converted as plain text and you’ve lost your opportunity to apply a DITA element to them programmatically (and quickly).

In short, with a little planning and setup, combining reuse and conversion can save you lots of time.

Reuse strategy

A reuse strategy defines what kind of content you’ll reuse and how you’ll reuse it, including the DITA mechanisms you’ll need for each kind of reuse.

Generally speaking, you should be looking at reusing two types of content:

  1. Content that stays the same: Wherever it is, it needs to be standardized.
  2. Content that changes: Content has variations because of the
    • context of the person reading it (or someone who is re-branding and publishing it)
    • product version, suite or product combinations, or
    • changing nature of the product over time (like product or component names that may evolve).

Your reuse strategy is going to be your target. Without this strategy you’re really not going to be able to definitively know what it is you need to do during conversion or how to best do it. Planning and testing the reuse strategy before conversion is the key to being able to automate its application as part of conversion.

Reuse strategy for conversion

Although you should always develop a content reuse strategy when moving to DITA, when combining reuse and conversion, you need to add an extra layer to your reuse strategy that includes two major areas:

  1. Identify your existing content reuse (text insets, conditions, and variables) and decide how you’ll leverage it during conversion.
    a) What will each one map to?
    b) What is your desired end result?
  2. Plan for new content reuse that can be applied during conversion.
    a) What is your desired end result? What reuse mechanisms do you want to use?
    b) What and how can you automate?
    c) What do you need to change to enable automation during conversion? (You may need to apply formatting, for example, to automate something you really need.)

Content reuse in DITA

Everyone’s requirements are unique, but in general you should consider some common reuse strategies in DITA to get you started.

DITA allows you to use a variety of methods to reuse content and you’ll want to consider them all. When you’re getting started with reuse, you usually consider three main mechanisms:

  • Conref: A content reference (conref) is equivalent to a text inset in FrameMaker: a chunk of content (less than a topic) is pulled in from another location. (A push mechanism is also available, but less frequently used.)
  • Profiling: Profiling is equivalent to FrameMaker conditions, where content can be shown or hidden based on attribute values on elements.
  • Topic reuse: Topic-level reuse is simply pulling your topic into a map wherever it’s needed. Once in DITA, content is modular enough to be in short, reusable chunks. You don’t necessarily need to plan for this reuse during conversion, but it may allow you to NOT convert some content.
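To make the profiling mechanism a little more concrete, here is a minimal sketch of a concept topic with two paragraphs profiled by the product attribute. The product values and the wording are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<!-- Hypothetical example: the product values "apple" and "banana" are invented for illustration. -->
<concept id="licensing_overview">
  <title>Licensing overview</title>
  <conbody>
    <!-- Unprofiled content appears in every deliverable. -->
    <p>Every installation requires a valid license key.</p>
    <!-- Profiled content is shown or hidden based on the attribute value. -->
    <p product="apple">Apple licenses are issued per user.</p>
    <p product="banana">Banana licenses are issued per server.</p>
  </conbody>
</concept>
```

At publishing time, a DITAVAL filter file decides which attribute values are included or excluded, much like showing or hiding a condition in FrameMaker.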

Warehouse topics for conrefs

These are topics that hold fragments of content and are never meant to be published as topics. You might create a warehouse topic for each of the following:

  • GUI objects, fields, buttons, icons
  • Frequently used steps, with step results and info
  • All your notes and warnings
  • Pre-requisites that are commonly mentioned, like having administrative privileges

You then use these warehouse topics as the source for conref mechanisms. Just like with text insets, warehouse topics let you write content once, and use wherever you need it. That means, when it changes, you update it in one spot. You translate it once too.
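A warehouse topic for frequently used steps might look like the following sketch. It is a task topic that is never published on its own; each step simply carries an ID so that it can be referenced. The file name (wh_steps.dita), the IDs, and the wording are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd">
<!-- Hypothetical warehouse topic, assumed to be saved as wh_steps.dita. Not meant to be published. -->
<task id="wh_steps">
  <title>Warehouse: frequently used steps</title>
  <taskbody>
    <steps>
      <!-- Each reusable step has a unique ID that a conref can point to. -->
      <step id="login_admin">
        <cmd>Select <uicontrol>Log in</uicontrol> and enter your administrative credentials.</cmd>
        <stepresult>The administration dashboard opens.</stepresult>
      </step>
      <step id="save_changes">
        <cmd>Click <uicontrol>Save</uicontrol>.</cmd>
      </step>
    </steps>
  </taskbody>
</task>
```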

Once you know the steps that will go into a warehouse topic, for example, you can apply a distinct FrameMaker or Word format/style to the steps and then script conversion that will

  1. Pull the step into a warehouse topic (if not already there).
  2. Replace the step with a conref from the warehouse topic.

The result is that a good chunk of your reuse is automated during conversion.
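After such a scripted conversion, a converted task might end up looking like this sketch, where the first step is pulled in by conref. It assumes a warehouse file named wh_steps.dita whose topic ID is wh_steps and which contains a step with the ID login_admin; all of these names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd">
<!-- Hypothetical example: file names and IDs are assumptions made for illustration. -->
<task id="add_user">
  <title>Adding a user</title>
  <taskbody>
    <steps>
      <!-- The conref value has the form file#topicID/elementID. -->
      <!-- The empty cmd only keeps the step valid; the referenced step replaces it. -->
      <step conref="wh_steps.dita#wh_steps/login_admin"><cmd/></step>
      <step>
        <cmd>Click <uicontrol>Add user</uicontrol> and enter the user details.</cmd>
      </step>
    </steps>
  </taskbody>
</task>
```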

DITA keys

Keys are a powerful mechanism in DITA. Although not strictly reuse, they can make reuse faster, simpler, and more dynamic. If you’re planning consistent, ongoing, and growing reuse, then consider keys as well.

Keys are used for indirect referencing of any kind. You can use keys for any piece that may need to be centrally updated or swapped out. For example, keys are often used for

  • Variables: To define terms or product names that can change based on context or over time.
  • URLs: To centrally manage and update them or customize them based on deliverables.
  • Conrefs become conkeyrefs: To pull in a different set of conrefs and quickly customize a document.
  • Related links: To customize them based on deliverable when topics are reused in multiple maps.
  • Including/excluding topics or maps: To create deliverables that have specific content without having to create many different maps.

Note: DITA keys will change for DITA 1.3; they will include a scoping mechanism that will simplify and extend linking. This article is based on DITA 1.2.

DITA keys ensure maximum reuse with minimum long-term efforts for updates.
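As a rough sketch of how this works, the following map defines one key for a URL and one key for a warehouse topic. The key names, the URL, and the file names are all hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<!-- Hypothetical example: key names, URL, and file names are invented for illustration. -->
<map>
  <title>Apple Administrator Guide</title>
  <!-- A URL managed in one place; topics point to the key instead of the address. -->
  <keydef keys="support_site" href="https://www.example.com/apple/support"
          scope="external" format="html"/>
  <!-- A key for a warehouse topic; another map can bind the same key to a different file. -->
  <keydef keys="common_steps" href="wh_steps_apple.dita"/>
  <topicref href="add_user.dita"/>
</map>
```

In a topic, you would then write something like <xref keyref="support_site">Support site</xref> for the link and <step conkeyref="common_steps/login_admin"><cmd/></step> for the step. Swapping the map (or the key definitions in it) changes what those references resolve to without touching the topics.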

An example of applying reuse during conversion: variables

If I know that I’ll be using keys to manage content that changes frequently (every release or when there is re-branding), I can ensure that, for example, the variables I’m using in FrameMaker for my product and version are part of my conversion. Instead of converting them as text, I can convert variables as <keyword> elements.

Converting variables as plain text is what we call flattening variables—once flattened, there is nothing that distinguishes them from the rest of the text. If you’re expecting conversion to leverage the DITA key mechanism but you are flattening variables, you will be left with adding keys manually after conversion.

Instead, as part of your conversion, you can leverage your variables by wrapping an appropriate element around them and even setting a keyref value on the element.

For example, my conversion plan for variables might specifically map variables to elements and keyrefs.

Variables become keys

Variables named… | Are converted as element… | With keyref value… | For an end result of…
AppleBanana | <keyword> | product | <keyword keyref="product"/>
ComponentB | <keyword> | componentB | <keyword keyref="componentB"/>
However, when you’re building this plan, it’s essential that you know that keys are defined in a map and defined only once. So if you need both Apple and Banana products to have separate names in a deliverable, then you need to create a unique keyref value for each one. When they have the same keyref value, then they resolve to the same name in the output. In my example above, <keyword keyref="product"/> will resolve to the same product name in a particular DITA map, but can change in other DITA maps. I can no longer have both Apple and Banana in the same DITA map.
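To show where those keyref values get their text, here is a minimal sketch of a map that defines the two keys from my table. The product and component names, the map title, and the topic file name are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<!-- Hypothetical example: names and file names are invented for illustration. -->
<map>
  <title>Apple User Guide</title>
  <!-- Every keyword with keyref="product" resolves to "Apple" in this map. -->
  <keydef keys="product">
    <topicmeta>
      <keywords>
        <keyword>Apple</keyword>
      </keywords>
    </topicmeta>
  </keydef>
  <!-- Every keyword with keyref="componentB" resolves to "Sync Engine" in this map. -->
  <keydef keys="componentB">
    <topicmeta>
      <keywords>
        <keyword>Sync Engine</keyword>
      </keywords>
    </topicmeta>
  </keydef>
  <topicref href="installing.dita"/>
</map>
```

A second map for the Banana deliverable would define the same product key with <keyword>Banana</keyword>, and the same topics would then publish with the other name.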

The key here is to plan, test, and then test again.

Strategies for combining conversion and reuse

It’s sometimes quite difficult to determine what can and should be done as part of the conversion to build the end result you need. There are two possible solutions to this:

  1. Manually build your end result in DITA and test it out until you’re sure it’s how you want to work. You can do this on a small set of content if you have limited funds or time, but the larger and more realistic the data set, the more accurate it will be for your overall needs. Once you have something that actually works the way you want it with the reuse set up the way you need it, you have a very concrete goal to work towards and can figure out what can be built and automated as part of conversion.
  2. Convert a small set of content, automating the reuse parts that you know you’ll need for sure and use that as an iterative process to keep building upon it until you have your desired result.

Either way, slow and steady is the way to go. Diving into conversion without considering reuse can lead to some frustrated hours or days doing something that could have taken seconds.

Best practices to prepare content for reuse

At this point, some of you may be saying that this is just too difficult and too time consuming to figure out. You want to convert now! Well, that’s ok too, but you should consider doing some basic pre-conversion work that will let you at least search and replace (or find) items that you want to reuse.

For example, the following work, done before or during conversion, will save time afterwards:

  1. Re-write content and chunk content: A precursor to any good conversion is making sure your content adheres nicely to the topic-based writing paradigm and that you have clear distinctions between task, reference, and concept. All other work is based on this essential step. This is also where you would remove textual indicators of the location of other content, like the words “before, after, following, preceding, next, first”. In DITA, content can occur in any order, so you should remove any references to location.
  2. Include placeholders for future reuse: If you know you’ll be replacing the step “Select Log in from [graphic] and enter your administrative credentials.” with a conref, then go ahead and replace the content with “Conref login admin.” in your unstructured source. You’re simplifying the structure so your conversion will be faster and easier and afterwards, you can quickly insert a conref or conkeyref right where you need one.
  3. Standardize phrasing: One of the hardest things to do is find content that is almost the same but not quite matching. Although laborious, this pre-conversion cleanup process can set you up for easy reuse down the road. A tool like Acrolinx can help.
  4. Use a FrameMaker condition to identify likely or potential reusable content that you want to revisit after conversion. Then convert the condition as <draft-comment> elements. This is a good way to leave a note to yourself that is easily findable after conversion.
  5. Convert boilerplate content once, even if it has variations: If you have many versions of legal pages, copyright statements, standard notices, and any other content that is generally standardized, don’t bother converting those with each book. Convert them once, then modify the XML until it meets your requirements for all your books.
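To give a feel for what the placeholder and <draft-comment> practices above leave behind, here is a sketch of a task as it might look straight after conversion. The topic, the author value, and the comment text are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd">
<!-- Hypothetical example of converted output: IDs and wording are invented for illustration. -->
<task id="configure_alerts">
  <title>Configuring alerts</title>
  <taskbody>
    <steps>
      <step>
        <!-- Placeholder text from the unstructured source, to be replaced with a conref or conkeyref. -->
        <cmd>Conref login admin.</cmd>
      </step>
      <step>
        <cmd>Open <uicontrol>Settings</uicontrol> and select <uicontrol>Alerts</uicontrol>.</cmd>
        <info>
          <!-- Text that carried the FrameMaker condition, converted as a draft comment. -->
          <draft-comment author="conversion">Potential reuse: this step also appears in the reporting guide.</draft-comment>
        </info>
      </step>
    </steps>
  </taskbody>
</task>
```

Both markers are easy to search for after conversion, so you can work through them one by one.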


ARTICLE | What’s in a DITA file?

When you first start tackling a DITA conversion, it’s difficult to get a handle on just what comprises a single file. Is it one topic? Is it multiple topics? How long should a file be? How short can it be? We’ll tackle these questions in this article.

Perfect File

The perfect DITA file is one that contains one topic, where that topic is as long or as short as it needs to be.

The length of a topic depends entirely on the subject matter and issues of usability. Generally, the litmus test is asking yourself “Can a user navigate to this one topic and have all the information they need for it to be usable and stand on its own?” If it’s too short, you’re forcing them to navigate away to find more information. If it’s too long, it becomes too onerous to follow.

Your file should contain just one topic, be it task, concept, or reference, regardless of the length of your topic.

Shortest File

The shortest file allowed in DITA is a topic that contains only a title element and nothing more.

This is absolutely allowed but it is something that you would only do for a specific reason, such as adding another level of headings.
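
A minimal sketch of such a title-only file follows. The id and title text are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<!-- A complete, valid topic: only the id attribute and the title are required. -->
<topic id="jedi_training">
  <title>Jedi training</title>
</topic>
```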

Longest File

The longest file you should have is one that supports your requirements. For most technical publications content, you should only have one topic per file.

Note: An exception to this rule is if you’re authoring training content using the DITA Learning and Training Specialization; there are very good reasons for having many topics in a single file in that case, but those would all be learning and training topics, not the core concept, task, and reference that DITA is built upon. Note that this is true as of DITA 1.2 but may change in the not-so-distant future.


DITA architecture allows you to nest many topics into one file. However, doing so introduces major limitations on reuse. If you nest topics into one file, you will be sacrificing the flexibility that DITA introduces. It’s like choosing to hop on one leg instead of running on both.

Breaking each topic out into its own file is what we call “chunking.” One purpose of chunking is to allow the authors to have incredible flexibility when it comes to reusing this content.

Consider this nested task within a task, where two tasks are in the same file.
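
A sketch of what such a file might contain is shown here. The ids, titles, and steps are assumptions chosen to match the Jedi example used in this article.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd">
<task id="become_jedi">
  <title>Become a Jedi</title>
  <taskbody>
    <steps>
      <step><cmd>Find a master who is willing to train you.</cmd></step>
      <step><cmd>Build your lightsaber.</cmd></step>
    </steps>
  </taskbody>
  <!-- The second task is nested inside the first, so both live in one file. -->
  <task id="study_jedi">
    <title>Study with a master</title>
    <taskbody>
      <steps>
        <step><cmd>Attend daily training sessions.</cmd></step>
        <step><cmd>Complete the trials set by your master.</cmd></step>
      </steps>
    </taskbody>
  </task>
</task>
```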


If I want to include the information about studying with a master somewhere else, such as a guide for becoming a senator, I’ll be out of luck because I haven’t split my two tasks into separate files. When they’re in the same file, where one goes, the other must follow.

Once a topic is in its own file, an author can pull that topic into any deliverable that needs it. It’s not uncommon for one file to be part of ten or more deliverables. If you have multiple topics in that one file, then they must all be reused along with the one you want, and you can’t even change their order.

There are many other good reasons to chunk. For example, if your content needs to be reorganized, you can quickly drag and drop topics that are each in their own files. All navigation and linking is automatically updated based on your new organization.

How It All Comes Together

Chunking content into individual topics is the first major hurdle that authors face when adopting DITA because it’s so far away from our training and understanding of writing in chapters, books, and documents. It’s not clear how all those tiny little topics come together. And be warned, you will have hundreds of topics that make up just one deliverable.

Enter the DITA map, the great glue that holds it all together.

You can think of a DITA map (which has a .ditamap file extension) as nothing but an organizational mechanism or even as a Table of Contents. A DITA map itself has very little content. It usually contains just a title. What it does have, though, are topicrefs, which are references to all those files you’ve authored.

Here’s a visual representation of a map, where I’ve told it to “pull in” my two topics (with a hierarchy, one nested below the other). When I create my map, I add the topics that are relevant to this deliverable. The map simply references them using a file path with the href attribute.


That same map in code view looks like this:
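
A minimal sketch of such a map, assuming a title of “Jedi Training Guide” and the standard OASIS map DOCTYPE (the two file names come from this article):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map>
  <title>Jedi Training Guide</title>
  <!-- Each topicref points to one topic file through its href attribute. -->
  <topicref href="become_jedi.dita">
    <!-- Nesting a topicref places this topic one level below its parent. -->
    <topicref href="study_jedi.dita"/>
  </topicref>
</map>
```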


The only thing that’s actually typed into this map is the content in the <title> element. The other two objects, the topicrefs, are pointing to the files that are my two tasks using the href value, in this case simply the names of the files: become_jedi.dita and study_jedi.dita.

When you publish a DITA map with all your topics referenced, you get a single deliverable with all your content. For PDFs, you’ll have content that looks exactly like your PDFs created from FrameMaker, Word, InDesign or whatever else you have used. For HTML output, each file becomes its own page, with automated navigation to all the other pages. All outputs are entirely customizable.



Although the length of your topics will vary depending on the subject matter, your files should contain just one topic. Use your DITA map to bring your topics together, giving you the flexibility that DITA promises, including topic-level reuse and the ability to quickly reorganize your content.

ARTICLE | DITA and graphics: what you need to know

A DITA project is an ideal time to audit, enhance, and start managing your media assets. Like any other piece of content, your media are a valuable resource that the company can leverage.

Although much of a transition to DITA concentrates on improving the quality of your content, there are also some distinct benefits for your media. By media, I mean:

  • Logos (for branding/marketing)
  • Screenshots
  • Illustrations
  • Diagrams
  • Image maps (a flow graphic that shows how a process or a set of tasks connect with clickable hot spots)
  • Inline graphics for buttons, tips, notes

When you’re moving to DITA, you should be thinking about two things when it comes to media:

  • Minimizing and single sourcing
  • Introducing and maintaining best practices

Graphics you no longer need

Probably the biggest mistake when moving to DITA is to lug your extra, non-essential media around with you, just in case.

Any graphic that is being used solely for the purposes of design can be managed centrally instead of being placed in everyone’s individual folders. These include (but are not limited to):

  • Icons for tips/notes
  • Logos for the title page, header, footer
  • Horizontal rules for separation of content areas

All these graphics get applied on publish so each individual author no longer has to worry about them. Your authors no longer even see these graphics—and also no longer need to manage them. Your publishing expert still needs to manage these graphics efficiently, but at least now there’s only one graphic to manage instead of dozens or hundreds.

For example, when the branding guidelines are updated, the publishing expert simply updates the logo used in the stylesheets, replacing one graphic in one place instead of a graphic in every single title page and footer throughout your library of content.

Prune and archive

The design-related graphics are easy to throw away, but we all have extra graphics lying around. It’s not unusual for a single graphic to have 5 or more other related graphics that are hanging around just in case. For example, you may have files that are older versions, variations, and different size and quality options.

We are always loath to “lose” a graphic, but DITA migration is the perfect time to archive the older versions and variations. Keep the quality options, though, because they’ll come in handy.

A graphic is only useful if it conveys something that words cannot. If you can explain what the graphic shows, then the graphic is usually redundant and not useful. If you can’t explain it, then the graphic is needed. Prune your content of the graphics that don’t add any value.


DITA lets you specify multiple types of formats for a single graphic so that you are always publishing the right graphic for the right format. You can easily publish different formats (such as color for ePub and grayscale for print) using DITA attributes.
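
As a minimal sketch of how this might look in a topic, the figure below carries two versions of the same graphic, each tagged with a conditional attribute. The file names and the otherprops values (color and grayscale) are assumptions for illustration. A filter file (DITAVAL) applied at publish time would exclude the version that does not suit the output.

```xml
<fig id="fig_system_overview">
  <title>System overview</title>
  <!-- Kept for ePub output, assuming the filter excludes otherprops="grayscale". -->
  <image href="images/overview_color.png" otherprops="color">
    <alt>System overview diagram</alt>
  </image>
  <!-- Kept for print output, assuming the filter excludes otherprops="color". -->
  <image href="images/overview_gray.png" otherprops="grayscale">
    <alt>System overview diagram</alt>
  </image>
</fig>
```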

All graphics are either vector or raster. The format you use will depend on the type of graphic you need and the outputs you’re publishing to. For more information about raster versus vector, see this article.

Vector graphics (specifically SVG) are usually the right choice for most technical illustrations and diagrams as long as they don’t require complex coloring (like drop shadowing and shading). They are clear, clean graphics that look professional and don’t have that “fuzzy” look on publish.

A huge added bonus is that, because they are made up of layers, you can export the text (usually in the form of callouts or labels) and translate just that text if you’re providing localized content. This saves having to edit or even re-create a graphic when localizing. Another benefit is that in HTML outputs, they allow users to zoom without pixelation, as needed.

It’s just icing on the cake that vector graphics are smaller, more compact files with lossless data compression.

SVG is an open standard, advocated by W3C, the Web consortium that is bringing you HTML5. It might not be the only graphic format in the future, but it will definitely be a front-runner. You can create SVG files from most graphics editors, including but not limited to: Adobe Illustrator (.ai and .eps files), Microsoft Visio, Inkscape, and Google.


The best size for your graphics depends on the output type. PDFs and HTML have different widths and resolutions. This really does get tricky, but if you’re using the DITA Open Toolkit to publish, it’s possible to set default maximum widths (maintaining the correct aspect ratio) so that you’ll at least never overwhelm your audience with a massive graphic. Combine this default maximum with the DITA attributes that authors use to set a preferred width or height (but not both).
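
For instance, an author might set only a preferred width on the image and leave the height to be scaled proportionally. This is a sketch; the file name and the 4in value are assumptions.

```xml
<!-- Only width is set, so the aspect ratio is preserved on publish. -->
<image href="images/wiring_diagram.svg" placement="break" width="4in">
  <alt>Wiring diagram</alt>
</image>
```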

Interactive graphics

Some illustrations are best done in 3D. The ability to manipulate, rotate, zoom and otherwise play with graphics is not just really cool, it can also let users access the information they need without overwhelming them with 20 different views/zooms of a particular object or object set.

You can also play around with something like Prezi (or better yet, Impress.js), which lets you display and connect information graphically.

Manage graphics

If your authors don’t know that a graphic exists or they can’t find it, then that graphic is a wasted resource. It’s not uncommon for an author to forget or be unable to find the graphics that they themselves created months or years before.

Every time an author re-creates a graphic that they could have re-used, they are wasting on average 5 hours. Assuming your authors are worth approximately $45/hour and that they re-create a graphic they should have re-used about 4 times per year, then that means the company is wasting $900/year/author (5 hours × $45 × 4). If you have 10 authors, that’s $9000/year that could be saved with some simple, basic management of graphics with little or no cost or effort. If you have complex graphics, double or triple those savings.

Just like topic reuse, graphics reuse is a no-brainer source of savings for your company.

The key to graphics management is metadata. File naming, even with strict enforcement, is one of those things that degrades over time. Mistakes creep in. If you’re relying primarily on file naming, then expect to lose or orphan graphics. (An orphaned graphic is one that is not referenced by any topics.)

Instead, use descriptive tags applied to each graphic so authors can search, filter, and find the graphics they’re looking for. Then make sure they search for existing graphics before creating new ones from scratch. This same metadata can also be used to let media be searchable to end users when you publish. If you have videos, you can take this one step further and provide a time-delineated list of subjects covered in the video so users can skip right to the spot they want.

Descriptive tags should also be intelligently managed so you don’t have people using slightly different tags and so that you can modify the tags when it’s needed. These tags are called a taxonomy/classification scheme and can lead to their own chaos if left unmanaged and uncontrolled. Either keep them extremely simple (fewer than 10 tags, no hierarchy), select a CCMS that allows you to manage them, or call in an expert to help you out.

Manage source files

Don’t forget to store and manage your source files for graphics in a similar way to your graphic output files themselves.

A quality CCMS makes it easy to store your source graphics with your output graphics (or vice versa), so you can easily find, for example, both your Visio source file and your eight associated .PNG files.

If your CCMS doesn’t include this functionality (or if you’re using file folders instead), the key is to use metadata that matches how you’re managing your output graphics so that your filters and searches will automatically include the source files for graphics as well.


Graphics are the least-emphasized aspect of a DITA conversion project, but it’s worth the effort to establish which graphics you need to keep, how to manage them, and how to make them findable for both authors and end users. Your graphics are valuable assets that can and should be leveraged.

ARTICLE | I got my XML back. Now what?

If you’re new to DITA conversion projects or you’re planning on converting content soon, then you should give some thought to all the pieces that go into a successful conversion. A well thought out content strategy will help guide you through the process of conversion, but if you haven’t considered all the minutiae, this article will help fill in the blanks.

Review the conversion

Make sure that you take a look at the converted content to check for the following items:

  • You captured all content you wanted, including conditional content, variables, and document details (subtitles, document dates and identifiers, version numbers, etc.).
  • The content was chunked correctly for your needs. Were sections used where you wanted topics or vice versa?
  • Did tables convert correctly? Is all information there, including items like vertical text (as attributes)?
  • Are there any validity problems? Validate your files, then publish to XHTML with the DITA Open Toolkit (for example, from an XML editor like oXygen) to ensure you can publish without errors.
  • Are the file names the ones you want to use going forward? Is the structure the one you want to keep? If you make changes, make sure you modify all references as well or you’ll have broken paths. If you are using a quality CCMS, consider making these changes after you have uploaded your files (no path changes necessary).
  • Did you leave the chaff behind? If you still have any content that is redundant, useless, or outdated, remove it now. (Tip: Ideally, you should do the bulk of this work before you convert.)

Don’t assume anything. Having eyes on your converted content (all of it) is going to save you endless frustration down the road. Even the best conversion doesn’t always result in exactly what you need or want.

Gather metrics

If you haven’t already, make sure you have “before” metrics using your legacy tools and processes for how long it took to author new content, update content with changes, fix bugs, review content (peer review, tech editor review, SME review, QA review), translate content, and publish content. Remember that your legacy metrics will likely be based on books or chapters and that your new metrics will be based on topics (and maps for publishing). This means you need to have an idea of how many “topics” would have made up a chapter or book in legacy content so that you aren’t stuck comparing apples to oranges. Start keeping metrics on the items below as well to get an idea of how long it takes to implement your re-use strategy, for example.

Identify and fix content that needs work

It is inevitable that the content you convert will need some work once you get it back. Although you can and should do the bulk of your re-writing before you convert, once you see content in topics, you might realize that you have some key areas to clean up. Concentrate on the big ones, including:

  • Ensure you have the right topic type for the content. If your content is a procedure, then it should be in a task topic. If it is meant for quick look up (tables, lists, alphabetized or organized), then it should be a reference topic. If the content explains what something is, how it works, or why the user should care about it, then it’s a concept topic. Those are broad distinctions, but make sure the content fits the topic type.
  • Do your tasks focus on user goals and is your content minimal? If your tasks cover functionality (using widget A, customizing widget B, etc.), then you likely suffer from badly focussed content. One of the biggest benefits of DITA is topic-based writing and minimalism, which both enforce writing content that users actually need to get their daily work done. Take the time now to, at a minimum, identify what needs work and ideally re-write the best candidates for improvement. It’s important to do this part before you work on your re-use strategy because it will give you all sorts of ideas on how you can re-use content.

Load into a CCMS (optional but usually recommended)

Even with only one writer and no translations, if you have a decent amount of re-use and any workflows to adhere to, the ROI on a CCMS (Component Content Management System) can be less than 3 years. The CCMS pays for itself in increased efficiency in a thousand different ways. The ROI is much shorter if you have multiple authors, standard amounts of re-use, and translate to one or more languages.

You should load your content into your CCMS before working on it in more depth because once it is in there, it’s easier to find and update content, apply re-use strategies like keys, conditions, and conrefs, and generally work with your content. It also gives you the opportunity to get to know the ins and outs of the CCMS.

If you haven’t yet purchased a CCMS but you are shopping around for one, you can use your newly converted content as part of a demo or trial. Simply ask the vendor to include your content.

Although most CCMSs are intelligent enough for you to point to a ditamap and grab all associated files, there are always some files that are not referenced in your map but need to be uploaded, managed, and versioned, including but not limited to:

  • Source files for graphics (Visio, Photoshop, SnagIt, etc.)
  • Legacy materials in PDF, HTML, etc. if you want to keep copies of them
  • Videos (including source files and supporting files)
  • Graphics that may not be used directly but that you want to keep because they are valuable assets and may be used in the future (not necessarily in the documentation)
  • Engineering documents
  • Strategy documents
  • Anything else that needs to be accessed by multiple authors, versioned, and/or never lost

CCMSs will store these as BLOBs (Binary Large Objects), so add the appropriate metadata to these files to make sure they are findable and filterable by authors, editors, and managers.

Apply re-use strategy

Your re-use strategy could be something simple like inserting keys and keyrefs to automatically pull in glossary terms or something more complex like using conkeyrefs to pull in conditional elements as needed. At a minimum, you’ll probably want to use conrefs for frequently repeated content, like software/hardware states, warnings or cautions, content that must be standardized across most or all documents, menu options, definitions, and anything that you’d like to update in one place and use wherever needed.
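
As a rough sketch of the key-based approach, the map below defines one key that resolves to text and one that points to a file of shared warnings. The key names, file paths, element ids, and product name are all hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map>
  <title>Administrator Guide</title>
  <!-- Key that resolves to text. Topics use: <keyword keyref="product_name"/> -->
  <keydef keys="product_name">
    <topicmeta>
      <keywords>
        <keyword>ExampleProduct</keyword>
      </keywords>
    </topicmeta>
  </keydef>
  <!-- Key that resolves to a file of shared warnings. Topics use:
       <note conkeyref="shared_warnings/power_off"/> -->
  <keydef keys="shared_warnings" href="shared/warnings.dita"/>
  <topicref href="install.dita"/>
</map>
```

Because topics refer only to the key names, a different map can point the same keys at different text or files without any change to the topics.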

Your re-use strategy should be planned out in advance as part of your larger content strategy, but once you receive content back from conversion, you need to actually implement it. The more re-use you can get out of your content, the better it is in the long term because it means huge savings in updating, reviewing, and translating content, not to mention confidence that you will never have contradictory information. Keep in mind, though, that extensive re-use takes time to implement. It’s a huge cost-saving step that, yes, costs some initial effort but will directly improve your ROI by months if not years. So put the effort into applying your re-use strategy before authors start working on content.

Apply metadata strategy

Metadata is like putting strings on your XML elements so you can make them dance. Like your re-use strategy, your metadata strategy is primarily planned out as part of your content strategy. Some metadata will be applied automatically during conversion (like conditional attributes) if it existed in the content submitted for conversion, but you’ll also need to introduce some new metadata. Metadata lets you introduce and manage:

  • Conditional content to provide custom outputs to users on specific platforms, with specific expertise, or with certain combinations of products.
  • Publishing controls like vertical text for table headings or customizing the content for mobile outputs.
  • SEO keywords to improve findability of topics by search engines.
  • Topic information like author, product, embedded help ID, version, topic status (like draft, in review, final, published, archived), content contributors, etc. If you need to track it, then use metadata.
  • Taxonomy so users can browse by subject.
  • Categories/keywords for media, including graphics and videos (this helps make them findable by authors too).
  • Pretty much anything else you can imagine that is in addition to rather than strictly inside the readable content.
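
To make a few of these concrete, here is a sketch of a task that carries topic information in its prolog and a conditional attribute on one step. Every value shown (author, dates, product name, ids, status) is a placeholder, and the exact set of metadata elements you use would depend on your own strategy.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd">
<task id="create_daily_reports">
  <title>Create Daily Reports</title>
  <prolog>
    <author type="creator">Example Author</author>
    <critdates>
      <created date="2017-01-15"/>
      <revised modified="2017-03-02"/>
    </critdates>
    <metadata>
      <audience type="administrator"/>
      <!-- Keywords can feed search and SEO in the published output. -->
      <keywords>
        <keyword>daily reports</keyword>
      </keywords>
      <prodinfo>
        <prodname>ExampleProduct</prodname>
        <vrmlist>
          <vrm version="2"/>
        </vrmlist>
      </prodinfo>
      <!-- Free-form name/value pair, used here for topic status. -->
      <othermeta name="status" content="draft"/>
    </metadata>
    <!-- Hypothetical embedded help ID. -->
    <resourceid id="help_daily_reports"/>
  </prolog>
  <taskbody>
    <steps>
      <!-- Conditional content: this step is published only for Linux outputs. -->
      <step platform="linux">
        <cmd>Open a terminal and log in to the command console.</cmd>
      </step>
      <step>
        <cmd>Select the reports you want to generate.</cmd>
      </step>
    </steps>
  </taskbody>
</task>
```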

Work on publishing

A depressing number of people converting for the first time are dismayed and shocked to find out that the publishing aspect of a DITA project is a completely separate and additional effort over and above the conversion and (usually) the use of a CCMS. Remember that separating content from formatting (as with any conversion to XML) means that you then must put some initial effort into creating the look and feel that you want in each desired output. It’s a big one-time effort (with occasional updates thereafter) that new DITA-ites understandably often overlook.

Whether you publish using the DITA Open Toolkit, a publishing tool/engine like Adobe’s Publishing Server, a third-party website like MindTouch or Fluid Topics, a CCMS add-on, or a homegrown solution, the publishing aspect of a DITA conversion project takes planning, effort, and testing. As with most projects, the more planning and testing you do, the less work it will be.

There are many aspects to publishing, but at a high level, it can be split into two distinct parts:

  • Converting XML to the output formats you need (HTML, ePub, PDF, etc.)
  • Delivering the content to the users (how will users access and experience the content?)

The second bullet could include a lot of work including building a website, designing a search algorithm, designing the interface, introducing ratings and commenting systems, integrating with Support or Knowledge Bases, implementing user access, allowing users to submit their own content, designing support for adaptive content, adding accessibility features, and much more. This is the single most forgotten aspect of the content lifecycle and arguably, one of the most important ones. Throwing a PDF or tri-pane help at your users is no longer meeting user demands, so we need to step up and design the entire content experience.


Although conversion feels like the end of a DITA project, there’s a lot of work to be done once you get your XML. Your content strategy will be your guide throughout this process and can help you avoid making costly errors or forgetting to plan for important parts of the project.

ARTICLE | Converting to DITA – mastering the task

Adopting DITA means you need to make a switch from document or chapter-based writing to topic-based writing. For writers being exposed to DITA for the first time, this shift in thinking and writing tends to be the hardest part of the transition.

At the core of topic-based writing is the DITA task. Master the task and you start mastering DITA content (or any topic-based content). Concepts and references are important too, but once you have mastered the task, everything else just falls into line.

Your DITA task will be the core of your content. The task topic is your primary way of instructing your users and guiding them through their relationship with the product—from setup to advanced configurations, tasks are going to be the most frequently read topics. If you identify the right tasks to document and document those tasks in a usable way, your documentation will be valuable and usable and your user will be happy with their product. Happy users are always good for business.

If you are working with legacy content, knowing the model and purpose of the DITA task will help you during your conversion. If you have content that doesn’t map one-to-one with the elements of a DITA task, then you’ll know that you have some pre-conversion work to do.

Purpose of a Task

The purpose of a task is to tell a user how to do something. From logging in for the first time to configuring an advanced combination of features, the task walks the user through the steps and provides important contextual information as well.

The task is intended to be streamlined, easy to read, and easy to follow. To get your task down to this minimal, usable core of material, you need to provide just the information that the user needs to complete the task and nothing more.

A well-orchestrated task has the right information in the right location—and nothing extra.


How long should a task be? It needs to be long enough to stand on its own but short enough that the user won’t give up partway through. Ideally, to be the most useful, a task should be no more than ¾ of a “page” in length (note that page lengths differ—a page is considerably shorter for mobile outputs, for example). However, there are valid use cases for both a one-step task and a 15-step task, so task length really depends on the content.

I think the better question here is not about length, but about focus: What should the focus of my task be? If I’m writing about logging in to the system, do I include every way to log in, for every type of user?

The answer is usually no. The more focussed your task is, the more usable it is. Place yourself in the users’ shoes and ask yourself what they need to know. Logging in to the system becomes a whole set of tasks (from which you can re-use steps extensively through the conref mechanism and/or use conditional metadata to make writing and updating faster and easier):

  • Log in for the first time (for admin)
  • Log in for the first time (for user)
  • Log in from a mobile device (if different)
  • Reset your lost password (for admin)
  • Reset your lost password (for user)
  • Log in as a special user
  • Etc.

This type of focus is invaluable to your end users. However, it’s the type of focus that is difficult to correctly identify without doing user testing and getting ongoing user feedback. If your company doesn’t provide an opportunity to get direct feedback from your users, you are relegated to either guessing how users will use the product or writing feature-based content; neither is a recommended way to write.

If you find yourself documenting a task based on a GUI feature or mechanism, then you may be missing the focus that your users need. Make sure you’re identifying and writing for the business goal rather than the product functionality. A correctly focussed task often strings together many pieces of product functionality.

Instead of: Using the print feature
Focus the task(s) on:
  • Exporting to Excel
  • Sending a PDF for review
  • Publishing for mobile devices
  • Printing a hard copy
  • Managing your printer options
  • etc.

Instead of: Configuring the x threshold
Focus the task(s) on:
  • Maximizing efficiency in a large deployment (will include configuring x threshold as well as other widgets/settings)
  • Maximizing efficiency in a medium deployment

Instead of: Using the MyTube Aggregator feature
Focus the task(s) on:
  • Creating a channel of your favourite videos
  • Growing your reputation/community

Once your task is properly focussed, the question of length usually answers itself. Always keep in mind that users don’t like to read documentation, so make every task as succinct as you can.

Core Elements of a Task

Use the core elements of a task as your tools for writing a clear, clean, streamlined task that is usable and functional.

Element: Title
Description:
  • Clearly written description of user goal for the task
Example: Create Daily Reports

Element: Short Description
Description:
  • Complements the title
  • Used in navigation and search results
  • When combined with the title, helps users decide whether to navigate to a specific topic
  • Uses words that bridge the gap between product terminology and user terminology
Example: Daily reports summarize the system performance in graphical format over the last 24 hours

Element: Pre-requisite
Description:
  • What the user must complete or have at hand before starting this task
  • The line sometimes blurs between the first step and a pre-requisite; use common sense
Example: You must have administrative rights. You must have configured your server for access through the cloud

Element: Context
Description:
  • Explains why the user would perform this task, what their goals are, and places the task in a larger context
Example: Use and customize daily reports to get a snapshot of the system’s health and identify any trending issues or problems before they become critical

Element: Step Command
Description:
  • Tells the user what to do for each step succinctly and with no extra words
  • Covers the action they must take and no more
Example: Log in to the command console

Element: Step Info
Description:
  • Additional information about the step command that is essential for the user to know about that step, but is nonetheless not part of the action they must take
  • Can often include tips (which should be in a note element inside the info element) or special circumstances that need to be noted
  • Is a troubleshooting element for the user: if they cannot perform this step (e.g. forgot password), give them enough information here so they can move forward
  • With the next two elements, often the content that gets automatically stripped out when publishing for mobile devices
Example: Your password was created as part of the installation process

Element: Step Result (Optional)
Description:
  • Tied to a particular step, this is the result of the user taking the step
  • Can be omitted when the result is obvious
Example: Your daily metrics display

Element: Step Example (Optional)
Description:
  • Tied to a particular step, this is an example of what they see or input
  • Can be omitted when it doesn’t apply
Example: A list of metrics, areas, or a screenshot

Element: Sub steps (Optional)
Description:
  • Set of sub-steps that walk the user through the details of a complex step
  • Can often be used when a command becomes too long (you are trying to put too much information into one step)
Example:
1. Restart the agent
   a. From Task Manager, locate and select the agent
   b. Click Stop
   c. Wait 30 seconds
   d. Click Start

Element: Step Choices
Description:
  • Can be used if the step can be done in different ways for different purposes
Example: If you prefer nightly reports, enter 6:00 a.m.

Element: Task Result (Optional)
Description:
  • The result of the user finishing the task correctly
  • Should tie back in with the short description and context
  • Can be omitted if the task result is obvious or doesn’t apply
Example: Customized daily reports that help you identify trends and summarize progress are now available on your Central Admin interface and are available from a drop-down list for all other administrators

Element: Task Example (Optional)
Description:
  • An example of what a correctly performed task looks like
  • A way to provide specific details without being specific in the steps
Example: A screenshot or code example

Element: Post-requisite
Description:
  • What the user must do after they have completed this task
Example: Re-generate all reports to include your new reports in the next bulk export

Note: There is another task type available called a “Machinery Task” that has more elements and more ways to organize those elements. It is appropriate for content that covers assembling and maintaining machinery. Check the DITA 1.2 (or the latest) specification for details.

What is not included in a DITA task is a place to add detailed reference material, the rationale for performing a particular step, or complex or bifurcating task steps that cover multiple scenarios (Linux and Windows, for example).

  • Detailed reference material should be written and chunked separately in its own reference topic and either placed adjacent to this task or linked to via the relationship table in the ditamap.
  • The rationale for each step (why you perform it) is not needed. It clutters up the step with information that is not essential. Leave it out or add an overall explanation to a concept topic instead.
  • Bifurcating tasks (usually written as long tables in legacy materials, where the user is supposed to skip down to the rows that apply to them) are no longer needed in DITA. Instead, make each scenario its own task, use conditional metadata, and/or define a re-use strategy.

Fig: Example of the common elements used in a task (in an XML editor with inline elements showing)
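
To make the element list above a little more concrete, here is a minimal sketch in Python (my choice for illustration; nothing in DITA requires it). It maps the labels used in this article to the DITA element names and parses a small task. The task id, title, and the result and post-requisite sentences are hypothetical.

```python
import xml.etree.ElementTree as ET

# Labels as used in this article -> DITA element names.
# "Post-requisite" is the label assumed here for the postreq element.
LABEL_TO_ELEMENT = {
    "Step Result": "stepresult",
    "Step Example": "stepxmp",
    "Sub steps": "substeps",
    "Step Choices": "choices",
    "Task Result": "result",
    "Task Example": "example",
    "Post-requisite": "postreq",
}

# Hypothetical task topic, invented for illustration.
TASK = """
<task id="restart_agent">
  <title>Restarting the agent</title>
  <taskbody>
    <steps>
      <step>
        <cmd>Restart the agent</cmd>
        <substeps>
          <substep><cmd>From Task Manager, locate and select the agent</cmd></substep>
          <substep><cmd>Click Stop</cmd></substep>
          <substep><cmd>Wait 30 seconds</cmd></substep>
          <substep><cmd>Click Start</cmd></substep>
        </substeps>
      </step>
    </steps>
    <result>The agent is running again</result>
    <postreq>Check that the agent reports its status</postreq>
  </taskbody>
</task>
"""

root = ET.fromstring(TASK)
used_tags = {element.tag for element in root.iter()}

# Which of the optional elements from the list does this task use?
labels_used = [label for label, tag in LABEL_TO_ELEMENT.items() if tag in used_tags]
substep_commands = [substep.findtext("cmd") for substep in root.iter("substep")]

print(labels_used)
print(substep_commands)
```

The same kind of check can be handy on converted legacy content, to see which task elements your styles actually mapped to.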

Tasks are really the core of great, usable content. The more focussed and streamlined your tasks are, the more valuable your users will find them. It’s important that you use the correct task elements for the correct type of content. Use the elements as your guide to mastering the task. If you’re working on legacy content, then use styles or formats that map to these elements to ensure your conversion to DITA is nice and easy.

ARTICLE | DITA conversion and metadata

One of the most overlooked aspects of DITA conversion is including metadata in your conversion project. Metadata is a powerful tool. Please, leverage it! (Go ahead and picture me shouting this from the rooftops.)

Your goal is to capture and transfer metadata that is important to your content and your processes. You want to do this for a few reasons:

  1. So that it’s not forgotten and left behind. “When was this content last updated and who updated it?” You don’t want the answer to be: “Who knows. We converted it last week.”
  2. So you can leverage your XML. Adding metadata to XML is like putting a steering wheel on your car—it gives you all sorts of control over it.
  3. So you don’t have to apply metadata manually after conversion, a painful and time-consuming exercise.

You can also treat your conversion project as an opportunity to introduce new metadata into your content that can really enhance its value. The moment when content is being converted to XML but is not yet loaded into a CMS is the perfect moment for adding metadata.

Part of your overall content strategy should include a section on metadata strategy, where you plan what kinds of information you want to capture (or introduce) and how you will do so.

Metadata explained

Metadata is simply information about information. The date stamp on a file, for example, is metadata about that file. Although we’re used to seeing all sorts of metadata, we rarely use it to our benefit other than by sorting a list of files. Using Windows 7, you could, for example, easily return a list of all photos you’ve ever uploaded to your computer that were taken at a specific focal length, no matter where they are stored. You could do the equivalent exercise with your content files (Word documents, FrameMaker files, Excel spreadsheets, etc.) if you took the time to tag them with simple category metadata.

In the context of DITA topics and maps, metadata is information that is not part of the content itself. Metadata is expressed in an element’s attributes and values, in elements in the prolog of a topic, in the topicmeta element in maps, in various other places in maps and bookmaps, and in subject scheme maps.

Fig: Metadata in the prolog element

Use metadata for different purposes:

  1. Internal processes. For example, knowing the last date a piece of content was updated can let you know that content has become stale. This sort of metadata can also drive workflows for authoring, reviewing, and translating.
  2. Conditional content. Metadata lets you show or hide content that is specific to particular users, output types (like mobile), or products. This helps you maximize your ability to single source and re-use content (thus making your ROI that much more attractive).
  3. To control the look and feel of your content on publish. Metadata allows information to pass to your publishing engine.
  4. Grouping and finding content using a taxonomy or subject scheme. Useful for both authors searching for content and end users searching and browsing for content, this strategy can be a really powerful addition to your content.
  5. To run metrics against. Example: Return a count of topics covering a subject matter, or the number of topics updated in the last x months by author a, b, and c. You can get metrics on any metadata you plan for and implement.
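
As a rough illustration of the metrics idea in item 5, the Python sketch below counts topics per author that were revised on or after a cutoff date. The topics, author names, and dates are all hypothetical, and it assumes each topic records its last update in a `revised` element inside `critdates`.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from datetime import date

# Hypothetical topics; in practice these would be read from files or a CMS.
TOPICS = [
    '<topic id="t1"><title>One</title><prolog><author>a</author>'
    '<critdates><revised modified="2014-03-01"/></critdates></prolog></topic>',
    '<topic id="t2"><title>Two</title><prolog><author>a</author>'
    '<critdates><revised modified="2012-11-20"/></critdates></prolog></topic>',
    '<topic id="t3"><title>Three</title><prolog><author>b</author>'
    '<critdates><revised modified="2014-01-15"/></critdates></prolog></topic>',
]

def updated_since(topics, cutoff):
    """Count topics per author whose revised date is on or after cutoff."""
    counts = Counter()
    for topic_xml in topics:
        prolog = ET.fromstring(topic_xml).find("prolog")
        modified = prolog.find("./critdates/revised").get("modified")
        if date.fromisoformat(modified) >= cutoff:
            counts[prolog.findtext("author")] += 1
    return counts

counts = updated_since(TOPICS, date(2014, 1, 1))
print(dict(counts))
```

The point is only that a metric is available once the metadata it depends on has been planned for and captured consistently.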

What metadata do you need to capture?

The metadata you need to capture depends on your content strategy. A good method is to start with how you’d like your users (external stakeholders) to experience their content and work backwards from there. For example, if you want localized content to display for users who are from a specific geographic location, then you need to build that in. If you want content to display differently for mobile devices, then you need to build that in.

Don’t forget about your authors when it comes to planning your metadata (it helps to think of them as internal stakeholders). Metadata can introduce some major efficiencies when planning, finding, authoring, and publishing content. A good CMS lets authors browse, search, and filter by subject matter, keyword, component, sub-component, or any other piece of metadata. Sometimes some of the metadata might be applied in the CMS itself rather than in the topics or map, so your metadata plan should include an understanding of what and how you’ll be able to leverage metadata using your CMS of choice.

However, at a minimum, think about including topic-level metadata (traditionally placed in the prolog element) that includes:

  • Author
  • Status of the content (for example, approved)
  • Date content was originally created
  • Date content was last updated
  • Version of product (if applicable)
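
Here is a minimal sketch of what that minimum set could look like in a prolog, together with a small Python function that reads it back. The topic, author name, dates, and product are hypothetical, and recording status in an `othermeta` element is one assumed option; your CMS may well track status for you instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic; all names, dates, and values are invented for illustration.
TOPIC = """
<concept id="daily_reports">
  <title>Daily reports</title>
  <prolog>
    <author>Jane Doe</author>
    <critdates>
      <created date="2013-05-01"/>
      <revised modified="2014-02-15"/>
    </critdates>
    <metadata>
      <prodinfo>
        <prodname>Example Product</prodname>
        <vrmlist><vrm version="2"/></vrmlist>
      </prodinfo>
      <othermeta name="status" content="approved"/>
    </metadata>
  </prolog>
  <conbody><p>Body text.</p></conbody>
</concept>
"""

def read_prolog(topic_xml):
    """Return the minimum topic-level metadata as a plain dictionary."""
    prolog = ET.fromstring(topic_xml).find("prolog")
    status = prolog.find("./metadata/othermeta[@name='status']")
    revisions = prolog.findall("./critdates/revised")
    return {
        "author": prolog.findtext("author"),
        "status": status.get("content") if status is not None else None,
        "created": prolog.find("./critdates/created").get("date"),
        # The last revised element is taken as the most recent update.
        "last_updated": revisions[-1].get("modified") if revisions else None,
        "product_version": prolog.find("./metadata/prodinfo/vrmlist/vrm").get("version"),
    }

meta = read_prolog(TOPIC)
print(meta)
```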

Conditional metadata

Conditional metadata is the most popular use of metadata. The conditional markers on your legacy content should be converted to attributes and their values so you can leverage profiling (publishing for specific users or output types). Not all attributes can work as profiling attributes, so make sure you do your homework when planning your metadata strategy. Also, not all attributes are available on all elements.

Fig: Conditional metadata on a step element

The .ditaval file goes hand in hand with conditional metadata. This is a processing file used on publish to show or hide content that is tagged with particular attribute/value pairs.

Fig: Ditaval file
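
To show how the two pieces fit together, the Python sketch below applies a small ditaval file to a list of steps. It is a simplified stand-in for what a publishing engine does, not the engine's actual code: it only handles exclude rules, and it assumes an element is dropped when every value in one of its profiling attributes is excluded. The step text and the `platform` values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical ditaval file: hide everything that is only for Linux.
DITAVAL = """
<val>
  <prop att="platform" val="linux" action="exclude"/>
</val>
"""

# Hypothetical steps with conditional metadata in the platform attribute.
STEPS = """
<steps>
  <step platform="windows"><cmd>Open Task Manager</cmd></step>
  <step platform="linux"><cmd>Run the status command</cmd></step>
  <step platform="linux windows"><cmd>Restart the agent</cmd></step>
  <step><cmd>Click Start</cmd></step>
</steps>
"""

def excluded_values(ditaval_xml):
    """Collect the excluded values per attribute, e.g. {'platform': {'linux'}}."""
    rules = {}
    for prop in ET.fromstring(ditaval_xml).findall("prop"):
        if prop.get("action") == "exclude":
            rules.setdefault(prop.get("att"), set()).add(prop.get("val"))
    return rules

def is_excluded(element, rules):
    """True when all values of at least one attribute are excluded."""
    for attribute, excluded in rules.items():
        values = (element.get(attribute) or "").split()
        if values and all(value in excluded for value in values):
            return True
    return False

def apply_filter(parent, rules):
    """Remove excluded descendants in place."""
    for child in list(parent):
        if is_excluded(child, rules):
            parent.remove(child)
        else:
            apply_filter(child, rules)

steps = ET.fromstring(STEPS)
apply_filter(steps, excluded_values(DITAVAL))
remaining = [step.findtext("cmd") for step in steps.findall("step")]
print(remaining)
```

In this sketch the Linux-only step is removed, while the step tagged for both platforms and the untagged step survive.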

Publishing metadata

You can use metadata to control the look and feel of your content. A simple example is for table header columns that should have vertical text rather than horizontal text. A piece of metadata can let the stylesheets identify when to display text with vertical alignment.

Fig: Table with metadata that indicates some text should be vertical

Best Practices

I’m the first one to admit that managing your metadata can become a bit of a nightmare. You need to keep an eye on best practices to make sure what you implement is scalable and manageable.

When you think metadata, think map

There are no two ways about it: trying to manage all of your metadata at the topic level is not always efficient. Think about putting some metadata in maps instead. This lets you change the metadata of a topic depending on the map it is referenced in, making the topic more versatile.

However, there are downsides to placing metadata in maps. It means you have to duplicate effort because every time you reference the same topic, you must specify the metadata again in each map, which could lead to inconsistencies. It also means that authors can’t necessarily easily see the metadata that might be important for them to know when using or modifying the topic.

Often, some metadata at the map level lets you leverage your content intelligently while the rest should stay in the topic. Each case is unique and you should define this as part of your content strategy, but some examples are shown below.

Keep in mind that metadata that is assigned in DITA topics can be supplemented or overridden by metadata that is assigned in a DITA map, so you can overlap metadata if needed but the map is (usually) boss. For details, see the DITA specification.

Fig: Map metadata using the topicmeta element
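
As a small illustration of map-level metadata, the Python sketch below references the same topic from two hypothetical maps, each assigning a different audience through `topicmeta`. The map titles and file name are invented, and the sketch deliberately leaves out how map and topic metadata are reconciled; the DITA specification defines that per element.

```python
import xml.etree.ElementTree as ET

# Two hypothetical maps that reference the same topic.
ADMIN_MAP = """
<map>
  <title>Administrator guide</title>
  <topicref href="daily_reports.dita">
    <topicmeta><audience type="administrator"/></topicmeta>
  </topicref>
</map>
"""

USER_MAP = """
<map>
  <title>User guide</title>
  <topicref href="daily_reports.dita">
    <topicmeta><audience type="user"/></topicmeta>
  </topicref>
</map>
"""

def map_audiences(map_xml, href):
    """Return the audience types a map assigns to one referenced topic."""
    types = []
    for ref in ET.fromstring(map_xml).iter("topicref"):
        if ref.get("href") == href:
            types += [aud.get("type") for aud in ref.findall("./topicmeta/audience")]
    return types

admin_view = map_audiences(ADMIN_MAP, "daily_reports.dita")
user_view = map_audiences(USER_MAP, "daily_reports.dita")
print(admin_view, user_view)
```

The topic file itself is untouched; only the map decides which audience it is published for.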

Keys and conkeyrefs

Some great alternatives to setting conditional or profiling attributes on elements are to use keys and conkeyrefs. These mechanisms take the control out of the topic and put it in the map or in a central location, where it belongs. When you start controlling your content from your map or from a central location, your content becomes both more versatile and more efficiently updated. For example, a topic could swap out some of its content depending on the map in which it is referenced. This can be useful for anything from a term or variable phrase to a table, graphic, or paragraph.

Fig: Use of keyref in a sentence

Fig: Defining key in map, where the keyword will replace the keyref in the paragraph above
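
The Python sketch below imitates, in a very simplified way, what a processor does with a key: it looks up the keyword defined in the map and drops it into the sentence where the keyref sits. The key name `product-name`, the product name, and the sentence are hypothetical, and the sketch assumes that the first definition of a key wins.

```python
import xml.etree.ElementTree as ET

# Hypothetical map that defines one key.
MAP = """
<map>
  <keydef keys="product-name">
    <topicmeta>
      <keywords><keyword>Example Product</keyword></keywords>
    </topicmeta>
  </keydef>
</map>
"""

# Hypothetical paragraph that refers to the key instead of hard-coding the name.
PARAGRAPH = '<p>Install <keyword keyref="product-name"/> before you continue.</p>'

def key_definitions(map_xml):
    """Map each key name to the keyword text defined for it."""
    definitions = {}
    for keydef in ET.fromstring(map_xml).iter("keydef"):
        text = keydef.findtext("./topicmeta/keywords/keyword")
        for key in keydef.get("keys", "").split():
            definitions.setdefault(key, text)  # assumed: first definition wins
    return definitions

def resolve(paragraph_xml, definitions):
    """Fill empty keyref elements with their key's text and return plain text."""
    paragraph = ET.fromstring(paragraph_xml)
    for element in paragraph.iter():
        key = element.get("keyref")
        if key in definitions and not (element.text or "").strip():
            element.text = definitions[key]
    return "".join(paragraph.itertext())

sentence = resolve(PARAGRAPH, key_definitions(MAP))
print(sentence)
```

Swap in a different map with another definition of the same key and the sentence changes without anyone touching the topic.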

Taxonomy/Subject Scheme

Using the subject scheme map, you can take your metadata to a whole new level. The subject scheme map is a way of introducing hierarchy into your classification or subject scheme, and then being able to leverage that hierarchy intelligently on publish. For example, you can create a subject scheme that defines two types of subjects: hardware and software. Each of these categories would be broken out into sub-categories. So hardware might include headsets, screens, and power cords. By connecting this hierarchical categorization to the topics and maps that hold your content, you can manipulate content at the lower level of categorization (for example, exclude all headsets content) or at the higher level (exclude all hardware content). It also lets you change the user experience of content for end users, so they can easily search through or browse these categories. And that’s just the beginning of what you can do with subject scheme maps.
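
The hardware and software example above can be sketched in a few lines of Python. The subject scheme below follows that example; the topic names and the way topics are tied to subjects (a plain dictionary) are assumptions made to keep the sketch short, not how a publishing engine stores them.

```python
import xml.etree.ElementTree as ET

# Subject scheme following the hardware/software example above.
SCHEME = """
<subjectScheme>
  <subjectdef keys="hardware">
    <subjectdef keys="headsets"/>
    <subjectdef keys="screens"/>
    <subjectdef keys="powercords"/>
  </subjectdef>
  <subjectdef keys="software"/>
</subjectScheme>
"""

# Hypothetical topics and the subject each one is classified under.
TOPICS = {
    "pairing_headset": "headsets",
    "cleaning_screen": "screens",
    "installing_app": "software",
}

def expand(scheme_xml, subject):
    """Return the subject plus every subject nested beneath it."""
    for node in ET.fromstring(scheme_xml).iter("subjectdef"):
        if node.get("keys") == subject:
            return {nested.get("keys") for nested in node.iter("subjectdef")}
    return {subject}

def remaining(topics, scheme_xml, excluded_subject):
    """List the topics left after excluding a subject and its sub-categories."""
    excluded = expand(scheme_xml, excluded_subject)
    return [name for name, subject in topics.items() if subject not in excluded]

without_headsets = remaining(TOPICS, SCHEME, "headsets")
without_hardware = remaining(TOPICS, SCHEME, "hardware")
print(without_headsets)
print(without_hardware)
```

Excluding headsets removes one topic; excluding hardware removes the headset and screen topics in one go, because the hierarchy knows they belong to hardware.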

For more information on subject scheme maps, see Joe Gelb’s presentation on this subject. Although he distinguishes metadata from taxonomy, this is really an arbitrary distinction. Think of taxonomy as a particular kind of metadata with a specific purpose.

Like any metadata effort, planning your taxonomy and subject scheme is essential. For example, identifying all installation content is probably not going to be useful to end users (who wants to see the installation topics for 40 products?) but grouping content by subcomponent could be essential. The trick is to determine what will be useful.

Summary: A careful, methodical approach to including metadata in your conversion project can help you leverage your XML in a way that can be both internally and externally powerful. Use your conversion project as a way to not only transfer your existing metadata to your XML or CMS, but to also enhance your metadata to ensure you have versatile and findable content.