Exact Topic Deduplication

Before You Begin

Before you deduplicate your structured content, make sure the files you want to compare have been converted successfully on the Documents page of your Migrate dashboard.

Procedure

  1. Go to Reuse Collections.
  2. Click Create to create a new collection to which you can add multiple files for comparison.
  3. Enter a name for the new collection, then click Ok.
  4. Click Set Warehouse to rename the folder in which the common topic files will be stored and from which they will be referenced.

    Enter a name for the warehouse folder, then click Ok.

  5. Go back to Documents.
  6. Select the successfully converted documents, then click the Collection dropdown and add the selected files to the collection you created earlier.
    The files will now be labeled appropriately under the Collection column.
  7. Go back to the Reuse Collections page and select your collection. The Collection Details should now list the files you added. Click Compare to start the deduplication.
  8. The progress bar and percentage show how far the comparison has progressed. When it completes, the page shows the Collection Details and a User Report with information about the comparison.
    Click Download to download the collection.

    Results

    The download contains the warehouse folder, a folder for each file in the collection, and the User Report shown on the Migrate dashboard, saved as an .html file.

    The User Report lists every warehoused topic, and its Topic References Created column lists the locations that now reference each one. The folder for each compared file contains the topics unique to that file. The warehouse folder contains all the deduplicated topics.
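To cross-check the User Report against the extracted download, the folder layout described above can be scanned with a short script. This is only a rough sketch under several assumptions that are not stated in this article: that the download has been extracted to a local directory, that maps and topics are XML files with `.ditamap`, `.dita`, or `.xml` extensions, and that references are written as relative `href="..."` attributes. The function name and its parameters are hypothetical and are not part of Migrate.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: references are written as href="..." attributes in XML files.
# The pattern captures the path part only and stops at any "#fragment".
HREF_PATTERN = re.compile(r'href="([^"#]+)')

# Assumption: these are the file extensions used for maps and topics.
XML_SUFFIXES = {".ditamap", ".dita", ".xml"}


def count_warehouse_references(download_root, warehouse_name):
    """Count references to each warehouse file from files outside the warehouse."""
    root = Path(download_root)
    warehouse = (root / warehouse_name).resolve()
    counts = Counter()
    for source in root.rglob("*"):
        if not source.is_file() or source.suffix not in XML_SUFFIXES:
            continue
        if warehouse in source.resolve().parents:
            continue  # skip the warehoused topics themselves
        text = source.read_text(encoding="utf-8", errors="replace")
        for href in HREF_PATTERN.findall(text):
            # Resolve the relative reference against the referencing file.
            target = (source.parent / href).resolve()
            if warehouse in target.parents:
                counts[target.name] += 1
    return counts
```

Calling `count_warehouse_references("my_download", "warehouse")` (both names are placeholders) returns a `Counter` keyed by warehouse file name. If the assumptions hold, the counts should be comparable to the entries in the Topic References Created column.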

    In the example used for the steps above, the User Report shows that the first warehoused topic is referenced five times across the collection.

    Here is the extracted topic that we will be referencing. It is stored in the warehouse folder of our download.

    Here are two instances of the structured content referencing the same warehoused topic successfully. We can also see both maps referencing other shared topics from the warehouse folder.

    We have successfully optimized our structured content by removing duplicate topics, centralizing the location of the common topics, and updating all references in the collection.
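For readers who want an intuition for what exact deduplication involves, the idea can be sketched as grouping topics whose content is identical and pointing every copy at one shared file. This is a conceptual illustration only, not Migrate's implementation: it assumes "exact" means the topic text is character-for-character identical, and the function name, the `warehouse` default, and the `topic_<digest>.dita` naming scheme are all hypothetical.

```python
import hashlib
from collections import defaultdict


def plan_warehouse(topics, warehouse="warehouse"):
    """Map each duplicated topic path to one shared warehouse path.

    `topics` maps a topic path (str) to that topic's text content.
    Topics whose content occurs only once are left out of the plan.
    """
    # Group paths by a digest of their content; identical text gives identical digests.
    groups = defaultdict(list)
    for path, content in topics.items():
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        groups[digest].append(path)

    plan = {}
    for digest, paths in groups.items():
        if len(paths) < 2:
            continue  # unique topic: it stays in its own file's folder
        # Hypothetical naming scheme: one warehouse file per distinct content.
        shared = f"{warehouse}/topic_{digest[:12]}.dita"
        for path in paths:
            plan[path] = shared
    return plan


if __name__ == "__main__":
    example = {
        "guide/intro.dita": "<topic>Safety notice</topic>",
        "manual/intro.dita": "<topic>Safety notice</topic>",
        "manual/setup.dita": "<topic>Setup steps</topic>",
    }
    for original, shared in sorted(plan_warehouse(example).items()):
        print(original, "->", shared)
```

In this sketch the two `intro.dita` entries map to the same warehouse path, while `setup.dita` is absent from the plan because its content is unique. A real tool would also need to rewrite the references in each map, which the sketch leaves out.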