Data Mining and Extracting Content From Desktop Publishing File Formats
Reverse Engineering To Extract Content From Desktop Publishing File Formats
Extract content from desktop publishing (DTP) file formats such as Adobe InDesign and Microsoft Publisher with the Markzware PageZephyr desktop search engine.
Technology for data mining, data harvesting and examining text
Since 1995, Markzware has been a household name in graphic arts software. Markzware invented a quality control system for printing known as ‘preflighting’ (United States Patent No. 5,963,641). Since its founding as a sole proprietorship in 1992, Markzware has been well regarded and has built significant goodwill in the printing and publishing industries and among its customers.
For over 20 years, Markzware has reverse engineered many file formats. Markzware specializes in decoding original source formats such as Adobe InDesign (.indd), QuarkXPress (.qxd) and Microsoft Publisher (.pub). These are the native file formats of desktop publishing (DTP) applications.
Computers, operating systems, applications and the documents they save in proprietary formats have a limited lifespan. As they become obsolete, the data in those documents becomes inaccessible, because a document is often readable only by the system and application used to create it. PageZephyr is the solution that assures access to proprietary documents indefinitely.
Ensuring backward compatibility and opening older proprietary file types is difficult, because software manufacturers require users to ‘upgrade to the latest version’ to access their data. Markzware’s PageZephyr technology resolves both forward- and backward-compatibility problems. PageZephyr users can access business assets and content created in QuarkXPress versions 3 through 9, Microsoft Publisher 2002 through 2010, and Adobe InDesign CS through CS5.
PageZephyr converts corporate assets into an accessible format that is easy to reuse and repurpose, without the need for the native (proprietary) application.
PageZephyr technology performs a digital forensics process on each file, then converts the proprietary data into a common plain-text or RTF format. It also indexes folders and volumes, so the user can search for combinations of keywords or key phrases inside proprietary file types.
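To illustrate the general idea of indexing a folder and searching for keyword combinations, here is a minimal Python sketch. It is not PageZephyr's implementation, which is not described here. It assumes the proprietary files have already been exported to plain-text (.txt) files, and the function names `tokenize`, `build_index` and `search_all` are hypothetical names chosen for this example.

```python
import re
from collections import defaultdict
from pathlib import Path


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def build_index(root):
    """Map each word to the set of .txt files under `root` that contain it.

    Assumption: the DTP content was already extracted to UTF-8 text files.
    """
    index = defaultdict(set)
    for path in Path(root).rglob("*.txt"):
        text = path.read_text(encoding="utf-8", errors="replace")
        for word in set(tokenize(text)):
            index[word].add(path)
    return index


def search_all(index, query):
    """Return the sorted list of files that contain every keyword in `query`."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)
```

For example, after building an index with `build_index("exported_text")` (a hypothetical folder name), `search_all(index, "print layout")` returns only the files that contain both words. Because the tokenizer is Unicode-aware, the same approach also works for text in non-Latin scripts. Exact phrase matching would need an additional step, such as checking the original text of each hit.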
With PageZephyr, extracted content can be edited and customized to create entirely new content. Re-creating and repurposing what already exists saves significant time and expense while increasing the value of the content. For example, a user can extract relevant stories on similar topics from a single file type (e.g., Adobe InDesign, .indd) or from multiple file types (.indd + .qxd + .pub), then combine and edit those stories.
PageZephyr’s ability to mine data and harvest text is not limited by the language in which the document was written. The following example shows an Arabic DTP layout with a keyword highlighted in the text, found without opening the native Adobe InDesign document:

PageZephyr displaying an Adobe InDesign document. All the results you need are at your fingertips with Markzware’s solution.
Additionally, PageZephyr can help alert the intelligence community to suspect material contained in highly secure, proprietary and encrypted DTP layout files, such as those of Adobe InDesign, QuarkXPress or Microsoft Publisher.
We are eager to license our file ‘reader’ technology or to partner with a larger company. If interested, please contact Markzware.
Please visit the PageZephyr product page for more info and access to the free 30-day, fully functional PageZephyr demonstration version. We are not aware of any products competing with PageZephyr, or with Markzware’s Desktop Publishing data conversion tools.