PageZephyr Search® Deep Search Technology
Detection of content from unsearchable
Desktop Publishing file types for
Counterterrorism, Intelligence and Law Enforcement.
By Patrick Marchese, CEO/President
Markzware Company Introduction
Markzware, a company based in California, USA, with offices in South Holland, The Netherlands, is a software publisher with expertise in the worldwide graphics, printing and publishing industries. Markzware was founded in 1992 by Patrick Marchese and Ron Crandall and incorporated in 1995. The founders started with a dream of a superior quality control product for designers, publishers and printers, a product that created the preflight industry within the graphic arts sector.
In the fall of 1999, Markzware received a U.S. patent covering the examining, verifying, correcting and approving of electronic documents prior to printing, transmission or recording. Markzware’s flagship product, FlightCheck®, is the embodiment of that patent. FlightCheck dominates the industry it created, serving thousands of professional creative design agencies, print shops and publishers worldwide.
Markzware’s affordable yet extremely powerful Adobe InDesign, QuarkXPress and Microsoft Publisher conversion tools are very popular products, built on technology developed with the company’s unrivaled skill in decoding these closed, proprietary file formats. This core technology was created to index and “read” the complicated and highly encrypted files produced by these page layout applications. Markzware’s search technology has the ability to “x-ray” a document and tell the user exactly what is inside.
During the past two years, Markzware has refocused the company to leverage its technology in the area of content search and distribution. PageZephyr Search is highly innovative because it can index and search content within the search-resistant, proprietary file types of Adobe InDesign. It makes it possible to perform digital forensic string search on vast archives of these previously unsearchable files, with obvious possible benefits for the Intelligence, eForensics and eDiscovery communities.
THE PROBLEM –
How can you prevent malefactors from using commonly available, proprietary document formats to evade digital forensic string search (DFSS) detection?
There are a number of commonly available Desktop Publishing (DTP) applications whose file formats are proprietary in nature and cannot be queried by the DFSS search engines in use today. This is to say that the text content is not “in the clear,” rather it is encoded in a number of different schemes. In some cases, in addition to the proprietary encoding, the document files are encrypted by the application during the File>Save process.
The most well known and commonly used of these applications are:
- Microsoft Publisher
- Adobe InDesign
- Adobe PageMaker
Most solutions cannot open, scan or convert INDD files; Markzware’s PageZephyr Search can.
Since these encoded formats are not readable by DFSS engines, any that are encountered in the Search Universe are skipped over and any evidence contained therein goes undiscovered. The nature of these DTP applications suggests that they would not necessarily be used for common communication, rather they lend themselves to elaboration, detail and planning documents, making them high-value intelligence resources.
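The effect described above can be illustrated with a minimal sketch. It assumes a made-up "document" whose text is stored as zlib-compressed UTF-16; this is only a stand-in for the real, proprietary DTP encodings, which are not described here. The function names and the sample sentence are hypothetical.

```python
import zlib


def naive_string_search(raw: bytes, keyword: str) -> bool:
    """Roughly what a plain-text string search does: look for the keyword's bytes as-is."""
    return keyword.encode("utf-8") in raw


def decode_story(raw: bytes) -> str:
    """Stand-in decoder (assumption): zlib over UTF-16, NOT any real DTP format."""
    return zlib.decompress(raw).decode("utf-16-le")


# A "document" whose text is stored encoded rather than in the clear.
plain_text = "Deliver the crates to the harbor warehouse on Friday."
encoded_document = zlib.compress(plain_text.encode("utf-16-le"))

# Searching the raw bytes misses the keyword; searching the decoded text finds it.
found_without_decoding = naive_string_search(encoded_document, "warehouse")
found_after_decoding = "warehouse" in decode_story(encoded_document)
```

The keyword is present in the document, but a search over the raw bytes does not see it. Only a tool that understands the encoding can surface the match.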
Exacerbating the problem is the fact that documents created by some older versions of these products cannot be read, even by the contemporary versions of the applications. It is not an unreasonable stretch to suppose that malefactors, knowing the limitations of current DFSS and basic commercial technology, may be obtaining and using older, legacy versions to detail their plans.
Can past experience shed any light on this?
Similar problems have been mitigated through the use of advanced technology. For instance, it is common knowledge that terrorists have hidden messages within seemingly harmless electronic images, family photos, vacation pictures and so forth, a technique known as steganography. Fortunately, technology for detecting steganography stepped into the breach to help. [1]
In another example, the threat of cyber attacks via so-called “zero-day” tactics is well known: “A zero-day vulnerability occurs when a flaw in software code is discovered and code exploiting the flaw appears before a fix or patch is available. Once a working exploit of the vulnerability has been released into the wild, users of the affected software will continue to be compromised until a software patch is available or some form of mitigation is taken by the user.” [2]
“The ‘File Format Vulnerabilities’ continue to be the first choice for attackers to conduct zero-day and targeted attacks.” [2]
Yet the true “file format vulnerability” may be that most intelligence agencies cannot easily open, index or otherwise search documents in proprietary Desktop Publishing (DTP) file formats. Indeed, the problem of hidden information is well known; a recent publication from the Office of the Director of National Intelligence (ODNI) states, “This [XML Schema] is provided to address the shortcomings of PowerPoint[’s] …potential security issues introduced by Microsoft’s OLE technology, which introduces the potential for hidden classified information in MS Office file formats.” [3] OLE technology is also present in Microsoft Publisher.
Source: Intelligence Community Metadata Standard for Publications (IC MSP) v4.1 – IP XML Schema PUBS-XML – http://www.dni.gov/ICIS/dsca/pubs/xml/pubs_xml.htm
How widespread is this?
Since these closed formats cannot be searched except by deliberate and extreme measures, the number of these documents containing content of a criminal nature remains largely unknown, as do the traffic volume and the extent of usage. One can posit that the availability and popularity of DTP applications, together with the sheer volume of documents created, suggest a high probability of nefarious use.
QuarkXPress (.QXP), once the main DTP application with a 95% market share, has recently been eclipsed by Adobe InDesign, which, according to Inc Magazine, now holds about a 75% share of the market. [4]
Jurgen Kurz, Senior VP of Desktop Systems for Quark, stated, “[Quark] estimates that well over three million copies of QuarkXPress have been licensed over the years, with more than five billion QuarkXPress documents in circulation worldwide.” [5] Add Adobe and similar applications, and the volume could be twice that or more.
Source: Macworld http://www.macworld.com/article/1050982/quarkevent.html
In the Middle East, Arabic, Hebrew and Indic languages have long been a problem for these page layout applications. Third-party “helper plugins” were required to make Arabic and Hebrew flow and display correctly.
Today in the Middle East, as worldwide, many are switching to Adobe InDesign, which has largely solved these text flow issues. Yet many thousands of users in the Middle East and elsewhere are still creating documents with older versions of these applications. In some cases, these legacy files cannot be read or imported by the contemporary versions of the creator application.
The conventional solution to searching these files requires the installation and maintenance of countless legacy applications and versions along with the antiquated operating systems required to run them. Even with such an elaborate management arsenal, each suspect file must be manually opened before searching. Considering the existing and ever-growing file volumes, it is a daunting, if not impossible, task.
Impossible for most, but not for Markzware’s code breakers…
THE SOLUTION –
PageZephyr Search for Indexing and Searching content from proprietary file formats.
The risk that malefactors are passing on sensitive information right under our noses, using Desktop Publishing layout software, is far from hypothetical. That possibility is particularly troubling considering that a technology solution is readily at hand: Markzware’s content search technology, as demonstrated by PageZephyr Search.
What could happen?
Imagine your search universe is revealing hundreds of InDesign files accumulating in a suspected terrorist group’s arena. How do you quickly evaluate the contents of these legacy Desktop Publishing files?
Markzware’s PageZephyr Search can scan, index and report on encrypted and proprietary file formats in Adobe InDesign CS through CC.
Counterintelligence is time-sensitive. We need to know in real time if there is a potential threat within any search universe, especially if any file from among thousands may contain detailed plans, secret schematics or information bearing on national security. PageZephyr Search saves time with its quick start, tips and shortcuts.
A thorough investigation requires that all data that comes across your listening post be indexed. PDF, Microsoft Word, RTF, email and various other text-based file formats are readily accessible and are not the problem. These document types are either open standards or have file structures and formats that leave the text content largely “in the clear,” and thus they are simple to index, search and paste into XML or other forms.
The current DFSS and Enterprise tools used by government security agencies, law enforcement and forensic scientists do a good job of handling the text-based document formats, but they are ill-equipped to manage the search deficits posed by these closed format files.
PageZephyr demonstrates how Markzware’s unique search technology acts as a filter. It enables your agency to scan and reveal the Matching String and Object Identification path, as well as provide the entire content reference material relating to the Matching String for instant contextual analysis. PageZephyr Search can search InDesign without the need to open the document in the authoring application.
PageZephyr displaying an Adobe InDesign document.
All the results you need are at your fingertips with Markzware’s solution. Once a Matching String is discovered, all or any portion of the document contents can be copied and pasted to RTF, preserving the original formatting, or to Plain Text for use in XML-based analytic databases and to comply with the Stewardship provisions of the Intelligence Community Directive 501.
With Markzware PageZephyr Search, you can copy/paste content to formatted RTF or Plain Text.
Once activated, PageZephyr Search continues to watch and index the designated Search Arena. Now these previously closed, encrypted and unsearchable documents are freed and available for digital forensic string search.
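The idea of continuously watching and indexing a search arena can be sketched as follows. This is a simplified polling model, not a description of how PageZephyr Search or Spotlight work internally. The class name `ArenaIndex` and the `extract_text` callback, which stands in for a real format decoder, are assumptions made for illustration.

```python
import os
from typing import Callable, Dict, List


class ArenaIndex:
    """Keeps extracted text for every .indd file under a folder, refreshed on change."""

    def __init__(self, root: str, extract_text: Callable[[str], str]):
        self.root = root
        self.extract_text = extract_text    # hypothetical decoder supplied by the caller
        self.mtimes: Dict[str, float] = {}  # path -> modification time when last indexed
        self.texts: Dict[str, str] = {}     # path -> extracted text

    def refresh(self) -> List[str]:
        """Index new or modified files, forget deleted ones, return the paths (re)indexed."""
        seen = set()
        updated = []
        for folder, _dirs, files in os.walk(self.root):
            for name in files:
                if not name.lower().endswith(".indd"):
                    continue
                path = os.path.join(folder, name)
                seen.add(path)
                mtime = os.path.getmtime(path)
                if self.mtimes.get(path) != mtime:
                    self.texts[path] = self.extract_text(path)
                    self.mtimes[path] = mtime
                    updated.append(path)
        for path in set(self.mtimes) - seen:
            del self.mtimes[path]
            del self.texts[path]
        return updated

    def search(self, needle: str) -> List[str]:
        """Return the indexed paths whose text contains needle, ignoring case."""
        return sorted(p for p, t in self.texts.items() if needle.lower() in t.lower())
```

Calling `refresh()` on a timer, or from a file-system notification, keeps the index current. Only new or changed files are decoded again, which matters when the arena holds thousands of documents.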
Markzware’s search technology is a product of years of independent investigation, engineering and development. Users of Markzware products and technology can rely on this independence for continued support and development that is not tied or subject to any 3rd party relationships or licensing.
PageZephyr Search, as demonstrated, is a commercial desktop product for the Apple Macintosh, utilizing the Apple OS X Spotlight search engine. The Markzware search technology on which it is based is in its formative stages as an API and can be developed and adapted to meet the integration needs of the search solution used in your organization. At present, it consists of C++ readerLib libraries and header files that can be used to integrate into physical systems.
A JNI layer could be written to interface Java code with our C++ based readerLib library. Once that integration work is done, then it would be a matter of calling readerLib APIs to process InDesign files, run queries on the content, and process the results.
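The three-step flow described above (process files, run queries, process results) might look like the sketch below once a binding exists. The readerLib API is not documented here, so every name in it (`DocumentReader`, `load`, `query`, `InMemoryReader`, `run_search`) is hypothetical. The in-memory reader stands in for the native library so that the sketch runs on its own, and Python is used only for brevity; in a Java host, the JNI layer would fill the role of the binding.

```python
from typing import Dict, Iterable, List, Protocol, Tuple


class DocumentReader(Protocol):
    """Hypothetical shape of a binding around a native reader library."""

    def load(self, path: str) -> None: ...

    def query(self, needle: str) -> List[Tuple[str, str]]: ...


class InMemoryReader:
    """Stand-in reader (assumption) so the flow can run without any native code."""

    def __init__(self, fake_files: Dict[str, str]):
        self._fake_files = fake_files
        self._loaded: Dict[str, str] = {}

    def load(self, path: str) -> None:
        self._loaded[path] = self._fake_files[path]

    def query(self, needle: str) -> List[Tuple[str, str]]:
        return [(p, t) for p, t in self._loaded.items() if needle.lower() in t.lower()]


def run_search(reader: DocumentReader, paths: Iterable[str], needle: str) -> List[str]:
    for path in paths:                                   # 1. process the files
        reader.load(path)
    hits = reader.query(needle)                          # 2. run the query
    return [f"{path}: {text}" for path, text in hits]    # 3. process the results
```

Keeping the host code behind a narrow interface like this means the calling tool would not need to change if the native binding is later replaced or extended.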
Development of an API for Windows and Linux is also in the formative stage, to meet the integration needs for those operating systems.
PageZephyr Search® is a content search application for highly encrypted and protected file formats in Adobe InDesign. PageZephyr Search was developed using Markzware’s Common Reader Architecture (CRA) technology, the high-performance technology at the heart of Markzware’s current and future products.
Content-intense professions that specialize in managing, locating and re-using large volumes of specific data will substantially benefit from this new Markzware application. PageZephyr Search is especially valuable in fields such as advertising, publishing and printing, manufacturing, finance, legal (eDiscovery / eForensics), information, governmental agencies and intelligence gathering.
PageZephyr Search is the only solution that indexes and locates content created from proprietary or encoded formats, without the need for the original authoring application. PageZephyr Search is compatible with .INDD (Adobe InDesign CS through CC), and its index can be extended to both attached volumes and networked volumes, so that it covers your entire search universe.
PageZephyr Search allows searching of content through user-selected keywords and strings. Operationally, PageZephyr Search first identifies the matching string located within the documents in the search arena. The results can be filtered by file size, modification date or creation date. The matching string is highlighted in the text displayed in PageZephyr Search, which includes the entire text of the specific story where the match is located. The text displayed may be copied and pasted for editing, either as formatted RTF or as Plain Text.
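The search behavior described above (match a string, filter the results, show the whole story with the match highlighted) can be sketched as follows. It assumes the story text has already been extracted from the files. The `Story` and `Match` types, the `[[...]]` highlight markers and the filter parameter names are illustrative choices, not PageZephyr Search's own. Only size and modification-date filters are shown; a creation-date filter would follow the same pattern.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Story:
    file_path: str
    file_size: int   # bytes
    modified: date
    text: str        # full text of one story, already extracted


@dataclass
class Match:
    file_path: str
    highlighted_story: str


def search(stories: List[Story], needle: str,
           max_size: Optional[int] = None,
           modified_after: Optional[date] = None) -> List[Match]:
    """Return one Match per story containing needle, with every occurrence marked."""
    if not needle:
        return []
    pattern = re.compile(re.escape(needle), re.IGNORECASE)
    results = []
    for story in stories:
        if max_size is not None and story.file_size > max_size:
            continue
        if modified_after is not None and story.modified < modified_after:
            continue
        if pattern.search(story.text):
            marked = pattern.sub(lambda m: f"[[{m.group(0)}]]", story.text)
            results.append(Match(story.file_path, marked))
    return results
```

Returning the entire story rather than a short snippet mirrors the point made above: the analyst sees the match in its full context.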
This video demonstrates content search with PageZephyr Search and advanced search options.
Markzware has nearly two decades of comprehensive, extremely hard-to-obtain knowledge of these closed, proprietary DTP file formats. No other company can rival Markzware’s code-breaking capability, DTP file knowledge and development evolution.
The case for action…
- Criminal, terrorist and other nefarious use of proprietary, closed and encrypted Desktop Publishing (DTP) file formats poses a threat to National Security.
- The content of these documents cannot be openly indexed, scanned, searched or exported by the digital forensic search tools available today.
- The current lack of a comprehensive DFSS capability means no one can quantify the risk, but the widespread distribution of these products and the volume of existing documents suggest the risk is real and present.
- Markzware’s search technology can help minimize this national security threat by:
  - Indexing and searching this content
  - Revealing the matching strings together with all the relevant content and without the native application
  - Providing the object identification path of the suspect document
  - Delivering these critical services today, with PageZephyr Search on the Macintosh desktop
  - Developing a tool kit for integrating the search technology into the DFSS tools in use today and in the future
For more information or to schedule a demonstration of PageZephyr Search, please contact the CEO of Markzware, Mr. Patrick Marchese. Write to: email@example.com or telephone 949-756-5100 Ext. 200
- Markzware Patent Information is located on the Markzware Patent page.
- Copyright Information is located on the Markzware License Agreement page.
1805 E Dyer Rd Ste 101, Santa Ana, CA 92705