PageZephyr® Content Search Technology for Counterterrorism, Intelligence and Law Enforcement
PageZephyr® Deep Search Technology
Detection and extraction of content from unsearchable
Desktop Publishing file types for
Counterterrorism, Intelligence and Law Enforcement.
By Patrick Marchese CEO/President
Markzware Company Introduction
Problem: Prevention of search circumvention
The Solution
Technology Implementation
PageZephyr Brief
Summary
Markzware Company Introduction
Markzware, a California-based company with offices in South Holland, The Netherlands, is a software publisher with expertise in the worldwide graphics, printing and publishing industries. Markzware was founded in 1992 by Patrick Marchese and Ron Crandall and incorporated in 1995. The founders started with a dream of a superior quality control product for designers, publishers and printers, and that product created the preflight industry within the graphic arts sector.
In the fall of 1999, Markzware received a U.S. patent covering the examining, verifying, correcting and approving of electronic documents prior to printing, transmission or recording. Markzware’s flagship product, FlightCheck®, is the embodiment of that patent. FlightCheck dominates the industry it created, serving thousands of professional creative design agencies, print shops and publishers worldwide.
Markzware’s affordable yet extremely powerful Adobe InDesign, QuarkXPress and Microsoft Publisher conversion tools are very popular products, built on technology developed through the company’s unrivaled skill in decrypting and decoding these closed, proprietary file formats. This core technology was created to index and “read” the complicated and highly encrypted files of these page layout applications. Markzware’s Common Reader Architecture has the ability to “x-ray” a document and tell the user exactly what is inside.
During the past two years, Markzware has refocused the company to leverage its technology in the area of content search, extraction and distribution. The recently released PageZephyr 2.0 is highly innovative because it can index, search and extract content within search-resistant, proprietary file types such as Adobe InDesign, QuarkXPress and Microsoft Publisher. It makes it possible to perform digital forensic string search on vast archives of these previously unsearchable files, with obvious possible benefits for the Intelligence, eForensics and eDiscovery communities.
THE PROBLEM –
How can you prevent malefactors from using commonly available, proprietary document formats to evade digital forensic string search (DFSS) detection?
There are a number of commonly available Desktop Publishing (DTP) applications whose file formats are proprietary in nature and cannot be queried by the DFSS search engines in use today. This is to say that the text content is not “in the clear,” rather it is encoded in a number of different schemes. In some cases, in addition to the proprietary encoding, the document files are encrypted by the application during the File>Save process.
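To make the point about text that is not “in the clear” concrete, here is a minimal sketch in Python. It does not reproduce any real DTP file format: UTF-16 encoding and a single-byte XOR are assumed stand-ins for the proprietary encoding and encryption schemes described above, and the function names are hypothetical. It shows only that a byte-level string search finds the term in plain text and misses it once the same text is stored in another encoding.

```python
def naive_string_search(data: bytes, needle: str) -> bool:
    """Byte-level search: succeeds only if the needle is stored as plain UTF-8."""
    return needle.encode("utf-8") in data


def xor_bytes(data: bytes, key: int = 0x5A) -> bytes:
    """Toy obfuscation (a stand-in only, not a real DTP scheme)."""
    return bytes(b ^ key for b in data)


TEXT = "The quarterly shipment schedule is attached."
NEEDLE = "schedule"

# The same sentence stored three ways.
samples = {
    "in_the_clear": TEXT.encode("utf-8"),
    "utf16_encoded": TEXT.encode("utf-16-le"),
    "xor_obfuscated": xor_bytes(TEXT.encode("utf-8")),
}

# A raw byte search only hits the plain-text sample.
raw_hits = {name: naive_string_search(blob, NEEDLE) for name, blob in samples.items()}

# Once a reader knows how to decode each format, the term is found again.
decoded_hits = {
    "utf16_encoded": NEEDLE in samples["utf16_encoded"].decode("utf-16-le"),
    "xor_obfuscated": NEEDLE in xor_bytes(samples["xor_obfuscated"]).decode("utf-8"),
}

print(raw_hits)
print(decoded_hits)
```

The search term is recoverable in every case, but only by a tool that knows how the file stores its text, which is the gap a format-aware reader is meant to close.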
The most well known and commonly used of these applications are:
- QuarkXPress
- Microsoft Publisher
- Adobe InDesign
- Adobe PageMaker
Unknown file formats cannot be opened, scanned or converted with most solutions, yet can be with Markzware PageZephyr.
Since these encoded formats are not readable by DFSS engines, any that are encountered in the Search Universe are skipped over and any evidence contained therein goes undiscovered. The nature of these DTP applications suggests that they would not necessarily be used for common communication, rather they lend themselves to elaboration, detail and planning documents, making them high-value intelligence resources.
Exacerbating the problem is the fact that documents created by some older versions of these products cannot be read, even by the contemporary versions of the applications. It is not an unreasonable stretch to suppose that malefactors, knowing the limitations of current DFSS and basic commercial technology, may be obtaining and using older, legacy versions to detail their plans.
Can past experience shed any light on this?
Similar problems have been mitigated through the use of advanced technology. For instance, it is common knowledge that terrorists have used steganography to hide messages within seemingly harmless electronic images, family photos, vacation pictures and so forth. Fortunately, the technology of steganalysis, the detection of such hidden content, stepped into the breach to help [1].
[1] Source: http://www.continuitycentral.com/news02584.htm
In another example, the threat of cyber attacks via so-called “zero-day” tactics is well known: “A zero-day vulnerability occurs when a flaw in software code is discovered and code exploiting the flaw appears before a fix or patch is available. Once a working exploit of the vulnerability has been released into the wild, users of the affected software will continue to be compromised until a software patch is available or some form of mitigation is taken by the user.” [2]
“The ‘File Format Vulnerabilities’ continue to be the first choice for attackers to conduct zero-day and targeted attacks.” [2]
[2] Source: http://www.sans.org/top-cyber-security-risks/
Yet the true “File Format Vulnerability” may be that most intelligence agencies cannot easily open, index or otherwise search proprietary DTP (Desktop Publishing) file format documents. Indeed, the problem of hidden information is well known; a recent publication from the Office of the Director of National Intelligence (ODNI) states, “This [XML Schema] is provided to address the shortcomings of PowerPoint[’s] …potential security issues introduced by Microsoft’s OLE technology, which introduces the potential for hidden classified information in MS Office file formats” [3], a technology that is also present in Microsoft Publisher.
[3] Source: Intelligence Community Metadata Standard for Publications (IC MSP) v4.1 – IP XML Schema PUBS-XML – http://www.dni.gov/ICIS/dsca/pubs/xml/pubs_xml.htm
How widespread is this?
Since these closed formats cannot be searched except by deliberate and extreme measures, the number of these documents containing content of a criminal nature remains largely unknown, as do the traffic volume and the extent of usage. One can posit that the availability and popularity of DTP applications, together with the sheer volume of documents created, suggest a high probability of nefarious use.
QuarkXPress (.QXP), once the main DTP application with a 95% market share, has recently been eclipsed by Adobe InDesign, which, according to Inc. Magazine, now holds about a 75% share of the market [4].
[4] Source: http://www.inc.com/magazine/20100401/can-quark-turn-the-corner.html
Jurgen Kurz, Senior VP of Desktop Systems for Quark, stated, “[Quark] estimates that well over three million copies of QuarkXPress have been licensed over the years, with more than five billion QuarkXPress documents in circulation worldwide.” [5] Coupled with documents from Adobe and similar applications, the volume could be twice that or more.
[5] Source: Macworld http://www.macworld.com/article/50982/2006/05/quarkevent.html
In the Middle East, Arabic, Hebrew and Indic languages have long been a problem for these page layout applications. Third-party “helper plugins” were required to make Arabic and Hebrew flow and display correctly.
Today in the Middle East, as worldwide, many are switching to Adobe InDesign, which has largely solved these text flow issues. Yet many thousands of users in the Middle East and elsewhere are still creating documents with older versions of these applications. In some cases, these legacy files cannot be read or imported by the contemporary versions of the creator application.
The conventional solution to searching these files requires the installation and maintenance of countless legacy applications and versions along with the antiquated operating systems required to run them. Even with such an elaborate management arsenal, each suspect file must be manually opened before searching. Considering the existing and ever-growing file volumes, it is a daunting, if not impossible, task.
Impossible for most, but not for Markzware’s code breakers…
THE SOLUTION –
PageZephyr for Indexing, Searching and Extracting content from proprietary file formats.
The risk that malefactors are passing on sensitive information right under our noses using Desktop Publishing layout software is far from hypothetical. That possibility is particularly troubling considering there is a technology solution readily at hand: Markzware’s Common Reader technology, as demonstrated by PageZephyr.
What could happen?
Imagine your search universe is revealing hundreds of QuarkXPress v4.11 files accumulating in a suspected terrorist group’s arena. How do you quickly evaluate the contents of these legacy Desktop Publishing files?
Some of the encrypted and proprietary file formats that Markzware PageZephyr can scan, index and report include QuarkXPress 4.0 to 8.x, Adobe InDesign CS to CS5, and Microsoft Publisher.
Counterintelligence is time-sensitive. We need to know in real-time if there is a potential threat within any search universe; especially if any file, from among thousands, may contain detailed plans, secret schematics or national security issues. PageZephyr saves time with its quick start, tips and shortcuts.
A thorough investigation requires that all data that comes across your listening post be indexed. PDF, Microsoft Word, RTF, email and various other text-based file formats are readily accessible and are not the problem. These document types are either open standards or their file structures and formats leave the text content largely “in the clear” and thus, they are simple to index, search and extract into XML or other forms.
The current DFSS and Enterprise tools used by government security agencies, law enforcement and forensic scientists do a good job handling the text-based document formats, but they are ill-equipped to manage the search deficits posed by these closed format files.
PageZephyr demonstrates how Markzware’s unique Common Reader Architecture acts as a filter. It enables your agency to scan and reveal the Matching String and Object Identification path, as well as provide the entire content reference material relating to the Matching String for instant contextual analysis. PageZephyr does all this without the need to open the document in the authoring application.
PageZephyr displaying an Adobe InDesign document.
All the results you need are at your fingertips with Markzware’s solution.
Now that a Matching String is discovered, all or any portion of the document contents can be exported to RTF, preserving the original formatting, or to Plain Text for use in XML based analytic databases and to comply with the Stewardship provisions of the Intelligence Community Directive 501.
With Markzware PageZephyr, you can copy/paste content from the Story Board or export content in formatted RTF or Plain Text.
Once activated, PageZephyr continues to watch and index the designated Search Arena. Now these previously closed, encrypted and unsearchable documents are freed and available for digital forensic string search.
Technology Implementation
Markzware’s Common Reader Architecture is a product of years of independent investigation, engineering and development. Users of Markzware products and technology can rely on this independence for continued support and development that is not tied or subject to any 3rd party relationships or licensing.
PageZephyr, as demonstrated, is a commercial desktop product for the Apple Macintosh that uses the Mac OS X Spotlight search engine. The Markzware Common Reader Architecture (CRA), on which it is based, is in its formative stages as an API and can be developed and adapted to meet the integration needs of the search solution used in your organization. At present, it consists of C++ readerLib libraries and header files that can be used to integrate into physical systems.
A JNI layer could be written to interface Java code with our C++ based readerLib library. Once that integration work is done, then it would be a matter of calling readerLib APIs to process InDesign, QuarkXPress or Publisher files, run queries on the content, and process the results.
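The three steps named above (process the files, run queries on the content, process the results) can be sketched as follows. This is an illustration of the workflow shape only, written in Python rather than Java/JNI. The actual readerLib API is not described in this paper, so every name here (`LayoutReader`, `read_stories`, `Story`, `Match`, `InMemoryReader`) is a hypothetical placeholder, and the reader is an in-memory stand-in rather than a binding to the native library.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Protocol


@dataclass
class Story:
    object_path: str  # hypothetical identifier for where the text lives in the layout
    text: str


@dataclass
class Match:
    file_path: str
    object_path: str
    story_text: str


class LayoutReader(Protocol):
    """Hypothetical interface a native reader binding might expose."""

    def read_stories(self, file_path: str) -> Iterable[Story]: ...


class InMemoryReader:
    """Stand-in reader that returns canned stories instead of parsing real files."""

    def __init__(self, documents: Dict[str, List[Story]]) -> None:
        self._documents = documents

    def read_stories(self, file_path: str) -> Iterable[Story]:
        return self._documents.get(file_path, [])


def search_files(reader: LayoutReader, file_paths: Iterable[str], query: str) -> List[Match]:
    """Process each file, run the query on its content, and collect the results."""
    needle = query.casefold()
    matches: List[Match] = []
    for path in file_paths:
        for story in reader.read_stories(path):
            if needle in story.text.casefold():
                matches.append(Match(path, story.object_path, story.text))
    return matches


demo_reader = InMemoryReader({
    "a.indd": [Story("page1/frame2", "Spring Catalog draft")],
    "b.qxp": [Story("page3/frame1", "Invoice template")],
})
found = search_files(demo_reader, ["a.indd", "b.qxp", "missing.pub"], "catalog")
for match in found:
    print(match.file_path, match.object_path)
```

Keeping the reader behind a small interface means the host search system depends only on that interface, so a native binding could later replace the stand-in without changing the query or result-handling code.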
Development of an API for Windows and Linux is also in the formative stage to meet the integration needs for those operating systems.
PageZephyr Brief
PageZephyr® is a content search and extraction engine for highly encrypted and protected file formats such as Adobe InDesign, QuarkXPress and Microsoft Publisher. PageZephyr was developed using Markzware’s Common Reader Architecture (CRA), the high-performance technology at the heart of Markzware’s current and future products.
Content-intense professions that specialize in managing, locating and re-using large volumes of specific data will substantially benefit from this new Markzware application. PageZephyr is especially valuable to businesses such as: advertising, publishing and printing, manufacturing, finance, legal (e-discovery / e-forensics), information, governmental agencies and intelligence gathering.
PageZephyr is the first and only solution that indexes, locates and outputs content created from proprietary or encoded formats without the need for the original authoring application. The current version of PageZephyr is compatible with .INDD (Adobe InDesign CS through CS5), .QXD/.QXP (QuarkXPress versions 4 through 8) and .PUB (Microsoft Publisher 2002 through 2007). In addition, PageZephyr’s index extension capabilities can encompass indexing to both attached volumes and networked volumes, to include your entire search universe.
PageZephyr allows searching of content through user-selected keywords and strings. Operationally, PageZephyr first identifies the matching string located within the documents in the search arena. The results can be filtered by file size, modification date or creation date. The matching string is highlighted in the text displayed in PageZephyr’s Story Board including the entire text of the specific story where the match is located. The text displayed may be edited and/or exported to formatted RTF or Plain Text.
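The operations described above (filtering results by file size or date, then highlighting the matching string within the story text) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not PageZephyr’s implementation: the `SearchResult` fields, the function names and the `**` highlight marker are all hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class SearchResult:
    file_name: str
    size_bytes: int
    modified: date
    story_text: str


def filter_results(
    results: List[SearchResult],
    max_size_bytes: Optional[int] = None,
    modified_after: Optional[date] = None,
) -> List[SearchResult]:
    """Keep results that satisfy every filter that was supplied."""
    kept = []
    for result in results:
        if max_size_bytes is not None and result.size_bytes > max_size_bytes:
            continue
        if modified_after is not None and result.modified <= modified_after:
            continue
        kept.append(result)
    return kept


def highlight(story_text: str, needle: str, marker: str = "**") -> str:
    """Wrap each case-insensitive occurrence of the needle in the marker."""
    pattern = re.compile(re.escape(needle), re.IGNORECASE)
    return pattern.sub(lambda m: f"{marker}{m.group(0)}{marker}", story_text)


results = [
    SearchResult("flyer.indd", 120_000, date(2010, 3, 1), "Ship the Catalog today"),
    SearchResult("old.qxd", 900_000, date(2004, 6, 15), "Catalog archive copy"),
]

recent_small = filter_results(results, max_size_bytes=500_000, modified_after=date(2009, 1, 1))
for result in recent_small:
    print(result.file_name, "->", highlight(result.story_text, "catalog"))
```

Returning the whole story text with the match marked, rather than a bare hit count, is what allows the contextual analysis described above.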
This video demonstrates the PageZephyr User Interface
This video demonstrates PageZephyr’s Advanced Search Options
SUMMARY
Markzware has nearly two decades of comprehensive, extremely hard to obtain knowledge on these closed, proprietary DTP file formats. No other company can rival Markzware’s code breaking capability, DTP file knowledge and development evolution.
The case for action…
- Criminal, terrorist and other nefarious use of proprietary, closed and encrypted Desktop Publishing (DTP) file formats poses a threat to National Security.
- The content of these documents cannot be openly indexed, scanned, searched or exported by the digital forensic search tools available today.
- The current lack of a comprehensive DFSS capability means no one can quantify the risk, but the widespread distribution of these products and the volume of existing documents suggest the risk is certain and present.
- Markzware’s Common Reader Architecture (CRA) can help minimize this national security threat by…
  - Indexing, searching and exporting this content
  - Revealing the matching strings together with all the relevant content without the native application
  - Providing the object identification path of the suspect document
  - Delivering these critical services today, with PageZephyr on the Macintosh desktop
  - Developing a tool kit for integrating the Common Reader Architecture into the DFSS tools in use today and in the future
For more information or to schedule a demonstration of PageZephyr, please contact the CEO of Markzware, Mr. Patrick Marchese. Write to: pgm@markzware.com or telephone 949-756-5100 Ext. 200
Patrick Marchese CEO/President Markzware, Inc. http://www.markzware.com http://www.linkedin.com/in/patrickmarchese
1805 East Dyer Road, ste. 101, Santa Ana, CA 92705, 800-300-3532
www.markzware.com – www.pagezephyr.com