Online document search reveals secrets

This ties in very directly to what I found in early July 2003 regarding a University of Tennessee memo posted online! Not surprising. It's interesting to think about going back to very simple ASCII documents, and this also relates to the programs that clean up HTML (HTML Tidy?).


New Scientist: "Online document search reveals secrets"

WEALTH OF CORPORATE SECRETS ON THE WEB
Many documents posted online may contain sensitive corporate or personal information, according to AT&T researcher Simon Byers, who was able to unearth hidden information from thousands of Microsoft Word documents posted on the Web using an ordinary search engine and a random selection of keywords. Byers targeted Word documents because they're so common, but he stressed that other document formats, such as Adobe PDF, may contain similar hidden information. After downloading the Word files, Byers used the free software tools "antiword" and "catdoc" to convert them to plain text. Then, using a simple script he wrote, Byers was able to locate text that had been deleted from the original Word files, including people's names and other personal identifiers, e-mail headers, network paths and text from related documents. "The worst is erased text. This has bitten people surprisingly often," says Bruce Schneier, a security expert with Counterpane. Microsoft Office UK marketing manager Neil Laver says the company is working on ways to better ensure sensitive information is not inadvertently leaked in files. The next version of Office, Office 2003, will include tools that will allow users to remove personal information from documents as well as new "information rights management" software that will enable an author to determine who can read or forward a document. Meanwhile, Schneier recommends converting documents to plain ASCII before publishing online: "I don't know of any programs that effectively clean out the extra text." (New Scientist 15 Aug 2003)
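The article doesn't publish Byers' script, so what follows is only a rough, hypothetical sketch of the general idea, not his method. It pulls every run of printable characters out of the raw .doc bytes (roughly what the Unix `strings` tool does) and reports the runs that don't appear in the visible text, for example the plain-text output of antiword. The function names and the 6-character minimum are my own assumptions. It only catches plain ASCII and UTF-16LE runs (Word 97-2003 files can store text either as 8-bit bytes or as UTF-16), and on real files it will also turn up font names, style names and other metadata noise.

```python
import re
from typing import List, Set


def printable_runs(data: bytes, min_len: int = 6) -> List[str]:
    """Extract runs of printable ASCII from raw bytes, roughly like Unix `strings`.

    Two passes: single-byte ASCII runs, then UTF-16LE runs (ASCII char + NUL byte).
    """
    ascii_pat = rb"[\x20-\x7e]{%d,}" % min_len
    utf16_pat = rb"(?:[\x20-\x7e]\x00){%d,}" % min_len
    runs = [m.group().decode("ascii") for m in re.finditer(ascii_pat, data)]
    runs += [m.group().decode("utf-16-le") for m in re.finditer(utf16_pat, data)]
    return runs


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so formatting differences don't matter."""
    return " ".join(text.split()).lower()


def hidden_candidates(raw: bytes, visible_text: str, min_len: int = 6) -> List[str]:
    """Return printable runs found in the raw file but not in its visible text."""
    visible = normalize(visible_text)
    seen: Set[str] = set()
    candidates: List[str] = []
    for run in printable_runs(raw, min_len):
        key = normalize(run)
        # Skip whitespace-only runs, runs present in the rendered text, and repeats.
        if key and key not in visible and key not in seen:
            seen.add(key)
            candidates.append(run)
    return candidates


def scan_files(doc_path: str, visible_path: str) -> List[str]:
    """Compare a .doc file against a plain-text rendering of it (e.g. antiword output)."""
    with open(doc_path, "rb") as f:
        raw = f.read()
    with open(visible_path, encoding="utf-8", errors="replace") as f:
        visible_text = f.read()
    return hidden_candidates(raw, visible_text)
```

For example, run `antiword report.doc > visible.txt` and then call `scan_files("report.doc", "visible.txt")`; the file names are placeholders. It also shows why Schneier's advice works: once a document has been reduced to plain ASCII, there is no hidden layer left for a scan like this to find.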