The hidden dangers lurking inside your documents

By: Dr. Guy Bunker, SVP Products at Clearswift
Published: Tuesday, November 10, 2015 - 14:51 GMT

Communicating with the public is an important part of any public service. Reporting on your successes, analysing opportunities and providing important information ensures that any government is both effective and understood.

Compiling reports often requires dealing with sensitive information relating to individuals, such as criminal records, child protection criteria, and voters’ names and addresses. Once the report is complete, however, data must be made available in an anonymised format or included as part of a broader trend. It must not contain the specific personal details of the individuals.

The idea that personal data would be included in a final report seems ludicrous. Doing so would be embarrassing and potentially dangerous, and surely no report containing such information could get through an approval process. Yet this has happened, by mistake, on many occasions in the past. Without intervention it will likely happen again.

How do mistakes happen?

Sensitive data can sometimes be hidden within documents, where most people don’t know to look. This can be as simple as hidden columns, rows or sheets in spreadsheets, or data such as revision history. Comments and author names are often retained within a document’s metadata. Attackers frequently seek out this metadata for use in phishing and hacking attempts.
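To make this concrete, here is a minimal sketch of how little effort it takes to look. It assumes the document is a modern Office file (.docx or .xlsx), which is a ZIP archive of XML parts, and that it uses the standard "transitional" Office Open XML namespaces. The function name inspect_office_file is hypothetical, and the sketch only checks author fields and hidden sheets, not hidden rows, columns or comments.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by transitional Office Open XML packages (an assumption:
# files saved in "strict" mode use different namespace URIs).
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "main": "http://schemas.openxmlformats.org/spreadsheetml/2006/main",
}


def inspect_office_file(source):
    """Return author metadata and hidden sheet names from a .docx/.xlsx file.

    `source` is a file path or a binary file-like object.
    """
    findings = {"creator": None, "last_modified_by": None, "hidden_sheets": []}
    with zipfile.ZipFile(source) as package:
        names = set(package.namelist())

        # Author details live in the package's core properties part.
        if "docProps/core.xml" in names:
            core = ET.fromstring(package.read("docProps/core.xml"))
            findings["creator"] = core.findtext("dc:creator", namespaces=NS)
            findings["last_modified_by"] = core.findtext(
                "cp:lastModifiedBy", namespaces=NS
            )

        # Spreadsheets list every sheet in the workbook part, hidden or not.
        if "xl/workbook.xml" in names:
            workbook = ET.fromstring(package.read("xl/workbook.xml"))
            for sheet in workbook.iterfind("main:sheets/main:sheet", NS):
                if sheet.get("state") in ("hidden", "veryHidden"):
                    findings["hidden_sheets"].append(sheet.get("name"))
    return findings
```

Calling inspect_office_file("report.xlsx") on a downloaded file (the file name is illustrative) would return the recorded author names and the names of any sheets hidden from view.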

Because the vast majority of people don’t know about these risks, the potential for exploitation is high. For those who do know where to look, it is easy to find things that were thought to have been deleted.

A worst-case scenario could be a report on children in care. The unwitting compiler pastes tables with names and addresses into the document for ease of reference, saving as they go. Once the report is finished the tables are deleted from view, but they still exist in revision history or fast-save information.

The report goes out and an IT-savvy viewer searches, finds the information and uses it for ill-gotten gain: blackmail, selling it to those who could abuse it, or sending it to the tabloids. This obviously damages the reputation of the department that released the report, but it is the impact on the individuals which is of paramount importance.
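The scenario above depends on deleted text surviving in revision history. As a rough illustration only, the sketch below assumes a .docx file in which the deletion was recorded as a tracked change, so the removed text remains in the main document part inside delText elements. The function name find_tracked_deletions is hypothetical, and the sketch does not cover legacy fast-save data in older binary .doc files.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace for transitional Office Open XML documents
# (an assumption: "strict" files use a different URI).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def find_tracked_deletions(source):
    """Return text fragments that were deleted with change tracking on.

    `source` is a file path or a binary file-like object for a .docx file.
    """
    with zipfile.ZipFile(source) as package:
        body = ET.fromstring(package.read("word/document.xml"))
    # Tracked deletions keep their original text in <w:delText> elements.
    return [node.text for node in body.iter(f"{{{W}}}delText") if node.text]
```

Anything this returns is text that no longer appears on the page but still travels with the file.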

Learning from abroad

‘Hidden’ metadata has already caused embarrassment for governments and public bodies. In August 2014, it was discovered that the Australian Federal Police had mistakenly published highly sensitive information on criminal investigations. The police had provided documents to the Senate, which were then made publicly available online.

Years after publication it transpired that the documents contained information about the subjects of criminal investigations and telecoms interception activities which were “hidden behind electronic redactions within the document” and “could, under certain circumstances, be accessed”.

The information included the address of a target of surveillance, the types of criminal investigations and offences being investigated, the names of several officers and other identifying information of individuals connected to investigations.

Incidents closer to home

In April 2015, prior to the UK general election, a letter appeared in the media signed by a number of businesses lending their support to the Conservative Party. This apparently independent endorsement seemed a coup for the party, but the document’s metadata later revealed that the letter had originated from Conservative HQ.

While these incidents have been embarrassing, they have not yet been disastrous. With the increase in freedom of information requests, coupled with the increase in information flow, it is only a matter of time before a more catastrophic incident occurs.

A review of several public-facing websites shows that between 60 and 90% of documents that can be downloaded contain metadata which would be useful to a cyber-criminal. Moreover, in today’s collaborative environment, departments have more than just their own users to worry about.

Metadata relating to partner organisations can be used as an ‘inroad’, particularly if the third party is a smaller organisation with a less complete approach to critical information protection. It could be easier to infiltrate a larger organisation by first accessing a partner organisation with fewer defences. The end result would be the same, despite the larger organisation never being attacked directly.

Prevention is key

Fortunately, there is a solution which can rapidly reduce the risk. Generally the best approach is to remove (sanitise) all metadata from documents before they are issued. It is rare that metadata is useful to people outside the organisation, and it is more likely to have a damaging effect than a useful one. Unless you really know what you’re doing, the best thing is to strip it out completely.

While this can be done manually, for example using Microsoft Office and other document publishing software, it is only effective if users are aware of the functionality and remember to use it before sending each document.

The other approach is to implement a technological solution that automatically strips out metadata and revision history information when documents leave the organisation, allowing the visible aspects of the document to continue unaffected.
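As a rough sketch of what one step of automated sanitisation could look like, and not a description of any particular product, the example below copies an Office Open XML file (.docx or .xlsx) and replaces its core properties part with an empty one. The function name strip_core_properties is hypothetical. It deliberately leaves everything else untouched, so revision history, comments and extended properties such as the company name would still need separate handling.

```python
import zipfile

# A minimal, empty core properties part. Every child element (creator,
# lastModifiedBy, dates and so on) is optional, so none are written.
EMPTY_CORE = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/'
    'package/2006/metadata/core-properties"/>'
)


def strip_core_properties(source, destination):
    """Copy an OOXML package, blanking the author metadata on the way.

    `source` and `destination` are file paths or binary file-like objects.
    """
    with zipfile.ZipFile(source) as original, zipfile.ZipFile(
        destination, "w", zipfile.ZIP_DEFLATED
    ) as cleaned:
        for name in original.namelist():
            data = original.read(name)
            if name == "docProps/core.xml":
                # Swap the author metadata for the empty placeholder.
                data = EMPTY_CORE.encode("utf-8")
            # Writing by name also resets each entry's timestamp.
            cleaned.writestr(name, data)
```

A gateway or publishing workflow could apply a step like this to every outgoing file, so that the protection does not depend on each user remembering to do it.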

At Clearswift we have recently seen a spike in government interest in this particular issue. We hope that this reflects a growing recognition of the challenges faced by departments. All too often there is a problem without an effective solution, but this time it’s different. In this instance, a problem can be addressed at the source, rather than waiting for a major disaster to drive the need for a solution.

The views and opinions expressed in this article are those of the author(s) and do not necessarily reflect the official policy or position of The Information Daily, its parent company or any associated businesses.


