2/9/23, 4:02 AM Behind the Data: Investigating metadata

Behind the Data: Investigating metadata

First published on December 1, 2017

Last updated on July 31, 2020

https://exposingtheinvisible.org/en/guides/behind-the-data-metadata-investigations/ 1/28

Table of contents:
Expose

Protect

Verify

Working with metadata

Metadata tools

Removing metadata

Additional resources

This guide looks at how metadata has been used to expose, protect and verify abuses and
excesses of power. It then examines exactly which metadata is contained in which file formats,
and introduces tools to extract, strip and add metadata.

Metadata can be understood as a modern version of traditional book cataloguing. The small
cards stacked in library drawers provide the title of the book, publication date, author(s) and
location on the library shelves. Similarly, in the digital sphere, a digital image may contain
information about the camera that took the image, the date and time it was taken, and often
the geographic coordinates of where it was taken. For images, this kind of metadata is commonly
stored as EXIF data, which stands for Exchangeable Image File Format.
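To show where EXIF data physically sits, here is a minimal Python sketch. It assumes a standard JPEG file, walks the file's marker segments, and returns the EXIF block stored in the APP1 segment. It is only an illustration: it skips many edge cases that dedicated tools such as ExifTool handle.

```python
def find_exif(jpeg: bytes):
    """Return the raw EXIF block of a JPEG as bytes, or None if there is none.

    A JPEG is a sequence of segments: 0xFF, a marker byte, then a 2-byte
    big-endian length that counts itself but not the marker.
    """
    if jpeg[:2] != b"\xff\xd8":              # SOI marker: every JPEG starts with it
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("unexpected byte at offset %d" % pos)
        marker = jpeg[pos + 1]
        if marker == 0xDA:                    # SOS: compressed image data follows
            break
        length = int.from_bytes(jpeg[pos + 2:pos + 4], "big")
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return payload[6:]                # TIFF header ("II" or "MM") onwards
        pos += 2 + length
    return None
```

Called on the bytes of a photo, it returns None when no EXIF block is present, for example because it has been stripped.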

The Australian National Data Service provides the following definition: “Metadata can actually
be applied to anything. It is possible to describe a file on a computer in exactly the same way
as one would describe a piece of art on a wall, a person on a job, or a place on a map. The only
differences are in the content of the metadata, and how it refers to the ‘thing’ being described.”
Metadata is structured information that describes, explains, locates or otherwise simplifies the
retrieval, usage or management of an information resource. Metadata is often called “data about
data” or “information about information”.

In an interview with Exposing the Invisible, Smári McCarthy, head of the technology team on the
Organized Crime and Corruption Reporting Project, says that “every information source has
metadata, sometimes it is very explicit, created as part of the documentation process of
creating the data. PDF files, images, word documents, all have some metadata associated with
them unless it has been intentionally scrubbed.”

https://exposingtheinvisible.org/en/guides/behind-the-data-metadata-investigations/ 2/28
2/9/23, 4:02 AM Behind the Data: Investigating metadata

To illustrate this point, McCarthy describes the small light-sensitive chip inside digital cameras
that captures the image. He explains that these chips, such as Charge-Coupled Device (CCD)
chips, come with minor factory flaws that are unique to each individual chip. Because of this
idiosyncrasy, every image taken with the device carries data that one would usually ignore and
that is invisible to the human eye. That data becomes a digital ‘fingerprint’ identifying all images
taken with that particular chip. This highlights the near-omnipresence of metadata as well as the
possibilities of working with it. McCarthy calls metadata “a best friend, it helps with searching,
it helps with indexing and with understanding the context of the information”. But even
metadata enthusiasts like McCarthy admit that metadata can also become “a worst enemy”.
Understanding it is therefore crucial not only for people working with metadata, but also for
the wider network of individuals and groups working on sensitive information.
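The 'fingerprint' idea can be illustrated with a toy, one-dimensional simulation in Python. Everything here is an assumption made for illustration:

- the sensor model, in which each pixel has a fixed, tiny gain error;
- the noise levels;
- the crude moving-average denoiser.

In image forensics this general technique is known as photo-response non-uniformity (PRNU) analysis. Real tools work on full two-dimensional images and use much better filters.

```python
import math
import random

def moving_average(xs, half=2):
    """Crude denoiser: replace each sample by the mean of its neighbourhood."""
    out = []
    for i in range(len(xs)):
        window = xs[max(0, i - half):i + half + 1]
        out.append(sum(window) / len(window))
    return out

def residual(img):
    """Noise residual: what is left after removing the smooth scene content."""
    return [x - s for x, s in zip(img, moving_average(img))]

def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def make_sensor(n, rng):
    """Hypothetical sensor: each pixel has a fixed, tiny gain error (the 'flaw')."""
    return [rng.gauss(0, 0.02) for _ in range(n)]

def shoot(sensor, rng, phase):
    """Simulated one-row 'photo': smooth scene * (1 + flaw) + random noise."""
    return [(100 + 50 * math.sin(i / 50 + phase)) * (1 + k) + rng.gauss(0, 1)
            for i, k in enumerate(sensor)]

def estimate_fingerprint(images):
    """Estimate the per-pixel flaw pattern from several photos of one camera."""
    num = [0.0] * len(images[0])
    den = [0.0] * len(images[0])
    for img in images:
        for i, (w, x) in enumerate(zip(residual(img), img)):
            num[i] += w * x
            den[i] += x * x
    return [a / b for a, b in zip(num, den)]

def match_score(fingerprint, img):
    """Correlate a photo's residual with the pattern the camera would leave."""
    return correlation(residual(img), [k * x for k, x in zip(fingerprint, img)])

rng = random.Random(7)
camera_a, camera_b = make_sensor(2000, rng), make_sensor(2000, rng)
fingerprint_a = estimate_fingerprint([shoot(camera_a, rng, p) for p in range(8)])
same = match_score(fingerprint_a, shoot(camera_a, rng, 0.5))
other = match_score(fingerprint_a, shoot(camera_b, rng, 0.5))
print(f"same camera: {same:.2f}, different camera: {other:.2f}")
```

With these settings, a photo from the same simulated camera should score far higher than a photo from a different one. The second score should stay close to zero.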

The possibilities of using metadata are multiple and varied. The Australian National Data
Service points out that:

“metadata generally has little value on its own. Metadata is information that adds value to
other information. A piece of metadata like a place or person’s name is only useful when it is
applied to something like a photograph or a clinical sample. There are, though, counter-
examples, like gene sequence annotations and text transcripts of audio, where the metadata
does have its own value, and can be seen as useful data in its own right. It’s not always obvious
when this might happen. A set of whaling records (information about whale kills in the 18th
century) ended up becoming input for a project on the changing size of the Antarctic ice sheet
in the 20th century.”

Michael Kreil, an open data activist, data scientist and data journalist working at OpenDataCity,
a Berlin-based data-journalism agency which specialises in telling stories with open data, says:

“metadata seems to be some kind of a by-product, yet it can be used to analyse certain
behaviours, of political and social nature, for example. Let’s take something simple as an
example, like a phone call. Making a phone call doesn't seem very important. It's hard to
analyse one million phone calls or one million photos, with the analysis being based on speech
recognition or face detection, both fields still being in a state of technological development. But
it's pretty easy to analyse the metadata contained within them, because metadata has a simple,
standardised format for every phone call: there is the date, the timestamp, the location and
numbers of the caller and the callee. This standard allows us to analyse a huge amount of
metadata in one big database. For example are there instances happening in the population
that are represented in the metadata, such as who has depression or who is committing
adultery?”

Often, this type of metadata is more valuable than the content of the phone call. This metadata
provides information about networks, their scale, frequently visited locations and far more
besides. There is currently no online communication method which does not leave metadata
traces throughout or at some crucial point of the communication process.
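Kreil's point about standardised records can be illustrated with a short Python sketch. The record layout (timestamp, caller, callee, cell tower) is a simplified, hypothetical stand-in for real retained call data, and the phone numbers are placeholders. Without touching the content of any call, simple counting already reveals a person's main contacts, usual locations and daily rhythm.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class CallRecord:
    """One retained call: no audio, only the standardised 'envelope'."""
    timestamp: datetime
    caller: str
    callee: str
    cell: str  # hypothetical ID of the cell tower that handled the call

def profile(records, subject):
    """Summarise whom `subject` talks to, from where, and at what hours."""
    contacts, places, hours = Counter(), Counter(), Counter()
    for r in records:
        if subject not in (r.caller, r.callee):
            continue
        contacts[r.callee if r.caller == subject else r.caller] += 1
        places[r.cell] += 1
        hours[r.timestamp.hour] += 1
    return {
        "contacts": contacts.most_common(3),
        "places": places.most_common(3),
        "busiest_hours": hours.most_common(3),
    }

calls = [
    CallRecord(datetime(2023, 1, 9, 8, 15), "+00-A", "+00-B", "cell-17"),
    CallRecord(datetime(2023, 1, 9, 12, 40), "+00-C", "+00-A", "cell-03"),
    CallRecord(datetime(2023, 1, 10, 8, 5), "+00-A", "+00-B", "cell-17"),
    CallRecord(datetime(2023, 1, 10, 9, 0), "+00-B", "+00-C", "cell-17"),
]
summary = profile(calls, "+00-A")
print(summary)
```

Multiplied across a whole population's records, the same counting is what makes the large-scale analysis Kreil describes possible.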

Activists, experts, investigative journalists and human rights defenders are increasingly taking
an interest in metadata, as are governments and corporations. Metadata has proven helpful in
various cases of fighting corruption, but it has also been used as a weapon to crack down on
dissidents and human rights defenders.

It is important to understand how metadata works and how to use it as a tool. It is also vital to
know how to protect oneself and one’s work in relation to the metadata one generates.
Whether exposed, stripped, or added and verified, standalone or cross-referenced with
data found through other sources (conventional or not), metadata is key to today’s
investigative journalism and human rights advocacy. This is especially true for documentation,
image and video activism, and evidence collection.

Expose
Metadata is a powerful tool to expose and provide evidence. In 2009, data scientist Michael
Kreil created Tell-all Telephone, a project that generated a visualisation of six months of
German Green Party politician Malte Spitz's telephone data. Kreil told Exposing the
Invisible that he had received “an Excel sheet with 36,000 lines of whatever in there, and there
was no tool at all to have a look inside. You could make a simple map, using just the geolocation
data, but you wouldn't see the aspect of time. You wouldn't see the movement. So, I wrote a
small prototype, just a simple map with a moving dot. This was actually the basis of the
application that went online a few weeks later.”
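The 'moving dot' Kreil describes comes down to sorting location records by time and estimating the position between them. The Python sketch below assumes a hypothetical CSV export with `time`, `lat` and `lon` columns. The real Tell-all Telephone data and code are not reproduced here. Straight-line interpolation between latitude/longitude points is a simplification, but it is good enough for animating a dot on a map.

```python
import csv
import io
from datetime import datetime

def load_track(csv_text):
    """Parse rows of time,lat,lon and return them sorted by time."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return sorted((datetime.fromisoformat(r["time"]), float(r["lat"]), float(r["lon"]))
                  for r in rows)

def position_at(track, when):
    """Interpolate the phone's position at `when` between the two nearest records.

    Before the first or after the last record, the nearest known point is used.
    """
    if when <= track[0][0]:
        return track[0][1:]
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(track, track[1:]):
        if t0 <= when <= t1:
            span = (t1 - t0).total_seconds()
            f = (when - t0).total_seconds() / span if span else 0.0
            return (lat0 + f * (lat1 - lat0), lon0 + f * (lon1 - lon0))
    return track[-1][1:]

sample = """time,lat,lon
2023-01-09T10:00:00,52.50,13.40
2023-01-09T09:00:00,52.40,13.00
"""
track = load_track(sample)
print(position_at(track, datetime(2023, 1, 9, 9, 30)))
```

Calling `position_at` for a steady sequence of times, for example one per second of animation, produces the frames of the moving dot.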

Exposing the individual... and everyone else

The data provided revealed much about Spitz’s behaviour: when he was walking down the
street or riding a train, as well as his whereabouts during his private time. Spitz’s
telecommunication company did not provide some information, such as the phone numbers he
called or texted, or the numbers of those who contacted him. That information would have made
it easy not only to identify Spitz’s social and political circles and reveal much about him, but also
to reveal personally identifiable information about the people he was in contact with. Kreil and
Spitz were not granted access to this information, but the telecommunication company does
have access to it, which means that the authorities can also acquire it.

Kreil also used publicly available information, such as Spitz’s online behaviour, appointments
announced on the party’s website and his tweets, to corroborate the data provided by the
telecommunication company. By combining all of this data, Kreil was able to pinpoint Spitz’s
movements even further, and the result provided a thorough analysis of Spitz’s life and
political activities. Kreil hoped to demonstrate how metadata can be used not only to track an
individual’s every move, but also to reveal how (meta)data retention can expose an
individual’s whole social and political network.

Image from the Tell-all Telephone website

More than meets the eye

Illinois Republican congressman Aaron Schock was known as the 'most photogenic'
congressman due, in part, to his Instagram account, which featured him in eccentric and zany
poses in exotic locations. He posted pictures of himself jumping into snow banks, on sandy
beaches and in various private planes. The attention his photos attracted led to questions
about where Schock’s public-office-related business trips stopped and his holidays started.
The Associated Press (AP) began an investigation that extracted the geolocation data from
the photos Schock posted, together with the locations he tagged on Instagram, and
compared it with the travel expenses he had claimed. The AP analysed his travel expenses,
his flight records of airport stopovers and the data extracted from his Instagram account. It
found that taxpayers’ money and campaign funds had been spent on private plane flights.
Schock's own Instagram was not the only revealing account. A former Schock intern posted a
photo from a Katy Perry concert with the tag-line "You can't say no when your boss invites you.
Danced my butt off". That post was connected to a $1,928 invoice paid to the ticket service
StubHub, which was listed as a “fund-raising event” on Schock's expenses. The AP published its
findings on February 24, 2015, and on March 17, 2015 Schock announced his resignation
from Congress.
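The core of the AP's approach was cross-referencing dated, geolocated photos with dated expense entries. It can be sketched in Python as follows. The records, field names and one-day window are hypothetical. The sketch also assumes that each photo's geotag has already been turned into a place name. It is not the AP's actual method or data.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Photo:
    taken: date
    place: str    # place name derived from the photo's geotag (assumed done)

@dataclass
class Expense:
    paid: date
    amount: float
    purpose: str  # purpose as declared in the expense filing

def cross_reference(expenses, photos, window_days=1):
    """For each expense, list the places of photos taken within `window_days`."""
    report = []
    for e in expenses:
        nearby = [p.place for p in photos
                  if abs((p.taken - e.paid).days) <= window_days]
        if nearby:
            report.append((e.paid, e.purpose, e.amount, nearby))
    return report

expenses = [Expense(date(2014, 3, 2), 1500.0, "official travel"),
            Expense(date(2014, 6, 20), 300.0, "office supplies")]
photos = [Photo(date(2014, 3, 3), "ski resort")]
for row in cross_reference(expenses, photos):
    print(row)
```

Each row pairs a declared purpose with what the photos show. Any mismatch is a lead to check against other records, not proof on its own.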

Image of Aaron Schock from his now-closed Instagram account

In most cases, it is necessary to employ a variety of software, tools and resources to make
sense of the extracted metadata and draw meaningful information from it. The case of Dmitry
Peskov, Putin's spokesperson, is a good example of such creative investigation techniques.
Peskov was questioned about his income as a state official when he was spotted wearing an
18-carat gold Richard Mille watch, worth almost £400,000. The watch was visible on his wrist in
a photo posted from his wedding. During the ensuing controversy, Peskov stated that the watch
was a gift from his new wife. That claim was later refuted by a photo his daughter had posted
on Instagram months before the wedding, which showed Peskov wearing the same watch.

Images of Peskov's watch at his wedding, and the watch in question: the Richard Mille RM 52-01.

Peskov was hit with another scandal when rumours emerged about his spending during his
honeymoon, which he spent with friends and family aboard the Maltese Falcon, one of the 25
most expensive yachts in the world. The weekly rent of the Maltese Falcon far exceeds the
politician’s declared economic means. Bellingcat reports that anti-corruption activist Aleksey
Navalny took up the investigation regarding the yacht. Navalny had broken the news about the
watch with the help of other activists and supporters. Using the yacht's website, yacht-spotting
websites and Instagram photos from Peskov's daughter and one of Peskov's friends, his team
was able to cast reasonable doubt on Peskov's denial that he had personally rented the Maltese
Falcon. Peskov's friend had posted photos of two yachts on his Instagram profile. Using
VesselFinder, Navalny's team placed those two yachts in the same area as the Maltese Falcon at
the same time. The team also matched a small portion of a door in a photo Peskov's daughter
posted of herself on Instagram to a video of the Maltese Falcon. The video showed the same
door with two distinctive marks.
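Placing two vessels 'in the same area at the same time' comes down to checking two conditions on their position reports: a small distance and a small time gap. Below is a minimal Python sketch. It assumes position fixes are given as (time, latitude, longitude) tuples, such as might be read off a ship-tracking service. The 5 km and one-hour thresholds are arbitrary, and no real VesselFinder interface is used. Distance is computed with the haversine great-circle formula and a mean Earth radius of 6,371 km.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def same_place_same_time(fix_a, fix_b, max_km=5.0, max_gap=timedelta(hours=1)):
    """True if two (time, lat, lon) position fixes are close in both space and time."""
    (t_a, lat_a, lon_a), (t_b, lat_b, lon_b) = fix_a, fix_b
    return (abs(t_a - t_b) <= max_gap
            and haversine_km(lat_a, lon_a, lat_b, lon_b) <= max_km)

yacht_1 = (datetime(2023, 8, 1, 12, 0), 43.00, 9.00)   # hypothetical fixes
yacht_2 = (datetime(2023, 8, 1, 12, 30), 43.01, 9.00)
print(same_place_same_time(yacht_1, yacht_2))
```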

A lot of attention is focused on the metadata that can be extracted from images or from
communications. However, text files can be equally useful for an investigation, and can pose
as great a threat as images. In 2005, the former prime minister of Lebanon, Rafik Hariri, was
killed along with 21 others. United Nations investigators used communication metadata
received from telecommunications companies to investigate Hariri's assassination. However,
they did not pay attention to the metadata they themselves left behind. When their
long-awaited report on Syria's suspected involvement in the assassination, known as the
Mehlis Report, was published, it caused a stir not only for its findings but also for what a
deeper look into its metadata revealed. The document's metadata showed the editing
changes, along with the exact times they were made. The key changes included the deletion
of the names of officials allegedly involved in the assassination, including Bashar al-Assad’s
brother and brother-in-law. This jeopardised the individuals whose names had been deleted,
as well as the various governments and international bodies involved in a gravely
destabilised region. It also jeopardised the United Nations and the Mehlis team. The incident
was considered extremely serious and led the UN to issue a response to the concerns
regarding the deletion.
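Text that was deleted but survives in a document's revision history is easy to surface. The Python sketch below handles only the modern Word format (.docx, a ZIP archive of XML files). In that format, deletions made with Track Changes switched on, and not yet accepted, are stored as `w:del` elements. Each one holds `w:delText` runs along with an author and a date. Older binary .doc files need other tools. This is an illustration, not a reconstruction of how the Mehlis Report was examined.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def tracked_deletions(docx) -> list:
    """Return (author, date, text) for each tracked deletion still stored in a .docx.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    found = []
    for deletion in root.iter(W + "del"):
        text = "".join(t.text or "" for t in deletion.iter(W + "delText"))
        if text:  # skip deletion marks that carry no text (e.g. paragraph marks)
            found.append((deletion.get(W + "author"), deletion.get(W + "date"), text))
    return found
```

Before publishing, accepting all changes and checking the file again with a function like this helps confirm that no deleted text is left behind.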

Tools

There are many tools available that can be used to reveal the metadata in files and images.
However, as the case studies show, in most cases a wider investigation is required to make
sense of the metadata. See the Metadata Tools section for a description of each tool.

Relevant tools: Phil Harvey’s ExifTool / Jeffrey’s Exif Viewer / The Exifer / Read Exifdata /
Metadata En Masse / and other software that includes a metadata viewing feature.

Protect
Metadata is a double-edged sword: it can be extremely useful for investigating social justice
and corruption cases, but it is also being used to troll and doxx. Human rights defenders,
women, women journalists and LGBTIQ individuals who are vocal on social media are all prime
targets. The growing use of smartphones in protests and mobilisations worldwide has increased
the risk of revealing one’s location or whereabouts at a given time. The images posted can also
be used to track and identify the people who took them. The geolocation data available in
images can be used to track anyone and anything, including endangered species. In a South
African reserve, visitors were advised not to disclose the whereabouts of the animals they
spotted. They were also told to switch off the geotag function on their phones and social media
platforms, because poachers and hunters were using the information posted online to locate
animals.
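EXIF stores GPS coordinates as degrees, minutes and seconds (each a fraction), plus a reference letter for the hemisphere. As the following minimal Python sketch shows, a few lines are enough to turn them into the decimal degrees that any online map accepts. The sample values are made up.

```python
from fractions import Fraction

def dms_to_decimal(dms, ref):
    """Convert EXIF-style (degrees, minutes, seconds) rationals to decimal degrees.

    `dms` is three (numerator, denominator) pairs, as stored in the
    GPSLatitude / GPSLongitude tags; `ref` is 'N', 'S', 'E' or 'W'.
    """
    d, m, s = (Fraction(num, den) for num, den in dms)
    value = d + m / 60 + s / 3600
    if ref in ("S", "W"):          # southern and western hemispheres are negative
        value = -value
    return float(value)

lat = dms_to_decimal(((25, 1), (40, 1), (1230, 100)), "S")
lon = dms_to_decimal(((28, 1), (10, 1), (0, 1)), "E")
print(lat, lon)
```

Pasting the resulting pair into a map search shows exactly where a photo was taken. This is why switching off geotagging matters.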

Image taken by Eleni de Wet in South Africa and posted on her Twitter account on 4 May 2012.

Metadata: Vice & the fugitive

In 2012, millionaire and controversial programmer John McAfee, founder of McAfee Virus
Protection, was arrested based on metadata found in an image posted by the media company
Vice. Vice journalists gained exclusive access to McAfee and accompanied him as he fled an
investigation in Belize into the murder of one of his neighbours. Vice not only posted the image,
but also bragged about its scoop by reporting on the time its journalists spent with McAfee. The
image was posted with its metadata intact, revealing where it was taken, and Vice had also
published information on when its journalists had seen him. Together, these made it simple to
determine McAfee’s whereabouts. The image was most probably sent by the person who took it
to Vice's offices to be uploaded later, but it still retained the metadata of where McAfee was. The
Vice journalists in question should arguably have known how to better protect their sources,
as well as the subject of their reports. The incident led Vice to issue an official statement.

Image by Robert King taken from the article "We are with John McAfee Right Now Suckers", published on Vice on December 3, 2012.

One might assume that persons operating in high-risk areas and industries and taking part in
high-risk activities would be more careful about revealing their whereabouts, but this was not
the case for Michelle Obama or US soldiers in Iraq. In 2007, insurgents in Iraq used geotags
from images shared online by US soldiers to attack and destroy several US AH-64 Apache
helicopters. Michelle Obama's Instagram photos were geotagged, revealing either her
whereabouts when taking the images, or the whereabouts of the person managing her account.
In both cases, this could and did pose a serious security threat not foreseen by those posting
the images.

Image taken from Michelle Obama's Instagram published on Fusion

Metadata can be, and has been, used to curtail freedom of speech and intimidate people online.
For example, it has been used for doxxing, the practice of exposing private information about
individuals to target them for their political views or personal lives. It has been used to target
women activists online, women game developers, human rights activists and journalists, among
others. Managing metadata correctly is crucial for anyone with a high profile, especially on
social media. It is equally crucial for those who engage in political activities or lead their lives in
ways that counter the mainstream or the status quo.
The manual “Zen and the Art of Making Tech Work For You" discusses this particular aspect of
metadata with recommendations and resources on the topic written from a gender and tech
perspective.

A project by OpenDataCity also highlights how metadata can be used to put people in danger,
often unwittingly.

“Years later, Balthasar Glättli (a Swiss politician) also wanted an analysis of his data. In the end,
he didn't just give me his telephone data, he gave me everything else that is collected by the
data retention in Switzerland. Additionally, Balthasar had a few problems, because he's also in
the Defence Committee of the National Council. In his metadata was the location of a secret
hideout that he visited. It was secret, but his phone provider collected Balthasar’s locations
and, by publishing this data, some journalists found the hideout and published it. It was too
late to remove it. It’s an interesting thing that when cellphones are tracked all the time, you
should, actually, constantly think about when to switch off your cellphone in your pocket.”

Metadata also takes centre stage in the discussion around intellectual property, especially for
artists. Some websites, such as Facebook, strip out metadata to reduce file size (metadata
occupies file space) and to protect the privacy of users. This has been a point of contention for
people who hold the intellectual property rights to their work. Many photographers, for
example, need to keep the metadata in their photos, especially in this age of mass sharing
online without credit. Here, the metadata helps ensure that the artist is given the credit they
are entitled to for their work. Flickr, on the other hand, retains and shares the metadata. Users
can deactivate this feature, but many are not aware it exists. On Flickr, a simple click on ‘show
EXIF’ under an image reveals many details that users might not realise they are sharing
publicly.

Tools

Various tools can be used to remove the metadata from files and images. There is also always
the option of adjusting the settings of the device or platform used to stop certain metadata from
being recorded. But to minimise the risks, it is recommended that one always double-check what
metadata is being shared (using the tools recommended in the Expose section). One should then
strip away any data that is not intended for sharing. See the Metadata Tools section for a
description of each tool.

Relevant tools: Phil Harvey’s ExifTool / Metanull / TrashEXIF / and other software that
includes a metadata editing feature.
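To show what stripping actually does to a file, here is a minimal Python sketch that copies a JPEG while dropping its APP1 segments, where EXIF (and XMP) data are stored. It assumes a well-formed baseline JPEG. It does not touch other segments that may also carry metadata, such as APP13 (used for Photoshop/IPTC data) or comment segments. It is therefore no substitute for a tested tool, and the output should still be checked a second time.

```python
def strip_segments(jpeg: bytes, drop=(0xE1,)) -> bytes:
    """Copy a JPEG, leaving out segments whose marker is in `drop`.

    0xE1 (APP1) is where EXIF and XMP metadata are stored. The compressed
    image data after the SOS marker is copied unchanged.
    """
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("unexpected byte at offset %d" % pos)
        marker = jpeg[pos + 1]
        if marker == 0xDA:                     # start of scan: stop parsing here
            break
        length = int.from_bytes(jpeg[pos + 2:pos + 4], "big")
        if marker not in drop:
            out += jpeg[pos:pos + 2 + length]  # keep this segment as-is
        pos += 2 + length
    out += jpeg[pos:]                          # image data and end marker
    return bytes(out)
```

Passing a wider `drop` tuple, for example `(0xE1, 0xED, 0xFE)`, would also remove APP13 and comment segments.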

Verify
Metadata can also be used to verify information and evidence by 'proving' that a certain event
took place at the time and place it was said to have taken place. In recent years, with the
viral spread of videos and images on social media, verification has become key to political
participation. It is used not just to prove that something happened at a given time and place,
but also to refute false videos and images that can discredit movements for social
justice. In the Verification Handbook for Investigative Reporting, Christoph Koettl from
Amnesty International explains how metadata helped verify the participation of the Nigerian
army in extrajudicial killings.

We explored this in more detail in our interview with Harlo Holmes, the former technical lead
on CameraV, and in our review of CameraV. CameraV is a mobile App that enables users to
verify photographs and videos so that they can be used as supporting evidence in a court of law.

“CameraV which begun its life as a mobile App named InformaCam, was created by The
Guardian Project and WITNESS. It's a way of adding a whole lot of extra metadata to a
photograph or video in order to verify its authenticity. It's a piece of software that does two
things. Firstly it describes the who, what, when, where, why and how of images and video and
secondly it establishes a chain of custody that could be pointed to in a court of law. The App
captures a lot of metadata at the time the image is shot including not only geo-location
information (which has always been standard), but corroborating data such as visible WiFi
networks, cell tower IDs and bluetooth signals from others in the area. It has additional
information such as light meter values, that can go towards corroborating a story where you
might want to tell what time of the day it is.

All of that data is then cryptographically signed by a key that only your device is capable of
generating, encrypted to a trusted destination of your choice and sent off over proxy to a
secure repository hosted by a number of places such as Global Leaks, or even Google Drive.
Once received, the data contained within the image can be verified via a number of
fingerprinting techniques so the submitter, maintaining their anonymity if they want to, is still
uniquely identifiable to the receiver. Once ingested by a receiver, all this information can then
be indexable and searchable.”

This raises the question of whether metadata can be forged or inserted. Discussing CameraV,
Harlo Holmes raises an important point about the trustworthiness of the device used:

“Technically speaking, it’s very difficult for those things to be manually forged. If someone
took the metadata bundle and changed a couple of parameters or data points - what they
ultimately send to us in order to trick us would not verify with PGP, and each instance of the
App has its own signing key. That said, I do realise that devices need to be trustworthy. This is
an issue beyond CameraV: any App that uses digital metadata and embeds it into a photograph
or video is going to have to be a trustworthy device.”
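Holmes's point that an altered bundle 'would not verify' can be illustrated in Python. To keep the example self-contained, an HMAC with a per-device secret key stands in for the PGP signature that CameraV actually uses. This substitution is an assumption made for illustration. The principle is the same: change one data point and verification fails. A real deployment would use public-key signatures, so that receivers never hold the signing key. The bundle fields are hypothetical.

```python
import hashlib
import hmac
import json

def canonical(bundle: dict) -> bytes:
    """Serialise the bundle deterministically so the same data gives the same bytes."""
    return json.dumps(bundle, sort_keys=True, separators=(",", ":")).encode("utf-8")

def sign(bundle: dict, key: bytes) -> str:
    # Stand-in for a signature: HMAC-SHA256 with a per-device secret key.
    return hmac.new(key, canonical(bundle), hashlib.sha256).hexdigest()

def verify(bundle: dict, key: bytes, tag: str) -> bool:
    # compare_digest avoids leaking, through timing, how much of the tag matched.
    return hmac.compare_digest(sign(bundle, key), tag)

device_key = b"hypothetical-per-device-secret"
bundle = {"lat": 10.0, "lon": 20.0, "time": "2016-04-02T14:03:11Z",
          "wifi": ["net-1", "net-2"], "cell_ids": [40211], "lux": 812}
tag = sign(bundle, device_key)
tampered = dict(bundle, lat=10.5)   # one parameter changed after capture
print(verify(bundle, device_key, tag), verify(tampered, device_key, tag))
```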

Holmes elaborates on the importance of this trust by explaining that the “verification in
CameraV works the same way as with PGP. Key parties exist because human trust is important.
CameraV easily allows you to export your public key from the App. If you give this key to
someone when they're in the room with you, and compare fingerprints, then you trust that
person's data more than if a random person just emailed you their public key unsolicited. If
organisations want to earnestly and effectively use the App in a data-gathering campaign,
some sort of human-based onboarding is necessary.”
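Comparing fingerprints works because a fingerprint is a short digest of a much longer key: if the digests match, the keys match. The Python sketch below uses SHA-256 truncated to 40 hexadecimal digits purely for illustration. It is not the OpenPGP fingerprint algorithm, which hashes a specific encoding of the key packet.

```python
import hashlib

def fingerprint(public_key: bytes) -> str:
    """Human-comparable digest of a key: 40 hex digits in groups of four."""
    digest = hashlib.sha256(public_key).hexdigest().upper()[:40]
    return " ".join(digest[i:i + 4] for i in range(0, 40, 4))

def same_key(key_received: bytes, fingerprint_checked_in_person: str) -> bool:
    """Compare a received key with a fingerprint read out face to face."""
    expected = fingerprint_checked_in_person.replace(" ", "").upper()
    return fingerprint(key_received).replace(" ", "") == expected

print(fingerprint(b"hypothetical public key bytes"))
```

Reading the ten groups aloud in the same room is the 'human-based onboarding' step. A key emailed by a stranger offers no such check.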

Another useful tool for the purposes of verification is eyeWitness, a tool that allows users to
capture photos or videos through their mobile camera App “with embedded metadata showing
where and when the image was taken and verifying that the image has not been altered. The
images and accompanying verification data are encrypted and stored in a secure gallery
within the App. The user then submits this information directly from the App to a storage
database maintained by the eyeWitness organisation, creating a trusted chain of custody. The
eyeWitness storage database functions as a virtual evidence locker, safeguarding the original,
encrypted footage for future legal proceedings.”
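One common way to make a custody record tamper-evident is a hash chain, in which each log entry includes the hash of the entry before it. The Python sketch below is a generic illustration of that idea. It does not describe how eyeWitness stores its data, and the event fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def entry_hash(prev: str, event: dict) -> str:
    body = json.dumps({"prev": prev, "event": event}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def add_entry(log: list, event: dict) -> None:
    """Append an event whose hash also covers the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "event": event, "hash": entry_hash(prev, event)})

def chain_intact(log: list) -> bool:
    """Recompute every link: editing or reordering entries, or deleting any but the last, breaks it."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True

custody = []
add_entry(custody, {"step": "captured", "file_sha256": "hypothetical-digest"})
add_entry(custody, {"step": "submitted to storage"})
print(chain_intact(custody))
```

Deleting the most recent entry goes undetected by the chain alone. This is one reason such logs are usually also copied to, or anchored with, a party the submitter does not control.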

In addition, eyeWitness has an expert legal team that analyses the images it receives. The team
identifies the appropriate authorities, including international, regional or national courts, that
can investigate further. In some cases, eyeWitness will bring situations to the attention of the
media or other advocacy organisations to prompt international action.

Tools

Multiple tools and workarounds can be used to verify metadata in files and images; experts
and enthusiasts are constantly coming up with new ways to verify information. It is also
important to note that verification is not always completed simply by using an App, but may in
some cases require cross-referencing the data with other sources and undertaking creative
investigative approaches.

Relevant tools: CameraV / eyeWitness

Working With Metadata


To better understand how to work with and around metadata, it is important to know in
practical terms what is generally meant when metadata is mentioned. Below is a list of the
metadata that may be stored along with different types of data:

Images

The location (latitude and longitude coordinates) where the photo was taken if a GPS-
enabled device, such as a smartphone, is used.
Camera settings, such as ISO speed, shutter speed, focal length, aperture, white balance,
lens type, etc. (note that some cameras also record the location coordinates)
Make and model of the camera or smartphone.
Date and time the photo was taken.
Name of the program used to edit the photo.
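Most of the image metadata listed above lives in TIFF-style 'image file directories' (IFDs) inside the EXIF block. The Python sketch below reads the first directory (IFD0). It assumes it is given the raw EXIF block, which begins with `II` or `MM` to indicate the byte order. IFD0 holds tags such as Make, Model, Software and DateTime. The capture time and camera settings sit in a separate Exif sub-directory, and the GPS data in a GPS sub-directory; IFD0 stores the offsets of both. Only a handful of tags and value types are decoded.

```python
import struct

# A few IFD0 tag IDs from the EXIF/TIFF specifications.
TAG_NAMES = {
    0x010F: "Make",
    0x0110: "Model",
    0x0131: "Software",
    0x0132: "DateTime",            # when the file was last changed
    0x8769: "ExifIFDPointer",      # offset of the camera-settings directory
    0x8825: "GPSInfoIFDPointer",   # offset of the GPS directory
}

def read_ifd0(tiff: bytes) -> dict:
    """Decode ASCII and single-LONG entries of the first directory (IFD0)."""
    order = {b"II": "<", b"MM": ">"}[tiff[:2]]       # little- or big-endian
    magic, ifd = struct.unpack(order + "HI", tiff[2:8])
    if magic != 42:
        raise ValueError("not a TIFF/EXIF block")
    (count,) = struct.unpack(order + "H", tiff[ifd:ifd + 2])
    tags = {}
    for i in range(count):
        entry = tiff[ifd + 2 + 12 * i:ifd + 14 + 12 * i]   # 12 bytes per entry
        tag, typ, n = struct.unpack(order + "HHI", entry[:8])
        name = TAG_NAMES.get(tag, hex(tag))
        if typ == 2:                                  # ASCII text
            if n <= 4:                                # short values sit inline
                raw = entry[8:8 + n]
            else:                                     # otherwise: an offset
                (offset,) = struct.unpack(order + "I", entry[8:12])
                raw = tiff[offset:offset + n]
            tags[name] = raw.rstrip(b"\x00").decode("ascii", "replace")
        elif typ == 4 and n == 1:                     # one 32-bit integer
            tags[name] = struct.unpack(order + "I", entry[8:12])[0]
        else:
            tags[name] = "<type %d, %d values>" % (typ, n)
    return tags
```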

PDF files

Author’s name, usually the name assigned when the program used to create the file was
first installed.
Version and name of the program used to create the file
Title of the document
Certain keywords
Date and time of file creation / last modification
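PDF creation and modification times are usually stored in the document information dictionary, under the `/CreationDate` and `/ModDate` entries. They take the form of strings such as `D:20170315093000+01'00'`. Newer files may also carry the same data as XMP. Below is a minimal Python sketch for turning such a string into a timezone-aware date. It assumes the common layout, in which the fields after the year are optional.

```python
import re
from datetime import datetime, timedelta, timezone

PDF_DATE = re.compile(
    r"(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"   # YYYYMMDDHHmmSS
    r"(?:([+\-Z])(?:(\d{2})'?(\d{2})?'?)?)?"                    # offset, e.g. +01'00'
)

def parse_pdf_date(value: str) -> datetime:
    """Parse a PDF date string such as D:20170315093000+01'00'."""
    m = PDF_DATE.match(value)
    if not m:
        raise ValueError("not a PDF date: %r" % value)
    year, month, day, hour, minute, second, sign, tz_h, tz_m = m.groups()
    tz = None                                  # no offset given: leave it naive
    if sign == "Z":
        tz = timezone.utc
    elif sign:
        offset = timedelta(hours=int(tz_h or 0), minutes=int(tz_m or 0))
        tz = timezone(offset if sign == "+" else -offset)
    return datetime(int(year), int(month or 1), int(day or 1),
                    int(hour or 0), int(minute or 0), int(second or 0), tzinfo=tz)

print(parse_pdf_date("D:20170315093000+01'00'"))
```

Comparing the creation and modification dates, and the time zone they were written in, can hint at when and where a document was prepared.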

Text files

Depending on the program used to create the document, the data may include:

The names of all the different authors
Lines of text and comments that have been deleted in previous versions of the document
Creation and modification dates.
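In modern Word files (.docx), author names and dates are stored in a small XML file, `docProps/core.xml`, inside the ZIP archive. Below is a minimal Python sketch for reading them. It assumes a .docx file, not the older binary .doc format, and reads only a few common fields.

```python
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
FIELDS = ["dc:title", "dc:creator", "cp:lastModifiedBy", "cp:revision",
          "dcterms:created", "dcterms:modified"]

def core_properties(docx) -> dict:
    """Read author, editor and date fields from docProps/core.xml in a .docx.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as z:
        root = ET.fromstring(z.read("docProps/core.xml"))
    props = {}
    for field in FIELDS:
        el = root.find(field, NS)
        if el is not None and el.text:
            props[field.split(":")[1]] = el.text.strip()
    return props
```

A `creator` that differs from `lastModifiedBy` already shows that at least two accounts handled the file.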

Videos

Metadata in video files can be divided into two categories:

Automatically generated metadata: creation date, size, format, codecs, duration and
location.
Manually added metadata: information about the footage, text transcriptions, tags,
further information and notes to editors, etc.
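In MP4 and MOV files, the automatically generated metadata sits in nested 'boxes', each starting with a 4-byte size and a 4-byte type. The Python sketch below walks the top-level boxes and reads the creation time from the movie header (`moov`/`mvhd`), which counts seconds from 1 January 1904. It is a minimal illustration. Whether that time is UTC or local time depends on the recording device. Location data, when present, is stored elsewhere, in vendor-specific boxes that are not handled here.

```python
import struct
from datetime import datetime, timedelta, timezone

MP4_EPOCH = datetime(1904, 1, 1, tzinfo=timezone.utc)

def iter_boxes(data: bytes, start: int = 0, end: int = None):
    """Yield (type, payload_start, box_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, kind = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if size == 1:                                  # 64-bit size follows
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header = 16
        elif size == 0:                                # box runs to the end
            size = end - pos
        if size < header:
            raise ValueError("corrupt box at offset %d" % pos)
        yield kind.decode("latin-1"), pos + header, pos + size
        pos += size

def creation_time(data: bytes):
    """Return the creation time stored in moov/mvhd, or None if absent."""
    for kind, body, stop in iter_boxes(data):
        if kind != "moov":
            continue
        for sub, sbody, _ in iter_boxes(data, body, stop):
            if sub == "mvhd":
                version = data[sbody]                  # 1 byte version, 3 bytes flags
                if version == 1:                       # 64-bit timestamps
                    secs = struct.unpack(">Q", data[sbody + 4:sbody + 12])[0]
                else:                                  # 32-bit timestamps
                    secs = struct.unpack(">I", data[sbody + 4:sbody + 8])[0]
                return MP4_EPOCH + timedelta(seconds=secs)
    return None
```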

Recommended reading: A thorough overview on video metadata and working with it from
WITNESS.

Audio

Audio metadata is similar to video metadata but more widely used, especially to record
ownership of the file. It can include:

Creation date, size, format, codecs, duration and a set of manually added data like tags,
artist information, artwork, comments, track number on albums, genre, etc.
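For MP3 files, the simplest form of this data is the ID3v1 tag, a fixed 128-byte block at the end of the file. Below is a minimal Python sketch for decoding it. Note that most current software writes the richer ID3v2 format at the start of the file, which this sketch does not parse.

```python
def read_id3v1(data: bytes):
    """Decode the 128-byte ID3v1 tag at the end of an MP3 file, or return None."""
    tag = data[-128:]
    if len(tag) < 128 or tag[:3] != b"TAG":
        return None

    def text(raw: bytes) -> str:
        # Fields are fixed-width, padded with NUL bytes or spaces.
        return raw.split(b"\x00")[0].decode("latin-1").strip()

    info = {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(tag[97:127]),
        "genre": tag[127],            # index into a fixed list of genres
    }
    # ID3v1.1 variant: byte 125 is zero and byte 126 holds the track number.
    if tag[125] == 0 and tag[126] != 0:
        info["comment"] = text(tag[97:125])
        info["track"] = tag[126]
    return info
```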

Communication

Metadata in communication depends on the type of communication used (e.g. email, mobile
phone, smartphone). But in general, if no tools to hide the metadata are used, it can reveal the
following:

IDs of the sender and the receiver
Date and time of communication
Location
Mode of communication, etc.
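For email, most of this metadata is in the message headers. The headers travel with the message even when the body is encrypted with tools such as PGP. Below is a minimal Python sketch using the standard library's `email` package; the sample message is made up. Each mail server along the way adds a `Received` line, and these lines may include IP addresses.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

def envelope(raw_message: str) -> dict:
    """Pull the metadata of an email out of its headers, ignoring the body."""
    msg = message_from_string(raw_message)
    return {
        "from": getaddresses(msg.get_all("From", [])),
        "to": getaddresses(msg.get_all("To", []) + msg.get_all("Cc", [])),
        "date": parsedate_to_datetime(msg["Date"]) if msg["Date"] else None,
        # each relaying server adds a Received header, newest first
        "relays": msg.get_all("Received", []),
    }

sample = ("Received: from mail.example.org by mx.example.net; Tue, 7 Feb 2023 10:00:05 +0000\n"
          "From: Alice <alice@example.org>\n"
          "To: bob@example.net\n"
          "Date: Tue, 07 Feb 2023 10:00:00 +0000\n"
          "Subject: hello\n"
          "\n"
          "body text\n")
meta = envelope(sample)
print(meta)
```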

Tools

There are various ways to extract metadata from files. The options vary according to operating
systems, from tools to plug-ins, to desktop versions or in-browser tools.

Disclaimer: When using online platforms to extract metadata, it is important to keep digital
privacy and security in mind. There is not enough information available to guarantee the
confidentiality of the process. These platforms might track your online behaviour, store your
data, or share it with third parties or the authorities.

There are various methods and tools to reveal or view metadata, which are detailed later in
this guide. Some programs, such as Photoshop, read metadata through their built-in file
information feature and display it in their own format. Others give a more detailed output.

Though metadata can be removed or altered after a file is created, it is sensible to think about
certain elements before creating the file. For example, it may be advisable to change the
settings on your phone, use a particular App, or modify the user details in the software you
use. Below are two examples of photos taken with a smartphone's camera.

Fig. A: Photo taken with an Android phone using CyanogenMod. The metadata does not show the
geolocation or the type of phone used.

Fig. B: Photo taken with an iPhone. Note the extra details revealed, including the address, type
of phone, type of camera and program used.

Metadata Tools

There are various tools for viewing metadata, and the choice of tool may depend on your
objective. In addition to software with a built-in metadata feature (such as Photoshop or
Adobe Acrobat), the tools below can be used to view metadata.

Disclaimer: Please note that some online platforms for extracting and editing metadata might
track your online behaviour, store your data or share it with third parties or the authorities. It
is important to keep digital privacy and security in mind, as there is not enough information
available to guarantee the confidentiality of the process.

Phil Harvey’s ExifTool

Compatibility: Windows, Mac OS, and Linux

Proprietary status: Free and open source

Type: Desktop

This tool comes highly recommended, though it might take some effort to learn because it is
operated from the command line. However, it is very comprehensive, both in the file formats it
covers and in the output it gives. ExifTool allows the user to read, write and edit metadata. The
tool's website provides information, downloads and workarounds.

Jeffrey's Exif Viewer

Compatibility: Online, no compatibility issues

Type: Use online through a browser

This is an online tool based on Phil Harvey’s ExifTool, with the option of uploading an image or
using the URL of an image online. It also offers a button that can be added to Mozilla Firefox or
Safari as a shortcut for faster metadata extraction.

The Exifer

Compatibility: Online, no compatibility issues

Proprietary status:

Type: Use online through a browser

This is an online tool based on Phil Harvey’s ExifTool. It connects directly to Dropbox, Flickr
and Google Drive, so a user can log in from the Exifer website and edit their images there.
Exifer has a privacy disclaimer stating that: “pictures will be temporary downloaded just
to let you edit them. The temp files will be deleted as soon as you'll refresh the home page of
this site, or automatically after 15 minutes from the download time.”

Read Exifdata

Compatibility: Online, no compatibility issues

Type: Use online through a browser

This is an online tool that allows you to upload an image or paste in a URL. Although it allows
users to visit and use the website anonymously, it is worth reading the privacy policy
published on the website, which details the types of data the site and third parties collect on
the platform.

CameraV

Compatibility: Android mobile phones

Proprietary status: Free and open source

Type: Mobile phones

CameraV is a mobile App created by The Guardian Project and WITNESS. The V in the App's
name stands for verification: the App was created to add a large amount of extra metadata to a
photograph or video in order to verify its authenticity. It does two things. First, it describes
the who, what, when, where, why and how of images and video. Second, it establishes a chain
of custody that could be pointed to in a court of law.
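The internals of CameraV are not described here, but the idea of a chain of custody can be illustrated with a cryptographic hash: if a file's SHA-256 fingerprint is recorded when it is captured and each time it changes hands, anyone can later check whether the file has been altered. The Python sketch below is a generic illustration of that idea, not CameraV's implementation; the log format (one JSON object per line) and the function names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

# Generic illustration of hash-based custody records; not CameraV's format.


def sha256_file(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 fingerprint of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_entry(path: str, handler: str, action: str) -> dict:
    """Record who did what to a file, when, and the file's fingerprint at that moment."""
    return {
        "file": path,
        "sha256": sha256_file(path),
        "handler": handler,
        "action": action,
        "utc_time": datetime.now(timezone.utc).isoformat(),
    }


def append_to_log(log_path: str, entry: dict) -> None:
    """Append one entry per line (JSON Lines) to a custody log."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")


def verify(path: str, entry: dict) -> bool:
    """True if the file still has the fingerprint recorded in `entry`."""
    return sha256_file(path) == entry["sha256"]
```

Any change to the file, even to a single byte of its metadata, changes the fingerprint, so `verify` then returns `False`. A log like this is only as trustworthy as its own storage; a real chain of custody would also need the log itself to be protected, for example by digitally signing each entry, which this sketch leaves out.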

Metadata En Masse

Compatibility: Linux, Mac OS

Proprietary status: Free and open source

Type: Desktop

As the name suggests, this script extracts geolocation metadata from a batch of images, which
can save a lot of time when processing large numbers of them. The script, written by members
of the Exposing the Invisible team, should be saved in a file called geobatch.rb and run in the
folder containing the images.
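The geobatch.rb script itself is not reproduced here. To illustrate one step such a batch tool has to perform: EXIF stores GPS coordinates as three rational numbers (degrees, minutes, seconds) plus a reference letter (N/S or E/W), while maps and spreadsheets usually expect signed decimal degrees. The Python sketch below converts these values and writes one CSV row per image. It assumes the raw GPS tags have already been extracted (for example with ExifTool); the function names and the dictionary layout are hypothetical.

```python
import csv
from fractions import Fraction
from typing import Iterable, Optional, Sequence, TextIO, Tuple

Rational = Tuple[int, int]  # an EXIF RATIONAL value: (numerator, denominator)


def dms_to_decimal(dms: Sequence[Rational], ref: str) -> float:
    """Convert EXIF degrees/minutes/seconds to signed decimal degrees."""
    degrees, minutes, seconds = (Fraction(num, den) for num, den in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative in decimal notation.
    return float(-value if ref.strip().upper() in ("S", "W") else value)


def write_geo_csv(rows: Iterable[Tuple[str, Optional[dict]]], out: TextIO) -> int:
    """Write one CSV row per image; return how many images had coordinates.

    Each row is (file name, GPS dict or None); the dict keys below are an
    assumed layout mirroring the EXIF tag names.
    """
    writer = csv.writer(out)
    writer.writerow(["file", "latitude", "longitude"])
    located = 0
    for name, gps in rows:
        if not gps:  # no GPS metadata in this image
            writer.writerow([name, "", ""])
            continue
        lat = dms_to_decimal(gps["GPSLatitude"], gps["GPSLatitudeRef"])
        lon = dms_to_decimal(gps["GPSLongitude"], gps["GPSLongitudeRef"])
        writer.writerow([name, f"{lat:.6f}", f"{lon:.6f}"])
        located += 1
    return located
```

For example, 52° 31' 12" N becomes 52.52. Images without GPS data still get a row, with empty coordinates, so gaps are easy to spot.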

TrashEXIF

Compatibility: iOS (iPhone)

Type: Mobile Phone

TrashEXIF is an iPhone App that allows users to strip all metadata from images, or to control
which metadata is removed and which is kept. The App also lets users preset rules to be
applied to all images taken.

Removing Metadata

There are various ways to remove metadata from files. Here are a few suggestions taken from
the Security in-a-Box toolkit.

Pre-emptive

You can prevent specific kinds of metadata, such as GPS location, from being captured by:

Switching off wireless and GPS location (under location services) and mobile data (found
under data manager -> data delivery).

Making sure the location-tagging setting in your camera App is also switched off when taking
a photo.

Using tools such as Metanull (for Windows) to ensure that all metadata is removed before you
share a file.

Note: Some files, such as DOCs and PDFs, can hold images within them. If you are not careful,
you may scrub the metadata from the document that holds an image while the embedded
image keeps its own metadata. Running Metanull on the image before adding it to the DOC
removes all metadata from the image beforehand.
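It can help to know where this metadata sits in a JPEG file. EXIF and XMP data are stored in "APP1" segments, and other metadata in further APPn segments (markers 0xFFE0 to 0xFFEF) or comment segments (0xFFFE), all placed before the compressed image data, so they can be dropped without re-encoding the picture. The Python sketch below illustrates that approach; it is not how Metanull works, it assumes a well-formed JPEG, and it keeps the JFIF (APP0), ICC colour profile (APP2) and Adobe (APP14) segments by default because they can affect how the image is displayed. Metadata stored in unusual places (for example after the image data) is not handled, and the function name is hypothetical.

```python
from typing import Iterable

# Markers that stand alone, with no length field: TEM, RST0-RST7, SOI, EOI.
STANDALONE = {0x01, *range(0xD0, 0xDA)}
SOS, COM = 0xDA, 0xFE


def strip_jpeg_metadata(data: bytes, keep: Iterable[int] = (0xE0, 0xE2, 0xEE)) -> bytes:
    """Return a copy of a JPEG without APPn segments (except `keep`) or comments.

    Sketch only: assumes a well-formed file and stops parsing at the first
    Start Of Scan (SOS) marker, copying the compressed data after it unchanged.
    """
    keep_set = set(keep)
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")
    out = bytearray(data[:2])
    i = 2
    while i < len(data):
        if data[i] != 0xFF:
            raise ValueError(f"expected a marker at byte {i}")
        marker = data[i + 1]
        if marker == 0xFF:  # padding byte before a marker
            i += 1
            continue
        if marker in STANDALONE:
            out += data[i:i + 2]
            i += 2
            continue
        if marker == SOS:
            # Compressed image data follows; copy everything from here unchanged.
            out += data[i:]
            break
        length = int.from_bytes(data[i + 2:i + 4], "big")  # includes these 2 bytes
        is_metadata = (0xE0 <= marker <= 0xEF and marker not in keep_set) or marker == COM
        if not is_metadata:
            out += data[i:i + 2 + length]
        i += 2 + length
    return bytes(out)
```

To clean a photo, read its bytes, pass them to `strip_jpeg_metadata` and write the result to a new file; it is worth re-checking the output with one of the tools listed above before sharing it.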

Removing metadata from documents and other files

As noted above, other commonly used file types, such as Portable Document Format (PDF) files
or word processing documents created by applications such as Microsoft Office or LibreOffice,
contain metadata which may include:

the username of the person who created a document

the name of the person who most recently edited and saved a document

the dates when a document was created and modified.

In some cases, your document might also contain additional personally identifiable
information such as addresses, email addresses, government ID, IP addresses or unique
identifiers associated with personally identifiable information in another program on your
computer.

Some of this information is easily accessible by viewing the file properties (right-click the file
icon and select Properties). Other hidden data can only be viewed with specific software.
Either way, depending on your context, this information might put you at risk if you are
working with and exchanging sensitive information.
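Modern Word files (.docx) are ZIP archives of XML files, and the author, last editor and timestamps are conventionally stored in a part named docProps/core.xml, with the application name, company and total editing time in docProps/app.xml. The Python sketch below reads those fields with the standard library only, so you can see what a document reveals before sharing it. It assumes a .docx in the usual layout; older .doc files use a different binary format and are not covered, and the function and output field names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the Office Open XML property parts.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Part inside the ZIP -> {hypothetical output name: element to read}.
FIELDS = {
    "docProps/core.xml": {
        "creator": "dc:creator",
        "last_modified_by": "cp:lastModifiedBy",
        "created": "dcterms:created",
        "modified": "dcterms:modified",
        "revision": "cp:revision",
    },
    "docProps/app.xml": {
        "application": "ep:Application",
        "company": "ep:Company",
        "total_editing_minutes": "ep:TotalTime",
    },
}


def docx_metadata(source) -> dict:
    """Return the identifying properties found in a .docx (path or file object)."""
    found = {}
    with zipfile.ZipFile(source) as zf:
        names = set(zf.namelist())
        for part, fields in FIELDS.items():
            if part not in names:
                continue
            root = ET.fromstring(zf.read(part))
            for key, tag in fields.items():
                element = root.find(tag, NS)
                if element is not None and element.text:
                    found[key] = element.text.strip()
    return found
```

For example, `docx_metadata("report.docx")` lists who created and last edited the file, even if those names never appear in the text itself.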

Removing metadata from PDF files

Windows and Mac OS users can use programs such as Adobe Acrobat XI Pro (for which a trial
version is available) to remove or edit the hidden data in PDF files.

Opening any PDF file in Acrobat allows you to edit its metadata by going to the File menu and
selecting Properties. Here, you can modify the document author’s name, title, subject,
keywords and any additional metadata. You can remove information about the creation time,
modification time, the type of device used to create the file, and other hidden data you don't
see by going to the Tools menu, then Protection, and selecting Remove hidden information.

For GNU/Linux users, PDF Mod is a free and open source tool to edit and remove metadata
from PDF files. However, it does not remove the creation or modification time, nor the type of
device used to create the PDF.

Removing metadata from LibreOffice documents

In LibreOffice documents, the metadata can be viewed by selecting the File menu, then
Properties. Under the General tab, you can click Reset to reset general user data such as total
editing time and revision number. You should also make sure that the Apply user data
checkbox on this screen is unchecked, so that the name of the creator is removed. When you
are finished, go to the Description and Custom Properties tabs to clear any data there that you
don't want to appear. Finally, click on the Security tab and uncheck the Record changes box, if
it is not already unchecked.

Note: If you use the Versions feature, you can delete older versions of the document that may
be stored there by going to the File menu and selecting Versions. If you use the Changes feature
and no longer need the record of changes, go to the Edit menu, then Changes, and accept or
reject the changes to clear the data relating to them.
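LibreOffice's native formats, such as .odt, are also ZIP archives, and the information shown in the Properties dialog is stored in a part called meta.xml. When many documents need cleaning, a script can remove those fields directly. The Python sketch below (Python 3.8 or later) deletes identifying elements (creator, dates, editing statistics, generating software, user-defined properties) from meta.xml and copies every other part unchanged. It is an illustration under stated assumptions, not a replacement for the steps above: the list of elements removed is an illustrative choice, it does not touch tracked changes, stored versions or images embedded in the document, and XML namespace prefixes may be renamed in the output, which XML parsers treat as equivalent. The function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
META = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
DC = "http://purl.org/dc/elements/1.1/"

# Keep the familiar prefixes when the XML is written back out.
for prefix, uri in (("office", OFFICE), ("meta", META), ("dc", DC)):
    ET.register_namespace(prefix, uri)

# Elements under <office:meta> to delete (an illustrative selection).
DROP = {f"{{{META}}}{name}" for name in (
    "initial-creator", "creation-date", "editing-cycles", "editing-duration",
    "generator", "printed-by", "print-date", "user-defined")}
DROP |= {f"{{{DC}}}creator", f"{{{DC}}}date"}


def scrub_meta_xml(xml_bytes: bytes) -> bytes:
    """Remove the DROP elements from an ODF meta.xml document."""
    root = ET.fromstring(xml_bytes)
    meta = root.find(f"{{{OFFICE}}}meta")
    if meta is not None:
        for child in list(meta):
            if child.tag in DROP:
                meta.remove(child)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def scrub_odf(src, dst) -> None:
    """Copy an ODF file (path or file object) to dst with meta.xml scrubbed."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        # Copy entries in their original order and with their original
        # compression, so "mimetype" stays first and uncompressed as ODF requires.
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename == "meta.xml":
                data = scrub_meta_xml(data)
            zout.writestr(info, data)
```

Write to a new file, e.g. `scrub_odf("report.odt", "report-clean.odt")`, rather than overwriting the original, then open the result in LibreOffice and check File, then Properties.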

Other strategies for scrubbing metadata

Some file types contain more metadata than others. If you don't want to work with specialised
software and the formatting of a file doesn't matter, you can convert files from formats that
hold a lot of metadata (such as .DOC and .JPEG) to formats that typically hold much less (such
as .TXT and .PNG).

Avoid using your real name, address, company or organisation name when registering copies
of software such as Microsoft Office, OpenOffice, LibreOffice, Adobe Acrobat and others. If you
must give a name or address, use a fake one.

Additional Resources
Metadata Investigation: Inside Hacking Team by Share Lab
Verification Handbook for Investigative Reporting, in particular chapter 7
Making Data Speak, Smari McCarthy's Exposing the Invisible interview on metadata

Header image created by John Bumstead

A product of

Tactical Tech

Brunnenstraße 9,
10119 Berlin, Germany

www.tacticaltech.org