UAE Smart Data Framework EN - Part 2 Implementation Guide

The UAE Smart Data Implementation Guide provides a structured approach for government entities to manage data effectively, emphasizing the establishment of data governance roles, a Smart Data Roadmap, and a Data Inventory. It outlines ten guiding principles for smart data management, including data as an asset, sharing and re-use, and maintaining data quality. The guide includes five guidance notes detailing processes for governance, roadmap development, inventory creation, prioritization, and data conformance to ensure compliance with the UAE Smart Data Framework.


The UAE Smart Data Framework

Part 2: Smart Data Implementation


Guide
For Government Entities
Version 2.0

Document Date: March 2019

TABLE OF CONTENTS

Introduction ........................................................................................... 4
   Context ............................................................................................. 4
   Overview ........................................................................................... 4
Guidance Note 1: Establishing data governance roles and processes ................. 6
   Overview ........................................................................................... 6
   Recommended process for establishing data governance ........................... 6
   Role Descriptions .............................................................................. 10
Guidance Note 2: Building a Smart Data Roadmap ........................................ 15
   Overview ......................................................................................... 15
   Recommended process for developing a Smart Data Roadmap .................. 15
   Scope and content of an effective Smart Data Roadmap .......................... 16
Guidance Note 3: Developing a Data Inventory ............................................. 26
   Overview ......................................................................................... 26
   Types of data ................................................................................... 26
   Recommended process for developing the initial Inventory ...................... 27
   Annual review – expanding the inventory .............................................. 31
Guidance Note 4: Prioritization criteria and process ...................................... 32
   Overview ......................................................................................... 32
   Recommended process for prioritizing the inventory ............................... 32
   Recommended criteria for assessing priority .......................................... 34
Guidance Note 5: Data conformance process ................................................ 37
   Overview of recommended process to ensure data conforms to standards ... 37
   5.1 Classifying data ........................................................................... 39
   5.2 Formatting data ........................................................................... 52
   5.3 Documenting a permissions model for shared data ............................ 57
   5.4 Adding metadata and schema ......................................................... 63
   5.5 Managing data quality .................................................................. 69
   5.6 Validation and publication of data .................................................. 76
Appendix A: UAE Federal Open data license ................................................ 81
Appendix B: Data Quality Maturity Matrix – assessment tool .......................... 86

INTRODUCTION
Context
This Smart Data Implementation Guide forms part of the UAE’s Smart Data Framework, as illustrated
below.

The Smart Data Framework outlines a common basis for each UAE Government Entity to develop its
own approach to managing data, giving the Entity maximum flexibility to respond to its own
business needs while still enabling a common approach to data classification, data exchange, and
data quality.

This Smart Data Implementation Guide, structured in a series of five Guidance Notes, provides
guidance, best practice and recommended processes for Government Entities to follow to ensure
they meet the requirements set out in the Principles and Standards of the Framework.

Overview
The diagram on the following page illustrates a typical process that an Entity might go through,
supported by the five Guidance Notes, to implement the Smart Data Framework in a phased process
over time:
1. Establish the Entity’s data governance roles and processes
2. Build a roadmap to set out the key data management and change management actions that will
need to be taken across the Entity
3. Map out key datasets within an Entity-wide Data Inventory (if that does not already exist)
4. Prioritize which datasets need action first in terms of applying the core Data Standards
5. Implement the Data Standards conformance process through a series of ‘sprints’ through which
datasets are aligned with the Standards in a phased and prioritized process over time.
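As an illustration only, the phased approach above can be sketched as a simple inventory-and-sprint model: inventory entries carry a priority score (Guidance Note 4) and a conformance status, and each sprint picks the highest-priority datasets not yet aligned with the Standards. The dataset names, field names and scores below are hypothetical, not part of the Framework.

```python
# Minimal sketch of a prioritized Data Inventory driving conformance sprints.
# All names and scores are illustrative assumptions, not Framework requirements.
from dataclasses import dataclass

@dataclass
class InventoryEntry:
    name: str
    custodian: str
    priority: int          # higher = act sooner (see Guidance Note 4)
    conformant: bool = False

inventory = [
    InventoryEntry("road_traffic_counts", "Transport Dept", priority=8),
    InventoryEntry("public_park_locations", "Parks Dept", priority=5),
    InventoryEntry("service_requests", "Customer Service", priority=9),
]

def next_sprint(entries, size=2):
    """Pick the highest-priority datasets not yet aligned with the Standards."""
    pending = [e for e in entries if not e.conformant]
    return sorted(pending, key=lambda e: e.priority, reverse=True)[:size]

for entry in next_sprint(inventory):
    print(entry.name)
```

As datasets are made conformant in each sprint, they drop out of the pending list and the next-highest priorities surface, matching the phased, prioritized process described above.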

This process is a recommended not mandatory one. An individual Entity may decide to follow a
different approach in some areas, provided that this still results in alignment with the UAE Smart
Data Principles and conformance with the UAE Smart Data Standards.

GUIDANCE NOTE 1: ESTABLISHING DATA GOVERNANCE ROLES AND
PROCESSES
Purpose This Guidance Note provides Entities with guidance on governance roles and
processes to support implementation of the UAE Smart Data Framework.
When to use At the outset of each Entity’s Smart Data program
Responsibility Entity Management Board

Overview
This Guidance Note recommends good practice on data governance roles and processes, as a guide
for Entities then to tailor to their specific needs. It provides guidance in turn on:
 A recommended process for establishing key data governance roles, identifying suitable
candidates and growing expertise over time
 Sample job descriptions with the responsibilities and skills recommended for the key roles.

Recommended process for establishing data governance


The diagram below summarises the process that Government Entities are recommended to follow
when establishing governance for their Smart Data program. This is followed by further detail on
each step.

[Diagram: recommended process for establishing Smart Data governance, showing which role leads each step]

Director General and Management Board of the Federal Government Entity:
   1. Review and commit to UAE Smart Data Principles at board level
   2. Set up initial central Data Team

Director of Data:
   3. Identify suitable numbers of Data Custodians and Data Specialists
   4. Establish governance relationships and processes

Data Management Officer, with Data Custodians and Specialists:
   5. Capacity building and training
   6. Facilitate ongoing consultation, workshops and reviews

All roles:
   7. Continuous improvement, establishing broader governance roles and processes over time

1. Review and commit to the UAE Smart Data Principles at board level
The UAE Smart Data Framework is rooted in a set of guiding principles, which are summarized
below and described in more detail in Part 1 of the Smart Data Framework: Principles and
Standards. The principles for smart data that every Entity should embed within its own governance
systems and business processes cover the following topics.

UAE Smart Data principles: summary

1. Data as an asset: In order to enable service-oriented government, support evidence-based


decision-making, and promote transparency and citizen engagement, Entities should manage
all their data as a collective national asset, acting as custodians of that data on behalf of the
United Arab Emirates.
2. Sharing and re-use of data: In order to enhance the quality of government services, Entities
should collaborate closely and efficiently to maximize the sharing and re-use of United Arab
Emirates data.
3. Duplication of data: In order to improve customer-centric government services, Entities
should collaborate to avoid duplication and inconsistencies in their data, employing the
concept of a ‘single source of truth.’
4. Open Data publication: In order to provide greater access to information for all users across
the United Arab Emirates, Entities should publish non-personal data openly whenever
possible.
5. Privacy, Confidentiality, and Intellectual Property Rights: In order to secure the broad social
benefits of data exchange while respecting the rights of individuals and organizations, Entities
should protect the privacy of individuals, the confidentiality of organizations, and the legal
rights of intellectual property holders at all times.
6. Open standards: In order to empower government service automation through the sharing
and re-use of data, Entities should utilize open standards to make it easy for others to
discover, interoperate with, and consume their data as a service. This applies to all data, not
just Open Data – because the most efficient way of sharing confidential and sensitive data
between Entities is to make it publishable per open standards.
7. Data quality: In order to enable the efficient and effective delivery of customer-centric
services, improve the accuracy of evidence-based decision-making, and build confidence in
both, Entities should manage and improve data quality over time.
8. Data insights: In order to improve the effectiveness of services and policy as close to the
moment of decision and action as possible, Entities should maximize the insights derived from data by
facilitating the collection, analysis, and use of real time or near real time data – both their own
and that collected by others.
9. Collaborative governance: In order to promote greater cross-organizational collaboration and
efficiency, Entities should participate in UAE-wide shared services and collaborative
governance mechanisms for smart data.
10. Continuous improvement: In order to ensure full implementation of the Smart Data Principles
and support standardization of processes, Entities should continually adopt improvements
and manage change over a sustained period of time, focused on creating an open, data-driven
and data-sharing culture.

A key initial starting point should be for the top management team of each Entity to review and sign
up to these principles at Board level, and to identify a senior, empowered member of the Board to
be accountable for leading the Entity’s work to implement these principles.

2. Set up initial Smart Data team
It is the responsibility of each Government Entity to decide how best it will operationalize the
principles and standards of the UAE Smart Data Framework within the Entity, and this includes the
choice of data governance roles. The right approach to staffing will vary from Entity to Entity,
dependent on current levels of maturity of data management within the Entity, on the scale of the
Entity’s operations and on how important data is to the functions of the Entity.
However, it is recommended that each Entity establish at least the following roles or their
equivalent:
 Director of Data (DD): a senior and empowered staff member, who will lead the Entity’s
Data program, champion and promote data management processes and effective data
publication and exchange and ensure strategic goals are realised. Ideally, the Director of
Data should be a member of the Entity’s management board; as a minimum, they should be
a senior and empowered individual with an ability to rapidly escalate key risks and issues for
resolution at the highest levels in the Entity. For smaller Entities this role might be
performed on a part-time basis, for example by an existing member of staff but with
additional assigned responsibilities.
 Data Management Officer (DMO): to report to and deputize for the DD, and to lead the
operational work of managing and coordinating the change management and processes
required to ensure conformance with Smart Data Framework standards. This is an
important, full-time role that requires the person responsible to spend a significant part
of their time on smart data standards conformance.
 A suitable number of Data Custodians and Data Specialists: to act as business and technical
owners of key datasets and data sources within the Entity. They will understand the
contents and business value of the data, how it was collected and processed, and the
accuracy and quality of the data. These could be existing data owners and IT staff with new
responsibilities.

Given that each Entity is different and will have varying existing data infrastructure and processes –
the specific setup must be at the discretion and judgement of the Entity itself. The Entity can choose
to create new positions for each of these or add the titles and responsibilities to existing roles.

The important thing is that the business functions and responsibilities of these roles are carried out
by suitably informed and skilled staff.

We recommend that the Director of Data and Data Management Officer positions are the
first to be filled. The people who fill these roles require a background in data management,
government operations and digital technology – they will lead the Entity’s smart data program and
ensure the Entity’s data is published and exchanged in line with the requirements of the Smart Data
Framework. A background or knowledge of open data is an advantage but not essential. Detailed
job descriptions are given at the end of this Guidance Note.

3. Identify suitable numbers of Data Custodians and Specialists


The DD and DMO should identify or hire Data Custodians and Data Specialists for each core data
source and/or data business unit. The DMO should set up a reporting and communication system to
ensure that the work and processes required to meet the Smart Data Framework requirements are
carried out across the business units by suitably qualified personnel (individuals who know about
parts of the Entity’s data systems). Depending on the size of the Entity, there could be many
Custodians who report to a department or business unit Data Custodian, who in turn reports to the
Data Management Officer.

4. Establish governance relationships and processes


Entities should establish clear governance processes in which all relevant people and teams are clear
on who is responsible, accountable, consulted and informed on all work the Entity carries out to
achieve Smart Data objectives. This should be agreed between the Director of Data and Data
Management Officer.
Entities may wish to consider developing a formal RACI matrix to summarise these processes, setting
out for each one:
 Who is Responsible for managing the process
 Who is Accountable for the business results delivered by the process
 Who should be Consulted on the process
 Who should be kept Informed about it.
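As a sketch of this idea, a RACI matrix can be kept as a simple table and checked automatically, for example to confirm that every process has exactly one Accountable role and at least one Responsible role. The process and role assignments below are illustrative examples, not prescribed by the Framework.

```python
# A minimal sketch of a RACI matrix for Smart Data governance processes.
# Process names and role assignments are illustrative assumptions only.

RACI = {
    "Develop Smart Data Roadmap": {
        "Director of Data": "A",
        "Data Management Officer": "R",
        "Data Custodians": "C",
        "Management Board": "I",
    },
    "Classify datasets": {
        "Director of Data": "A",
        "Data Management Officer": "C",
        "Data Custodians": "R",
        "Data Specialists": "C",
    },
}

def validate(raci):
    """Check each process has exactly one Accountable and at least one Responsible."""
    problems = []
    for process, assignments in raci.items():
        codes = list(assignments.values())
        if codes.count("A") != 1:
            problems.append(f"{process}: needs exactly one 'A'")
        if "R" not in codes:
            problems.append(f"{process}: needs at least one 'R'")
    return problems

print(validate(RACI))  # an empty list means the matrix is well-formed
```

A check of this kind can be run whenever governance roles change, so that accountability gaps are caught before they cause confusion in practice.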

5. Capacity Building
Custodians and Specialists should receive training and guidance to help them understand their role
and responsibilities. They should read the Standards and Implementation Guide of the Smart Data
Framework. We also recommend discussing how best to implement the Smart Data Principles with
key representatives of each of the Entity’s business functions. Facilitating such internal discussion is
a key role for the Data Management Officer.

6. Organise and facilitate regular consultations, workshops and reviews


In order to facilitate learning and change management it may be useful to have regular workshops
for data-related roles in the Entity to report on progress and highlight learnings, challenges and
tactics so these can be acted upon and shared across the Entity. The DMO could organize
consultations on the Implementation Guide so that the Data Custodian teams could decide which
parts to use and which to amend and how the various processes should be adopted across the Entity
consistently.

7. Continuous improvement
As the Entity’s data maturity and business processes develop and as the Custodians, Specialists and
Data Management Officer run through multiple sprints of formatting and cataloguing data to ensure
their conformance with the Smart Data Standards, the Entity should continue to refine the roles and
responsibilities of its data staff. The Director of Data should keep the effectiveness of governance
arrangements under review, agreeing changes with the Entity’s Management Board as required.

Role Descriptions
The tables below give more detail on the key recommended data management roles for a typical
Government Entity, setting out in turn:
 An overview of the role profile
 The key responsibilities that the role needs to manage
 The skills and competencies needed to do this effectively

Title Director of Data - Reports to Director General of Entity

Role profile The Director of Data is the senior champion and leader for data within the Entity. They
are responsible both for communicating the social, economic and business benefits of
open data and data exchange as well as ensuring the Entity’s conformance with the
Smart Data Framework standards.
They should be responsible for developing the data strategy and policies relevant to the
government entity, and should supervise the execution of initiatives that contribute to
efficient data management. They should also oversee data exchange between government
entities in a safe, secure and reliable way, develop methods for service delivery and
utilization, and make data available as open data to drive innovation.
The role should be fulfilled by a senior employee, with the necessary influence and
authority within the Entity to be effective in the role. This person will also be outward
facing, collaborating and communicating with external stakeholders. The Director of
Data will be the senior point of contact between the Entity and the Federal Data
Management Office, responsible for communications, coordination and escalation.

Key Leadership
Responsibilities
 Overseeing the development of the Entity’s implementation plan and roadmap
for meeting the Smart Data Standard requirements, and directing delivery of
that plan
 Leading a program of cultural change within the Entity aimed at embedding the
Smart Data Principles within the Entity, promoting the new ways of working
and championing the benefits of higher data quality and data exchange
 Performing public outreach and presentations to increase the strategic use of
the Entity’s datasets
 Leading the Entity’s open data initiative
 Leading the Entity’s work on benefit realization: ensuring that the benefits of
open and shared data are maximized, through high levels of adoption and
utilization of the data to improve services and decision-making.
Governance
 Putting in place the necessary roles (with the appropriate skills) within their
Entity, as outlined in this document
 Providing regular reports and conformance information as requested to the
Federal Data Management Office
 Improving the collection, usage and exchange of data.
Conformance
 Ensuring that the Entity's data is consistent with the applicable laws and
policies of data in the UAE, and meets the mandatory requirements of the
Smart Data Framework
 Reviewing classified and catalogued datasets to check they are conformant

with the Smart Data Framework standards and approving them for publication
and exchange with other Entities
 Reviewing data quality reports and statistics
 Ensuring timely and effective response to queries and feedback from the public
in relation to the Entity's Open Datasets
 Investigating any complaints made in relation to the Entity's Datasets by the
Federal Data Management Office, a data user or a member of the public.

Skills and  Ability to communicate effectively, including ability to explain technical content
competencies to non-technical audiences
 Ability to collaborate and network with subject matter experts, organizations,
and individuals to provide effective enterprise data management
 Experience in technology and data, developing data strategies and overseeing
the improvement of data quality and data exchange
 Ability to formulate and set goals
 Proven ability to lead cross-functional teams at all organizational levels in
dealing with complex issues

Title Data Management Officer - Reports to Director of Data

Role Profile This role is the delivery and operational lead for the Entity’s data program, reporting to
the Director of Data. They will need to deliver and manage much of the work to ensure
the Entity is conformant with Smart Data Framework standards. Their role involves
ensuring the right staff are selected as Data Custodians and Data Specialists and
directing, supporting and reviewing their work on Inventories, Prioritization,
Classification and Data Conformance.
They are responsible for ensuring the readiness, reliability and security of the Entity’s
data and the accuracy of its metadata, and for making the data available, accessible and
usable in a timely manner to support the Entity’s operations and guide the analysis of
data for decision-making.

Key Leadership
Responsibilities
 Effective implementation and oversight of all the data management initiatives
and processes needed to deliver on the requirements of the Smart Data
Framework in the Entity
 Providing support and advice to the Data Custodians within their business unit
when classifying data within the scope of their management and assessment
of risks associated with disclosure or exchange
 Supporting the Director of Data to ensure that the benefits of open and shared
data are maximised and participating in the development of the Entity’s data
roadmap
 Providing mentorship and professional development for staff
Governance
 Determining priorities with respect to Open Data publication or Shared Data
exchange.
 Co-ordinating the work of the business unit as it prepares its open and shared
data for publication and exchange
 Supporting the resolution of any issues and problems in data or conformance to the
data standards

Conformance
 Preparing regular reports and conformance information as requested to the
DD and Federal Data Management Office
 Administering the process of inventorying, prioritising, cataloguing and
classifying the Entity’s data
 Cascading knowledge about classification principles and procedures to
required roles within their Entity
 Ensuring that the consistency and quality of data are fit for purpose across the Entity
 Reducing data duplication across the Entity

Skills and  Ability to coordinate and manage the work of a large and diverse team
competencies  Ability to collaborate with senior management of Business Units, functional
organizations and individuals to provide effective enterprise data management
 Ability to provide data object domain insight and direction to Data Custodians
 Experience in data management and data processes in all its aspects
 Displays understanding of all business processes dependent on data in their
object domain
 Proven ability to create presentations and effectively present to management

Title Data Custodian - Reports to Data Management Officer or Senior Data Custodian

Role Profile There should be a Data Custodian per dataset or database or data-generating function
within the Entity. This person needs to understand the value and risks associated with
their data so that they can effectively prioritise, classify and catalogue it. They will be
responsible for determining whether the data should be Open or Shared and setting out
the access rules.
Generally, this is a role within a business unit (data generator) who has a business
responsibility (not a technical or a legal one) for ensuring that the data is used
effectively to meet both the business needs of the department and the wider goals of
the Smart Data program. The Data Custodian does not necessarily need to be the
creator or the primary user of the dataset but should understand its value to the Entity.

Key Leadership
Responsibilities
 The management of the assigned data including inventorying, prioritizing and
describing datasets
 Recommending changes to data management policy and procedures, data
quality and the implementation of UAE Data Standards.
 Understanding and promotion of the value of data for Entity-wide purposes
and facilitation of data sharing and integration.
Governance
 Ensuring the quality, completeness and currency of their data
 Working in collaboration with the Data Management Officer to determine
priorities and associated risks of making data accessible by third parties
 Engaging with the external developer community to determine how
enhancements to the data set could facilitate greater levels of re-use.
Conformance
 The collection and updating of the assigned data
 Management of any third party use of the data in accordance with UAE policies
and processes
 Advising and reporting on data management issues

 Suggesting the terms and conditions upon which Shared Data should be made
available.

Skills and  Collaboration skills within the business and Data Management Officer to help
competencies provide effective solutions to data issues and problems
 Displays mastery of the portions of business processes executed by their
business area
 Proficiency with MS Office, basic and some more advanced data analysis and
process control methods/techniques
 Understands the fundamentals of data bases and data structures (tables,
hierarchical structures, flat files, etc.)
 Demonstrates understanding of all the functions performed by their business
area
 Displays familiarity with systems used within/by their business area

Title Data Specialist - Reports to Data Management Officer or Senior Data Custodian

Role Profile This is a role with technical responsibility over data, and in particular with responsibility
for preparing data for publication as open data or for exchange as shared data. Probably
based within IT or database administration teams, Data Specialists will need to liaise
between the Information Technology, Information Security and business teams and
ensure the data they are responsible for meets the format, schema and quality
requirements in the Smart Data Framework standards.
They should also be able to provide support to the Entity for cross-business definition of
data standards, rules, and hierarchy and refinement of data processes in accordance
with defined standards.

Key Leadership
Responsibilities
 Assists with the resolution of data integration issues as requested by the Data
Custodian
 Assists the Data Custodian in the definition of data requirements and data rules
 Supports projects and initiatives in the development and refinement of data
processes and metrics in accordance with defined standards, as requested by the Data Custodian
Governance
 Supports definition, approval and execution of the Data Quality program
 Understands the Information Technology Landscape and has the ability to
identify what data is stored in what systems
 Supports efforts to provide data awareness education for senior and upper
management
 Works with the Data Custodian in the identification of root causes of major
data problems and supports the implementation of sustainable solutions
 Resolves routine data problems
Conformance
 Assists the Business Data Custodian with data problem resolution when
requested
 Reviews data deletion and archiving requests for data in their span of
responsibility and forwards to approver with appropriate recommendations

Skills and  Displays mastery of fundamentals of problem solving and basic data quality
competencies analysis
 Displays understanding of the portions of business processes executed by their
business area
 Strong understanding of the Systems Development Life Cycle and
methodologies, and familiarity with process improvement frameworks.
 Proficiency with MS Office, basic data analysis and process control
methods/techniques
 Proven ability to work well and contribute to cross-functional teams
 Proven ability to present to peers and supervisors
 Displays familiarity with systems used within/by their Entity
 Displays familiarity with functions performed by their business area

GUIDANCE NOTE 2: BUILDING A SMART DATA ROADMAP
Purpose This Guidance Note provides a recommended process, templates and
supporting guidance for each Government Entity to build its own Roadmap for
implementing the UAE Smart Data Framework. An Entity-level Roadmap that
follows this guidance will, if effectively managed, ensure that the Entity:
 Achieves significant business benefits from shared and open data
 Converges its data management practices over time to conform with
the UAE Smart Data Principles and Smart Data Standards
When to use At the start of each Entity’s Smart Data program.
Responsibility Director of Data, with close involvement and Roadmap sign-off by the Entity’s
Management Board.

Overview
A single, undifferentiated approach for implementing smart data across all Government Entities will
not work. There are a number of factors that will influence the content of an Entity’s Roadmap, such
as the type and size of Entity, the complexity of its delivery ecosystem, and the Entity’s level of
maturity in current data sharing practices. The advice in this Guidance Note is therefore not
mandatory and Government Entities should tailor it to their own business requirements.

The Guidance Note recommends good practices on:


 The process that Entities should follow in developing a Smart Data Roadmap
 The scope and content of an effective Smart Data Roadmap, with a recommended template
setting out:
- The purpose of that section
- Issues to address
- Actions that the Entity should take to inform development and documentation of
that Section
- Resources that are available to support the Entity in developing this part of the
Roadmap.

Recommended process for developing a Smart Data Roadmap


The diagram below summarizes the process that the Entity should follow for developing its
Roadmap, illustrating which of the key data governance roles will normally have lead responsibility
for each step of the process.

[Diagram: recommended process for developing the Smart Data Roadmap, showing which role leads each step]

Relationship with other parts of the Smart Data Toolkit: use Guidance Note 1 to establish / recruit
the initial key data governance roles; use Guidance Notes 3-5 to inform Roadmap development.

Director of Data:
   1. Agree resources for the Roadmap and business priorities with the Entity’s Management Board
   3. Agree Roadmap v1 with the Management Board
   4. Present Roadmap v1 to key external stakeholders; use feedback to inform Roadmap v2
   8. Agree Roadmap v2 with the Management Board

Data Management Officer:
   2. Lead work across the Entity to develop Roadmap v1
   5. Coordinate initial implementation of Roadmap actions
   7. Lead work to review and update the Roadmap in light of feedback

Data Custodians and Specialists:
   Work with the DMO and central data team to ensure the Roadmap is aligned with business needs
   6. Implement Roadmap actions in relation to individual datasets, and feed back learning to the
   DMO to inform improvements to the Roadmap

Throughout this process, you should seek to take an approach which is:
 Iterative and collaborative: You should not develop the Roadmap in isolation or see this as a
one-off exercise. Once an initial Roadmap has been developed, you will want to:
- Share it with other key stakeholders: the Federal Data Management Office, other
Entities addressing similar customer groups, other Entities using your data or
supplying you with data and so on
- Improve it in the light of implementation experience within the Entity.
The process diagram above summarizes this collaborative process in terms of producing a
first version of the Roadmap and then a second. In practice, the Entity will want to keep the
Roadmap updated on an ongoing basis.
 User-focused: the Roadmap should be user-centric, i.e. it should identify and address the
needs of key internal and external data users and should allow for regular engagement with
them.
 Practical: the Roadmap should be achievable within the timeframe, supported with
adequate resources to deliver it and appropriate project management disciplines to ensure
high quality and timely delivery.
 Phased: the Roadmap should be developed to be delivered in a phased manner, ensuring
that work is closely informed by Guidance Note 4: Prioritization criteria and process.

Scope and content of an effective Smart Data Roadmap


Overview
The table below sets out the key elements that should be covered in each Entity’s Roadmap. It is not
obligatory to follow this precise structure, and you may wish to add other elements to the Roadmap.
However, it is important to ensure that – whatever structure is used – all the elements shown below
are covered.

1. Objectives: Sets out the scope and purpose of the Roadmap, and describes the Entity’s vision for how it will manage its data in future to align with the UAE Smart Data Principles.
2. Gap analysis: Highlights the key areas in which current ways of working within the Entity are not yet aligned with the future vision.
3. Governance: Describes key governance roles and processes for managing implementation of the Roadmap.
4. Delivery plan: Describes work streams in the Roadmap, mapping out key milestones, deliverables, and dependencies.
5. Risks: Sets out the key risks associated with the Roadmap, their likely impact and the proposed mitigation strategies.
6. Impact measurement: Sets out the key benefits that the Entity seeks to deliver through implementation of its Smart Data Roadmap:
− How will success be measured and when?
− What will success look like?
− How will learnings be incorporated?

The more detailed tables below provide guidance on the scope and purpose of each of these six
recommended sections of the Smart Data Roadmap, issues to address within it, the actions you
should take to inform its development, and the resources that are available to help you.

Section 1: Objectives
Scope and purpose of the section

In order for an Entity to comply with the UAE Smart Data Framework, it needs to establish a new operating
model for its data management, ensuring that all data is managed as a cross-government asset using open
standards. This means each Entity should consider and fully plan the changes that will need to be made
internally to drive the transformations that are needed.
This introductory section should set the scene for each Entity, so that it can describe:
 The changes that will be delivered, in terms that resonate with the Entity’s own internal and
external stakeholders.
 The benefits that the Entity will achieve through delivery of the plan, and ultimately how these will
contribute towards the wider goals of the UAE Smart Government strategy.

Issues to address
When developing this section of the Roadmap, you will need to consider:
 Why does the plan exist and what is it trying to achieve?
 What are the required behaviours/actions that need to be taken?
 Which areas of the Entity’s activities does this Roadmap cover? [1]
 What is the Entity’s future vision for how it will manage its data?
Actions you should take to inform development of this section
All team members involved in developing the Smart Data Roadmap should have a clear understanding of
the Smart Data Principles, and of the standards and guidance that exists to support them. So before they
embark upon developing the Entity-level Roadmap they should read all the UAE Smart Data Framework
documents.
[1] In general, the Roadmap should cover the whole organization. But there may be cases where it is sensible either to exclude some elements of the organization, or to cover activities that fall outside the organizational boundary of the Entity, but which nevertheless are most sensibly covered within the Entity’s Roadmap.
It is vital that the Management Board of the Entity understands the key principles and the degree of
change that is involved, and commits to the Roadmap process. While it will not be necessary for all
top managers in the Entity to read the full Smart Data Framework, it is important that they review
the Smart Data Principles and give a steer on how they see these best being implemented within the
organisation.
It may be worthwhile for your immediate data management team to draft this section, review it with the
Management Board, and then circulate it to the wider stakeholder community to validate whether it
resonates with them.
Resources to help you
There are a number of key documents that will help the Entity to prepare this section of the Roadmap:
 UAE Smart Data Framework: overview and principles: this sets out the key purpose and principles
of the UAE smart data initiative, and provides an easy to assimilate guide to the business changes
that are required of Government Entities
 The UAE Smart Data Standards: this sets out the minimum mandatory standards that Government
Entities should deliver with their data
 Guidance Notes 3 – 5 of this Smart Data Implementation Guide give detailed ‘how to’ guides on
implementation of the standards. Use of this guidance is not mandatory, but it provides a good
starting point for thinking through the Entity’s own approach.

Section 2: Gap analysis


Scope and purpose of the section
To identify the key areas in which current ways of working within the Entity are not yet aligned with
the future vision.

Issues to address
You should make clear the scale of the changes that will be required – informed by real evidence of gaps
and challenges at two levels:
 The organisational level: to what extent does the Entity have the governance, culture, processes and
infrastructure needed to manage and reap benefits from smart data?
 The dataset level: to what extent are datasets in the Entity currently already aligned with the best
practices set out in the UAE Smart Data Standards?

Actions you should take to inform development of this section


You should:
 Undertake a self-assessment of organisational data capabilities
 Identify a sample of key datasets used by the Entity, and audit them against the seven Data Quality
principles described in [DQ1] Data Quality Principles.
Resources to help you

Appendix B provides a Data Quality Maturity Matrix for auditing the quality of a dataset against five levels
of maturity, of which Level 3 represents full conformance with all mandatory requirements of the Smart Data
Standards. An organisational self-assessment tool is also available.

Section 3: Governance
Scope and purpose of the section
This section should cover:
‒ Governance model: describing functions, roles and accountabilities – both for ensuring successful
delivery of activities and milestones in the Roadmap and also for realisation of the targeted benefits
– and the processes within which these will operate.
‒ Resourcing: staff, financial and other resources that will be deployed in delivering the Roadmap.

Issues to address
When developing this section of the Roadmap, you will need to consider a number of points:
‒ Do we have all roles filled across our data team?
‒ If not, how are we proposing to plug the gaps in the short term?
‒ Do these resources have all the skills and knowledge that they need to carry out their roles
successfully?
‒ What is our own RACI (Responsible, Accountable, Consulted, and Informed) for all data
management processes?
‒ How will we manage our Entity’s involvement in the wider governance for UAE Smart Data?

Actions you should take to inform development of this section


You will need to have a clear data team structure in place, appropriately linked to the Entity-level
governance mechanisms.

You will also need to involve the ‘data practitioners’ within your Entity to map out workflow and processes
for data conformance, based on the guidance given in the Smart Data Implementation Guide for
Government Entities.

Resources to help you


Guidance Note 1: Governance roles and processes gives advice on how to establish governance to develop
and deliver a Smart Data Roadmap for a Government Entity.

Section 4: Delivery plan


Scope and purpose of the section

This section should set out:


 The work streams and activities that will be taken forward by the Entity to deliver smart data
 The deliverables that will result from this work, with milestones
 A Gantt plan, providing a graphical illustration of the work streams, activities, and tasks that the Entity
will manage, with key milestones mapped to a delivery timeline, showing how each contributes to one
or more deliverables.
 Key dependencies that the Entity needs to manage, including with other Government Entities and with
private-sector delivery partners.

Issues to address

When developing this section of the Roadmap, you will need to consider a number of points:

 Are we clear on the deliverables we need to produce?


 Will our resources change over time, and how will this impact our approach?
 Are we starting from scratch, or do we need to take into consideration some existing progress on smart
data in some areas of the Entity?
 Do the suggested work streams (see below) cover all the areas that are relevant to our Entity?
 As we map out specific tasks under each work stream,
‒ Does each of the tasks have a clear owner?
‒ Do we have enough resources to be able to deliver on this plan?
‒ Are the timescales for each task/activity realistic and achievable?
‒ Are we clear on which milestones are cascaded to us by the overall program, and therefore
have dates that must remain static?

‒ Is the sequencing logical and aligned with the data preparation and management processes?
‒ Are there any other Entity-specific limitations that may prevent us from being successful in
delivering this plan?
A high-level view of the work streams in a typical Delivery Plan is shown below. This is
divided into four main phases:
1. Initiation: when the initial planning is undertaken and governance systems established
2. Inventorying and prioritization: when the Entity pulls together an inventory of the key data assets it
manages, and prioritises which should be addressed first
3. Data conformance: when prioritized datasets are taken through a systematic process to ensure they
meet the mandatory requirements of the UAE Data Standards for Classification, Quality and Exchange,
in a series of ‘data sprints’.
4. Continuous improvement: when the Entity drives forward longer-term improvements (beyond the
mandatory minimum in the UAE Data Standards) – improvements both to data quality and to the level
of re-use of the Entity’s data by data users across the public and private sectors.

1. Initiation:
 Set up data team
 Initiation process and planning

2. Inventorying and prioritization:
 Develop initial Data Inventory
 Prioritize datasets

3. Data conformance (Sprint 1, then Sprints 2, 3 etc.):
 Classify data
 Apply Formats, Permissions, Metadata and Data Schema compliance
 Quality Audit and Improvement Plan
 Publish open data
 Exchange shared data

4. Continuous improvement:
 Extending and enriching our open and shared data
 Change management and improving data skills
 Collaboration with other Government Entities to drive service improvement with Shared Data
 Market engagement with Private Sector
 Benefit realization
Actions you should take to inform development of this section
In order to develop a realistic and achievable plan, you will need to:
 Involve the ‘data practitioners’ within your Entity. They will need to have a clear understanding of the
processes that will need to be followed, using the training materials and policy products for preparation
and cataloguing.
 Undertake sample Data Quality Audits using the Data Quality Maturity Matrix provided in Appendix B,
to get an early sense of the current state of data quality across the Entity and key areas where
improvement will commonly be needed.
 Engage with current users of any data that you currently publish as open data or share with other
Entities, to help understand user priorities.
Resources to help you
Guidance Note 1: Data governance roles and processes gives advice on steps to take during the Project
Initiation phase of the Roadmap.

Guidance Note 3: Developing a Data Inventory and Guidance Note 4: Prioritization criteria and process
give guidance on the steps to take during the Inventorying and Prioritization phase of the Roadmap.

Guidance Note 5: Data Conformance Process gives guidance on the steps to take during the Data
Conformance phase of the Roadmap.

Section 5: Risk
Scope and purpose of the section
This section should do two things:
1. Set out the current list of key risks associated with delivering the Roadmap, including their likely
impact and proposed mitigation strategies
2. Set out the process that the Entity will follow to raise and manage delivery risks.
Issues to address
When developing this section of the Roadmap, you will need to consider:
‒ Ownership: who is the most suitable owner for each risk, responsible for driving forward the
agreed mitigating action?
‒ Governance: what is the most appropriate escalation route for any high impact risks that are not
being managed satisfactorily?
‒ Coordination: who within our Entity will coordinate the whole risk management process, and
regularly review that the register is being managed effectively?
‒ Tools: how will we manage reviewing and updating our risk register, so that the latest version is
visible to all relevant team members and they can contribute updates quickly and easily?
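
A lightweight risk register that supports the ownership, impact and review questions above can start as a simple shared structured file. The sketch below is purely illustrative: the column names and example risks are assumptions for this example, not fields prescribed by the Framework.

```python
import csv
import io

# Illustrative risk-register columns: each risk has an owner (to drive the
# agreed mitigating action), an impact rating (to support escalation of
# high-impact risks) and a next review date (to support coordination).
REGISTER = """\
risk,owner,impact,mitigation,next_review
Key data staff leave before Roadmap v1,Data Management Officer,High,Cross-train deputies,2019-06-01
Legacy systems cannot export data,Data Specialist,Medium,Pilot extraction tooling,2019-05-15
"""

# High-impact risks are the candidates for the escalation route agreed
# in the governance section of the Roadmap.
high_impact = [
    row for row in csv.DictReader(io.StringIO(REGISTER))
    if row["impact"] == "High"
]
print([r["risk"] for r in high_impact])
```

Keeping the register in a plain, parseable format like this makes it easy to share the latest version with all team members and to filter it for Management Board reviews.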

Actions you should take to inform development of this section


We recommend that you review the resources and guidance on risk described below in ‘Resources to help
you’ with all members of the data management team. Once you have a draft risk register in place, this
should be reviewed with the Entity Management Board – and also with the Federal Data Management
Office in order to facilitate joint working across government to mitigate risks that are common to multiple
Entities.

Resources to help you


The global standard on ICT-enabled, data-driven service transformation (‘The Transformation Government
Framework’, or TGF [2]) identifies nine Critical Success Factors. These nine CSFs provide an evidence-based
framework, developed following international best practice research and consultation, into the key reasons
why ICT-enabled change programs such as UAE Smart Data are most likely to fail. We recommend that all
Entities review what risks they face in relation to each of these nine categories, using the checklist tool in
the TGF standard.

[2] Refer to V2 of the standard published by international open standards consortium OASIS in 2014.
The nine CSF categories are:
 Strategic clarity
 Leadership
 User focus
 Collaborative engagement
 Supplier partnership
 Skills
 Achievable delivery
 Future-proofing
 Benefit realization

Section 6: Impact measurement


Scope and purpose of the section
This section should set out:
 The key benefits that the Entity seeks to deliver through implementation of its Smart Data
Roadmap
 How and when these will be measured by the Entity.

Issues to address
When developing this section of the Roadmap, you will need to consider several issues:

 What will success look like for our Entity in its use of data?
 How will success be measured and when?
 Who are the likely owners of each of the Entity-level benefits?
 How will learnings be incorporated?
 What systems and tools can be put in place to monitor the ongoing delivery of benefits?
 What quick wins can we deliver early in the program that will start the ball rolling?
 What are the longer term benefits that we are seeking to achieve, and how do we sustain and
embed the business changes required to achieve the desired impacts?

Actions you should take to inform development of this section


Your starting point should be the strategic objectives for UAE Smart Governance as a whole, as described in
UAE Smart Data Framework: overview and principles. These encapsulate the impact that the government
as a whole is seeking to achieve.

Determine how you can best measure your own Entity’s impact on the delivery of these objectives. Seek to
develop success criteria and targets that are SMART:
1) Specific – clear and unambiguous;
2) Measurable – quantifiable;
3) Achievable – realistic and attainable;
4) Relevant – applicable and worthwhile;
5) Time-bound – delivered within a specific timeframe.
Resources to help you
Further advice on developing an effective approach to Benefit Realization is set out in the global standard
on ICT-enabled, data-driven service transformation (‘The Transformation Government Framework’, or
TGF [3]).

[3] Refer to V2 of the standard published by international open standards consortium OASIS in 2014.

GUIDANCE NOTE 3: DEVELOPING A DATA INVENTORY

Purpose This document describes how to create a list of datasets which are collected,
managed or maintained by the Entity. While it may not be possible to create a
complete list in one step, this Guidance Note helps Entities ensure that the
most valuable data assets are listed as an initial priority, and then to expand
the Inventory over time.

When to use When the Entity has a management structure and a team responsible for data
in place (for example by using Guidance Note 1: Data governance roles and
processes)

Responsibility Data Management Officer, reporting to the Entity’s Director of Data.

Overview
To realize the strategic vision of efficient and effective data management in government that
enables better decisions and better services, each Entity requires a good understanding of its current
data assets and data processes. The first step is to produce an inventory of all datasets in the
Entity. This allows the Entity to identify gaps where data is currently not fit for purpose, to spot and
address duplication, and to become more standardized.

This Guidance Note provides help in listing and inventorying the data an entity holds and covers:

 What types of data should be considered;


 The process for producing the initial version of the inventory;
 The process for expanding and enriching the inventory over time (annual review).

Types of data
Structured data
Structured, machine-readable data (such as a table in a spreadsheet, a database, or data
on a geospatial map) is the main set of data which needs to be inventoried. Entities should list
existing data which the Entity uses, maintains or collects. This could be data frequently used by the
departments within this Entity which might not have a clear owner. It should also include all data
where the Entity is responsible for collecting and updating the data, even if this work is done by
others on its behalf.

Data should be listed in the form of datasets. A dataset consists of data with its metadata. The
metadata provides context and information about the data. Therefore, a dataset should be an
individual object that makes sense as a whole by itself.

A dataset may be a database or spreadsheet along with its name, location, and description. It could also
be a map, or a table from a report moved to a spreadsheet.

It may be more practical to count a collection of data, such as a database, as one dataset or as
several. You should count it as one dataset if the data within it is:
 thematically related
 easiest to describe as a whole
 interrelated.

Otherwise, it likely consists of several datasets. The split depends on the existing and potential use of
the data. It is up to the data owners, who understand the data best, to make the judgement decisions
on how data should be listed as datasets.

Unstructured data
While we recommend primarily focusing on structured data, unstructured data or information such
as text documents, diagrams, pictures or media can also be important to publish or share effectively.
Entities will generally have a lot more unstructured content and it will be difficult to inventory all of
it, so two key steps are recommended:
1. Identify opportunities to turn unstructured data into structured data. For example, by
putting tables in Word documents into a spreadsheet or seeing if there’s a geodata version
of a map picture available. Then deal with this structured data as explained in the process
below.
2. Identify information that’s particularly relevant for re-use – within the Entity itself or
externally by the public or other Entities. This might be a report, presentation or video
which imparts important information or can be utilized in new ways; list these in the
Inventory.

Recommended process for developing the initial Inventory


The diagram below summarises the process Entities should follow for developing the initial version
of the Entity’s Data Inventory, illustrating which of the key data governance roles will normally have
lead responsibility for each step of the process. Each step is then described in more detail below.

1. Identify a data representative per department within the Entity
Each department or business unit within the Entity should have a named responsible Data Custodian
who has a good understanding of the data their department produces, uses and manages. This
person should be in a senior role (or appointed directly by a senior role), regularly deal with data,
and be aware of the variety of data which exists within their department.
They may be supported by a Data Specialist who has technical ownership or understanding of the
data. For larger departments that handle a lot of data, this role may be covered by more than one
person. Further guidance on the role of Data Custodian and Data Specialists is set out in Guidance
Note 1: Data governance roles and processes.

2. Ensure each department produces a draft list of datasets


Under the co-ordination of the Data Management Officer, the Data Custodian / Specialist within
each department should go through the following process to develop an initial list of what data they
know or expect to be managed within their department.

There is no need to change or rearrange data before adding information about the dataset to the
list, or to collect any data which is not already held.

A: Record existing data lists

Draw together existing lists of datasets that are collected, maintained or managed by the Entity.
These may include:
 The list of Primary Registries identified by the Federal Data Management Office.
 Data which has been previously requested by other Entities, external bodies, Federal Data
Management Office or other departments within the current Entity.
 Data listed in the Entity’s Information Asset Register (as required by the Information Security
Regulation)
 Existing data catalogs or lists: e.g. data available in a catalog or portal, documentation of
previous information audits, datacenter inventory, management databases or software asset
lists.

B: Brainstorm and list all other datasets

Next the Data Custodian and/or Data Specialist should think about and list any datasets the Entity:
 Collects
 Stores
 Maintains and updates
 Commissions externally

Aim to be as comprehensive as possible, but it is not expected that all datasets will be captured in
the initial version of the inventory. It may be useful to consider:
 Any datasets that have already been openly published or are currently shared with other
Government Entities.
 Obvious or high-value datasets: for example, data which can be used to provide a service to
individuals or businesses (such as setting up a business, hiring a car, managing insurance,
choosing where to live), make government more transparent, or any data which is expected
to be managed by this Entity

 What datasets exist for each type of data: e.g. real-time, operational, reference data,
aggregated data (see illustrative table below).

 Any strategic reference data the Entity may hold, see table of examples below:

C: Create inventory and prioritize

Within a spreadsheet or table, add the following for each identified dataset from the previous two
steps:

 A name or title for the dataset. If it doesn’t have an existing name that you are aware of,
choose a short descriptive name.
 A brief description to clarify what data is being referred to and its scope.
 The department or business unit responsible for managing the data
 The list of data attributes (normally column headings for tabular data) that are used in this
dataset. Entities do not need to list data attributes for unstructured data.
 The Data Custodian, if known, i.e. the role or person within the department responsible for
the data
 The Data Custodian’s initial assessment of the extent to which this dataset should be a
priority for initial publication as open or shared data (using the process and prioritization
scoring set out in the Guidance Note 4: Prioritization criteria and process).

Example

Dataset name: Bus Transport timetables
Description and any notes: All current bus timetables
Responsible department: Operations department
Data Attributes: Bus number, bus stop location, arrival time, departure time, frequency
Priority score: 34/38
Data Custodian: Mark Jones
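
For Entities that prefer to maintain the Inventory programmatically rather than in a spreadsheet, an inventory row like the bus timetables example in this Guidance Note can be captured in a simple structure. This is an illustrative sketch only: the field names are assumptions based on the columns recommended above, not a mandated schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DatasetRecord:
    """One row of the Data Inventory (illustrative field names only)."""
    name: str                    # short descriptive name for the dataset
    description: str             # what data is being referred to and its scope
    department: str              # business unit responsible for managing the data
    attributes: List[str] = field(default_factory=list)  # column headings; leave empty for unstructured data
    custodian: Optional[str] = None       # Data Custodian, if known
    priority_score: Optional[int] = None  # from Guidance Note 4 prioritization

# The worked example from this Guidance Note:
bus_timetables = DatasetRecord(
    name="Bus Transport timetables",
    description="All current bus timetables",
    department="Operations department",
    attributes=["Bus number", "bus stop location", "arrival time",
                "departure time", "frequency"],
    custodian="Mark Jones",
    priority_score=34,  # out of a maximum of 38
)
```

A structure like this makes later steps (integration, deduplication, prioritization) easier to automate, while remaining exportable back to a spreadsheet for review.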

3. Integrate and validate


Each draft list from each department should be reviewed by the head of that department and Data
Management Officer, and then combined into a single Entity-wide list.

The Data Management Officer should then:


1. Co-ordinate with the responsible Data Custodian to repeat steps A, B and C to identify any
missing datasets that may not sit within any individual department or which have been
missed.
2. Ensure all datasets have an assigned Data Custodian.
3. Check and ensure dataset names are unique and their descriptions are clear.
4. If any information is missing, contact the relevant person to get this resolved.
5. Ask department data leads to confirm that in their view, the inventory includes all existing
open datasets, all obvious datasets and all high-value datasets held by their unit.
6. Identify any core reference data used by the Entity where this Entity is not the owner. Take
these out of the Inventory and forward them to the Primary Registries team within the Federal
Data Management Office for resolution. They will decide which Entity is ultimately
responsible for maintaining these datasets and will be the authoritative source of the data. If
this Entity is identified, those datasets will go back into the Inventory.
7. Ensure there are no duplicates in the inventory.
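
Several of the checks above (unique names, assigned Custodians, no duplicates) lend themselves to simple automation once the department lists are combined. The sketch below assumes each inventory row is a plain dictionary with "name" and "custodian" keys; that layout is an assumption for illustration, not part of the Framework.

```python
def validate_inventory(rows):
    """Return a list of issues found in a combined Entity-wide inventory.

    Checks drawn from this Guidance Note: every dataset has an assigned
    Data Custodian, and dataset names are unique (case-insensitively).
    """
    issues = []
    seen = set()
    for row in rows:
        name = (row.get("name") or "").strip()
        if not name:
            issues.append("Dataset with no name found")
        elif name.lower() in seen:
            issues.append(f"Duplicate dataset name: {name}")
        else:
            seen.add(name.lower())
        if not row.get("custodian"):
            issues.append(f"No Data Custodian assigned for: {name or '(unnamed)'}")
    return issues

# Example: a near-duplicate name and a missing Custodian are both flagged
rows = [
    {"name": "Bus Transport timetables", "custodian": "Mark Jones"},
    {"name": "bus transport timetables", "custodian": None},
]
print(validate_inventory(rows))
```

Any issues reported this way still need a human follow-up (step 4 above: contact the relevant person to get the gap resolved).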

4. Prioritize
This Inventory should be taken through the prioritization process described in Guidance Note 4:
Prioritization criteria and process. The Data Management Officer should ensure:

 all datasets have a prioritization score


 the scores have been reviewed and adjusted so that the prioritization of the whole list
makes sense
 All datasets listed on the Inventory have been prioritized on a consistent basis.

5. Assessment and approval


The full initial Prioritized Inventory should be reviewed by the Director of Data who should verify
that:

 the inventory contains a reasonably comprehensive list of data held by the Entity
 no key datasets are missing
 it was carried out by the appropriate staff members
 It contains the prioritization information specified in Guidance Note 4: Prioritization criteria
and process.

Annual review – expanding the inventory


The inventory process should be repeated at regular intervals (for example, annually) in order to:

 Identify new datasets managed by the Entity (these could be completely new or extensions
and reformulations of existing data)
 Respond to user demand for data
 Review the existing inventory in light of publication and sharing: both lessons learned and
feedback received from other Entities, the public, external stakeholders and internal staff.

The process to follow should be similar, but instead of listing all possible datasets it should involve
using the existing inventory as a basis and using each step of the process to see how the inventory
can be expanded or amended. Expansion should cover both:

 Extending the inventory, by adding new datasets


 Enriching the inventory, by increasing the proportion of datasets that have been catalogued
(using the Data Exchange Standard and Data Quality Standard).

GUIDANCE NOTE 4: PRIORITIZATION CRITERIA AND PROCESS

Purpose It will not be possible to ensure all the Entity’s datasets meet the Smart Data
Framework requirements at once. This Guidance Note provides criteria and a clear
process for prioritising the order in which datasets should be made conformant to the
standards prior to their publication or exchange.

When to use After producing an inventory of the Entity’s data assets, and before proceeding to take
the initial highest priority group of datasets through the Data Conformance Process
described in Guidance Note 5.

Responsibility Data Management Officer

Overview
This Guidance Note helps Entities focus resources on making the most important datasets conform
to the Smart Data Framework standards first and ensuring there is a clear prioritized plan for which
datasets will be ready for publication and exchange next.
Once a Government Entity has prepared an initial draft of its Data Inventory (using Guidance Note 3:
Developing a Data Inventory), it should not seek to ensure full conformance with the
UAE Smart Data Standards for all its data at once. We recommend prioritising which of the
inventoried datasets it should prepare first for publication as open data or for exchange with other
Entities.
By starting with a subset of its data inventory, Entities can:
 Quickly publish and exchange high value and low effort data
 Go through the process faster, learn from it and adopt desired changes to the process in
future.
The guidance below looks in turn at:
 The recommended process for prioritising datasets
 The criteria which are recommended for use within the prioritisation process.

Recommended process for prioritizing the inventory


The diagram below summarises the process that Government Entities are recommended to follow
when prioritising their data, illustrating which of the key data governance roles will normally have
lead responsibility for each step of the process. Each step is then described in more detail.

Relationship with other parts of the Smart Data Toolkit: use Guidance Note 3 to prepare an inventory of the Entity’s datasets; use Guidance Note 5 to start the process of Smart Data Standards compliance for the datasets in the first sprint.

Director of Data:
1. Identify whether the Entity has any datasets matching UAE Primary Registries
2. Identify datasets requested by the FDMO for priority projects
7. Review the Prioritised Inventory against UAE and Entity strategic objectives, and decide on priority groups of datasets to manage in a series of ‘sprints’

Data Management Officer:
5. Integrate Custodian lists together and validate overall priority
6. Prepare a full Entity-wide Prioritised Inventory

Data Custodians and Specialists:
3. Assess datasets each Custodian is responsible for against the prioritisation criteria
4. Review complete ordering and make adjustments

1. Identify primary registries datasets
Identify whether the Entity has any datasets relevant to the UAE Primary Registries as identified by
the Federal Data Management Office. If so, these and any dependent datasets should be given top
priority.

2. Identify datasets required for priority projects
Identify datasets required or requested by the Federal Data Management Office for priority national
projects. This could be for:
 Cross-Entity projects,
 Datasets relevant to achieving national indicators and realizing Smart Governance goals.
Discuss requirements with the Federal Data Management Office and the Smart Data Electronic Platform
team. Place recommendations next on the priority order.

3. Assess Inventory against prioritization criteria
Each Data Custodian should now assess the datasets they are responsible for (using their Data Inventory lists) against the prioritization criteria described in this Guidance Note. These criteria assess both data readiness and the benefit or value of sharing that data, resulting in a balanced score (with the overall score for ‘priority’ being an equally-weighted average of the scores for benefit and readiness).
The priority order should be noted in each Custodian’s departmental inventory.
4. Review ordering
Data Custodians should review their complete prioritized list (which includes primary registries, datasets needed for projects, and the results of applying the prioritization criteria to the data in their inventory) and re-arrange as needed. For example, if many datasets have the same score, use judgement to prioritize between them; if a dataset looks out of place, as though it should sit above or below others, rearrange accordingly.

5. Integrate and validate overall priority
The Data Management Officer should review the prioritized inventories of each department and combine them. Then, follow steps 1 and 2 to additionally prioritize other datasets which do not belong within a specific department.
The combined prioritized list should then be reviewed to check that the overall ordering makes sense. As part of this, the Entity may also want to include a spread of types of datasets in its initial batch of priority datasets – some very easy to publish and some very valuable – to test the comparative effort and impact of taking these through to publication.

6. Add priority to Entity’s full Data Inventory
The Data Management Officer should ensure that the validated prioritization scores are included
within the Data Inventory then reviewed and signed off by the Director of Data.

7. Review and prepare sprints
The Director of Data should satisfy themselves that the prioritized list makes sense and aligns with
the strategic aims of the Smart Data Framework as well as within the aims of the Entity itself. Then
the final approved list should be split into batches of prioritized datasets that make sense to tackle
together through a series of ‘sprints’ through the Data Conformance process described in Guidance
Note 5 of this Implementation Guide.

Recommended criteria for assessing priority
There are two broad sets of criteria to consider:
1. Benefit criteria for evaluating the potential value of opening a particular dataset to
individuals and Private Sector Entities, or sharing it with other Government Entities.
2. Readiness criteria for evaluating the effort involved in getting the dataset ready for
publication or exchange.
In combination, these two sets of criteria help identify the datasets which will have the most impact for the least effort. They have been chosen because they balance each other. If a dataset is very high value but a lot of work must go into making it publishable or reusable, it will still be fairly high on the priority list, but below data which is both valuable and ready. Similarly, data that would be very easy and quick to publish should not be the first focus unless there is some potential benefit in its publication and sharing.
The tools below provide simple-to-use recommended approaches for quantifying both of these
dimensions.

1. Assessing potential benefit
First, the benefit criteria measure the potential value or degree of benefit that could be created for
individuals, Government Entities and Private Sector Entities in opening up or exchanging each
dataset. This provides a simple way to evaluate the comparative impact that publishing different
data would have on the strategic goals of UAE’s Smart Data program.
Each dataset should be given a score out of 5 against each of the following four benefit criteria, recording the score for each:

User demand for data – How likely is it that individuals, Government Entities or Private Sector Entities would want to use or have access to this data?
1 – Highly unlikely: no evidence and no plausible reasons this would be relevant
2 – Unlikely
3 – Possible
4 – Highly likely: we can think of good reasons others may want this data
5 – Definite: we have already seen requests for this data

Economic impact – If we open up this data, how likely is it that Private Sector Entities could use it – perhaps combined with other data – to create commercially valuable products and services?
1 – Highly unlikely
2 – Unlikely
3 – Possible
4 – Highly likely
5 – Definite: we have clear evidence that Private Sector Entities want to exploit this data commercially

Better services – How likely is it that exchanging this data will lead to innovations and services that improve the quality of life for people in the UAE?
1 – Highly unlikely
2 – Unlikely
3 – Possible
4 – Highly likely
5 – Definite: we have clear evidence of the quality of life gains that could be made

Better governance – How likely is it that exchanging this data will improve the efficiency, transparency and accountability of Government Entities?
1 – Highly unlikely
2 – Unlikely
3 – Possible
4 – Highly likely
5 – Definite: we have clear evidence of the efficiency, transparency or accountability gains that could be made

Total score out of 20.

2. Assessing data readiness
Secondly, the readiness criteria assess the state and quality of the data. This is to establish how
much work is needed to prepare it for publication or exchange. Data which is of high quality, already
documented, up to date and with a clear owner can be more easily published or exchanged. These
criteria help to identify ‘quick wins’ for the Entity.
Please assess each dataset against the following readiness criteria, recording the score for each (pick the most suitable score, and use 1 – Medium if unsure):

Accuracy – How accurate is the data?
2 – High accuracy (we review and check accuracy)
1 – Medium accuracy
0 – Low accuracy (there are known errors in the data)

Completeness – How complete is the data?
2 – High completeness (we have all the data at the current granularity)
1 – Medium completeness
0 – Low completeness (there is known missing data, or this data will not make sense by itself)

Timeliness – How up to date is the data?
2 – The latest month / week / year is available
1 – The data is not time sensitive, OR we have all the data apart from the latest month / week / year
0 – The data is out of date

Validation – Does the data use a schema or is it standardised?
2 – Yes, the data is published with the same headings / fields (schema) each time
1 – The data does not use a schema AND is not published regularly (i.e. it is one-off data)
0 – The data is regularly updated, but does not use a set schema

Ownership – Is there a clear, specific data owner?
2 – Yes
0 – No

Description – Does the data have existing metadata – that is, information on what the data is about, how it was generated etc.?
2 – Yes
0 – No

Accessibility – Is the data already published somewhere, or available on the web / through an API?
2 – Yes
0 – No

Interoperability – Is the data in an open machine-readable format?
2 – Yes
0 – No

License – Does the data have a license?
2 – Yes
0 – No

Total score out of 18.
Combine the two scores to get an overall priority score out of 38.
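As an illustration only (the Framework does not mandate any tooling), the scoring arithmetic above can be captured in a short script. The dataset names and scores below are invented examples:

```python
# Illustrative only: combine the benefit scores (four criteria, 1-5 each,
# max 20) and readiness scores (nine criteria, 0-2 each, max 18) into the
# overall priority score out of 38 described in this Guidance Note.

def priority_score(benefit_scores, readiness_scores):
    """Return (benefit, readiness, total) for one dataset."""
    assert len(benefit_scores) == 4 and all(1 <= s <= 5 for s in benefit_scores)
    assert len(readiness_scores) == 9 and all(s in (0, 1, 2) for s in readiness_scores)
    benefit = sum(benefit_scores)      # out of 20
    readiness = sum(readiness_scores)  # out of 18
    return benefit, readiness, benefit + readiness

# Invented example datasets: (benefit scores, readiness scores)
datasets = {
    "school locations": ([5, 3, 4, 3], [2, 2, 2, 2, 2, 2, 0, 2, 0]),
    "internal meeting notes": ([1, 1, 1, 2], [1, 1, 0, 1, 2, 0, 0, 0, 0]),
}

# First-cut ordering: highest combined priority score first
ranked = sorted(datasets, key=lambda name: priority_score(*datasets[name])[2], reverse=True)
```

Sorting on the combined score gives only a first-cut ordering; as step 4 above notes, Custodians should then adjust the ordering by judgement.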
GUIDANCE NOTE 5: DATA CONFORMANCE PROCESS

Purpose This Guidance Note outlines the process that Entities are recommended to
follow to ensure their data conforms with the Smart Data Standards, and is
ready for publication and exchange.

When to use Before publishing data as open data or exchanging shared data with other
Entities. Entities should focus in turn on successive batches of data that have
been prioritised for data conformance in line with the advice in Guidance
Note 4: Prioritization process and criteria.

Responsibility The Director of Data has overall accountability for ensuring effective systems are in place to manage data conformance, but the lead responsibility for operating these systems will lie with the Data Management Officer.

Overview of recommended process to ensure data conforms to standards
The diagram below summarises the process that Government Entities are recommended to follow as
they seek to ensure that their datasets comply with the mandatory requirements of the UAE Smart
Data Standards. The diagram illustrates which of the key data governance roles will normally have
lead responsibility for each step of the process – and is followed by a brief description of each step in
the process.
Relationship with other parts of the Smart Data Toolkit: use Guidance Notes 3 and 4 to prepare a prioritised inventory of the Entity’s datasets before starting this process.

1. Data Management Officer: oversee and manage the data conformance work of the Data Custodians and Data Specialists within each department, ensuring each dataset is fully described and its quality improved.
2. Data Custodians and Specialists manage each dataset: [A] Classify each dataset (DC1); [B] Choose appropriate format (DE1); [C] Document permissions (DE7); [D] Add metadata and develop schema (DE2 and DE3); [E] Manage data quality ongoing (DQ1).
3. Validate and publish: [A] Validate and add to the expanded Data Inventory (Data Management Officer); [B] Check dataset descriptions and quality meet Smart Data Standard requirements (Director of Data); [C] Publish or exchange data (Data Management Officer).

1: Manage and coordinate
It is recommended that the Data Management Officer is responsible for overseeing and helping the
Entity’s Data Custodians and Data Specialists complete steps 2[A] – 2[E] above, by:
 Establishing a clear internal timetable for completing the data conformance process, aligned
with Entity and Federal milestones for publishing or exchanging data
 Ensuring that Data Custodians and Data Specialists are fully briefed on their roles and on the
requirements of relevant Smart Data Framework standards
 Facilitating opportunities for Data Custodians and Data Specialists to come together and
exchange experiences and lessons learned through the process.
A summary is set out below of the steps that need to be taken as part of this coordinated approach:

 Steps 2[A] to 2[E] look at the actions which Data Custodians and/or Data Specialists should
take to ensure that an individual dataset is conformant with the UAE Smart Data Standards
 Steps 3[A] to 3[C] then look at the actions which the Data Management Officer and Director
of Data should then take to validate and approve for publication the datasets that have
come through this process.

2: Management of datasets by Data Custodians and Data Specialists

2A: Classify

The Data Custodian should classify the dataset as Open, Confidential, Sensitive or Secret, in
accordance with the [DC1] Data Classification specification. Detailed guidance on the process to
follow is given below in Guidance Note 5.1: Classifying data.
Once this is done, you might be left with the original dataset and one or more derived (or ‘child’)
datasets which have been modified to allow an Open classification. Both the original and derived
datasets should be catalogued separately in the following steps.

2B: Choose format

Decide on an appropriate format in which to make the data available that complies with the [DE1]
Data Formats specification and produce a sample dataset in that format. Detailed guidance on the
process to follow is given below in Guidance Note 5.2: Formatting data.

2C: Decide on and document Shared Data Access Permissions

For data which will be exchanged with other Entities rather than published as Open Data, you will
need to comply with [DE7] Shared Data Access Permissions. This will involve determining and then
documenting the appropriate permissions model. Detailed guidance on the process to follow is
given below in Guidance Note 5.3: Documenting a permissions model for shared data.
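To make the idea of a documented permissions model concrete, here is a hypothetical sketch. The field names, entity names and rules below are all invented for illustration; a real model must follow the [DE7] specification:

```python
# Hypothetical sketch of a shared-data permissions record; the field names
# are invented and do not come from the [DE7] specification.
permissions = {
    "dataset": "business-licenses",  # invented dataset name
    "classification": "Confidential",
    "allowed_entities": ["Ministry of Economy", "Federal Tax Authority"],
    "allowed_operations": ["read"],  # e.g. read-only, no onward sharing
}

def can_access(entity: str, operation: str) -> bool:
    """Check a requesting entity's operation against the documented model."""
    return (entity in permissions["allowed_entities"]
            and operation in permissions["allowed_operations"])
```

The point of documenting the model explicitly is that such checks can then be enforced consistently wherever the shared data is exchanged.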

2D: Add Metadata

Describe each dataset with metadata ensuring that all Core Metadata fields required in [DE2]
Metadata are complete and as many Optional Metadata fields as can be easily filled in. Detailed
guidance on the process to follow is given below in Guidance Note 5.4: Adding metadata and
schema.
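Purely as an illustration, a metadata record can be held as a simple key-value structure with a completeness check before publication. The field names below are invented placeholders, not the Core or Optional fields defined in [DE2]:

```python
# Placeholder metadata record; the field names are invented and are not
# the Core Metadata fields defined in the [DE2] Metadata specification.
metadata = {
    "title": "Public school locations",
    "description": "Names and coordinates of public schools",
    "publisher": "Example Entity",   # responsible Entity (invented value)
    "classification": "Open",
    "format": "CSV",
    "update_frequency": "yearly",
}

# Simple pre-publication check: no required field left empty
empty_fields = [k for k, v in metadata.items() if not str(v).strip()]
```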

2E: Manage data quality

Assess the data against the [DQ1] Data Quality Principles. The Data Custodian should then:
 Identify and implement any ‘quick wins’
 Then develop a longer term plan for improving the quality of the dataset to better meet user
requirements.
The Data Custodian’s work on this will need to feed into broader work on improving data quality in
the Entity, as detailed in Guidance Note: 2 Building a Smart Data Roadmap.
Detailed guidance on the process that Data Custodians should follow is given below in Guidance
Note 5.5: Managing data quality.
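Many ‘quick win’ quality checks can be automated. The sketch below (invented sample data and field names) computes per-column completeness, one of the qualities a Data Custodian would assess under [DQ1]:

```python
# Illustrative completeness check over an invented CSV sample.
import csv
import io

sample = "school_id,name,opened\n1,Al Noor,2001\n2,,1998\n3,Al Salam,\n"
rows = list(csv.DictReader(io.StringIO(sample)))

# Fraction of non-empty values per column
completeness = {
    field: sum(1 for r in rows if r[field]) / len(rows)
    for field in rows[0]
}

# Flag columns below a chosen threshold for the longer-term improvement plan
needs_attention = [f for f, c in completeness.items() if c < 1.0]
```

Checks like this give the Custodian concrete evidence for the ‘quick wins’ and the longer-term plan described above.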

3: Validation and publication
Once a dataset has gone through the process described in Steps [2A] to [2E] above, the Data
Management Officer and Director of Data will need to validate and approve the dataset either for
publication or for sharing and exchange with other government entities over the smart data
electronic platform. Further guidance on this – along with guidance on when these decisions also need to be approved by the Federal Data Management Office – is given below in Guidance Note 5.6: Validation and publication of data.

5.1 Classifying data
Purpose This Guidance Note outlines the recommended process for ensuring that a dataset has been correctly classified in accordance with the UAE Smart Data Standards.

When to use Before a dataset is published as open data or exchanged with other Entities,
the data should be correctly classified.
Responsibility The Data Custodian that the Entity has identified as accountable for a particular dataset should be the lead person responsible for applying the Data Classification Standard to the dataset.

The Entity’s Data Management Officer is responsible for supporting all Data
Custodians across the Entity as they undertake this task, and for ensuring a
consistent approach at the Entity-wide level.
The Entity’s Director of Data is responsible for reviewing and approving classifications of all datasets.

Overview
At the start of the Data Conformance Process, it may be that a dataset has already been classified as
Open, Confidential, Sensitive or Secret – because FGEs have been required to use such a classification
for several years (see for example the ‘Regulation of Information Security at the Federal Entities of
UAE Cabinet Resolution’ No. (21), 2013). Previously, however, Entities were free to establish their
own criteria for determining what sort of data they assigned to each class. Now, following
agreement of the UAE Smart Data Standards, there is a common government-wide set of criteria
which all Government Entities should apply. These criteria are intended to enable much greater levels
of open data publication and data exchange between organisations than has historically been the
practice in the UAE.

The table below gives criteria for assessing what data falls into each class, with examples. The classification level is decided through a risk assessment of the level of damage that may result from unrestricted disclosure of the data (to privacy, security, commercial confidentiality etc.).

Data classification

Open Criteria:
Data that can be openly disclosed to individuals, governmental, semi-government entities
and private sector for use, re-use and sharing with third parties. This should be the
default classification for all non-personal data, and exceptions to this should have a
documented rationale that clearly explains why open publication of the data would
contravene specific criteria listed below that require classification as Confidential,
Sensitive or Secret

Examples:
Open data can include:
 Real time data: constantly updating data, often high volume and high velocity
Examples include: weather data; footfall through airport; cars passing toll
booths; pollution levels; real-time location data; electricity usage
 Operational data: the records that are made as part of an Entity carrying out its day-to-day business
Examples include: Entity organisation chart; forecast or modelling data; buildings
owned/maintained; budget; staff levels; performance against metrics
 Reference data: authoritative or definitive data that rarely changes about things
Examples include: timetables; names and locations of schools, hospitals, bus
stops; tax codes; land holdings; mapping data; indicators; address data
 Aggregated data: analysed and summarised data, which provides overview
information in relation to other types of data
Examples include: hospital operation success rates; school exam pass rates;
population statistics; housing; tourist numbers by month/year; nationalities of
visitors

Confidential Criteria:
This is the default classification for datasets containing personal data which is non-
sensitive. "Personal data" means any information relating to an identified or identifiable
natural person; an identifiable person is one who can be identified, directly or indirectly,
in particular by reference to an identifier such as a name, an identification number,
location data, online identifier or to one or more factors specific to the physical,
physiological, genetic, mental, economic, cultural or social identity of that person. Non-sensitive personal data refers to all types of personal information which are not ‘sensitive’ (as defined in the criteria for Sensitive data below).

In addition, data should be classified as Confidential if unrestricted disclosure or exchange of the data may cause low damage to government bodies, companies or individuals, such as:
 Adversely affecting or preventing the ability of a Government Entity to carry out its day to day duties
 Low damage to assets, or limited financial loss of an Entity, company or individual
 Limiting the competitiveness of companies and negatively affecting the principle of equal opportunities

 Adversely affecting public safety, criminal justice and enforcement activities.

Examples:
Typically, non-sensitive personal data will include information which is personal but does
not impact on the reputation of the person. Examples include name, date of birth and
address.
Examples of other types of Confidential information include:
 Minutes of meetings, internal regulations and policies, and government-body
performance reports
 Correspondence within a government body or with other government bodies or third
parties
 Financial transactions and financial reports
 Company data such as tenders or contracts which provide for non-disclosure clauses
 Individual’s dealings with the government, which include personal data (details of
ownership of properties of various kinds, commercial or professional licenses,
personal documents, residence permits, visas, and leases).

Sensitive Criteria:
This is the default classification for datasets containing sensitive personal data. Sensitive
personal data are personal data that directly or indirectly reveal an Individual's family,
racial or ethnic origin, sectarian origin, political opinions, religious or philosophical beliefs,
their union membership, criminal record, health, sexual orientation, genetic data or
biometric data

In addition, data should be classified as Sensitive if unrestricted disclosure or exchange of the data may cause limited damage to government bodies, companies or individuals, such as:
 Infringing Intellectual Property Rights
 A significant decline in the ability of one of the bodies to carry out its functions,
limited damage to its assets, or significant financial loss
 Causing limited damage to companies that could lead to loss of competitiveness, loss of some of their core knowledge and intellectual advantages, or incurring heavy financial loss
 Limited damage to the operational effectiveness of the police, security forces, military
forces, intelligence services or the administration of justice
 Limited damage to relations with friendly governments or damages to international
relations resulting in formal protest or sanctions.

Examples:
For example, this might be the details and content of:
 Draft government laws and policies and legislation
 Audit reports of a government body
 Employees’ complaints and investigation minutes
 Staff salaries and performance reports
 Confidential financial expenses
 Data, plans or technical documentation for technological information systems and
networks of a governmental body
 Credit card or bank accounts data
 Judgments, irregularities or violations under investigation relevant to individuals
 Attachment orders over assets and property of individuals and companies.

Secret Criteria:
Data whose unrestricted disclosure or exchange may cause significant damage to the supreme interests of the United Arab Emirates and very high damage to government bodies, companies or individuals, such as:

 Disclosing any personal information of a VIP (very important person) or infringing any
Intellectual Property Rights of a VIP
 A significant or noticeable negative impact to the supreme interests of the United
Arab Emirates
 A sharp decrease in the ability of one of the vital bodies to carry out its functions, or
very high damage to its assets, heavy financial loss, clear negative impact on the
image of the body and a loss of public confidence in such body and in the government
in general
 Causing significant damage to private sector entities that have vital and strategic roles in the national economy, which may lead to heavy financial losses, bankruptcy or loss of their leading role
 Seriously endangering the safety and lives of certain individuals associated with a
security role (e.g., security forces and police) or as parties to serious judicial cases
(e.g. witnesses)
 Information the disclosure of which would negatively affect the maintenance of
security and the administration of justice, or cause major, long-term impairment to
the ability to investigate or prosecute serious crimes.

Examples:
Examples include details and content of:

 Security reports, minutes or orders
 Sensitive minutes and reports of the Council of Ministers or its committees
 Agreements or contracts of a secret nature between the United Arab Emirates and other countries or individual Emirates
 Government Entity’s data, plans, operating systems which would significantly damage
the production of energy or water, infrastructure networks or traffic control or
communications systems
 Security forces data, including the facilities, equipment, personnel and operation
systems
 Data and regulations of individuals and entities under control or blacklisted
 Data of control and surveillance systems and entry and movement control systems at
vital institutions
 Data relevant to security detectives, spies or witnesses in serious lawsuits
 Data relevant to strategic Government financial investments (national companies, investment funds, off-shore companies)
 Attachment or travel ban orders.

It is therefore vital that every dataset prioritised for open publishing and for inter-Entity exchange
has its classification status reviewed against the requirements of the Data Classification Standard.

The diagram below summarises the process that the Government Entity is recommended to follow
when classifying a dataset against the Data Classification Standard mandated in the UAE Smart Data
Framework.

Relationship with other parts of the Smart Data Toolkit: the current batch of datasets to be classified will be listed in the prioritized Data Inventory created using the process recommended in Guidance Notes 3 and 4.

1. Data Management Officer: provide support to Data Custodians as they classify data, to ensure a consistent approach across the Entity.
2. Data Custodians and Specialists classify each dataset: [A] Think Open; [B] Check for barriers to disclosure; [C] Check for negative effects; [D] Weigh public interest; [E] Assess level of restriction; [F] Consider public inventory; [G] Identify derivative public data; [H] Add classification to metadata.
3. Data Management Officer: validate classifications and add them to the expanded Data Inventory. The Director of Data then reviews and approves all dataset classifications.

Step 1: Manage and coordinate

It is recommended that the Data Management Officer is responsible for overseeing and helping the Entity’s Data Custodians and Data Specialists complete steps [2A] – [2H] above, by:
 Establishing a clear internal timetable for completing the data classification process, aligned with Entity and Federal milestones for publishing or exchanging data
 Ensuring that Data Custodians and Data Specialists are fully briefed on their roles and on the
requirements of Data Classification Standard
 Facilitating opportunities for Data Custodians and Data Specialists to come together and
exchange experiences and lessons learned through the process.

Step 2: Classify each dataset

For each dataset, the responsible Data Custodian should classify the dataset using the eight-step process recommended above. The steps that an individual Data Custodian should go through can also be visualized as a logical decision model, as illustrated below.
The decision model follows these steps:

1. Think Open.
2. Are there barriers to disclosure? If yes, the data is Shared Data: go to step 5.
3. Are there negative effects from disclosure? If no, the data is Open Data: go to step 8.
4. Do the benefits of disclosure outweigh the negatives? If yes, the data is Open Data: go to step 8. If no, the data is Shared Data: continue to step 5.
5. Assess the level of restriction (Confidential, Sensitive or Secret).
6. Decide whether to include the dataset in the public inventory.
7. Consider creating a related open dataset (e.g. with only part of the data).
8. Add the classification to the dataset metadata.

Each of these steps is described below.
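The decision model can also be expressed as a small function. This is an illustrative sketch only: the boolean flags on each dataset record are invented stand-ins for the manual judgements made in steps 2B–2D, not fields defined by the Framework.

```python
# Illustrative sketch of the classification decision model. The dictionary
# keys (barriers_to_disclosure etc.) are invented stand-ins for the manual
# checks a Data Custodian performs; they are not defined by the Framework.
def classify(dataset: dict) -> str:
    # Step 2A: Think Open - openness is the default assumption.
    if dataset.get("barriers_to_disclosure"):  # Step 2B: legal/security barriers
        return dataset.get("restriction_level", "Confidential")  # Step 2E
    if dataset.get("negative_effects") and not dataset.get("benefits_outweigh"):
        # Steps 2C-2D: harmful effects not outweighed by the public interest
        return dataset.get("restriction_level", "Confidential")  # Step 2E
    return "Open"  # proceed to step 2H: record the classification in metadata
```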

2A. Think Open

It is vital to recognize the UAE Government’s strategic commitment to high levels of openness.
When following the steps of this procedure, the default assumption about a dataset should be that it
will be classified as open. Exceptions require a compelling case linked to clear criteria, which should
be documented and then personally signed off by the Entity’s Director of Data.

‘Thinking Open’ is often the most difficult part of the classification procedure, especially if the Entity is inexperienced with open data. Staff may be concerned that publication will reflect badly on them where, for example, some of the data may be interpreted as unfavorable, or the data may have gaps or inaccuracies. It is vital that staff understand that they will have the backing and support of management for the decision to publish data in which problems are later found. Such problems affect all Entities and all data, and publication should be seen as an opportunity to help find and fix errors.

For these reasons, it is very helpful if at the start of the data conformance process, the senior
management communicate to the staff their and the Entity’s commitment to openness. The Director
of Data should be available to respond to any concerns raised by staff.

The following steps should be carried out by the person(s) most familiar with the data, such as the
Data Custodian, for each of the datasets in the current batch going through the data conformance
process.
2B. Check for barriers to disclosure

There are certain criteria that may preclude a dataset being classified as ‘Open’ and then disclosed as
Open Data. These include two absolute barriers to disclosure. A dataset cannot be Open if its
publication would:

 Violate existing legislation or laws; or

 Represent a significant threat to the supreme national interest and/or national security.

Check that your dataset’s publication would not violate one of these conditions. In most cases it
should be obvious if one of these barriers applies, but in cases of doubt you may need to consult
your Entity's legal department. If a dataset is barred from publication by one of these barriers, then
it cannot be classified as Open. Proceed to Step [E] to determine whether it should be classified as
Confidential, Sensitive or Secret.

2C. Check for harmful effects of disclosure

If the barriers to disclosure in Step [B] do not apply, then there are other possible harmful effects to consider before the data can be confirmed as Open. Consider whether release of the dataset would entail a significant risk of one or more of the following, by checking whether the answer is ‘yes’ to any of the questions in the checklist below.

A breach of the privacy of any individual

1. Consider whether any individuals can be identified from this data.

This would apply if the dataset includes data about identifiable individuals - for example, their address, medical history, date of birth, or tax information.

Note: A person does not need to be named to be identifiable. If the data contains information about individuals, even if the individuals cannot be easily identified, they may become identifiable when the data is combined with other publicly available information or datasets. Any release of data at the level of individuals, or small groups of individuals such as households, is likely to run this risk.

If the answer is yes, then the Entity should:

 Classify the dataset as Confidential if the data relates to Personal Information, or as Sensitive if it relates to Sensitive Personal Information

 Try to create a derivative dataset which can be classified as ‘Open’. This could be achieved by anonymizing the data, taking out data referring to small sample sizes, aggregating or summarizing the data, or taking out attributes which hold the personal data. Once one or more derivative datasets have been created:

 Check whether any of the other risk assessment criteria apply.

 If not, then classify as ‘Open’ and add to the Data Inventory.

Note: It should almost always be possible to create a version of the data which does not breach privacy. In many cases, these summary, anonymized datasets will already exist within the Entity: providing statistical, analytical and management information for use in running the service which has generated the more detailed personal data.
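As a toy illustration of creating such an ‘Open’ derivative (all records, field names and the suppression threshold below are invented), aggregation plus small-count suppression removes the person-level attributes:

```python
# Invented record-level data: as-is this would not be classifiable as Open,
# since each row relates to an identifiable person.
from collections import Counter

records = [
    {"name": "Person A", "district": "North", "service": "license renewal"},
    {"name": "Person B", "district": "North", "service": "license renewal"},
    {"name": "Person C", "district": "South", "service": "visa query"},
]

# Derivative: counts per district only - no person-level attributes survive.
counts = Counter(r["district"] for r in records)

# Suppress small counts to reduce re-identification risk (threshold invented).
SMALL_CELL = 2
open_derivative = {d: n for d, n in counts.items() if n >= SMALL_CELL}
```

A real derivation would also need to be checked against the other risk criteria before the derivative is classified as Open.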
A breach of legal rights or agreements (such as Non-Disclosure Agreements, Intellectual Property rights or release of commercially sensitive information)

If publication of the data would breach an existing legal agreement, the dataset should initially be classified as Shared, and steps taken to see if parts could be published openly or such agreements renegotiated to allow publication in future.

Specifically, consider the following questions:

2. Does the legal statute under which the Entity is empowered to collect the data from or relating to a Private-Sector Entity place a duty on the Entity to keep that data confidential?

If the answer is yes, then the Entity should classify the dataset as Confidential or Sensitive, depending on the degree of damage that would be caused by breach of confidentiality (see Step E).

3. Does the Entity have a Non-Disclosure Agreement or other contract
in place with one or more Private-Sector Entities that places it under
contractual obligations to keep the data confidential?
If the answer is yes, then the Entity should:

 Consider whether it would be helpful to approach the relevant
Private-Sector Entities to seek their agreement to voluntarily waive
their non-disclosure rights in relation to some or all of their data
 If not, classify the dataset as Confidential or Sensitive, depending on
the degree of damage that would be caused by breach of
confidentiality (see Step E)
 At any future review points or renewal points in the contract, consider
the scope for re-negotiating the contract to enable greater disclosure
of Open Data in future.

4. Does a Private-Sector Entity hold Intellectual Property Rights in some
or all of the data?
If the answer is yes, then the Entity should:

 Engage with the IPR holder to establish whether it will give consent to
opening up the data, potentially with some license restrictions
 If not, classify the dataset as Confidential or Sensitive, depending on
the degree of damage that would be caused by breach of
confidentiality – unless there is an overriding public interest in
publishing. (See Steps D and E)
 In cases where the Private-Sector Entity’s IPR arises from the
performance of a commercial contract on behalf of the Government
Entity, seek to re-negotiate these contract terms, particularly at any
contract review or renewal points.
Note: In answering this question, Government Entities should note that it
is not acceptable to classify a dataset as Confidential on the grounds that a
Government Entity has Intellectual Property Rights in the data, even in
cases where it is currently exploiting that IPR on a commercial basis.
Rather, the dataset should be classified as Open Data, albeit with
consideration given to the nature of the licensing and pricing basis on
which it is made Open.

Risk to the safety of individuals and society

Consider:

5. Would disclosure of this data pose risks to the health and safety of
individuals or to public health and safety?

6. Would disclosure of this data pose other risks to society?
If any risks identified under these two questions are:

 Specific and clear, not general and vague
 Evidence-based

… then the dataset should be classified as Confidential or Sensitive, with
reasoning documented with sufficient detail that external stakeholders will
be able to understand the rationale and subject it to challenge.

Note: greater transparency is in general a force for social good rather than
a social risk.
Risk of negatively affecting the administration of justice and maintenance of security

7. Consider whether disclosure of this data would pose risks to the
administration of justice and the maintenance of security.

If any risks identified under this question are:
 Specific and clear, not general and vague
 Evidence-based

then the dataset should be classified as Confidential or Sensitive, and
reasoning documented with sufficient detail that external stakeholders will
be able to understand the rationale and subject it to challenge.
A significant negative impact on the work and effectiveness of government

8. Consider whether disclosure of this data would cause a significant
negative impact on the effectiveness with which the Entity or other
Government Entities can deliver their work and objectives.

Note:
 It is not acceptable to treat “potential for Open Data to embarrass the
government because it may reveal poor performance” as a risk under
this heading
 Any risks identified should be:
 Specific and clear, not general and vague
 Evidence-based
 Documented within the Data Inventory with sufficient detail that
external stakeholders will be able to understand the rationale and
subject it to challenge.

If one or more of these harmful effects applies:

 Add the risk you have identified to the inventory
 Proceed to Step [D].

If none of the above negative effects apply,

 Classify the data as Open and add this classification to the inventory
 Move on to another dataset, or proceed to step [H].

2D. Weigh risk of harm against public interest

If harmful effects of publishing are identified in Step [C], then there is a presumption not to publish,
but they are not absolute barriers to disclosure. In some instances, the public interest in publishing a
dataset may outweigh the negative consequences.

Consider whether there is a high economic value or public interest in publishing the data. For
example, would making the data open:

 Have significant economic benefits, e.g. could the data be used in the provision of new high-
value services?
 Increase transparency of government spending or decision making?

If so, the Entity should provisionally decide whether it would be reasonable and proportionate to
publish the data, in spite of the negative effects identified in Step [C]. The final decision will lie with
the Federal Data Management Office.

If you consider that the public interest outweighs the risk of harm:

 Record this in the inventory
 Classify the data as Open and add this classification to the inventory
 Move on to another dataset, or proceed to Step [H].

If the public interest does not outweigh the risk of harm:

 Proceed to Step [E] to classify the data as Confidential, Sensitive or Secret.

2E. Assess level of restriction

Where a dataset cannot be classified as Open after following Steps [A]-[D], it should be classified as
Confidential, Sensitive or Secret, depending on the damage that would be risked by disclosure.
 Where the potential for damage is limited, classify the data as Confidential
 Where the potential for damage is significant, classify the data as Sensitive
 Where the potential for damage is very high, classify the data as Secret.

The [DC1] Data Classification Criteria within the UAE Smart Data Standards sets out the criteria to
be applied when making this classification.
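The Step E mapping from potential damage to restriction class can be sketched as a simple lookup. This is illustrative only: the function name and damage labels are our assumptions for the tiers described in this section, and the authoritative criteria are those in [DC1].

```python
# Illustrative sketch only: the authoritative criteria are defined in the
# [DC1] Data Classification Criteria. The damage labels below are assumed
# names for the potential-for-damage tiers described in this section.

def classify_by_damage(damage: str) -> str:
    """Map an assessed potential-for-damage level (Step E) to a
    restriction class for a dataset that cannot be classified as Open."""
    mapping = {
        "limited": "Confidential",
        "significant": "Sensitive",
        "very high": "Secret",
    }
    level = damage.strip().lower()
    if level not in mapping:
        raise ValueError(f"unknown damage level: {damage!r}")
    return mapping[level]
```

A classification produced this way would still need the documented reasoning required by the rest of this process; the lookup captures only the final mapping, not the assessment itself.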

As illustrated below, this classification will affect who can see the data. Confidential data will be
easier to share between officials to whom it is directly relevant, based on their area of work and
seniority. Sensitive data is more restricted with new access permissions requiring explicit approval.
Secret data will have access strictly controlled to named individuals.

Where a dataset cannot be classified as Open after following Steps [A]-[D]:

Potential for damage   Classification   Who may access
Limited                Confidential     Easier to share between officials to whom it is
                                        directly relevant, based on their area of work
                                        and seniority
Significant            Sensitive        More restricted, with new access permissions
                                        requiring explicit approval
Very high              Secret           Access strictly controlled to named individuals
2F. Justify exclusions from the Open data inventory

By default, all Confidential and Sensitive datasets should be included in the published version of the
Entity’s Data Inventory. That is, it will be a matter of public record that the Entity holds the data,
even though the data itself will not be accessible except to authenticated and authorized users.

If an Entity wishes to make an exception to this, it should demonstrate that simply putting into the
public domain the fact that the dataset exists (as opposed to the data itself) will cause negative
impacts of the type considered in Step [C]. This decision should be agreed personally by the Entity’s
Director of Data.

2G. Consider whether a derivative dataset could be published

Where data has been categorized as Confidential or Sensitive, a balance needs to be struck between
the need for confidentiality and the benefits of openness. It may be possible to publish a summary,
redacted version, extract, or other derivative of the data, which would have value as open data but
avoid the negative effects identified at Step [C].

For example, personal data can be removed from a dataset through a range of anonymization
techniques as illustrated below. To anonymize data (that is, to conceal the identities within it), and
in accordance with European Union guidelines, data should be stripped of sufficient elements that
the data subject can no longer be identified. Specifically, data should be processed in such a way as
to make it impossible to identify a natural person by “all means reasonably likely to be used”. It
should also be borne in mind that anonymization must not be reversible: it should not be possible
to reconstruct identifying information from the anonymized data.

Anonymization Process (Hiding Personal Identity)

Example: creating a derivative dataset

Consider a dataset of school students’ educational results. The data would be of value in various ways: for
example, to researchers looking at variation in educational achievement between different genders or
different areas, or economic and social value through an app provided by a startup to help parents compare
different schools. However, the dataset has been labelled as Confidential because the records include
personal information about students and releasing the dataset would breach their privacy.

In this example, there are a number of ways that a derivative dataset could be prepared and published,
depending on the details. It may be that simply anonymizing the records would be sufficient, as individual
students could no longer be identified. If the data is very granular and specific, it may need to be aggregated
or small numbers suppressed to ensure that individual results or performance can’t be traced to particular
people. In this case, results could be shown by year group, gender and school, or with particular attributes /
fields removed.
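The aggregation and small-number suppression described in this example can be sketched in code. The field names and the suppression threshold of 5 are illustrative assumptions, not requirements of the Standard:

```python
# Hedged sketch of preparing a derivative (anonymized) dataset as in the
# schools example above. Field names and the threshold are illustrative.
from collections import Counter

def summarize_results(records, min_group_size=5):
    """Build a derivative dataset: aggregate per-student records into
    (school, gender) groups and suppress any group smaller than
    min_group_size so individuals cannot be re-identified. Direct
    identifiers (names, student IDs) are simply never copied across."""
    sizes = Counter()
    passes = Counter()
    for r in records:
        key = (r["school"], r["gender"])
        sizes[key] += 1
        if r["passed"]:
            passes[key] += 1
    return {
        key: {"students": n, "passed": passes[key]}
        for key, n in sizes.items()
        if n >= min_group_size  # small-number suppression
    }
```

The resulting summary could then be listed in the Data Inventory as a new ‘child’ dataset classified as Open, linked from the original Confidential dataset.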

Data Custodians should therefore consider whether it would be possible to publish a modified
version of the data. If there is the possibility of a derivative dataset that avoids the barriers and
negative effects in Steps [B]-[C], or where the negatives are outweighed by public interest as in Step
[D], then the Data Custodian should list a new derivative (or ‘child’) dataset in the Data Inventory,
noted as such and linked from the original dataset, but classified as Open Data.

This dataset should then also be catalogued following the rest of the Data Conformance process.

There may also be cases where Sensitive or Secret datasets could be summarized or
otherwise adapted in ways which, while still not allowing open publication, might enable less
restrictive sharing across Government Entities. Again, if this is the case, then a new dataset should
be created in the Data Inventory, at the appropriate lower classification level.

2H. Add classification and documentation to dataset metadata, as part of the Data
Inventory

The classifications, and the reasoning supporting them, should be documented in order to inform the
Data Inventory. The classification will form a mandatory part of the metadata for the dataset, along
with the other elements specified in the [DE2] Metadata specification within the UAE Smart Data
Standards. For a smaller Entity this could be done in a standalone document or spreadsheet, but a
large Entity with sufficient technical resources may wish to install their own data catalogue, allowing
data custodians from each department to enter and edit metadata on the datasets for which they
are responsible.

Steps 3 and 4: validation and review

Once all datasets have been fully catalogued, the classification and its supporting documentation
(along with the rest of the Metadata and Format sample) will be collected by the Data Management
Officer and reviewed. The Data Management Officer should then include all relevant results and
metadata for those datasets in the Data Inventory for the Entity.

The resulting catalogued Inventory should then be reviewed internally by the Director of Data. The
Director of Data should confirm that:
 Every dataset being catalogued in the current batch (as identified in the Prioritization
process) has been classified correctly
 Where a dataset has been classified other than Open, a proper consideration has been given
to whether a derived dataset could be recorded as Open or with a less Confidential
classification (evidenced by the documented reasoning)
 The reasons for classifying any data as non-Open are documented in the Data Inventory.

5.2 Formatting data

Purpose: This Guidance Note outlines the recommended process for ensuring that a
dataset is correctly formatted in accordance with UAE Smart Data Standard:
[DE1] Data Formats.

When to use: Before a dataset is published as open data or exchanged with other Entities, the
data should be correctly formatted.

Responsibility: The Data Custodian is accountable for ensuring the correct formatting of each
dataset for which he or she is responsible, but may delegate responsibility for
the work to a Data Specialist.

Process
Each dataset that is published openly or exchanged with other Entities by a Government Entity
should comply with the [DE1] Data Formats specification. To achieve this, we recommend the
responsible Data Specialist should:
1. Identify the type of dataset that is being prepared for conformance
2. Choose an appropriate format to match that type
3. Produce a sample dataset which can be easily shared, shown and approved
4. Lastly: add the format to the dataset metadata and continue with conformance process

Relationship with other parts of Smart Data Toolkit: use Guidance Notes 3 and 4 to prepare a
prioritized inventory of the Entity’s datasets, then follow the rest of the Data Conformance steps in
Guidance Note 5.

Roles in the process:
 Director of Data – accountable for the overall process
 Data Management Officer – oversees and manages the data conformance work of the Data
Custodians and Data Specialists and, at step 4, validates the results and adds them to the
expanded Data Inventory
 Data Custodians and Specialists – (1) identify the type of dataset, (2) choose an appropriate
format for that type, and (3) produce a sample dataset
1. Identify type of dataset
The first step in choosing a data format is to determine what kind of data you are dealing with.
Different types of data have different properties and need to be formatted in different ways.

Tabular data

Most government data are tabular data. If the data you are dealing with is a list, or would make
sense to record in a spreadsheet then it is almost certainly tabular data.
Tabular data consists of rows, each of which is an individual record in the dataset, and columns, each
of which represents one field of the record. For instance, a dataset about schools might be:
Unique Id   Name                 Highest age   Lowest age
AB292       Blue Water High      11            5
HG383       Green Tree Academy   11            5

The second row, containing “Blue Water High”, is the record about Blue Water High, and the column
titled “Highest age” contains the data from the highest-age field for each school.

Geospatial data

Geospatial data relates to information about how you would draw things on a map.
We know that data is geospatial when:
 It contains the coordinates used to point to something on a map - for instance a latitude and
longitude pair - for example the location of parking spaces, or public libraries.
 It contains the shape that we would draw onto a map to represent a particular area. For
instance: data about the catchment area for a school; the boundaries of an electoral district;
administrative regions for school districts; or zoning areas for planning permission.
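Both cases above can be expressed in GeoJSON, one of the formats required later in this note. The coordinates and names below are invented for illustration:

```python
import json

# A point feature (e.g. the location of a public library) and an area
# feature (e.g. a school catchment) as minimal GeoJSON. Coordinates are
# [longitude, latitude], and a polygon ring must close on itself.
library_point = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [55.2708, 25.2048]},
    "properties": {"name": "Example Public Library"},
}
catchment_area = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[55.26, 25.20], [55.28, 25.20],
                         [55.28, 25.22], [55.26, 25.22], [55.26, 25.20]]],
    },
    "properties": {"name": "Example School Catchment"},
}
collection = {"type": "FeatureCollection",
              "features": [library_point, catchment_area]}

geojson_text = json.dumps(collection)  # the form that would be published
```

Note the longitude-first ordering: it is a common source of errors when converting from sources that list latitude first.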

Real time or service data

Real time data is generally provided immediately via an API (Application Programming Interface) that
can be consumed by other software applications. Data is real time if it changes so frequently that
most questions you would ask about it would be quickly out of date.
One example would be the status of trains on a rail network, or information about current flights -
departures, arrivals and delays at an airport.
Data being provided to power real-time services which frequently access or need to update records
automatically should also be provided via an API.

Structured non-tabular data

Some data is structured, but does not fit into a tabular form in a natural manner. If your data is
hierarchical or contains many levels, then it is likely structured non-tabular data. Examples would
include the organization chart for your department, or a project plan.
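To illustrate why such data suits a nested format rather than a table, consider a hypothetical organization chart, where each node nests its reports:

```python
import json

# Hypothetical organization chart: hierarchical data, a natural fit for
# JSON, where flattening to rows and columns would lose the nesting.
org_chart = {
    "title": "Director of Data",
    "reports": [
        {"title": "Data Management Officer",
         "reports": [
             {"title": "Data Custodian", "reports": []},
             {"title": "Data Specialist", "reports": []},
         ]},
    ],
}

def count_positions(node):
    """Walk the hierarchy, counting one position per node."""
    return 1 + sum(count_positions(child) for child in node["reports"])

serialized = json.dumps(org_chart, indent=2)  # publishable form
```

A consumer can traverse the structure directly, as `count_positions` does, without reconstructing parent-child relationships from foreign keys.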

2. Choose an appropriate format
Once you have identified the type of data you are dealing with in any specific dataset, use the
following table, which sets out format requirements for common types of data:

Data type                            Mandatory                             Recommended

Tabular data                         CSV                                   CSV and Excel file with
                                                                           definitions and commentary
Geospatial data (the coordinates     CSV
and information about the point)
Geospatial data (shapes of areas)    GeoJSON or KML                        GeoJSON and KML
Real time data or data used for      Via an API
responsive services
Structured non-tabular data          An appropriate open, machine-
                                     readable format, conforming to an
                                     open standard where available,
                                     e.g. JSON, XML, RDF, GTFS
Unstructured data                    Using an open format where such
                                     exists

In addition to the above generic criteria, there are format-specific criteria detailed below.
Data Format Conformance requirements
CSV data The format of a CSV dataset will be conformant if:
 It contains a header row which includes the name of the column
 The formatting of dates or numbers is consistent throughout the whole file
 It does not include empty rows
 It does not include rows with missing or extra cells
 It does not use header names more than once in the same file
 It does not include any commentary or explanatory text
Structured non-tabular data The format of structured non-tabular data is conformant if:
a) It conforms to a pre-existing open standard for representing such data, such as GTFS,
Popolo, or the Schema.org job posting standard
or
b) It is in a valid open machine-readable standard such as JSON or XML, and:
 The structure of this data is clearly documented and published alongside it
 The structure of the data is appropriate for re-use given the nature of the domain
to which the data relates.

Geospatial data The format of a geospatial dataset will be conformant if:
 It is published in valid GeoJSON or KML (see the tools table below for further
advice and recommendations on validating conformance).
Real-time and service data An API is conformant if the API endpoint and API documentation are
available. API documentation should include:
 Clear reference information providing the functions, remote call and methods for
the API
 Guidance to help developers experiment with the API
 Information about security, versioning and rate limiting so users can plan their
commitment to using the API

Entities may provide an API to their data in addition to publishing or exchanging the data in
one of the other formats.
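Several of the CSV conformance checks listed above can be automated. The sketch below uses only the Python standard library and is illustrative; a dedicated linting tool such as csvlint.io performs fuller validation:

```python
import csv
import io

def check_csv(text):
    """Return a list of problems against the CSV conformance checks
    above: duplicate header names, empty rows, and rows with missing
    or extra cells. An empty list means these checks all passed."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    problems = []
    header = rows[0]
    if len(set(header)) != len(header):
        problems.append("duplicate header names")
    for i, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            problems.append(f"row {i} is empty")
        elif len(row) != len(header):
            problems.append(f"row {i} has {len(row)} cells, expected {len(header)}")
    return problems
```

The `csv` module also applies the escaping rule noted in the tools table: fields containing commas, quotes or line endings are wrapped in double quotes, with embedded quotes doubled.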

Alternative formats

If you want to use an alternative format not listed above, you should have a clear reason for why
that is the most appropriate format to publish in. You should also check you’ve selected the most
open, standardized and machine-readable format available to meet the requirements.

3. Produce a sample dataset and check it is well formatted
Once you have determined the format, the next stage is to produce a sample dataset.
This is likely to be a file, or collection of files, that represent what you would publish each time that
the dataset is updated. It could be the full data file or some of the data. If the data is constantly
updating it will be a particular slice of the data.
Ensure your sample is well formatted. The table below recommends tools and guides for avoiding
common formatting mistakes:

Data Format Tools


CSV data  Linting tools (programs that analyze code, data, etc. for potential errors) such as
http://csvlint.io/ are available for CSV data and can be used to identify errors with files
early.

 If a field contains a comma, a line ending or a double quote then the field is escaped by
wrapping it in double quotes. Within a field that is escaped like that, any double quotes
are doubled up.
Geospatial data  Linting tools are available for GeoJSON, such as http://geojsonlint.com/, which will help
catch errors in your data files. For KML it is possible to validate your data against the KML
Schema (https://developers.google.com/kml/schema/kml21.xsd?csw=1)
 High quality open tools exist to convert geospatial data between formats, and can be
included in automated dataset generation pipelines to easily publish in multiple formats.
One good example is ogr2ogr: http://www.gdal.org/ogr2ogr.html
Structured non-  For many types of common dataset there exist open standards for representing that
tabular data information as structured data which should be re-used as much as is possible.
 Examples of such standards include:
- Schemas found on http://schema.org/
- The Popolo data standard for people, organizations and voting:
http://www.popoloproject.com/
 Non tabular structured data should in general use JSON, unless there is a clear reason to
use an alternative format, such as a common standard in an alternative format (e.g. GTFS
for transport data)
Real-time data and APIs
 APIs should be designed to meet the requirements for your use-case and with privacy and
security built in. Where possible, ensure data minimization – giving access to the smallest
amount of information required for the service outcome or to enable a decision. For
example, sending ‘yes’, ’no’ or ‘not found’ in response to a query of whether a citizen or
user is over 18 or has a valid driving license instead of sending personal information.
 Guidance on good practice when designing and documenting APIs can be found in here:
- UK Government Service Manual
- US White House API standards
 An example of data API documentation for the UK Government Registers is here.
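The data-minimization pattern described above can be sketched as follows. The register contents and function shape are invented for illustration, not an API defined by the Standard:

```python
from datetime import date

# Hypothetical birth register held by the data-owning Entity.
_BIRTH_REGISTER = {
    "id-001": date(1990, 5, 1),
    "id-002": date(2005, 6, 1),
}

def is_over_18(person_id, today):
    """Answer an eligibility query with 'yes', 'no' or 'not found',
    never revealing the underlying date of birth."""
    dob = _BIRTH_REGISTER.get(person_id)
    if dob is None:
        return "not found"
    # Subtract one if this year's birthday has not yet occurred.
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    return "yes" if age >= 18 else "no"
```

A consuming service learns only the answer it needs for its decision; the personal data itself never leaves the data-owning Entity.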

The sample is representative of what will be available to users. Its purpose is to help the Data
Custodian, Data Management Officer, Director of Data and the Federal Data Management Office see
that the data:
- Conforms to the Smart Data Framework standards
- Makes sense as a dataset

4. Continue with data conformance process

Add the sample to the expanded inventory and record the chosen format in the appropriate
metadata field. This should then be reviewed by the Data Custodian responsible for the data and by
the Data Management Officer.

5.3 Documenting a permissions model for shared data

Purpose: This Guidance Note outlines the recommended process for determining who
may access a dataset, and with what level of access, in conformance with the
[DE7] Shared data access permissions specification.

When to use: When preparing Confidential or Sensitive data for exchange with another Entity
for the first time, and when responding to future requests for additional access
permissions.

Responsibility: Data Custodian.

Process
For each dataset in the current prioritized batch being catalogued, the responsible Data Custodians
need to ensure that the requirements of the [DE7] Shared data access permissions specification are
met in respect of Confidential and Sensitive Data. The following process is recommended.

Relationship with other parts of Smart Data Toolkit: use Guidance Notes 3 and 4 to prepare a
prioritized inventory of the Entity’s datasets, then use Guidance Note 5.4 to add metadata –
including metadata on permissions.

Roles in the process:
 Director of Data – accountable for the overall process
 Data Management Officer – oversees and manages the data conformance process of the Data
Custodians and Data Specialists within each department, ensuring each dataset is fully described
and its quality improved, and (step 4) establishes longer-term processes to review and respond
to requests for new Shared Data Access Permissions
 Data Custodians and Specialists – (1) baseline the current permissions model, (2) check against
the UAE Privacy Principles, and (3) document an initial set of Shared Data Access Permissions

1. Baseline the current permissions model
Your dataset will already have a set of permissions associated with it, even if this is simply current
practice rather than a documented policy. So the starting point is to document the baseline
position around who is permitted to access the data:

 Who currently has access to the dataset within the Entity?
 Are these specific named individuals only, or groups of individuals (e.g. people in a specific
business unit of the Entity, or people in a specific grade or function)?
 Are there any restrictions on the purposes for which these people may access the data?
 What kind of access do they have? Is it full access to the dataset, access to specific entries or
records, or ability to edit/update records?
 Similarly, are there individuals or groups of people in other organisations who are permitted
access to the data, and on what basis?

2. Check against UAE Privacy Principles
The [DE6] Data protection and privacy specification sets out a set of UAE Privacy Principles which all
Government Entities should seek to apply when managing data that contains personal data or
commercial data. Having documented current practices around granting access permissions to data,
you should check that those practices are compliant with these Privacy Principles. Particular issues to
consider are:

 Does the Entity have the consent of the data subject to share their data with all those people
who currently access it?
 If not, is the dataset covered by sector specific regulations which mean that such consent is
not required?
 Are there controls in place to ensure that people permitted access to the data may only use
it for specified business purposes?
 Is the level of access proportionate to the stated purpose? (For example, if an official has a
business need to check whether an individual is over 18, they should be permitted yes/no
query access to the data rather than being able to see the date of birth of the individual.)

Entities should embed the following UAE Data Privacy Principles in their data management practices,
and in those of third parties contracted to manage data and services on their behalf.
Data Privacy Principle Description
1. Consent  Personal Data in relation to individuals and Commercial Data in relation to Private
Entities should not be disclosed or shared without the data subject’s consent.
 When providing a service to an individual or a Private Entity, Government Entities
should seek the consent of that data subject for the data to be exchanged with
other Government Entities for the purpose of enabling any Government Entity to
provide services to the data subject without the need for the data subject to
provide the same information again.

2. Transparency  Data subjects should be informed - at the point of data collection - when and by
whom their data is being collected, why it is needed, and how it will be used.

3. Purpose  Data should only be used for limited and explicitly stated purposes and not for
any other purposes without first gaining informed consent from the data subject.
4. Proportionality  When data is requested and stored, the type of data collected should be the
minimum required to carry out the stated purpose, individual users of the data
should only be given the minimum access to that data that they need, and the
data should not be kept for longer than is necessary for that purpose.
5. Personal access and control  Data subjects should be enabled to:
- Access and take copies of data that is held about them
- Correct inaccuracies in data that is held about them
- Request removal of data that is held about them, but is no longer relevant
or applicable to the business of the Entity

6. Security  Collected data should be protected by robust and tested security safeguards
(technical and organizational) against such risks as loss and unauthorized access,
destruction, use, modification or disclosure.
 To help achieve this, Government Entities should

- Apply the latest version of the UAE’s information security standard


- Ensure compliance with the Payment Card Industry (PCI) Security Standards
for information systems that store or process credit card data
- Ensure that cloud suppliers to the Government Entity meet ISO/IEC 27017
Cloud Security Standards and ISO/IEC 27018 Handling of Personally
Identifiable Information Standards.
- Notify the Federal Data Management Office in the event of any perceived
conflict between other provisions of this Standard and the standards listed
in the three bullets above.

7. Sectoral compliance  Each sector has its own laws and regulations, some of which are relevant to
the basis on which data can be shared with other entities or with the public. Examples of
these laws include the United Arab Emirates Penal Code, the Copyright Act, and the
Telecommunications Act.
 Entities should ensure they comply with both relevant sectoral regulations and
this Standard, and should notify the Federal Data Management Office in the
event of any perceived conflict.
8. Documentation  Entities should document who is permitted to access each data set, either in the
form of the [DE5] Open Data License (for all Open Data) or through a
documented set of [DE7] Shared Data Access Permissions.
 Entities should produce and maintain privacy metadata in relation to these access
permissions, as part of their broader work on [DE2] Metadata, and store this in
their Data Inventory.

9. Awareness  Entities should develop an awareness programme for their data privacy policy,
which shall be disseminated to all staff within the Entity who manage data (both
from business and technical areas) in order to remind them of the Entity's
obligations and their personal responsibilities concerning data privacy.

10. Accountability  Entities should establish and publicise effective complaints and redress
mechanisms for data subjects who believe the Entity is failing to manage their data in
accordance with the above principles.

3. Document an initial set of Shared Data Access Permissions
Develop a documented set of initial Shared Data Access Permissions. Normally, this will simply
codify the existing data sharing practices that are in place for the dataset - perhaps modified
following the privacy conformance assessment at Step 2. These Shared Data Access Permissions
should cover:

 Who may have access to the shared data. These permissions may be given to either:
- Named individuals
- One or more classes of individuals, such as government employees:
 In a specific professional function (such as finance, HR, operations, IT)
 In a specific grade
 In specific positions (such as Head of Finance)
 In specific Entities, or departments within Entities
 With specific levels of security clearance
- A combination of the two.

 What purpose this access is for. This documentation is particularly important to ensure
conformance with the ‘purpose’ and ‘proportionality’ principles of [DE6] Data protection
and privacy and to enable effective auditing.
 The level of access that they may have. These permissions (which may be different for
different data users) should specify whether access to the dataset is permitted as:
- Query-only access
- Read-only access
- Read-write full access.

Key elements of this documentation should then be codified in the metadata for the dataset – see
Guidance Note 5.4: Adding metadata and schema – as the dataset moves to the next stage of the
Data Conformance process.
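As a sketch, a documented permission set recorded alongside dataset metadata might look like the record below. The field names, dataset identifier and entity names are illustrative assumptions; the authoritative shape is defined by the [DE7] specification.

```python
# Hypothetical Shared Data Access Permissions record for one dataset.
# All names and identifiers are invented for illustration.
permissions = {
    "dataset": "school-results-2018",          # assumed dataset identifier
    "classification": "Confidential",
    "grants": [
        {"who": {"entity": "Ministry of Education", "function": "Statistics"},
         "purpose": "Annual performance reporting",
         "level": "read-only"},
        {"who": {"position": "Head of Finance"},
         "purpose": "Budget allocation checks",
         "level": "query-only"},
    ],
}

# The three access levels permitted by this guidance.
VALID_LEVELS = {"query-only", "read-only", "read-write"}
assert all(g["level"] in VALID_LEVELS for g in permissions["grants"])
```

Each grant records who, for what purpose, and at what level, which is exactly the documentation this Guidance Note requires and what an audit would later check against.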
Government Entities should embed the following Access Permission Principles in their data
management practices.

Access Permission Principle Description
1. Entities should facilitate cross-government sharing of their data
 Access to shared data shall be approved by the Government Entity responsible for that data.
 However, data ownership does not mean that any Government Entity has a monopoly on its
data, or entitle it to obstruct the reasonable needs of other parties to access that data in
pursuit of their legitimate functions. This means that:
- Whenever a Government Entity wishes to use data that is owned and managed
by another Government Entity, the data-owning Government Entity has a duty
to respond rapidly and positively to that request
- Data-owning Entities have a duty to invest in systems and process which
facilitate rapid, effective and secure data-sharing – in particular in respect of
datasets that have been identified as Primary Registries
- Government Entities may not charge other Government Entities to access their
Shared Data.

2. Use of the  Wherever possible, Government Entities should exchange data via the Federal
Federal Electronic Data Platform, or Emirate-level electronic platforms that are securely
Electronic Data inter-connected with the Federal Electronic Data Platform. For Federal Government
Platform Entities use of the Federal Electronic Data Platform is mandatory, and exceptions to
this require prior written approval from the Federal Data Management Office.

3. Data Sharing  Government Entities that share and exchange non-open data with other Entities
Access should document (and record within their [DE2] Metadata):
Permissions - Who may have access to the shared data. These permissions may be given to
should be either:
documented
 Named individuals
 One or more classes of individuals, such as government employees:
 In a specific professional function (such as finance, HR, operations,
IT)
 In a specific grade
 In specific positions (such as Head of Finance)
 In specific Entities, or departments within Entities
 With specific levels of security clearance
 A combination of the two.

59
- What purpose this access is for. This documentation is important to ensure
compliance with the ‘purpose’ and ‘proportionality’ principles of [DE6] Data
protection and privacy and to enable effective auditing.
- The level of access that they may have. These permissions (which may be
different for different data users) should specify whether access to the dataset
is permitted as:
 Query-only access
 Read-only access
 Read-write full access.
 For most data users, query-only access that returns the minimum necessary
information will be sufficient for their business purposes. Access permissions should
therefore be designed to give access to the smallest amount of information required
for the service outcome or to enable a decision. (For example, sending ‘yes’, ’no’ or
‘not found’ in response to a query of whether a citizen or user is over 18 or has a
valid driving license instead of sending personal information.)
4 Access to  Entities should establish systems to ensure that:
shared data - A shared dataset can only be accessed by identified individuals, who have been
should be appropriately authenticated as being permitted such access under the terms of
secured and the Data Sharing Access Permissions
audited
- All Shared Data access via electronic platforms should store an audit log of what
data was accessed, when and by whom.

Mandatory actions

In applying these principles, each Government Entity should:
 Develop a detailed Data Sharing Plan, setting out how they will implement the Data Exchange
Principles described in this Standard, including any investments in systems and processes that
they will need. They should share this Plan with the Federal Data Management Office.
 Respond in writing within a reasonable time to requests for data sharing from other Entities,
giving either:
- Agreement to the request, and a clear timetable for implementation
- A refusal of the request, accompanied by a clear rationale for the refusal that is rooted in
the principles of this Standard.

 Notify the Federal Data Management Office of all requests from other Entities for sharing and
exchange of Confidential or Sensitive Data. This includes requests both for access to a dataset
which is currently not shared, and requests to add new individuals or classes of individual to the
Shared Data Access Permissions for a dataset that is already being shared across Entities.
Notification should be made as follows:

- Approved Confidential: For Confidential data sharing requests from other Entities which
the data-owning Entity approves, it may notify the Federal Data Management Office after
the event, for example by giving a quarterly update on all data-sharing initiatives it has
approved.
- Refused Confidential and Sensitive: For data sharing requests which the data-owning Entity proposes to refuse, it should inform the Federal Data Management Office at the same time as declining the requesting Entity, documenting its rationale for declining the request. The Office has the power to issue binding decisions to change data access decisions in cases of dispute between Government Entities or with third parties.
- Approved Sensitive: Given the extra sensitivity of such data, where a data-owning Entity believes that sharing Sensitive Data with another Entity is in the public interest and follows the principles of both this Standard and [DE6] Data protection and privacy, it should consult the Federal Data Management Office before giving approval.

Recommended actions

When implementing Access Permission Principle 4 (“Access to shared data should be secured and audited”), Government Entities are recommended to make this audit functionality openly available for use by individual data subjects. This means:
 Configuring electronic platforms and supporting business processes so that individual data subjects (citizens, residents and businesses) can see an audit trail of who accessed their data, and for which documented purpose (excluding security service or law enforcement access)
 Providing mechanisms by which data subjects can raise concerns or escalate if they believe access has been misused.

4. Establish longer-term processes to modify Shared Data Access Permissions
In addition, the Data Custodian should work with the Data Management Officer to establish longer-term processes to review and respond to future requests for data access – for example, from other business units within the Entity, or from other Government Entities who wish to access the data for new business purposes that were not envisaged within the initial Shared Data Access Permissions.

The service standard set out in the [DE7] Shared Data Access Permissions specification is that Government Entities should respond to such requests within a reasonable period, giving either:

 Agreement to the data sharing request, and a clear timetable for implementation
 A refusal of the request, accompanied by a clear rationale for the refusal that is rooted in
the principles of the [DE7] Shared Data Access Permissions specification.
In certain cases, the Specification requires this response to be agreed with the Federal Data
Management Office.
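The two permitted responses above (agreement with a timetable, or refusal with a rationale) lend themselves to a simple tracking record. The following is a hedged sketch of how an Entity might record such requests internally; the class and field names are illustrative assumptions, not part of the [DE7] specification.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative record for tracking responses to data sharing requests.
# Field names are assumptions for this example only.
@dataclass
class DataSharingRequest:
    requesting_entity: str
    dataset: str
    purpose: str
    decision: Optional[str] = None            # "agreed" or "refused"
    implementation_timetable: Optional[str] = None
    refusal_rationale: Optional[str] = None

    def respond(self, decision, timetable=None, rationale=None):
        """Enforce the service standard: agreements need a clear timetable,
        refusals need a documented rationale rooted in the Standard."""
        if decision == "agreed" and not timetable:
            raise ValueError("agreement must include a clear timetable")
        if decision == "refused" and not rationale:
            raise ValueError("refusal must include a documented rationale")
        self.decision = decision
        self.implementation_timetable = timetable
        self.refusal_rationale = rationale
```

Capturing requests in a structured form like this also makes it straightforward to produce the notifications and quarterly updates to the Federal Data Management Office described under the mandatory actions above.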

In the early stages of sharing a dataset with other Entities, such responses will be managed on a case-by-case basis. As the Entity develops more experience of assessing the privacy implications of data sharing, it will increasingly want to codify that experience into a set of rules that can be simplified and automated, to speed up access permission management in a risk-based way.
5.4 Adding metadata and schema

Purpose: This Guidance Note outlines the recommended process for ensuring that a dataset has all required metadata, in accordance with UAE Smart Data Standards [DE2] Metadata and [DE3] Schema.

When to use: Before a dataset is published as open data or exchanged with other Entities. The data should be appropriately described so that it is discoverable and users understand its reliability.

Responsibility: The Data Custodian is accountable for ensuring the dataset contains all required metadata. The Data Specialist is responsible for providing a data Schema, if applicable.
Process
For each dataset in the current prioritized batch being catalogued, the responsible Data Custodians and Data Specialists need to ensure the Metadata, Schema and Data Quality requirements are met. The following process is recommended.

[Process diagram, summarized:]

Relationship with other parts of Smart Data Toolkit: use Guidance Notes 3 and 4 to prepare a prioritised inventory of the Entity’s datasets; use Guidance Note 5.5 to develop a Data Quality Improvement Plan for the data, then add any further quality-related metadata to the Data Inventory.

Director of Data / Data Management Officer: oversee and manage the data compliance process of the Data Custodians and Data Specialists within each department, ensuring each dataset is fully described and its quality improved; validate and add to the expanded Data Inventory.

Data Custodians and Specialists:
1. Ensure all Mandatory Metadata is added
2. Add existing / easy-to-fill Recommended Metadata
3. Add Recommended Metadata and a Data Schema for Primary Registries and high priority datasets
1. Ensure all Mandatory Metadata is complete for the dataset – this should include the title,
description, subject, format, size, publisher, custodian, classification, access permissions, license,
coverage (temporal and geospatial) as well as the data files and last updated timestamp.

Metadata fields, their definitions and requirement levels:

Discoverability
- Title (Mandatory): Brief descriptive name for the dataset. Should communicate subject and scope.
- Description (Mandatory): A description of the dataset. This could provide more detail about what the data contains and what it is about, how and why it was collected, and any known errors or limitations. Ideally the description covers all the relevant context that would help users decide if this data is fit for their purpose.
- Subject (Mandatory): The top-level theme or category for the data, for example: health, transport, business, education. This should use a pre-defined taxonomy or vocabulary that is common across the UAE. It could have one level or include sub-categorizations.
- Tags (Recommended): Keywords related to this dataset, such as ‘schools’, ‘location’, ‘class sizes’, ‘parking’.

Technical information
- Data files (Mandatory): Links to or uploads of the data relevant to this dataset. Might be in multiple formats (for example as CSV and Excel). If providing an API, ensure the API endpoint and API documentation is linked to or uploaded.
- Format (Mandatory): Describes the technical format in which the data is currently held (e.g. CSV, GeoJSON) – see [DE1] Formats for more guidance. May be auto-filled if publishing in a catalogue, depending on ingestion method.
- Size of the dataset (Mandatory): Size of the dataset files (in MB, kB, etc.). If using a platform to publish the data, this can be configured to display automatically.
- Schema (Recommended): The schema defines the parameters of each attribute in the data. This should describe the attributes, clarify whether each is required, the type (string, number, date), the vocabularies used (if any) and so on. More detail on creating a schema can be found in [DE3] Data Schema.
- Last Updated (Mandatory): Timestamp of when this dataset was last updated. If using a platform to publish the data, this can be configured to display automatically.
- Unique identifier (URI) (Recommended): Each dataset published or exchanged through an electronic platform or on the web should have a unique identifier. Ideally this would be a public identifier (such as a URI).

Source
- Publisher (Mandatory): The name of the Entity that owns the dataset. This should be in the format “Entity, Business Unit”, for example “TRA, Wireless Networks & Service Section”.
- Custodian (Mandatory): Name of the Data Custodian responsible for this dataset.
- Contact information (Recommended): The email address or web form that should be used to contact the Entity for queries, feedback or requests concerning this dataset.
- Source system (Recommended): Name of the source system (upstream database) this data comes from, if applicable.

Applicability
- Classification (Mandatory): The Data Classification of the dataset, as defined in the Classification Standard. Should be one of: Open, Confidential, Sensitive or Secret.
- Temporal Coverage start date (Mandatory): Indicates the earliest date that the data in this dataset relates to. Should use ISO 8601 date format.
- Temporal Coverage end date (Mandatory): Indicates the latest date that the data in this dataset relates to. Should use ISO 8601 date format.
- Geographic coverage (Mandatory): Region covered by this data – for instance, the name of a city, district, council or country.
- Language (Mandatory): Language used in the dataset. Should use ISO 639 codes.

Access
- License (Mandatory): Link to or copy of the license terms under which the data may be used. By default, this should use [DE4] Open data licensing for Open Data.
- Access Permissions (Mandatory): Documentation of who has access to this dataset and the level of access they have, as described in the [DE7] Shared Data Access Permissions standard.
- Personal data? (Recommended): Does this data contain any personal data? Yes/No. Personal data means any information relating to an identified or identifiable natural person; an identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier, or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that person.
- Sensitive personal data? (Recommended): Does this data contain Sensitive personal data? Yes/No. Sensitive personal data are personal data that directly or indirectly reveal an individual’s family, racial or ethnic origin, communal origin, political opinions, affiliations, religious or philosophical beliefs, union membership, criminal record, health, sexual life, genetic data or biometric data.
- Intellectual Property (Recommended): Description of any IP contained in the data and conditions/rights of use and distribution.

Reliability
- Provenance (Recommended): Details of how the data was collected, processed, redacted or amended. Data provenance documents the sources, inputs, organizations, systems and processes that have formed and influenced the data, in effect providing a historical record of the data and its origins. This allows data-dependency analysis, awareness of limitations and coverage, error detection and auditing. It helps other users (including the Entity) understand the limitations and level of trust they can place in the data. The Entity should aim to standardize its provenance descriptions over time and provide them in a machine-readable form – for example by using the World Wide Web Consortium standard for data provenance. Examples of what to include in the methodology: where the data came from (survey, third party, etc.); sample size (if a survey); data collection method (face-to-face interviews, online, requests from authorities); exclusions (what data was not included and why); statistical aggregation methods used (small number suppression, averaging, etc.). The Office for National Statistics in the UK regularly publishes provenance and methodology data which can be reviewed as a real example.
- Publishing Frequency (Recommended): The rate at which the data in the dataset will be updated. Responses should correspond to a value contained in the Dublin Core Collection Description Frequency Vocabulary (which describes frequency periods from “triennial” through to “continuous”). Updates are expected to be additional data files following the same schema, but with new temporal coverage (e.g. the latest month). To ensure good practice, the Data Custodian should: (1) decide on a publication schedule appropriate to the dataset; (2) create a publication calendar for that dataset; (3) establish a specific role or individual with responsibility for publishing the dataset on that date. This allows users to build reliable processes, tools and services using the data.
- Known issues (Recommended): Description of any known errors or limitations of the data – for example, if there was unreliable data collection, or if particular fields are unvalidated and rely on the data subject self-reporting.
- Data completeness (Recommended): Description of any known gaps in coverage of the data. Are there missing geographic areas or time periods for which there is no data?
2. Add any existing or easy-to-fill Recommended Metadata fields from among: tags, schema, unique ID, contact information, source system, provenance, publishing frequency, known issues and data completeness, as well as details on whether the data contains personal or sensitive personal data or intellectual property, and the associated terms of use.
3. Add all Recommended Metadata and a data Schema, using the relevant standards, for Primary Registries and for high priority, structured and regularly updated datasets (as defined in Guidance Note 4: Prioritization Criteria and Process). These might be datasets needed for cross-entity service delivery projects, or which have been frequently requested by users and deliver on strategic objectives.

Establishing a schema allows for automatic validation of datasets, as well as making it significantly easier for third parties to build tooling around data and re-use it.

SQL databases will already have a schema in place, although Entities may want to ensure these
are fit for purpose by modelling the data to be stored and deciding on the relationships,
vocabularies, validation and range(s) to be applied.

Structured non-tabular (e.g. JSON) data should provide a schema in JSON Schema format
according to the specification here: http://json-schema.org/.
Tabular (e.g. CSV) data should be expressed as a JSON Table Schema according to the open
specification here: https://frictionlessdata.io/specs/table-schema/.
[Figure not reproduced: a human-readable version of an example schema, followed by the machine-readable JSON version of the same schema. Source: http://csvlint.io/schemas/530b16c163737676e9260000]
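As an illustration of the Table Schema approach described above, the sketch below defines a minimal schema and validates rows against it. The schema contents (a hypothetical schools dataset with invented field names) are assumptions for this example, not the schema referenced in the original figure; only the descriptor structure (`fields`, `type`, `constraints.required`) follows the Table Schema specification.

```python
import json

# An illustrative minimal Table Schema (per the frictionlessdata Table Schema
# specification). The dataset and field names are invented for this example.
TABLE_SCHEMA = json.loads("""
{
  "fields": [
    {"name": "school_id", "type": "string", "constraints": {"required": true}},
    {"name": "opened",    "type": "date"},
    {"name": "capacity",  "type": "integer"}
  ]
}
""")

def check_row(row, schema=TABLE_SCHEMA):
    """Tiny validation sketch: required fields must be present, and values
    declared as integers must parse as integers. A real validator would
    cover all Table Schema types and constraints."""
    for f in schema["fields"]:
        value = row.get(f["name"])
        required = f.get("constraints", {}).get("required", False)
        if required and not value:
            return False
        if value and f["type"] == "integer":
            try:
                int(value)
            except ValueError:
                return False
    return True
```

In practice, Entities would more likely use an existing Table Schema validation library than write their own, but the principle is the same: once a schema exists, every new data file can be checked automatically before publication.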

Having completed this process, proceed to assess whether the dataset meets the requirements of the Data Quality Standard, following the process described below in Guidance Note 5.5: Managing data quality. If Guidance Notes 5.1, 5.2, 5.3 and 5.4 have been followed properly, then all the mandatory quality requirements will already be met; but there may be additional steps you take in improving data quality which will generate additional metadata requirements.

5.5 Managing data quality

Purpose: This Guidance Note provides Entities with guidance on the process and steps for improving and managing data quality over time in a manner that is structured, prioritised and appropriate to the intended and potential use of the Entity’s data assets. It covers both:
- initial steps to meet minimum quality requirements ahead of initial publication of open data or initial exchange of shared data
- longer-term actions to drive forward data quality

When to use: At the start of each Entity’s Smart Data program.

Responsibility: Data Custodians at the level of individual datasets; the Data Management Officer for the Entity as a whole.
The Entity-level context for data quality
Managing data quality should be a strategic priority for the Entity as a whole. Having discoverable,
reliable, trusted and well managed data enables the Entity to be more efficient, effective and
accountable. It allows for the automation of processes as well as enabling better service delivery and
decision making. The [DQ3] Data Quality Improvement Plan specification requires all Government
Entities to develop a Data Quality Plan. Guidance on how to do this at an Entity-wide level is given
in:
 Guidance Note 1: Establishing data governance roles and processes, which gives advice on specific roles with key responsibilities for data quality, and suggests there need to be at minimum:
- A full-time, dedicated Data Management Officer (or equivalent) who manages and tracks the quality of the Entity’s data day-to-day
- Data Custodians and Data Specialists who own and are responsible for the data quality
of the data assets they manage, undertaking data quality assessments and data
cleansing
- A Director of Data to set the strategic direction and requirements for data quality
management across the Entity
 Guidance Note 2: Building a smart data roadmap, which gives advice on how to embed data quality within a broader roadmap for managing and improving data.

Managing the quality of an individual dataset
The [DQ1] Data Quality Principles make clear that data quality should be ‘appropriate for purpose’.
This means that the requirements for quality in respect of a specific dataset depend on the current
and potential use of that dataset – not all data needs to be at the highest quality levels. Therefore, it
is recommended that before assessing and planning data quality improvements, the Entity has a
good understanding of its data assets and their function and importance for the Entity as well as
other potential users. Use Guidance Note 3: Developing a Data Inventory and Guidance Note 4:
Prioritization criteria and process to do this.

The Data Quality Principles are listed in full below.
UAE Data Quality Principles

Ownership and authority
 The data is managed by an accountable Data Custodian, who is responsible for the quality of the data and ensuring it meets user needs.
 The data being published or exchanged is not a copy, but is made accessible at source. Users are accessing and re-using the original data, thus reducing duplication and errors.
 Entities should identify duplicate versions of data and designate a specific owner and authoritative source of that data. Further, they should ensure other functions re-use this authoritative and maintained source directly – for example, via an API, or by downloading a copy from where the data lives as needed and not redistributing their copy.

Accessibility
 The data is easy to find and use, because it:
- Has comprehensive metadata for discoverability
- Uses appropriate open machine-readable formats (reducing the requirement to buy specific proprietary software and ensuring it is interoperable with other data), and
- Is made available for bulk download or via an API, either on the web or through a platform, with reliable, lasting, permanent access that is supported over time.

Accuracy
 The data is sufficiently accurate for its intended use, and any gaps, known limitations, approximations or errors are clearly described so that re-users understand the limitations of the data.
 Users both inside and outside the Entity should have a way to communicate their requirements for greater accuracy, and have those acted on by the responsible Data Custodian / Data Specialist.

Descriptiveness
 Data has context, so that potential re-users know what is in the data and how reliable it is, and can effectively judge whether it is fit for their purpose.
 This means all datasets should have associated metadata and ideally a schema specifying the ranges and values of each field.
 Re-users should be able to understand how the data was created and processed, its temporal and geographic coverage, granularity and limitations.

Timeliness
 Data is published or made accessible in real time or soon after the data has been generated. The data being published for re-use as open data or exchanged with other Entities should be the same data as that being used for its intended purpose within the data-generating Entity.
 If the data is regularly updated (such as a monthly report), the update schedule should be clear in the metadata and should be followed closely, to ensure re-users can rely on and trust this data for operational needs and decision-making purposes.

Completeness
 The data should make sense as a complete dataset. It should be usable without requiring other data (other than Primary Registries data) to make sense or use of it. This means data should be published or exchanged as datasets which are comprehensive, and relevant missing records should be flagged.

Validation
 The data should be valid, and effort made to ensure it is accurate and reliable over time.
 For core and frequently updated data, this means:
- Using a schema
- Having a clear data model with unique identifiers for the main objects in the data (for example, National ID for citizens)
- Regularly cleaning and testing the data to remove errors or duplicates.
Not all of these Principles will need to be applied in full to every dataset in order for data to be
appropriate for purpose. Data quality may be appropriate for current purposes even if one or more of
the principles is overlooked. (For example, data may be collected with a lower accuracy level to provide
data in a timely manner if time is a priority.) This means that these Principles should be balanced
against the importance and intended use of the relevant data.

However, some quality characteristics are essential in order to enable effective data publication and
exchange. These characteristics have been built in as mandatory elements of the Data Exchange
Standards, as summarised in the table below.

Core quality requirements, and the detailed standard which sets out mandatory requirements in each area:
- Ensuring data is published or exchanged in appropriate formats: [DE1] Data Formats
- Recording and publishing mandatory metadata – the title, description, subject, format, size, publisher, custodian, classification, access permissions, license, coverage (temporal and geospatial) and last updated timestamp: [DE2] Metadata
- Publishing and validating against a data schema for high value, structured and regularly updated datasets, as well as any Primary Registries the Entity holds: [DE3] Schema

Once roles are in place and there is a prioritized inventory of the Entity’s data assets, the assigned
Data Custodians should first assess the state of data quality, then assess the quality level required by
data users and then make a plan to close the gap between the two over time, as illustrated below.

[Process diagram, summarized:]

Relationship with other parts of Smart Data Toolkit: use Guidance Note 1 to ensure the Entity has roles and responsibilities for data quality in place; use Guidance Notes 3 and 4 to prepare a prioritised inventory of the Entity’s datasets.

Director of Data / Data Management Officer: oversee and support the data quality audit, user needs gathering and quality improvement plan. Ensure plans are ambitious, well designed and achievable, as well as tracking and reviewing progress.

Data Custodians and Specialists:
1. Perform a data audit
2. Gather input from users
3. Define required quality level
4. Create Quality Improvement Plan
5. Report and track against targets
6. General data quality maintenance of core data functions
1. Assess current data quality by performing a data audit

Use the Data Quality Maturity Matrix provided in Appendix B to assess the dataset against the UAE Data Quality Principles, looking in turn at:
- Does the data have a clear owner? Is this the authoritative source of data?
- How accessible is the data?
- How accurate is the data?
- How well described is the data?
- Is the data up to date and has a publishing schedule?
- Is the data complete? Can it be used and understood by itself?
- Has the data been validated against a schema or checked for duplication, errors, and
inaccuracies?

The Data Quality Maturity Matrix defines, for each of the seven Data Quality Principles, five levels of
maturity:
 Level 1: Initial – unmanaged data, no owner, no open format, no metadata, etc.
 Level 2: Partially conformant – the dataset has an identified owner and is making progress
towards conformance with the Data Quality Standard
 Level 3: Conformant – the dataset meets all core requirements of data quality and UAE Data
Standards
 Level 4: Improving – the dataset meets all core requirements and also is implementing
additional good practices
 Level 5: Optimizing – data quality fully meets the needs of current and potential future users,
with clear systems for driving continuous improvement.

The matrix rows cover each of the seven Data Quality Principles: Ownership and authority, Accessibility, Accuracy, Descriptiveness, Timeliness, Completeness and Validation.
The detailed tool for use when completing this matrix is at Appendix B of the UAE Smart Data
Implementation Guide.
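The audit's output is a level (1 to 5) for each of the seven principles. The sketch below shows a hypothetical way of recording those scores and surfacing the weakest principles as improvement targets; the function and dictionary shape are assumptions for illustration, not the Appendix B tool itself.

```python
# Illustrative sketch of recording Data Quality Maturity Matrix results.
# Principle names are from the matrix above; the record format is an assumption.
PRINCIPLES = [
    "Ownership and authority", "Accessibility", "Accuracy",
    "Descriptiveness", "Timeliness", "Completeness", "Validation",
]

def maturity_summary(scores):
    """scores maps each principle to a maturity level 1..5.
    Returns the lowest level and the principles sitting at it."""
    for p in PRINCIPLES:
        if scores.get(p) not in (1, 2, 3, 4, 5):
            raise ValueError(f"missing or invalid level for {p!r}")
    lowest = min(scores[p] for p in PRINCIPLES)
    return {
        "lowest_level": lowest,
        "improvement_targets": [p for p in PRINCIPLES if scores[p] == lowest],
    }
```

Summarizing the audit this way feeds directly into steps 3 and 4 below: the principles at the lowest level are natural candidates for targets in the Data Quality Improvement Plan.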

2. Collect input from existing and potential users


Run consultations and workshops with existing internal and external users of the data. Ask whether the data is fit for their purpose, whether they have any problems or concerns with the data, and whether they have any requests around data quality. Record the results and group them by the most common concerns and requests.

It is also important to assess the needs of potential users who may not yet have access to the data or
would use the data if it was of a higher quality or reliability. Therefore, it is recommended to invite
input and feedback from potential future users. The Entity should publish its Data Inventory on its
electronic portal, including datasets that have not yet been prepared for publication and exchange,
in order to give potential users visibility of its data assets. It should also provide online channels for
data users to give feedback on their priorities for expanding the number of datasets that are
available on the portal and improving the quality of existing open data.

3. Define and determine required data quality per dataset or data source
Using the feedback from users, define what good data quality looks like for the dataset. Develop a
documented statement of Data Quality Requirements – including what the appropriate target level
should be for each element of the Data Quality Maturity Matrix provided in Appendix B. Record any
specific measures or indicators which are key to ensuring the data is reliable and fit for purpose for
the majority of users.

4. Create a plan to close the gap between existing and required quality level
Create a plan to reach the defined quality level. This may require new processes, change management, upskilling and training, better tools or other steps. Ensure your plan is ambitious, but realistic. Milestones should be specific, measurable and time-bound, with clarity on who is responsible for each milestone being achieved and how this will be measured and tracked.
Data Specialists and the Entity’s IT and security teams should create automated tracking for quantitative measures such as: % of metadata complete, use of open machine-readable formats, % of publishing-frequency dates met, whether datasets have a schema, results of data validation against the schema, and results of scripts that check for duplicate records or non-conforming data entries.
Data Custodians should also track qualitative measures such as feedback from users and impact of
use of data.
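Two of the quantitative measures mentioned above (metadata completeness and duplicate records) can be computed with very little code. This is an illustrative sketch with assumed field names, not a prescribed tooling choice:

```python
# Illustrative quantitative quality metrics; field/key names are assumptions.
def metadata_completeness(metadata, expected_fields):
    """Percentage of expected metadata fields that are filled in."""
    filled = sum(1 for f in expected_fields if metadata.get(f))
    return 100.0 * filled / len(expected_fields)

def duplicate_record_count(rows, key):
    """Count rows whose identifier value has already been seen."""
    seen, duplicates = set(), 0
    for row in rows:
        k = row[key]
        if k in seen:
            duplicates += 1
        seen.add(k)
    return duplicates
```

Scheduled runs of checks like these, with results written to a dashboard, give the Data Management Officer the trend data needed to report progress against the Improvement Plan's targets.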

 Embedding the [DQ1] Data Quality Principles across all data within an Entity will require a
phased, prioritised and Entity-wide plan of action.
 Development and delivery of such a plan is a mandatory requirement for Government Entities.
 An effective Data Quality Improvement Plan will be prioritized, baselined, user-focused, SMART,
managed and reported as described in the table below.

An effective Data Quality Improvement Plan is:
Prioritised  The Plan should focus first on driving up the quality of data needed for Primary
Registries, the Entity’s own core business functions, and other high priority datasets.
 Within these priority areas, it should focus first on fixing known quality issues – and in
particular focused on ensuring that the Core Quality Standards are met.
Baselined  The Entity should ensure that its plans are informed by Data Quality Audits that give a
clear assessment of current performance against the [DQ1] Data Quality Principles.
 These should include quantitative and qualitative measures of data quality, including
both use of the [DQ2] Data Quality Maturity Matrix and additional measures that are
relevant to the specific dataset.
User-focused  For all priority datasets, the Entity should develop clear statements of Data Quality
Requirements. These should be evidence-based and reflect the documented quality
needs of users.
 In developing these user requirements, the Entity should engage with existing internal
and external data users – but also consider the wider potential re-use of their data
(either as open data or shared data exchanged with other Entities).
 These Data Quality Requirements should specify and define required data quality
measures for the Entity's different types of data sources and business processes, aligned with
the [DQ1] Data Quality Principles.
SMART  For each priority dataset, the Entity should:
- Identify the gaps between the current baseline performance level, as revealed in
the Data Quality Audit, and the data quality requirements expressed by users
- Set quantitative and/or qualitative targets for improvement. Targets should be
SMART (Specific, Measurable, Achievable, Relevant and Time-bound)
 A one-size-fits-all approach to data quality targets across the Entity is not
recommended; rather, targets should be related to the current and potential use of the
specific dataset, to ensure the quality is appropriate for that use.
Managed  The Entity should set out an overall Entity-wide plan for how it will deliver its targets
for quality improvement. This should include:
- Establishing clear accountabilities for data quality, at the Entity-wide level and for
each dataset
- Establishing systems and processes that guarantee data quality as part of the
normal business activity of the Entity
- Building data quality requirements into any contracts and outsourcing of data
management or data generation
- Assessing data quality of third party suppliers (which could include another
government Entity, business partner, customer, service provider or other
stakeholder), and performing spot checks (ideally against Service Level
Agreements with the data supplier).
Reported  Entities should establish systems to track and report on data quality status, with the
Entity’s Management Board receiving regular progress reports (for example, on a
quarterly basis) showing progress across the Entity as a whole and by individual
business units.
 Ideally, elements of this reporting will be automated and managed in real-time, for
example through:
- Automated reports on data quality indicators (such as completeness of metadata
fields, use of schemas, validation success rate, use of open machine-readable
formats, percentage of update-frequency targets met, etc.)
- Regular analysis of structured data against its data schema.
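Regular analysis of structured data against its schema can be automated with a small validation script. The sketch below is a minimal illustration using two hypothetical field rules (the Emirates ID and mobile-number patterns are assumptions for the example) standing in for a full published schema; a real deployment would validate each record against the dataset's actual schema.

```python
import re

# Hypothetical field rules standing in for a formal data schema.
FIELD_RULES = {
    "emirates_id": re.compile(r"^784-\d{4}-\d{7}-\d$"),  # assumed ID layout
    "mobile": re.compile(r"^\+971\d{9}$"),               # assumed E.164 form
}

def validate_record(record):
    """Return a list of (field, value) pairs that fail their rule."""
    errors = []
    for field, rule in FIELD_RULES.items():
        value = record.get(field, "")
        if not rule.fullmatch(value):
            errors.append((field, value))
    return errors

def validation_success_rate(records):
    """Fraction of records with no violations: a reportable indicator."""
    if not records:
        return 1.0
    return sum(1 for r in records if not validate_record(r)) / len(records)
```

The success rate returned by `validation_success_rate` is the kind of figure that can feed the quarterly Management Board reporting described above.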

5. Report and track data quality against targets


Data quality should be tracked and assessed against the required level, and reviewed regularly (at
least annually). User requirements may change over time, and the Entity should respond to this.

6. General data quality maintenance


Beyond meeting specific user-needs-based data quality requirements, Entities should ensure the
overall health of their data. This means regularly reviewing and improving: how data is modelled;
the use of schemas and data entry validation processes; the use of unique identifiers, links and
reference data (versus introducing non-authoritative copies); and instances of record duplication,
errors or non-conformant data entries (for example, misspellings, or different date or telephone
formats being used for the same data).
Entities should decide on their own processes for managing this. An example end-to-end data-
cleansing process is detailed below. Typically, this should be treated as an iterative process,
repeated to improve and maintain data quality as business and technical requirements change.

Data-cleansing steps:

1. Extract data from operational data sources for profiling
Data profiling tools perform complex analysis on data, and running this analysis directly against
live data sources is not recommended. Data extraction may be performed using separate ETL tools, or
may be a capability of the data profiling tools themselves.

2. Perform data profiling analysis
This shall occur as part of a regular data audit process, enabling data quality issues to be
identified. The output of data profiling shall be used to build the technical knowledge base for
data cleansing.

3. Build cleansing knowledge base for each data profile
The cleansing knowledge base includes mappings and correction rules that may be automatically
applied. For example, the range of mobile phone formats identified by data profiling may include
(nnn) nnn nnnn, +nnn nnnnnnn and nnn nnn-nnnn; the knowledge base should include the rules for
converting these formats into a single format.

A knowledge base may include the ability to query external data services, such as telephone number
validation, reference data management systems, and data-enriching systems, such as an Emirates ID
service providing additional citizen profile data.

Physically, the knowledge base may be one or more systems, and may include master data management
tools, reference data management tools, and vendor-specific data cleansing solutions.

4. Automated cleansing using knowledge base
Automated cleansing may be performed in batch against live systems, typically out of hours, and
subject to sufficient testing. The batch size chosen should be the smallest batch of data that can
reasonably be completed within the time window allowed.

The choice of records that form part of each cleansed batch shall be defined, for example, by order
of insertion, by age (newest or oldest first), or by most active records first.

Automated cleansing can also be applied to data extracts; however, the plan to refresh the live
data with the cleansed data should be considered carefully to avoid conflicts where the live data
has since changed.

5. Interactive data cleansing
Automatic matching will reject data that cannot be cleansed. The Data Custodian shall use this
rejected data to perform manual cleansing. Cleansing decisions should be recorded and fed back into
the knowledge base to improve the quality of automated matching. This iterative cycle will initially
occur often during the development of the knowledge base.

6. Automated cleansing services
Automated cleansing services can then be delivered as interactive services, allowing information
systems to have data validated and cleansed at the point of data entry. For example, a CRM system
capturing a citizen's name and address may make a service request to the automated cleansing service
to enrich the address, validate the telephone number, and match the individual citizen with their
other records stored in datasets elsewhere within the Entity.
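The mobile-phone example in step 3 can be sketched as one correction rule in the cleansing knowledge base. The sketch below is illustrative only: the assumed number conventions (a 9-digit national number beginning with 5 after the +971 country code) are not part of the standard. Values that cannot be cleansed automatically are returned as `None`, so they can be routed to interactive cleansing (step 5).

```python
import re

def normalise_mobile(raw, country_code="971"):
    """Normalise the mobile formats found by profiling, e.g. (050) 123 4567,
    +971 50 123 4567, 050-123-4567, into a single +971XXXXXXXXX form.
    Returns None when the value cannot be cleansed automatically."""
    digits = re.sub(r"\D", "", raw)  # strip punctuation, spaces, '+'
    if digits.startswith("00" + country_code):
        digits = digits[2:]          # 00971... -> 971...
    if digits.startswith(country_code):
        national = digits[len(country_code):]
    elif digits.startswith("0"):
        national = digits[1:]        # local form, e.g. 050...
    else:
        national = digits
    # Assumed rule: a valid national mobile number is 9 digits starting with 5
    if len(national) == 9 and national.startswith("5"):
        return "+" + country_code + national
    return None                      # reject for manual (interactive) cleansing
```

Decisions taken during interactive cleansing of the rejected values would then be recorded as further rules of this kind, as step 5 describes.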

5.6 Validation and publication of data
Purpose This Guidance Note provides Entities with guidance on the process and steps
for validating and then publishing datasets that have been prepared for data
conformance using Guidance Notes 5.1 to 5.5.
When to use Before publication of Open Data or exchange of Shared Data
Responsibility Data Management Officer, reporting to the Director of Data, who has overall
accountability for conformance with the UAE Data Exchange Standard.

Once the relevant Data Custodian has taken a dataset through the process described in Guidance
Notes 5.1 to 5.5 (that is: classify; format; document the permissions model; add metadata and
develop schema; manage data quality), the Data Management Officer and Director of Data will need
to validate and approve the dataset either for publication or for sharing and exchange with other
government entities over appropriate electronic networks.

First, the Data Custodian / Specialist should provide all of the relevant information (on
classification, format, metadata and quality) to the Data Management Officer as a single
'conformant dataset' package comprising the conformant data sample file, the metadata, and details
of the business processes needed to support quality publication (what needs doing, who is
responsible, and timelines).
This complete dataset should be reviewed by the Data Management Officer and added to an Entity-
wide Data Inventory. The Data Management Officer should check that each dataset:
 Has a classification. In cases where the classification is not Open, a clear rationale for
this is documented and an Open derivative dataset is provided
 Has a data quality assessment report
 Has a sample dataset in the appropriate format
 Contains all Mandatory Metadata as defined in the [DE2] Metadata specification
 Has easy-to-add and appropriate Recommended Metadata
 That the above have been provided by a qualified person familiar with the data
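Parts of this checklist can be automated before the Data Management Officer's review. The sketch below assumes a simple dictionary describing the submitted 'conformant dataset'; all field names, and the sample set of mandatory metadata keys, are hypothetical (the authoritative list is defined in the [DE2] Metadata specification).

```python
# Hypothetical mandatory metadata keys, for illustration only.
MANDATORY_METADATA = {"title", "description", "classification",
                      "update_frequency", "owner"}

def conformance_issues(dataset):
    """Return a list of problems blocking validation of a conformant dataset."""
    issues = []
    classification = dataset.get("classification")
    if not classification:
        issues.append("no classification")
    elif classification != "Open":
        if not dataset.get("closed_rationale"):
            issues.append("non-Open classification lacks a documented rationale")
        if not dataset.get("open_derivative"):
            issues.append("no Open derivative dataset provided")
    if not dataset.get("quality_report"):
        issues.append("missing data quality assessment report")
    if not dataset.get("sample_file"):
        issues.append("missing sample dataset file")
    missing = MANDATORY_METADATA - set(dataset.get("metadata", {}))
    if missing:
        issues.append("missing mandatory metadata: " + ", ".join(sorted(missing)))
    return issues
```

An empty result indicates the package is ready for the Data Management Officer's manual checks, such as confirming that a qualified person familiar with the data prepared it.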
The Director of Data should then satisfy themselves that each dataset is conformant to the Smart
Data Framework standards. In most cases, this decision will be taken within the Entity itself by the
Director of Data.
In some cases, however, the UAE Smart Data Standards require that the approval of the Federal Data
Management Office is given ahead of publication or exchange. In particular, the consent of the
Office will need to be given in advance of publication for:
a) Any Open data for which the Entity believes it should charge users an access fee, despite
the general principle that such data will be published, free of charge, as Open Data
b) Any data that Entities wish to exchange which has been classified as Confidential.

A summary of these approval requirements is given below.

Public data, published via the Entity's open data platform:

 Open Data: approved by the Federal Government Entity, once the Director of Data has
validated compliance with the UAE Data Standards
 Public Data the Entity wishes to charge for: approved by the Federal Data Management
Office, as required by the [DE] Commercialisation and fair trading standard

Shared data, exchanged via the Government Service Bus:

 Restricted Data: approved by the Federal Government Entity, once the Director of Data has
validated compliance with the UAE Data Standards
 Confidential Data: approved by the Federal Data Management Office, as required by the [DE]
Shared Data Access Permissions standard

Once satisfied that the relevant approvals are in place, the Director of Data should communicate to
the relevant teams that the compliant datasets can be published as open data or exchanged with
external Entities. As illustrated above, publication to the smart data electronic platform will be
through one of two routes depending on the type of data being published.
For Open data, compliant datasets should be published through the data-owning Entity's online
portal. By default, this should be as Open Data, except in exceptional circumstances where an
access fee is charged following the approvals process illustrated above and in conformance with the
[DE5] Data commercialization and fair trading specification. The portal should:
 Include a full list of datasets on the Entity’s Data Inventory (including as yet unpublished
data, in order to facilitate feedback on future publication priorities from data users)
 For Open Data, the portal should:
- Provide a clear and user-friendly Open Data Licence, which complies with the [DE4]
Open Data Licensing specification giving a clear and unambiguous license to use and
distribute including for commercial purposes. (The UAE Federal Open Data License
at Appendix A is recommended as the ideal way of meeting this requirement.)
- Enable anonymous access to the data, without requiring users to register any
personal details or fill out forms
- Not charge any access fees
- Not discriminate between types of user
- Provide data in open, machine-readable formats that comply with the [DE1] Data
Formats specification, or enable direct API access to the dataset
- Provide bulk download functionality for data and guarantee a level of permanence by
not breaking URIs and by ensuring original URIs redirect if the dataset location
(URL) changes.
- Provide metadata that complies with the [DE2] Metadata specification
- Provide measures of data quality, including use of the [DQ2] Data Quality Maturity
Matrix

- Provide online mechanisms for users to give feedback about data quality and to
express their views on future priorities for expanding the number of datasets that
are available on the portal and improving the quality of existing open data
 Or, in exceptional cases where the Federal Data Management Office have approved the
Entity to charge an access fee for Open Data, the Entity should:
- Ensure that the rationale for charging, and the principles that the Entity applies to
ensure fair and competitive provision in line with the requirements of the [DE5]
Commercialization and fair trading specification, are published clearly on the
website
- Provide access to effective complaints and redress mechanisms, again in
conformance with the [DE5] Commercialization and fair trading specification.

For Shared Data, the Entity should use the Government Service Bus as the key platform for
exchanging its data with other Entities. No charge is required for Entities to integrate with the
Government Service Bus. For Shared Datasets that have been identified as Primary Registries, the
data should be made available:
 Via API
 Under the terms of a Service Level Agreement setting out the expectations that data users
can have in terms of the Entity’s commitment to the quality of the data
 With privacy ‘designed-in’. This means that the API should give access to the smallest
amount of information required for the service outcome or to enable a decision. (For
example, sending ‘yes’, ’no’ or ‘not found’ in response to a query of whether a citizen or user
is over 18 or has a valid driving license instead of sending personal information.)
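The 'yes' / 'no' / 'not found' pattern above can be sketched as follows. The in-memory dictionary stands in for a Primary Registry lookup, and the identifier and field names are hypothetical: the point is that the API answers the decision without disclosing the underlying personal data (here, the date of birth).

```python
from datetime import date

# Hypothetical in-memory stand-in for a Primary Registry lookup.
REGISTRY = {
    "784-1990-1234567-1": {"date_of_birth": date(1990, 5, 17)},
}

def is_over_18(person_id, today=None):
    """Answer an age-verification query without disclosing the birth date.
    Returns 'yes', 'no' or 'not found': the minimum needed for a decision."""
    record = REGISTRY.get(person_id)
    if record is None:
        return "not found"
    today = today or date.today()
    dob = record["date_of_birth"]
    # Standard age calculation: subtract a year if the birthday has not passed
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    return "yes" if age >= 18 else "no"
```

A production service would apply the same minimisation principle behind an authenticated API on the Government Service Bus, returning only the decision value to the consuming Entity.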

Charging data users to access raw Open Data


 Open Data which a Government Entity collects and manages in the course of its normal duties
should be published as Open Data with no access fee
 Where there is demand from data users (whether from other Government Entities, the private
sector or individual citizens) for access to data that the Government Entity does not currently
collect and that would require significant additional action and investment by the Government
Entity, then there may be a case for charging fees to data users in order to help finance this
investment.
 For example, there may be cases when a Government Entity wishes to develop a public-private-
partnership (PPP) funding model, aimed at ensuring up-front investment in the infrastructure
needed to enable the collection and dissemination of richer, smarter, more real-time data that is
relevant to the purposes of the Government Entity, but which otherwise could not be obtained
by the Entity. In such cases, business models might involve:
- Allowing the PPP to charge end users for data to deliver a revenue stream into the PPP on a
time-limited basis before the data is eventually made fully Open
- Providing the PPP with exclusive rights to utilise the data commercially in ways other than
direct charging of end users, again on a time-limited basis before the data is eventually
made fully open.
 Such cases will be exceptional, not routine, and should be approved in advance by the Federal
Data Management Office.

 Approval to charge for access to raw Open Data will only be given when this is clearly in the
public interest, and where it is not feasible for the Government Entity to collect and publish the
data without charging access fees.
 In making this determination, the Federal Data Management Office will take into account that all
Government Entities are expected - as part of their routine operations and investment planning
– to continually improve the methods, quality and timeliness of the data they collect without
seeking to charge data users for this.
 Whenever, in such exceptional cases, a Government Entity does charge fees for access to raw
Open Data it should follow these principles:
Raw Data Commercialization Principles:

1. Public interest: Charged-for Open Data should be accompanied by a clear published
explanation on the Government Entity's electronic portal of why such charging furthers the
goals of UAE Smart Government and is in the public interest.

2. Fair pricing and conditions: Open Data should be available to all users on a fair,
reasonable and non-discriminatory basis.

3. Accountability: The Entity should establish and publicise effective complaints and redress
mechanisms for third parties who believe that it is failing to comply with the above
principles.

Developing and marketing commercial value-added data services


 In general, the Government believes that the private sector is best placed to create commercial
data services, and will not seek to do so itself.
 There may be cases however where a Government Entity may provide such commercial services
in the public interest – for example, in order to demonstrate the commercial opportunity, as
part of its efforts to foster the market for re-use of its data.
 Such cases should be approved in advance by the Federal Data Management Office, following
receipt from the Government Entity of an evidence-based submission showing how the
proposed service complies with the principles set out below.
 Whenever, in such exceptional cases, a Government Entity does develop and market value-
added data services for commercial gain, it should publish and follow these principles.

Value-added Data Commercialization Principles:

1. Public interest: The provision of commercial data services should be accompanied by a clear
published explanation on the Government Entity's electronic portal of why such commercial
services further the goals of UAE Smart Government and are in the public interest.

2. Fair competition: The Government Entity should ensure that it does not have an unfair
advantage over third parties who might also wish to market similar services. In particular,
this means:
- Publication of the underlying Open Data: the Government Entity should publish as Open
Data on its electronic portal the underlying Open Data that it is using to create
value-added services. This publication should be undertaken at the same time as, or
before, launch of the Government Entity's value-added data service.
- No use of Shared Data: for the purposes of developing a value-added data service, the
Government Entity should only use data that has been classified as Open Data.
- No use of public funds: in order that Private Entities may compete on a fair basis in
the provision of commercial services using Open Data, the Government Entity should set
fees for any value-added data services in ways that at least recover the full costs of
providing those services, including a reasonable return on investment, and should
ensure that the provision of these value-added data services is not based upon
anti-competitive support or funding from other Government Entities.

3. Fair pricing and conditions: The Government Entity should make the value-added data
services available to all users on a fair, reasonable and non-discriminatory basis.

4. Accountability: The Entity should establish and publicise effective complaints and redress
mechanisms for third parties who believe that it is failing to comply with the above
principles.

APPENDIX A: UAE FEDERAL OPEN DATA LICENSE
The Federal Open Data License is shown below in two forms:

 A user-friendly, plain language summary for publication on the web pages of Open Data
Portals
 The detailed License terms which support this and which should be available for download
from Open Data Portals and linked from the summary.

Federal open data license: summary

This is a summary of (and not a substitute for) the license that applies to Your use of Information
accessed via [name of Open Data Portal] ("License"). A copy of that license may be accessed
<here>.

1. Overview

We grant you a worldwide, royalty-free, perpetual, non-exclusive license to use and re-use the
Information that is available under this license freely and flexibly, subject to the conditions below.

2. You are free to:

 copy, reproduce and communicate to the public the Information in any format

 adapt or modify the Information

 exploit the Information for both commercial and non-commercial purposes

 permit third parties to use the Information

3. You must comply with the following terms:

! Not use the Information in any way that is unlawful and/or misleading to the general
public

! Include the name or identification of the author and retain any copyright notice featured
in the original material

! Where possible, include a URL or hyperlink to the Licensed Material


These are important conditions of the License and if you fail to comply with them the rights
granted to you under this license may be withdrawn.

4. Exemptions:
The License does not permit the use of:
X any trademarks associated with a Database or with the Open Data Platform

X any images (including logos, graphics or photographs) which appear in a Database

Federal open data license: full text
When you access and use the Information, you accept and agree to be bound by the terms and
conditions of this License in connection with your use of the information provided by the License
issuer.

Article (1) Definitions:

Federal Government Entity Means any Ministry, authority, department, public body,
independent body, public institution, federal government council
or any other governmental or public institution of the federal
government of the United Arab Emirates;

License Means the general license agreement, as updated or amended by
the license grantor; the legal terms under which the original
materials are made available for disclosure;

Original Materials Means all the contents of any database (or any part thereof),
including any data, content, work products or other materials that
have been collected in the database and made available for
disclosure by the license grantor to users under the terms of this
general license; or derived materials;

Modified Materials Means any work in any medium (whether currently produced or
to be created in the future) created by an entity or by any other
recipient that incorporates or uses any original information or
material, either alone or in conjunction with materials from
another source, as an independent product;

Derivative Materials Means any work in any medium (currently known or to be
created in the future) created by an entity or by any other
recipient that incorporates, uses or quotes any original
information or material subject to copyright and similar rights,
where the conversion or other modification of the original
materials is made in such a manner as to require authorization
under copyright and similar rights;

You or the addressee Means the individual or entity using the original material to
develop modified or derivative materials under this general
license;

License grantor Means the person/persons or federal entity/entities granting
rights under this general license;

Copyright and dissemination Means the rights granted by copyright and/or similar or related
rights rights closely connected to copyright, including performance,
broadcasting and sound-recording rights, and rights in databases
or literary collections;

The Use Means copying, reproducing, making copies of or disclosing
material through a medium or process requiring permission under
this License; communicating it to various users; modifying,
adjusting and preparing modified or derivative works; and any
other act that is restricted, or may in the future become
restricted, under copyright, whether in respect of original or
other material;

The User Means any user of information other than you;

User License Means the license you apply to the user of the modified materials
or derivative materials in accordance with the terms and
conditions of this general license;

Participation Means the disclosing of material to the public through any means
or process requiring permission under this license, such as
copying, public disclosure, public performance, distribution,
dissemination, media or import, and making material available to
the public, including by means of access to materials at a place
and time of their choosing;

Exceptions and Limitations Means any exclusion or other restriction on copyright and similar
rights applied to your use of the Original Materials;

Article (2) Scope of license rights:


2.1 Grant of license

2.1.1 Subject to the terms and conditions of this General License:

2.1.2 The license grantor grants you a free and non-exclusive license to download
information over the Internet and technology media

2.1.3 The License grantor authorizes you to exercise the licensed rights in all media and
formats currently produced and known or to be created later, and to make the
technical modifications necessary to do so.

2.1.4 This General License may not be sublicensed and is irrevocable with respect to the
exercise of rights under this License to copy and share original materials, in whole or
in part, and to produce, reproduce and share modified or derivative materials.

2.2 The License grantor waives and/or agrees not to assert any right or authority to prohibit any
entity or individual from making the technical modifications necessary to exercise the licensed
privileges, including the necessary technical modifications and authorized variations described
in this section; such modifications do not of themselves give rise to any modified or derivative
materials.

Article (3) Terms and Conditions of Use

3.1 Terms and conditions of Use:

3.1.1 You must ensure use is not contrary to UAE or international laws.

3.1.3 Reference should be made to the source of the original material

3.2 You may allow users to use the original materials; if you do so, you must comply with the
terms of this license, and you may not impose any additional or different terms or conditions
on the use of that information by any other user.

3.3 This license does not cover the use of:

3.3.1 Personal data within information such as identity documents such as passport number
or national identity;

3.3.2 Information that has not been disseminated or disclosed with the consent of the
license grantor;

3.3.3 Rights of third parties that the license grantor has no authority to disclose;

3.3.4 Any images (including logos, drawings or photographs) that are disclosed within the
original materials;

3.3.5 Information subject to other intellectual property rights, including patents, trademarks
and design rights

Article (4) Terms of the license


Your exercise of the licensed rights is subject to the following conditions:

4.1 Attribution:

4.1.1 The user may share the original materials by considering the following:

4.1.2 Retain any copyright notice contained in the original material;

4.1.3 Include an electronic link or hyperlink to the original material in a reasonable form.

4.2 Sharing of modified and derivatives materials:

4.2.1 This license grants users the right to redistribute, modify, change, and quote from your
materials, whether for commercial or non-commercial purposes, as long as they
associate/attribute your original material to your name.

4.2.2 This license grants users the right to modify, improve, and create new derivative
materials from previously modified or derivative materials, whether for commercial or
non-commercial purposes, as long as they license their new derivative works to users
under the terms of this license.

4.2.3 You must comply with the requirements stated in Section 3 and include them in the
User License if the contents of the entire database or portion of the original material
are shared.

Article (5) Disclaimer of warranties and limitation of liability
5.1 The License grantor is not responsible for any damage or misuse suffered by third parties as a
result of the use of such data and does not guarantee the continuity of the availability of such
data or any part thereof, nor shall it be liable to users of such data for any damage or loss
they may suffer due to reuse.

5.2 It is prohibited to sell or resell any original information that has been used in accordance
with this License for any fee or amount of money, or for any form of compensation or
reimbursement.

Article (6) - Duration and Termination


6.1 This general license shall apply throughout the period of copyright and similar rights licensed
therein. If you do not fully adhere to the terms of this license, your right to use the original
materials under this license shall be revoked and the terms of this license will remain in force
notwithstanding such cancellation.

6.2 The license grantor has the right to disclose the original materials under separate terms or
conditions, or to discontinue the disclosure of the original materials at any time; however,
the terms of this license shall remain in force notwithstanding such cancellation.

6.3 The terms of the License shall remain in force after termination of this General License.

Article (7) - Other terms and conditions


7.1 The license grantor shall not be bound by any additional or dissimilar terms or conditions
unless it has explicitly agreed to them.

7.2 Any arrangements, considerations or agreements relevant to the original materials not
mentioned in this License shall be considered separate and independent from the terms and
conditions of the General License.

7.3 This General License shall not be construed as derogation, restriction, prohibition or imposition
of conditions on any use of the original materials which may be made legally without
permission under this General License.

7.4 No condition or provision of this General License may be waived, nor may non-compliance with
it be excused, unless expressly agreed by the license grantor.

7.5 Nothing in this General License shall constitute or be construed as a restriction or waiver of any
privileges or immunities applicable to the license grantor or to you, including immunity from
legal procedure in any jurisdiction or authority.

7.6 This License is governed by and construed in accordance with the laws of the United Arab
Emirates

End of license

APPENDIX B: DATA QUALITY MATURITY MATRIX – ASSESSMENT TOOL
Quality
Principle 1 = Initial 2 = partially conformant 3 = Conformant 4 = Improving 5 = Optimizing

Ownership and The dataset has no clear A named Data Custodian takes As at Level 2. In addition: As at Level 3. In addition: As at Level 4. In addition:
authority accountable owner within the personal responsibility for the - The Data Custodian has - Feedback mechanisms - There is clear evidence
Entity. Multiple data users quality of the data. engaged with current and have been established to that effective processes
keep and manage duplicate The Data Custodian has potential future users of allow data users to request are in place to enable
versions of the data. undertaken a baseline the data to understand quality improvements user-driven continuous
assessment of current data and document their Data - In the case of a Primary improvement.
quality, documenting known Quality Requirements, and Registry, the dataset is - In the case of a Primary
quality issues. is managing a plan to close now widely used as the Registry, the dataset is
any gaps between current single authoritative source now accompanied by clear
and required quality levels of data. There are no Service Level Agreements
- For a data set used by duplicate versions of the for data users
multiple organizations, data managed elsewhere.
systems and processes
have been established to
ensure that it can be
managed as a Primary
Registry (ie able to provide
the data as a service to all
relevant users).

Accessibility (see note below)

Level 1 (Initial): The data is inaccessible by third parties because it is neither published on the web nor currently shared with other Entities.

Level 2 (Partially conformant): The data is at least one of:
- Published on the web or via an API
- Available to external users in an open machine-readable format.

Level 3 (Conformant): The data is accessible through both:
- Publication on the web or via an API
- Publication in an open machine-readable format.

Level 4 (Improving): As at Level 3. In addition:
- Published data is available for bulk download
- The data uses URIs / URLs to enable others easily to link their data to it.

Level 5 (Optimizing): As at Level 4. In addition, the dataset is linked to other relevant data to provide context.

Note: The maturity levels for accessibility draw on the Five Star deployment scheme for open data developed by Sir Tim Berners-Lee, expanded to cover shared data as well as open data. An open dataset scoring 1 to 5 on the Five Star model would score the same on the accessibility dimension of the UAE Data Quality Maturity Model.

Accuracy

Level 1 (Initial): Accuracy issues in the dataset (errors, gaps, limitations) are either unknown, or known to be very significant.

Level 2 (Partially conformant): There are significant accuracy problems with the data, but these are documented and explained to data users.

Level 3 (Conformant): Known accuracy problems are documented and explained to data users. The accuracy level is adequate for the current use or purpose.

Level 4 (Improving): As at Level 3. In addition, the Entity is actively reaching out to potential data users to understand how accuracy improvements could support new use cases for the data.

Level 5 (Optimizing): Data accuracy is fit for purpose for both existing and potential uses of the data, based on clear, documented user research and feedback.
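The higher accessibility levels call for publication in an open machine-readable format and for stable URIs that let others link to individual records. A minimal sketch of what that looks like in practice is below; the dataset, field names, and URI namespace are hypothetical examples, not part of the Framework itself.

```python
import csv
import io
import json

# Hypothetical namespace and sample records, for illustration only.
BASE_URI = "https://data.example.gov.ae/schools/"

records = [
    {"id": "001", "name": "Al Noor School", "emirate": "Dubai"},
    {"id": "002", "name": "Zayed School", "emirate": "Abu Dhabi"},
]

def to_csv(rows):
    # CSV is an open, non-proprietary, machine-readable format,
    # suitable for publication on the web or for bulk download.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "name", "emirate"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def to_linked_json(rows):
    # Assigning each record a stable URI enables other Entities to
    # link their own data to it (the highest accessibility levels).
    return json.dumps(
        [{"uri": BASE_URI + r["id"], **r} for r in rows], indent=2
    )
```

The same records can then be offered both as a bulk CSV download and as linkable JSON, so internal users and external re-users work from one authoritative source.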

Descriptiveness

Level 1 (Initial): The dataset has no metadata or schema.

Level 2 (Partially conformant): The dataset has some metadata or a schema describing the data.

Level 3 (Conformant): The dataset has all mandatory metadata. In the case of a Primary Registry, it also has a schema.

Level 4 (Improving): The dataset has all mandatory metadata and it has a schema.

Level 5 (Optimizing): As at Level 4. In addition, the dataset has all the additional recommended metadata.

Timeliness (for regularly updated datasets)

Level 1 (Initial): The dataset is out of date.

Level 2 (Partially conformant): The dataset is regularly updated, on a timescale that meets the needs of current users.

Level 3 (Conformant): As at Level 2. In addition, the dataset has a publishing schedule which is being met in practice and which is recorded in the dataset's publishing-frequency metadata.

Level 4 (Improving): As at Level 3. In addition, guarantees exist that the most up-to-date data will be available in future over a specified period.

Level 5 (Optimizing): As at Level 4. In addition, data updates are managed in real time, with publication or exchange occurring at the same time for internal data users and external data re-users.

Timeliness (for one-off datasets)

Level 1 (Initial): The dataset is out of date, to the point where it has no useful value.

Level 2 (Partially conformant): The dataset is out of date, but still has some use value for users.

Level 3 (Conformant): The dataset is recent enough to meet all the needs of its users.

Levels 4 and 5: Not defined for one-off datasets.

Completeness

Level 1 (Initial): There is significant missing data in the dataset or coverage is poor, and this is not documented.

Level 2 (Partially conformant): There is missing data in the dataset or coverage is poor, and this is documented and explained to data users.

Level 3 (Conformant): Data completeness is fit for current purposes. It makes sense as a dataset, can be used by itself or in combination with reference data, and any missing records or gaps are documented and explained.

Level 4 (Improving): As at Level 3. In addition, the Entity is actively reaching out to potential data users to understand how more complete data coverage could support new use cases for the data.

Level 5 (Optimizing): Data completeness is fit for purpose for both existing and potential uses of the data, based on clear, documented user research and feedback.

Validation

Level 1 (Initial): No validation of data.

Level 2 (Partially conformant): For creating/recording the data, use is made of vocabularies or validated fields (e.g. ensuring phone numbers are in a conformant agreed format).

Level 3 (Conformant): As at Level 2. In addition, all fields which do not need to be free text are now validated (e.g. address lookup from postcode, checking that a number is entered when a number is expected).

Level 4 (Improving): As at Level 3. In addition, these validation rules are encoded into a schema against which the data can automatically be validated.

Level 5 (Optimizing): As at Level 4. In addition, regular data cleaning is carried out to remove duplicate records and errors across data systems.

