I-cloud-data-management
http://news.cnet.com/2300-10805_3-10001679-10.html?tag=mncol
DATA MANAGEMENT IN LARGE-SCALE ENVIRONMENTS
• Definition
• Querying and exploiting
• Storage (persistency)
• Manipulation
• Efficient retrieval (indexing, caching)
• Fault tolerance (recovery, replication)
• Maintenance
Figure: data volume (Peta 10^15, Exa 10^18, Zetta 10^21, Yotta 10^24) against storage hardware (tape, magnetic, RAID)
DATA MANAGEMENT: STATE OF THE ART AND SOME CHALLENGES
Figure: data management along several axes: data volume (Peta 10^15, Exa 10^18, Zetta 10^21, Yotta 10^24); hardware (tape, magnetic, RAID); architecture (centralized, client-server, n-tier); data models (structured, semi-structured, unstructured); applications (real-time, reactive, adaptable, ubiquitous computing); database services (extensible, new functions, unbundled services / tailored DBMS)
¡ Use of the memory and computing capacities of all computers and servers distributed around the world and connected by a network (e.g., the Internet)
DATA MANAGEMENT IN SMALL-SCALE ENVIRONMENTS
CURRENT SCENARIO
CLOUD PRINCIPLE
CLOUD AIMS
ON DEMAND, SELF-SERVICE, PAY-AS-YOU-GO MODEL
¡ Billing is based on resource consumption: CPU hours, volumes of data moved, or gigabytes of data stored
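A minimal sketch of pay-as-you-go billing, assuming a simple linear tariff: the bill is the sum of each metered quantity times its unit price. The `Usage` fields and the unit prices below are hypothetical values chosen for illustration, not any provider's actual rates.

```python
from dataclasses import dataclass

@dataclass
class Usage:
    """Metered consumption for one billing period (hypothetical fields)."""
    cpu_hours: float
    gb_transferred: float
    gb_stored: float

# Hypothetical unit prices in dollars; real providers publish their own rates.
PRICE_PER_CPU_HOUR = 0.10
PRICE_PER_GB_TRANSFERRED = 0.09
PRICE_PER_GB_STORED = 0.02

def bill(usage: Usage) -> float:
    """Charge only for what was consumed: a linear combination of metered quantities."""
    total = (usage.cpu_hours * PRICE_PER_CPU_HOUR
             + usage.gb_transferred * PRICE_PER_GB_TRANSFERRED
             + usage.gb_stored * PRICE_PER_GB_STORED)
    return round(total, 2)
```

For example, 100 CPU hours, 50 GB transferred and 200 GB stored come to 10.00 + 4.50 + 4.00 = 18.50 dollars under these assumed prices; a period with no consumption costs nothing.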
INFRASTRUCTURE MODELS
¡ Stack layers: applications, data, runtime, middleware, O/S, virtualization, servers, storage, networking
¡ Traditional (on-premises) model: you manage all layers
¡ Infrastructure as a Service: you manage applications, data, runtime, middleware and O/S; the vendor manages virtualization, servers, storage and networking
¡ Platform as a Service: you manage applications and data; the vendor manages runtime, middleware, O/S, virtualization, servers, storage and networking
¡ Software as a Service: the vendor manages all layers
SOFTWARE AS A SERVICE
SERVICE ON DEMAND
INFRASTRUCTURE AS A SERVICE
¡ Delivers basic storage and computing capabilities as standardized services over the network
¡ Servers, storage systems, switches and routers are pooled and made available to handle workloads that range
from application components to high-performance computing applications
¡ e.g., Joyent (http://www.joyent.com/): virtualized servers offering on-demand, high-performance infrastructure
¡ The infrastructure is programmable: developers specify how to configure and interconnect virtual
components, and how virtual machine and application data are stored in and retrieved from a storage cloud
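To make "programmable infrastructure" concrete, here is a minimal sketch of a declarative specification of virtual components and their interconnections, with a consistency check a provisioning tool might run before creating anything. The class names, fields and the `validate` method are hypothetical illustrations, not the API of Joyent or any other provider.

```python
from dataclasses import dataclass, field

@dataclass
class VirtualMachine:
    name: str
    cpus: int
    memory_gb: int
    image: str  # hypothetical name of a VM image kept in the storage cloud

@dataclass
class InfrastructureSpec:
    vms: list = field(default_factory=list)
    links: list = field(default_factory=list)  # (vm_a, vm_b) network connections

    def validate(self) -> list:
        """Return configuration errors; an empty list means the spec is consistent."""
        names = [vm.name for vm in self.vms]
        errors = []
        if len(names) != len(set(names)):
            errors.append("duplicate VM names")
        for a, b in self.links:
            for end in (a, b):
                if end not in names:
                    errors.append(f"link endpoint {end!r} is not a declared VM")
        return errors
```

Keeping the specification as plain data lets the same description be checked, versioned and replayed, which is the point of making the infrastructure programmable.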
VIRTUALIZATION
¡ Full virtualization is a technique in which a complete installation of one machine is run on another
¡ A system in which all software running on the server, applications and operating systems alike, runs within a virtual machine
¡ Means of accessing services on the cloud
¡ A compute cloud is a self-service proposition where a credit card can purchase compute cycles, and a Web
interface or API is used to create virtual machines and establish network relationships between them
PROGRAMMABLE INFRASTRUCTURE
HIGH PERFORMANCE UNDERLYING SUPPORT
¡ Concurrency: cloud computing is inherently concurrent; autonomous processes interact by exchanging messages. Concurrency provides control flow to respond to unordered events and supports processing of independent streams of requests
¡ Parallelism: cloud computing runs on parallel computers on both the client and the server side; higher-level programming models include transactional memory and deterministic execution
¡ Message passing: the primary parallel programming model for cloud computing. It offers inherent performance and isolation, with points of interaction, but requires adequate interfaces between the asynchronous communication of messages and the synchronous control flow of procedure calls. Erlang (www.erlang.org) integrates message passing constructs into existing languages
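The message-passing model can be sketched with Python's `asyncio` queues standing in for mailboxes: each worker is an autonomous process that shares no state and reacts only to the messages it receives. This is an illustration of the model under that substitution, not Erlang's actual semantics; the `None` shutdown message is a convention chosen for the example.

```python
import asyncio

async def worker(name: str, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Autonomous process: owns no shared state and only reacts to messages."""
    while True:
        msg = await inbox.get()
        if msg is None:              # conventional shutdown message
            break
        await outbox.put((name, msg * msg))

async def main() -> list:
    inbox, outbox = asyncio.Queue(), asyncio.Queue()
    workers = [asyncio.create_task(worker(f"w{i}", inbox, outbox)) for i in range(3)]
    for request in [1, 2, 3, 4, 5]:  # an independent stream of requests
        await inbox.put(request)
    for _ in workers:
        await inbox.put(None)        # one shutdown message per worker
    await asyncio.gather(*workers)
    # Replies may arrive in any order (unordered events), so sort before returning.
    return sorted(outbox.get_nowait()[1] for _ in range(outbox.qsize()))

print(asyncio.run(main()))
```

Because the shutdown messages are queued after all requests, every request is consumed before any worker stops.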
HIGH PERFORMANCE UNDERLYING SUPPORT
¡ Performance: shared resources run across a large number of computers and complex networks; make performance a first-class programming abstraction
¡ Distribution: integrate replication, concurrency, and quorum solutions into a mainstream programming model (e.g., http://labs.live.com/volta)
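Quorum-based replication can be illustrated with a toy replicated register: with N replicas, a write must reach W of them and a read must consult R of them. If R + W > N, every read quorum overlaps every write quorum, so a read sees the latest completed write. This is a single-process sketch with a global version counter standing in for a coordinator; the class and method names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    version: int = 0
    value: object = None

class QuorumStore:
    """Toy replicated register with N replicas, read quorum R and write quorum W."""

    def __init__(self, n: int, r: int, w: int):
        if r + w <= n:
            raise ValueError("need R + W > N so that read and write quorums intersect")
        self.replicas = [Replica() for _ in range(n)]
        self.r, self.w = r, w
        self.version = 0  # simplification: a single coordinator numbers the writes

    def write(self, value, targets: list) -> None:
        """Write to the replicas listed in `targets` (at least W distinct ones)."""
        if len(set(targets)) < self.w:
            raise ValueError("not enough replicas for a write quorum")
        self.version += 1
        for i in targets:
            self.replicas[i] = Replica(self.version, value)

    def read(self, sources: list):
        """Read from at least R replicas and return the value with the newest version."""
        if len(set(sources)) < self.r:
            raise ValueError("not enough replicas for a read quorum")
        newest = max((self.replicas[i] for i in sources), key=lambda rep: rep.version)
        return newest.value
```

With N = 3 and R = W = 2, any two replicas read after a write to replicas 1 and 2 include at least one of them, so the newest value is always found.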
HIGH PERFORMANCE UNDERLYING SUPPORT
¡ Defect detection: a system is resilient if it can tolerate failures of its components; in the cloud, these include computers, the communication network, other services and the data center in which it runs. Detect failures, respond to them while minimizing their effect, restore the service when possible and resume execution
¡ High-level abstractions: Google’s MapReduce and Microsoft Dryad are higher-level programming models that hide the complexity of writing a server-side analytic application, including data distribution, failure detection, notification, communication and scheduling
¡ Optimization
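The MapReduce programming model can be sketched in a single process with the classic word-count example: the developer writes only the map and reduce functions, while grouping by key (the shuffle) is the framework's job. A real framework also distributes the work and handles failures and scheduling; none of that is modeled here.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document: str) -> list:
    """Map: emit a (word, 1) pair for every word of one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs) -> dict:
    """Group intermediate values by key (performed by the framework in a real system)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list) -> tuple:
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)

def map_reduce(documents: list) -> dict:
    intermediate = chain.from_iterable(map_phase(d) for d in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
```

Because each `map_phase` call depends only on its own split and each `reduce_phase` call only on its own key, both phases can run in parallel on different machines.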
PLATFORM AS A SERVICE
¡ Encapsulates a layer of software and provides it as a service for building higher-level services
¡ Platform integrating an OS, middleware, application software and development environments
¡ e.g., xVM hypervisor virtual machines including NetBeans, the Sun GlassFish Web stack and support for languages such as Perl and Ruby
¡ Encapsulated service exporting an API able to manage and scale itself to provide a given level of service
¡ e.g., Google App Engine, serving applications on Google’s infrastructure
CLOUD IN A NUTSHELL
1. Cloud Software as a Service (SaaS)
¡ Pivot issues
1. Virtualization
2. Autonomics (automation)
3. Grid computing (job scheduling)
¡ Deployment models
1. Private cloud
2. Public cloud
3. Hybrid cloud
4. Community cloud
09/10/2015
MANAGING DATA AS A SERVICE
DATA MANAGEMENT WITH RESOURCE CONSTRAINTS
Figure: resource-aware algorithms and systems, shaped by storage support, RAM and architecture
Efficiently manage and exploit data sets according to given, specific storage, memory and computation resources
… WITH RESOURCE CONSTRAINTS
¡ Distribution and organization of data on disk
¡ Swapping between memory and disk
¡ Data transfer
¡ Query and data processing on the server
¡ Efficiency => time cost
¡ Optimizing memory and computing cost
¡ Persistency support
¡ Data centre and enabling virtualisation platform
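A classic instance of trading memory for disk I/O is external merge sort, shown here as a swapped-in textbook technique rather than one taken from the slides: data is cut into sorted runs small enough for the memory budget, the runs are spilled to disk, and they are then merged back with a k-way merge. The `memory_limit` parameter is a hypothetical stand-in for the available RAM, measured in number of values.

```python
import heapq
import os
import tempfile

def external_sort(values, memory_limit: int) -> list:
    """Sort integers using sorted runs of at most `memory_limit` values each.

    Phase 1 (memory -> disk): build sorted runs and spill each to a temporary file.
    Phase 2 (disk -> memory): k-way merge the runs, reading them back line by line.
    """
    if memory_limit < 1:
        raise ValueError("memory_limit must be at least 1")
    run_paths = []
    buffer = []

    def spill() -> None:
        buffer.sort()
        fd, path = tempfile.mkstemp(text=True)
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in buffer)
        run_paths.append(path)
        buffer.clear()

    for v in values:
        buffer.append(v)
        if len(buffer) >= memory_limit:
            spill()
    if buffer:
        spill()

    files = [open(path) for path in run_paths]
    try:
        runs = [(int(line) for line in f) for f in files]
        # The merged output is collected into a list only to keep the example short;
        # a real system would stream it to disk or to the next operator.
        return list(heapq.merge(*runs))
    finally:
        for f, path in zip(files, run_paths):
            f.close()
            os.remove(path)
```

A smaller `memory_limit` produces more runs and more disk traffic, which is exactly the time-cost versus memory-cost trade-off listed above.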
¡ Old model
¡ “Query the world”
¡ data acquisition coupled to a specific hypothesis
¡ New model
¡ “Download the world”
¡ data acquired en masse, in support of many hypotheses
¡ E-science examples
¡ astronomy: high-resolution, high-frequency sky surveys, …
¡ oceanography: high-resolution models, cheap sensors, satellites, …
¡ biology: lab automation, high-throughput sequencing, ...
HOW TO MAP THE ARCHITECTURE ONTO THE CLOUD?
How to “map” the components of the reference architecture to (virtual) machines in the cloud.
Figure: computing resources, architectures, execution platforms and DM systems
Vinayak Borkar, Michael J. Carey, Chen Li, Inside “big data management”: ogres, onions, or parfaits?, EDBT, 2012
CLOUD-AWARE APPLICATION ARCHITECTURES
¡ A good plan for dividing data, with tools implementing master/worker or other parallelization patterns
¡ Data partitioning techniques, real-time analysis
¡ Data physics: balance between local data processing and data transfer costs
¡ Combine data and computing power, e.g. virtual machine location and data storage location
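The "data physics" balance can be made concrete with a deliberately simple, hypothetical cost model: running a job at a site costs the time to ship the data stored elsewhere plus the time to process all the data at that site's speed. The function names, the single shared bandwidth and the linear costs are all assumptions for illustration.

```python
def transfer_seconds(gb: float, bandwidth_gbps: float) -> float:
    """Time to move `gb` gigabytes over a link of `bandwidth_gbps` gigabits per second."""
    return gb * 8 / bandwidth_gbps

def best_site(data_gb: dict, process_gb_per_s: dict, bandwidth_gbps: float) -> tuple:
    """Pick the site where (transfer time + processing time) is smallest.

    data_gb: how much of the input is stored at each site.
    process_gb_per_s: processing speed of the VMs available at each site.
    """
    total = sum(data_gb.values())
    costs = {}
    for site, speed in process_gb_per_s.items():
        remote = total - data_gb.get(site, 0.0)   # data that must be shipped in
        costs[site] = transfer_seconds(remote, bandwidth_gbps) + total / speed
    site = min(costs, key=costs.get)
    return site, costs[site]
```

With 90 GB stored in "eu" and 10 GB in "us", and "us" VMs twice as fast, a slow 8 Gbps link favors computing next to the data in "eu" (110 s against 140 s), while a 80 Gbps link makes moving the data to the faster "us" VMs worthwhile.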
APPLICATION REFERENCE ARCHITECTURE
Reference architecture: Client → Web server → App server → DB server (SQL records) → Store (get/put block)
BUT IS IT VALUABLE? AND HOW?
Figure: clients (browser, Adobe AIR, Adobe Flex, mobile, games, ...) accessing applications (App1), documents and databases hosted on the servers of a utility provider, built on internal and external data
SOME COMMENTS …
¡ Data management applications are potential candidates for deployment in the cloud
– industry: enterprise database systems have significant up-front costs that include both
hardware and software costs
– academia: manage, process and share mass-produced data in the cloud
DATA ACQUISITION
DATA INTEGRATION IN THE CLOUD
¡ A resource consumption model focusing on the technical and economic conditions that must be fulfilled to
access potentially unlimited resources
¡ Integrating and processing heterogeneous data collections calls for efficient methods for correlating, associating, and filtering them, considering their variety (i.e., different formats and data models) and their quality (e.g., trust, freshness, provenance, partial or total consistency)
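Filtering collections by quality can be sketched as a set of threshold checks on trust, freshness and provenance. The `Collection` record, the trust score in [0, 1] and the thresholds are hypothetical; how such quality measures are actually obtained is an open part of the integration problem.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class Collection:
    name: str
    trust: float         # hypothetical trust score in [0, 1]
    updated: datetime    # time of last update, used to assess freshness
    provenance: str      # e.g. "certified" or "unknown"

def select_collections(collections: list, now: datetime, min_trust: float,
                       max_age: timedelta, provenance: Optional[str] = None) -> list:
    """Keep only the collections that meet every quality threshold."""
    return [c for c in collections
            if c.trust >= min_trust
            and now - c.updated <= max_age
            and (provenance is None or c.provenance == provenance)]
```

Passing `provenance=None` drops that criterion, mirroring preferences such as freshness = "any" where a dimension imposes no constraint.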
DATA INTEGRATION IN THE CLOUD
¡ Quality of service (QoS) requirements expressed by data consumers and Service Level Agreement
(SLA) contracts exported by data services
¡ Cloud providers that host these collections and deliver resources for executing data processing and
integration processes
¡ SLA-based data integration to better meet user requirements related to the conditions in which data is delivered and integrated
¡ Producers characterized by location, provided content type and topic, access conditions (e.g., cost, registration, or
exchange unit), and content production time window
¡ Consumers characterized by location, interests during a time interval, maximum cost of the consumed content, or
resources to get the service, and QoS requirements (availability and how critical it is to consume a given type of content)
¡ Producers and consumers
¡ Have subscriptions to different cloud providers for dealing with content storage, processing and exchange
¡ Can ask to minimize the transfer of personal data when they share or consume content
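Matching consumers to producers from the characteristics listed above can be sketched as a filter on topic, cost, location and overlapping time windows, ranked by cost. The record fields, the integer time windows and the `same_location` switch are hypothetical simplifications of the profiles described on the slide.

```python
from dataclasses import dataclass

@dataclass
class Producer:
    name: str
    location: str
    topic: str
    cost: float      # access condition: price of the content
    window: tuple    # content production time window (start, end), e.g. in days

@dataclass
class Consumer:
    name: str
    location: str
    interests: set
    max_cost: float
    interval: tuple  # time interval of interest (start, end)

def overlaps(a: tuple, b: tuple) -> bool:
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

def match(consumer: Consumer, producers: list, same_location: bool = True) -> list:
    """Names of producers whose content fits the consumer's profile, cheapest first."""
    candidates = [p for p in producers
                  if p.topic in consumer.interests
                  and p.cost <= consumer.max_cost
                  and overlaps(p.window, consumer.interval)
                  and (not same_location or p.location == consumer.location)]
    return [p.name for p in sorted(candidates, key=lambda p: p.cost)]
```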
MOOC SCENARIO
¡ MOOC
¡ Aims at being privacy respectful of the producers and consumers participating in courses
¡ Uses privacy-preserving techniques to let users share content anonymously according to the level of trust associated with
data providers
¡ Data providers may also wish to grant restricted data access credentials w.r.t. their trust level when their
data are used within an integration process
PROBLEM STATEMENT
¡ How can the user efficiently obtain results for her queries such that
¡ they meet her QoS requirements,
¡ they respect her subscribed contracts with the involved cloud provider(s), and
¡ they do not violate service contracts?
¡ In particular, for queries that call several services deployed on different clouds
Integration can be done enforcing all or some of the specified conditions
Matching data providers with requests, and QoS preferences with SLAs, can be computationally costly
⇒ results should be capitalized on (stored and reused) for further integration requests
PROBLEM STATEMENT (II)
Figure: an energy provision hub composing several services, each with its own agreed SLA (SLA1, ...), under the user's requirements
How can we be sure that all the agreed SLAs are respected while satisfying the user?
PROBLEM STATEMENT (III)
Figure: an energy provision hub and the services it composes, each with its own agreed SLA (SLA1, SLA2, ...)
Propose an SLA-guided continuous data integration and provision system as a DaaS (Data as a Service)
¡ Integrated SLA computed from the agreed data SLAs
¡ Optimized and adaptable data collection, query rewriting and integration according to user
preferences
¡ Learning-based data integration mechanisms
HOW TO EXPLOIT SLA FOR INTEGRATING DATA
List of English poetry content providers that can provide commented Emily Dickinson poems, that are close to my city and are labeled as experts, where the total cost is less than 1 dollar, using only trustful services
User QoS preferences: ⟨cost ≤ $1, freshness = “any”, provenance = “certified”, location = “close”, duration = 7 days, privacy-preserving = “reputation-based”⟩
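The example query can be read as a filter over what candidate services advertise, followed by a budget check on the total cost. This is a minimal sketch under stated assumptions: the `Offer` fields are hypothetical, "duration = 7 days" is interpreted as "the offer must cover at least 7 days", and the total-cost constraint is met greedily by taking the cheapest eligible services first. It is not the rewriting algorithm the slides refer to.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    """What a candidate service advertises in its agreed SLA (hypothetical fields)."""
    service: str
    cost: float
    provenance: str
    location: str
    duration_days: int
    privacy: str
    expert: bool

# The user's QoS preferences from the example query.
PREFERENCES = {"max_cost": 1.0, "freshness": "any", "provenance": "certified",
               "location": "close", "duration_days": 7, "privacy": "reputation-based"}

def satisfies(offer: Offer, prefs: dict) -> bool:
    # freshness = "any" imposes no constraint, so it is not checked.
    return (offer.provenance == prefs["provenance"]
            and offer.location == prefs["location"]
            and offer.duration_days >= prefs["duration_days"]  # assumed interpretation
            and offer.privacy == prefs["privacy"]
            and offer.expert)

def select(offers: list, prefs: dict) -> list:
    """Cheapest eligible services first, stopping before the total cost exceeds the budget."""
    eligible = sorted((o for o in offers if satisfies(o, prefs)), key=lambda o: o.cost)
    chosen, total = [], 0.0
    for o in eligible:
        if total + o.cost > prefs["max_cost"]:
            break
        chosen.append(o.service)
        total += o.cost
    return chosen
```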
Figure: SLA-guided query processing over an abstract intercloud layer. Starting from a query with QoS preferences: (1) deriving the SLA; (2) choosing services from an SLA-service directory of data providers and service providers; (3) query rewriting; (4) evaluation and integration under an integrated SLA. Services (s1 to s5) are deployed on Cloud1, Cloud2 and Cloud3 (IaaS, PaaS, SaaS), each under an agreed SLA (user subscription).
CHALLENGES BEHIND AN SLA-GUIDED APPROACH
¡ Agreed SLA:
¡ Its content should allow matching user preferences w.r.t. service features
¡ Service-centric monitoring of static and dynamic service deployment conditions
¡ Challenge: how to compute coarse-grained measures from fine-grained ones?
¡ Derived SLA:
¡ Guides the way the query will be evaluated, and the way results will be computed and delivered
¡ Helps learning for further data integration operations
¡ Challenge: how to take the agreed SLA clauses into account in real time in the rewriting algorithm, especially
dynamic clauses?
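One common convention from the QoS-aware service composition literature for computing coarse-grained measures from fine-grained ones is to aggregate per-service values with a rule per measure: additive for cost and response time, multiplicative for availability, and weakest-link (minimum) for trust. The sketch below applies those rules to a sequential composition; the measure names are hypothetical and this is not presented as the authors' method.

```python
import math

def aggregate_sequence(services: list) -> dict:
    """Coarse-grained QoS of a sequential composition from per-service QoS."""
    return {
        "cost": sum(s["cost"] for s in services),                   # every call is paid
        "response_time": sum(s["response_time"] for s in services), # calls run one after another
        "availability": math.prod(s["availability"] for s in services),  # all must be up
        "trust": min(s["trust"] for s in services),                 # weakest link
    }
```

For services invoked in parallel, response time would instead aggregate with `max`, which is one reason the same fine-grained measures can yield different coarse-grained values depending on the rewritten query plan.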
DERIVED SLA
¡ Set of measures that correspond to the user preferences, computed as a function of different measures (including static ones)
¡ Inequalities that have to be satisfied during the execution of a service composition
¡ Guides the way the query will be evaluated, and the way results will be computed and delivered
¡ Measures stated in the user preferences are used for defining a derived SLA
• storageSpace: 1GB
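A derived SLA viewed as a set of inequalities can be sketched as (measure, comparison, bound) clauses that are re-checked against the measures observed while a composition runs. The encoding and the clause list are hypothetical; only the storageSpace bound comes from the example above.

```python
import operator

# Hypothetical encoding of a derived SLA: each clause is (measure, comparison, bound).
DERIVED_SLA = [
    ("cost", operator.le, 1.0),
    ("storageSpace_gb", operator.le, 1.0),   # storageSpace: 1GB
    ("availability", operator.ge, 0.95),
]

def violated_clauses(measures: dict, sla: list = DERIVED_SLA) -> list:
    """Measures whose current value breaks a clause; a missing measure counts as a violation."""
    violations = []
    for name, compare, bound in sla:
        value = measures.get(name)
        if value is None or not compare(value, bound):
            violations.append(name)
    return violations
```

Checking the clauses on every new observation, rather than once before execution, is what lets dynamic clauses influence the evaluation while it is in progress.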
REWRITING QUERIES
Javier Espinosa
LAFMIA (UMI 3175)
France
http://www.vargas-solar.com/data-management-services-cloud