InterSystems IRIS Data Platform: A Unified Platform for Powering Real-Time, Data-Intensive Applications
Whitepaper
along with large sets of historical and reference data, without delay.

• Delivering new and innovative business services,
• Increasing revenues,
• Improving customer experiences,
• Streamlining operations,
• Identifying and decreasing risk,
• Complying with new and ever-changing industry regulations, and
• Reducing costs.

• Support a range of data models and representations including relational, document, key-value, object, and unstructured text.
• Create seamless, real-time composite processes that integrate disparate applications and data sources.
• … that support continuous delivery and DevOps methodologies.
• Provide these functional capabilities in a cost-effective manner, without needing to hire a staff of experts in a broad range of disciplines.
InterSystems IRIS Data Platform: A Unified Platform for Powering Real-Time, Data-Intensive Applications Page 1
Enabling the Real-Time Organization

Technology-industry analyst IDC recently interviewed more than 500 enterprises worldwide across a variety of industries. Over 75 percent reported that their inability to analyze current live data was actively inhibiting their ability to execute on new business opportunities, and more than half said it was limiting operational efficiencies.¹

The research found that 64 percent of companies have delays of five days or more before they can analyze operational data when using ETL (extract, transform, load) processing to move the data from their operational systems into a data warehouse.

Figure 1: Average time to move operational data to the analytic database via ETL
Source: 3rd Platform Information Management Requirements Survey, IDC, October 2016, n=502

For real-time applications that rely on change data capture (CDC) processing, organizations reported that 96 percent of their CDC processes take more than a minute before the data can be analyzed, and 65 percent take more than 10 minutes. That is too slow for critical real-time use cases, where milliseconds matter.

Applications that require real-time analytics on live data from a variety of sources are being implemented in virtually every industry:
• Financial services, for compliance with mandatory state and federal regulations, fraud detection, and risk management initiatives
• Discrete manufacturing / original equipment manufacturing, for predictive maintenance
• Shipping and logistics, for real-time container and shipment tracking
• Retail, for customer and visitor targeting and personalization
• Public safety, for situational awareness for first responders
• Healthcare, for personalized and proactive treatments at the point of care

InterSystems IRIS Data Platform™ delivers what is needed. It can incorporate multiple, disparate, and dissimilar data sources; support embedded real-time analytics; easily scale for growing data and user volume; interoperate seamlessly with other systems; and provide flexible, agile, DevOps-compatible deployment capabilities.

¹ “Choosing a DBMS to Address the Challenges of the Third Platform” (IDC, 2017)
InterSystems IRIS Data Platform

InterSystems IRIS Data Platform is a complete, unified platform that simplifies the development, deployment, and maintenance of real-time, data-rich solutions. It provides concurrent transactional and analytic processing capabilities; support for multiple, fully synchronized data models (relational, hierarchical, object, and document); a complete interoperability platform for integrating disparate data silos and applications; and sophisticated structured and unstructured analytics capabilities supporting batch and real-time use cases. The platform also provides an open analytics environment for incorporating best-of-breed analytics into InterSystems IRIS solutions, and it offers flexible deployment capabilities to support any combination of cloud and on-premises deployments.

InterSystems IRIS Data Platform provides these key features:
• Hybrid transactional/analytic processing to support real-time applications
• Multiple data models
• Embedded and open analytics
• Apache Spark integration
• Business Intelligence (BI)
• Ability to incorporate advanced analytics into real-time processes
• Natural Language Processing (NLP)
• Interoperability
• A unified development environment
• Flexible deployment options

Hybrid Transactional/Analytic Processing to Support Real-Time Applications

At the core of InterSystems IRIS Data Platform is a proven, enterprise-grade, distributed hybrid transactional/analytic processing (HTAP) database. It can ingest and store transactional data at very high rates while simultaneously processing high volumes of analytic workloads on real-time data (including ACID-compliant transactional data) and non-real-time data. This architecture eliminates the delays associated with moving real-time data to a different environment for analytic processing.

InterSystems IRIS’s ability to deliver high performance at scale for HTAP is made possible by a number of technological innovations.
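To make the cost of moving data concrete, here is a minimal Python sketch that contrasts an aggregate over a live table with the same aggregate over a batch ETL copy. It assumes an in-memory SQLite database as a stand-in for a single store serving both writes and analytic queries; SQLite is not an HTAP engine, this is not InterSystems IRIS code, and the table and function names are hypothetical.

```python
import sqlite3

def make_store() -> sqlite3.Connection:
    """Create an in-memory orders table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
    return conn

def record_order(conn: sqlite3.Connection, region: str, amount: float) -> None:
    """Transactional write: one committed insert."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", (region, amount))

def revenue_by_region(conn: sqlite3.Connection) -> dict[str, float]:
    """Analytic read: aggregate over whatever the table holds right now."""
    cur = conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region")
    return dict(cur.fetchall())

def etl_copy(src: sqlite3.Connection) -> sqlite3.Connection:
    """Simulated batch ETL: copy every row into a separate analytic store."""
    dst = make_store()
    rows = src.execute("SELECT id, region, amount FROM orders").fetchall()
    with dst:
        dst.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return dst

live = make_store()
record_order(live, "east", 100.0)
record_order(live, "west", 40.0)
warehouse = etl_copy(live)        # batch copy taken at this point in time
record_order(live, "east", 25.0)  # arrives after the batch ran

print(revenue_by_region(live))       # {'east': 125.0, 'west': 40.0}
print(revenue_by_region(warehouse))  # {'east': 100.0, 'west': 40.0}
```

The query on the live store reflects the late-arriving order, while the copied store does not until the next batch runs. That staleness is the delay the HTAP design is meant to remove.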
Better Sharding

InterSystems IRIS provides a powerful and efficient approach to performing queries on large data sets. An InterSystems IRIS sharded cluster can distribute workloads and data sets horizontally across a tier of application servers, partitioning the data in specific large tables across multiple nodes (called data shards²).

Sharding can benefit a wide range of applications but provides the greatest gains for use cases involving one or more of the following:
• Queries scanning very large data sets
• Complex queries on large data sets
• High data-ingestion rates and/or volumes

The result is that InterSystems IRIS achieves consistent high performance, efficiency, and reliability, even for complex queries involving multiple tables. In contrast, many other database platforms that support sharded architectures rely on broadcasting the entire table, which can result in performance penalties and timeouts.

Since sharding creates disjoint partitions of the data, each data server’s cache is fully independent, and adding data servers linearly increases the cluster’s overall memory. Therefore, through appropriate sizing, InterSystems IRIS can achieve the performance benefits of in-memory databases without requiring all data to fit in memory.
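The partitioning idea can be illustrated with a short Python sketch. It assumes hash partitioning on a single shard key and simulates shards as in-memory lists. The shard count, field names, and SHA-256-based hash are illustrative choices, not how InterSystems IRIS assigns rows to shards. Each shard computes a partial aggregate over its own rows and a coordinator merges the small partial results, so no table has to be broadcast.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical cluster size

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard-key value to a shard number with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def partition(rows: list[dict], key_field: str, num_shards: int = NUM_SHARDS) -> list[list[dict]]:
    """Split rows into disjoint per-shard lists: each row lands on exactly one shard."""
    shards: list[list[dict]] = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(row[key_field], num_shards)].append(row)
    return shards

def scatter_gather_sum(shards: list[list[dict]], group_field: str, value_field: str) -> dict:
    """Each shard aggregates its own rows; a coordinator merges the partial results."""
    partials = []
    for shard in shards:  # sequential here; a real cluster would run these in parallel
        local: defaultdict = defaultdict(float)
        for row in shard:
            local[row[group_field]] += row[value_field]
        partials.append(local)
    merged: defaultdict = defaultdict(float)
    for local in partials:
        for group, subtotal in local.items():
            merged[group] += subtotal
    return dict(merged)

rows = [{"customer": f"c{i}", "region": "east" if i % 2 else "west", "amount": float(i)}
        for i in range(1, 11)]
shards = partition(rows, "customer")
totals = scatter_gather_sum(shards, "region", "amount")
print(totals)  # east: 25.0, west: 30.0 (key order depends on shard contents)
```

Only one subtotal per group leaves each shard, which is why this pattern scales with the number of shards rather than with the number of rows moved to a coordinator.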
Figure 3: Intelligent Inter-Shard Communication for Analyzing Large, Distributed Data Sets (diagram panels: InterSystems IRIS ECP Application Servers (Cache Distribution) with Distributed Caching; Traditional Sharding; InterSystems IRIS Intelligent Inter-Shard Communication)
² A data shard is an InterSystems IRIS instance that stores one horizontal partition of each sharded table defined on the cluster’s shard master. The node hosting this instance is called a shard data server.
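The linear-memory argument from the Better Sharding section reduces to simple arithmetic. The sketch below uses hypothetical sizes to compute how many data servers are needed for the frequently accessed working set, rather than the whole database, to fit in the combined cache. It assumes that each server's cache holds only its own disjoint shard and ignores mirroring and other overhead.

```python
import math

def servers_needed(working_set_gb: float, cache_per_server_gb: float) -> int:
    """Smallest number of data servers whose combined cache holds the working set.

    Shards are disjoint, so each server caches only its own partition and the
    cluster's total cache grows linearly with the number of servers.
    """
    if cache_per_server_gb <= 0:
        raise ValueError("cache per server must be positive")
    return math.ceil(working_set_gb / cache_per_server_gb)

def cluster_cache_gb(num_servers: int, cache_per_server_gb: float) -> float:
    """Total cache across the cluster (linear in the number of servers)."""
    return num_servers * cache_per_server_gb

# Hypothetical sizing: a 2,000 GB database whose hot working set is 600 GB,
# on data servers with 128 GB of cache each. Only the working set must fit.
n = servers_needed(600, 128)
print(n, cluster_cache_gb(n, 128))  # 5 640
```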
An InterSystems IRIS sharded cluster provides additional performance advantages:
• The transparent parallel load capability of the InterSystems IRIS Java Database Connectivity (JDBC) driver supports the use of Java-based tools for very fast data ingestion, in parallel across the shards.
• When large, multiuser query workloads would create a bottleneck on the shard master, a tier of application servers can be added in front of the shard master to scale for user volume through distributed application logic and caching.

Diagram: Multi-Model Panoramic View (labels: C++ / Java / Python / ANSI SQL / Spark access; relational, object, document, key-value, and text models; Enterprise Cache Protocol; Data-Aware Intelligence)
³ Cosharded data refers to distributed data that is partitioned on a common key.
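Cosharding, as defined in footnote 3, can also be sketched in Python. The sketch assumes two hypothetical tables, customers and orders, both partitioned with the same function on customer_id. Every order therefore lands on the same shard as its customer, and the join runs shard by shard with no rows copied between shards. The modulo partitioning and the table layouts are assumptions for illustration.

```python
NUM_SHARDS = 3  # hypothetical cluster size

def shard_of(customer_id: int) -> int:
    """Same partitioning function for both tables: this is what makes them cosharded."""
    return customer_id % NUM_SHARDS

def distribute(rows: list[dict], key: str) -> dict[int, list[dict]]:
    """Place each row on the shard chosen by its key."""
    shards: dict[int, list[dict]] = {s: [] for s in range(NUM_SHARDS)}
    for row in rows:
        shards[shard_of(row[key])].append(row)
    return shards

customers = [{"customer_id": i, "name": f"cust{i}"} for i in range(6)]
orders = [{"order_id": 100 + i, "customer_id": i % 6, "amount": 10 * i} for i in range(12)]

cust_shards = distribute(customers, "customer_id")
order_shards = distribute(orders, "customer_id")

def local_join(shard_id: int) -> list[tuple]:
    """Join orders to customers using only rows stored on one shard."""
    names = {c["customer_id"]: c["name"] for c in cust_shards[shard_id]}
    # Every lookup succeeds locally: an order's customer is always on the same shard.
    # Had orders been partitioned on order_id, customers would need to be shipped
    # or broadcast to every shard before joining.
    return [(o["order_id"], names[o["customer_id"]], o["amount"]) for o in order_shards[shard_id]]

joined = [row for s in range(NUM_SHARDS) for row in local_join(s)]
print(len(joined))  # 12
```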
Figure 5: Horizontally Distributed HTAP (diagram labels: Events and Transactions; Analytic Queries on Distributed Data; Shard Master; Cache; Rows; Data)

InterSystems IRIS supports direct shared memory writers and client/server distributed SQL processing simultaneously to support high-performance concurrent transactional/analytic use cases. As a result, InterSystems IRIS can reliably process and analyze real-time data in combination with data stored in distributed and partitioned data sets, in less time and at lower operational cost.

For high availability of both non-sharded and sharded tables, all nodes storing data can be mirrored. Compute nodes can be easily added and removed to support user workload fluctuations. InterSystems IRIS provides

Multiple Data Models

InterSystems IRIS is built on a true multi-model database. This means the data is stored once and can be accessed via multiple data models, including relational and object models, which are always synchronized. This eliminates the need to duplicate data or provide mappings between different representations (e.g., object-to-relational mapping). The ability to natively support multiple data types enables organizations to model, store, and use data in the most appropriate format and representation, for flexible solution development, higher performance, and reduced complexity.

How Important are the New Data Types?
(Rating scale: 1 = Not very important, 5 = Very important)

Data type                                   Mean rating
Relational                                  4.31
Data from the Internet of Things (IoT)      4.30
Streaming data from external sources        4.27
Sensor data                                 4.22
Graphs                                      4.22
Key Value                                   4.17
Video/audio/image                           4.17
Object                                      4.16
JSON documents                              4.13
Geospatial data                             4.10

Figure 6: Importance of Supporting Various Data Types in a Data Platform
Source: 3rd Platform Information Management Requirements Survey, IDC, October 2016, n=502
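Here is a toy Python sketch of the store-once, many-views idea. One dictionary holds each record exactly once. An object view, a relational (row) view, and a document (JSON) view all read and write that same copy, so an update made through one view is immediately visible through the others. The Patient class and the dictionary layout are hypothetical; this is not the InterSystems IRIS storage engine or its APIs.

```python
import json
from dataclasses import dataclass

# The single underlying copy of the data: id -> field values. This layout is an
# assumption for illustration, not the InterSystems IRIS storage format.
_store: dict[int, dict] = {}

@dataclass
class Patient:  # hypothetical class used for the object view
    id: int
    name: str
    age: int

    def save(self) -> None:
        """Object view: write the object's state into the shared store."""
        _store[self.id] = {"name": self.name, "age": self.age}

    @classmethod
    def load(cls, pid: int) -> "Patient":
        """Object view: rebuild an object from the shared store."""
        rec = _store[pid]
        return cls(pid, rec["name"], rec["age"])

def select_rows(min_age: int) -> list[tuple]:
    """Relational view: roughly SELECT id, name, age WHERE age >= min_age ORDER BY id."""
    return [(pid, r["name"], r["age"]) for pid, r in sorted(_store.items()) if r["age"] >= min_age]

def as_document(pid: int) -> str:
    """Document view: the same record rendered as JSON."""
    return json.dumps({"id": pid, **_store[pid]})

Patient(1, "Ada", 36).save()
Patient(2, "Lin", 52).save()
p = Patient.load(2)
p.age = 53               # update through the object view...
p.save()
print(select_rows(50))   # [(2, 'Lin', 53)]  ...is visible in the row view
print(as_document(2))    # {"id": 2, "name": "Lin", "age": 53}
```

Because there is no second copy, there is no mapping layer to keep in sync.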
Embedded and Open Analytics

InterSystems IRIS supports a wide range of analytics to meet the varied requirements of today’s data-intensive, real-time applications. InterSystems IRIS provides embedded state-of-the-art analytics capabilities for distributed SQL, BI, and NLP and can incorporate a wide range of third-party and open-source analytics packages as needed.

Figure 7: InterSystems IRIS Embedded and Open Analytics Capabilities (panels: “Fast Data” Analytics; “Big Data” Analytics; Business Intelligence; Advanced Analytics; Natural Language Processing)

Advanced analytics technologies are rapidly gaining adoption. These approaches and technologies include machine learning, predictive analytics, artificial intelligence, and real-time big-data processing frameworks like Apache Spark.

According to a 2017 survey of large businesses by research firm Gartner, 45% of the 1,931 respondents said they planned to use data mining and predictive analytics, 39% planned to use Apache Hadoop or Spark, and 25% planned to use the advanced analytics capabilities provided by Apache Hadoop or Spark.⁵

In addition to its real-time (HTAP) and big (distributed) data processing capabilities, InterSystems IRIS provides the following analytic capabilities and integrations.

Apache Spark Integration

Apache Spark is a high-performance, open-source cluster-computing framework and is often used when performance on large distributed data sets is critical. For some workloads, Apache Spark can be up to 100 times faster than Apache Hadoop (MapReduce), and many common machine learning and statistical algorithms are available in its MLlib library.

InterSystems IRIS integrates directly with Apache Spark via a shard-aware native Spark connector, so that InterSystems IRIS applications can incorporate Spark processing, and Spark applications can incorporate distributed data from InterSystems IRIS. The Apache Spark connector presents each data shard of an InterSystems IRIS sharded cluster as a native Spark partition for the highest performance. The connector is aware of the partitioned nature of the InterSystems IRIS database, allowing the Apache Spark worker nodes to automatically connect directly to the shards and work in parallel on disjoint pieces of data. These parallel, direct connections also allow much higher throughput (since less data needs to be passed through each connection) and support high-speed data ingestion to the sharded cluster.
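The benefit of direct, per-shard connections can be simulated without Spark. In this Python sketch, which assumes three hypothetical shards held in memory, each worker thread reads only its own disjoint partition and computes a partial result, and only the small partials are combined. It mirrors the partition-per-shard idea described above but does not use the connector or the PySpark API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sharded table: each shard holds a disjoint slice of the rows.
SHARDS: dict[str, list[int]] = {
    "shard-0": [3, 5, 7],
    "shard-1": [11, 13],
    "shard-2": [17, 19, 23],
}

def read_partition(shard_name: str) -> list[int]:
    """Stand-in for a worker opening its own direct connection to one shard."""
    return list(SHARDS[shard_name])

def partial_sum(shard_name: str) -> int:
    """Work happens next to the data; only one number leaves each partition."""
    return sum(read_partition(shard_name))

with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
    # pool.map yields results in input order, so partials line up with sorted shard names.
    partials = list(pool.map(partial_sum, sorted(SHARDS)))

total = sum(partials)
print(partials, total)  # [15, 24, 59] 98
```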
⁵ Rita L. Sallam, et al., “Survey Analysis: BI and Analytics Spending Intentions, 2017” (Gartner, 2017)
Business Intelligence

InterSystems IRIS provides fully integrated capabilities for BI modeling, analysis, and end-user dashboards. A BI model defines the dimensions that are meaningful to the business, including aggregate concepts (such as product line, sales area, market segment, and so on), and the numeric measures (such as revenue, expenses, year-to-year growth, defect rate, and so on). An InterSystems IRIS BI model can be based directly on transactional data and any other data that might be needed. A fully automated synchronization option avoids the need for ETL processing. Drag-and-drop analysis capabilities enable nontechnical users to examine the data at any level and perform complex queries with ease. InterSystems IRIS dashboards can display live business metrics and give restricted analysis options to users.

Ability to Incorporate Advanced Analytics Into Real-Time Processes

Organizations can incorporate predictive models created by data mining and machine learning algorithms using external tools and applications through InterSystems IRIS embedded support for the Predictive Model Markup Language (PMML). PMML is an XML standard that fully defines all the parameters of a predictive model developed using an external analytics application or framework. When a PMML model is loaded into InterSystems IRIS, native code is generated to allow execution of the model in real time, without requiring any external tool or the performance-inhibiting passing of data across systems. This integration enables predictive models created by data scientists and other specialists to be seamlessly incorporated into data-processing pipelines and business processes within InterSystems IRIS.

Natural Language Processing

InterSystems IRIS provides NLP capabilities that infer meaning and sentiment from natural language text. InterSystems IRIS can automatically identify concepts and relationships in text without requiring upfront work or domain knowledge. These advanced NLP capabilities are embedded in InterSystems IRIS and can be included in business processes, enabling organizations to include information from notes fields, social media, and other sources in data-rich applications.

46% of large businesses planned to incorporate sentiment analysis of unstructured content into their applications in 2017.⁶

Figure 8: InterSystems IRIS Natural Language Processing Capabilities (diagram labels: GUI; App; Analytics; REST; SQL; NLP engine; NLP SQL; Domain Index; UIMA annotation store)

Since there are many different kinds of specialized NLP tools, each with a specific type of functional or domain applicability, some applications may require these tools to be used in sequence. InterSystems IRIS supports the Apache Unstructured Information Management Architecture (UIMA) standard, which enables a standards-based pluggable NLP pipeline to be defined and executed. Apache UIMA support brings open interoperability to the NLP capabilities in InterSystems IRIS.
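The in-sequence pipeline idea behind UIMA can be sketched in a few lines of Python. This is a UIMA-inspired toy, not the Apache UIMA API. A shared document object, loosely analogous to UIMA's common analysis structure, passes through annotators in order, and a later annotator (a hypothetical lexicon-based sentiment tagger) consumes the tokens produced by an earlier one.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Annotation:
    kind: str
    begin: int
    end: int
    value: str = ""

@dataclass
class Document:
    """Shared analysis structure passed through the pipeline (loosely like a UIMA CAS)."""
    text: str
    annotations: list[Annotation] = field(default_factory=list)

    def select(self, kind: str) -> list[Annotation]:
        return [a for a in self.annotations if a.kind == kind]

Annotator = Callable[[Document], None]

def tokenizer(doc: Document) -> None:
    """First stage: add a lowercase token annotation for each word."""
    for m in re.finditer(r"\w+", doc.text):
        doc.annotations.append(Annotation("token", m.start(), m.end(), m.group().lower()))

POSITIVE = {"helpful", "fast", "great"}  # hypothetical toy lexicon
NEGATIVE = {"slow", "rude", "broken"}

def sentiment(doc: Document) -> None:
    """Later stage: depends on the tokens produced by the tokenizer."""
    for tok in doc.select("token"):  # select() returns a copy, so appending is safe
        if tok.value in POSITIVE:
            doc.annotations.append(Annotation("sentiment", tok.begin, tok.end, "positive"))
        elif tok.value in NEGATIVE:
            doc.annotations.append(Annotation("sentiment", tok.begin, tok.end, "negative"))

def run_pipeline(text: str, annotators: list[Annotator]) -> Document:
    """Run pluggable annotators in sequence over one shared document."""
    doc = Document(text)
    for annotate in annotators:
        annotate(doc)
    return doc

doc = run_pipeline("Staff were helpful but checkout was slow.", [tokenizer, sentiment])
print([(a.value, doc.text[a.begin:a.end]) for a in doc.select("sentiment")])
# [('positive', 'helpful'), ('negative', 'slow')]
```

Swapping in a different tokenizer or adding a domain-specific annotator only changes the list passed to run_pipeline. That pluggability is the property the UIMA standard formalizes.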
⁶ Rita L. Sallam, et al., “Survey Analysis: BI and Analytics Spending Intentions, 2017” (Gartner, 2017)
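To show what a PMML file carries, the sketch below parses a small, hand-written PMML 4.4 RegressionModel with Python's standard xml.etree.ElementTree. It then scores a record as the intercept plus the sum of each coefficient times its input raised to the predictor's exponent. The model, field names, and coefficients are hypothetical. Interpreting the XML in Python is a stand-in for, not a description of, the native code that InterSystems IRIS generates from PMML.

```python
import xml.etree.ElementTree as ET

# A hand-written toy model; field names and coefficients are hypothetical.
PMML_XML = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header description="toy linear model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="temperature" optype="continuous" dataType="double"/>
    <DataField name="vibration" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="temperature"/>
      <MiningField name="vibration"/>
      <MiningField name="risk" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="-2.0">
      <NumericPredictor name="temperature" coefficient="0.05"/>
      <NumericPredictor name="vibration" coefficient="1.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_4"}

def load_regression(xml_text: str):
    """Return a scoring function for the RegressionTable in a PMML document."""
    root = ET.fromstring(xml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    terms = [
        (pred.get("name"), float(pred.get("coefficient")), int(pred.get("exponent", "1")))
        for pred in table.findall("p:NumericPredictor", NS)
    ]

    def score(record: dict) -> float:
        # y = intercept + sum(coefficient * value ** exponent)
        return intercept + sum(coef * record[name] ** exp for name, coef, exp in terms)

    return score

score = load_regression(PMML_XML)
print(score({"temperature": 80.0, "vibration": 2.0}))  # 5.0
```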
Interoperability

InterSystems IRIS provides a complete set of native integration and interoperability features. It provides out-of-the-box connectivity and data transformations for a wide range of packaged applications, databases, industry standards, protocols, and technologies. Flexible data-transformation capabilities enable InterSystems IRIS to resolve differences in semantics and data schemas that exist between applications or services.

Application developers can create seamless business processes that connect with internal and external data sources, applications, and services. InterSystems IRIS provides graphical tooling to visually diagram processes, rules, and workflows, allowing developers to focus on the logical interactions between systems, minimizing concerns about application interfaces, adapters, or middleware mechanisms. The graphical models enable collaboration between the lines of business and IT, resulting in faster development of solutions that meet business requirements, and easier modification and extension of existing processes. The embedded role-based workflow engine supports manual interactions in business processes, automating the distribution of tasks among users and incorporating their decisions and actions.
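Resolving schema and semantic differences between systems usually comes down to field mapping plus value conversion. The Python sketch below assumes two hypothetical systems. The source sends a combined "Last, First" name, weight in pounds, US-style dates, and one-letter status codes, while the target expects split names, kilograms, ISO 8601 dates, and a boolean. It illustrates the kind of transformation described above, not the InterSystems IRIS transformation tooling.

```python
from datetime import datetime

STATUS_MAP = {"A": True, "I": False}  # hypothetical source status codes
LB_TO_KG = 0.45359237                 # exact pounds-to-kilograms factor

def transform(src: dict) -> dict:
    """Map a hypothetical source record onto a hypothetical target schema."""
    last, first = [part.strip() for part in src["cust_name"].split(",", 1)]
    return {
        "firstName": first,
        "lastName": last,
        "weightKg": round(src["weight_lb"] * LB_TO_KG, 1),                          # unit conversion
        "birthDate": datetime.strptime(src["dob"], "%m/%d/%Y").date().isoformat(),  # date format
        "active": STATUS_MAP[src["status"]],                                        # code to boolean
    }

out = transform({"cust_name": "Doe, Jane", "weight_lb": 154.0, "dob": "03/14/1985", "status": "A"})
print(out)
# {'firstName': 'Jane', 'lastName': 'Doe', 'weightKg': 69.9, 'birthDate': '1985-03-14', 'active': True}
```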
Since InterSystems IRIS includes embedded database and analytics capabilities, sophisticated analytics can be seamlessly incorporated into business processes, leveraging data stored in the database as well as real-time data. All data, including in-flight data or data associated with long-running asynchronous transactions, can be automatically persisted in the database and made available for reporting and analysis.

The platform supports a wide range of standards used in various industries, such as healthcare, financial services, retail, and telecommunications, including REST architectures and web services (e.g., JSON, XML, XPath, XSLT, SOAP, and DTDs).

Unified Development Environment

The unified graphical and code-based environment of InterSystems IRIS delivers a consistent representation of diverse programming models, programming interfaces, and data formats, providing a single development environment across all functionality.

Flexible Deployment Options

InterSystems IRIS provides a simple, intuitive way to provision and deploy services on cloud-based and on-premises infrastructures. InterSystems IRIS delivers the benefits of infrastructure as code, immutable infrastructure, and containerized deployment of InterSystems IRIS-based applications. It eliminates the need for major investments in new technology and associated training, as well as trial-and-error system configuration and management efforts.

InterSystems IRIS allows organizations to take advantage of the efficiency, agility, and repeatability that cloud computing and containerized software offer, without requiring major development or retooling. It can also provision and deploy InterSystems IRIS configurations on existing virtual and physical clusters, and it supports deployment of containers on enterprise-level operating system platforms, including preexisting infrastructure and commercial cloud platforms.

Conclusion

InterSystems IRIS is a complete, unified data platform that simplifies the development, deployment, and maintenance of real-time, data-rich solutions. InterSystems IRIS provides concurrent transactional and analytic processing capabilities; support for multiple, fully synchronized data models (including relational, hierarchical, object, and document); a complete interoperability platform for integrating disparate data silos and applications; and sophisticated structured and unstructured analytics capabilities supporting both batch and real-time use cases. The platform also provides an open analytics environment for incorporating best-of-breed analytics into InterSystems IRIS solutions and offers flexible deployment capabilities to support any combination of cloud and on-premises deployments.

InterSystems IRIS is being used in multiple industries to help deliver a range of important strategic and operational benefits by leveraging more data while eliminating delays between event, insight, and action.
InterSystems.com