2. Data Acquisition & Data Integration, 3. Unstructured Data


1.1 Data Acquisition System


What is Data Acquisition?
A Data Acquisition System, often abbreviated as DAQ, consists of sensors, measuring
instruments, and a computer. Its purpose is to gather and process the data needed to
understand electrical or physical phenomena. This system plays a crucial role in tasks like
monitoring heating coil temperature to evaluate efficiency in achieving desired levels.

Data acquisition, also known as the process of collecting data, relies on specialized software
that quickly captures, processes, and stores information. It enables scientists and engineers to
perform in-depth analysis for scientific or engineering purposes. Data acquisition systems are
available in handheld and remote versions to cater to different measurement requirements.
Handheld systems are suitable for direct interaction with subjects while remote systems excel
at distant measurements, providing versatility in data collection.

What Does a Data Acquisition System Measure?


Data acquisition systems are versatile tools used for measuring a wide range of parameters,
primarily derived from analog signals. These measurements play a crucial role in various
applications and are converted into digital format to enable computer processing.

Some common parameters that data acquisition systems measure include current, voltage,
strain, frequency, pressure, temperature, distance, vibration, angles, digital signals, weight,
and more. To measure specific parameters accurately and effectively, specialized sensors or
modules may be used.

With the combination of data acquisition modules and appropriate sensors or transducers,
nearly any required parameter can be measured efficiently. This adaptability makes data
acquisition systems highly customizable for diverse measurement needs and allows for
specialization when necessary.

Importance of Data Acquisition Systems


Data acquisition systems hold significant importance across various fields and industries for
several reasons:

 Accurate Data Collection: These systems facilitate the precise and consistent
gathering of data from various sensors and sources, reducing the potential for human
error and ensuring the integrity of the collected information.

 Real-Time Monitoring: Systems that acquire data provide real-time insights into
processes. This enables prompt responses to changing conditions, leading to improved
safety and enhanced operational efficiency.

 Research and Development: They provide crucial data for experiments, simulations,
and the creation of new technologies and products, supporting research endeavors
effectively.
 Environmental Monitoring: The acquisition of data plays a crucial role in
environmental studies. It aids in evaluating pollution levels, climate conditions, and
the impact of human activities on ecosystems.

 Medical Applications: In the realm of medical applications, these systems play a vital
role. They diligently monitor a patient's vital signs, aid in accurate diagnosis, and
contribute to the advancement of medical devices and treatments.

 Automation: In automated systems, data collection plays a pivotal role as it enables
machines and processes to operate efficiently without human intervention. This
foundational aspect of automation ensures seamless functioning and optimal performance.

 Data Storage and Retrieval: Data storage and retrieval play a crucial role in ensuring
the availability of historical data for analysis, compliance, and auditing purposes. By
facilitating seamless storage and easy access to information, this process enables
organizations to effectively analyze past data.

 Energy Management: In energy-related fields, systems for collecting data play a
crucial role in monitoring the production, distribution, and consumption of electricity.
This facilitates efficient energy management and promotes conservation practices.

Components of Data Acquisition System

 Sensors: Devices that gather information about physical or environmental conditions,
such as temperature, pressure, or light intensity.

 Signal Conditioning: To ensure accurate measurement, the raw sensor data
undergoes preprocessing to filter out any noise and scale it appropriately.

 Data Logger: Hardware or software that records and stores the conditioned data over
time.
 Analog-to-Digital Converter (ADC): Converts analog sensor signals into digital
data that computers can process (see the conversion sketch after this list).

 Interface: Connects the data acquisition system to a computer or controller for data
transfer and control.

 Power Supply: Provides the necessary electrical power to operate the system and
sensors.

 Control Unit: Oversees the overall operation of the data acquisition system,
including tasks such as triggering, timing, and synchronization.

 Software: Allows users to configure, monitor, and analyze the data collected by the
system.

 Communication Protocols: Rules that govern the transmission and reception of data
between the system and external devices or networks.

 Storage: For storing recorded data, there are a range of options available, including
memory cards, hard drives, or cloud storage. These provide both temporary and
permanent storage solutions.

 User Interface: This system allows users to interact with and control the data
acquisition system effectively.

 Calibration and Calibration Standards: To ensure accuracy, the sensors and system
are periodically calibrated against known standards.

 Real-time Clock (RTC): Maintains accurate timing to ensure synchronized data
acquisition and timestamping.

 Triggering Mechanism: Initiates data capture based on predefined events or
specific conditions.

 Data Compression: Reduces the size of collected data for storage and transmission
in remote or resource-limited applications.
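
As a minimal illustration of the ADC's role, the Python sketch below converts a raw count from a hypothetical 12-bit converter into a voltage and then into a temperature. The 3.3 V reference and the 10 mV/°C sensor scale factor are illustrative assumptions, not values from any particular device.

# Minimal sketch: raw ADC count -> physical value.
# Assumes a hypothetical 12-bit ADC with a 3.3 V reference and a
# temperature sensor that outputs 10 mV per degree Celsius.
ADC_BITS = 12
V_REF = 3.3                 # ADC reference voltage (volts)
MV_PER_DEG_C = 10.0         # assumed sensor scale factor

def counts_to_voltage(raw_count: int) -> float:
    """Map a raw count (0..4095) onto the 0..V_REF range."""
    return raw_count / (2 ** ADC_BITS - 1) * V_REF

def voltage_to_celsius(volts: float) -> float:
    """Apply the assumed sensor scaling to get a temperature."""
    return volts * 1000.0 / MV_PER_DEG_C

raw = 1861                  # example reading from the converter
print(f"{voltage_to_celsius(counts_to_voltage(raw)):.1f} °C")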

Basic Types of Data Acquisition Systems


There are some basic types of data acquisition systems, given below:

1. Digital Data Acquisition Systems

Digital Data Acquisition Systems (DAS) are crucial for gathering and processing data from
sensors, instruments, and other sources in a digital format, and they offer benefits across
many industries. By digitizing analog signals, these systems ensure accuracy and minimize
data loss during transmission and storage. Typically comprising components such as ADCs,
microcontrollers, and data storage units, digital DAS provide real-time data for analysis
and control purposes, significantly enhancing the efficiency and reliability of processes.

Moreover, digital DAS offer versatility in handling different sensor types while integrating
seamlessly into computer-based control and monitoring systems. Consequently, they have
become essential tools for research, industrial automation, medical monitoring, and
environmental studies, among other fields. Their capacity to efficiently gather, analyze,
and share information plays a key role in making informed decisions and enhancing
processes across different fields.

2. Analog Data Acquisition Systems

Analog Data Acquisition Systems (DAS) play a key role in many fields, as they enable the
conversion of real-world analog signals into digital data for analysis and processing. These
systems consist of sensors that capture analog quantities like voltage or current, along with
signal conditioning circuitry that filters, amplifies, and preprocesses the signals. To
facilitate storage and analysis by computers or microcontrollers, analog-to-digital
converters (ADCs) convert these analog signals into a digital format.


Analog DAS finds applications in fields like industrial automation, environmental
monitoring, scientific research, and medical diagnostics. It ensures data accuracy, precision,
and reliability, allowing organizations to make informed decisions, control processes, and
monitor critical parameters. These systems are versatile tools that bridge the gap between the
physical world and digital data analysis, facilitating advancements in various domains.

Considerations When Setting up a Data Acquisition System
When establishing a data acquisition system, there are several important factors to consider:

 Sensor Selection: The appropriate sensors or transducers that accurately capture the
data needed should be carefully chosen. Factors such as measurement range,
resolution, and sensitivity need to be considered in order to make an informed decision.

 Signal Conditioning: The quality of acquired data can be enhanced by employing
signal conditioning techniques, such as amplification, filtering, or other necessary
adjustments (a minimal software-side filtering sketch follows this list).

 Noise Reduction: Measures can be implemented to minimize noise interference,
which can distort data. Effective approaches include shielding, grounding, and
filtering techniques.

 Calibration: Sensors and measurement equipment should be calibrated regularly in
order to maintain their accuracy and reliability.

 Data Storage: Decide on an appropriate method for data storage, considering options
such as on-site storage, cloud-based solutions, or a combination of both.

 Data Transmission: If applicable, it is important to create a plan for transmitting data
from remote sensors to a central system. This plan should prioritize reliable and
secure communication methods.

 Power Supply: To prevent any loss of data or system failures, it is essential to ensure
a stable and reliable power supply for both the sensors and data acquisition
equipment. This will guarantee uninterrupted functionality.

 Environmental Conditions: Considering the environment in which the data
acquisition system will operate is crucial. It is important to safeguard the equipment
from extreme temperatures, humidity, and other environmental factors.

 Data Processing: Define how data will be processed, analyzed, and visualized. Select
appropriate software tools and algorithms for data analysis.

 Data Security: Implement security measures to protect sensitive data from
unauthorized access or breaches. Encryption and access controls may be necessary.

 Scalability: The system should be designed with scalability in mind, considering the
future expansion of data or addition of sensors.

 Regulatory Compliance: Ensure that the data acquisition system complies with
relevant industry standards and regulations, especially if it involves sensitive or
regulated data.
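
To make the signal-conditioning and noise-reduction considerations above concrete, here is a minimal software-side sketch in Python: a moving-average filter that smooths a noisy stream of samples. It illustrates one common digital technique and is not a substitute for proper analog conditioning.

from collections import deque

def moving_average(samples, window=3):
    """Yield the running mean of the most recent `window` samples."""
    buf = deque(maxlen=window)
    for s in samples:
        buf.append(s)
        yield sum(buf) / len(buf)

noisy = [10.2, 9.8, 10.5, 10.1, 14.9, 10.0, 9.9]   # one spike at 14.9
print([round(v, 2) for v in moving_average(noisy)])

A wider window smooths more aggressively but responds more slowly to genuine changes, which is the usual trade-off when filtering acquired data.
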
1.2 Data Integration
Data integration, on the other hand, is a broader concept that encompasses the process of
combining data from different sources, often with different structures, formats, or semantics,
into a cohesive and unified data view. It pulls data out of source systems and loads it into
another system, such as a data warehouse or a data lake.

In practice, the data integration process may involve real-time data ingestion, data
transformation, data enrichment, data replication, and data consolidation. It aims to create a
single, consolidated, easier-to-use view of data that can be used for analysis, reporting, or
storage in a data warehouse or a data lake.

Key Differences:

1. Scope: Data acquisition primarily focuses on obtaining and ingesting raw data, while data
integration encompasses the consolidation and transformation of data to create a coherent
dataset.

2. Timing: Data acquisition is often an ongoing process, continuously collecting new data as
it becomes available. Data integration occurs after data acquisition and involves combining
and transforming the acquired data.

3. Transformation: Data acquisition involves minimal data manipulation, mainly focused on
standardizing formats and structures. Data integration, however, involves complex
transformations, such as cleaning, deduplication, normalization, and joining disparate
datasets.

4. Purpose: Data acquisition establishes the foundation for data-driven insights and
analytics. Data integration ensures data consistency, quality, and accessibility, enabling
effective decision-making and business intelligence.

Implementation of Data Integration

Understanding data integration in IoT


Data integration in the context of IoT refers to the process of gathering, consolidating, and
transforming data from various IoT devices, sensors, and systems into a unified format for
meaningful analysis. Data integration presents a holistic view of scattered data in a singular
space, improving accessibility and decision-making speed.

Techniques and Technologies for Data Integration:

Let’s look at some of the fundamental techniques and technologies used in data integration
across IoT systems:

Message queuing and publish/subscribe systems


Message queuing systems, such as Apache Kafka and RabbitMQ, and publish/subscribe
systems, like MQTT, are widely used in IoT data integration. These systems facilitate
efficient and reliable data transmission between IoT devices and systems. Messages are
published by senders and received by subscribers, enabling real-time communication and data
flow between different components of the IoT ecosystem.

For example, in a smart home scenario, a temperature sensor publishes data on room
temperature changes, and an HVAC system subscribes to this data to adjust the heating or
cooling accordingly.
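
A minimal sketch of that scenario using the paho-mqtt Python client (the choice of library, broker address, and topic name are assumptions for illustration):

import json
import time
import paho.mqtt.client as mqtt

BROKER = "localhost"                       # assumed local broker
TOPIC = "home/livingroom/temperature"      # illustrative topic name

# Subscriber side: the HVAC controller reacts to published readings.
def on_message(client, userdata, msg):
    reading = json.loads(msg.payload)
    if reading["celsius"] > 24.0:
        print("HVAC: switching to cooling")

hvac = mqtt.Client()        # paho-mqtt 1.x-style constructor
hvac.on_message = on_message
hvac.connect(BROKER)
hvac.subscribe(TOPIC)
hvac.loop_start()

# Publisher side: the temperature sensor publishes a reading.
sensor = mqtt.Client()
sensor.connect(BROKER)
sensor.publish(TOPIC, json.dumps({"celsius": 25.3}))

time.sleep(1)               # give the message time to arrive
hvac.loop_stop()

Note how the sensor and the HVAC controller never reference each other directly; the topic decouples them, which is exactly why publish/subscribe suits IoT integration.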

Application programming interfaces (APIs)

APIs provide standardized interfaces and protocols for integrating data from various sources
in IoT systems. They enable data exchange and seamless communication between devices,
platforms, and systems. APIs define the rules and formats for requesting and exchanging
data, making it easier to integrate diverse data sources.

For instance, a weather API may allow an IoT weather station to retrieve real-time weather
data and integrate it into a smart irrigation system. This integration enables the irrigation
system to adjust watering schedules based on weather conditions.
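
A minimal sketch of that integration (the endpoint URL, query parameters, and response fields below are hypothetical, since every weather API defines its own):

import requests

WEATHER_URL = "https://api.example.com/v1/current"   # hypothetical endpoint

def should_skip_watering(lat: float, lon: float) -> bool:
    """Skip irrigation when rain is likely (illustrative rule)."""
    resp = requests.get(WEATHER_URL,
                        params={"lat": lat, "lon": lon},
                        timeout=10)
    resp.raise_for_status()
    return resp.json().get("rain_probability", 0.0) > 0.6

if should_skip_watering(51.5, -0.12):
    print("Rain likely: skipping today's watering cycle")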

Data integration platforms

Data integration platforms offer comprehensive solutions for managing and orchestrating data
integration workflows in IoT environments. These platforms provide ETL functionality to
extract, transform, and load data from multiple sources.

They often include visual interfaces and zero-code, drag-and-drop capabilities for designing
integration workflows, allowing users to define data mappings, transformation rules, and data
quality controls. These platforms help organizations simplify the complexities of data
integration in IoT and ensure consistency and reliability in the integrated data.
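
Under the hood, such platforms generate pipelines that behave like the hand-written ETL sketch below: extract readings from two sources, transform them into a common shape, and load them into a target store. The field names, units, and in-memory "warehouse" are invented for illustration.

# Extract: raw records from two hypothetical devices with different shapes.
raw_device_a = [{"temp_f": 72.5, "ts": "2024-01-01T10:00:00Z"}]
raw_device_b = [{"celsius": 21.9, "time": "2024-01-01T10:00:05Z"}]

def transform(records):
    """Normalize every record to {"celsius": ..., "ts": ...}."""
    for r in records:
        if "temp_f" in r:
            yield {"celsius": round((r["temp_f"] - 32) * 5 / 9, 2), "ts": r["ts"]}
        else:
            yield {"celsius": r["celsius"], "ts": r["time"]}

warehouse = []                  # stand-in for the real target store
warehouse.extend(transform(raw_device_a + raw_device_b))   # Load
print(warehouse)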

These techniques and technologies for data integration in IoT provide the necessary
infrastructure and tools to handle the complexities of integrating diverse data sources,
ensuring reliable data transmission, standardized data formats, and efficient data
management. By leveraging these techniques, organizations can harness the full potential of
IoT data and derive valuable insights for enhanced decision-making and improved
operational efficiency.

Advantages of data integration in IoT environments


Data integration plays a vital role in maximizing the benefits obtained from IoT
environments.

Enhanced decision-making: By integrating data from diverse IoT sources, organizations gain
a comprehensive understanding of their operations. This integrated data provides valuable
insights for making well-informed decisions. Analyzing patterns and trends across multiple
data streams allows organizations to make accurate predictions, identify optimization
opportunities, and manage risks effectively.

Improved operational efficiency: Data integration optimizes operations and enhances
efficiency. Organizations can identify bottlenecks, inefficiencies, and redundancies by
consolidating data from various devices, sensors, and systems. For example, in a factory,
integrating data from sensors embedded in production lines, supply chain systems, and
quality control processes helps identify areas of inefficiency and streamline workflows.

Seamless connectivity and interoperability: Data integration enables smooth connectivity
and interoperability in IoT environments, allowing effective communication and
collaboration among diverse devices, protocols, and platforms. For instance, in the healthcare
sector, data integration facilitates seamless sharing of patient data between medical devices,
electronic health records (EHR) systems, and healthcare providers.

Real-time monitoring: Data integration enables real-time monitoring of IoT systems,
providing immediate visibility into the status, performance, and health of assets. It allows
data teams to detect anomalies, deviations, or potential failures in real time, enabling
proactive maintenance and predictive analytics.
2. Unstructured Data Storage
Storing unstructured data is the first step of data analysis.

But what constitutes unstructured data? Is it really unstructured? Let’s take a closer look.

Although unstructured data may have an internal structure of its own, it does not follow a
fixed data model and may not fit into the schema predefined by a structured database or data
table. Here are some examples:

 Social media data—Social media text such as comments and feedback are unstructured, but
social media data like friends, followers, and likes are structured.
 Email—The body copy is unstructured, whereas the “to,” “cc,” and “subject” fields are
structured.
 Multimedia—This can be represented in multiple ways, including vectors, bitmaps, GIFs,
frames, and so on, making it unstructured.

Unstructured data forms about 80% of big data. Businesses use various unstructured data
analysis techniques and tools to get insights from unstructured data. However, storing
unstructured big data is complex because of its usually high volume, variety, and velocity.

Unstructured data storage challenges


No fixed format

Suppose you have to store details about all the employees of an organization. One employee
may own many cars or have more than one child. Another may not have either of these.
Because of this, each employee has characteristics that others might not have, and we don't
necessarily require all of the fields for all of the employees.

In a relational database, we would create fields for each of these, many of which might go
unused. In addition, if we later wanted to add new fields, like car insurance details, we'd
need a schema change and downtime. With no predefined format for unstructured data, this
could soon become a nightmare.
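
A minimal sketch of how a document-style model sidesteps this problem: each record carries only the fields that apply to it, and a new field such as car insurance simply appears on newer documents with no schema migration. The records are invented for illustration.

# Each document carries only the fields relevant to that employee;
# "car_insurance" was added later without any schema change.
employees = [
    {"name": "Asha", "cars": ["sedan", "hatchback"], "children": 1},
    {"name": "Ben"},                                  # no cars, no children
    {"name": "Carla", "children": 2, "car_insurance": "policy-991"},
]

# Queries simply tolerate missing fields:
print([e["name"] for e in employees if "cars" in e])  # ['Asha']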

Scaling issues

As the amount of unstructured data keeps increasing, traditional storage systems may not
scale out. Adding more resources (disks) to the system increases the cost, and you cannot do
so indefinitely, because the data will eventually outgrow the number of disks. Scaling out is
not easy with a relational database; system performance suffers because the table joins
across nodes become too complex.

Complex data retrieval and querying

If you just dump all the big data into a storage system without knowing what to do with it,
the data will lie there without adding any value. For example, once you store multimedia
data, you may not have an efficient way to find, update, or even delete it, even with indexing.
Therefore, to handle unstructured data, you need storage infrastructure that can scale out and
provide efficient data management. A good example of such storage is an object store, where
each piece of data is an object with metadata and a unique ID that makes it easy to identify.

Storage requirements for unstructured data


Companies should plan their strategy for storing unstructured data during the planning phase
of a big data project. The storage infrastructure should be agile, cost-effective, scalable, and
able to cater to a wide range of use cases.

Consider the following requirements for unstructured data storage:

Flexibility

The data model should be flexible to accommodate new fields and data types with minimum
impact on existing schema or data, thus requiring no downtime.

The article NoSQL explained details how NoSQL databases, like MongoDB, are flexible
enough to store vast amounts of data in varied formats.

Purpose

If your workload is mainly analytics, you need a robust storage system that supports low
latency and fast data updates. Cloud storage would be a good option for this purpose, as
opposed to an on-premise system.

Easy access to archived data

Data archiving prevents data loss and reduces the cost of primary storage. Data that is old but
still required should be stored in such a way that it's easy to retrieve and doesn't increase
overall storage cost.

Scalability

The storage system should be horizontally and vertically scalable at all times without any
data loss. Modern cloud platforms like AWS and Azure provide automatic scaling depending
on the application requirements.

A NoSQL database is a good approach that satisfies all the above unstructured data storage
requirements. To handle scalability and online archiving as the data continues to grow, a
database-as-a-service like MongoDB Atlas, which hosts MongoDB clusters in the cloud, and
a data lake like MongoDB Atlas Data Lake are excellent options.

Best options for storing unstructured data


Now that you understand the requirements of unstructured data storage and the challenges
that relational databases pose for storing unstructured data, we’ll discuss some robust ways to
store unstructured big data.

You can store unstructured data on-premise or in the cloud using a database, data warehouse,
or data lake.
While cloud storage does offer security, companies might prefer on-premise storage for
highly sensitive data.

Non-relational database for storing unstructured data

Non-relational (NoSQL) databases have emerged as a convenient way of storing unstructured
big data. They are flexible, scalable, highly available, and secure, and they help minimize the
unstructured data storage challenges. NoSQL databases make data management more
efficient and cost-effective.

There are various types of NoSQL database systems. One type is the document (object) store,
which provides a simple query mechanism to quickly retrieve data as the system recognizes
the data structure. Documents consist of various attributes with different data types.
Document stores are highly scalable and available by design, and can partition, replicate, and
persist the data. MongoDB is a document-based NoSQL database that stores data in BSON, a
JSON-like binary format that is easy to read and traverse. MongoDB is also suitable for
handling transactional data. For example:

{
  "studentID": "stud20210903",
  "name": "Ben Park",
  "address": {
    "zip": "W1J9LL",
    "city": "London"
  },
  "hobbies": ["gardening", "travelling", "reading"],
  "familydetails": {
    "motherName": "Alicia",
    "fatherName": "Ricky",
    "sibling": ["Carol"]
  }
}

If you were to store the above information in a relational database, you’d probably need three
or more tables and would need to join the tables to see all this information in one view.
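
As a minimal sketch (assuming a MongoDB server on localhost and illustrative database and collection names), the pymongo snippet below stores and retrieves the whole nested document in one collection, with no joins:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # assumed local server
students = client["school"]["students"]             # illustrative names

students.insert_one({
    "studentID": "stud20210903",
    "name": "Ben Park",
    "address": {"zip": "W1J9LL", "city": "London"},
    "hobbies": ["gardening", "travelling", "reading"],
})

# One query on a nested field returns the whole document; no joins.
print(students.find_one({"address.city": "London"}, {"_id": 0, "name": 1}))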

MongoDB Atlas, MongoDB's database-as-a-service, utilizes major cloud platforms like
AWS, Azure, and Google Cloud for its database servers. This means you don't need to install
MongoDB and still get all the benefits of a NoSQL document database in a cloud
environment.

Unstructured data storage with a data lake


A data lake is a central storage repository that stores data in its native format. It uses a flat
architecture to store data, usually as object or file storage. Data lakes are vast and can store
any amount of unstructured, structured, or semi-structured big data. They work on the
schema-on-read principle (i.e., they do not impose a predefined schema when data is written).
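
A minimal sketch of schema-on-read: heterogeneous raw records land in the lake as-is, and a schema is imposed only when the data is read for analysis. The records and field names are invented for illustration.

import json

# Raw records stored in their native format, with no common schema.
raw_lake = [
    '{"device": "cam-1", "frame": 42}',
    '{"sensor": "t-7", "celsius": 19.5, "ts": "2024-01-01T10:00:00Z"}',
]

def read_temperatures(lake):
    """Impose a schema at read time: keep only temperature records."""
    for line in lake:
        rec = json.loads(line)
        if "celsius" in rec:
            yield {"sensor": rec["sensor"], "celsius": rec["celsius"]}

print(list(read_temperatures(raw_lake)))

Contrast this with the schema-on-write approach of a data warehouse, described later in this section.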

The data sources can be IoT devices, streaming data, web applications, and many others.
Some of the ingested data might already be filtered and ready to use, a kind of flexibility
that relational databases cannot offer. Since data lakes are configured on commodity
hardware and clusters, they are highly scalable and inexpensive.

Data lakes can be configured on-premise or in the cloud. Again, on-premise data lakes are
suitable for highly sensitive and secure data. However, having a cloud data lake reduces the
cost of infrastructure and is easier to scale out.

MongoDB Atlas Data Lake is a great solution that provides a single platform for your
MongoDB Atlas clusters and allows you to:

 Organize and query across multiple Atlas clusters and AWS S3 buckets.
 Structure the data stored in a data lake.
 Convert MongoDB data into Parquet, CSV, or other formats.
 Natively query, transform, and move data across AWS S3 and MongoDB Atlas clusters.

Data warehouse

A data warehouse is a repository created for analytics and reporting purposes. Unlike a data
lake, it usually works on structured storage (schema-on-write). Data warehouses primarily
store past and current structured or semi-structured data that is internal to the organization
and available in a standard format. Unstructured data (like that from the internet) should be
processed and formatted in an ETL step before being ingested into a data warehouse. This
makes the data consistent, of high quality, and therefore ready for analysis. You can say that
a data warehouse is an analytical database used for business intelligence. The schema-based
format makes data analysis easier.
Data warehouses can be on-premise or cloud-based. Cloud data warehouses reduce cost,
simplify deployment, and lower infrastructure needs, and they can automatically scale based
on application needs.

A data mart is a subset of a data warehouse that stores operational data of a particular niche
or line of business.

What is Unstructured Data?


Primarily, unstructured data is all the data that does not fit within the confines of relational
databases. The data is not structured using predefined data models, but it still has an internal
structure that may be textual or non-textual, and generated by humans or machines. It is
typically stored in non-relational databases such as NoSQL stores.

Unstructured data can be anything from social media posts, images, audio files, and sensor
data to text data and many other data types. The term "unstructured" highlights the fact that
such datasets do not follow a predefined structural layout.

How Does Unstructured Data Storage Work?


Unstructured data has seen tremendous growth because of the ongoing technological
advancements driving eCommerce, cloud migration of businesses, and social media activity.
This excessive growth means that data storage has had to be redefined.

Regarding data size and format, unstructured data comprises everything from IoT and remote
system monitoring data to video and images. File sizes can range from a few bytes to many
gigabytes or more.

Cloud Storage of Unstructured Data


The cloud provides high-performance, scalable storage infrastructure services to customers,
and demand for such flexible services is increasing. Hence, providers and pioneers of cloud
innovation offer their infrastructure on a subscription basis or as open-source software to
reduce the overall financial burden on business organizations.

Unstructured data includes nearly every kind of information, with file sizes ranging from a
few bytes to gigabytes or more, and there is no uniform approach to storing it. The type of
storage used depends on the required computing capacity and the preset input/output
thresholds, ranging from low-performance cloud instances to high-performing distributed
file systems.

Network Attached Storage

Previously, Network-Attached Storage (NAS) was associated only with single-file, siloed
data storage. Nowadays, scale-out NAS can handle big data and high-capacity data storage.
NAS scaling has elevated file storage access into realms of higher performance and capacity.

Scale-out NAS has a parallel file system that provides a single namespace across multiple
attached storage boxes, scaling to billions of files. In some cases, you can also add computing
capacity and processing power.

However, object storage has also grown over the years and now leads in unstructured data
storage. Object storage provides advantages like unique identification for stored data, high
performance, scalability, and easy API access. Hence, many cloud providers opt for object
storage.

Object Storage

Object storage is the more recent development in unstructured data storage, keeping data in
a flat address space. You can access the data using unique identifiers, with metadata headers
that enable search and analysis. The service grew in popularity after providing an effective
solution to the shortfalls of scale-out NAS.

Object storage is arguably the native format of the Cloud, too. It is hugely scalable and
accessible via application programming interfaces (APIs), which fits well with the DevOps
way of doing things.

Object storage lacks file locking, and its performance has only recently improved. The big
cloud service companies have built their primary storage offerings on object storage, and
they offer different service tiers to cater to many business cases. For instance, Amazon Web
Services provides various classes of S3 storage, with variations determined by accessibility,
speed, and the redundancy of the data.
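
For example, with the boto3 Python SDK an object can be uploaded directly into an infrequent-access class via the StorageClass parameter; the bucket name, key, and file below are illustrative assumptions:

import boto3

s3 = boto3.client("s3")     # credentials come from the environment

# Upload one object straight into the S3 Standard-IA storage class.
with open("vibration-readings.parquet", "rb") as body:
    s3.put_object(
        Bucket="example-sensor-archive",            # hypothetical bucket
        Key="2024/01/vibration-readings.parquet",
        Body=body,
        StorageClass="STANDARD_IA",                 # infrequent-access tier
    )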

Cloud Benefits of Unstructured Data Storage

1. Dependency of hybrid work culture on the cloud

While the restrictions imposed during the pandemic have now been eased, a significant
portion of employees globally continues to work remotely. Since a remote work environment
is mutually beneficial for both employers and employees, it has led to the adoption of a
hybrid work model. This requires the backing of cloud computing to support the flexibility
to work from home or the office.

In terms of statistics, global spending on public cloud services is expected to grow by 20.7%
to reach $591.8 billion in 2023 (Gartner).

The emergence of cloud-based tools for team communication, collaboration, file sharing, and
project management will remain a high priority for companies in the coming year. Also, the
demand for hybrid work models in the long term is creating the required push for the
availability of cloud-based solutions and tools.

2. Sustainability goals to be backed by cloud technology

In 2023, more and more companies are expected to leverage the efficiency of cloud
computing to fulfill their sustainability goals. A recent survey confirmed that more than 80%
of businesses consider sustainability a critical criterion to drive their IT buying decisions.
Also, it is expected that 85% of companies will see a significant increase in IT spending
backed by the cloud to support sustainable efficiencies.

This is largely because cloud solution providers can invest in IT infrastructure on behalf of
their clients and achieve economies of scale that individual companies simply cannot. As a
result, running a business application hosted in the cloud is more efficient than the
traditional on-premise setup and reduces the carbon footprint.

While it is up to you to transition from on-premise software to cloud-backed solutions, you
can expect this trend to become a buzzword in the following year.

3. Emergence of XaaS

Anything as a service, also known as XaaS, describes a category of cloud computing services
delivered to end users via the Internet. The service charges are paid under a flexible
consumption model rather than any upfront license cost or expenses.

The growing popularity of XaaS can be attributed to the fact that it combines software,
analytics, support, cloud hosting, and more in one place. As a result, providers can meet
clients' demands, and clients pay for outcomes instead of the time spent using the services.
Besides this, it also allows organizations to free up resources for improved innovation and
streamlined operations. The XaaS market is expected to reach $624.1 billion by 2027.
