Structured, Semi-Structured and Unstructured Data
Structured Data
Structured data is highly organized and formatted to fit into traditional databases or
spreadsheets. It follows a consistent schema and is typically stored in rows and columns.
Structured data typically originates from relational database management systems (RDBMS)
such as SQL Server, Oracle, MySQL, and PostgreSQL, and it can be ingested into OneLake. This
includes tables, indexes, and views.
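As a minimal sketch of what a consistent schema with tables, indexes, and views looks like in practice, the following uses Python's built-in sqlite3 module as a stand-in for a full RDBMS. The table, column, index, and view names are hypothetical and chosen only for illustration; nothing here is specific to OneLake.

```python
import sqlite3

# In-memory database used as a stand-in for an RDBMS; all names are illustrative.
conn = sqlite3.connect(":memory:")

# The table definition is the schema: every row has the same typed columns.
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer TEXT NOT NULL,"
    " amount REAL NOT NULL)"
)
# An index speeds up lookups on a column; a view is a stored query.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
conn.execute(
    "CREATE VIEW customer_totals AS "
    "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer"
)

conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 15.5), ("alice", 4.5)],
)

# Rows that violate the schema are rejected by the database.
try:
    conn.execute("INSERT INTO orders (customer, amount) VALUES (NULL, 1.0)")
    schema_enforced = False
except sqlite3.IntegrityError:
    schema_enforced = True

totals = conn.execute(
    "SELECT customer, total FROM customer_totals ORDER BY customer"
).fetchall()
print(totals, schema_enforced)
```

The rejected insert is the point of the example: with structured data, the schema is enforced when the data is written, not when it is read.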
In terms of content, structured data is often categorized as transaction data or operational
data. Transaction data is generated by business transactions such as sales records, financial
transactions, and order processing. Operational data is produced by day-to-day business
operations and includes inventory levels, human resources records, and data from customer
relationship management (CRM) systems.
Besides raw data, which originates from other systems and sources, another common form of
structured data is derived data. Derived data is processed and transformed from raw data to
provide insights and analytics. Examples include aggregated data (data summarized from
various sources, such as monthly sales totals, average customer ratings, and summary
statistics) and analytical data (processed data ready for analysis, including data cubes,
dashboards, and reports).
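To make the step from raw data to derived data concrete, here is a small sketch that aggregates a few raw sales rows into monthly sales totals and an average customer rating. The sample records and field names are invented for illustration only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw transaction rows; values are made up for the example.
sales = [
    {"date": "2024-01-05", "amount": 100.0, "rating": 4},
    {"date": "2024-01-20", "amount": 50.0, "rating": 5},
    {"date": "2024-02-03", "amount": 75.0, "rating": 3},
]

# Aggregated data: sum the amounts per calendar month (the "YYYY-MM" prefix).
monthly_totals = defaultdict(float)
for row in sales:
    month = row["date"][:7]
    monthly_totals[month] += row["amount"]

# Summary statistic: average customer rating across all rows.
average_rating = mean([row["rating"] for row in sales])

print(dict(monthly_totals), average_rating)
```

In a real pipeline this kind of aggregation would more likely run in SQL or a dataframe library; plain Python is used here only to keep the sketch self-contained.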
Semi-Structured Data
Semi-structured data does not follow a rigid schema but contains tags or markers to
separate data elements, making it more flexible than structured data.
JSON and XML
Data formatted in JavaScript Object Notation (JSON) or Extensible Markup Language (XML),
often used in web applications and APIs.
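The following sketch shows how keys (in JSON) and tags (in XML) mark individual data elements without a fixed schema. The product record is hypothetical, and the same record is written in both formats so the two can be compared.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record in both formats.
json_text = '{"id": 7, "name": "Widget", "tags": ["new", "sale"]}'
xml_text = (
    '<product id="7"><name>Widget</name>'
    "<tag>new</tag><tag>sale</tag></product>"
)

product = json.loads(json_text)
root = ET.fromstring(xml_text)

# Elements are located by key or tag name, not by column position.
json_view = (product["id"], product["name"], product.get("tags", []))
xml_view = (
    int(root.get("id")),
    root.findtext("name"),
    [tag.text for tag in root.findall("tag")],
)

# A field that is absent is simply missing; no schema forces it to exist.
price = product.get("price")

print(json_view, xml_view, price)
```

Because fields can be absent or repeated, code that reads semi-structured data usually needs explicit defaults, as with `product.get` above.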
Log Files
System and application logs generated by servers, applications, and network devices. These
files often contain valuable insights for monitoring and troubleshooting.
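As a rough illustration of how logs can be turned into something queryable, the sketch below parses lines with a regular expression and counts entries per severity level. The log format (timestamp, level, message) and the sample lines are assumptions; real systems use many different layouts.

```python
import re
from collections import Counter

# Assumed layout: "<date> <time> <LEVEL> <message>".
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+ \S+) (?P<level>[A-Z]+) (?P<message>.*)$"
)

log_lines = [
    "2024-03-01 12:00:01 INFO Service started",
    "2024-03-01 12:00:05 ERROR Connection to database timed out",
    "2024-03-01 12:00:09 INFO Retrying connection",
    "free-form line that does not match",
]

# Keep only lines that fit the expected layout; others are skipped.
parsed = []
for line in log_lines:
    match = LOG_PATTERN.match(line)
    if match:
        parsed.append(match.groupdict())

level_counts = Counter(entry["level"] for entry in parsed)
print(level_counts)
```

The last sample line is deliberately malformed: log parsers generally have to tolerate lines that do not fit the expected pattern.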
Sensor Data
Data from Internet of Things (IoT) devices, including temperature readings, humidity levels,
and other environmental measurements.
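Sensor messages often differ in which fields they carry, depending on the device. The sketch below flattens a few such messages into rows with a common set of columns, filling gaps with None. The device names and field names are hypothetical.

```python
# Hypothetical readings: each device reports a different set of fields.
readings = [
    {"device": "t-01", "ts": 1, "temperature_c": 21.5},
    {"device": "h-07", "ts": 1, "humidity_pct": 40.0},
    {"device": "t-01", "ts": 2, "temperature_c": 21.7, "battery": 0.9},
]

# The column set is discovered from the data rather than declared up front.
columns = sorted({key for reading in readings for key in reading})

# Each reading becomes one row; missing fields are filled with None.
rows = [tuple(reading.get(col) for col in columns) for reading in readings]

print(columns)
for row in rows:
    print(row)
```

This is one simple way to move semi-structured readings toward a tabular shape; whether sparse columns like these are acceptable depends on the downstream analysis.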
Unstructured Data
Unstructured data lacks a predefined format, which generally makes it the most challenging
type of data to store and analyze. OneLake can store and manage large volumes of unstructured
data.
Multimedia Files
Images, videos, and audio files used in media production, marketing, and communications.
Documents
Text documents, PDFs, presentations, and other file types used in business operations and
communications.
Social Media Data
Data from social media platforms, such as posts, comments, likes, and shares.
Web Data
Content scraped from websites, including HTML, CSS, and JavaScript files.
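Scraped pages mix visible text with markup, styling, and scripts, so a common first step is to pull out just the readable text. The sketch below does this with Python's standard html.parser module; the TextExtractor class and the sample page are hypothetical, and production scrapers typically use more robust tooling.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


# Made-up page containing HTML, CSS, and JavaScript.
html = (
    "<html><head><style>p { color: red; }</style></head>"
    "<body><h1>Offer</h1><p>Ships in <b>two</b> days.</p>"
    "<script>track();</script></body></html>"
)

parser = TextExtractor()
parser.feed(html)
parser.close()
text = " ".join(parser.parts)
print(text)
```

The extracted string has no schema at all, which is why web content is usually processed further (for example with text analytics) before it becomes useful for reporting.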