Slide 2

The document discusses the differences between one-time and recurring data collection, highlighting the need for ongoing data in dynamic fields like logistics. It outlines various data sources, including internal records and third-party data, and describes data formats such as JSON, CSV, and XML. Additionally, it explains structured, unstructured, and semi-structured data, along with key terms like features, values, and data examples relevant to machine learning.

Uploaded by

meorz90
Copyright
© All Rights Reserved
One-Time vs. Recurring Collection
"First, data may need to be collected just once for a specific purpose, such as a
one-time analysis of a historical dataset. This might happen if we're studying past
trends or conducting a one-off analysis of company metrics. However, in other
cases, we may need ongoing data collection, which allows us to stay updated with
real-time or frequently updated data. This might be particularly useful in fields
like logistics or supply chain management, where conditions can change daily or
even hourly."

Sources of Data
"Data can come from different sources. For instance, a company might use its own
data, such as internal sales records or employee performance metrics.
Alternatively, we might rely on third-party data, that is, data collected by external
organizations. This is common in market research, where external data provides
valuable context."

Ongoing Data Collection Examples
"When we talk about ongoing analysis, some examples of data sources include IoT
sensors, which collect real-time data from devices and machinery. We could also
monitor sales transactions as they happen or review system logs to track digital
events or errors. Each of these sources provides fresh data continuously, allowing
for timely analysis."
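
As a rough illustration of the difference between a one-time pass and ongoing collection, the sketch below folds successive batches of log lines into a running summary. The log line layout (a level followed by a message) and the `summarize` helper are assumptions made for this example, not something the slides specify.

```python
from collections import Counter


def summarize(events, counts=None):
    """Fold a batch of new log lines into running per-level counts."""
    counts = Counter() if counts is None else counts
    for line in events:
        # Assumed layout: "<LEVEL> <message>", e.g. "ERROR disk full".
        level = line.split(" ", 1)[0]
        counts[level] += 1
    return counts


# Hypothetical batches arriving at two different times.
first_batch = ["INFO service started", "ERROR disk full"]
second_batch = ["ERROR disk full", "INFO retry ok"]

# One-time analysis: process a fixed dataset once.
one_time = summarize(first_batch)

# Ongoing analysis: keep updating a separate running summary as data arrives.
running = summarize(first_batch)
running = summarize(second_batch, running)

print(one_time["ERROR"], running["ERROR"])
```

In a real pipeline the batches would come from a sensor feed, a transaction stream, or a log file rather than from hard-coded lists.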

Data Formats
"Lastly, data can come in various formats, including JSON, CSV, XML, and relational
database tables. JSON is common in web applications, CSV is widely used for
spreadsheet-like data, XML is prevalent in configuration and data exchange, and
relational (SQL) databases are typical for complex, structured datasets. Recognizing
these formats is essential because each requires a different approach for
processing and analysis."
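
To make the point about different processing approaches concrete, here is a minimal sketch that reads the same record from JSON, CSV, and XML using only the Python standard library. The record itself and the helper names (`from_json`, `from_csv`, `from_xml`) are hypothetical and chosen for illustration.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same hypothetical record expressed in three formats.
JSON_TEXT = '{"user_id": 1, "age": 44, "job": "technician"}'
CSV_TEXT = "user_id,age,job\n1,44,technician\n"
XML_TEXT = "<user><user_id>1</user_id><age>44</age><job>technician</job></user>"


def from_json(text):
    # JSON keeps native types, so numbers arrive as numbers.
    return json.loads(text)


def from_csv(text):
    # CSV values are always strings, so numeric columns are converted by hand.
    row = next(csv.DictReader(io.StringIO(text)))
    return {"user_id": int(row["user_id"]), "age": int(row["age"]), "job": row["job"]}


def from_xml(text):
    # XML element text is also a string and needs explicit conversion.
    root = ET.fromstring(text)
    return {
        "user_id": int(root.findtext("user_id")),
        "age": int(root.findtext("age")),
        "job": root.findtext("job"),
    }


records = [from_json(JSON_TEXT), from_csv(CSV_TEXT), from_xml(XML_TEXT)]
print(records[0] == records[1] == records[2])
```

All three parsers end up with the same dictionary, but only the JSON path gets the numeric types for free.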

Structured Data
"First, structured data is highly organized, making it easy to search, filter, and
extract information. This data usually comes in the form of spreadsheets or
databases, where each entry follows a consistent format. For example, customer
records in a database are often structured, with clear fields for names, contact
information, and transaction histories. The structure helps machine learning
algorithms quickly identify patterns and relationships within the data."
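
The following sketch shows why a consistent format makes searching and filtering easy. It uses an in-memory SQLite table; the table name, columns, and customer rows are all invented for the example.

```python
import sqlite3

# In-memory database so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, email TEXT, total_spent REAL)")

# Hypothetical customer records; every row follows the same fields.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Ana", "ana@example.com", 120.0),
        ("Ben", "ben@example.com", 35.5),
        ("Cy", "cy@example.com", 410.0),
    ],
)

# Because each record has a total_spent field, filtering is a one-line query.
big_spenders = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM customers WHERE total_spent > ? ORDER BY name", (100,)
    )
]
print(big_spenders)
```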

Unstructured Data
"Next, we have unstructured data, which doesn't have a defined structure and is
harder to query directly. This includes images, videos, audio files, and large
blocks of text. For instance, if we're training a model to recognize images or
understand spoken language, we're working with unstructured data. Unlike structured
data, this type requires additional processing to make it useful for machine
learning, such as tagging or converting it into numerical features."
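
One simple way to convert free text into numerical features is a word-count vector, often called a bag of words. The sketch below is only one possible approach, and the vocabulary and review text are made up for illustration.

```python
import re
from collections import Counter


def bag_of_words(text, vocabulary):
    """Return how often each vocabulary word occurs in the text."""
    # Lowercase and split into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # A Counter returns 0 for words that never appear.
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary and customer review.
vocabulary = ["late", "delivery", "great"]
review = "Late delivery again. The delivery was late by two days."

print(bag_of_words(review, vocabulary))
```

The resulting list has one number per vocabulary word, which is a shape a machine learning algorithm can work with directly.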

Semi-structured Data
"Finally, there's semi-structured data, which has elements of both structured and
unstructured formats. For example, in emails, headers like sender, receiver, and
timestamp have a predictable structure, but the body content is unstructured text.
Similarly, XML and JSON documents can be structured in parts, but they might also
contain free-form information that requires extra processing. Semi-structured data
offers some advantages for querying, but also presents challenges similar to
unstructured data."
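
The email case can be sketched with Python's standard `email` package: the headers parse into predictable fields, while the body comes back as plain free-form text. The message below is invented for the example.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical raw email: headers, a blank line, then the body.
RAW_EMAIL = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Date: Mon, 01 Jan 2024 09:30:00 +0000\n"
    "Subject: Shipment update\n"
    "\n"
    "Hi Bob, the pallets left the warehouse this morning.\n"
)

message = message_from_string(RAW_EMAIL)

# Structured part: headers map cleanly onto named fields.
structured = {
    "sender": message["From"],
    "receiver": message["To"],
    "timestamp": parsedate_to_datetime(message["Date"]),
}

# Unstructured part: the body is free text that needs further processing.
body = message.get_payload()

print(structured["sender"], structured["timestamp"].year)
print(body.strip())
```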

"This slide shows a sample dataset, which is a common format we encounter in machine
learning. Here, I'll explain some important terms: Features, Values, and Data
Examples, which are crucial for understanding how data is structured."
Data Example
"First, let's look at the entire row. Each row here represents a single data
example or record. In machine learning, each data example is an individual entry in
our dataset, containing information about a single subject or instance. In this
case, each row represents a user with details about their age, job, marital status,
education, and other factors."

Feature
"Next, the columns at the top (user_id, age, job, marital, and so on) are called
features. Features are the attributes or characteristics of each data example that
our machine learning model will use to make predictions or gain insights. Each
feature represents a specific type of information. For instance, age represents the
user's age, job represents their job type, and education indicates their
educational background."

Value
"Now, each cell within the table holds a specific value, which is the actual data
for a given feature in a data example. For example, the value for the job feature
in the second row is 'technician,' and the value for the age feature in that same
row is '44.' Values are the data points that machine learning algorithms analyze to
detect patterns."
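
The three terms can be tied together in a few lines of Python. The feature names follow the slide, and the second row's age (44) and job ("technician") match the values quoted above; every other value in the table is a placeholder invented for this sketch.

```python
# Feature names as listed on the slide (the slide shows more columns).
features = ["user_id", "age", "job", "marital", "education"]

# Only age=44 and job="technician" in the second row come from the slide;
# all remaining values are hypothetical placeholders.
dataset = [
    [1, 31, "admin", "single", "secondary"],
    [2, 44, "technician", "married", "tertiary"],
    [3, 58, "retired", "married", "primary"],
]

# Data example: one full row, here the second one.
example = dataset[1]

# Feature: one column, identified by its name.
job_column = features.index("job")
age_column = features.index("age")

# Value: the single cell where a data example and a feature meet.
print(example[job_column], example[age_column])

# Pairing names with values makes the record easier to read.
record = dict(zip(features, example))
print(record["job"])
```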
