
1.2. Preparing Machine Learning Environment: Installation of Python (In Windows OS)

machine learning knowledge

Uploaded by

gateracalvin.c
Copyright
© All Rights Reserved

1.2. PREPARING MACHINE LEARNING ENVIRONMENT


Preparing a machine learning environment involves setting up both the hardware (CPU, RAM, storage)
and the software (OS, Python, a package manager such as pip or conda, libraries such as Matplotlib,
NumPy, pandas, Keras, and TensorFlow, and an IDE) needed to develop and run machine learning models.
Installation of Python (in Windows OS)
• Python is a high-level programming language that lets you work more quickly and
integrate your systems more effectively.
• Python is powerful and fast. It plays well with others. It runs everywhere. It is open,
friendly and easy to learn.
• To get Python, visit python.org, go to the Downloads section, and download the latest version of Python.
• The steps to install Python on Windows are:
– Select the Python version
– Download the Python executable installer
– Run the installer
– Customize the installation (optional)
– Install Python
– Verify the Python installation
– Verify that pip (the preferred installer program for Python packages) was installed
– Install virtualenv (optional)
– Get started with Python
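
The two verification steps in the list above can also be done from inside Python. The following is a minimal sketch, not part of the original notes: the function name `describe_environment` and the dictionary keys are illustrative choices, and only standard-library modules are used.

```python
import importlib.util
import platform
import sys


def describe_environment():
    """Summarize the running interpreter for a quick post-install check."""
    return {
        # Version of the interpreter in major.minor.patch form
        "python_version": platform.python_version(),
        # Full path of the Python executable that is running this script
        "executable": sys.executable,
        # True when this interpreter can locate the pip package
        "pip_available": importlib.util.find_spec("pip") is not None,
        # True inside a venv (or recent virtualenv): the active prefix
        # differs from the prefix of the base installation
        "in_virtual_environment": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for name, value in describe_environment().items():
        print(f"{name}: {value}")
```

The same checks are usually done directly in the command prompt with `python --version` and `pip --version`.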

Installation of ML Tools
To install machine learning tools in Python, install the required libraries and packages with pip,
either one at a time or several at once:
− Install one package at a time in the command prompt:
o pip install matplotlib
o pip install numpy
o pip install keras
o pip install gekko
− Install several packages at once in the command prompt (package names separated by spaces):
o pip install matplotlib numpy gekko pandas plotly
Commonly used packages and their install commands:
▪ Gekko provides an interface to gradient-based solvers for machine learning and
optimization of mixed-integer and differential algebraic equations.
– pip install gekko
▪ Matplotlib generates plots in Python.
– pip install matplotlib
▪ NumPy is a numerical computing package for mathematics, science, and engineering.
– pip install numpy
▪ OpenCV is a package for real-time computer vision.
– pip install opencv-python
▪ pandas analyzes and manipulates data tables.
– pip install pandas
▪ Plotly renders interactive plots with HTML and JavaScript.
– pip install plotly
▪ PyTorch enables deep learning, computer vision, and natural language processing.
– pip install torch
▪ Beautiful Soup is a Python package for extracting (scraping) information from web
pages; lxml is an optional parser that Beautiful Soup can use.
– pip install beautifulsoup4 lxml
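
After running the install commands, it can be useful to confirm which packages are actually present. The sketch below is an illustration rather than part of the original notes: the helper name `installed_versions` is hypothetical, and it relies on the standard-library `importlib.metadata` module (Python 3.8 or later), looking packages up by the same names that are passed to `pip install`.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed for this interpreter
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Names as they are passed to "pip install" in the list above
    wanted = ["gekko", "matplotlib", "numpy", "opencv-python", "pandas",
              "plotly", "torch", "beautifulsoup4", "lxml"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```

The command `pip list` in the command prompt gives a similar overview of everything installed.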

LO1.2. Review questions


Multiple choice questions
1. What is the first step in setting up a machine learning environment on a new system?
a) Installing the latest version of TensorFlow
b) Setting up version control for the project
c) Installing Python and setting up a virtual environment
d) Downloading the dataset

2. Why is it important to use a package manager like pip or conda in a machine learning project?
a) To monitor model performance in production
b) To ensure that all dependencies and libraries are installed and updated correctly
c) To visualize data and model performance
d) To manage containerized applications

3. How can Jupyter Notebooks assist in the machine learning development process?
a) By managing machine learning experiments and deployments
b) By providing an interactive platform for code execution, data visualization, and documentation
c) By creating containerized environments for running models
d) By handling version control of code and data

1.3. DATA COLLECTION AND ACQUISITION
Data collection is the process of gathering and measuring information on variables of interest, in
an established systematic fashion that enables one to answer stated research questions, test
hypotheses, and evaluate outcomes.
The main techniques for gathering data are observation, interviews, questionnaires, schedules,
and surveys.
Key terms and their descriptions:
– Data: the set of observations or measurements that can be used to train
a machine learning model. Types of data for ML include numerical data, categorical
data, time-series data, and text data.
– Information: data that has been interpreted and processed so that it carries
meaning for its users.
– Dataset: a collection of data that is used to train and test algorithms and models.
– Data warehouse: a single database that combines data from different sources
and can then be used for analysis and decision making.
– Big data: a collection of data from various sources, often characterized by
volume, variety, velocity…

Identification of Source of data


• Each source of data has its own characteristics and challenges, and the choice of data source
often depends on the specific application and goals of the machine learning project.
• The sources of data can be broadly categorized into several types:
▪ Sensors and IoT devices: data come from various sensors and are used in various projects.
▪ Computers and smartphones: these generate a wealth of data that can be valuable for
machine learning applications.
▪ Social data: the information generated and shared through social media
platforms, forums, and other online social interactions.
▪ Transactional data: data related to transactions or interactions.
• In ML, various data are used to train, validate, and test models. Each data type serves
a different purpose and is crucial for building robust and effective ML systems.
▪ Sensors and IoT devices: data can come from various sensors, such as those monitoring
environmental temperature, humidity, speed, motion, proximity, images, pressure,
biometrics, sound, or other physical parameters.
▪ Social data: the information generated and shared through social media
platforms, forums, and other online social interactions. Examples: social media posts,
user profiles, hashtags and keywords (trending topics), geolocation data, and engagement
metrics (likes, shares, comments).
▪ Transactional data: data related to transactions or interactions, such as:
o E-commerce transactions: purchase records, shopping cart data.
o Financial transactions: credit card transactions, bank transfers.
▪ Computers: computers generate different kinds of data, such as text, images,
audio, and video.
o Text data: articles, books, social media posts, emails, and other text sources used
in natural language processing tasks.
o Images: photos, diagrams, and graphics used in computer vision tasks. Formats
include JPEG, PNG, GIF, etc.
o Audio data: speech recordings, music, and other sound data used in speech
recognition and audio analysis.
o Video data: video files used for tasks like action recognition and video
classification.
▪ Smartphones: smartphones generate a wealth of data that can be valuable for machine
learning applications, such as:
o Sensor data: accelerometer, gyroscope, magnetometer, GPS, proximity,
temperature and humidity, light, camera, microphone…
o Text data: user input such as text messages, emails, or notes, which can be used for
sentiment analysis, language modeling, or personalized recommendations.
o Usage data: call and SMS logs, app usage…

– Numerical data: data representing quantities and measurable values.
Use cases: regression tasks, statistical analysis, and feature engineering.
– Textual data: data consisting of natural language text.
Use cases: sentiment analysis, topic modeling, and language translation.
– Image data: data consisting of digital images.
Use cases: object detection, image classification, and facial recognition.
– Audio data: data consisting of sound recordings or audio signals.
Use cases: speech recognition, audio classification, and music recommendation.
– Video data: data consisting of moving visual images.
Use cases: action recognition, video classification, and object tracking.
– Geospatial data: data related to geographical locations and spatial relationships.
Use cases: GIS, route optimization, and location-based services.
– Transactional data: data generated from transactions or interactions within systems.
Use cases: fraud detection, recommendation systems, and customer behavior analysis.

Description of 6 V's of Big Data:


– Volume: Scale or amount of data.
– Variety: Different forms of data, e.g., healthcare records, images, videos, audio clips.
– Velocity: Rate at which data is generated and streamed.
– Value: Meaningfulness of the data in terms of the information it yields.
– Veracity: Certainty and correctness of the data.
– Viability: Ability of the data to be used and integrated into different systems and
processes.

Description of types of data
Various types of data are used depending on the problem and the model being applied.
The common types of data are:
– Structured data: This type of data is highly organized and easily searchable. It includes:
o Tabular data: Data organized in rows and columns, as in spreadsheets or SQL databases.
E.g., customer information, financial records, and sensor readings.
o Time-series data: Data collected at successive points in time.
E.g., stock prices, weather data, and website traffic logs.
– Semi-structured data: This type of data does not fit neatly into tables but contains some
organizational properties, such as:
o XML data: Data marked up with tags, often used in web services.
o JSON data: Data in JavaScript Object Notation format, commonly used in web
applications.
– Unstructured data: This type of data does not follow a specific format or structure. It
includes:
o Text data: Data in the form of text, such as articles, emails, or social media posts.
o Image data: Data consisting of visual information, used in tasks like image classification
or object detection.
o Audio data: Data captured in audio formats, like speech and music, used in speech
recognition or audio classification.
o Video data: Data in the form of video sequences, used in video classification or
activity recognition.
– Categorical data: Data that represents categories or labels, such as colors, types of products,
or user demographics. It can be:
o Nominal: Categories without any inherent order, like types of animals.
o Ordinal: Categories with a specific order, like customer satisfaction ratings.

– Numerical data: Data represented by numbers, which can be:
o Discrete: Countable values, such as the number of visitors.
o Continuous: Measurable values, such as temperature or height.
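
To make these categories concrete, the sketch below shows how a few of them might look in Python using only the standard library. Everything in it is an illustrative assumption rather than material from the original notes: the sample CSV and JSON text are made up, and the names `load_rows`, `encode_ordinal`, `one_hot`, and `SATISFACTION_RANK` are hypothetical.

```python
import csv
import io
import json

# Structured (tabular) data: fixed columns, one record per row.
# Made-up sample data, for illustration only.
CSV_TEXT = """customer_id,age,satisfaction
1,34,high
2,27,low
3,45,medium
"""

# Semi-structured data: JSON with nested fields (also made up).
JSON_TEXT = '{"user": "c1", "tags": ["sale", "new"], "address": {"city": "Example City"}}'

# Ordinal categories have an order, so each label can be mapped to a rank.
SATISFACTION_RANK = {"low": 0, "medium": 1, "high": 2}


def load_rows(csv_text):
    """Parse CSV text into a list of dicts, converting the numeric columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "customer_id": int(row["customer_id"]),  # numerical (discrete)
            "age": int(row["age"]),                   # numerical (discrete)
            "satisfaction": row["satisfaction"],      # categorical (ordinal)
        })
    return rows


def encode_ordinal(rows):
    """Replace each ordinal satisfaction label with its rank."""
    return [SATISFACTION_RANK[row["satisfaction"]] for row in rows]


def one_hot(value, categories):
    """Encode a nominal (unordered) category as a 0/1 vector."""
    return [1 if value == category else 0 for category in categories]


if __name__ == "__main__":
    rows = load_rows(CSV_TEXT)
    print(encode_ordinal(rows))                     # ranks for high, low, medium
    print(json.loads(JSON_TEXT)["address"]["city"])  # nested field access
    print(one_hot("cat", ["bird", "cat", "dog"]))
```

Ordinal labels are mapped to ranks because their order carries information, while nominal labels are one-hot encoded so that no artificial order is implied.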

LO1.3. Review questions


1. Which type of data consists of discrete labels or categories rather than continuous values?
a) Numerical data
b) Categorical data
c) Textual data
d) Temporal data
2. What type of data is characterized by its sequence and time-based nature, such as stock prices or weather data?
a) Categorical data
b) Time-series data
c) Image data
d) Audio data
3. In machine learning, what is "structured data"?
a) Data that is unorganized and does not fit into predefined schemas
b) Data that is organized in tables with rows and columns, like databases or spreadsheets
c) Data that consists of unstructured text and requires natural language processing
d) Data that is represented in a sequential manner, such as time-series data
4. Which type of data is typically used for image recognition tasks and consists of pixel values arranged in a grid
format?
a) Textual data
b) Numerical data
c) Categorical data
d) Image data
5. What is the key characteristic of "unstructured data"?
a) It is organized in a fixed schema with predefined rows and columns
b) It does not follow a specific format or structure and often requires significant preprocessing
c) It is represented as a time series or sequence
d) It consists of categorical labels and classes
6. Which type of data involves natural language text and is commonly used in tasks such as sentiment analysis and
text classification?
a) Audio data
b) Image data
c) Textual data
d) Temporal data
7. What kind of data would you typically find in a relational database, where data is stored in tables with rows and
columns?
a) Unstructured data
b) Structured data
c) Temporal data
d) Image data
8. Which type of data is commonly used in machine learning for tasks involving sound and speech recognition?
a) Image data
b) Numerical data
c) Audio data
d) Categorical data
9. What is "semi-structured data"?
a) Data that is partially organized and contains both structured and unstructured elements, like JSON or XML files
b) Data that is fully organized in tables with rows and columns
c) Data that does not fit into any specific schema or structure
d) Data that consists of continuous numerical values

10. In which scenario would you use "categorical data" in machine learning?
a) Predicting house prices based on historical data
b) Classifying emails into different categories like spam or not spam
c) Forecasting future stock prices based on past trends
d) Analyzing customer reviews for sentiment

1.4. INTERPRET DATA VISUALIZATION


Interpreting data visualization involves analyzing visual representations of data to extract meaningful insights.

Description of data visualization tools


• Matplotlib: It is a popular Python library for creating static, interactive, and animated
visualizations. It provides a flexible way to generate plots and charts, and it’s widely used
for data analysis and visualization tasks.

Matplotlib supports various types of plots, including:

▪ Line plot: plt.plot(x, y)
▪ Scatter plot: plt.scatter(x, y)
▪ Bar chart: plt.bar(x, y)
▪ Histogram: plt.hist(data)
▪ Pie chart: plt.pie(sizes, labels=labels)
▪ Box plot: plt.boxplot(data)
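
The six calls above can be tried together in one script. The following is a minimal sketch under stated assumptions: the sample values are made up, the function name `draw_plot_gallery` and the output file name are hypothetical, and the plots are drawn with the equivalent `Axes` methods (`ax.plot`, `ax.bar`, and so on) so that all six fit in a single figure.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt


def draw_plot_gallery():
    """Draw one example of each plot type on a 2x3 grid and return the Figure."""
    # Small made-up sample data, for illustration only
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 3, 6, 5]
    data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
    sizes = [50, 30, 20]
    labels = ["A", "B", "C"]

    fig, axes = plt.subplots(2, 3, figsize=(12, 7))
    ax = axes.ravel()  # flatten the 2x3 grid into a sequence of six Axes

    ax[0].plot(x, y)
    ax[0].set_title("Line plot")

    ax[1].scatter(x, y)
    ax[1].set_title("Scatter plot")

    ax[2].bar(x, y)
    ax[2].set_title("Bar chart")

    ax[3].hist(data)
    ax[3].set_title("Histogram")

    ax[4].pie(sizes, labels=labels)
    ax[4].set_title("Pie chart")

    ax[5].boxplot(data)
    ax[5].set_title("Box plot")

    fig.tight_layout()
    return fig


if __name__ == "__main__":
    # Hypothetical output file name
    draw_plot_gallery().savefig("plot_gallery.png")
```

In an interactive session, the `matplotlib.use("Agg")` line can be dropped and `plt.show()` called instead of saving to a file.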
Description of Matplotlib figure

Figure 1: Parts of a Matplotlib figure
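
The image for Figure 1 is not reproduced in this text. As a rough stand-in, the sketch below builds a small figure and names the parts that are usually pointed out in such a diagram (figure, axes, title, axis labels, ticks, grid, line, markers, legend). The selection of parts is an assumption and may not match the original figure exactly; the function name `draw_labeled_figure` and the output file name are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt


def draw_labeled_figure():
    """Build a simple figure whose main parts are set explicitly."""
    # Figure: the whole canvas. Axes: one plotting area inside the figure.
    fig, ax = plt.subplots(figsize=(6, 4))

    x = [0, 1, 2, 3, 4]
    y = [value ** 2 for value in x]

    # Line with point markers; the label is what the legend displays
    ax.plot(x, y, marker="o", label="y = x^2")

    ax.set_title("Title")            # title of the Axes
    ax.set_xlabel("x-axis label")    # label of the horizontal axis
    ax.set_ylabel("y-axis label")    # label of the vertical axis
    ax.set_xticks(x)                 # positions of the major ticks on the x-axis
    ax.grid(True)                    # grid lines drawn at the tick positions
    ax.legend()                      # legend box listing the labeled lines
    return fig, ax


if __name__ == "__main__":
    figure, _ = draw_labeled_figure()
    figure.savefig("figure_parts.png")  # hypothetical output file name
```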
