You are asked to enter student names below:
1.
2.
3.
4.
Teamwork policy
Only one submission per team is expected. It is made by the current team leader; this role should rotate from one team assignment to the next. The Step 0 screenshot (see below) must be taken on the leader's computer.
Prerequisites
Review the instructions from Labs 5 and 6 of CYT175 and make sure that you can start Jupyter Notebook. If needed, review the demos referred to in the task descriptions of CYT175 Labs 5, 6, and 7.
Make sure that your local Jupyter server is up and running.
Step 0. In the Start menu, type Jupyter Notebook and start it. Take a screenshot of the start screen. The screenshot must show who owns the laptop (for example, the user name).
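One optional way to make ownership visible in the screenshot is to run a small cell that prints the logged-in user and the computer name. This is a minimal sketch using only the Python standard library; the helper name `ownership_banner` is hypothetical and not part of the lab materials.

```python
import getpass
import platform


def ownership_banner() -> str:
    """Return a one-line string naming the logged-in user and the computer."""
    try:
        # Checks LOGNAME, USER, LNAME and USERNAME, then falls back to the
        # OS account database on Unix-like systems.
        user = getpass.getuser()
    except (KeyError, OSError, ImportError):
        user = "unknown-user"
    # Network name of the machine; platform.node() may return an empty string.
    host = platform.node() or "unknown-host"
    return f"User: {user} | Computer: {host}"


print(ownership_banner())
```

Keep the printed line visible next to the Jupyter start screen when you take the screenshot.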
Task description
Start Workflow
Note: Screenshots are required for each step. Include them in your submission.
Step 1. Extract book.rar and move the book folder into your Anaconda environment. This makes the sample code and data easily available.
Step 2. Open the Python script file and run the Listing 1 portion in your notebook. Resolve any error messages you get. This step makes the sample data available for the next steps.
Step 3. Run Listing 3-3. It sets the relative path to the downloaded data.
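A relative path is resolved against the notebook's current working directory, so the book folder from Step 1 has to sit where the notebook runs. The sketch below assumes a hypothetical layout, book/data/reputation.data; adjust the names to match your extracted copy. It is not the code of Listing 3-3.

```python
from pathlib import Path
from typing import Optional

# Hypothetical layout: adjust to wherever the data file sits in your "book" folder.
DATA_DIR = Path("book") / "data"
DATA_FILE = DATA_DIR / "reputation.data"


def resolve_data_path(relative: Path, base: Optional[Path] = None) -> Path:
    """Resolve a relative path against base (default: the current working directory)."""
    base = Path.cwd() if base is None else base
    return (base / relative).resolve()


path = resolve_data_path(DATA_FILE)
print(path, "exists" if path.exists() else "NOT FOUND: check Step 1")
```

If the path is reported as not found, the notebook is probably running from a different directory than the one that contains the book folder.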
Step 4. Run Listing 3-5. The output shows the first 5 rows of the file.
This code defines the structure of the IP Reputation Database. Observe the result.
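Defining the structure of a header-less file in pandas usually means reading it with `pd.read_csv` and supplying the column names yourself. This sketch assumes a '#'-delimited layout with the columns IP, Reliability, Risk, Type and Country (the names used later in this lab); the real file may carry additional columns, and the sample rows are made up, using documentation-only IP ranges. It illustrates the idea and is not a copy of Listing 3-5.

```python
import io

import pandas as pd

# Made-up sample rows in an assumed '#'-delimited layout (RFC 5737 example IPs).
SAMPLE = """192.0.2.1#4#2#Scanning Host#CN
192.0.2.2#2#3#Malware Domain#US
198.51.100.7#2#2#Spamming#US
198.51.100.9#6#4#Scanning Host#RU
203.0.113.5#4#2#Malware IP#CN
203.0.113.6#2#2#Spamming#BR
"""

COLUMNS = ["IP", "Reliability", "Risk", "Type", "Country"]


def load_reputation(source) -> pd.DataFrame:
    """Read '#'-separated records that have no header row and name the columns."""
    return pd.read_csv(source, sep="#", header=None, names=COLUMNS)


av = load_reputation(io.StringIO(SAMPLE))
print(av.head())  # first 5 rows, as described in Step 4
```

With the real data, pass the path from Step 3 instead of the in-memory sample.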
Answer the following questions:
Step 5. Run Listing 3-6. You will see HTML-formatted output of the same data frame.
Question:
1. Which Python code lines produce this output? (Copy and paste them from the code.)
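For reference, pandas can turn any data frame into HTML table markup with `DataFrame.to_html()`, and Jupyter can render such a string as a formatted table through `IPython.display.HTML`. This is a general pandas sketch on a small stand-in frame, not necessarily the code used in Listing 3-6.

```python
import pandas as pd

# Small stand-in frame; in the lab you would use the frame built in Step 4.
av = pd.DataFrame({
    "IP": ["192.0.2.1", "192.0.2.2"],
    "Reliability": [4, 2],
    "Country": ["CN", "US"],
})

html = av.head().to_html()  # HTML <table> markup as a plain string
print(html[:60])

# Inside Jupyter, this renders the markup as a formatted table:
# from IPython.display import HTML
# HTML(html)
```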
Step 6. Run Listing 3-8. You now start exploring the data. This portion of code works with the quantitative category of data, in other words, data whose values can be used in calculations. It generates basic "descriptive statistics" (see the definition below) for the variables, which are then used for reporting and visualization.
Run the code and review the results of the calculation.
Descriptive statistics include those that summarize the central tendency, dispersion and
shape of a dataset’s distribution, excluding NaN values.
More details at
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.DataFrameGroupBy.describe.html?highlight=describe
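As a minimal sketch of what descriptive statistics look like in pandas, `Series.describe()` on a numeric column returns count, mean, standard deviation, minimum, quartiles and maximum. The values below are stand-ins for the Reliability and Risk columns, not real data.

```python
import pandas as pd

# Stand-in values; in the lab, use the Reliability and Risk columns from Step 4.
av = pd.DataFrame({"Reliability": [2, 2, 4, 6], "Risk": [2, 3, 2, 4]})

# For numeric data, describe() reports count, mean, std, min, 25%, 50% (median),
# 75% and max; NaN values are left out of the calculation.
rel_stats = av["Reliability"].describe()
risk_stats = av["Risk"].describe()
print(rel_stats)
print(risk_stats)
```

Here the Reliability mean is 3.5 and the Risk median (50%) is 2.5.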
Step 7. You may receive a syntax error if you run this listing, because it uses a more complicated data object definition. The data here belong to the qualitative category. In pandas, such a column should be declared as Categorical, which we are not prepared to do yet. Still, take a look at the code and at the result shown in the book. The first results show the number of malicious nodes counted by Reliability, Risk, Type, and Country separately. The last result shows the number of malicious nodes by Country.
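The idea of declaring qualitative columns as Categorical and counting records per category can be sketched in pandas as follows. The data frame is a made-up stand-in, and this is not the book's listing; it only shows `astype("category")` and `value_counts()` on each column.

```python
import pandas as pd

# Made-up stand-in records; in the lab, use the frame from Step 4.
av = pd.DataFrame({
    "Reliability": [4, 2, 2, 6, 4, 2],
    "Risk": [2, 3, 2, 4, 2, 2],
    "Type": ["Scanning Host", "Malware Domain", "Spamming",
             "Scanning Host", "Malware IP", "Spamming"],
    "Country": ["CN", "US", "US", "RU", "CN", "BR"],
})

QUALITATIVE = ["Reliability", "Risk", "Type", "Country"]

# Declare the qualitative columns as pandas Categorical.
for col in QUALITATIVE:
    av[col] = av[col].astype("category")

# Number of records (malicious nodes) per category, one column at a time.
counts = {col: av[col].value_counts() for col in QUALITATIVE}
print(counts["Country"])
```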
Step 8. Run Listing 3-14. The number of records per country in the data frame is shown as a graph titled "Summary by Country".
Questions:
• If a record does not have a valid country code, will it be included in the calculation?
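A summary by country boils down to counting records per country code and drawing the counts as a bar chart. This sketch uses made-up data; the helper names are hypothetical, and the plotting helper imports matplotlib only when it is called, so the counting part runs without it. It is not necessarily how Listing 3-14 is written.

```python
import pandas as pd

# Made-up stand-in records; in the lab, use the frame from Step 4.
av = pd.DataFrame({"Country": ["CN", "US", "US", "RU", "CN", "CN", "BR"]})


def country_summary(df: pd.DataFrame) -> pd.Series:
    """Number of records per country code, most frequent first."""
    return df["Country"].value_counts()


def plot_country_summary(counts: pd.Series):
    """Draw the counts as a bar chart titled 'Summary by Country'."""
    import matplotlib.pyplot as plt  # imported here so counting works without matplotlib
    ax = counts.plot(kind="bar", title="Summary by Country")
    ax.set_xlabel("Country")
    ax.set_ylabel("Number of records")
    plt.show()
    return ax


summary = country_summary(av)
print(summary)
# In Jupyter: plot_country_summary(summary)
```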
Step 9. Listing 3-15. An error is expected when you run it. The book shows the resulting Reliability chart for the top 10 countries (see Figure 3-6).
Step 10. Listing 3-16. An error is expected when you run it. The book shows the resulting Risk chart for the top 10 countries (see Figure 3-7).
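Because Listings 3-15 and 3-16 are expected to fail, the following sketch shows one possible way to get the underlying numbers: keep only the n most frequent countries and cross-tabulate them against Reliability or Risk with `pd.crosstab`. The data and the helper name `top_n_breakdown` are hypothetical, and the book's charts may be built differently.

```python
import pandas as pd

# Made-up stand-in records; in the lab, use the frame from Step 4.
av = pd.DataFrame({
    "Country": ["CN", "US", "US", "RU", "CN", "CN", "BR"],
    "Reliability": [4, 2, 2, 6, 4, 2, 2],
    "Risk": [2, 3, 2, 4, 2, 2, 2],
})


def top_n_breakdown(df: pd.DataFrame, column: str, n: int = 10) -> pd.DataFrame:
    """Count records per (Country, column value) for the n most frequent countries."""
    top = df["Country"].value_counts().head(n).index
    subset = df[df["Country"].isin(top)]
    return pd.crosstab(subset["Country"], subset[column])


rel_top = top_n_breakdown(av, "Reliability")   # counterpart of the Reliability chart
risk_top = top_n_breakdown(av, "Risk", n=2)    # only the two most frequent countries
print(rel_top)
print(risk_top)
```

In Jupyter, calling `.plot(kind="bar", stacked=True)` on either table draws a stacked bar chart of the counts.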
Step 11. Run Listing 3-18. The result shows the data by country as percentages. In this top-10 list you will notice that, according to this data sample, China and the US account for almost 46% of the malicious nodes in the list.
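Percentages by country can be computed with `value_counts(normalize=True)`, which returns each country's share of all records. This sketch uses made-up data and assumes the percentages are taken relative to all records rather than to the top 10 only; the figures it prints are not the lab's results.

```python
import pandas as pd

# Made-up stand-in records; in the lab, use the frame from Step 4.
av = pd.DataFrame({"Country": ["CN", "US", "US", "RU", "CN", "CN", "BR", "US"]})

# Share of all records per country, in percent, for the 10 most frequent countries.
pct = (av["Country"].value_counts(normalize=True) * 100).head(10).round(1)
print(pct)

# Combined share of two countries, similar to the China + US observation above.
cn_us = pct.loc[["CN", "US"]].sum()
print(cn_us)
```

On this toy sample, CN and US each hold 37.5%, so together they account for 75%.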
Question:
• This lab can be completed individually or as teamwork by up to 4 people. Maximum score: 5%.
• The submission is an MS Word document uploaded to BB. The document name must follow the Submission Upload Requirements (see below).
Submission includes:
• Steps 1 to 6, 8, and 11 are run in Jupyter, with screenshots included.
• Answers to the questions included in the steps.
• Full collection of screenshots (8 in total; some error messages are allowed, but most steps should run without errors) and correct answers: 5%
• Incomplete screenshots or incorrect answers result in a corresponding deduction (at least 6 screenshots and correct answers): 4%
• Fewer than 6 screenshots: 3%
Make one online submission to BB per team.
If you have more than one document, pack them into a single ZIP, 7-Zip, or RAR archive.
Name the file you upload as indicated below. The name must include:
• Course ID (CYT175)
• Section (Monday or Friday)
• The type of work (e.g., lab1, assignment 1, etc.)
• Author name(s)
Sample: CYT175MLab1_PeterJohnMohammadSue
Note: Submissions that do not follow these requirements will not be accepted.
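The naming rule can be checked against the sample with a small helper. This is a sketch under an inferred assumption: the section is encoded by its first letter ("Monday" becomes "M"), as the "M" in the sample suggests. The function name `submission_name` is hypothetical.

```python
from typing import Sequence


def submission_name(course_id: str, section: str, work: str, authors: Sequence[str]) -> str:
    """Build an upload name following the sample pattern, e.g. CYT175MLab1_PeterJohnMohammadSue."""
    # Assumption: the section is encoded by its first letter ("Monday" -> "M").
    section_letter = section.strip()[0].upper()
    # Author names are concatenated without separators, as in the sample.
    names = "".join(name.strip() for name in authors)
    return f"{course_id}{section_letter}{work}_{names}"


print(submission_name("CYT175", "Monday", "Lab1", ["Peter", "John", "Mohammad", "Sue"]))
```

Add the file extension (.docx, or .zip, .7z, .rar for an archive) when you save the file.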