02 - 16 2 Geocoding Visualization - en

The document discusses visualizing data by retrieving it from the network, processing it, storing it in a database, and then visualizing it. An example process is described that takes locations data, geocodes it using an API, stores it in a database, and then visualizes it on a map. This is presented as an example of a personal data mining workflow using Python.


So now, in this last chapter, we're going to talk a little bit about visualizing data. But what we're really doing is summing everything up, because we're going to retrieve data from the network, process the data, store it in a database, and then write it out and visualize it. It's all coming together, and it turns out that this notion of gathering data over the network is a pretty common thing. It might require a cleaning or processing step, and part of the problem is that when you're pulling data off the net, you want to be able to restart the process, because it'll run and run, and then your computer will crash or go to sleep or something. You don't want to start over from the beginning, because it might be quite a bit of data and it takes a while to retrieve, or, as we've seen, you might be talking to an API that's got a rate limit that says you have to stop at 14 of these things, or stop at 200, or whatever. So, this is often a restartable process,
and it's usually a simple process, a relatively small amount of code: you have a queue of things you want to retrieve, you go get the next one, you store it in the database, then the next one, store it in the database. When you start the process up, you start filling this database up with stuff, and then if it blows up and you restart it, the first thing it does is read the database and say, "Oh, I don't need any of those," and then it starts to get the next one, and the next one, and the next one. That is how you make this restartable. Databases are really good at making sure that the program writing to the database can blow up without corrupting your data. You don't get partial data; a record is either written or it's not written. So these things can blow up, and sometimes you blow them up on purpose because you want to restart them, and when you start them up again, they scan the database and say, "Where was I? Oh, I'll start here." So, this is often a slow and restartable process. It also might be rate-limited for some reason. It runs for a while; for the things we'll do in this chapter, it might actually run for days.
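To make that pattern concrete, here's a minimal sketch of a restartable retrieval loop. The queue contents and the fetch_item() function are placeholders of my own, not part of the course code:

```python
import sqlite3

# Hypothetical stand-in for the slow network call (an API request, a page fetch).
def fetch_item(name):
    return 'retrieved data for ' + name

conn = sqlite3.connect('cache.sqlite')
cur = conn.cursor()
cur.execute('CREATE TABLE IF NOT EXISTS Cache (name TEXT UNIQUE, data TEXT)')

todo = ['thing-one', 'thing-two', 'thing-three']  # the queue of things to retrieve

for name in todo:
    # Restartability: skip anything that's already in the database.
    cur.execute('SELECT data FROM Cache WHERE name = ?', (name,))
    if cur.fetchone() is not None:
        continue
    data = fetch_item(name)  # slow, might crash, might hit a rate limit
    cur.execute('INSERT INTO Cache (name, data) VALUES (?, ?)', (name, data))
    conn.commit()  # commit after every item so a crash loses nothing

conn.close()
```

If the program blows up halfway through, you just run it again; the SELECT at the top of the loop is the "Where was I?" scan.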
Then, once you have your data, you start doing stuff inside your computer where you don't really care so much about the network. This might be raw data that came in off the APIs, and you want to put it into some new, smaller format, so you might go from one database to another database, or from a database to a file, and produce data that's really ready for visualization. The raw data might be a little complex, or there might be flaws in it, so you might write scanners that go, "Oh, wait a sec, this is inconsistent; sometimes it looks like this and sometimes it looks like that, so I'll clean that stuff up." Then you do some visualization, or write some Python programs that loop through the data once it's cleaned up and do some summing or adding or whatever it is they're doing: analyzing or visualizing.
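As a rough sketch of that second, local-only pass, assuming the Cache table from the sketch above: read the raw rows, normalize them, and write out a small summary file that's ready for the next step:

```python
import sqlite3

conn = sqlite3.connect('cache.sqlite')
cur = conn.cursor()

counts = dict()
for (name, data) in cur.execute('SELECT name, data FROM Cache'):
    # Cleaning step: the raw data may be inconsistent, so normalize it.
    cleaned = data.strip().lower()
    counts[cleaned] = counts.get(cleaned, 0) + 1

# Produce a small file ready for analysis, without touching the network again.
with open('summary.txt', 'w') as out:
    for value, count in sorted(counts.items()):
        out.write(str(count) + '\t' + value + '\n')

conn.close()
```

Notice there's nothing network-related in it at all; it can run as many times as you like.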
We're going to use things like Google Maps to do our visualization, along with a lot of JavaScript and a thing called D3.js, which is a JavaScript library. Now, in this class we're not teaching JavaScript, and we're not going to teach you Google Maps. I've provided all these things, so that when you run these programs, that stuff is all there. But if you want to learn, and see some examples of how to make a simple little JavaScript visualization with a line, a word cloud, or a map, we've got them, and you can take a look at those things.
Now, this is one form of data mining, and it's really data mining for an individual, where you're pulling this data down, getting it locally, and then working with it. There are other, much more sophisticated data mining technologies that you might find yourself using, but often you'll find that Python is part of them, or Python helps you manage them, or you write a Python program to scan through these things, or to prepare them, or to do something. So, there are lots of different data mining technologies; this is just one oversimplified, very Python-oriented one. I'd call this personal data mining. If you really want to become a data mining expert, you should take classes; this is just taking some of the skills we've learned in this class and solving some data mining problems.
So, the first application that we're going to data mine is an extension of an application we played with back in the JSON chapter. The idea is that it has a queue of locations. These are not pretty locations; they're locations as users typed them into a text field, from data from many years ago. It's anonymized data from the students who took one of my very first MOOCs, a MOOC on Internet history, reduced and anonymized just so we can play with it. But it's not accurate: we don't have GPS coordinates. If we use the Google geodata API with JSON, we can get those, but we need to avoid rate limiting, so we're going to cache the results in a database, meaning we're only going to retrieve each piece of data once. Then we're going to use the Google Maps API to visualize it in a browser.
The sample code is right there, and that sample code, geodata.zip, has a README that tells you exactly what to do to run this; it shouldn't be very hard for you to run it and produce a nice visual result. Here's the basic process of what's going to happen. There is a list of the things to retrieve called where.data, which is just a list of the locations, but these are not correct; they don't have GPS coordinates, they're just as typed into a text field by a user. geoload.py starts reading this list and checks to see whether each location is already in the database; this is a restartable process, as I mentioned. It finds the first unretrieved location, goes out and calls a web service, parses the result, and puts it into the database, then goes to the next one, parses that, puts it in the database, and this runs for a while. Then maybe it blows up, and you fix whatever broke, or you start your computer back up, and it runs for a while longer. So, this is a restartable process that, in effect, is adding stuff to this database. It's an SQLite database, and you can use the SQLite Browser to look at it if you like, the way we did in the database chapter. So, you can run it and see what you got, run it some more and see what you got, and debug it by using the SQLite Browser.
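The real logic lives in geoload.py inside geodata.zip; as a simplified sketch of the idea (the service URL here is a placeholder, and the actual file also deals with API keys, errors, and rate limits), it looks something like this:

```python
import sqlite3
import urllib.request, urllib.parse

# Placeholder endpoint: the real geoload.py configures the actual
# geocoding service and any required API key.
serviceurl = 'https://example.com/geojson?'

conn = sqlite3.connect('geodata.sqlite')
cur = conn.cursor()
cur.execute('CREATE TABLE IF NOT EXISTS Locations (address TEXT, geodata TEXT)')

for line in open('where.data'):
    address = line.strip()
    # Restartable: if this address was already retrieved, skip it.
    cur.execute('SELECT geodata FROM Locations WHERE address = ?', (address,))
    if cur.fetchone() is not None:
        continue
    url = serviceurl + urllib.parse.urlencode({'address': address})
    data = urllib.request.urlopen(url).read().decode()  # the web service call
    cur.execute('INSERT INTO Locations (address, geodata) VALUES (?, ?)',
                (address, data))
    conn.commit()  # each row is either fully written or not written at all

conn.close()
```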
Then at some point you've got all of your data, and we've got this application called geodump.py that reads through all of this data and prints some information out, nice summary information. It's really common to want to do this, to get some summary information just for sanity checking, so you don't have to use the SQLite Browser. But geodump.py also writes out a little JavaScript file called where.js, which is then combined with where.html and the Google APIs; the JavaScript puts all these little pins on a map based on whatever data is in the database. So, that's our first end-to-end retrieve, process, and visualize application.
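Here's a sketch of the geodump.py idea, with an assumed JSON shape (the real file parses whatever structure the geocoding service actually returns): loop through the database and emit a JavaScript data file.

```python
import sqlite3
import json

conn = sqlite3.connect('geodata.sqlite')
cur = conn.cursor()

out = open('where.js', 'w')
out.write('myData = [\n')
count = 0
for (address, geodata) in cur.execute('SELECT address, geodata FROM Locations'):
    try:
        js = json.loads(geodata)
        lat = js['lat']  # assumed field names; the real service's JSON
        lng = js['lng']  # is nested, and geodump.py digs the values out
    except (ValueError, KeyError):
        continue  # skip rows that failed to geocode
    if count > 0:
        out.write(',\n')
    out.write(json.dumps([lat, lng, address]))
    count = count + 1
out.write('\n];\n')
out.close()

print(count, 'records written to where.js')
print('Open where.html in a browser to see the data')
```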
So, up next, we're going to show how we can use this to build a very simple search
engine and then run the PageRank algorithm.
