Digital Assignment1 - Openrefine DC
Distributed Computing
OpenRefine :
OpenRefine (previously Google Refine) is a powerful tool for working with messy
data: cleaning it; transforming it from one format into another; and extending it
with web services and external data.
OpenRefine always keeps your data private on your own computer until you want to
share or collaborate. Your private data never leaves your computer unless you want
it to. (It works by running a small server on your computer, and you use your web
browser to interact with it.)
Why - Spreadsheets can also be used to refine a dataset, but they are not the best
tool for the job: OpenRefine cleans data in a more systematic, controlled manner.
Historical data often contains blank fields, duplicate records and inconsistent
formats, and OpenRefine helps to resolve such issues.
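To make the kinds of issues named above concrete, here is a minimal Python sketch that counts blank fields and exact duplicate records in a small CSV sample. It is only an illustration of the problem, not how OpenRefine works internally; the function name `summarize_issues` and the sample rows are made up for this example.

```python
import csv
import io

def summarize_issues(csv_text):
    """Count blank cells per column and exact duplicate rows in CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames
    blank_counts = {name: 0 for name in fields}
    seen = set()
    duplicates = 0
    for row in reader:
        # A cell counts as blank if it is missing or only whitespace.
        for name in fields:
            if not (row[name] or "").strip():
                blank_counts[name] += 1
        # A row is a duplicate if an identical row was seen before.
        key = tuple(row[name] for name in fields)
        if key in seen:
            duplicates += 1
        seen.add(key)
    return blank_counts, duplicates

# Hypothetical sample data, invented for illustration only.
sample = "name,city\nAsha,Vellore\nRavi,\nAsha,Vellore\n"
blanks, dups = summarize_issues(sample)
print("blank cells per column:", blanks)
print("duplicate rows:", dups)
```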
When - Data analysis now plays an important role in business. Data analysts
improve decision making, cut costs and identify new business opportunities.
Analysis of data is a process of inspecting, cleaning, transforming, and modelling
data with the goal of discovering useful information, suggesting conclusions, and
supporting decision making. So, to ensure the accuracy of our analysis, we have to
clean our data first.
Cleaning messy data: for example, a text file with semi-structured data can
be edited using transformations, facets and clustering to make the data
cleanly structured.[8]
Transformation of data: converting values to other formats, normalizing and
denormalizing.
Parsing data from web sites: OpenRefine has a URL fetch feature and jsoup
HTML parser and DOM engine.[9]
Adding data to a dataset by fetching it from web services (i.e. services that
return JSON).[10] For example, this can be used for geocoding addresses to
geographic coordinates.[11]
Aligning to Wikidata (formerly Freebase[12]): this involves reconciliation -
mapping string values in cells to entities in Wikidata.
Data Normalization
Column Reorganization
Faceting and Clustering
Tracking Operations
Exporting Data
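As a rough illustration of faceting and clustering, the sketch below counts distinct values in a column (what a text facet summarizes) and groups values that share a normalized key. The key function is loosely modeled on the idea behind OpenRefine's key-collision "fingerprint" method; it is a simplified assumption, not OpenRefine's actual implementation, and the names `fingerprint`, `text_facet` and `cluster` are chosen for this example only.

```python
import string
import unicodedata
from collections import Counter, defaultdict

def fingerprint(value):
    """Build a normalized key: trim, lowercase, strip accents and
    punctuation, then sort and de-duplicate the whitespace-separated tokens."""
    text = value.strip().lower()
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))

def text_facet(values):
    """Count how often each distinct value occurs."""
    return Counter(values)

def cluster(values):
    """Group distinct values whose fingerprints collide."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]

# Hypothetical column values, invented for illustration only.
cities = ["New York", "new york ", "York, New", "Boston", "Boston"]
print(text_facet(cities))
print(cluster(cities))
```

Values that land in the same cluster are candidates for merging into one spelling; deciding which spelling to keep is still a human judgment.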
Why is OpenRefine a better tool?
Strengths:
3. It has a browser-based interface, and so can handle more data efficiently.
4. OpenRefine has strong support for extending data: a user can use it to find
metadata and correlate the dataset with it.
Weakness:
2. Unfortunately, Google has withdrawn support for this tool, making a few of its
features redundant.
OpenRefine is a desktop application in that you download it, install it, and run it on
your own computer. However, unlike most other desktop applications, it runs as a
small web server on your own computer and you point your web browser at that
web server in order to use Refine. So, think of Refine as a personal and private
web application.
Requirements
1. Java JRE/JDK installed (If you are running a 64 bit operating system, then
it's recommended that you install 64 bit Java)
2. A Supported OS: Windows, Linux, macOS
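Before installing, the Java requirement can be checked with a short script. The following is a minimal sketch that assumes the `java` executable is expected on the system PATH; the helper names `find_java` and `java_version_text` are hypothetical and not part of OpenRefine.

```python
import shutil
import subprocess

def find_java():
    """Return the full path of the java executable on PATH, or None."""
    return shutil.which("java")

def java_version_text(java_path):
    """Return the text printed by `java -version`.

    The JVM normally writes this banner to stderr, so stderr is checked first.
    """
    result = subprocess.run(
        [java_path, "-version"], capture_output=True, text=True
    )
    return (result.stderr or result.stdout).strip()

java_path = find_java()
if java_path is None:
    print("No java executable found on PATH; install a JRE or JDK first.")
else:
    print(java_version_text(java_path))
```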
Release Version
OpenRefine requires you to have a working Java JRE, otherwise you will
not be able to start OpenRefine. (The command window will just open and
close quickly after you double-click OpenRefine.exe.)
Download OpenRefine here.
Install it as detailed below for your operating system
o Windows
o macOS
o Linux
As long as OpenRefine is running, you can point your browser at
http://127.0.0.1:3333/ to use it, and you can even use it in several browser
tabs and windows.
If you're running a proxy or get a BindException, you can change the IP
configuration with -i and -p, see Running & Configuration below, or use
refine -help for options.
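A BindException at startup commonly means that the address and port are already in use by another program. As a hedged illustration, the sketch below tests whether a port can be bound on the local machine before starting OpenRefine; `port_is_free` is a hypothetical helper name, and the defaults simply mirror the 127.0.0.1:3333 address mentioned above.

```python
import socket

def port_is_free(host="127.0.0.1", port=3333):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Binding fails when the port is taken or the address is invalid.
            return False
        return True

if port_is_free():
    print("Port 3333 is free on 127.0.0.1.")
else:
    print("Port 3333 is busy; consider choosing another port with -p.")
```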
Windows
Install: Once you have downloaded the .zip file, uncompress it into a folder
wherever you want (such as in C:\Open-Refine).
Run: Run the .exe file in that folder. You should see the Command window in
which OpenRefine runs. By default, the Command window has a black
background and text in monospace font in it.
Shut down: When you need to shut down OpenRefine, switch to that Command
window, and press Ctrl-C. Wait until there's a message that says the shutdown is
complete. That window might close automatically, or you can close it yourself. If
you get asked, "Terminate all batch processes? Y/N", just press Y.
macOS
Install via Disk Image: Once you have downloaded the .dmg file, open it, and
drag the OpenRefine icon into the Applications folder icon (just like you would
normally install Mac applications). If you get a message saying "Open Refine can't
be opened because it is from an unidentified developer" you will need to open
System Preferences and go to "Security and Privacy" and the General tab. Here
you will see a message indicating that "OpenRefine was blocked from opening
because it is not from an identified developer". Click the "Open Anyway" button to
complete the OpenRefine installation. (For details on why you have to do this, see
Issue #2191. Note that in macOS Catalina the message shown has the additional
text "macOS cannot verify that this app is free from malware", but the reason for
the message and the solution are the same.)
Run: To launch OpenRefine, go to the Applications folder and double click the
OpenRefine app. You'll see the OpenRefine app appear in your dock.
Shut down: You can switch to the OpenRefine app (clicking on its icon in the
dock) and invoke its Quit command.
See also: Cannot install on Mac OS X 10.8 (Mountain Lion) - "Google Refine" is
damaged and can't be opened. You should move it to the Trash
If you use Yosemite you will need to install Java for OS X 2014-001 first.
Linux
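Install / Run: Once you have downloaded the tar.gz file, open a shell, extract the
archive, change into the extracted directory, and type
./refine
This will start OpenRefine and open your browser to its starting page.
By default, OpenRefine only accepts connections from the local machine
(127.0.0.1). To make it listen on all network interfaces, for example to share it
with colleagues, start it with
./refine -i 0.0.0.0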
On macOS, the same setting can be made by adding an entry to the Info.plist file
located within the app bundle
(/Applications/OpenRefine.app/Contents/Info.plist):
<key>JVMOptions</key>
<array>
<string>-Drefine.host=0.0.0.0</string>
…
</array>
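If you prefer not to edit the file by hand, the same change can be sketched with Python's standard plistlib module. This is only an illustration working on plist data in memory: the function name `add_host_option` is hypothetical, and the existing `-Xms256M` entry in the sample is an invented placeholder, not a value taken from a real OpenRefine bundle.

```python
import plistlib

HOST_OPTION = "-Drefine.host=0.0.0.0"

def add_host_option(plist_bytes, option=HOST_OPTION):
    """Return plist bytes with `option` present in the JVMOptions array."""
    info = plistlib.loads(plist_bytes)
    # Create the array if it is missing, then add the option only once.
    options = info.setdefault("JVMOptions", [])
    if option not in options:
        options.append(option)
    return plistlib.dumps(info)

# Hypothetical plist content used only to demonstrate the function.
sample = plistlib.dumps({"JVMOptions": ["-Xms256M"]})
updated = add_host_option(sample)
print(plistlib.loads(updated)["JVMOptions"])
```

To apply this to a real file you would read its bytes, pass them through the function, and write the result back, ideally after making a backup copy.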
1. Make sure that a Java JDK or JRE is installed on a supported OS such as
Windows, Linux or macOS.
5. Run the .exe file in that folder. You should see the Command window in which
OpenRefine runs. By default, the Command window has a black background and
monospace text.
6. OpenRefine then opens in your web browser, served by the local web server.
We then create a project by importing a dataset, apply a text filter, delete all
unnecessary data or hide data for security purposes, and export the data after
cleaning so that it is useful for the user.
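The workflow above is performed through OpenRefine's browser interface. Purely to illustrate the logic of those steps (filter on text, drop the matching rows, hide a column, export), here is a small Python sketch on CSV text; `clean_and_export`, the column names and the sample rows are all hypothetical.

```python
import csv
import io

def clean_and_export(csv_text, column, unwanted, hidden_columns=()):
    """Drop rows whose `column` contains `unwanted` (case-insensitive),
    leave out `hidden_columns`, and return the result as CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept_fields = [f for f in reader.fieldnames if f not in hidden_columns]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=kept_fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # The text filter: skip rows that match the unwanted text.
        if unwanted.lower() in (row[column] or "").lower():
            continue
        writer.writerow(row)
    return out.getvalue()

# Hypothetical dataset, invented for illustration only.
raw = "id,status,phone\n1,ok,111\n2,test entry,222\n3,ok,333\n"
print(clean_and_export(raw, "status", "test", hidden_columns=("phone",)))
```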
Conclusion: OpenRefine can be downloaded and installed easily by following the
procedure above. When you need to shut down OpenRefine, switch to the Command
window and press Ctrl-C. Wait until there's a message that says the shutdown is
complete. That window might close automatically, or you can close it yourself. If
you are asked "Terminate all batch processes? Y/N", just press Y.