Experiments and tutorials from the wide field of cultural analytics based on textual and multimodal corpora.
In this tutorial you will learn to:
- web/screen scrape relatively unstructured data from Wikipedia
- transform unstructured data into tabular data to facilitate processing with Python
- create graph data from your data to visualize it as networks
- export Python-created data to use it with JavaScript visualization libraries such as D3.js
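The pipeline sketched in the list above (scrape loosely structured markup, tabulate it, export it for D3.js) can be illustrated in a few lines. The HTML snippet and the page name `Cultural_analytics` below are illustrative placeholders, not material from the tutorial itself:

```python
# Sketch: turn scraped, loosely structured markup into tabular edge records
# and export them as JSON for a D3.js network visualization.
# The HTML below is a hypothetical stand-in for a scraped Wikipedia page.
from html.parser import HTMLParser
import json

HTML = """
<ul>
  <li><a href="/wiki/Data_science">Data science</a></li>
  <li><a href="/wiki/Digital_humanities">Digital humanities</a></li>
</ul>
"""

class WikiLinkParser(HTMLParser):
    """Collect internal /wiki/ links as {source, target} edge records."""
    def __init__(self, source):
        super().__init__()
        self.source = source
        self.edges = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href", "")
        if tag == "a" and href.startswith("/wiki/"):
            self.edges.append({"source": self.source,
                               "target": href[len("/wiki/"):]})

parser = WikiLinkParser("Cultural_analytics")
parser.feed(HTML)

# D3.js force layouts typically consume a plain list of {source, target} records.
print(json.dumps(parser.edges))
```

A list of such records can also be loaded directly into a pandas data frame for further processing before export.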
What you should already know:
- a little Python 3
- some minor HTML
- some JavaScript if you want to understand the web-based visualization at the end of the tutorial
This notebook comes with a requirements.txt file to facilitate the installation of package dependencies. To install the dependencies, launch the following command from the command line before you start the notebook:
pip3 install -r requirements.txt
In case you have Podman installed, there is a Dockerfile available in the container directory.
Docker might work as well but has not been tested.
Make sure that you are running the following commands from within the container directory.
podman build -t ibi_runtime .
This will take some time as it will build everything from scratch.
After the creation of the image, you are set to run the container by executing the following command:
podman run -p 127.0.0.1:8888:8888 -p 127.0.0.1:8000:8000 -i -t localhost/ibi_runtime
This command will ensure that you can access the Jupyter notebook under http://localhost:8888/notebooks/WikipediaTest.ipynb and, on port 8000, the web server that can be launched from within the notebook. The hostname localhost resolves to the address 127.0.0.1.
Furthermore, the command will open a terminal connection to the container to display all Jupyter log output, and it will launch the notebook automatically.
To stop the container, you can use the shut down entry from Jupyter's File menu or activate the terminal running Jupyter and press CTRL+C. You will then be asked immediately if you want to shut down the Jupyter server. Enter y and Jupyter will shut down.
After the server has been stopped, the container will exit as well.
Attention: when the container exits, you will also lose all data that has been created from within the notebook!
This notebook eventually evolved into a TPDL publication. ATTENTION! The notebook is no longer maintained here. It has been moved to a separate repository.
- In this tutorial, you will learn to read metadata from an OAI-PMH data provider and how to convert the retrieved data from Dublin Core to a pandas data frame.
- Furthermore, you will carry out some basic data analysis on your data in order to find out if the data is corrupt or unclean. Based on an example, you will clean some aspects of your data using techniques borrowed from machine learning.
- Finally, you will visualize data with the help of a network graph.
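The first step listed above, converting harvested Dublin Core metadata into tabular form, can be sketched as follows. The XML is an inline example shaped like an OAI-PMH ListRecords response, not a live answer from any particular data provider:

```python
# Sketch: extract Dublin Core fields from an OAI-PMH ListRecords response.
# OAI_XML is a hand-written example, not a real harvested response.
import xml.etree.ElementTree as ET

OAI_XML = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>An Example Title</dc:title>
        <dc:creator>Doe, Jane</dc:creator>
      </oai_dc:dc>
    </metadata></record>
  </ListRecords>
</OAI-PMH>"""

NS = {"oai": "http://www.openarchives.org/OAI/2.0/",
      "dc": "http://purl.org/dc/elements/1.1/"}

root = ET.fromstring(OAI_XML)
records = []
for rec in root.iterfind(".//oai:record", NS):
    # One flat dict per record; repeated dc elements would need a list instead.
    records.append({
        "title": rec.findtext(".//dc:title", default="", namespaces=NS),
        "creator": rec.findtext(".//dc:creator", default="", namespaces=NS),
    })

print(records)
```

A pandas data frame is then one call away: `pandas.DataFrame(records)`.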
In this tutorial, you will learn how to read from an unstructured and a structured dataset, create a data frame from this raw data, and visualize characteristics of the data in order to find out whether the titles held by a research library are truly neutral from a sentiment analysis perspective, and how they compare to a sample of books sold by Amazon.
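The basic idea of scoring titles for sentiment can be sketched with a toy lexicon-based approach. The word lists and the scoring rule below are purely illustrative assumptions, not the lexicon or method used in the tutorial:

```python
# Toy lexicon-based sentiment scoring for titles.
# POSITIVE/NEGATIVE are illustrative word lists, not a real sentiment lexicon.
POSITIVE = {"great", "progress", "hope", "success"}
NEGATIVE = {"crisis", "war", "decline", "failure"}

def title_polarity(title):
    """Score = (#positive - #negative) / #tokens; 0.0 reads as neutral."""
    tokens = [t.strip(".,;:!?").lower() for t in title.split()]
    if not tokens:
        return 0.0
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return score / len(tokens)

for t in ["The Great Progress of Science",
          "War and Crisis in Europe",
          "An Introduction to Statistics"]:
    print(t, title_polarity(t))
```

Averaging such scores over a whole catalogue, and over a comparison sample, is what makes a neutrality claim testable; in practice one would use an established sentiment library rather than a hand-rolled lexicon.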
