Hanuman

The Invisible Influencers project gathers information about people who fit the common-sense, but not the statutory, definition of "lobbyist," mainly by extracting it from the staff biographies that their employers post on their websites. This repository houses the components related to retrieving lobbyist bios:

chrome_app contains a Chrome app that gives users an interface for annotating lobbying firm websites, indicating which pages are bio pages and which parts of each page hold the person's name and bio text.
data_collection contains the Django app with which the Chrome app communicates.
extraction contains another Django app that generalizes from the user input collected with the Chrome app: it uses machine learning to find more bio pages on a given firm's site based on a hand-collected sample, and to extract the content (names, bios) from those pages. This component depends on the nanospider repo for spidering and on the mlscrape repo for building the machine learning models that recognize pages and relevant content.
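
To illustrate the idea behind the extraction component (learning from a small hand-labeled sample which pages are bios), here is a minimal sketch in Python. It is not the mlscrape API and not necessarily the technique this repository uses: it substitutes a tiny multinomial naive Bayes text classifier, and every name in it (`tokenize`, `BioPageClassifier`, `sample`, `clf`) and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a page's visible text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class BioPageClassifier:
    """Hypothetical multinomial naive Bayes with add-one smoothing.

    Illustration only; assumes training data contains both bio and
    non-bio pages.
    """

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.page_counts = {True: 0, False: 0}

    def train(self, labeled_pages):
        # labeled_pages: (page_text, is_bio) pairs, standing in for the
        # annotations collected through the Chrome app.
        for text, is_bio in labeled_pages:
            self.page_counts[is_bio] += 1
            self.word_counts[is_bio].update(tokenize(text))

    def _log_score(self, tokens, label):
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total = sum(self.word_counts[label].values())
        prior = self.page_counts[label] / sum(self.page_counts.values())
        score = math.log(prior)
        for tok in tokens:
            # Add-one smoothing keeps unseen words from zeroing the score.
            count = self.word_counts[label][tok]
            score += math.log((count + 1) / (total + len(vocab)))
        return score

    def is_bio(self, text):
        tokens = tokenize(text)
        return self._log_score(tokens, True) > self._log_score(tokens, False)


# Made-up sample standing in for a hand-collected set of labeled pages.
sample = [
    ("Jane Doe is a partner who joined the firm after serving as chief of staff", True),
    ("John Roe previously served as counsel to a Senate committee and joined the firm in 2009", True),
    ("Contact us at our Washington office for directions and parking", False),
    ("Our practice areas include tax policy, energy, and health care", False),
]

clf = BioPageClassifier()
clf.train(sample)
```

With a classifier like this, each page returned by the spider could be passed to `clf.is_bio(page_text)` to decide whether it is worth running name and bio extraction on. A real model would need far more labeled pages and would likely use page structure as well as text.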

sunlightlabs/hanuman
