
OSS-Fuzz - continuous fuzzing for open source software.

Shell 11,786 2,568 Updated Jan 7, 2026

Different examples of using Nutch: with Solr, Selenium Hub, and standalone web drivers

Dockerfile 2 Updated Feb 12, 2019

Index of URLs to pdf files all over the internet and scripts

Shell 25 3 Updated May 2, 2023

JPL's File Observatory App for the DARPA Safedocs Program

TypeScript 8 2 Updated Jul 10, 2023

Unofficial user interface for Apache Tika

HTML 10 1 Updated Dec 2, 2025

ExifTool meta information reader/writer

Perl 4,310 403 Updated Dec 27, 2025

Originally exported from code.google.com/p/juniversalchardet

Java 369 70 Updated Nov 22, 2025

The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).

Java 3,499 903 Updated Jan 7, 2026

READONLY: Auto-generated mirror for https://github.com/marvinpinto/actions/tree/master/packages/automatic-releases

775 125 Updated Apr 24, 2024

Convenience Docker images for Apache Tika Server

Shell 228 79 Updated Jan 3, 2026

Towards an open source stack for e-commerce search

Ruby 150 32 Updated Oct 8, 2025

A PDF processor written in Go.

Go 8,346 584 Updated Dec 21, 2025

A vendor- and implementation-independent specification-derived, machine-readable model of PDF.

C 93 9 Updated Nov 24, 2025

A Java library providing support for ASCII, XML and binary property lists.

Java 278 101 Updated Aug 10, 2024

Tabula is a tool for liberating data tables trapped inside PDF files

CSS 7,298 679 Updated Mar 14, 2025

Search Quality Evaluation Tool for Apache Solr & Elasticsearch search-based infrastructures

Java 194 39 Updated Nov 21, 2025

The Validator.nu HTML parser https://about.validator.nu/htmlparser/

Java 62 27 Updated Dec 30, 2025

A Java API to read, write and create MP4 files

Java 2,794 573 Updated Aug 15, 2024

Free and Open Source, Distributed, RESTful Search Engine

Java 75,804 25,753 Updated Jan 7, 2026

Efficient indexing and retrieval of OCR bounding boxes in Solr

Java 22 2 Updated Mar 13, 2019

Plain Java unrar library

Java 305 81 Updated Dec 23, 2025

Tesseract Open Source OCR Engine (main repository)

C++ 71,756 10,453 Updated Jan 1, 2026

AFL-based fuzzing for Java

Java 238 52 Updated Jan 26, 2020

A DropWizard wrapper around Apache Tika.

Java 10 Updated Dec 22, 2016

Automated Adversary Emulation Platform

Python 6,651 1,272 Updated Jan 7, 2026

SQLite JDBC Driver

Java 3,171 656 Updated Jan 6, 2026

Apache Lucene and Solr open-source search software

4,370 2,625 Updated Sep 25, 2024

jsoup: the Java HTML parser, built for HTML editing, cleaning, scraping, and XSS safety.

Java 11,311 2,275 Updated Jan 5, 2026

OCR evaluation brought to you by University of Alicante

HTML 66 27 Updated Sep 1, 2022

Now stored here:

407 91 Updated Dec 11, 2020