Stars
A fast and accurate POS and morphological tagging toolkit (EACL 2014)
Flink Scala API is a thin wrapper on top of the Flink Java API that supports Scala types for serialisation as well as the latest Scala version
Your files ready for Gen AI ✨🚀 AlcheMark is a lightweight, alchemy-inspired PDF-to-Markdown toolkit that transmutes PDF documents into structured Markdown pages, complete with rich metadata and n…
Jan is an open source alternative to ChatGPT that runs 100% offline on your computer.
Automated, smooth Nth-order derivatives of non-uniformly sampled time series data
A Python implementation of image stitching.
Scala ZIO-powered Apache Parquet library
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
DOM to Semantic-Markdown for use with LLMs
A CLI tool to convert your codebase into a single LLM prompt with source tree, prompt templating, and token counting.
LLM-based text extraction from unstructured data such as PDFs, Word documents, and HTML. Transform and cluster the text into your desired format. Less information loss, more interpretation, and faster R&D!
Utilities for latency measurement and reporting
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
Tinfoil Chat - Onion-routed, endpoint-secure messaging system
SNIPER / AutoFocus is an efficient multi-scale object detection training / inference algorithm
Painless relocation of Linux binaries–and all of their dependencies–without containers.
Files to create the figures in the paper "Super-Convergence: Very Fast Training of Residual Networks Using Large Learning Rates"
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
A better build tool for Java, Scala and Kotlin: Simpler than Maven, easier than Gradle, with 3-7x faster dev workflows than other JVM build tools
A RESTish web API for climate change related data 🌍
Generates Fortran, C, and Python header files containing CODATA 2014 physical constants
NWChem: Open Source High-Performance Computational Chemistry
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
