Browse free open source Text Processing software and projects below.

  • 1
    RefDB is a reference database and bibliography tool for SGML, XML, and LaTeX documents, sort of a Reference Manager or BibTeX for markup languages. It is portable and known to run on Linux, Free/NetBSD, OSX, Solaris, and Windows/Cygwin.
    Downloads: 10 This Week
  • 2
    CNC Header & Footer Convert is free beta software. It converts the post-processed files made by the CAM software of Duct into different header and footer formats for CNC machines. For example, the formats of the Fanuc, Mitsubishi and
    Downloads: 2 This Week
  • 3
    SEGTeX
    LaTeX package for geophysical publications
    Downloads: 1 This Week
  • 4
    Queequeg is an English grammar checker for non-native English speakers.
    Downloads: 1 This Week
  • 5
    TetraPack is a package with Delphi components for the TextTransformer by Dr. Detlef Meyer-Eltz. The components make it easy to parse and transform strings and files, or to build a parse tree from them.
    Downloads: 1 This Week
  • 6
    latexdiff is a Perl script that compares two LaTeX files and marks up significant differences between them (i.e. a diff for LaTeX files). Various options are available for visual markup using standard LaTeX packages such as "color.sty".
    Downloads: 1 This Week
  • 7
    A knowledge management system written in Java, running on JBoss 4.2.3 Server with RichFaces 3.3.0BETA4. It includes file conversion from HTML to PDF and a rich:editor component that needs no special syntax.
    Downloads: 0 This Week
  • 8
    Apolda is a plugin for the GATE framework (see http://sourceforge.net/projects/gate/) that annotates texts with labels of concepts from an arbitrary OWL ontology.
    Downloads: 0 This Week
  • 9
    Bigram applications based on language models produced by SRILM from a Chinese Wikipedia corpus, including a Chinese word segmenter, a word-based (not character-based) Traditional-Simplified Chinese converter, and a Chinese syllable-to-word converter.
    Downloads: 0 This Week
  • 10
    DWDS/Dialing Concordance
    DWDS/Dialing Concordance (DDC) - a collection of index and search tools for corpus linguists
    Downloads: 0 This Week
  • 11
    LaTeX editor with a document structure tree view and project handling. The LaTeX output allows a direct jump to warnings and errors. Project folders provide support for figures and graphs. The editor component includes the usual features such as search/replace and syntax highlighting.
    Downloads: 0 This Week
  • 12
    Embeddable Predictive Text Library
    A C (and JavaScript) library providing predictive text functions. The API is very simple and provides dictionary autocomplete and partial/full matching. Sample cellphone-like examples are included.
    Downloads: 0 This Week
  • 13
    JReferences is a tool to store and retrieve bibliographic references from a file or MySQL database. It reads BibTeXML, DocBook XML and RIS type references, and can output these and BibTeX. A BibTeX-like alternative is also provided for DocBook XML docu
    Downloads: 0 This Week
  • 14
    The Java Text Categorizing Library (JTCL) is a pure Java implementation of libTextCat, which in turn is "a library that was primarily developed for language guessing, a task on which it is known to perform with near-perfect accuracy."
    Downloads: 0 This Week
  • 15
    This project is devoted to the development of natural language processing tools and resources for the Lingala language, which is spoken by tens of millions of people in central Africa.
    Downloads: 0 This Week
  • 16
    OmniHelp is a cross-platform, browser-independent, tri-pane help viewer built in pure JavaScript and CSS with HTML 4. Some functions (such as help embedding) may in the future be in Java, C, or C++; CSH is fully supported. All code is under the LGPL.
    Downloads: 0 This Week
  • 17
    Open Office Server Daemon, based on an older daemon written in Python (oood). Open Office is unstable as a server (memory leaks, not multithreaded, ...); this daemon keeps it working over the long term without requiring any changes to your code.
    Downloads: 0 This Week
  • 18
    Research Description Language (RDL) is an XML application for describing and publishing scientific research efforts. Research Editor (REd) is a tool for editing RDL documents, and exporting them to LaTeX, PDF, etc.
    Downloads: 0 This Week
  • 19
    SmGen
    Verilog Finite State Machine (FSM) Code Generator
    SmGen is a finite state machine (FSM) generator for Verilog. It is not, however, an FSM entry tool. The input is behavioral Verilog with clock boundaries specifically set by the designer. SmGen unrolls this behavioral code and generates an FSM from it in synthesizable Verilog. Clock boundaries are explicitly provided by the designer so there is good control on the expected timing
    Downloads: 0 This Week
  • 20
    Ub3rMath
    Simple math parsing library for C++
    A math parsing library for C++ with a number of powerful features to allow flexible interpretation of mathematical formulas in text form.
    Downloads: 0 This Week
  • 21
    Vocatout is a program for learning vocabulary; you can use it to learn any language. You can also learn capitals, search for voc-files, create voc-files, share your files, scan vocabulary, take a look at your diary, Download
    Downloads: 0 This Week
  • 22
    VoynichReader
    Voynich Manuscript viewer.
    Software for viewing, searching & analysing Prof. Stolfi's interlinear transcription of the Voynich Manuscript.
    Downloads: 0 This Week
  • 23
    WikiPDF is a MediaWiki extension based on Wiki2PDF that adds PDF/LaTeX features to MediaWiki. Wiki2PDF is a Python script to convert multiple articles of a MediaWiki-based wiki (pre-configured to use with www.wikipedia.org) to a single LaTeX or PDF file.
    Downloads: 0 This Week
  • 24
    OCR C++ library. Includes: contour recognition; vectorisation; matrix letter feature recognition; automatic page segmentation and rotation detection; SS3 ASM core; XML base; web-based GUI; 99.6% printed Unicode text recognition; letter base up to 1200 letters.
    Downloads: 0 This Week
  • 25
    crf decoder
    CRF decoder is a simplified version of CRF++ used only for decoding sequential data. It removes the training component and its corresponding code from CRF++, which makes CRF decoder more readable and understandable for beginners.
    Downloads: 0 This Week