Showing 72 open source projects for "document index"

  • 1
    Search-Index

    A persistent, network-resilient, full-text search library

    Search-Index is a lightweight and fast JavaScript-based search engine that enables full-text search indexing and retrieval for web applications.
    Downloads: 5 This Week
  • 2
    Elasticsearch MCP Server

    A Model Context Protocol (MCP) server implementation

    This MCP server provides a set of tools for interacting with Elasticsearch and OpenSearch, covering document search, index analysis, and cluster management.
    Downloads: 4 This Week
  • 3
    Sphinx

    Main repository for the Sphinx documentation builder

    ...HTML (including Windows HTML Help), LaTeX (for printable PDF versions), ePub, Texinfo, manual pages, plain text. Semantic markup and automatic links for functions, classes, citations, glossary terms and similar pieces of information. Easy definition of a document tree, with automatic links to siblings, parents and children. General index as well as a language-specific module index. Automatic highlighting using the Pygments highlighter. Automatic testing of code snippets, the inclusion of docstrings from Python modules (API docs), and more.
    Downloads: 31 This Week
  • 4
    PageIndex

    Document Index for Vectorless, Reasoning-based RAG

    ...The project includes example notebooks, scripts for tree generation and search, and support for multiple document formats including PDF and markdown, with tools designed to preserve context and semantic boundaries.
    Downloads: 2 This Week
  • 5
    Sonic

    Fast, lightweight & schema-less search backend

    Sonic is a super fast and lightweight, schema-less search backend that can be used in place of super-heavy and full-featured search backends like Elasticsearch. It is able to normalize language search queries, auto-complete search queries and offer the most relevant results. Being an identifier index rather than a document index, when queried it provides IDs that can be used to refer to matched documents in an external database.
    Downloads: 3 This Week
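The identifier-index idea behind Sonic can be illustrated with a small inverted index that maps each term to the set of IDs containing it, while the documents themselves stay in an external store. This is a minimal Python sketch under stated assumptions: a plain dict stands in for the external database, normalization is only lowercasing and whitespace splitting, and multi-term queries use AND semantics. The class and method names (`IdentifierIndex`, `push`, `query`) are hypothetical and are not Sonic's protocol or storage engine.

```python
from collections import defaultdict


class IdentifierIndex:
    """Toy inverted index that stores only object IDs, never document bodies."""

    def __init__(self) -> None:
        # term -> set of object IDs whose text contains that term
        self._postings: dict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _tokenize(text: str) -> list[str]:
        # Naive normalization (assumption): lowercase, split on whitespace.
        return text.lower().split()

    def push(self, object_id: str, text: str) -> None:
        for term in self._tokenize(text):
            self._postings[term].add(object_id)

    def query(self, text: str) -> list[str]:
        # IDs that contain every query term (AND semantics), sorted for stable output.
        terms = self._tokenize(text)
        if not terms:
            return []
        ids = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(ids)


# Hypothetical external database; the index never holds these bodies.
documents = {
    "doc:1": "Fast lightweight search backend",
    "doc:2": "Heavy full-featured search engine",
    "doc:3": "Lightweight key-value store",
}

index = IdentifierIndex()
for doc_id, body in documents.items():
    index.push(doc_id, body)

matched_ids = index.query("lightweight search")
# The caller resolves the returned IDs against its own store.
matched_docs = [documents[i] for i in matched_ids]
```

Keeping only IDs keeps the index small; the trade-off is a second lookup in the caller's own database to fetch the matched records.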
  • 6
    RAG API

    ID-based RAG FastAPI: Integration with Langchain and PostgreSQL

    rag_api is an open-source REST API for building Retrieval-Augmented Generation (RAG) systems using LLMs like GPT. It lets users index documents, search semantically, and retrieve relevant content for use in generative AI workflows. Designed for rapid prototyping, it is ideal for chatbot development, document assistants, and knowledge-based LLM apps.
    Downloads: 7 This Week
  • 7
    LlamaParse

    Parse files for optimal RAG

    LlamaParse is a GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents). Load in 160+ data sources and data formats, from unstructured and semi-structured to structured data (APIs, PDFs, documents, SQL, etc.). Store and index your data for different use cases. Integrate with 40+ vector stores, document stores, graph stores, and SQL database providers.
    Downloads: 5 This Week
  • 8
    Papermerge

    Open Source Document Management System for Digital Archives

    Papermerge is an open source document management system (DMS) primarily designed for archiving and retrieving your digital documents. Instead of having piles of paper documents all over your desk, office, or drawers, you can quickly scan them and configure your scanner to upload directly to Papermerge DMS. Store, organize, and index scanned documents in PDF, JPEG, and TIFF formats.
    Downloads: 15 This Week
  • 9
    bleve

    A modern text indexing library for go

    Import one package, build an index with three lines of code, and query for documents with another three lines. Bleve includes general-purpose analyzers as well as pre-built text analyzers for the following languages: Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Persian, Portuguese, Romanian, Russian, Sorani, Spanish, Swedish, Thai, and Turkish.
    Downloads: 2 This Week
  • 10
    ArangoDB JavaScript Driver

    The official ArangoDB JavaScript driver

    ArangoJS is the official JavaScript client for ArangoDB, a multi-model NoSQL database that supports document, key-value, and graph data models. This client provides a powerful yet simple API to interact with ArangoDB from Node.js or browser-based applications.
    Downloads: 3 This Week
  • 11
    Paperless-ngx

    A community-supported supercharged version of paperless

    Paperless-ngx is a community-supported open-source document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.
    Downloads: 21 This Week
  • 12
    SimpleMem

    SimpleMem: Efficient Lifelong Memory for LLM Agents

    ...It provides easy-to-use APIs for storing structured memory entries, querying those memories using semantic search, and retrieving context to augment prompt inputs for downstream processing. Unlike monolithic systems where memory management is ad hoc, SimpleMem formalizes a memory lifecycle (write, index, retrieve, refine) so applications can handle user history, document collections, or dynamic contextual state systematically. It supports customizable embedding models, efficient vector indexes, and relevance weighting, making it practical for building assistants, personal agents, or domain-specific retrieval systems that need persistent knowledge.
    Downloads: 2 This Week
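The write, index, retrieve, refine lifecycle can be sketched as a tiny in-memory store. This is a toy Python model, not SimpleMem's API: bag-of-words term counts stand in for a learned embedding model, cosine similarity scaled by a per-entry weight stands in for relevance weighting, and `MemoryStore`, `write`, `retrieve`, and `refine` are hypothetical names.

```python
import math
from collections import Counter
from dataclasses import dataclass


def embed(text: str) -> Counter:
    # Stand-in for an embedding model: a bag-of-words term-count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class MemoryEntry:
    text: str
    vector: Counter
    weight: float = 1.0  # relevance weight, adjusted by refine()


class MemoryStore:
    def __init__(self) -> None:
        self.entries: list[MemoryEntry] = []

    def write(self, text: str) -> None:
        # Write + index: store the entry together with its vector.
        self.entries.append(MemoryEntry(text, embed(text)))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Score = similarity scaled by relevance weight; drop non-matches.
        q = embed(query)
        scored = [(cosine(q, e.vector) * e.weight, e.text) for e in self.entries]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [text for _, text in scored[:k]]

    def refine(self, text: str, factor: float) -> None:
        # Refine: boost (factor > 1) or demote (factor < 1) a stored memory.
        for e in self.entries:
            if e.text == text:
                e.weight *= factor


store = MemoryStore()
for fact in ["user prefers dark mode", "user lives in Berlin", "project deadline is friday"]:
    store.write(fact)

before = store.retrieve("what mode does the user prefer")
store.refine("user lives in Berlin", 3.0)
after = store.retrieve("what mode does the user prefer")
```

Swapping `embed` for a real embedding model and the linear scan for a vector index would be the natural next steps; the lifecycle itself stays the same.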
  • 13
    elasticsearc-php

    PHP low-level client for Elasticsearch

    Introducing the Elasticsearch DSL library, which provides an object-oriented query builder for the Elasticsearch bundle and the elasticsearch-php client. You can easily build any Elasticsearch query and transform it into an array. This agnostic package is a lightweight wrapper on top of the Elasticsearch PHP client. Its main goal is to allow easier structuring of queries and indices in your application. It does not aim to hide or replace the functionality of the Elasticsearch PHP client. Feature complete, object...
    Downloads: 5 This Week
  • 14
    goquery

    A little like that j-thing, only in Go

    ...Since the net/html parser returns nodes, and not a full-featured DOM tree, jQuery's stateful manipulation functions (like height(), css(), and detach()) have been left off. Also, because the net/html parser requires UTF-8 encoding, so does goquery: it is the caller's responsibility to ensure that the source document provides UTF-8 encoded HTML. See the wiki for various options to do this. Syntax-wise, it is as close as possible to jQuery, with the same function names when possible, and that warm and fuzzy chainable interface. jQuery being the ultra-popular library that it is, I felt that a similar HTML-manipulating library was better off following its API than starting anew (in the same spirit as Go's fmt package), even though some of its methods are less than intuitive (looking at you, index()...).
    Downloads: 4 This Week
  • 15
    Cherche

    Neural Search

    Cherche allows the creation of efficient neural search pipelines using retrievers and pre-trained language models as rankers. Cherche's main strength is its ability to build diverse and end-to-end pipelines from lexical matching, semantic matching, and collaborative filtering-based models. Cherche provides modules dedicated to summarization and question answering. These modules are compatible with Hugging Face's pre-trained models and fully integrated into neural search pipelines. Search is...
    Downloads: 2 This Week
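A retriever-then-ranker pipeline splits search into a cheap candidate stage and a more expensive re-scoring stage. The sketch below shows that two-stage shape in Python under loud assumptions: counting shared terms stands in for Cherche's lexical retrievers, Jaccard similarity stands in for a pre-trained language-model ranker, and `lexical_retriever`, `jaccard_ranker`, and `search` are hypothetical names, not Cherche's API.

```python
from typing import Callable

Document = dict[str, str]


def lexical_retriever(docs: list[Document], query: str, k: int) -> list[Document]:
    # Stage 1: cheap candidate generation by counting distinct shared query terms.
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(d["text"].lower().split())), d) for d in docs]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [d for _, d in scored[:k]]


def jaccard_ranker(query: str, text: str) -> float:
    # Stand-in for a pre-trained language-model ranker.
    a, b = set(query.lower().split()), set(text.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def search(docs: list[Document], query: str,
           ranker: Callable[[str, str], float], k: int = 10) -> list[str]:
    candidates = lexical_retriever(docs, query, k)
    # Stage 2: the costlier ranker re-orders only the shortlisted candidates.
    reranked = sorted(candidates, key=lambda d: ranker(query, d["text"]), reverse=True)
    return [d["id"] for d in reranked]


docs = [
    {"id": "b", "text": "lexical search with neural rankers and many other words here"},
    {"id": "a", "text": "neural search pipelines"},
    {"id": "c", "text": "cooking recipes"},
]
results = search(docs, "neural search", jaccard_ranker)
```

Because the ranker only sees what the retriever returns, `k` bounds recall: with `k=1` the pipeline can only return the first candidate, however the ranker would have scored the others.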
  • 16
    PaperQA2

    High accuracy RAG for answering questions from scientific documents

    PaperQA2 is a package for doing high-accuracy retrieval augmented generation (RAG) on PDFs or text files, with a focus on the scientific literature. See our 2024 paper for examples of PaperQA2's superhuman performance in scientific tasks like question answering, summarization, and contradiction detection. In this example we take a folder of research paper PDFs, magically get their metadata, including citation counts and a retraction check, then parse and cache PDFs into a...
    Downloads: 5 This Week
  • 17
    AnyTXT Searcher

    A Powerful Desktop Full-Text Search Engine, Just Like Local Google.

    AnyTXT Searcher is a powerful full-text search engine for files: a desktop search application for fast document retrieval. It works like a Google search engine for your local disk, is much faster than Windows Search, and is your ideal desktop engine for full-text search of file contents. It has a powerful built-in document parsing engine, which extracts the text of commonly used file formats without installing any other software, and combines it with the built-in high-speed indexing system to store the metadata of the...
    Downloads: 4,741 This Week
  • 18
    ccls

    C/C++/ObjC language server supporting cross references & hierarchies

    ...It starts indexing the whole project (including subprojects, if any) in parallel when you open the first file, while the main thread can serve requests before indexing is complete. Saving files incrementally updates the index. Hierarchies: call (caller/callee), inheritance (base/derived), and member hierarchies. Symbol rename. Document symbols and approximate search of workspace symbols. Hover information. Diagnostics and code actions (clang FixIts). Semantic highlighting and preprocessor skipped regions.
    Downloads: 7 This Week
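Incremental index updates on save can be sketched as a two-way symbol map in which saving a file re-indexes only that file. This Python sketch illustrates the idea and is not ccls's implementation: ccls parses C/C++ with clang, whereas here a crude regex that recognizes a few built-in return types stands in for the parser, and `SymbolIndex`, `update_file`, and `extract_symbols` are hypothetical names.

```python
import re
from collections import defaultdict

# Crude stand-in for a C/C++ front end: matches "<type> <name>(" for a few built-in types.
FUNC_DEF = re.compile(r"\b(?:void|int|char|float|double)\s+(\w+)\s*\(")


def extract_symbols(source: str) -> set[str]:
    return set(FUNC_DEF.findall(source))


class SymbolIndex:
    def __init__(self) -> None:
        self.by_file: dict[str, set[str]] = {}
        self.by_symbol: dict[str, set[str]] = defaultdict(set)

    def update_file(self, path: str, source: str) -> None:
        # Incremental update on save: forget only this file's previous symbols.
        for sym in self.by_file.get(path, set()):
            self.by_symbol[sym].discard(path)
            if not self.by_symbol[sym]:
                del self.by_symbol[sym]
        # Re-index just this file; entries from other files are untouched.
        symbols = extract_symbols(source)
        self.by_file[path] = symbols
        for sym in symbols:
            self.by_symbol[sym].add(path)

    def lookup(self, symbol: str) -> list[str]:
        return sorted(self.by_symbol.get(symbol, set()))


symbol_index = SymbolIndex()
symbol_index.update_file("a.c", "int parse(char *s) { return 0; }\nvoid helper(void) {}\n")
symbol_index.update_file("b.c", "int main(void) { return parse(0); }\n")
# Saving a.c after renaming helper() to cleanup() re-indexes only a.c.
symbol_index.update_file("a.c", "int parse(char *s) { return 0; }\nvoid cleanup(void) {}\n")
```

The key step is removing a file's previous symbols before adding its new ones, so stale entries such as a renamed function do not linger in the index.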
  • 19
    LogicalDOC Document Management - DMS

    smart and open source document management system

    LogicalDOC is both a document management and a collaboration system. The software is loaded with many functions and allows organizing, indexing, retrieving, controlling, and distributing important business documents securely for any organization and individual. Gone are the days when companies used paper-based processes such as printing, mailing, and manual filing of paper documents; our document management system replaces all of this with electronic procedures that allow your organization to reduce costs significantly. ...
    Downloads: 284 This Week
  • 20
    Ladle

    Develop, test and document your React story components faster

    Ladle is a drop-in alternative to Storybook. It is a tool for developing and testing your React components in an environment that's isolated and faster than most real-world applications. Ladle also creates an index of your components, so you can easily test them through tools like Playwright. Ladle is compatible with the Component Story Format and Controls. It supports links, themes, right-to-left, source code, a11y (axe), TypeScript, and Flow out of the box. Powered by Vite, using esbuild,...
    Downloads: 0 This Week
  • 21
    marqo

    Tensor search for humans

    A tensor-based search and analytics engine that seamlessly integrates with your applications, websites, and workflows. Marqo is a versatile and robust search and analytics engine that can be integrated into any website or application. Due to horizontal scalability, Marqo provides lightning-fast query times, even with millions of documents. Marqo helps you configure deep-learning models like CLIP to pull semantic meaning from images. It can seamlessly handle image-to-image, image-to-text and...
    Downloads: 2 This Week
  • 22

    Create Index from PDF

    PDF Indexing Script: Searches PDF for words, records page numbers

    This Python script helps automate the process of creating an index for a PDF document. It reads a list of words from a text file, searches through each page of the PDF, and records the page numbers where each word appears. The script accounts for the first 24 pages of the PDF that use Roman numerals (i-xxiv) and adjusts the page numbers accordingly. It is designed to be case-insensitive, ensuring that variations in capitalization do not affect the search results.
    Downloads: 0 This Week
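The indexing steps this script describes (read a word list, scan each page, record page numbers, label the first 24 pages with Roman numerals, match case-insensitively) can be sketched in Python. This is not the project's script: it assumes page text has already been extracted from the PDF into a list of strings (a PDF library would do that in practice) and that words should match as whole words; `build_index`, `page_label`, and `to_roman` are hypothetical names.

```python
import re

ROMAN_PAGES = 24  # front-matter pages labeled i-xxiv, per the project description

# Value/symbol pairs for lowercase Roman numerals, largest first.
_NUMERALS = [
    (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
    (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
    (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
]


def to_roman(n: int) -> str:
    out = []
    for value, symbol in _NUMERALS:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def page_label(physical_index: int, roman_pages: int = ROMAN_PAGES) -> str:
    # physical_index is 0-based; the first page after the front matter is "1".
    if physical_index < roman_pages:
        return to_roman(physical_index + 1)
    return str(physical_index - roman_pages + 1)


def build_index(words: list[str], pages: list[str]) -> dict[str, list[str]]:
    index: dict[str, list[str]] = {}
    for word in words:
        # Whole-word, case-insensitive match (the whole-word rule is an assumption).
        pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
        index[word] = [page_label(i) for i, text in enumerate(pages) if pattern.search(text)]
    return index


# Hypothetical extracted page text: 24 front-matter pages, then the body.
pages = ["Preface: why indexing matters"] + ["(blank)"] * 23 + [
    "Chapter 1 introduces Indexing",
    "Nothing relevant here",
    "INDEXING revisited",
]
word_index = build_index(["indexing", "preface"], pages)
```

With 24 front-matter pages, the 25th physical page is labeled `1`, which is the Roman-numeral adjustment the description mentions.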
  • 23
    WA2L/WinTools

    End User Tools for Windows.

    Some end-user utilities for the Windows operating system. The utilities can be called through the "Send To" context menu when right-clicking a file or directory in Explorer, or through the Windows "Start Menu". The package can be 'installed' as a portable package and does not need admin rights. ◆ UTILITIES - https://sourceforge.net/projects/wa2l-wintools/files/ → README ◆ FEATURES - https://wa2l-wintools.sourceforge.net/man1/wintools.1.html -...
    Downloads: 6 This Week
  • 24

    xsd2pgschema

    Relational database replication tool based on XML Schema

    xsd2pgschema is a Java application suite that converts XML Schema 1.1 (hierarchical data model) to PostgreSQL DDL (relational data model) and supports migrating XML data into PostgreSQL based on the XML Schema without loss of information content. It also supports full-text indexing via either Apache Lucene or Sphinx Search utilizing the relational data model. File conversion from XML to CSV, TSV, or JSON is possible, as is mapping XML Schema to JSON Schema. Obtained PostgreSQL...
    Downloads: 3 This Week
  • 25

    joy of text

    Editor with scripting language, security features & system interfaces.

    Jot was developed as a general-purpose editor for large CAD files. Its command-driven UI requires no mode switching and hence fewer keystrokes to get a typical job done. It is particularly useful for checking and cross-referencing between several source, intermediate, and output files, a common requirement for CAD work. But jot's usefulness doesn't stop there. Its sophisticated search features can, for example, be used for interactive data mining or automating the extraction of...
    Downloads: 1 This Week