langchain-hs-0.0.2.0: Haskell implementation of Langchain
Copyright: (c) 2025 Tushar Adhatrao
License: MIT
Maintainer: Tushar Adhatrao <[email protected]>
Stability: experimental
Safe Haskell: None
Language: Haskell2010

Langchain.DocumentLoader.Core

Description

Implementation of LangChain's document loading abstraction, providing:

  • Document representation with content and metadata
  • Typeclass for loading/splitting documents from various sources
  • Integration with text splitting capabilities

For more information on document loaders in the original LangChain library, see: https://python.langchain.com/docs/concepts/document_loaders/

Example usage:

-- Create a document
doc :: Document
doc = Document "Sample content" (fromList [("source", String "example.txt")])

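-- Hypothetical file loader instance.
-- Assumes: import qualified Data.Text as T
--          import qualified Data.Text.IO as TIO
data FileLoader = FileLoader FilePath

instance BaseLoader FileLoader where
  load (FileLoader path) = do
    -- Read the file as Text (assuming pageContent is Text, not String)
    content <- TIO.readFile path
    return $ Right [Document content (fromList [("source", String (T.pack path))])]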

Test case patterns:

>>> mempty :: Document
Document {pageContent = "", metadata = fromList []}
>>> doc1 = Document "Hello" (fromList [("a", Number 1)])
>>> doc2 = Document " World" (fromList [("b", Bool True)])
>>> doc1 <> doc2
Document {pageContent = "Hello World", metadata = fromList [("a",Number 1.0),("b",Bool True)]}

Document Representation

data Document

Document container with content and metadata. Used for storing loaded data and associated metadata like source URLs or page numbers.

Example:

>>> Document "Hello World" (fromList [("source", String "example.txt")])
Document {pageContent = "Hello World", metadata = fromList [("source",String "example.txt")]}

Constructors

Document 

Fields

  • pageContent
  • metadata

Instances

Monoid Document

The Monoid instance provides an empty document:

>>> mempty :: Document
Document {pageContent = "", metadata = fromList []}
Instance details

Defined in Langchain.DocumentLoader.Core

Semigroup Document

The Semigroup instance combines both content and metadata:

>>> let doc1 = Document "A" (fromList [("x", Number 1)])
>>> let doc2 = Document "B" (fromList [("y", Bool True)])
>>> doc1 <> doc2
Document {pageContent = "AB", metadata = fromList [("x",Number 1.0),("y",Bool True)]}
Instance details

Defined in Langchain.DocumentLoader.Core

Show Document
Instance details

Defined in Langchain.DocumentLoader.Core

Eq Document
Instance details

Defined in Langchain.DocumentLoader.Core
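
The Semigroup and Monoid behaviour shown above can be sketched with a simplified stand-in type. This is a minimal illustration, not the library's source: `Doc`, `docContent`, and `docMeta` are hypothetical names, content is a plain String, and metadata values are Strings rather than aeson Values. It also assumes metadata is merged with the map's own `<>`, which is a left-biased union, so on a duplicate key the left document's value is kept.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for Document (String content, String metadata values).
data Doc = Doc
  { docContent :: String
  , docMeta    :: Map.Map String String
  } deriving (Show, Eq)

-- Concatenate content; union metadata (left-biased on duplicate keys).
instance Semigroup Doc where
  Doc c1 m1 <> Doc c2 m2 = Doc (c1 <> c2) (m1 <> m2)

-- The empty document is the identity for (<>).
instance Monoid Doc where
  mempty = Doc "" Map.empty

page1, page2 :: Doc
page1 = Doc "Hello" (Map.fromList [("source", "a.txt"), ("page", "1")])
page2 = Doc " World" (Map.fromList [("source", "b.txt")])

main :: IO ()
main = do
  let merged = mconcat [page1, page2]
  putStrLn (docContent merged)        -- prints: Hello World
  print (Map.toList (docMeta merged)) -- prints: [("page","1"),("source","a.txt")]
```

Because both documents carry a "source" key, the merged metadata keeps only the first one. If that matters for a given loader, the pages may be better kept as separate documents than merged.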

Loading Interface

class BaseLoader m where

Typeclass for document loading implementations. Implementations should define how to:

  1. Load full documents with load
  2. Load and split content with loadAndSplit

Example instance for text files:

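-- Requires FlexibleInstances, since FilePath is a synonym for String.
-- Assumes: import qualified Data.Text as T
--          import qualified Data.Text.IO as TIO
instance BaseLoader FilePath where
  load path = do
    -- Read the file as Text (assuming pageContent is Text, not String)
    content <- TIO.readFile path
    return $ Right [Document content (fromList [("source", String (T.pack path))])]

  loadAndSplit path = do
    content <- TIO.readFile path
    -- Split the raw text into chunks using the default splitter options
    return $ Right (splitText defaultCharacterSplitterOps content)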
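
Since load returns an Either, a loader can report a missing or unreadable file as a Left instead of letting the exception escape. The sketch below shows one way to do that with a newtype wrapper, which also avoids the FlexibleInstances requirement of an instance on FilePath. It is self-contained and does not use the library: `Loader`, `loadDocs`, `TextFileLoader`, and `Doc` are hypothetical stand-ins for BaseLoader, load, a concrete loader, and Document.

```haskell
import Control.Exception (IOException, try)
import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for Document (String content, String metadata values).
data Doc = Doc
  { docContent :: String
  , docMeta    :: Map.Map String String
  } deriving (Show, Eq)

-- Hypothetical stand-in for the load half of BaseLoader.
class Loader a where
  loadDocs :: a -> IO (Either String [Doc])

-- A newtype keeps the instance head Haskell2010-compliant.
newtype TextFileLoader = TextFileLoader FilePath

instance Loader TextFileLoader where
  loadDocs (TextFileLoader path) = do
    -- Force the whole file inside 'try' so lazy read errors are caught too.
    result <- try (readFile path >>= \s -> length s `seq` pure s)
                :: IO (Either IOException String)
    pure $ case result of
      Left err      -> Left ("could not read " ++ path ++ ": " ++ show err)
      Right content -> Right [Doc content (Map.fromList [("source", path)])]

main :: IO ()
main = do
  writeFile "loader-demo.txt" "Sample content"
  loaded <- loadDocs (TextFileLoader "loader-demo.txt")
  print loaded
  missing <- loadDocs (TextFileLoader "no-such-file.txt")
  putStrLn (either (const "missing file reported as Left") (const "unexpected Right") missing)
```

Only IOException is caught here, so unrelated exceptions still propagate.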

Methods

load :: m -> IO (Either String [Document])

Load all documents from the source.

loadAndSplit :: m -> IO (Either String [Text])

Load all documents and split their content into text chunks using recursiveCharacterSpliter.
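
Conceptually, loadAndSplit is load followed by a splitter applied to each document's content, with any load error passed through unchanged. The sketch below illustrates that composition under simplifying assumptions: page contents are plain Strings, and a naive blank-line paragraph splitter stands in for the library's character splitter. `splitParagraphs`, `splitLoaded`, and `loadAndSplitWith` are hypothetical names, not part of the library.

```haskell
import Data.List (intercalate)

-- Hypothetical splitter: break text into paragraphs at blank lines.
splitParagraphs :: String -> [String]
splitParagraphs = map (intercalate "\n") . filter (not . null) . go . lines
  where
    go ls = case break null ls of
      (para, [])       -> [para]
      (para, _ : rest) -> para : go rest

-- Apply a splitter to every loaded page; a Left (load error) is untouched.
splitLoaded :: (String -> [String]) -> Either String [String] -> Either String [String]
splitLoaded splitter = fmap (concatMap splitter)

-- Run a load action, then split whatever it returned.
loadAndSplitWith
  :: (String -> [String])
  -> IO (Either String [String])
  -> IO (Either String [String])
loadAndSplitWith splitter loadAction = fmap (splitLoaded splitter) loadAction

main :: IO ()
main = do
  let fakeLoad :: IO (Either String [String])
      fakeLoad = pure (Right ["first paragraph\n\nsecond paragraph", "third"])
  chunks <- loadAndSplitWith splitParagraphs fakeLoad
  print chunks -- prints: Right ["first paragraph","second paragraph","third"]
```

Passing the splitter as an argument keeps the loading and splitting concerns separate, so a different chunking strategy can be swapped in without touching the loader.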