Copyright | (c) 2025 Tushar Adhatrao |
---|---|
License | MIT |
Maintainer | Tushar Adhatrao <[email protected]> |
Stability | experimental |
Safe Haskell | None |
Language | Haskell2010 |
Langchain.DocumentLoader.Core
Description
Implementation of LangChain's document loading abstraction, providing:
- Document representation with content and metadata
- Typeclass for loading/splitting documents from various sources
- Integration with text splitting capabilities
For more information on document loaders in the original LangChain library, see: https://python.langchain.com/docs/concepts/document_loaders/
Example usage:
    -- Create a document
    doc :: Document
    doc = Document "Sample content" (fromList [("source", String "example.txt")])

    -- Hypothetical file loader instance
    data FileLoader = FileLoader FilePath

    instance BaseLoader FileLoader where
      load (FileLoader path) = do
        content <- readFile path
        return $ Right [Document content (fromList [("source", String (T.pack path))])]
Test case patterns:
>>> mempty :: Document
Document {pageContent = "", metadata = fromList []}
>>> doc1 = Document "Hello" (fromList [("a", Number 1)])
>>> doc2 = Document " World" (fromList [("b", Bool True)])
>>> doc1 <> doc2
Document {pageContent = "Hello World", metadata = fromList [("a", Number 1), ("b", Bool True)]}
Document Representation
Document container with content and metadata. Used for storing loaded data and associated metadata like source URLs or page numbers.
Example:
>>> Document "Hello World" (fromList [("source", String "example.txt")])
Document {pageContent = "Hello World", metadata = fromList [("source",String "example.txt")]}
Constructors
Document | |
Instances
Monoid Document | Monoid instance provides the empty document (mempty) |
Semigroup Document | Semigroup instance combines both content and metadata |
Show Document | |
Eq Document | |
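The combining behaviour described by the Semigroup and Monoid instances can be sketched in a self-contained way. This is only an illustration, not the library's source: the local Value type is a stand-in (the constructor names String, Number and Bool in the examples above look like aeson's Value, which is an assumption here), and the use of a left-biased map union for metadata is also an assumption, since the documentation does not say how duplicate keys are resolved.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Text (Text)

-- Stand-in for the metadata value type (assumption: the real module
-- may use aeson's Value instead of this local type).
data Value = String Text | Number Double | Bool Bool
  deriving (Show, Eq)

-- Local mirror of the Document record shown in the doctests above.
data Document = Document
  { pageContent :: Text
  , metadata    :: Map Text Value
  } deriving (Show, Eq)

-- Concatenate content and merge metadata. Map's (<>) is a left-biased
-- union, so on a duplicate key the left document's value is kept
-- (assumption: the library may resolve collisions differently).
instance Semigroup Document where
  Document c1 m1 <> Document c2 m2 = Document (c1 <> c2) (m1 <> m2)

-- The identity element: empty content and empty metadata.
instance Monoid Document where
  mempty = Document "" Map.empty

main :: IO ()
main = do
  let doc1 = Document "Hello" (Map.fromList [("a", Number 1)])
      doc2 = Document " World" (Map.fromList [("b", Bool True)])
  -- Combine two documents pairwise.
  print (doc1 <> doc2)
  -- mconcat folds a list of documents with (<>), starting from mempty.
  print (mconcat [doc1, doc2, mempty] == doc1 <> doc2)
```

Because Document is a Monoid, a list of partial documents (for example, one per page) can be merged with mconcat without special-casing the empty list.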
Loading Interface
class BaseLoader m where
Typeclass for document loading implementations. Implementations should define how to:
- Load full documents with load
- Load and split content with loadAndSplit
Example instance for text files:
    instance BaseLoader FilePath where
      load path = do
        content <- readFile path
        return $ Right [Document content (fromList [("source", String (T.pack path))])]

      loadAndSplit path = do
        content <- readFile path
        return $ Right (splitText defaultCharacterSplitterOps content)
Methods
load :: m -> IO (Either String [Document])
Load all documents from the source.
loadAndSplit :: m -> IO (Either String [Text])
Load all documents and split their content using recursiveCharacterSpliter.
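Both methods return Either String, which suggests that failures are meant to be reported as Left values rather than thrown. The sketch below shows one way an instance might do that by catching IOException around the file read. It is self-contained and makes several assumptions: the class is mirrored locally from the signatures above, SafeFileLoader and splitParagraphs are hypothetical names, metadata values are simplified to plain Text, and the blank-line splitter is a stand-in for the library's own splitter.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Exception (IOException, try)
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Simplified stand-in for Document (assumption: metadata values are
-- plain Text here to avoid extra dependencies).
data Document = Document
  { pageContent :: Text
  , metadata    :: Map Text Text
  } deriving (Show, Eq)

-- Local mirror of the class signatures documented above.
class BaseLoader m where
  load         :: m -> IO (Either String [Document])
  loadAndSplit :: m -> IO (Either String [Text])

-- Hypothetical loader that wraps a file path.
newtype SafeFileLoader = SafeFileLoader FilePath

-- Hypothetical splitter: break on blank lines and drop empty chunks.
-- This is NOT the library's recursive character splitter.
splitParagraphs :: Text -> [Text]
splitParagraphs = filter (not . T.null) . map T.strip . T.splitOn "\n\n"

instance BaseLoader SafeFileLoader where
  -- Convert an IOException (missing file, permissions) into a Left.
  load (SafeFileLoader path) = do
    result <- try (TIO.readFile path) :: IO (Either IOException Text)
    pure $ case result of
      Left err      -> Left ("Could not read " <> path <> ": " <> show err)
      Right content -> Right [Document content (Map.fromList [("source", T.pack path)])]

  -- Reuse load so that both methods share the same error handling.
  loadAndSplit loader = do
    docs <- load loader
    pure (concatMap (splitParagraphs . pageContent) <$> docs)

main :: IO ()
main = do
  -- Loading a file that does not exist yields a Left, not an exception.
  missing <- load (SafeFileLoader "definitely-missing-file.txt")
  case missing of
    Left msg   -> putStrLn ("load failed: " <> msg)
    Right docs -> print docs
```

Defining loadAndSplit in terms of load keeps the two methods consistent: any source that fails to load also fails to split, with the same error message.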