🧠 Semantica
Open-Source Semantic Layer & Knowledge Engineering Framework
Transform Chaos into Intelligence. Build AI systems that are explainable, traceable, and trustworthy — not black boxes.
The semantic intelligence layer that makes your AI agents auditable, explainable, and trustworthy. Perfect for high-stakes domains where mistakes have real consequences.
🆓 Open Source • 📜 MIT Licensed • 🚀 Production Ready • 🌍 Community Driven
🚀 Why Semantica?
Semantica bridges the semantic gap between text similarity and true meaning.
⚡ Get Started in 30 Seconds
```python
from semantica.semantic_extract import NERExtractor
from semantica.kg import GraphBuilder

# Extract entities and build knowledge graph
ner = NERExtractor(method="ml", model="en_core_web_sm")
entities = ner.extract("Apple Inc. was founded by Steve Jobs in 1976.")
kg = GraphBuilder().build({"entities": entities, "relationships": []})
print(f"Built KG with {len(kg.get('entities', []))} entities")
```
📖 Full Quick Start • 🍳 Cookbook Examples • 💬 Join Discord • ⭐ Star Us
Core Value Proposition
| Trustworthy | Explainable | Auditable |
|---|---|---|
| Conflict detection & validation | Transparent reasoning paths | Complete provenance tracking |
| Rule-based governance | Entity relationships & ontologies | W3C PROV-O compliant lineage |
| Production-grade QA | Multi-hop graph reasoning | Source tracking & integrity verification |
Key Features & Benefits
Not Just Another Agentic Framework
Semantica complements LangChain, LlamaIndex, AutoGen, CrewAI, Google ADK, Agno, and other frameworks to enhance your agents with:
| Feature | Benefit |
|---|---|
| Auditable | Complete provenance tracking with W3C PROV-O compliance |
| Explainable | Transparent reasoning paths with entity relationships |
| Provenance-Aware | End-to-end lineage from documents to responses |
| Validated | Built-in conflict detection, deduplication, QA |
| Governed | Rule-based validation and semantic consistency |
| Version Control | Enterprise-grade change management with integrity verification |
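As a rough illustration of what integrity verification can look like, the sketch below fingerprints a graph snapshot with SHA-256 using only the Python standard library. The `snapshot_checksum` helper and the snapshot layout are hypothetical and are not Semantica's API.

```python
import hashlib
import json


def snapshot_checksum(snapshot: dict) -> str:
    """Return a SHA-256 hex digest of a JSON-serializable graph snapshot."""
    # Sorting keys makes the digest independent of dict insertion order.
    # List order still matters, so callers should keep lists in a stable order.
    canonical = json.dumps(snapshot, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical snapshot layout, used only for this example.
v1 = {
    "entities": [{"id": "e1", "label": "Apple Inc.", "type": "ORG"}],
    "relationships": [],
}
recorded = snapshot_checksum(v1)

# Same content with a different key order yields the same digest.
reordered = {
    "relationships": [],
    "entities": [{"type": "ORG", "label": "Apple Inc.", "id": "e1"}],
}
unchanged = snapshot_checksum(reordered) == recorded

# Any edit to the content changes the digest, so silent modification is detectable.
v2 = {**v1, "entities": [{"id": "e1", "label": "Apple", "type": "ORG"}]}
tampered = snapshot_checksum(v2) != recorded

print(unchanged, tampered)
```

Storing the digest next to each version gives an audit trail a cheap way to confirm that a snapshot read back later is the one that was recorded.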
Perfect For High-Stakes Use Cases
| 🏥 Healthcare | 💰 Finance | ⚖️ Legal |
|---|---|---|
| Clinical decisions | Fraud detection | Evidence-backed research |
| Drug interactions | Regulatory support | Contract analysis |
| Patient safety | Risk assessment | Case law reasoning |
| 🔒 Cybersecurity | 🏛️ Government | 🏭 Infrastructure | 🚗 Autonomous |
|---|---|---|---|
| Threat attribution | Policy decisions | Power grids | Decision logs |
| Incident response | Classified info | Transportation | Safety validation |
Powers Your AI Stack
- GraphRAG Systems — Retrieval with graph reasoning and hybrid search
- AI Agents — Trustworthy, accountable multi-agent systems with semantic memory
- Reasoning Models — Explainable AI decisions with reasoning paths
- Enterprise AI — Governed, auditable platforms that support compliance
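To make "graph reasoning" and "reasoning paths" concrete, here is a minimal sketch of multi-hop traversal over a tiny in-memory set of triples. It uses plain breadth-first search from the standard library; the `find_path` function and the triple format are assumptions for illustration, not Semantica's retrieval implementation.

```python
from collections import deque

# Illustrative (subject, predicate, object) triples.
TRIPLES = [
    ("Steve Jobs", "founded", "Apple Inc."),
    ("Apple Inc.", "headquartered_in", "Cupertino"),
    ("Apple Inc.", "founded_in", "1976"),
]


def find_path(triples, start, goal):
    """Breadth-first search returning the shortest chain of triples from start to goal."""
    adjacency = {}
    for subject, predicate, obj in triples:
        adjacency.setdefault(subject, []).append((predicate, obj))

    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for predicate, neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, predicate, neighbor)]))
    return None  # no directed path: the link cannot be justified from the graph


path = find_path(TRIPLES, "Steve Jobs", "Cupertino")
for subject, predicate, obj in path:
    print(f"{subject} --{predicate}--> {obj}")
```

The returned list is the explanation itself: each hop names the fact that was used, which is what distinguishes a reasoning path from a bare similarity score.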
Integrations
- Docling Support — Document parsing with table extraction (PDF, DOCX, PPTX, XLSX)
- AWS Neptune — Amazon Neptune graph database support with IAM authentication
- Custom Ontology Import — Import existing ontologies (OWL, RDF, Turtle, JSON-LD)
Built for environments where every answer must be explainable and governed.
🚨 The Problem: The Semantic Gap
Most AI systems fail in high-stakes domains because they operate on text similarity, not meaning.
Understanding the Semantic Gap
The semantic gap is the fundamental disconnect between what AI systems can process (text patterns, vector similarities) and what high-stakes applications require (semantic understanding, meaning, context, and relationships).
Traditional AI approaches:
- Rely on statistical patterns and text similarity
- Cannot understand relationships between entities
- Cannot reason about domain-specific rules
- Cannot explain why decisions were made
- Cannot trace back to original sources with confidence

High-stakes AI requires:
- Semantic understanding of entities and their relationships
- Domain knowledge encoded as formal rules (ontologies)
- Explainable reasoning paths
- Source-level provenance
- Conflict detection and resolution

Semantica bridges this gap by providing a semantic intelligence layer that transforms unstructured data into validated, explainable, and auditable knowledge.
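A toy example shows why surface similarity is not meaning. The sketch below scores two sentences with token-level Jaccard similarity, a deliberately simple stand-in for text similarity. Embedding-based similarity is more sophisticated than token overlap, so treat this only as an illustration of the failure mode; the sentences and the `jaccard` helper are made up for the example.

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity: size of the intersection over size of the union."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Two statements that differ by one word and contradict each other.
s1 = "drug a is safe to combine with drug b"
s2 = "drug a is not safe to combine with drug b"

score = jaccard(s1, s2)
print(f"{score:.2f}")  # 8 shared tokens out of 9 distinct tokens
```

The two sentences share eight of nine distinct tokens, so they look nearly identical to an overlap measure even though one negates the other. Catching that requires an explicit representation of the claim, not just its wording.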
What Organizations Have vs What They Need
| Current State | Required for High-Stakes AI |
|---|---|
| PDFs, DOCX, emails, logs | Formal domain rules (ontologies) |
| APIs, databases, streams | Structured and validated entities |
| Conflicting facts and duplicates | Explicit semantic relationships |
| Siloed systems with no lineage | Explainable reasoning paths |
| | Source-level provenance |
| | Audit-ready compliance |
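"Formal domain rules" can be pictured as constraints on which entity types a relationship may connect. The following sketch checks triples against a hand-written mini-ontology with domain and range constraints. The `ONTOLOGY` dictionary, the entity types, and the `validate` function are all hypothetical; real ontologies are normally expressed in OWL or RDF rather than Python dictionaries.

```python
# Hypothetical mini-ontology: each predicate declares the entity types it may connect.
ONTOLOGY = {
    "founded_by": {"domain": "Organization", "range": "Person"},
    "located_in": {"domain": "Organization", "range": "Place"},
}

# Hypothetical entity typing, as an extraction step might produce it.
ENTITY_TYPES = {
    "Apple Inc.": "Organization",
    "Steve Jobs": "Person",
    "Cupertino": "Place",
}


def validate(triple):
    """Return a list of rule violations for one (subject, predicate, object) triple."""
    subject, predicate, obj = triple
    rule = ONTOLOGY.get(predicate)
    if rule is None:
        return [f"unknown predicate: {predicate}"]
    errors = []
    if ENTITY_TYPES.get(subject) != rule["domain"]:
        errors.append(f"{subject} is not a {rule['domain']}")
    if ENTITY_TYPES.get(obj) != rule["range"]:
        errors.append(f"{obj} is not a {rule['range']}")
    return errors


ok = validate(("Apple Inc.", "founded_by", "Steve Jobs"))
bad = validate(("Apple Inc.", "founded_by", "Cupertino"))
print(ok)
print(bad)
```

A triple that violates a rule is rejected with a readable reason, which is the property that makes rule-based validation auditable.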
The Cost of Missing Semantics
- Decisions cannot be explained — No transparency in AI reasoning
- Errors cannot be traced — No way to debug or improve
- Conflicts go undetected — Contradictory information causes failures
- Compliance becomes impossible — No audit trails for regulations
Trustworthy AI requires semantic accountability.
🆚 Semantica vs Traditional RAG
| Feature | Traditional RAG | Semantica |
|---|---|---|
| Reasoning | ❌ Black-box answers | ✅ Explainable reasoning paths |
| Provenance | ❌ No provenance | ✅ W3C PROV-O compliant lineage tracking |
| Search | ⚠️ Vector similarity only | ✅ Semantic + graph reasoning |
| Quality | ❌ No conflict handling | ✅ Explicit contradiction detection |
| Safety | ⚠️ Unsafe for high-stakes | ✅ Designed for governed environments |
| Compliance | ❌ No audit trails | ✅ Complete audit trails with integrity verification |
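The idea behind explicit contradiction detection can be sketched in a few lines: group extracted facts by subject and predicate, then flag any group that carries more than one distinct value, keeping the sources for each value. The fact tuples, file names, and `find_conflicts` function below are hypothetical, and the sketch assumes every predicate is single-valued, which would produce false positives for predicates that legitimately have several values.

```python
from collections import defaultdict

# Hypothetical facts extracted from two documents: (subject, predicate, object, source).
FACTS = [
    ("Apple Inc.", "founded_in", "1976", "doc_a.pdf"),
    ("Apple Inc.", "founded_in", "1977", "doc_b.pdf"),
    ("Apple Inc.", "founded_by", "Steve Jobs", "doc_a.pdf"),
]


def find_conflicts(facts):
    """Report every (subject, predicate) pair that has more than one distinct object."""
    values = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj, source in facts:
        values[(subject, predicate)][obj].append(source)
    return {
        key: {obj: sources for obj, sources in by_value.items()}
        for key, by_value in values.items()
        if len(by_value) > 1
    }


conflicts = find_conflicts(FACTS)
for (subject, predicate), by_value in conflicts.items():
    print(subject, predicate, by_value)
```

Because each conflicting value keeps its list of sources, a reviewer can see which documents disagree instead of receiving a silently merged answer.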
🧩 Semantica Architecture
1️⃣ Input Layer — Governed Ingestion
- 📄 Multiple Formats — PDFs, DOCX, HTML, JSON, CSV, Excel, PPTX
- 🔧 Docling Support — Docling parser for table extraction
- 💾 Data Sources — Databases, APIs, streams, archives, web content
- 🎨 Media Support — Image parsing with OCR, audio/video metadata extraction
- Single Pipeline — Unified ingestion with metadata and source tracking
2️⃣ Semantic Layer — Trust & Reasoning Engine
- 🔍 Entity Extraction — NER, normalization, classification
- 🔗 Relationship Discovery — Triplet generation, semantic links
- 📐 Ontology Induction — Automated domain rule generation
- 🔄 Deduplication — Jaro-Winkler similarity, conflict resolution
- ✅ Quality Assurance — Conflict detection, validation
- 📊 Provenance Tracking — W3C PROV-O compliant lineage tracking across all modules
- 🧠 Reasoning Traces — Explainable inference paths
- 🔐 Change Management — Version control with audit trails, checksums, compliance support
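The deduplication bullet above names Jaro-Winkler similarity. For readers unfamiliar with it, here is a self-contained sketch of the standard algorithm in pure Python. It is not Semantica's implementation, and the 0.9 threshold and the `is_duplicate` helper are assumptions chosen for the example.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1], based on matching characters and transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order in the two strings.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


def is_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    """Hypothetical helper: treat two labels as the same entity above the threshold."""
    return jaro_winkler(a.lower(), b.lower()) >= threshold


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
print(is_duplicate("Apple Inc.", "Apple Inc"))
print(is_duplicate("Apple Inc.", "Steve Jobs"))
```

The prefix boost makes the measure well suited to names, where variants usually agree at the start and differ in suffixes or punctuation. The threshold is a tuning choice and should be validated against labeled pairs from the target domain.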
3️⃣ Output Layer — Auditable Knowledge Assets
- Knowledge Graphs — Queryable, temporal, explainable
- 📐 OWL Ontologies — HermiT/Pellet validated, custom ontology import support
- 🔢 Vector Embeddings — FastEmbed by default
- ☁️ AWS Neptune — Amazon Neptune graph database support
- 🔍 Provenance — Every AI response links back to:
    - 📄 Source documents
    - 🏷️ Extracted entities & relations
    - 📐 Ontology rules applied
    - 🧠 Reasoning steps used
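The four links listed above can be pictured as a small record attached to each answer. The sketch below models such a record with a dataclass and expresses its document lineage using three properties from the W3C PROV-O vocabulary (`prov:wasGeneratedBy`, `prov:used`, `prov:wasDerivedFrom`). The `ProvenanceRecord` class, its fields, and the identifiers are hypothetical, and the mapping is a simplification rather than Semantica's data model.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceRecord:
    """Hypothetical record tying one answer to the evidence behind it."""

    answer: str
    source_documents: list = field(default_factory=list)
    entities: list = field(default_factory=list)
    ontology_rules: list = field(default_factory=list)
    reasoning_steps: list = field(default_factory=list)

    def to_prov_statements(self, answer_id: str, activity_id: str) -> list:
        """Express the document lineage as (subject, property, object) tuples."""
        statements = [
            (answer_id, "rdf:type", "prov:Entity"),
            (activity_id, "rdf:type", "prov:Activity"),
            (answer_id, "prov:wasGeneratedBy", activity_id),
        ]
        for doc in self.source_documents:
            statements.append((activity_id, "prov:used", doc))
            statements.append((answer_id, "prov:wasDerivedFrom", doc))
        return statements


record = ProvenanceRecord(
    answer="Apple Inc. was founded by Steve Jobs.",
    source_documents=["doc_a.pdf"],
    entities=["Apple Inc.", "Steve Jobs"],
    ontology_rules=["founded_by: Organization -> Person"],
    reasoning_steps=["Matched (Apple Inc., founded_by, Steve Jobs) in doc_a.pdf"],
)
statements = record.to_prov_statements("answer:1", "activity:reasoning-1")
for statement in statements:
    print(statement)
```

Keeping the record alongside the answer means an auditor can walk from a response back to the documents, entities, rules, and steps that produced it.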
🏥 Built for High-Stakes Domains
Designed for domains where mistakes have real consequences and every decision must be accountable:
- 🏥 Healthcare & Life Sciences — Clinical decision support, drug interaction analysis, medical literature reasoning, patient safety tracking
- 💰 Finance & Risk — Fraud detection, regulatory support (SOX, GDPR, MiFID II), credit risk assessment, algorithmic trading validation
- ⚖️ Legal & Compliance — Evidence-backed legal research, contract analysis, regulatory change tracking, case law reasoning
- 🔒 Cybersecurity & Intelligence — Threat attribution, incident response, security audit trails, intelligence analysis
- 🏛️ Government & Defense — Governed AI systems, policy decisions, classified information handling, defense intelligence
- 🏭 Critical Infrastructure — Power grid management, transportation safety, water treatment, emergency response
- 🚗 Autonomous Systems — Self-driving vehicles, drone navigation, robotics safety, industrial automation
Who Uses Semantica?
- 🤖 AI / ML Engineers — Building explainable GraphRAG & agents
- ⚙️ Data Engineers — Creating governed semantic pipelines
- 📊 Knowledge Engineers — Managing ontologies & KGs at scale
- 🏢 Enterprise Teams — Requiring trustworthy AI infrastructure
- 🛡️ Risk & Compliance Teams — Needing audit-ready systems
🚀 Choose Your Path
- Quick Start: Get up and running with Semantica in minutes. Learn the basics of ingestion and extraction.
- Core Concepts: Deep dive into Knowledge Graphs, Ontologies, and Semantic Reasoning.
- API Reference: Detailed technical documentation for all Semantica modules and classes.
- Cookbook: Interactive tutorials, real-world examples, and 14 domain-specific cookbooks.
📦 Installation
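Semantica is published on PyPI, so a single command installs it: `pip install semantica`.
For the latest development version, install from a local copy of the source instead, for example by running `pip install .` in the repository root.
Contributors who want to modify the framework can use an editable install, for example `pip install -e .`, so that code changes take effect without reinstalling.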
🚦 Quick Example
Semantica uses a modular architecture. You can use individual modules directly for maximum flexibility:
```python
from semantica.ingest import FileIngestor
from semantica.parse import DocumentParser
from semantica.semantic_extract import NERExtractor, RelationExtractor
from semantica.kg import GraphBuilder

# 1. Ingest documents
ingestor = FileIngestor()
documents = ingestor.ingest_directory("documents/", recursive=True)

# 2. Parse documents
parser = DocumentParser()
parsed_docs = [parser.parse_document(doc) for doc in documents]

# 3. Extract entities and relationships
ner = NERExtractor()
rel_extractor = RelationExtractor()
entities = []
relationships = []
for doc in parsed_docs:
    text = doc.get("full_text", "")
    doc_entities = ner.extract_entities(text)
    doc_rels = rel_extractor.extract_relations(text, entities=doc_entities)
    entities.extend(doc_entities)
    relationships.extend(doc_rels)

# 4. Build knowledge graph
builder = GraphBuilder(merge_entities=True)
kg = builder.build_graph(entities=entities, relationships=relationships)
print(f"Created graph with {len(kg.nodes)} nodes and {len(kg.edges)} edges")
```
Orchestration Option
For complex workflows, you can also use the Semantica class for orchestration. See the Core Module documentation for details.
🎯 Why Semantica?
- 🆓 Open Source: MIT licensed. No vendor lock-in. Full transparency.
- 🚀 Production Ready: Battle-tested with quality assurance, conflict resolution, and validation.
- 🧩 Modular Architecture: Use only what you need. Swap components easily.
- 🌍 Community Driven: Built by developers, for developers. Active Discord community.
- 📚 Comprehensive: End-to-end solution from ingestion to reasoning. No duct-taping required.
- 🔬 Research-Backed: Based on the latest research in knowledge graphs, ontologies, and the semantic web.
🏗️ Built For
- Data Scientists: Transform messy data into clean knowledge graphs
- Data Engineers: Build scalable data pipelines with semantic enrichment
- AI Engineers: Build GraphRAG, AI agents, and multi-agent systems
- Knowledge Engineers: Generate and manage formal ontologies
- Ontologists: Design and validate domain-specific ontologies and taxonomies
- Researchers: Analyze scientific literature and build citation networks
- ML Engineers: Create semantic features for machine learning models
- Enterprises: Unify data silos into a semantic layer
📚 Learn More
- Getting Started Guide - Your first knowledge graph in 5 minutes
- Core Concepts - Deep dive into knowledge graphs and ontologies
- Cookbook - Real-world examples and 14 domain-specific cookbooks
- API Reference - Complete technical documentation
🍳 Recommended Cookbook Tutorials
Get hands-on with interactive Jupyter notebooks:
- Welcome to Semantica: Comprehensive introduction to all Semantica modules
    - Topics: Framework overview, all modules, architecture
    - Difficulty: Beginner
    - Use Cases: First-time users, understanding the framework
- Your First Knowledge Graph: Build your first knowledge graph from scratch
    - Topics: Entity extraction, relationship extraction, graph construction
    - Difficulty: Beginner
    - Use Cases: Learning the basics, quick start
- GraphRAG Complete: Production-ready Graph Retrieval Augmented Generation
    - Topics: GraphRAG, hybrid retrieval, vector search, graph traversal
    - Difficulty: Advanced
    - Use Cases: Building AI applications with knowledge graphs