1 unstable release
| Version | Released |
|---|---|
| 0.5.1 | Dec 1, 2025 |
Scribe - Advanced Code Analysis Library
Scribe is a comprehensive Rust library for code analysis, repository exploration, and intelligent file processing. It provides powerful tools for understanding codebases through heuristic scoring, graph analysis, and AI-powered insights.
Features
- Intelligent File Analysis: Multi-dimensional heuristic scoring system for identifying important files
- Dependency Graph Analysis: PageRank centrality computation for understanding code relationships
- High-Performance Scanning: Parallel file system traversal with git integration
- Advanced Pattern Matching: Flexible glob and gitignore pattern support with preset configurations
- Smart Code Selection: Context-aware code bundling and relevance scoring
- Extensible Architecture: Plugin system for custom analyzers and scorers
- Modular Design: Use only the features you need with optional components
Installation
Add this to your Cargo.toml:
[dependencies]
scribe = "0.5.1"
Feature Flags
Scribe uses feature flags to allow selective compilation:
# Full installation (default)
scribe = "0.5.1"
# Minimal installation
scribe = { version = "0.5.1", default-features = false, features = ["core"] }
# Fast file operations only
scribe = { version = "0.5.1", default-features = false, features = ["fast"] }
# Analysis without graph features
scribe = { version = "0.5.1", default-features = false, features = ["core", "analysis", "scanner"] }
Available Features
| Feature | Description | Dependencies |
|---|---|---|
| `default` | All features enabled | core, analysis, graph, scanner, patterns, selection |
| `core` | Essential types and utilities | None |
| `analysis` | Heuristic scoring and metrics | core |
| `graph` | PageRank centrality analysis | core, analysis |
| `scanner` | File system scanning | core |
| `patterns` | Pattern matching (glob, gitignore) | core |
| `selection` | Code selection and bundling | core, analysis, graph |
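The dependency column in the table above is transitive: enabling one feature pulls in everything it lists, and everything those features list in turn. The sketch below works that out in plain Rust. It is only an illustration of the table; `direct_deps` and `enabled_by` are hypothetical helper names, not part of Scribe, and Cargo performs this resolution itself.

```rust
use std::collections::BTreeSet;

/// Direct dependencies of each feature, copied from the table above.
fn direct_deps(feature: &str) -> &'static [&'static str] {
    match feature {
        "default" => &["core", "analysis", "graph", "scanner", "patterns", "selection"],
        "analysis" | "scanner" | "patterns" => &["core"],
        "graph" => &["core", "analysis"],
        "selection" => &["core", "analysis", "graph"],
        _ => &[], // "core" (and any unknown name) pulls in nothing else
    }
}

/// Returns the feature itself plus everything it transitively enables.
fn enabled_by(feature: &str) -> BTreeSet<String> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![feature.to_string()];
    while let Some(name) = stack.pop() {
        // `insert` returns false for names we have already expanded.
        if seen.insert(name.clone()) {
            stack.extend(direct_deps(&name).iter().map(|d| d.to_string()));
        }
    }
    seen
}
```

For example, `enabled_by("selection")` yields `analysis`, `core`, `graph`, and `selection`, which is why `selection` cannot be built without the graph code.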
Feature Groups
| Group | Features | Use Case |
|---|---|---|
| `minimal` | core | Basic types and utilities only |
| `fast` | core, scanner, patterns | Quick file operations |
| `comprehensive` | All features | Complete analysis capabilities |
Quick Start
Basic Repository Analysis
use scribe::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    // Analyze a repository with default settings
    let config = Config::default();
    let analysis = analyze_repository(".", &config).await?;

    // Get the most important files
    println!("Top 10 most important files:");
    for (file, score) in analysis.top_files(10) {
        println!("  {}: {:.3}", file, score);
    }

    // Display summary
    println!("\n{}", analysis.summary());
    Ok(())
}
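To make the idea of a ranked file list concrete, here is a minimal sketch of multi-signal scoring followed by a top-N cut. The `Signals` struct, its three fields, and the weights are assumptions chosen for illustration; the source does not say which signals or weights Scribe actually uses.

```rust
/// Hypothetical per-file signals, each normalised to 0.0..=1.0.
#[derive(Debug, Clone, Copy)]
struct Signals {
    is_entrypoint: f64,
    centrality: f64,
    recent_churn: f64,
}

/// Illustrative weights; they sum to 1.0 so scores also stay in 0.0..=1.0.
const WEIGHTS: Signals = Signals {
    is_entrypoint: 0.5,
    centrality: 0.25,
    recent_churn: 0.25,
};

/// Weighted sum of the signals.
fn score(s: Signals) -> f64 {
    WEIGHTS.is_entrypoint * s.is_entrypoint
        + WEIGHTS.centrality * s.centrality
        + WEIGHTS.recent_churn * s.recent_churn
}

/// Returns the `n` highest-scoring files, best first.
fn top_files(files: &[(&str, Signals)], n: usize) -> Vec<(String, f64)> {
    let mut scored: Vec<(String, f64)> = files
        .iter()
        .map(|(path, s)| (path.to_string(), score(*s)))
        .collect();
    // `total_cmp` gives a total order on f64, so sorting cannot panic on NaN.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(n);
    scored
}
```

A weighted sum keeps each signal's contribution easy to inspect, at the cost of ignoring interactions between signals.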
Selective Feature Usage
// Using only core and scanner features
use scribe::core::{Config, Result};
use scribe::scanner::{Scanner, ScanOptions};

#[tokio::main]
async fn main() -> Result<()> {
    let scanner = Scanner::new();
    let options = ScanOptions::default()
        .with_git_integration(true)
        .with_parallel_processing(true);
    let files = scanner.scan(".", options).await?;
    println!("Found {} files", files.len());
    Ok(())
}
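For readers who want to see what a scan does at its simplest, the following standard-library sketch walks a directory tree and skips `.git`. It is sequential and does not consult gitignore rules, so it is not a stand-in for Scribe's scanner; `walk` is a hypothetical name used only here.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects regular files under `root`, skipping `.git` directories.
/// Symlinks are neither followed nor reported.
fn walk(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let file_type = entry.file_type()?;
            if file_type.is_dir() {
                if entry.file_name() != ".git" {
                    pending.push(entry.path());
                }
            } else if file_type.is_file() {
                files.push(entry.path());
            }
        }
    }
    // Sort so the result does not depend on directory iteration order.
    files.sort();
    Ok(files)
}
```

An explicit stack is used instead of recursion so that deeply nested trees cannot overflow the call stack.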
Pattern Matching
use scribe::patterns::presets;

#[tokio::main]
async fn main() -> scribe::Result<()> {
    // Use preset patterns for common file types
    let mut source_matcher = presets::source_code()?;
    let mut doc_matcher = presets::documentation()?;

    if source_matcher.should_process("src/main.rs")? {
        println!("Found source file!");
    }
    if doc_matcher.should_process("README.md")? {
        println!("Found documentation!");
    }
    Ok(())
}
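The include/exclude decision behind a matcher can be sketched with a small wildcard function. This is a simplified stand-in, not Scribe's implementation: `wildcard_match` and `is_selected` are hypothetical names, and `*` here also matches `/`, which real glob and gitignore rules treat differently.

```rust
/// Matches `text` against `pattern`, where `*` matches any run of characters
/// (including `/`) and `?` matches exactly one character.
fn wildcard_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0, 0);
    // Pattern index of the last `*` seen and the text index it currently covers up to.
    let mut backtrack: Option<(usize, usize)> = None;

    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            backtrack = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == t[ti]) {
            pi += 1;
            ti += 1;
        } else if let Some((star_pi, star_ti)) = backtrack {
            // Let the last `*` absorb one more character and retry from there.
            pi = star_pi + 1;
            ti = star_ti + 1;
            backtrack = Some((star_pi, star_ti + 1));
        } else {
            return false;
        }
    }
    // Text is exhausted: only trailing `*`s may remain in the pattern.
    p[pi..].iter().all(|&c| c == '*')
}

/// A path is selected if it matches any include pattern and no exclude pattern.
fn is_selected(path: &str, include: &[&str], exclude: &[&str]) -> bool {
    include.iter().any(|p| wildcard_match(p, path))
        && !exclude.iter().any(|p| wildcard_match(p, path))
}
```

With `include = ["*.rs"]` and `exclude = ["target/*"]`, `src/main.rs` is selected while `target/debug/build.rs` is not.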
Graph Analysis
use scribe::graph::PageRankAnalysis;

#[tokio::main]
async fn main() -> scribe::Result<()> {
    let analysis = PageRankAnalysis::for_code_analysis()?;
    // Compute centrality for scan results
    // let centrality_results = analysis.compute_centrality(&scan_results)?;
    // let top_files = centrality_results.top_files_by_centrality(10);
    Ok(())
}
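As background for the centrality scores, here is a textbook power-iteration PageRank with a convergence check. It is a sketch under stated assumptions, not Scribe's implementation: the function name, the adjacency-list input, and the parameter names are all chosen for this example.

```rust
/// Power-iteration PageRank. `edges[i]` lists the nodes that node `i` links to
/// (for example, the files that file `i` imports). Every target index must be
/// smaller than `edges.len()`.
fn pagerank(edges: &[Vec<usize>], damping: f64, tolerance: f64, max_iters: usize) -> Vec<f64> {
    let n = edges.len();
    if n == 0 {
        return Vec::new();
    }
    let mut rank = vec![1.0 / n as f64; n];
    for _ in 0..max_iters {
        // Rank held by nodes without outgoing edges is spread evenly over all nodes.
        let dangling: f64 = (0..n)
            .filter(|&i| edges[i].is_empty())
            .map(|i| rank[i])
            .sum();
        let base = (1.0 - damping) / n as f64 + damping * dangling / n as f64;
        let mut next = vec![base; n];
        for (i, targets) in edges.iter().enumerate() {
            if targets.is_empty() {
                continue;
            }
            // Each node splits its damped rank equally among its targets.
            let share = damping * rank[i] / targets.len() as f64;
            for &t in targets {
                next[t] += share;
            }
        }
        // Stop once the total change (L1 distance) falls below the tolerance.
        let delta: f64 = rank.iter().zip(&next).map(|(a, b)| (a - b).abs()).sum();
        rank = next;
        if delta < tolerance {
            break;
        }
    }
    rank
}
```

In a graph where two files both import a third, the imported file ends up with the highest rank, which matches the intuition that widely depended-on files are central.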
CLI Covering Sets
Scribe's CLI can compute minimal covering sets:
- `--covering-set <name>`: target a function/class/module by name.
- `--covering-set-diff`: build a covering set for the current `git diff` (uses the dependency graph to include touched files plus related dependents/dependencies).
- `--diff-against <ref>`: diff against a specific ref (defaults to `HEAD`).
- Shared filters: `--include-dependents`, `--max-depth`, `--max-files`.
- Output helper: add `--line-numbers` to prefix every line in the bundled files, making it easy for review agents to comment by line number.
Example:
cargo run --bin scribe -- --covering-set-diff --include-dependents --max-depth 2
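One way to picture a diff-based covering set is a breadth-first expansion from the changed files over the dependency graph, bounded by depth and file count. The sketch below assumes that reading of the flags; `CoverOptions` and `covering_set` are hypothetical names, and the real CLI may select files differently.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Hypothetical options mirroring the shared CLI filters.
struct CoverOptions {
    include_dependents: bool,
    max_depth: usize,
    max_files: usize,
}

/// Expands `changed` over `deps` (file -> files it depends on), breadth first.
fn covering_set<'a>(
    changed: &[&'a str],
    deps: &BTreeMap<&'a str, Vec<&'a str>>,
    opts: &CoverOptions,
) -> BTreeSet<&'a str> {
    // Build the reverse edges once: file -> files that depend on it.
    let mut dependents: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for (&file, targets) in deps {
        for &target in targets {
            dependents.entry(target).or_default().push(file);
        }
    }

    let mut selected = BTreeSet::new();
    let mut queue: VecDeque<(&'a str, usize)> = changed.iter().map(|&f| (f, 0)).collect();
    while let Some((file, depth)) = queue.pop_front() {
        if selected.len() >= opts.max_files {
            break;
        }
        // Skip files already selected; do not expand past the depth limit.
        if !selected.insert(file) || depth == opts.max_depth {
            continue;
        }
        let mut neighbours = deps.get(file).cloned().unwrap_or_default();
        if opts.include_dependents {
            neighbours.extend(dependents.get(file).into_iter().flatten().copied());
        }
        queue.extend(neighbours.into_iter().map(|next| (next, depth + 1)));
    }
    selected
}
```

Because the traversal is breadth first, the file limit cuts off the most distant files first, keeping the changed files and their nearest neighbours.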
Architecture
Scribe is built with a modular architecture where each crate provides specific functionality:
┌───────────────────────────────────────────────────────────────────┐
│                              scribe                               │
│ ┌────────────────┐ ┌────────────────┐ ┌─────────────────────────┐ │
│ │ scribe-core    │ │ scribe-scanner │ │ scribe-patterns         │ │
│ │ (types,        │ │ (file system   │ │ (glob, gitignore,       │ │
│ │  traits,       │ │  traversal,    │ │  pattern matching)      │ │
│ │  utilities)    │ │  git support)  │ │                         │ │
│ └────────────────┘ └────────────────┘ └─────────────────────────┘ │
│ ┌────────────────┐ ┌────────────────┐ ┌─────────────────────────┐ │
│ │ scribe-analysis│ │ scribe-graph   │ │ scribe-selection        │ │
│ │ (heuristic     │ │ (PageRank      │ │ (intelligent bundling,  │ │
│ │  scoring,      │ │  centrality,   │ │  context extraction,    │ │
│ │  code metrics) │ │  dependency    │ │  relevance scoring)     │ │
│ │                │ │  analysis)     │ │                         │ │
│ └────────────────┘ └────────────────┘ └─────────────────────────┘ │
└───────────────────────────────────────────────────────────────────┘
Component Overview
- `scribe-core`: Foundation types, traits, configuration, and utilities
- `scribe-scanner`: High-performance file system traversal with git integration
- `scribe-patterns`: Flexible pattern matching with glob and gitignore support
- `scribe-analysis`: Heuristic scoring algorithms and code metrics
- `scribe-graph`: PageRank centrality and dependency graph analysis
- `scribe-selection`: Intelligent code selection and context extraction
Examples
The repository includes several examples demonstrating different usage patterns:
Run Examples
# Full analysis example
cargo run --example basic_usage -- /path/to/repository
# Minimal features example
cargo run --example selective_features --no-default-features --features="core,scanner" -- /path/to/directory
Available Examples
- `basic_usage.rs`: Complete repository analysis with all features
- `selective_features.rs`: Minimal usage with core and scanner only
Performance
Scribe is designed for high performance:
- Memory Efficient: Streaming file processing with configurable memory limits
- Parallel Processing: Multi-threaded scanning and analysis using Rayon
- Git Integration: Fast file discovery using `git ls-files` when available
- Optimized Algorithms: Research-grade PageRank implementation with convergence detection
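The git-based discovery mentioned in the list above can be sketched as follows: ask git for tracked files and signal the caller to fall back to a directory walk when that fails. This is an assumption about the general approach, not Scribe's code; `parse_ls_files` and `git_tracked_files` are hypothetical names. The `-z` flag makes git separate paths with NUL bytes, which avoids quoting issues with unusual file names.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Splits the NUL-separated output of `git ls-files -z` into paths.
fn parse_ls_files(output: &[u8]) -> Vec<PathBuf> {
    output
        .split(|&b| b == 0)
        .filter(|chunk| !chunk.is_empty())
        // Lossy conversion keeps the sketch portable; non-UTF-8 bytes are replaced.
        .map(|chunk| PathBuf::from(String::from_utf8_lossy(chunk).into_owned()))
        .collect()
}

/// Returns the files git tracks in `repo`, or `None` if git is unavailable or
/// `repo` is not a work tree, so the caller can fall back to a plain walk.
fn git_tracked_files(repo: &Path) -> Option<Vec<PathBuf>> {
    let out = Command::new("git")
        .arg("-C")
        .arg(repo)
        .args(["ls-files", "-z"])
        .output()
        .ok()?;
    out.status.success().then(|| parse_ls_files(&out.stdout))
}
```

Listing tracked files this way skips ignored and untracked content without reading any gitignore rules directly.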
Benchmarks
Run benchmarks to see performance characteristics:
cargo bench
Performance characteristics on typical repositories:
- Small repos (< 1k files): ~10-50ms analysis time
- Medium repos (1k-10k files): ~100ms-1s analysis time
- Large repos (> 10k files): ~1-10s analysis time
- Memory usage: ~2MB per 1000 files for basic analysis
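The memory figure above scales linearly, so a rough budget for a given repository is simple arithmetic. The helper below is a hypothetical back-of-the-envelope calculation based only on the quoted ~2MB per 1000 files; actual usage will vary with file sizes and enabled features.

```rust
/// Rough memory estimate in MB for basic analysis, using ~2 MB per 1000 files.
fn estimated_memory_mb(file_count: u64) -> f64 {
    const MB_PER_1000_FILES: f64 = 2.0;
    file_count as f64 / 1000.0 * MB_PER_1000_FILES
}
```

By this estimate, a 50,000-file repository would need on the order of 100 MB.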
Development
Building
# Build all features
cargo build
# Build with specific features
cargo build --no-default-features --features="core,scanner"
# Build for release
cargo build --release
Testing
# Run all tests
cargo test
# Test specific features
cargo test --no-default-features --features="core,analysis"
# Run tests with output
cargo test -- --nocapture
Documentation
# Generate documentation
cargo doc --open
# Generate documentation for all features
cargo doc --all-features --open
Related Projects
- scribe-cli: Command-line interface for Scribe
- scribe-vscode: Visual Studio Code extension
- scribe-jupyter: Jupyter notebook integration
License
This project is licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
Contribution Guidelines
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
Support
- Documentation: docs.rs/scribe
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Acknowledgments
- Built with Rust
- Uses tree-sitter for parsing
- Inspired by research in code analysis and repository mining
- Community feedback and contributions