Scribe - Advanced Code Analysis Library
Scribe is a comprehensive Rust library for code analysis, repository exploration, and intelligent file processing. It provides powerful tools for understanding codebases through heuristic scoring, graph analysis, and AI-powered insights.
Features
Intelligent File Analysis: Multi-dimensional heuristic scoring system for identifying important files
Dependency Graph Analysis: PageRank centrality computation for understanding code relationships
High-Performance Scanning: Parallel file system traversal with git integration
Advanced Pattern Matching: Flexible glob and gitignore pattern support with preset configurations
Smart Code Selection: Context-aware code bundling and relevance scoring
Extensible Architecture: Plugin system for custom analyzers and scorers
Modular Design: Use only the features you need with optional components
Installation
Add this to your Cargo.toml:
[dependencies]
scribe = "0.1.0"
Feature Flags
Scribe uses feature flags to allow selective compilation:
# Full installation (default)
scribe = "0.1.0"
# Minimal installation
scribe = { version = "0.1.0", default-features = false, features = ["core"] }
# Fast file operations only
scribe = { version = "0.1.0", default-features = false, features = ["fast"] }
# Analysis without graph features
scribe = { version = "0.1.0", default-features = false, features = ["core", "analysis", "scanner"] }
Available Features
| Feature | Description | Dependencies |
| --- | --- | --- |
| default | All features enabled | core, analysis, graph, scanner, patterns, selection |
| core | Essential types and utilities | None |
| analysis | Heuristic scoring and metrics | core |
| graph | PageRank centrality analysis | core, analysis |
| scanner | File system scanning | core |
| patterns | Pattern matching (glob, gitignore) | core |
| selection | Code selection and bundling | core, analysis, graph |
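Because features depend on one another, enabling one feature can pull in several others. The sketch below is illustrative only and is not part of Scribe: it hard-codes the dependency column from the feature table and expands a requested feature list into its transitive closure, which is how Cargo resolves feature dependencies in general. The function names `direct_deps` and `resolve_features` are hypothetical.

```rust
use std::collections::BTreeSet;

/// Direct dependencies of each feature, copied from the feature table.
/// This lookup is a hypothetical helper, not something Scribe exports.
fn direct_deps(feature: &str) -> &'static [&'static str] {
    match feature {
        "default" => &["core", "analysis", "graph", "scanner", "patterns", "selection"],
        "core" => &[],
        "analysis" => &["core"],
        "graph" => &["core", "analysis"],
        "scanner" => &["core"],
        "patterns" => &["core"],
        "selection" => &["core", "analysis", "graph"],
        _ => &[], // unknown names are treated as leaf features in this sketch
    }
}

/// Expand the requested features into the full set that ends up enabled.
fn resolve_features(requested: &[&str]) -> BTreeSet<String> {
    let mut enabled = BTreeSet::new();
    let mut stack: Vec<&str> = requested.to_vec();
    while let Some(feature) = stack.pop() {
        // `insert` returns false for features we have already expanded.
        if enabled.insert(feature.to_string()) {
            stack.extend_from_slice(direct_deps(feature));
        }
    }
    enabled
}
```

For example, `resolve_features(&["selection"])` yields `analysis`, `core`, `graph`, and `selection`, so requesting `selection` alone still compiles the graph code.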
Feature Groups
| Group | Features | Use Case |
| --- | --- | --- |
| minimal | core | Basic types and utilities only |
| fast | core, scanner, patterns | Quick file operations |
| comprehensive | All features | Complete analysis capabilities |
Quick Start
Basic Repository Analysis
use scribe::prelude::*;
use std::path::Path;

#[tokio::main]
async fn main() -> Result<()> {
    // Analyze a repository with default settings
    let config = Config::default();
    let analysis = analyze_repository(".", &config).await?;

    // Get the most important files
    println!("Top 10 most important files:");
    for (file, score) in analysis.top_files(10) {
        println!("{}: {:.3}", file, score);
    }

    // Display summary
    println!("\n{}", analysis.summary());
    Ok(())
}
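To give a feel for what a multi-dimensional heuristic score might look like, here is a minimal sketch that combines a few per-file signals into one number and ranks files by it. The signals (`is_entrypoint`, `inbound_imports`, `depth`), the weights, and the formulas are all assumptions made for illustration; Scribe's actual scoring dimensions are not described in this README.

```rust
/// Hypothetical per-file signals; not Scribe's real signal set.
#[derive(Debug, Clone)]
struct FileSignals {
    path: String,
    is_entrypoint: bool,  // e.g. main.rs or lib.rs
    inbound_imports: u32, // how many other files import this one
    depth: u32,           // directory depth below the repository root
}

/// Hypothetical weights, one per scoring dimension (assumed positive).
struct Weights {
    entrypoint: f64,
    imports: f64,
    depth: f64,
}

/// Combine the signals into a weighted average in the range [0, 1].
fn score(s: &FileSignals, w: &Weights) -> f64 {
    let entry = if s.is_entrypoint { 1.0 } else { 0.0 };
    // Saturating transform: more imports help, with diminishing returns.
    let imports = s.inbound_imports as f64 / (s.inbound_imports as f64 + 5.0);
    // Shallower files score higher.
    let depth = 1.0 / (1.0 + s.depth as f64);
    let total = w.entrypoint + w.imports + w.depth;
    (w.entrypoint * entry + w.imports * imports + w.depth * depth) / total
}

/// Return the `n` highest-scoring files, best first.
fn top_files(files: &[FileSignals], w: &Weights, n: usize) -> Vec<(String, f64)> {
    let mut scored: Vec<(String, f64)> =
        files.iter().map(|f| (f.path.clone(), score(f, w))).collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(n);
    scored
}
```

Normalizing each signal to [0, 1] before weighting keeps any single dimension from dominating just because of its units.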
Selective Feature Usage
// Using only core and scanner features
use scribe::core::{Config, Result};
use scribe::scanner::{Scanner, ScanOptions};

#[tokio::main]
async fn main() -> Result<()> {
    let scanner = Scanner::new();
    let options = ScanOptions::default()
        .with_git_integration(true)
        .with_parallel_processing(true);

    let files = scanner.scan(".", options).await?;
    println!("Found {} files", files.len());
    Ok(())
}
Pattern Matching
use scribe::patterns::presets;

#[tokio::main]
async fn main() -> scribe::Result<()> {
    // Use preset patterns for common file types
    let mut source_matcher = presets::source_code()?;
    let mut doc_matcher = presets::documentation()?;

    if source_matcher.should_process("src/main.rs")? {
        println!("Found source file!");
    }
    if doc_matcher.should_process("README.md")? {
        println!("Found documentation!");
    }
    Ok(())
}
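For readers unfamiliar with glob semantics, the following standalone sketch shows one simplified way such matching can work. It is not Scribe's matcher and does not cover gitignore rules such as negation or anchoring. It supports only `?`, `*` (which does not cross `/`), and `**` (which does); the function names are hypothetical.

```rust
/// Match `path` against a simplified glob `pattern`.
/// Supported: `?` (one non-separator char), `*` (any run of non-separator
/// chars), `**` (any run of chars, including `/`).
fn glob_match(pattern: &str, path: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let s: Vec<char> = path.chars().collect();
    match_from(&p, &s)
}

fn match_from(p: &[char], s: &[char]) -> bool {
    match p.first() {
        // Pattern exhausted: succeed only if the path is exhausted too.
        None => s.is_empty(),
        Some('*') => {
            let double = p.get(1) == Some(&'*');
            let rest = if double { &p[2..] } else { &p[1..] };
            // `**/` may also match zero directories.
            if double && rest.first() == Some(&'/') && match_from(&rest[1..], s) {
                return true;
            }
            // Try every possible length for the wildcard, shortest first.
            for i in 0..=s.len() {
                if match_from(rest, &s[i..]) {
                    return true;
                }
                // A single `*` may not consume a path separator.
                if !double && i < s.len() && s[i] == '/' {
                    return false;
                }
            }
            false
        }
        Some('?') => !s.is_empty() && s[0] != '/' && match_from(&p[1..], &s[1..]),
        Some(c) => s.first() == Some(c) && match_from(&p[1..], &s[1..]),
    }
}
```

With this sketch, `*.rs` matches `main.rs` but not `src/main.rs`, while `src/**/*.rs` matches both `src/main.rs` and `src/a/b/x.rs`. The backtracking is exponential in the worst case, which is acceptable for illustration but not for production use.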
Graph Analysis
use scribe::graph::PageRankAnalysis;

#[tokio::main]
async fn main() -> scribe::Result<()> {
    let analysis = PageRankAnalysis::for_code_analysis()?;

    // Compute centrality for scan results
    // let centrality_results = analysis.compute_centrality(&scan_results)?;
    // let top_files = centrality_results.top_files_by_centrality(10);
    Ok(())
}
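As background on what PageRank centrality computes, here is a small self-contained power-iteration sketch with an L1 convergence check. It is a textbook formulation written for this README, not Scribe's implementation; the function name `pagerank`, its parameters, and the handling of dangling nodes (rank spread evenly over all nodes) are assumptions.

```rust
/// Compute PageRank over a directed graph given as adjacency lists:
/// `edges[i]` lists the nodes that node `i` links to (e.g. files it imports).
/// Every index in `edges` must be smaller than `edges.len()`.
/// Returns the scores and the number of iterations that were run.
fn pagerank(edges: &[Vec<usize>], damping: f64, tol: f64, max_iter: usize) -> (Vec<f64>, usize) {
    let n = edges.len();
    if n == 0 {
        return (Vec::new(), 0);
    }
    let mut rank = vec![1.0 / n as f64; n];
    for iter in 1..=max_iter {
        // Every node receives the teleport share.
        let mut next = vec![(1.0 - damping) / n as f64; n];
        let mut dangling = 0.0;
        for (i, outs) in edges.iter().enumerate() {
            if outs.is_empty() {
                dangling += rank[i]; // no out-links: redistributed below
            } else {
                let share = damping * rank[i] / outs.len() as f64;
                for &j in outs {
                    next[j] += share;
                }
            }
        }
        // Spread the rank held by dangling nodes evenly over all nodes.
        let dangling_share = damping * dangling / n as f64;
        for r in next.iter_mut() {
            *r += dangling_share;
        }
        // Convergence check: L1 distance between successive rank vectors.
        let delta: f64 = next.iter().zip(&rank).map(|(a, b)| (a - b).abs()).sum();
        rank = next;
        if delta < tol {
            return (rank, iter);
        }
    }
    (rank, max_iter)
}
```

For a three-file graph where files 0 and 1 both import file 2, a damping factor of 0.85 gives file 2 a score of about 0.574 and the other two about 0.213 each, so the shared dependency ranks as the most central file.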
CLI Covering Sets
Scribe's CLI can compute minimal covering sets:
--covering-set <name>: target a function/class/module by name.
--covering-set-diff: build a covering set for the current git diff (uses the dependency graph to include touched files plus related dependents/dependencies).
--diff-against <ref>: diff against a specific ref (defaults to HEAD).
Shared filters: --include-dependents, --max-depth, --max-files.
Output helper: add --line-numbers to prefix every line in the bundled files with its line number, making it easy for review agents to comment by line.
Example:
cargo run --bin scribe -- --covering-set-diff --include-dependents --max-depth 2
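One plausible way to read the shared filters is as limits on a breadth-first walk of the dependency graph starting from the touched files. The sketch below illustrates that reading only; it is not Scribe's algorithm and makes no claim about how Scribe achieves minimality. The `Graph` type, the `covering_set` function, and its parameters are hypothetical names that mirror the CLI flags.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Hypothetical dependency graph: `deps[f]` lists the files that `f` depends on.
type Graph = BTreeMap<String, Vec<String>>;

/// Collect the seed files plus related files reachable within `max_depth`
/// hops, stopping once `max_files` files have been selected.
fn covering_set(
    deps: &Graph,
    seeds: &[&str],
    include_dependents: bool,
    max_depth: usize,
    max_files: usize,
) -> Vec<String> {
    // Reverse edges: which files depend on a given file.
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (file, targets) in deps {
        for t in targets {
            dependents.entry(t.as_str()).or_default().push(file.as_str());
        }
    }

    let mut seen: BTreeSet<&str> = BTreeSet::new();
    let mut queue: VecDeque<(&str, usize)> = VecDeque::new();
    let mut out = Vec::new();
    for &s in seeds {
        if seen.insert(s) {
            queue.push_back((s, 0));
        }
    }
    while let Some((file, depth)) = queue.pop_front() {
        if out.len() >= max_files {
            break;
        }
        out.push(file.to_string());
        if depth == max_depth {
            continue; // do not expand beyond the depth limit
        }
        let mut neighbours: Vec<&str> = Vec::new();
        if let Some(targets) = deps.get(file) {
            neighbours.extend(targets.iter().map(|t| t.as_str()));
        }
        if include_dependents {
            if let Some(users) = dependents.get(file) {
                neighbours.extend(users.iter().copied());
            }
        }
        for n in neighbours {
            if seen.insert(n) {
                queue.push_back((n, depth + 1));
            }
        }
    }
    out
}
```

Breadth-first order means that when the file limit cuts the walk short, the files closest to the change are the ones that survive.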
Architecture
Scribe is built with a modular architecture where each crate provides specific functionality:
┌────────────────────────────────────────────────────────────────────┐
│                               scribe                               │
│ ┌─────────────────┐ ┌────────────────┐ ┌─────────────────────────┐ │
│ │ scribe-core     │ │ scribe-scanner │ │ scribe-patterns         │ │
│ │ (types,         │ │ (file system   │ │ (glob, gitignore,       │ │
│ │  traits,        │ │  traversal,    │ │  pattern matching)      │ │
│ │  utilities)     │ │  git support)  │ │                         │ │
│ └─────────────────┘ └────────────────┘ └─────────────────────────┘ │
│ ┌─────────────────┐ ┌────────────────┐ ┌─────────────────────────┐ │
│ │ scribe-analysis │ │ scribe-graph   │ │ scribe-selection        │ │
│ │ (heuristic      │ │ (PageRank      │ │ (intelligent bundling,  │ │
│ │  scoring,       │ │  centrality,   │ │  context extraction,    │ │
│ │  code metrics)  │ │  dependency    │ │  relevance scoring)     │ │
│ │                 │ │  analysis)     │ │                         │ │
│ └─────────────────┘ └────────────────┘ └─────────────────────────┘ │
└────────────────────────────────────────────────────────────────────┘
Component Overview
scribe-core: Foundation types, traits, configuration, and utilities
scribe-scanner: High-performance file system traversal with git integration
scribe-patterns: Flexible pattern matching with glob and gitignore support
scribe-analysis: Heuristic scoring algorithms and code metrics
scribe-graph: PageRank centrality and dependency graph analysis
scribe-selection: Intelligent code selection and context extraction
Examples
The repository includes several examples demonstrating different usage patterns:
Run Examples
# Full analysis example
cargo run --example basic_usage -- /path/to/repository
# Minimal features example
cargo run --example selective_features --no-default-features --features="core,scanner" -- /path/to/directory
Available Examples
basic_usage.rs: Complete repository analysis with all features
selective_features.rs: Minimal usage with core and scanner only
Performance
Scribe is designed for high performance:
Memory Efficient: Streaming file processing with configurable memory limits
Parallel Processing: Multi-threaded scanning and analysis using Rayon
Git Integration: Fast file discovery using git ls-files when available
Optimized Algorithms: Research-grade PageRank implementation with convergence detection
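The parallel-processing idea can be sketched with the standard library alone. The example below deliberately swaps Rayon for `std::thread::scope` so that it has no external dependencies; it splits the input into chunks, analyzes each chunk on its own thread, and returns results in input order. The `parallel_map` helper is a hypothetical name and does not reflect how Scribe schedules its work.

```rust
use std::thread;

/// Apply `analyze` to every item using up to `workers` scoped threads and
/// return the results in input order.
fn parallel_map<T, R, F>(items: &[T], workers: usize, analyze: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    if items.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Ceiling division so that at most `workers` chunks are produced.
    let chunk_len = (items.len() + workers - 1) / workers;
    let analyze = &analyze;
    thread::scope(|scope| {
        // Spawn one thread per chunk; scoped threads may borrow `items`.
        let handles: Vec<_> = items
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().map(analyze).collect::<Vec<R>>()))
            .collect();
        // Joining in spawn order preserves the original item order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Rayon's work-stealing scheduler balances uneven workloads better than this fixed chunking, which is one reason a library like Scribe would prefer it for real repositories.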
Benchmarks
Run benchmarks to see performance characteristics:
cargo bench
Performance characteristics on typical repositories:
Small repos (< 1k files): ~10-50ms analysis time
Medium repos (1k-10k files): ~100ms-1s analysis time
Large repos (> 10k files): ~1-10s analysis time
Memory usage: ~2MB per 1000 files for basic analysis
Development
Building
# Build all features
cargo build
# Build with specific features
cargo build --no-default-features --features="core,scanner"
# Build for release
cargo build --release
Testing
# Run all tests
cargo test
# Test specific features
cargo test --no-default-features --features="core,analysis"
# Run tests with output
cargo test -- --nocapture
Documentation
# Generate documentation
cargo doc --open
# Generate documentation for all features
cargo doc --all-features --open
Related Projects
scribe-cli: Command-line interface for Scribe
scribe-vscode: Visual Studio Code extension
scribe-jupyter: Jupyter notebook integration
License
This project is licensed under either of
at your option.
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
Contribution Guidelines
Fork the repository
Create a feature branch
Make your changes
Add tests for new functionality
Ensure all tests pass
Submit a pull request
Support
Acknowledgments
Built with Rust
Uses tree-sitter for parsing
Inspired by research in code analysis and repository mining
Community feedback and contributions