A professional Node.js CLI tool for extracting or generating structured data from websites using Puppeteer and LangChain.
- Automatic Detection of existing structured data (JSON-LD, Microdata, Open Graph, Twitter Cards)
- AI-powered Generation of new structured data with LangChain/OpenAI
- Multiple Formats supported (JSON-LD, Microdata, RDFa)
- Schema.org Validation of generated data
- JSON Export with metadata and timestamps
- Web Scraping with Puppeteer for dynamic content
- Beautiful CLI with ASCII logo and colored output
- TypeScript for maximum type safety
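The feature list mentions Schema.org validation of generated data. The sketch below shows one minimal way such a check could work for a single JSON-LD object: confirm the `@context`, confirm a string `@type`, and check a few required properties. The `REQUIRED_PROPS` table, `validateJsonLd`, and `ValidationResult` are hypothetical names chosen for illustration. They are not taken from the tool's `schema-validator.ts`.

```typescript
// Minimal Schema.org sanity check for a single JSON-LD object (a sketch).
// Assumption: the required-property table below is illustrative and is not
// taken from the tool's schema-validator.ts.
type JsonLd = Record<string, unknown>;

const REQUIRED_PROPS: Record<string, string[]> = {
  Organization: ["name"],
  Person: ["name"],
  WebSite: ["url"],
  Product: ["name"],
};

interface ValidationResult {
  valid: boolean;
  errors: string[];
}

function validateJsonLd(item: JsonLd): ValidationResult {
  const errors: string[] = [];

  // Accept both the https and the older http form of the Schema.org context.
  const context = item["@context"];
  if (context !== "https://schema.org" && context !== "http://schema.org") {
    errors.push(`Unexpected @context: ${String(context)}`);
  }

  // This sketch only handles a single string @type, not arrays of types.
  const type = item["@type"];
  if (typeof type !== "string" || type === "") {
    errors.push("Missing or non-string @type");
  } else {
    for (const prop of REQUIRED_PROPS[type] ?? []) {
      if (item[prop] === undefined) errors.push(`${type} is missing "${prop}"`);
    }
  }

  return { valid: errors.length === 0, errors };
}
```

Collecting every error, rather than stopping at the first one, makes it easier to report all problems with a generated object in a single pass.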
```bash
# Clone repository
git clone <repository-url>
cd StructuredData

# Install dependencies
npm install

# Build project
npm run build
```

The tool displays a beautiful ASCII logo on startup:
```bash
npm run dev -- --help
```

```bash
# Analyze website
npm run analyze https://example.com

# With custom output directory
npm run analyze https://example.com -- --output ./my-results

# Force regeneration (even if structured data exists)
npm run analyze https://example.com -- --force
```

```bash
# Set API key in .env file (this overwrites an existing .env)
echo "OPENAI_API_KEY=sk-your-api-key-here" > .env

# Or as parameter
npm run analyze https://example.com -- --openai-key "sk-your-api-key"

# Analyze website with AI support
npm run analyze https://example.com
```

```bash
npm run validate "./output/example_com_2025-07-24.json"
```

```bash
# Development server with hot reload
npm run watch

# Direct execution
npm run dev analyze https://example.com
```

The generated JSON files contain:
```json
{
  "metadata": {
    "url": "https://example.com",
    "analyzedAt": "2025-07-24T10:30:00.000Z",
    "generated": false,
    "structuredDataCount": 3
  },
  "structuredData": [
    {
      "type": "Organization",
      "data": {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": "Example Company",
        "url": "https://example.com",
        "description": "A great company"
      },
      "format": "json-ld",
      "source": "script"
    }
  ]
}
```

Supported Schema.org types:

- Organization - Companies and organizations
- Person - People and profiles
- WebSite - Website information
- Service - Services
- Product - Products
- LocalBusiness - Local businesses
- Article/BlogPosting - Articles and blog posts
- ContactPoint - Contact information
- PostalAddress - Addresses
- Event - Events
- Review - Reviews
- FAQPage - Frequently asked questions
- and many more Schema.org types
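The JSON output example earlier in this README can be described with TypeScript types. The following is a sketch that copies its field names from that example. `buildOutput` and `outputFileName` are hypothetical helpers. The naming rule (hostname with dots replaced by underscores, followed by the UTC date) is only inferred from the sample path `example_com_2025-07-24.json` and may not match what the tool actually does.

```typescript
// Sketch of the exported file's shape, with field names copied from the
// example output. Assumption: the file-naming rule is inferred from
// "example_com_2025-07-24.json" and may not match the tool exactly.
interface StructuredDataItem {
  type: string;                  // e.g. "Organization"
  data: Record<string, unknown>; // the JSON-LD object itself
  format: string;                // e.g. "json-ld"
  source: string;                // e.g. "script"
}

interface AnalysisOutput {
  metadata: {
    url: string;
    analyzedAt: string; // ISO 8601 timestamp
    generated: boolean; // false in the example, where data was extracted
    structuredDataCount: number;
  };
  structuredData: StructuredDataItem[];
}

function buildOutput(
  url: string,
  items: StructuredDataItem[],
  generated: boolean,
  now: Date = new Date(),
): AnalysisOutput {
  return {
    metadata: {
      url,
      analyzedAt: now.toISOString(),
      generated,
      structuredDataCount: items.length,
    },
    structuredData: items,
  };
}

// Hypothetical helper: "https://example.com/about" -> "example_com_<YYYY-MM-DD>.json"
function outputFileName(url: string, now: Date = new Date()): string {
  const host = new URL(url).hostname.replace(/\./g, "_");
  const day = now.toISOString().slice(0, 10); // UTC date part of the ISO string
  return `${host}_${day}.json`;
}
```

`JSON.stringify(buildOutput(...), null, 2)` would then produce a file laid out like the example. The date is taken from the UTC timestamp, so a run shortly after local midnight can still carry the previous day's date.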
```bash
# OpenAI API key for AI-powered generation
OPENAI_API_KEY=sk-your-openai-api-key

# Enable debug mode
DEBUG=true
```

- The `.env` file contains sensitive API keys
- It's already included in `.gitignore`
- Never commit API keys to public repositories
- Use `.env.example` as a template for other developers
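The API key can come from two places: the `--openai-key` flag or `OPENAI_API_KEY` in `.env`. The sketch below shows one way to resolve between them. It assumes the flag takes precedence over the environment, which this README does not state. `resolveOpenAIKey` and `maskKey` are hypothetical helpers, and `maskKey` reflects the advice above never to expose the full key.

```typescript
// Sketch: pick the OpenAI key from the two sources this README mentions.
// Assumption: the --openai-key flag wins over OPENAI_API_KEY; the tool's
// actual precedence is not documented here.
type Env = Record<string, string | undefined>;

function resolveOpenAIKey(cliKey: string | undefined, env: Env): string | undefined {
  const fromCli = cliKey?.trim();
  if (fromCli) return fromCli;
  const fromEnv = env["OPENAI_API_KEY"]?.trim();
  // Treat an empty or whitespace-only value as "not set".
  return fromEnv ? fromEnv : undefined;
}

// Hypothetical helper for logs: never print the full key.
function maskKey(key: string): string {
  return key.length > 8 ? `${key.slice(0, 3)}***` : "***";
}
```

In a Node.js entry point, this could be called as `resolveOpenAIKey(options.openaiKey, process.env)` after `.env` has been loaded with `import "dotenv/config"`. Here, `options.openaiKey` stands for however the CLI exposes the parsed flag. An `undefined` result is a natural cue to use the basic, non-AI generation path.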
```bash
# Show all available options
npm run dev -- analyze --help

# Options:
#   -o, --output <path>   Output directory for JSON files
#   -f, --force           Force regeneration
#   --openai-key <key>    OpenAI API key
```

```
src/
├── index.ts              # CLI Entry Point with ASCII logo
├── scraper.ts            # Main orchestration class
├── extractors/           # Data extraction modules
│   └── structured-data-extractor.ts
├── generators/           # AI generation modules
│   └── ai-content-generator.ts
└── utils/                # Helper utilities
    ├── logger.ts         # Logging with emojis
    └── schema-validator.ts
```
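The layout above splits the work into extraction, generation, and validation, coordinated by `scraper.ts`. The sketch below shows how such an orchestration could combine these steps with the fallback behaviour described in this README: extract first, keep existing data unless forced, try AI generation, and fall back to basic generation on failure. `Extractor`, `DataGenerator`, `Validator`, and `analyze` are hypothetical stand-ins, not the tool's real classes or methods.

```typescript
// Hypothetical interfaces standing in for the modules under src/; the real
// class and method names in scraper.ts and the generators may differ.
type JsonLd = Record<string, unknown>;

interface Extractor { extract(url: string): Promise<JsonLd[]>; }
interface DataGenerator { generate(url: string): Promise<JsonLd[]>; }
interface Validator { isValid(item: JsonLd): boolean; }

interface Deps {
  extractor: Extractor;
  aiGenerator: DataGenerator;    // e.g. LangChain/OpenAI-backed
  basicGenerator: DataGenerator; // non-AI fallback
  validator: Validator;
}

interface AnalyzeResult {
  items: JsonLd[];
  generated: boolean;
}

async function analyze(url: string, deps: Deps, force = false): Promise<AnalyzeResult> {
  // 1. Extract existing structured data first. With force=true the result is
  //    ignored in this sketch (the real tool may feed it into generation).
  const existing = await deps.extractor.extract(url);
  if (existing.length > 0 && !force) {
    return { items: existing, generated: false };
  }

  // 2. Try AI generation; on any error (no key, network, quota) use the fallback.
  let candidates: JsonLd[];
  try {
    candidates = await deps.aiGenerator.generate(url);
  } catch {
    candidates = await deps.basicGenerator.generate(url);
  }

  // 3. Keep only generated items that pass validation.
  return { items: candidates.filter((item) => deps.validator.isValid(item)), generated: true };
}
```

Passing the components in as a `deps` object keeps the flow testable. Stub generators can simulate a missing API key without any network access.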
```bash
npm run build      # Compile TypeScript
npm run dev        # Development mode
npm run start      # Production execution
npm run watch      # Hot-reload development
npm run analyze    # Direct analysis
npm run validate   # Direct validation
```

- Puppeteer - Web scraping and browser automation
- LangChain - AI integration and content generation
- Commander.js - CLI framework
- Figlet - ASCII art text for logo
- Chalk - Terminal colors
- dotenv - Environment variables management
```bash
npm run analyze https://schema.org
npm run analyze https://developers.google.com
```

```bash
npm run analyze https://small-business.com
```

```bash
npm run analyze https://site1.com
npm run analyze https://site2.com
npm run analyze https://site3.com
```

```bash
npm run validate "./output/*.json"
```

The tool displays an appealing ASCII logo with colored output on startup:
- Cyan-colored "StructuredData" logo
- Green description
- Gray separator line
- Tries OpenAI API first for best results
- Automatically falls back to basic generation
- Extracts existing data before regeneration
- JSON-LD Scripts
- Microdata Markup
- Open Graph Meta Tags
- Twitter Card Meta Tags
- Contact information (email, phone)
- Social media links
- Website structure (headings)
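The lists above describe what the tool looks for on a page. The following simplified sketch shows the idea on a raw HTML string: it collects JSON-LD script blocks, Open Graph and Twitter Card meta tags, and email addresses. This is a swapped-in technique for illustration only. The tool itself uses Puppeteer and a real DOM, while this sketch relies on regular expressions, which break on unusual markup and do not cover Microdata. `extractFromHtml` is a hypothetical name.

```typescript
// Simplified sketch: regex-based extraction from a raw HTML string.
// The tool itself drives a real browser via Puppeteer, so this only
// illustrates the idea. Regexes like these miss unusual markup (for example
// a meta tag with content before property) and are not a robust parser.
interface Extracted {
  jsonLd: unknown[];            // parsed JSON-LD blocks
  meta: Record<string, string>; // og:* and twitter:* meta tags
  emails: string[];             // unique email addresses found in the page
}

function extractFromHtml(html: string): Extracted {
  const jsonLd: unknown[] = [];
  const scriptRe = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  let m: RegExpExecArray | null;
  while ((m = scriptRe.exec(html)) !== null) {
    try {
      jsonLd.push(JSON.parse(m[1]));
    } catch {
      // Skip malformed JSON-LD instead of failing the whole page.
    }
  }

  const meta: Record<string, string> = {};
  // Assumes the property/name attribute appears before content.
  const metaRe = /<meta[^>]+(?:property|name)=["']((?:og|twitter):[^"']+)["'][^>]*content=["']([^"']*)["']/gi;
  while ((m = metaRe.exec(html)) !== null) {
    meta[m[1]] = m[2];
  }

  // Deduplicate, since addresses often appear both in href and link text.
  const emails = Array.from(new Set(html.match(/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g) ?? []));
  return { jsonLd, meta, emails };
}
```

Inside a browser context, such as Puppeteer's `page.evaluate`, the same data would normally be read with DOM queries like `document.querySelectorAll('script[type="application/ld+json"]')` rather than regexes.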
- Fork the repository
- Create a Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
ZurdAI - AI Expert for Intelligent Automation
- Website: zurdai.com
- GitHub: @zurd46
- LinkedIn: zurd46
Built with ❤️ and AI-Power for the Swiss Tech Community