This repository was archived by the owner on Feb 5, 2026. It is now read-only.

Add prometheus counters #13

Closed
shahabkamali wants to merge 3 commits into ahartel:main from shahabkamali:add_prometheus_counters

Conversation

@shahabkamali

No description provided.

shahabkamali and others added 3 commits October 27, 2025 15:20
Adds comprehensive deployment options including local, Docker, and Kubernetes support.

Provides detailed documentation and scripts for each deployment scenario.

Introduces environment variables for flexible configuration and removes hardcoded values.

Improves logging and error handling for better debugging.

Adds convenience scripts for easier setup and management.
Enhances deployment and configuration options
Adds Prometheus counters to track document processing stages in both the batcher and worker services. The counters give insight into data filtering and processing performance, and they support diagnostics by tracking the specific steps where bottlenecks may occur.

Specifically, the batcher now tracks:
- Total documents processed
- Documents without language info
- Non-English documents filtered
- English documents found
- HTTP 200 responses
- Non-200 responses filtered
- URLs sent to queue
- Batches published

The worker now tracks:
- URLs processed
- WARC records processed
- Text extraction attempts
- Successful text extractions
- Batches consumed
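
The batcher counters listed above could be wired up roughly as follows. This is a minimal sketch using the `prometheus_client` library, not the code from this pull request: the metric names, the `filter_documents` helper, the document fields (`languages`, `status`, `url`), and the order of the language and status checks are all assumptions made for illustration.

```python
from prometheus_client import CollectorRegistry, Counter

# A dedicated registry keeps the sketch isolated from the global default one.
registry = CollectorRegistry()


def _counter(name: str, doc: str) -> Counter:
    # Hypothetical naming scheme; the real metric names may differ.
    return Counter(f"batcher_{name}_total", doc, registry=registry)


documents = _counter("documents", "Total documents processed")
no_language = _counter("documents_without_language", "Documents without language info")
non_english = _counter("non_english_documents", "Non-English documents filtered")
english = _counter("english_documents", "English documents found")
http_200 = _counter("http_200_responses", "HTTP 200 responses")
non_200 = _counter("non_200_responses", "Non-200 responses filtered")
urls_sent = _counter("urls_sent", "URLs sent to queue")
batches_published = _counter("batches_published", "Batches published")


def filter_documents(docs: list[dict]) -> list[str]:
    """Return the URLs that pass both filters, counting every stage."""
    urls = []
    for doc in docs:
        documents.inc()
        languages = doc.get("languages")  # assumed comma-separated codes
        if not languages:
            no_language.inc()
            continue
        if "eng" not in languages.split(","):
            non_english.inc()
            continue
        english.inc()
        if doc.get("status") != 200:
            non_200.inc()
            continue
        http_200.inc()
        urls.append(doc["url"])
    if urls:
        # In a real batcher, this is where the batch would go to the queue.
        urls_sent.inc(len(urls))
        batches_published.inc()
    return urls
```

Calling `filter_documents` on each batch and exposing `registry` through `prometheus_client.start_http_server(port, registry=registry)` would make the counts scrapeable. The worker counters could follow the same pattern with a `worker_` prefix.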