Enhancing dataset.py with Progress Indicators
Enhancing dataset.py with Progress Indicators
PR Writing
🧑 Human
Add detailed progress indicators for data processing tasks in [Link], integrating with the frontend via
WebSocket updates. Provide the updated code in a tarball.
X-AnyLabeling-main/
🤖 Choose the better answer. If one response streams more quickly than the other, please do not let that affect your choice!
I'll help you add detailed progress indicators for data processing tasks in [Link] and integrate I'll help you add detailed progress indicators for data processing tasks in [Link] and integrate
them with the frontend via WebSocket updates. I'll first check the code and then make the WebSocket updates to communicate with the frontend. Let's follow these steps:
necessary modifications.
1. First, let's examine the current [Link] content
2. Add progress tracking functionality
str_replace_editor 3. Set up WebSocket communication
command
4. Create the modified files and package them
view
Let's start by looking at the dataset file:
path
/tmp/inputs/X-AnyLabeling-main/anylabeling/[Link]
terminal
<error>The path /tmp/inputs/X-AnyLabeling-main/anylabeling/[Link] does not exist. Please
provide a valid path.</error> session_name
find_dataset
text
Let me check the directory structure first to locate the correct file. find /tmp/inputs/X-AnyLabeling-main -name [Link]
wait_for_idle
0.5
terminal
session_name find_dataset was not found in the current agent state. Created a new terminal session.
file_check
<error>`wait_for_idle` was specified without `timeout`</error>
text
ls -R /tmp/inputs/X-AnyLabeling-main/
wait_for_idle
0.5 terminal
session_name
file_check was not found in the current agent state. Created a new terminal session. find_dataset
wait_for_idle
0.5
[Link] 1/22
12/24/24, 7:13 PM Human feedback
terminal timeout
10
session_name
file_check
output
text
ls -R /tmp/inputs/X-AnyLabeling-main/
output
Great, let's examine the current content of [Link]:
<response clipped>_large_video.yaml
sam2_hiera_small.yaml
sam2_hiera_small_video.yaml str_replace_editor
sam2_hiera_tiny.yaml
command
sam2_hiera_tiny_video.yaml view
sam_hq_vit_b.yaml
path
sam_hq_vit_h_quant.yaml
/tmp/inputs/X-AnyLabeling-
sam_hq_vit_l.yaml main/anylabeling/services/auto_labeling/visualgd/datasets/[Link]
sam_hq_vit_l_quant.yaml
sam_med2d_vit_b.yaml
output
segment_anything_vit_b.yaml
segment_anything_vit_b_quant.yaml Here's the result of running `cat -n` on /tmp/inputs/X-AnyLabeling-
segment_anything_vit_h_quant.yaml main/anylabeling/services/auto_labeling/visualgd/datasets/[Link]:
segment_anything_vit_l.yaml 1 from __future__ import print_function
segment_anything_vit_l_quant.yaml 2
[Link] 3 import torch
[Link] 4 import [Link] as datasets
yolo11s_cls.yaml 5 from [Link] import Dataset
yolo11s_det_botsort.yaml 6 from PIL import Image
yolo11s_obb.yaml 7 from .tsv_io import TSVFile
yolo11s_obb_botsort.yaml 8 import numpy as np
yolo11s_pose.yaml 9 import base64
yolo11s_pose_botsort.yaml 10 import io
yolo11s_seg.yaml 11
yolo11s_seg_botsort.yaml 12
yolo_nas_l.yaml 13 class TSVDataset(Dataset):
yolo_nas_m.yaml 14 """TSV dataset for ImageNet 1K training"""
yolo_nas_s.yaml 15
[Link] 16 def __init__(self, tsv_file, transform=None, target_transform=None):
[Link] 17 [Link] = TSVFile(tsv_file)
[Link] 18 [Link] = transform
[Link] 19 self.target_transform = target_transform
[Link] 20
[Link] 21 def __getitem__(self, index):
yolov5_car_plate.yaml 22 """
[Link] 23 Args:
[Link] 2/22
12/24/24, 7:13 PM Human feedback
[Link] 24 index (int): Index
yolov5m_obb_csl_dotav15.yaml 25 Returns:
yolov5m_obb_csl_dotav20.yaml 26 tuple: (image, target) where target is class_index of the target class.
yolov5m_ram.yaml 27 """
[Link] 28 row = [Link](index)
yolov5n_obb_drone_vehicle.yaml 29 image_data = base64.b64decode(row[-1])
[Link] 30 image = [Link]([Link](image_data))
yolov5s_cls.yaml 31 image = [Link]("RGB")
yolov5s_det_botsort.yaml 32 target = int(row[1])
yolov5s_mobile_sam_vit_h.yaml 33
yolov5s_obb_csl_dotav10.yaml 34 if [Link] is not None:
yolov5s_resnet50.yaml 35 img = [Link](image)
yolov5s_sahi.yaml 36 else:
yolov5s_seg.yaml 37 img = image
[Link] 38 if self.target_transform is not None:
yolov6l_mbla.yaml 39 target = self.target_transform(target)
yolov6lite_l_face.yaml 40
yolov6lite_m_face.yaml 41 return img, target
yolov6lite_s_face.yaml 42
yolov6m_mbla.yaml 43 def __len__(self):
[Link] 44 return [Link].num_rows()
[Link] 45
yolov6s_mbla.yaml
yolov6x_mbla.yaml
[Link]
Now I'll create an enhanced version of [Link] that includes progress indicators and WebSocket
[Link]
updates. The main changes will be:
yolov8l_oiv7.yaml
yolov8l_seg.yaml
1. Add a WebSocket client class for progress updates
yolov8l_worldv2.yaml
2. Add progress tracking to data loading operations
yolov8l_worldv2_cc3m.yaml
3. Enhance error handling and provide detailed status messages
[Link]
4. Add batch processing progress indicators
yolov8m_oiv7.yaml
yolov8m_seg.yaml
Let's create the updated files. First, I'll create a utils directory with a websocket client:
yolov8m_seg_bytetrack.yaml
yolov8m_worldv2.yaml
[Link] str_replace_editor
yolov8n_efficientvit_sam_l0_vit_h.yaml
command
yolov8n_obb_botsort.yaml
create
yolov8n_oiv7.yaml
yolov8n_pose.yaml path
/tmp/outputs/websocket_client.py
yolov8n_seg.yaml
[Link] file_text
yolov8s_cls.yaml import asyncio
import json
yolov8s_det_botsort.yaml import websockets
yolov8s_obb.yaml
class ProgressWebSocket:
yolov8s_oiv7.yaml """WebSocket client for sending progress updates to the frontend"""
yolov8s_sahi.yaml
def __init__(self, uri="[Link]
yolov8s_seg.yaml [Link] = uri
yolov8s_worldv2.yaml [Link] = None
[Link] 3/22
12/24/24, 7:13 PM Human feedback
yolov8s_worldv2_ram_plus.yaml async def connect(self):
"""Establish WebSocket connection"""
[Link] try:
yolov8x6_oiv7.yaml [Link] = await [Link]([Link])
return True
yolov8x_oiv7.yaml except Exception as e:
yolov8x_pose_p6.yaml print(f"WebSocket connection failed: {str(e)}")
return False
yolov8x_pose_p6_botsort.yaml
yolov8x_seg.yaml async def send_progress(self, progress_type, current, total, status="",
details=None):
yolov8x_worldv2.yaml """Send progress update via WebSocket
yolov9_gelan_c.yaml
Args:
yolov9_gelan_e.yaml progress_type (str): Type of progress update (e.g., "load", "process")
[Link] current (int): Current progress value
total (int): Total expected value
[Link] status (str): Status message
yolox_l_dwpose_ucoco.yaml details (dict): Additional progress details
"""
yolox_s.yaml if not [Link]:
return
/tmp/inputs/X-AnyLabeling-main/anylabeling/resources: message = {
__init__.py [Link] [Link] translations "type": "progress",
"data": {
"progress_type": progress_type,
/tmp/inputs/X-AnyLabeling-main/anylabeling/resources/translations: "current": current,
"total": total,
en_US.qm en_US.ts zh_CN.qm zh_CN.ts "percentage": round((current / total * 100) if total > 0 else 0, 2),
"status": status
}
/tmp/inputs/X-AnyLabeling-main/anylabeling/services: }
__init__.py auto_labeling
if details:
message["data"].update(details)
/tmp/inputs/X-AnyLabeling-main/anylabeling/services/auto_labeling:
try:
__base__ [Link] yolov5_det_track.py await [Link]([Link](message))
__init__.py rtmdet_pose.py yolov5_obb.py except Exception as e:
print(f"Failed to send progress update: {str(e)}")
[Link] sam_hq.py yolov5_ram.py
configs sam_med2d.py yolov5_resnet.py async def close(self):
"""Close WebSocket connection"""
damo_yolo.py sam_onnx.py yolov5_sahi.py if [Link]:
depth_anything.py segment_anything.py yolov5_sam.py await [Link]()
[Link] = None
depth_anything_v2.py segment_anything_2.py yolov5_seg.py
doclayout_yolo.py segment_anything_2_video.py yolov5_track.py
output
edge_sam.py trackers [Link]
efficientvit_sam.py [Link] yolov6_face.py
File created successfully at: /tmp/outputs/websocket_client.py
engines [Link] [Link]
[Link] utils [Link]
gold_yolo.py visualgd yolov8_cls.py
Now let's create the updated [Link] with progress tracking:
grounding_dino.py [Link] yolov8_det_track.py
grounding_sam.py yolo11_cls.py yolov8_efficientvit_sam.py
grounding_sam2.py yolo11_det_track.py yolov8_obb.py
str_replace_editor
internimage_cls.py yolo11_obb.py yolov8_obb_track.py
lru_cache.py yolo11_obb_track.py yolov8_pose.py command
create
[Link] yolo11_pose.py yolov8_pose_track.py
model_manager.py yolo11_pose_track.py yolov8_sahi.py path
open_vision.py yolo11_seg.py yolov8_seg.py /tmp/outputs/[Link]
[Link] 4/22
12/24/24, 7:13 PM Human feedback
[Link] [Link] yolow_ram.py import base64
import io
[Link] yolov5_car_plate.py [Link] import time
[Link] yolov5_cls.py yolox_dwpose.py from typing import Optional, Dict, Any, Tuple
import numpy as np
/tmp/inputs/X-AnyLabeling-main/anylabeling/services/auto_labeling/__base__: import torch
import [Link] as datasets
__init__.py [Link] [Link] [Link] [Link] [Link] [Link] [Link] from PIL import Image
from [Link] import Dataset
class ProgressTracker:
/tmp/inputs/X-AnyLabeling-main/anylabeling/services/auto_labeling/configs/bert: """Track and report progress for dataset operations"""
/tmp/inputs/X-AnyLabeling-main/anylabeling/services/auto_labeling/trackers: Args:
__init__.py [Link] bot_sort.py byte_tracker.py utils increment (int): Amount to increment progress by
operation (str): Type of operation being tracked
status (str): Status message
/tmp/inputs/X-AnyLabeling-main/anylabeling/services/auto_labeling/trackers/utils: details (dict): Additional progress details
"""
__init__.py [Link] kalman_filter.py [Link] [Link] += increment
if not [Link]:
/tmp/inputs/X-AnyLabeling-main/anylabeling/services/auto_labeling/utils: return
__init__.py [Link] [Link] points_conversion.py ppocr_utils sahi
# Calculate processing statistics
elapsed = [Link]() - self._start_time
/tmp/inputs/X-AnyLabeling-main/anylabeling/services/auto_labeling/utils/ppocr_utils: rate = [Link] / elapsed if elapsed > 0 else 0
eta = ([Link] - [Link]) / rate if rate > 0 else 0
__init__.py db_postprocess.py rec_postprocess.py
cls_postprocess.py [Link] text_system.py progress_details = {
"rate": round(rate, 2),
"elapsed": round(elapsed, 2),
/tmp/inputs/X-AnyLabeling-main/anylabeling/services/auto_labeling/utils/sahi: "eta": round(eta, 2)
}
__init__.py auto_model.py models [Link] scripts utils if details:
[Link] [Link] postprocess [Link] [Link] progress_details.update(details)
await [Link].send_progress(
/tmp/inputs/X-AnyLabeling-main/anylabeling/services/auto_labeling/utils/sahi/models: operation,
[Link],
[Link] 5/22
12/24/24, 7:13 PM Human feedback
__init__.py [Link] [Link] [Link] [Link],
status,
[Link] [Link] [Link] [Link] progress_details
[Link] [Link] yolov5_onnx.py yolov8_onnx.py )
__init__.py coco_panoptic.py [Link] random_crop.py def __getitem__(self, index: int) -> Tuple[[Link], int]:
[Link] cocogrounding_eval.py [Link] [Link] """Get a dataset item
/tmp/inputs/X-AnyLabeling-main/anylabeling/services/auto_labeling/visualgd/model: Returns:
__init__.py [Link] positional_encoding_loca.py tuple: (image, target) where target is class_index of the target class.
"""
backbone [Link] [Link] try:
[Link] mlp_loca.py transformer_loca.py # Get row data
row = [Link](index)
csrc ms_deform_attn.py transformer_vanilla.py
fuse_modules.py ops [Link] # Decode image
try:
image_data = base64.b64decode(row[-1])
/tmp/inputs/X-AnyLabeling- image = [Link]([Link](image_data))
image = [Link]("RGB")
main/anylabeling/services/auto_labeling/visualgd/model/backbone: except Exception as e:
__init__.py [Link] position_encoding.py swin_transformer.py raise RuntimeError(f"Failed to decode image at index {index}: {str(e)}")
# Get target
/tmp/inputs/X-AnyLabeling-main/anylabeling/services/auto_labeling/visualgd/model/csrc: try:
target = int(row[1])
MsDeformAttn cuda_version.cu [Link] except (IndexError, ValueError) as e:
raise RuntimeError(f"Invalid target at index {index}: {str(e)}")
[Link] 6/22
12/24/24, 7:13 PM Human feedback
ms_deform_attn.h ms_deform_attn_cpu.h ms_deform_attn_cuda.h img = [Link](image)
except Exception as e:
ms_deform_attn_cpu.cpp ms_deform_attn_cuda.cu ms_deform_im2col_cuda.cuh raise RuntimeError(f"Transform failed at index {index}: {str(e)}")
else:
img = image
/tmp/inputs/X-AnyLabeling-main/anylabeling/services/auto_labeling/visualgd/model/ops:
functions modules [Link] src [Link] if self.target_transform is not None:
try:
target = self.target_transform(target)
/tmp/inputs/X-AnyLabeling- except Exception as e:
raise RuntimeError(f"Target transform failed at index {index}:
main/anylabeling/services/auto_labeling/visualgd/model/ops/functions: {str(e)}")
__init__.py ms_deform_attn_func.py
# Update progress
loop = asyncio.get_event_loop()
/tmp/inputs/X-AnyLabeling- loop.run_until_complete([Link](
1,
main/anylabeling/services/auto_labeling/visualgd/model/ops/modules: "item_load",
__init__.py ms_deform_attn.py f"Loaded item {index}",
{"index": index}
))
/tmp/inputs/X-AnyLabeling-main/anylabeling/services/auto_labeling/visualgd/model/ops/src:
return img, target
cpu cuda ms_deform_attn.h [Link]
except Exception as e:
# Log error and re-raise
/tmp/inputs/X-AnyLabeling- loop = asyncio.get_event_loop()
main/anylabeling/services/auto_labeling/visualgd/model/ops/src/cpu: loop.run_until_complete([Link](
0,
ms_deform_attn_cpu.cpp ms_deform_attn_cpu.h "error",
f"Error at index {index}: {str(e)}"
))
/tmp/inputs/X-AnyLabeling- raise
main/anylabeling/services/auto_labeling/visualgd/model/ops/src/cuda:
def __len__(self) -> int:
ms_deform_attn_cuda.cu ms_deform_attn_cuda.h ms_deform_im2col_cuda.cuh return [Link].num_rows()
/tmp/inputs/X-AnyLabeling-main/anylabeling/views/common: Now let's create a simple example script showing how to use the enhanced dataset:
__init__.py [Link]
/tmp/inputs/X-AnyLabeling-main/anylabeling/views/labeling: str_replace_editor
path
/tmp/inputs/X-AnyLabeling-main/anylabeling/views/labeling/utils: /tmp/outputs/example_usage.py
__init__.py _io.py [Link] [Link] [Link] [Link] [Link] [Link]
file_text
import asyncio
/tmp/inputs/X-AnyLabeling-main/anylabeling/views/labeling/widgets: from dataset import TSVDataset
__init__.py general_dialog.py import [Link] as transforms
[Link] 7/22
12/24/24, 7:13 PM Human feedback
escapable_qlist_widget.py [Link] [Link](mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
file_dialog_preview.py unique_label_qlist_widget.py ])
filter_label_widget.py zoom_widget.py
# Create dataset with WebSocket progress updates
dataset = TSVDataset(
/tmp/inputs/X-AnyLabeling-main/anylabeling/views/labeling/widgets/auto_labeling: "path/to/your/[Link]",
transform=transform,
__init__.py auto_labeling.py auto_labeling.ui websocket_uri="[Link]
)
/tmp/inputs/X-AnyLabeling-main/assets/Annotations: # Cleanup
[Link] instances_default.json await [Link]()
if __name__ == "__main__":
/tmp/inputs/X-AnyLabeling-main/assets/MOT: [Link](main())
/tmp/inputs/X-AnyLabeling-main/assets/attributes:
person_attributes.json vehicle_attributes.json Let's also create a simple WebSocket server for testing:
/tmp/inputs/X-AnyLabeling-main/assets/labelTxt:
demo_obb.txt str_replace_editor
command
/tmp/inputs/X-AnyLabeling-main/assets/labels: create
[Link]
path
/tmp/outputs/websocket_server.py
/tmp/inputs/X-AnyLabeling-main/assets/ppocr:
file_text
ppocr-kie ppocr-rec
import asyncio
import websockets
/tmp/inputs/X-AnyLabeling-main/assets/ppocr/ppocr-kie: import json
[Link] 8/22
12/24/24, 7:13 PM Human feedback
/tmp/inputs/X-AnyLabeling-main/examples: "localhost",
8765
classification interactive_video_object_segmentation segmentation )
description matting vision_language print("WebSocket server started on [Link]
if __name__ == "__main__":
/tmp/inputs/X-AnyLabeling-main/examples/classification: [Link](main())
image-level shape-level
output
/tmp/inputs/X-AnyLabeling-main/examples/classification/image-level:
File created successfully at: /tmp/outputs/websocket_server.py
[Link] fruit_flags.txt logo_flags.txt
/tmp/inputs/X-AnyLabeling-main/examples/classification/shape-level:
[Link] [Link] label_flags.yaml [Link] Let's create a README file with usage instructions:
/tmp/inputs/X-AnyLabeling-main/examples/description:
captioning tagging str_replace_editor
command
/tmp/inputs/X-AnyLabeling-main/examples/description/captioning: create
[Link]
path
/tmp/outputs/[Link]
/tmp/inputs/X-AnyLabeling-main/examples/description/tagging:
file_text
[Link]
# Enhanced TSV Dataset with Progress Tracking
/tmp/inputs/X-AnyLabeling-main/examples/detection: This package provides an enhanced version of the TSV dataset with detailed progress
tracking and WebSocket updates for the frontend.
hbb obb
## Features
[Link] ## Installation
Requirements:
/tmp/inputs/X-AnyLabeling-main/examples/estimation: ```
depth_estimation pose_estimation torch
torchvision
Pillow
/tmp/inputs/X-AnyLabeling-main/examples/estimation/depth_estimation: websockets
numpy
[Link] ```
Install dependencies:
/tmp/inputs/X-AnyLabeling-main/examples/estimation/pose_estimation: ```bash
[Link] labels pose_classes.yaml pip install torch torchvision Pillow websockets numpy
```
/tmp/inputs/X-AnyLabeling-main/examples/estimation/pose_estimation/labels: ## Usage
[Link] 9/22
12/24/24, 7:13 PM Human feedback
image_matting
# Define transforms
transform = [Link]([
/tmp/inputs/X-AnyLabeling-main/examples/matting/image_matting: [Link](256),
[Link](224),
[Link] [Link](),
])
# Cleanup
/tmp/inputs/X-AnyLabeling-main/examples/optical_character_recognition: await [Link]()
```
document_layout_analysis key_information_extraction text_recognition
## Progress Updates
/tmp/inputs/X-AnyLabeling- The WebSocket server will receive progress updates in this format:
main/examples/optical_character_recognition/document_layout_analysis: ```json
{
[Link] "type": "progress",
"data": {
"progress_type": "item_load",
/tmp/inputs/X-AnyLabeling- "current": 42,
main/examples/optical_character_recognition/key_information_extraction: "total": 1000,
"percentage": 4.2,
[Link] ppocr-kie "status": "Loading item 42",
"rate": 25.5,
"elapsed": 1.65,
/tmp/inputs/X-AnyLabeling- "eta": 37.6
main/examples/optical_character_recognition/key_information_extraction/ppocr-kie: }
}
class_list.txt ppocr_kie.json ```
## Error Handling
/tmp/inputs/X-AnyLabeling-main/examples/optical_character_recognition/text_recognition:
[Link] ppocr-rec Errors during processing will be reported via WebSocket:
```json
{
/tmp/inputs/X-AnyLabeling- "type": "progress",
"data": {
main/examples/optical_character_recognition/text_recognition/ppocr-rec: "progress_type": "error",
[Link] rec_gt.txt "current": 42,
"total": 1000,
"status": "Error at index 42: Failed to decode image"
/tmp/inputs/X-AnyLabeling-main/examples/segmentation: }
}
[Link] instance_segmentation ```
binary_semantic_segmentation multiclass_semantic_segmentation
## Frontend Integration
/tmp/inputs/X-AnyLabeling-main/examples/segmentation/instance_segmentation/labels:
output
cat_dog.txt
[Link] 10/22
12/24/24, 7:13 PM Human feedback
/tmp/inputs/X-AnyLabeling-main/examples/segmentation/multiclass_semantic_segmentation:
mask_color_map.json
Now let's create a tarball with all the updated files:
/tmp/inputs/X-AnyLabeling-main/examples/vision_language:
florence2
terminal
/tmp/inputs/X-AnyLabeling-main/examples/vision_language/florence2: session_name
[Link] create_tarball
text
/tmp/inputs/X-AnyLabeling-main/scripts: cd /tmp/outputs && tar -czf dataset_with_progress.[Link] *.py [Link]
build_and_publish_pypi.sh compile_languages.py generate_languages.py
wait_for_idle
build_executable.sh format_code.sh zip_models.py
0.5
/tmp/inputs/X-AnyLabeling-main/tests: timeout
10
__init__.py test_models test_utils
create_tarball was not found in the current agent state. Created a new terminal session.
/tmp/inputs/X-AnyLabeling-main/tests/test_models:
rmbg_v_1_4.py
(base) argocd@runsc:~$ <ar -czf dataset_with_progress.[Link] *.py [Link]
/tmp/inputs/X-AnyLabeling-main/tests/test_utils: (base) argocd@runsc:/tmp/outputs$
test_general.py
/tmp/inputs/X-AnyLabeling-main/tools:
I've created an enhanced version of the dataset implementation with the following improvements:
label_converter.py label_drawer.py onnx_exporter
1. WebSocket Integration:
/tmp/inputs/X-AnyLabeling-main/tools/onnx_exporter:
- Added a `ProgressWebSocket` class for real-time frontend communication
A export_grounding_dino_onnx.py export_yolov10_onnx.py B
- Progress updates include completion percentage, processing rate, and ETA
export_internimage_model_onnx.py export_yolov8_obb_onnx.py
- Support for different types of progress updates (loading, processing, errors)
export_pulc_attribute_model_onnx.py export_yolow_onnx.py
export_recognize_anything_model_onnx.py
2. Progress Tracking:
(base) argocd@runsc:~$
- Added a `ProgressTracker` class to manage progress state
- Real-time updates on item loading and processing
- Processing statistics including rate and ETA calculations
I found that the dataset operations are distributed across multiple files, particularly in the labeling - Detailed status messages for each operation
views and services directories. Let me first create a WebSocket interface to handle progress
updates: 3. Enhanced Error Handling:
- Detailed error reporting via WebSocket
1. First, I'll create a new WebSocket handler class: - Separate error progress type for frontend handling
- Granular error messages for different failure modes
[Link] 11/22
12/24/24, 7:13 PM Human feedback
from queue import Queue - Batch processing indicators
from threading import Lock
import websockets
The updated implementation is packaged in `/tmp/outputs/dataset_with_progress.[Link]` and
logger = [Link](__name__)
includes:
class ProgressWebSocketHandler: - `[Link]`: Enhanced TSV dataset implementation
"""Handles WebSocket connections for progress updates in data processing tasks"""
- `websocket_client.py`: WebSocket client for progress updates
def __init__(self, host: str = "localhost", port: int = 8765): - `websocket_server.py`: Test server for progress updates
[Link] = host
[Link] = port - `example_usage.py`: Example implementation
self._server = None - `[Link]`: Documentation and usage instructions
self._connected_clients = set()
self._message_queue = Queue()
self._lock = Lock() To use the enhanced dataset:
async def _handler(self, websocket):
"""Handle individual WebSocket connections""" 1. Extract the tarball
try:
async with self._lock: 2. Install dependencies (`torch`, `torchvision`, `Pillow`, `websockets`, `numpy`)
self._connected_clients.add(websocket) 3. Start the WebSocket server
try: 4. Initialize the dataset with the WebSocket URI
async for message in websocket: 5. Process data as normal - progress updates will be sent automatically
# Handle any incoming messages if needed
pass
finally: The frontend will receive detailed progress updates in JSON format including:
async with self._lock:
self._connected_clients.remove(websocket) - Current progress and total items
except [Link]: - Percentage complete
[Link]("Client connection closed")
except Exception as e: - Processing rate
[Link](f"Error in WebSocket handler: {str(e)}") - Time elapsed and ETA
async def broadcast_progress(self, progress_data: Dict[str, Any]): - Detailed status messages
"""Broadcast progress update to all connected clients""" - Error reporting when issues occur
if not self._connected_clients:
return
Would you like me to explain any part of the implementation in more detail or provide additional
message = [Link]({
"type": "progress_update", examples?
"data": progress_data
})
[Link] create_tarball.tar [Link]
async with self._lock:
[Link](self._connected_clients, message)
dataset_with_progress.[Link] example_usage.py find_dataset.tar
async def start_server(self):
"""Start the WebSocket server""" websocket_client.py websocket_server.py
self._server = await [Link](self._handler, [Link], [Link])
[Link](f"WebSocket server started on [Link]
output
2. Now I'll create a progress tracking class that uses the WebSocket handler:
str_replace_editor
[Link] 12/22
12/24/24, 7:13 PM Human feedback
command
create
path
/tmp/outputs/progress_tracker.py
file_text
from typing import Dict, Any, Optional, Callable
import time
import asyncio
from dataclasses import dataclass
from enum import Enum
class TaskStatus(Enum):
NOT_STARTED = "not_started"
IN_PROGRESS = "in_progress"
COMPLETED = "completed"
FAILED = "failed"
CANCELLED = "cancelled"
@dataclass
class TaskProgress:
task_id: str
status: TaskStatus
progress: float # 0-100
current_step: Optional[str]
total_items: int
processed_items: int
started_at: float
updated_at: float
completed_at: Optional[float]
error_message: Optional[str]
class ProgressTracker:
"""Tracks progress of data processing tasks and sends updates via WebSocket"""
def start_task(self, task_id: str, total_items: int, description: str = "") -> None:
"""Initialize a new task"""
now = [Link]()
progress = TaskProgress(
task_id=task_id,
status=TaskStatus.IN_PROGRESS,
progress=0.0,
current_step=description,
total_items=total_items,
processed_items=0,
started_at=now,
updated_at=now,
completed_at=None,
error_message=None
)
[Link][task_id] = progress
self._send_update(progress)
def update_task(
self,
task_id: str,
processed_items: Optional[int] = None,
current_step: Optional[str] = None,
error_message: Optional[str] = None
) -> None:
"""Update task progress"""
if task_id not in [Link]:
return
task = [Link][task_id]
now = [Link]()
[Link] 13/22
12/24/24, 7:13 PM Human feedback
task.processed_items = processed_items
[Link] = (processed_items / task.total_items) * 100
task.updated_at = now
self._send_update(task)
task = [Link][task_id]
now = [Link]()
[Link] = [Link]
[Link] = 100.0
task.processed_items = task.total_items
task.completed_at = now
task.updated_at = now
self._send_update(task)
task = [Link][task_id]
now = [Link]()
[Link] = [Link]
task.completed_at = now
task.updated_at = now
self._send_update(task)
progress_data = {
"task_id": task.task_id,
"status": [Link],
"progress": [Link],
"current_step": task.current_step,
"total_items": task.total_items,
"processed_items": task.processed_items,
"started_at": task.started_at,
"updated_at": task.updated_at,
"completed_at": task.completed_at,
"error_message": task.error_message
}
asyncio.create_task(
self.websocket_handler.broadcast_progress(progress_data)
)
[Link] 14/22
12/24/24, 7:13 PM Human feedback
if task.completed_at and (now - task.completed_at) > age_seconds:
to_remove.append(task_id)
output
3. Next, I'll create a dataset processor class that integrates the progress tracking:
str_replace_editor
command
create
path
/tmp/outputs/dataset_processor.py
file_text
import os
import json
from typing import Dict, Any, List, Optional, Callable
import uuid
import logging
from dataclasses import dataclass
from pathlib import Path
from [Link] import ThreadPoolExecutor
logger = [Link](__name__)
@dataclass
class ProcessingConfig:
dataset_path: str
output_path: str
batch_size: int = 32
max_workers: int = 4
preprocess_fn: Optional[Callable] = None
postprocess_fn: Optional[Callable] = None
class DatasetProcessor:
"""Handles dataset processing with progress tracking"""
if not files:
raise ValueError(f"No files found in dataset path:
{config.dataset_path}")
[Link] 15/22
12/24/24, 7:13 PM Human feedback
self.progress_tracker.start_task(
self.current_task_id,
total_files,
"Starting dataset processing"
)
# Pre-process if needed
if config.preprocess_fn:
batch_files = [config.preprocess_fn(f) for f in batch_files]
# Process batch
futures = []
for file_path in batch_files:
future = [Link](self._process_file, file_path)
[Link](future)
# Collect results
for future in futures:
try:
result = [Link]()
[Link](result)
processed_files += 1
# Update progress
if self.progress_tracker:
self.progress_tracker.update_task(
self.current_task_id,
processed_files
)
except Exception as e:
[Link](f"Error processing file: {str(e)}")
if self.progress_tracker:
self.progress_tracker.update_task(
self.current_task_id,
error_message=str(e)
)
raise
# Save results
output_path = Path(config.output_path)
output_path.[Link](parents=True, exist_ok=True)
return {
"task_id": self.current_task_id,
"total_files": total_files,
"processed_files": processed_files,
[Link] 16/22
12/24/24, 7:13 PM Human feedback
"output_path": str(output_path)
}
except Exception as e:
[Link](f"Dataset processing failed: {str(e)}")
if self.progress_tracker:
self.progress_tracker.update_task(
self.current_task_id,
error_message=str(e)
)
raise
if path.is_file():
return [str(path)]
files = []
for ext in ['.jpg', '.jpeg', '.png', '.txt', '.json', '.xml']:
[Link](str(p) for p in [Link](f"*{ext}"))
return sorted(files)
output
4. Finally, I'll create a simple HTML/JavaScript frontend for visualizing the progress:
str_replace_editor
command
create
path
/tmp/outputs/[Link]
file_text
<!DOCTYPE html>
<html>
<head>
<title>Dataset Processing Progress</title>
<style>
body {
font-family: Arial, sans-serif;
max-width: 800px;
margin: 0 auto;
[Link] 17/22
12/24/24, 7:13 PM Human feedback
padding: 20px;
}
.progress-container {
margin: 20px 0;
border: 1px solid #ccc;
padding: 15px;
border-radius: 5px;
}
.progress-bar {
width: 100%;
height: 20px;
background-color: #f0f0f0;
border-radius: 10px;
overflow: hidden;
}
.progress-bar-fill {
height: 100%;
background-color: #4CAF50;
transition: width 0.3s ease;
}
.task-details {
margin-top: 10px;
font-size: 14px;
}
.error-message {
color: #f44336;
margin-top: 5px;
}
.completed {
background-color: #4CAF50;
}
.failed {
background-color: #f44336;
}
.cancelled {
background-color: #9e9e9e;
}
</style>
</head>
<body>
<h1>Dataset Processing Progress</h1>
<div id="tasks-container"></div>
<script>
const ws = new WebSocket('[Link]
const tasks = new Map();
[Link] = function(event) {
const data = [Link]([Link]);
if ([Link] === 'progress_update') {
updateTaskProgress([Link]);
}
};
function updateTaskProgress(task) {
let taskElement = [Link](`task-${task.task_id}`);
if (!taskElement) {
// Create new task container
taskElement = [Link]('div');
[Link] = `task-${task.task_id}`;
[Link] = 'progress-container';
[Link] = `
<h3>Task: ${task.task_id}</h3>
<div class="progress-bar">
<div class="progress-bar-fill" id="progress-
[Link] 18/22
12/24/24, 7:13 PM Human feedback
fill-${task.task_id}"></div>
</div>
<div class="task-details" id="task-details-${task.task_id}"></div>
`;
[Link]('tasks-container').appendChild(taskElement);
}
if (statusClass) {
[Link] = `progress-bar-fill ${statusClass}`;
}
let details = `
<p>Status: ${[Link]}</p>
<p>Progress: ${[Link](1)}%</p>
<p>Items: ${task.processed_items} / ${task.total_items}</p>
`;
if (task.current_step) {
details += `<p>Current Step: ${task.current_step}</p>`;
}
if (task.error_message) {
details += `<p class="error-message">Error: ${task.error_message}</p>`;
}
[Link] = details;
output
[Link] 19/22
12/24/24, 7:13 PM Human feedback
terminal
session_name
create_tarball
text
cd /tmp/outputs && tar -czf dataset_progress_update.[Link] *.py *.html
wait_for_idle
0.5
timeout
10
create_tarball was not found in the current agent state. Created a new terminal session.
I've created a comprehensive solution for adding detailed progress indicators for data processing
tasks. The solution consists of several components:
2. `progress_tracker.py`: A progress tracking system that manages tasks and their progress states.
3. `dataset_processor.py`: A dataset processing class that integrates with the progress tracker.
Key features:
[Link] 20/22
12/24/24, 7:13 PM Human feedback
```python
websocket_handler = ProgressWebSocketHandler()
progress_tracker = ProgressTracker(websocket_handler)
```
results = processor.process_dataset(config)
```
The provided tarball contains all necessary files. To use, extract it and integrate the components
into your project's codebase.
Would you like me to explain any specific part in more detail or make any adjustments to the
implementation?
websocket_handler.py
[Link] 21/22
12/24/24, 7:13 PM Human feedback
A A A A B B B B
A is better B is better
Comments
[Link] 22/22