Skip to content

Conversation

@baxen
Copy link
Collaborator

@baxen baxen commented Dec 25, 2025

Summary

Refactors the docx_tool function in crates/goose-mcp/src/computercontroller/docx_tool.rs to address the clippy too_many_lines warning. The original function was 440 lines; it is now 33 lines.

Changes

Split the monolithic docx_tool function into focused helper functions:

Error Helpers

  • docx_error: Creates INTERNAL_ERROR ErrorData
  • invalid_params: Creates INVALID_PARAMS ErrorData

File I/O Helpers

  • read_docx_file: Reads and parses a DOCX file
  • read_or_create_docx: Reads existing or creates new DOCX
  • write_docx_file: Builds and writes DOCX to disk

Content Extraction Helpers

  • extract_paragraph_text: Extracts text from a paragraph
  • extract_text_from_docx: Extracts all text from a document
  • extract_structure_from_docx: Extracts heading structure

Content Creation Helpers

  • add_styled_paragraphs: Creates styled paragraphs from content
  • parse_update_mode: Parses update mode from JSON params
  • load_image_as_png: Loads and converts images to PNG

Operation Handlers

  • do_extract_text: Handles extract_text operation
  • do_append: Handles append mode
  • do_replace: Handles replace mode
  • do_insert_structured: Handles structured insert mode
  • do_add_image: Handles add_image mode

Testing

All 9 existing tests pass:

  • test_docx_text_extraction
  • test_docx_update_append
  • test_docx_update_styled
  • test_docx_update_replace
  • test_docx_add_image
  • test_docx_invalid_path
  • test_docx_invalid_operation
  • test_docx_update_without_content
  • test_docx_update_preserve_content

Result

The main docx_tool function is now a simple 33-line dispatcher, well under the 200 line target.

Split the monolithic docx_tool function into focused helper functions:
- docx_error/invalid_params: Error construction helpers
- read_docx_file/read_or_create_docx/write_docx_file: File I/O helpers
- extract_paragraph_text: Text extraction from paragraphs
- add_styled_paragraphs: Styled paragraph creation
- parse_update_mode: Parameter parsing for update modes
- extract_text_from_docx/extract_structure_from_docx: Document content extraction
- do_extract_text: Extract text operation handler
- do_append: Append mode handler
- do_replace: Replace mode handler
- do_insert_structured: Structured insert mode handler
- load_image_as_png: Image loading and conversion
- do_add_image: Add image mode handler

The main docx_tool function is now a simple dispatcher (33 lines),
well under the 200 line target. All existing tests pass.
Copilot AI review requested due to automatic review settings December 25, 2025 03:13
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR successfully refactors the docx_tool function from 440 lines to 33 lines by extracting focused helper functions. The refactoring maintains all original functionality while significantly improving code organization and readability.

Key changes:

  • Extracted 15 helper functions covering error handling, file I/O, content extraction, content creation, and operation handlers
  • Simplified error creation with docx_error and invalid_params helper functions
  • Consolidated duplicate paragraph text extraction logic into extract_paragraph_text

@baxen baxen marked this pull request as draft December 25, 2025 03:29
@block block deleted a comment from Copilot AI Dec 25, 2025
@block block deleted a comment from Copilot AI Dec 25, 2025
@baxen baxen marked this pull request as ready for review December 25, 2025 03:51
@baxen
Copy link
Collaborator Author

baxen commented Dec 25, 2025

I'd also be very willing to remove this mcp server entirely! I'm not sure what use it has. But assuming we want to keep it this should be ready for review

@baxen baxen changed the title TSK-698: Refactor docx_tool to reduce function size from 440 to 33 lines chore: refactor docx_tool to reduce function size Dec 25, 2025
@baxen baxen mentioned this pull request Dec 26, 2025
11 tasks
}
}
text
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this supposed to have a trailing \n? consider using filter_map and join instead

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i kept the original code as is other than splitting up the big function here, thinking ill keep it but we can do another pr to clean this implementatino up too

@baxen baxen merged commit 41dbdda into main Jan 6, 2026
29 of 30 checks passed
@baxen baxen deleted the baxen/TSK-698 branch January 6, 2026 04:41
zanesq added a commit that referenced this pull request Jan 6, 2026
* 'main' of github.com:block/goose:
  refactor:  when changing provider/model,load existing provider/model (#6334)
  chore: refactor configure_extensions_dialog to reduce line count (#6277)
  chore: refactor handle_configure to reduce line count (#6276)
  chore: refactor interactive session to reduce line count (#6274)
  chore: refactor docx_tool to reduce function size (#6273)
  chore: refactor cli() function to reduce line count (#6272)
  make sure the models are using streaming properly (#6331)
  feat: add a max tokens env var (#6264)
  docs: slash commands topic (#6333)
  fix(ci): prevent gh-pages branch bloat (#6340)
  chore(deps): bump qs and body-parser in /documentation (#6338)
  Skip the smoke tests for dependabot PRs (#6337)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants