@cmungall
Member

Add a new schema module defining BGC categories for natural product biosynthesis:

  • POLYKETIDE (CHEBI:26188)
  • NRP (Non-ribosomal peptide)
  • RIPP (Ribosomally synthesized and post-translationally modified peptide)
  • TERPENE (CHEBI:35186)
  • ALKALOID (CHEBI:22315)
  • SACCHARIDE (CHEBI:18154)
  • OTHER

Categories based on MIBiG standard for biosynthetic gene cluster classification.

Copilot AI review requested due to automatic review settings December 13, 2025 00:53
@claude
Contributor

claude bot commented Dec 13, 2025

Claude finished @cmungall's task


PR Review: Add BGC Category

Comprehensive review of the new BGC categories schema module.

Review Checklist:

  • Read and analyze the new BGC categories file
  • Check adherence to project conventions and LinkML standards
  • Validate ontology mappings and CHEBI IDs
  • Assess schema structure and completeness
  • Check for potential bugs or issues
  • Evaluate test coverage considerations
  • Provide comprehensive feedback

STRENGTHS

Schema Structure & Standards:

  • Excellent LinkML schema format with proper imports and prefixes
  • Well-structured enum with appropriate metadata fields
  • Good use of annotations for biological context (enzymes, examples)
  • Follows project naming conventions (CamelCase enum, UPPER_CASE values)

Content Quality:

  • Scientifically accurate categories based on MIBiG standard
  • Good descriptions and biological context
  • Relevant examples provided for each category
  • Appropriate contributor attribution

Ontology Integration:

  • Most categories have proper CHEBI mappings where appropriate
  • CHEBI IDs appear correctly formatted (verified structure)
  • OTHER category appropriately lacks ontology mapping

⚠️ ISSUES TO ADDRESS

Critical - Missing Schema Import:

  • bio/bgc_categories is not imported in the main valuesets.yaml schema
  • Location: src/valuesets/schema/valuesets.yaml (bio/bgc_categories needs to be added to the imports list)
  • This prevents the schema from being included in the generated artifacts

Minor - Incomplete Ontology Mappings:

  • NRP and RIPP lack meaning: mappings to ontology terms
  • Consider mapping to appropriate CHEBI terms if they exist
  • Recommendation: Research CHEBI for "non-ribosomal peptide" and RiPP-related terms

🔍 RECOMMENDATIONS

Integration:

  1. Add - bio/bgc_categories to the imports list in src/valuesets/schema/valuesets.yaml
  2. Run just validate and just site after integration to ensure no conflicts
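
As a rough sketch of the integration step, the imports list might look like the fragment below after the change. Only the `bio/bgc_categories` entry comes from this review; the `linkml:types` entry and the elided module imports are assumptions about the existing file, not copied from the repository.

```yaml
# src/valuesets/schema/valuesets.yaml (excerpt; surrounding entries are assumed)
imports:
  - linkml:types        # assumed existing entry
  # ... existing module imports stay unchanged ...
  - bio/bgc_categories  # new entry so the BGC module reaches the generated artifacts
```

Running `just validate` afterwards should show whether the new import resolves cleanly.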

Ontology Completeness:

  1. Research and add CHEBI mappings for NRP and RIPP if available
  2. Consider using more specific CHEBI terms if they exist
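
For illustration, the mapped categories attach their ChEBI term through the `meaning` slot, and NRP and RIPP would follow the same pattern once a suitable term is confirmed. This is a minimal sketch assuming standard LinkML permissible-value syntax: the enum name `BgcCategoryEnum` is hypothetical, and the NRP `meaning` is left as a commented placeholder because no ChEBI ID for it has been verified in this thread.

```yaml
enums:
  BgcCategoryEnum:             # hypothetical name; the actual enum name is not shown here
    permissible_values:
      POLYKETIDE:
        meaning: CHEBI:26188   # mapping listed in the PR description
      NRP:
        title: Non-ribosomal peptide
        # meaning: CHEBI:<id>  # placeholder; fill in only after checking the term in ChEBI
```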

Testing:

  • The schema follows established patterns, so existing validation should cover it
  • Consider adding BGC-specific examples to any integration tests

🛡️ Security & Performance

  • Security: ✅ No security concerns - static value set definitions
  • Performance: ✅ Lightweight enum definitions with no performance impact
  • Dependencies: ✅ Only uses standard LinkML types and CHEBI ontology

📋 SUMMARY

This is a high-quality addition that follows project conventions well. The main blocker is the missing import in the main schema. The scientific content is accurate and well-documented. Once integrated, this will be a valuable addition to the valuesets collection.

Action Required: Add schema import to make it functional in the project.

@dragon-ai-agent dragon-ai-agent merged commit 6c64121 into main Dec 13, 2025
11 checks passed
Contributor

Copilot AI left a comment

Pull request overview

This PR adds a new schema module (bgc_categories.yaml) that defines biosynthetic gene cluster (BGC) categories for natural product biosynthesis, aligned with the MIBiG (Minimum Information about a Biosynthetic Gene cluster) standard. The schema provides a controlled vocabulary for classifying BGCs based on the type of secondary metabolite they produce, which is essential for genome mining and natural product discovery workflows.

Key changes:

  • Introduces 7 BGC category enumerations (POLYKETIDE, NRP, RIPP, TERPENE, ALKALOID, SACCHARIDE, OTHER)
  • Maps categories to ChEBI ontology terms where applicable to ensure semantic interoperability
  • Includes detailed annotations with biosynthetic enzyme information and examples for each category

annotations:
  definition: BGCs that do not fit into standard categories or are of unknown
    type
  examples: Novel or hybrid BGCs

Copilot AI Dec 13, 2025

Consider adding license and see_also fields at the end of the schema file for consistency with other schema files like demographics.yaml and gb_edoh.yaml. These fields typically include:

  • license: MIT
  • see_also: (with relevant MIBiG standard documentation links)

This would help document the source of the standard and licensing terms, especially since the PR description mentions this is based on the MIBiG standard.

Suggested change
examples: Novel or hybrid BGCs
examples: Novel or hybrid BGCs
license: MIT
see_also:
  - https://mibig.secondarymetabolites.org/
  - https://mibig.secondarymetabolites.org/documentation.html

Comment on lines +46 to +52
NRP:
  title: Non-ribosomal peptide
  description: Non-ribosomal peptide biosynthetic gene clusters
  annotations:
    definition: Peptides synthesized by non-ribosomal peptide synthetases
    biosynthetic_enzyme: Non-ribosomal peptide synthetase (NRPS)
    examples: Penicillin, vancomycin, cyclosporin

Copilot AI Dec 13, 2025

The NRP (Non-ribosomal peptide) category is missing a 'meaning' field with a ChEBI ontology mapping, while most other categories (POLYKETIDE, TERPENE, ALKALOID, SACCHARIDE) include this field. For consistency and to provide proper semantic grounding, consider adding a ChEBI term if one exists for non-ribosomal peptides. This would maintain uniformity across the enumeration values and improve interoperability with other ontology-based systems.

Comment on lines +53 to +61
RIPP:
  title: Ribosomally synthesized and post-translationally modified peptide
  description: RiPP biosynthetic gene clusters
  annotations:
    definition: Ribosomally synthesized peptides with extensive post-translational
      modifications
    biosynthetic_enzyme: Various modifying enzymes
    examples: Nisin, thiopeptides, lanthipeptides
    related_chebi: CHEBI:71629 (nisin)

Copilot AI Dec 13, 2025

The RIPP category is missing a 'meaning' field with a ChEBI ontology mapping, while most other categories (POLYKETIDE, TERPENE, ALKALOID, SACCHARIDE) include this field. Although a related_chebi annotation is provided for nisin (a specific example), the category itself should have a ChEBI term if one exists for the class of ribosomally synthesized and post-translationally modified peptides. This would maintain consistency with the other categories and improve semantic interoperability.
