JavaScript in Plain English

New JavaScript and Web Development content every day. Follow to join our 3.5M+ monthly readers.


International Text Segmentation with Intl.Segmenter in JavaScript

4 min read · May 30, 2023


Photo by Stefan Cosma on Unsplash

The Need for Intl.Segmenter

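// Naive sentence splitting with a regular expression:
'Hello there! How are you doing?'.split(/[.!?]/);
// → ['Hello there', ' How are you doing', '']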
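For comparison, here is a minimal sketch that runs the same sentence through a sentence-granularity segmenter, assuming a runtime that ships Intl.Segmenter (current browsers, or Node.js 16 and later). The regex drops the punctuation, keeps a leading space, and leaves an empty trailing string, while the segmenter returns whole sentences that join back into the original text:

```javascript
const text = 'Hello there! How are you doing?';

// Naive approach: punctuation is lost, a leading space survives,
// and an empty string trails the last terminator.
const naive = text.split(/[.!?]/);

// Locale-aware approach: each segment is a complete sentence.
// Trailing whitespace stays attached to the sentence it follows.
const sentenceSegmenter = new Intl.Segmenter('en', { granularity: 'sentence' });
const sentences = Array.from(sentenceSegmenter.segment(text), (s) => s.segment);

console.log(naive);     // [ 'Hello there', ' How are you doing', '' ]
console.log(sentences); // [ 'Hello there! ', 'How are you doing?' ]
```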

The Mechanics of Intl.Segmenter

// Word-level segmenter for German ("Was geht ab, Freunde?" ≈ "What's up, friends?")
const germanSegmenter = new Intl.Segmenter('de', {
  granularity: 'word'
});
const germanSegments = germanSegmenter.segment('Was geht ab, Freunde?');
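The constructor takes a locale (or an array of locales) and an options object whose granularity is 'grapheme' (the default), 'word', or 'sentence'. A small sketch for inspecting what the runtime actually resolved; the exact locale string and the list of supported locales depend on the engine's ICU data, so treat those printed values as illustrative:

```javascript
// With no options, granularity defaults to 'grapheme'.
const defaultSegmenter = new Intl.Segmenter('de');
console.log(defaultSegmenter.resolvedOptions().granularity); // 'grapheme'

// resolvedOptions() reports the locale and granularity actually in use.
const wordSegmenter = new Intl.Segmenter('de', { granularity: 'word' });
console.log(wordSegmenter.resolvedOptions()); // e.g. { locale: 'de', granularity: 'word' }

// Lists which of the requested locales the runtime supports without falling back.
console.log(Intl.Segmenter.supportedLocalesOf(['de', 'ja', 'th']));

// Any granularity other than 'grapheme', 'word' or 'sentence' throws a RangeError.
let rejected = false;
try {
  new Intl.Segmenter('de', { granularity: 'paragraph' });
} catch (err) {
  rejected = err instanceof RangeError;
}
console.log(rejected); // true
```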

The Return Value of Segmenter.segment()

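const germanSegmenter = new Intl.Segmenter('de', {
  granularity: 'sentence'
});
const germanSegments = germanSegmenter.segment('Was geht ab, Freunde?');

// segment() returns an iterable Segments object, not an array.
// Each approach below yields one segment data object:
// { segment: 'Was geht ab, Freunde?', index: 0, input: 'Was geht ab, Freunde?' }
console.log([...germanSegments]);
console.log(Array.from(germanSegments));
for (const segment of germanSegments) {
  console.log(segment);
}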
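Besides being iterable (and re-iterable, which the three reads above rely on), the Segments object has a containing() method that looks up the segment covering a given UTF-16 code unit index. A short sketch with a word-granularity segmenter on the same German phrase:

```javascript
const wordSegmenter = new Intl.Segmenter('de', { granularity: 'word' });
const input = 'Was geht ab, Freunde?';
const segments = wordSegmenter.segment(input);

// Each entry is a plain object with segment, index and input;
// isWordLike is only added for 'word' granularity.
const first = [...segments][0];
console.log(first); // { segment: 'Was', index: 0, input: 'Was geht ab, Freunde?', isWordLike: true }

// containing() returns the segment that covers a code unit index
// ('Freunde' occupies indices 13 to 19)...
console.log(segments.containing(15).segment); // 'Freunde'

// ...or undefined when the index is outside the string.
console.log(segments.containing(100)); // undefined
```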

Mapping Segments to Their String Values

const germanSegmenter = new Intl.Segmenter('de', {
  granularity: 'sentence'
});
const germanSegments = germanSegmenter.segment('Was geht ab?');

// Array.from's second argument maps each segment data object to its string.
console.log(Array.from(germanSegments, s => s.segment)); // ['Was geht ab?']
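Because Array.from() accepts a mapping function, this pattern wraps neatly into a reusable helper. A minimal sketch; splitSentences is a hypothetical name chosen for illustration, and trimming is a design choice to drop the trailing whitespace that sentence segments keep:

```javascript
// Hypothetical helper: split text into sentences for a given locale.
function splitSentences(text, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'sentence' });
  // Trim each sentence so the trailing space a segment keeps is dropped.
  return Array.from(segmenter.segment(text), (s) => s.segment.trim());
}

console.log(splitSentences('Was geht ab? Alles gut.', 'de')); // [ 'Was geht ab?', 'Alles gut.' ]
```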

Using the isWordLike Property

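const germanSegmenter = new Intl.Segmenter('de', {
  granularity: 'word'
});
const germanSegments = germanSegmenter.segment('Was geht ab?');

// isWordLike is true for words and false for spaces and punctuation,
// so only the segments for "Was", "geht" and "ab" remain.
console.log([...germanSegments].filter(s => s.isWordLike));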
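Filtering on isWordLike also gives a word counter that ignores spaces and punctuation, and, unlike splitting on whitespace, it can handle scripts such as Japanese that do not put spaces between words. A minimal sketch; countWords is a hypothetical helper name, and the count for space-less scripts depends on the engine's dictionary data, so no expected value is shown for that call:

```javascript
// Hypothetical helper: count word-like segments, skipping spaces and punctuation.
function countWords(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  let count = 0;
  for (const { isWordLike } of segmenter.segment(text)) {
    if (isWordLike) count++;
  }
  return count;
}

console.log(countWords('Was geht ab, Freunde?', 'de')); // 4
console.log(countWords('今日は良い天気です', 'ja')); // depends on the runtime's ICU data
```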

Handling Emojis with Intl.Segmenter

const emojis = '🫣🫵👨‍👨‍👦‍👦';

// Split by UTF-16 code units: each emoji breaks into lone surrogate halves
console.log(emojis.split(''));
// Split by code points: the family emoji still falls apart into 👨, ZWJ, 👨, ...
console.log([...emojis]);

const segmenter = new Intl.Segmenter('en', {
  granularity: 'grapheme'
});
// Split by grapheme clusters: one entry per user-perceived character
console.log(Array.from(segmenter.segment(emojis), s => s.segment));
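Grapheme segmentation is what you want when counting or truncating user-visible characters. A minimal sketch; graphemeLength and truncateGraphemes are hypothetical helper names, and the emoji string is spelled with escapes so the zero-width joiners (U+200D) inside the family emoji are visible:

```javascript
// Hypothetical helpers built on grapheme segmentation.
const graphemeSegmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });

// Count user-perceived characters rather than code units or code points.
function graphemeLength(str) {
  return [...graphemeSegmenter.segment(str)].length;
}

// Keep at most `max` graphemes without cutting a cluster in half.
function truncateGraphemes(str, max) {
  let result = '';
  let taken = 0;
  for (const { segment } of graphemeSegmenter.segment(str)) {
    if (taken === max) break;
    result += segment;
    taken++;
  }
  return result;
}

// 🫣, 🫵, then 👨 ZWJ 👨 ZWJ 👦 ZWJ 👦 (one family emoji)
const emojis = '\u{1FAE3}\u{1FAF5}\u{1F468}\u200D\u{1F468}\u200D\u{1F466}\u200D\u{1F466}';

console.log(emojis.length);                // 15 (UTF-16 code units)
console.log([...emojis].length);           // 9 (code points)
console.log(graphemeLength(emojis));       // 3
console.log(emojis.slice(0, 5));           // cuts the family emoji, leaving a lone surrogate
console.log(truncateGraphemes(emojis, 2)); // '🫣🫵'
```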

Wrapping Up


Published in JavaScript in Plain English



Written by Zachary

Programmer #JavaScript #Rust.
