

Function split_soft_hyphens 

pub(crate) fn split_soft_hyphens(text: &str) -> (String, Vec<usize>)

Strip U+00AD (soft hyphen) codepoints from text and return the stripped string plus the byte offsets in the stripped output where each SHY originally sat. The offsets mark the codepoint boundary after the preceding cluster: a break taken at offset o leaves bytes [0..o) on the previous line and [o..) on the next.
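The contract above can be illustrated with a minimal sketch. This is not the crate's actual implementation, only one way to satisfy the description; the names split_soft_hyphens_sketch and split_at_break are hypothetical. How the real function treats consecutive or leading soft hyphens is not stated here, so the sketch simply records an offset for every SHY it sees.

```rust
/// Hypothetical re-implementation of the documented contract: remove every
/// U+00AD and remember where each one sat, measured in bytes of the
/// stripped output.
fn split_soft_hyphens_sketch(text: &str) -> (String, Vec<usize>) {
    const SHY: char = '\u{00AD}';
    let mut stripped = String::with_capacity(text.len());
    let mut offsets = Vec::new();
    for ch in text.chars() {
        if ch == SHY {
            // `stripped.len()` is the byte length emitted so far, which is
            // always a char boundary of the stripped string.
            offsets.push(stripped.len());
        } else {
            stripped.push(ch);
        }
    }
    (stripped, offsets)
}

/// Hypothetical helper: a break at offset `o` leaves `[0..o)` on the
/// previous line and `[o..)` on the next.
fn split_at_break(stripped: &str, o: usize) -> (&str, &str) {
    stripped.split_at(o)
}
```

For the input "hy\u{00AD}phen\u{00AD}ation" the sketch yields the stripped string "hyphenation" with offsets [2, 6]; breaking at offset 6 leaves "hyphen" on the previous line and "ation" on the next.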

The greedy line-breaker consumes these offsets through try_shy_break when a word would overflow; the Knuth-Plass cutover treats each as a flagged Penalty(50) item with hyphen-glyph advance as its post-break width.
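As a rough illustration of the greedy path, the sketch below picks the last recorded offset whose prefix, plus the hyphen glyph, still fits in the remaining line width. The signature of try_shy_break is not shown on this page, so pick_shy_break, its parameters, and the measure callback are all assumptions made for the example rather than the crate's API.

```rust
/// Hypothetical stand-in for the greedy SHY break: `offsets` are byte
/// offsets into `word` in ascending order (as returned by
/// split_soft_hyphens), `available` is the width left on the current line,
/// and `measure` returns the advance of a string slice.
fn pick_shy_break(
    word: &str,
    offsets: &[usize],
    available: f32,
    hyphen_advance: f32,
    measure: impl Fn(&str) -> f32,
) -> Option<usize> {
    offsets
        .iter()
        .rev() // try the longest prefix first
        .copied()
        // A break at the very start or end of the word splits nothing.
        .filter(|&o| o > 0 && o < word.len())
        // The visible hyphen is drawn after the prefix, so it must fit too.
        .find(|&o| measure(&word[..o]) + hyphen_advance <= available)
}
```

With a toy measure that counts one unit per character, the word "hyphenation" with offsets [2, 6] breaks at 6 when 7 units are available, at 2 when only 5 are, and not at all when 2 are.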

text is expected to be NFC-normalized. NFC does not decompose U+00AD, so no quasi-SHY sequences need to be handled.