pub(crate) fn split_soft_hyphens(text: &str) -> (String, Vec<usize>)Expand description
Strip U+00AD (soft hyphen) codepoints from text and return the
stripped string plus the byte offsets in the stripped output
where each SHY originally sat. The offsets mark the codepoint
boundary after the preceding cluster: a break taken at offset
o leaves bytes [0..o) on the previous line and [o..) on the
next.
The greedy line-breaker consumes these offsets through
try_shy_break when a word would overflow; the Knuth-Plass
cutover treats each as a flagged Penalty(50) item with hyphen-glyph
advance as its post-break width.
text is expected to be NFC-normalized. NFC does not decompose
U+00AD, so no quasi-SHY sequences need to be handled.