Hyphenation with words containing capital letters

Language Log 2017-03-14

A truly startling (and surely unintended) hyphenation in the print edition of The Economist (March 11th) suggests that some updating of word-breaking algorithms is in order in the light of the fairly recent practice of inventing product and brand names that have word-internal upper-case letters. An article about juvenile delinquency, reporting that kids are less involved in crime in part because they're indoors playing video games, ends with this paragraph (I reproduce the line breaks and hyphens of the UK print edition exactly, though not the microspacing that justifies the right-hand margin; the only thing I'm interested in is the end of the penultimate line):

    The decline in crime among the young bodes well for the future. A Home Office study in 2013 found that those who com- mitted their first crime aged between ten and 17 were nearly four times more likely to become chronic offenders than those who were aged 18-24, and 11 times more likely than those who were over 25. More PlayS- tation, less police station.

Even for a word like workstation, it would be very odd to hyphenate it as works- (line break) tation, but I guess the algorithms that decide where to hyphenate in narrow-column typesetting do not contain full details of where the stem boundaries fall in all the compound words of English (treetop, daylight, workstation, lunchroom, teacup, cutthroat, typesetting, update, and hundreds of thousands of other words).

I know very little about hyphenation algorithms (comments below are open so that truly nerdy readers who know about word-processing and typesetting can enlighten me), but my guess is that breaking in a way that leaves a possible syllable each side of the break is favored over breaking either before or after a cluster of consonant letters. Thus clus-ter would be favored over both clu-ster and clust-er, which is reasonable enough. But with PlayStation that policy produces a result that looks absolutely insane.
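To make that guess concrete, here is a toy sketch in Python of a purely letter-based policy of the kind described above. It is an illustration only, not a description of what The Economist's typesetting software or any real hyphenation engine does. The function names (candidate_breaks, hyphenate) are hypothetical, and the sketch assumes that only a, e, i, o, u count as vowel letters and that exactly one consonant letter is left to begin the part after the break.

```python
VOWELS = set("aeiou")  # assumption: 'y' is treated as a consonant letter


def candidate_breaks(word):
    """Return the indices i where word[:i] + '-' + word[i:] is allowed.

    Toy policy: break just before a consonant letter that is followed by a
    vowel letter, provided a vowel letter has already been seen. Each side
    of the break then contains at least one vowel (a "possible syllable").
    Case is ignored entirely, which is exactly the weakness at issue.
    """
    lower = word.lower()
    breaks = []
    seen_vowel = False
    for i, ch in enumerate(lower):
        if ch in VOWELS:
            seen_vowel = True
            continue
        next_is_vowel = i + 1 < len(lower) and lower[i + 1] in VOWELS
        if seen_vowel and next_is_vowel:
            breaks.append(i)
    return breaks


def hyphenate(word):
    """Show every candidate break in the word, marked with a hyphen."""
    pieces = []
    start = 0
    for b in candidate_breaks(word):
        pieces.append(word[start:b])
        start = b
    pieces.append(word[start:])
    return "-".join(pieces)


print(hyphenate("cluster"))      # clus-ter
print(hyphenate("workstation"))  # works-ta-tion
print(hyphenate("PlayStation"))  # PlayS-ta-tion
```

Because the sketch looks only at vowel and consonant letters, it splits the cluster in cluster sensibly but knows nothing about the stem boundary in workstation or the capital letter in PlayStation, so it offers the same kind of break that appeared in print.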

There needs to be a rule in there that says in effect, "Avoid at all costs a hyphen immediately after a word-internal capital letter" — and perhaps the program should also favor a hyphen break before any such word-internal capital.
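A rule of that sort could be layered on top of whatever break points a hyphenator already proposes. The following Python sketch shows one way it might look. The function name apply_capital_rule is hypothetical, the candidate list passed in is a made-up example, and the sketch assumes that a "word-internal capital" means an upper-case letter immediately preceded by a lower-case letter (so that all-capital acronyms are left alone).

```python
def apply_capital_rule(word, candidates):
    """Filter and reorder candidate break indices around internal capitals.

    A break index i means the word is split as word[:i] + '-' + word[i:].
    Assumption: a word-internal capital is an upper-case letter that
    directly follows a lower-case letter, as in the S of PlayStation.
    """
    internal_caps = [
        i for i in range(1, len(word))
        if word[i].isupper() and word[i - 1].islower()
    ]
    # Rule 1: never break immediately after a word-internal capital.
    kept = [b for b in candidates if (b - 1) not in internal_caps]
    # Rule 2: favor a break just before each word-internal capital by
    # listing those positions first, ahead of the surviving candidates.
    return internal_caps + [b for b in kept if b not in internal_caps]


word = "PlayStation"
# Hypothetical candidates from a letter-based hyphenator: PlayS-tation, PlaySta-tion
breaks = apply_capital_rule(word, [5, 7])
print(breaks)                                    # [4, 7]
print(word[:breaks[0]] + "-" + word[breaks[0]:])  # Play-Station
```

Words with no internal capital pass through unchanged, so a filter like this would not disturb ordinary cases such as clus-ter.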