Cluster detection and error correction

Peter Cameron's Blog 2024-09-22

A thought, possibly worthless, but I will record it anyway.

Today I heard a talk by Brian Franczak from MacEwan University. It was on cluster detection in data, a big topic in artificial intelligence now. The new feature was that he and his colleagues were detecting clusters in data where there is missing or corrupted data. They had an algorithm, and their implementation of it compares favourably in both performance and runtime with more established algorithms.

After the talk, somebody asked a question, which I completely failed to hear. What I thought had been asked was: can this method be used for error correction, when data is transmitted through a noisy channel which can erase or alter some symbols in a transmitted word?

This was not the question actually asked, but I think that the answer is yes. As usual in error correction, we will only transmit words from a code having the property that any two of its words are at a considerable distance apart. So we proceed as follows. If not too many errors are made, then the received word will be closer to the transmitted word than to any other codeword. So the received words will form clusters centred on codewords, and identifying the cluster containing a given received word will determine the transmitted word uniquely (provided that not too many errors have actually occurred).

So the remaining question is whether their algorthm competes with any of the many known decoding algorithms. This is not a trivial question, since the general problem of decoding linear codes is known to be NP-complete, so there is room for new ideas in the field.