Runups and cycles

Wildon's Weblog 2018-03-12

Let xs be a stream of values that can be compared for equality. Let ++ denote concatenation of lists. Assuming that xs is of the form runUp ++ cyc ++ cyc ++ ..., what is a good way to find the decomposition (runUp, cyc)?

The problem is more subtle than it might seem. If xs is produced by iterating a function, so the entries are [x, f x, f f x, ...] then there are several cycle finding algorithms that can be used. In general, an algorithmic solution is impossible, because the stream may be of the form cyc ++ rest, where cyc has arbitrary length $n+1$ and agrees with rest for $n$ positions. In practice, we might know that any cycle has some bounded length, or be prepared to accept a small chance of misidentification.

In this case, the following Haskell code finds the first decomposition (runUp, guessedCycle) such that the stream is runUp ++ rest and rest, which has length at least t, is of the form guessedCycle ++ guessedCycle ++ rest'.

runupAndCycle t xs =          runupAndCycleAux t bs (zip [1..] (tail xs))runupAndCycleAux t xs [] = error $ show (t, xs)runupAndCycleAux t xs qxs@((q, x) : _)    = case lookup (map snd $ take t qxs) (take q tailsPositions) of        Nothing -> runupAndCycleAux t xs (tail qxs)        Just p  -> -- the t entries starting in positions q and p agree;                   -- test if taking guessedCycle to be all entries                   -- between positions q and p-1 gives a decomposition                    -- of the required form                   let xsFirst  = take (q-p) $ drop p xs                       xsSecond = take (q-p) $ map snd qxs                   in  if and (zipWith (==) xsFirst xsSecond)                           then (take p xs, xsSecond)                           else runupAndCycleAux t xs (tail qxs)    where tailsPositions               = [(take t ys, p) | (ys, p) <- zip (tails xs) [0..]]

For example

runupAndCycle 4 [1,2,3,1,2,3,4,5,1,2,3,4,5] evaluates to ([1,2,3],[1,2,3,4,5]);
runupAndCycle 4 [1,2,3,1,2,3,4,6,1,2,3,4,5] raises an exception, because while there is agreement for four positions, this is shown by the if ... then ... else ... test not to correspond to a cycle (and the bistream is then exhausted without finding anything better);
runUpAndCycle 4 [1,2,3,1,2,3,4,5,1,2,3,4,6] raises an exception for the same reason;
runUpAndCycle 4 [1,2,3,1,2,3,4,5,1,2,3,4,5,6] evaluates to ([1,2,3],[1,2,3,4,5]), as the first example, wrongly suggesting a 5-cycle. Replacing the first argument 4 with 6 we get an exception instead.

The algorithm may use $O(tn^2)$ work to exhaust $n$ bits without finding a suitable decomposition. For example with $t = 3$ , the stream [0,0,0,1,0,0,0,2,0,0,0,3,0,0,0,4, ...] finds possible cycle starts in every fourth position; by position $4m$ , these require work $4m$ to investigate.

I’d like to know if there is a faster or more elegant algorithm—ideally both of course—but some searching only found hits for the deterministic version of the problem.