Runups and cycles
Wildon's Weblog 2018-03-12
Let xs
be a stream of values that can be compared for equality. Let ++
denote concatenation of lists. Assuming that xs
is of the form runUp ++ cyc ++ cyc ++ ...
, what is a good way to find the decomposition (runUp, cyc)
?
The problem is more subtle than it might seem. If xs
is produced by iterating a function, so the entries are [x, f x, f f x, ...]
then there are several cycle finding algorithms that can be used. In general, an algorithmic solution is impossible, because the stream may be of the form cyc ++ rest
, where cyc
has arbitrary length and agrees with rest
for positions. In practice, we might know that any cycle has some bounded length, or be prepared to accept a small chance of misidentification.
In this case, the following Haskell code finds the first decomposition (runUp, guessedCycle)
such that the stream is runUp ++ rest
and rest
, which has length at least t
, is of the form guessedCycle ++ guessedCycle ++ rest'
.
runupAndCycle t xs = runupAndCycleAux t bs (zip [1..] (tail xs))runupAndCycleAux t xs [] = error $ show (t, xs)runupAndCycleAux t xs qxs@((q, x) : _) = case lookup (map snd $ take t qxs) (take q tailsPositions) of Nothing -> runupAndCycleAux t xs (tail qxs) Just p -> -- the t entries starting in positions q and p agree; -- test if taking guessedCycle to be all entries -- between positions q and p-1 gives a decomposition -- of the required form let xsFirst = take (q-p) $ drop p xs xsSecond = take (q-p) $ map snd qxs in if and (zipWith (==) xsFirst xsSecond) then (take p xs, xsSecond) else runupAndCycleAux t xs (tail qxs) where tailsPositions = [(take t ys, p) | (ys, p) <- zip (tails xs) [0..]]
For example
-
runupAndCycle 4 [1,2,3,1,2,3,4,5,1,2,3,4,5]
evaluates to([1,2,3],[1,2,3,4,5])
; -
runupAndCycle 4 [1,2,3,1,2,3,4,6,1,2,3,4,5]
raises an exception, because while there is agreement for four positions, this is shown by theif ... then ... else ...
test not to correspond to a cycle (and the bistream is then exhausted without finding anything better); -
runUpAndCycle 4 [1,2,3,1,2,3,4,5,1,2,3,4,6]
raises an exception for the same reason; -
runUpAndCycle 4 [1,2,3,1,2,3,4,5,1,2,3,4,5,6]
evaluates to([1,2,3],[1,2,3,4,5])
, as the first example, wrongly suggesting a 5-cycle. Replacing the first argument 4 with 6 we get an exception instead.
The algorithm may use work to exhaust bits without finding a suitable decomposition. For example with , the stream [0,0,0,1,0,0,0,2,0,0,0,3,0,0,0,4, ...]
finds possible cycle starts in every fourth position; by position , these require work to investigate.
I’d like to know if there is a faster or more elegant algorithm—ideally both of course—but some searching only found hits for the deterministic version of the problem.