Copyright | (c) 2025 Florian Ragwitz |
---|---|
License | MIT |
Safe Haskell | Safe-Inferred |
Language | Haskell2010 |
Data.FindCycle
Description
Any function f :: a -> a
where the type a
has finitely many values
eventually has to be cyclic when iterated from some initial a
.
This module provides a number of common algorithms and utilities to identify and work with such cycles.
Synopsis
- data CycleFinder a
- naiveOrd :: Ord a => CycleFinder a
- naiveHashable :: (Eq a, Hashable a) => CycleFinder a
- brent :: Eq a => CycleFinder a
- floyd :: Eq a => CycleFinder a
- nivash :: Ord a => CycleFinder a
- nivashPart :: (Ix k, Bounded k, Ord a) => (a -> k) -> CycleFinder a
- nivashPartWithinBounds :: (Ix k, Ord a) => (k, k) -> (a -> k) -> CycleFinder a
- findCycle :: CycleFinder a -> (a -> a) -> a -> (Int, Int)
- findCycleExtract :: CycleFinder a -> (a -> a) -> a -> (Int, Int, ([a], [a]))
- cycleExp :: CycleFinder a -> (a -> a) -> a -> Integer -> a
- cycleExp' :: CycleFinder a -> (a -> a) -> a -> Integer -> a
- unsafeFindCycleFromList :: CycleFinder a -> [a] -> (Int, Int, ([a], [a]))
- minimalMu :: Eq a => CycleFinder a -> CycleFinder a
Typical Usage
The value of iterating someCyclicFunc
for \(10^{100}\) times from
startingValue
, using the brent
algorithm for cycle detection:
let fastCyclicFunc = cycleExp brent someCyclicFunc startingValue fastCyclicFunc (10^100)
The length of the non-repeating prefix and the length of the cycle, as
determined using the nivash
algorithm:
let (mu, lambda) = findCycle nivash someCyclicFunc startingValue
The same two lengths, plus two lists containing the values of the prefix and
cyclic parts of the sequence using the naiveOrd
algorithm:
let (mu, lambda, (pre, cyc)) = findCycleExtract naiveOrd someCyclicFunc startingValue
When you already have a list of values created by iterating a cyclic function:
let xs = iterate someCyclicFunc startingValue let (mu, lambda, (pre, cyc)) = unsafeFindCycleFromList brent xs
CycleFinder type
data CycleFinder a Source #
An algorithm to find the cycle in a function from a
to a
(or a list of a
s).
Algorithms
Cycles are typically described with a pair \((\mu, \lambda)\), where \(\mu\) represents the length of the non-cyclic prefix of the sequence, and \(\lambda\) represents the length of the repeating cycle of the sequence.
The cycle finding algorithms provided by this module return such a pair as
a result, but some might return an upper bound \(\tilde{\mu}\) instead of
the minimal \(\mu\) in order to avoid the computational cost of finding the
minimal value. This approximation is acceptable in many practical cases,
such as when using cycleExp
, which uses the cyclic behavior of a function
to efficiently compute \(f^n(x)\) for large \(n\).
When a minimal \(\mu\) is needed, it can be computed from a CycleFinder
returning a non-minimal \(\tilde{\mu}\) using minimalMu
.
All algorithms always provide a minimal \(\lambda\) as opposed to a multiple of the true cycle length.
In practice, you'll usually want to use nivash
, brent
, or one of the
naive variants. If performance matters and you're not sure what to choose,
compare the alternatives by benchmarking for your usecase.
Naive
These algorithms use Map-like structures to store the index of the first occurrence of each value in the sequence until a duplicate is found.
They always produce the minimal \((\mu, \lambda)\).
They never iterate the sequence further than \(\mu + \lambda\) elements.
They never compute an element at a given position in the sequence more than once.
They use memory approximately proportional to \(\mu + \lambda\).
naiveHashable
tends to perform slightly better and uses slightly less
memory. Both are provided for completeness and for cases where you might
not have a Hashable
instance or don't want to write one.
naiveHashable :: (Eq a, Hashable a) => CycleFinder a Source #
Naive cycle finding algorithm using HashMap
.
Constant Memory
These algorithms use a constant amount of memory, at the cost of having to potentially evaluate values in the sequence more than once.
brent
is always better than floyd
, and the latter is only present for
completeness and as a baseline for testing. You shouldn't use floyd
.
They always compute a minimal \(\lambda\), but only an upper bound
\(\tilde{\mu}\) for the cycle length. Combine with minimalMu
if the
minimal \(\mu\) is needed.
brent :: Eq a => CycleFinder a Source #
Brent's cycle finding algorithm.
Evaluates at most \(2(\mu + \lambda)\) elements of the sequence.
Always better than floyd.
floyd :: Eq a => CycleFinder a Source #
Floyd's / Tortoise and Hare cycle finding algorithm.
Always worse than brent
. Don't use this.
Memory/Time Compromise
nivash :: Ord a => CycleFinder a Source #
Nivash's cycle finding algorithm.
Never computes an element at a given position in the sequence more than once.
Might use memory proportional to \(\mu + \lambda\) in the worst case of an ascending sequence, but commonly uses much less for reasonably "random" sequences.
Can often be faster than brent
while not using nearly as much memory as
naiveOrd
or naiveHashable
.
nivashPart :: (Ix k, Bounded k, Ord a) => (a -> k) -> CycleFinder a Source #
Like nivash
, but uses multiple independent stacks as determined by a partitioning
function.
This trades off some additional memory usage for the ability to detect cycles earlier in the sequence.
Using \(k\) stacks, the cycle will be identified after, on average, a fraction of
\(\dfrac{1}{k+1}\) of the cycle length. E.g 50% above the absolute minimum achievable
for \(k=1\) (nivash
without partitioning), or 1% above that minimum for \(k=99\).
The entire range of k
will be used for partitioning, with each value from minBound
to maxBound
having its own partition. Use a type appropriate for the number of
partitions you'd like to use. Use nivashPartWithinBounds
if you'd like to explicitly
use a continuous subrange of k
instead.
>>>
import Data.Word (Word8)
>>>
let alg256 = nivashPart (fromIntegral :: Integer -> Word8)
>>>
import Data.Finite (modulo) -- >= 0.2 for the Ix instance
>>>
let alg100 = nivashPart (modulo @100)
nivashPartWithinBounds Source #
Arguments
:: (Ix k, Ord a) | |
=> (k, k) | the lower and upper bound (inclusive), respectively, of the partition keys returned by the partition function |
-> (a -> k) | the partition function. Returning values outside of the specified bounds will cause run-time errors. |
-> CycleFinder a |
Like nivashPart
, but allows for a specific continuous subrange of k
to be used for
partitioning rather than creating a partition for each possible value of k
.
>>>
let alg100 = nivashPartWithinBounds (0, 99) (`mod` 100)
Running algorithms
findCycle :: CycleFinder a -> (a -> a) -> a -> (Int, Int) Source #
Runs a CycleFinder
algorithm for the given function and starting value,
returning a pair \((\mu, \lambda)\) representing the length of the
non-cyclic prefix and the length of the cycle of the sequence.
findCycleExtract :: CycleFinder a -> (a -> a) -> a -> (Int, Int, ([a], [a])) Source #
Like findCycle
, but also returns a third value (pre, cyc)
such that
pre ++ cycle cyc == iterate f x
In addition to extracting the prefix and cyclic part of the list, this can
also be used to cache some function calls to f
which the specified
CycleFinder
might make, as the results of all calls to f
in the sequence
are memorised in a lazy list which is later used to extract pre
and cyc
.
If you're only interested in caching calls to f
but don't need the two
parts of the list, just don't evaluate the last part of the return value to
not pay the cost of those parts being computed.
cycleExp :: CycleFinder a -> (a -> a) -> a -> Integer -> a Source #
Constructs an efficient evaluator for a cyclic function by "exponentiating" it.
Given a CycleFinder
for a function f
and an initial value x
, it returns
a function of type Integer -> a
which computes the nth iterate (i.e. the
value of \(f^n(x)\)).
Using the pair \((\mu, \lambda)\) obtained by the CycleFinder
, this
function computes
[ f^n(x) = begin{cases} f^n(x) & text{if } n < mu, f^{mu + ((n - mu) bmod lambda)}(x) & text{if } n ge mu.
end{cases} ]
which allows \(f^n(x)\) to be computed for very large \(n\) without requiring \(n\) function applications.
Note that this function will use a lazy list generated by iterate f x
. This
list will only be evaluated up to \(\mu + \lambda\) elements and is shared
between the cycle finding phase and the computation of the value after n
iterations, but might still require a significant amount of memory. Use
cycleExp'
if you'd rather re-evaluate f
many more times but use less
memory at the expense of more time.
The lazy list might also be evaluated further than \(\mu + \lambda\)
depending on the cycle finding algorithm chosen (brent
, floyd
).
>>>
-- cycle μ=1, λ=83333 when starting from 23
>>>
let f :: Integer -> Integer; f x = x^(42 :: Int) `mod` 1000003
>>>
>>>
g = cycleExp nivash f 23
>>>
g 0 -- after 0 iterations
23>>>
-- after a googol iterations, but finishes in less than the current
>>>
-- age of the universe
>>>
g (10^(100 :: Int))
671872
cycleExp' :: CycleFinder a -> (a -> a) -> a -> Integer -> a Source #
Like cycleExp
, but doesn't cache. Probably not very useful in practice.
unsafeFindCycleFromList :: CycleFinder a -> [a] -> (Int, Int, ([a], [a])) Source #
Runs the CycleFinder
for a given input list.
This function is provided as a convenience for when you already have a list of values you'd like to find a cycle in. It's referred to as "unsafe", because it might lead to surprising results when the input doesn't satisfy the invariants that different algorithms assume.
All algorithms assume that the sequence they're searching can be constructed
by repeated function application from a starting value. Many sequences can't
be, such as [1,2,1,3,1,4,1,5,...]
(because there can only be one unique
successor of 1
).
Algorithms also assume the input sequence to be infinite, and they will commonly consume more than \(\mu + \lambda\) (or \(\tilde{\mu} + \lambda\)) elements from it. If you provide a finite input list, cycles might not be identified correctly if the chosen algorithm runs into the end of it, even though the input does technically contain an identifiable cycle.
If an assumption is violated, algorithms might wrongly identify cycles or
never terminate. Try to stick to findCycle
, findCycleExtract
, cycleExp
,
or cycleExp'
if possible, or only pass infinite lists generated via
iterate f x
(or equivalent) to unsafeFindCycleFromList
.
Similar to findCycleExtract
, just don't evaluate the last part of the
return value if you don't need it and want to avoid the cost of computing it.
Utilities
minimalMu :: Eq a => CycleFinder a -> CycleFinder a Source #
Compute a minimal result \((\mu, \lambda)\) from a partial result \((\tilde{\mu}, \lambda)\).
This involves re-traversing the sequence from the start and from \(\lambda\)
which might be expensive for large \(\mu\). This should largely be negligible
if you're running the CycleFinder
using any of the functions which cache
the sequence of values (any but findCycle
and cycleExp'
).