Copyright | (c) Edward Kmett 2013-2025 |
---|---|
License | BSD3 |
Maintainer | Edward Kmett <[email protected]> |
Stability | experimental |
Portability | non-portable |
Safe Haskell | Safe-Inferred |
Language | Haskell2010 |
Data.HyperLogLog.Type
Contents
Description
This package provides an approximate streaming (constant space) unique object counter.
See the original paper for details: http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf
Synopsis
- data DefaultSipKey
- type DefaultHyperLogLog = HyperLogLog DefaultSipKey
- data SipKey
- reifySipKey :: Word64 -> Word64 -> (forall (s :: Type). Reifies s SipKey => Proxy s -> r) -> r
- newtype HyperLogLog s p = HyperLogLog {
- runHyperLogLog :: Vector Rank
- generateHyperLogLog :: Reifies p Integer => (forall (s :: Type). HyperLogLog s p -> IO r) -> IO r
- class HasHyperLogLog a s p | a -> s p where
- hyperLogLog :: Lens' a (HyperLogLog s p)
- size :: Reifies p Integer => HyperLogLog s p -> Approximate Int64
- insert :: forall s p a. (Reifies s SipKey, Reifies p Integer, Serial a) => a -> HyperLogLog s p -> HyperLogLog s p
- insertHash :: Reifies p Integer => Word32 -> HyperLogLog s p -> HyperLogLog s p
- intersectionSize :: Reifies p Integer => [HyperLogLog s p] -> Approximate Int64
- cast :: forall p q s. (Reifies p Integer, Reifies q Integer) => HyperLogLog s p -> Maybe (HyperLogLog s q)
- coerceConfig :: forall p q r s. (Reifies p Integer, Reifies q Integer, Reifies r SipKey, Reifies s SipKey) => Maybe (Coercion (HyperLogLog r p) (HyperLogLog s q))
HyperLogLog
data DefaultSipKey Source #
Instances
Reifies DefaultSipKey SipKey Source # | |
Defined in Data.HyperLogLog.Type Methods reflect :: proxy DefaultSipKey -> SipKey # |
SigHash Key
reifySipKey :: Word64 -> Word64 -> (forall (s :: Type). Reifies s SipKey => Proxy s -> r) -> r Source #
Promote a SipKey
to the type level for use as part of a HyperLogLog
type.
newtype HyperLogLog s p Source #
Initialize a new counter:
>>>
runHyperLogLog (mempty :: DefaultHyperLogLog 3) == V.fromList [0,0,0,0,0,0,0,0]
True
Please note how you specify a counter size with the n
invocation. Sizes of up to 16 are valid, with 7 being a
likely good minimum for decent accuracy.
Let's count a list of unique items and get the latest estimate:
>>>
size (foldr insert mempty [1..10] :: DefaultHyperLogLog 4)
Approximate {_confidence = 0.9972, _lo = 2, _estimate = 9, _hi = 17}
Note how insert
can be used to add new observations to the
approximate counter.
The s
type parameter configures the SipKey
that is passed to the hash
function when insert
ing a new value. Note that if cryptographic security is
a primary consideration, it is recommended that you create HyperLogLog
values using generateHyperLogLog
so that the SipKey
is randomly
generated using system entropy. In contrast, the HyperLogLog
data
constructor and the mempty
method allow constructing HyperLogLog
values
with fixed SipKey
s, which can result in exponentially inaccurate estimates
if exploited by an adversary. (See https://eprint.iacr.org/2021/1139.)
Constructors
HyperLogLog | Construct a Note that using this data constructor directly permits the |
Fields
|
Instances
generateHyperLogLog :: Reifies p Integer => (forall (s :: Type). HyperLogLog s p -> IO r) -> IO r Source #
Generate a fresh HyperLogLog
value using a randomly generated SipKey
:
>>>
generateHyperLogLog $ \(m :: HyperLogLog s 3) -> pure (runHyperLogLog m == V.fromList [0,0,0,0,0,0,0,0])
True
The SipKey
is generated using system entropy, so if cryptographic security
is a primary consideration, use this function to create a HyperLogLog
value instead of manually building one (e.g., by using the HyperLogLog
data constructor or by using mempty
).
class HasHyperLogLog a s p | a -> s p where Source #
Methods
hyperLogLog :: Lens' a (HyperLogLog s p) Source #
Instances
HasHyperLogLog (HyperLogLog s p) (s :: k1) (p :: k2) Source # | |
Defined in Data.HyperLogLog.Type Methods hyperLogLog :: Lens' (HyperLogLog s p) (HyperLogLog s p) Source # |
size :: Reifies p Integer => HyperLogLog s p -> Approximate Int64 Source #
Approximate size of our set
insert :: forall s p a. (Reifies s SipKey, Reifies p Integer, Serial a) => a -> HyperLogLog s p -> HyperLogLog s p Source #
insertHash :: Reifies p Integer => Word32 -> HyperLogLog s p -> HyperLogLog s p Source #
Insert a value that has already been hashed by whatever user defined hash function you want.
intersectionSize :: Reifies p Integer => [HyperLogLog s p] -> Approximate Int64 Source #
cast :: forall p q s. (Reifies p Integer, Reifies q Integer) => HyperLogLog s p -> Maybe (HyperLogLog s q) Source #
coerceConfig :: forall p q r s. (Reifies p Integer, Reifies q Integer, Reifies r SipKey, Reifies s SipKey) => Maybe (Coercion (HyperLogLog r p) (HyperLogLog s q)) Source #
If the two types p
and q
reify the same configuration, and if the two
types r
and s
reify the same SipKey
, then we can coerce between
and HyperLogLog
r p
. We do this by building a hole in
the HyperLogLog
s qnominal
role for the configuration parameter.