hyperloglog-0.5: An approximate streaming (constant space) unique object counter
Copyright(c) Edward Kmett 2013-2025
LicenseBSD3
MaintainerEdward Kmett <[email protected]>
Stabilityexperimental
Portabilitynon-portable
Safe HaskellNone
LanguageHaskell2010

Data.HyperLogLog.Type

Contents

Description

This package provides an approximate streaming (constant space) unique object counter.

See the original paper for details: http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf

Synopsis

HyperLogLog

data DefaultSipKey Source #

Instances

Instances details
Reifies DefaultSipKey SipKey Source # 
Instance details

Defined in Data.HyperLogLog.Type

Methods

reflect :: proxy DefaultSipKey -> SipKey #

data SipKey Source #

SigHash Key

Instances

Instances details
Read SipKey Source # 
Instance details

Defined in Crypto.MAC.SipHash

Show SipKey Source # 
Instance details

Defined in Crypto.MAC.SipHash

Eq SipKey Source # 
Instance details

Defined in Crypto.MAC.SipHash

Methods

(==) :: SipKey -> SipKey -> Bool #

(/=) :: SipKey -> SipKey -> Bool #

Ord SipKey Source # 
Instance details

Defined in Crypto.MAC.SipHash

Reifies DefaultSipKey SipKey Source # 
Instance details

Defined in Data.HyperLogLog.Type

Methods

reflect :: proxy DefaultSipKey -> SipKey #

reifySipKey :: Word64 -> Word64 -> (forall s. Reifies s SipKey => Proxy s -> r) -> r Source #

Promote a SipKey to the type level for use as part of a HyperLogLog type.

newtype HyperLogLog (s :: k) (p :: k1) Source #

Initialize a new counter:

>>> runHyperLogLog (mempty :: DefaultHyperLogLog 3) == V.fromList [0,0,0,0,0,0,0,0]
True

Please note how you specify a counter size with the n invocation. Sizes of up to 16 are valid, with 7 being a likely good minimum for decent accuracy.

Let's count a list of unique items and get the latest estimate:

>>> size (foldr insert mempty [1..10] :: DefaultHyperLogLog 4)
Approximate {_confidence = 0.9972, _lo = 2, _estimate = 9, _hi = 17}

Note how insert can be used to add new observations to the approximate counter.

The s type parameter configures the SipKey that is passed to the hash function when inserting a new value. Note that if cryptographic security is a primary consideration, it is recommended that you create HyperLogLog values using generateHyperLogLog so that the SipKey is randomly generated using system entropy. In contrast, the HyperLogLog data constructor and the mempty method allow constructing HyperLogLog values with fixed SipKeys, which can result in exponentially inaccurate estimates if exploited by an adversary. (See https://eprint.iacr.org/2021/1139.)

Constructors

HyperLogLog

Construct a HyperLogLog value directly from a Vector.

Note that using this data constructor directly permits the s type parameter to be a fixed SipKey, which can have cryptographic security implications. See the Haddocks for HyperLogLog for more details.

Fields

Instances

Instances details
HasHyperLogLog (HyperLogLog s p) (s :: k1) (p :: k2) Source # 
Instance details

Defined in Data.HyperLogLog.Type

Binary (HyperLogLog s p) Source # 
Instance details

Defined in Data.HyperLogLog.Type

Methods

put :: HyperLogLog s p -> Put #

get :: Get (HyperLogLog s p) #

putList :: [HyperLogLog s p] -> Put #

Serial (HyperLogLog s p) Source # 
Instance details

Defined in Data.HyperLogLog.Type

Methods

serialize :: MonadPut m => HyperLogLog s p -> m () #

deserialize :: MonadGet m => m (HyperLogLog s p) #

Serialize (HyperLogLog s p) Source # 
Instance details

Defined in Data.HyperLogLog.Type

Methods

put :: Putter (HyperLogLog s p) #

get :: Get (HyperLogLog s p) #

NFData (HyperLogLog s p) Source # 
Instance details

Defined in Data.HyperLogLog.Type

Methods

rnf :: HyperLogLog s p -> () #

Reifies p Integer => Monoid (HyperLogLog s p) Source #

The Monoid instance "should" just work. Give me two estimators and I can give you an estimator for the union set of the two.

Note that using mempty permits the s type parameter to be a fixed SipKey, which can have cryptographic security implications. See the Haddocks for HyperLogLog for more details.

Instance details

Defined in Data.HyperLogLog.Type

Methods

mempty :: HyperLogLog s p #

mappend :: HyperLogLog s p -> HyperLogLog s p -> HyperLogLog s p #

mconcat :: [HyperLogLog s p] -> HyperLogLog s p #

Semigroup (HyperLogLog s p) Source # 
Instance details

Defined in Data.HyperLogLog.Type

Methods

(<>) :: HyperLogLog s p -> HyperLogLog s p -> HyperLogLog s p #

sconcat :: NonEmpty (HyperLogLog s p) -> HyperLogLog s p #

stimes :: Integral b => b -> HyperLogLog s p -> HyperLogLog s p #

Generic (HyperLogLog s p) Source # 
Instance details

Defined in Data.HyperLogLog.Type

Associated Types

type Rep (HyperLogLog s p) 
Instance details

Defined in Data.HyperLogLog.Type

type Rep (HyperLogLog s p) = D1 ('MetaData "HyperLogLog" "Data.HyperLogLog.Type" "hyperloglog-0.5-ENpCmeTb1lCF8WcmFPJxDt" 'True) (C1 ('MetaCons "HyperLogLog" 'PrefixI 'True) (S1 ('MetaSel ('Just "runHyperLogLog") 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 (Vector Rank))))

Methods

from :: HyperLogLog s p -> Rep (HyperLogLog s p) x #

to :: Rep (HyperLogLog s p) x -> HyperLogLog s p #

Show (HyperLogLog s p) Source # 
Instance details

Defined in Data.HyperLogLog.Type

Methods

showsPrec :: Int -> HyperLogLog s p -> ShowS #

show :: HyperLogLog s p -> String #

showList :: [HyperLogLog s p] -> ShowS #

Eq (HyperLogLog s p) Source # 
Instance details

Defined in Data.HyperLogLog.Type

Methods

(==) :: HyperLogLog s p -> HyperLogLog s p -> Bool #

(/=) :: HyperLogLog s p -> HyperLogLog s p -> Bool #

type Rep (HyperLogLog s p) Source # 
Instance details

Defined in Data.HyperLogLog.Type

type Rep (HyperLogLog s p) = D1 ('MetaData "HyperLogLog" "Data.HyperLogLog.Type" "hyperloglog-0.5-ENpCmeTb1lCF8WcmFPJxDt" 'True) (C1 ('MetaCons "HyperLogLog" 'PrefixI 'True) (S1 ('MetaSel ('Just "runHyperLogLog") 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 (Vector Rank))))

generateHyperLogLog :: forall {k} (p :: k) r. Reifies p Integer => (forall s. HyperLogLog s p -> IO r) -> IO r Source #

Generate a fresh HyperLogLog value using a randomly generated SipKey:

>>> generateHyperLogLog $ \(m :: HyperLogLog s 3) -> pure (runHyperLogLog m == V.fromList [0,0,0,0,0,0,0,0])
True

The SipKey is generated using system entropy, so if cryptographic security is a primary consideration, use this function to create a HyperLogLog value instead of manually building one (e.g., by using the HyperLogLog data constructor or by using mempty).

class HasHyperLogLog a (s :: k) (p :: k1) | a -> s p where Source #

Instances

Instances details
HasHyperLogLog (HyperLogLog s p) (s :: k1) (p :: k2) Source # 
Instance details

Defined in Data.HyperLogLog.Type

size :: forall {k1} {k2} (p :: k1) (s :: k2). Reifies p Integer => HyperLogLog s p -> Approximate Int64 Source #

Approximate size of our set

insert :: forall {k1} {k2} (s :: k1) (p :: k2) a. (Reifies s SipKey, Reifies p Integer, Serial a) => a -> HyperLogLog s p -> HyperLogLog s p Source #

insertHash :: forall {k1} {k2} (p :: k1) (s :: k2). Reifies p Integer => Word32 -> HyperLogLog s p -> HyperLogLog s p Source #

Insert a value that has already been hashed by whatever user defined hash function you want.

intersectionSize :: forall {k1} {k2} (p :: k1) (s :: k2). Reifies p Integer => [HyperLogLog s p] -> Approximate Int64 Source #

cast :: forall {k1} {k2} {k3} (p :: k1) (q :: k2) (s :: k3). (Reifies p Integer, Reifies q Integer) => HyperLogLog s p -> Maybe (HyperLogLog s q) Source #

coerceConfig :: forall {k1} {k2} {k3} {k4} (p :: k1) (q :: k2) (r :: k3) (s :: k4). (Reifies p Integer, Reifies q Integer, Reifies r SipKey, Reifies s SipKey) => Maybe (Coercion (HyperLogLog r p) (HyperLogLog s q)) Source #

If the two types p and q reify the same configuration, and if the two types r and s reify the same SipKey, then we can coerce between HyperLogLog r p and HyperLogLog s q. We do this by building a hole in the nominal role for the configuration parameter.