Yule-Simon Distribution

The Yule-Simon distribution is a discrete probability distribution that models phenomena that follow Zipf's law, where the frequency of an item is inversely proportional to its rank. It arises from models of preferential attachment where new items are more likely to connect to existing popular items. The probability mass function of the Yule-Simon distribution can be written in terms of the beta function or rising factorial. The distribution's single parameter ρ can be estimated using a fixed point algorithm.

Uploaded by

maddy555

Yule–Simon distribution

In probability and statistics, the Yule–Simon distribution is a discrete probability distribution named after Udny Yule and Herbert A. Simon. Simon originally called it the Yule distribution.[1]

[Figure: Yule–Simon PMF on a log-log scale. The function is defined only at integer values of k; the connecting lines do not indicate continuity.]

The probability mass function (pmf) of the Yule–Simon (ρ) distribution is

$$f(k;\rho) = \rho \, \mathrm{B}(k, \rho + 1),$$

for integer $k \geq 1$ and real $\rho > 0$, where $\mathrm{B}$ is the beta function. Equivalently, the pmf can be written in terms of the rising factorial as

$$f(k;\rho) = \frac{\rho \, \Gamma(\rho + 1)}{(k)_{\rho + 1}}, \qquad (k)_{\rho + 1} = \frac{\Gamma(k + \rho + 1)}{\Gamma(k)},$$

where $\Gamma$ is the gamma function. Thus, if $\rho$ is an integer,

$$f(k;\rho) = \frac{\rho \, \rho! \, (k - 1)!}{(k + \rho)!}.$$
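The pmf can be evaluated numerically with a few lines of Python. This is a minimal sketch, assuming the form $f(k;\rho) = \rho \, \mathrm{B}(k, \rho + 1)$ for integer $k \geq 1$ and real $\rho > 0$; the function names are illustrative and do not come from any library. The beta function is computed through `math.lgamma` so that large $k$ does not overflow.

```python
import math


def yule_simon_pmf(k: int, rho: float) -> float:
    """Return rho * B(k, rho + 1) for integer k >= 1 and real rho > 0."""
    if k < 1 or rho <= 0:
        raise ValueError("need integer k >= 1 and rho > 0")
    # log B(k, rho + 1) = lgamma(k) + lgamma(rho + 1) - lgamma(k + rho + 1)
    log_beta = math.lgamma(k) + math.lgamma(rho + 1.0) - math.lgamma(k + rho + 1.0)
    return rho * math.exp(log_beta)


def yule_simon_pmf_integer_rho(k: int, rho: int) -> float:
    """Closed form rho * rho! * (k - 1)! / (k + rho)!, valid for integer rho."""
    return rho * math.factorial(rho) * math.factorial(k - 1) / math.factorial(k + rho)


if __name__ == "__main__":
    # Print both forms side by side for a few values of k with rho = 2.
    for k in (1, 2, 3, 10):
        print(k, yule_simon_pmf(k, 2.0), yule_simon_pmf_integer_rho(k, 2))
```

For integer $\rho$ the two functions should agree up to floating-point rounding, which gives a quick check on the gamma-function form.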

[Figure: Yule–Simon cumulative distribution function. The function is defined only at integer values of k; the connecting lines do not indicate continuity.]

The parameter $\rho$ can be estimated using a fixed point algorithm.[2]

The probability mass function $f$ has the property that for sufficiently large $k$ we have

$$f(k;\rho) \approx \frac{\rho \, \Gamma(\rho + 1)}{k^{\rho + 1}} \propto \frac{1}{k^{\rho + 1}}.$$

This means that the tail of the Yule–Simon distribution is a realization of Zipf's law: $f(k;\rho)$ can be used to model, for example, the relative frequency of the $k$th most frequent word in a large collection of text, which according to Zipf's law is inversely proportional to a (typically small) power of $k$.
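The power-law tail can be checked numerically. The sketch below assumes the pmf $f(k;\rho) = \rho \, \mathrm{B}(k, \rho + 1)$ and the large-$k$ approximation $\rho \, \Gamma(\rho + 1) / k^{\rho + 1}$, and prints their ratio for growing $k$; the function names are illustrative.

```python
import math


def yule_simon_pmf(k: int, rho: float) -> float:
    """Exact pmf rho * B(k, rho + 1), computed via log-gamma."""
    log_beta = math.lgamma(k) + math.lgamma(rho + 1.0) - math.lgamma(k + rho + 1.0)
    return rho * math.exp(log_beta)


def zipf_tail(k: int, rho: float) -> float:
    """Large-k approximation rho * Gamma(rho + 1) / k**(rho + 1)."""
    return rho * math.gamma(rho + 1.0) / k ** (rho + 1.0)


def tail_ratio(k: int, rho: float) -> float:
    """Exact pmf divided by its power-law approximation."""
    return yule_simon_pmf(k, rho) / zipf_tail(k, rho)


# The ratio should approach 1 from below as k grows.
for k in (10, 100, 1000, 10000):
    print(k, tail_ratio(k, 1.5))
```

For $\rho = 1$ the ratio is exactly $k / (k + 1)$, since the pmf reduces to $1 / (k (k + 1))$ and the approximation to $1 / k^2$.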
Yule–Simon distribution: summary of properties

Parameters: $\rho > 0$, shape (real)
Support: $k \in \{1, 2, \dots\}$
PMF: $\rho \, \mathrm{B}(k, \rho + 1)$
CDF: $1 - k \, \mathrm{B}(k, \rho + 1)$
Mean: $\frac{\rho}{\rho - 1}$ for $\rho > 1$
Mode: $1$
Variance: $\frac{\rho^2}{(\rho - 1)^2 (\rho - 2)}$ for $\rho > 2$
Skewness: $\frac{(\rho + 1)^2 \sqrt{\rho - 2}}{(\rho - 3) \, \rho}$ for $\rho > 3$
Ex. kurtosis: $\rho + 3 + \frac{11 \rho^3 - 49 \rho - 22}{(\rho - 4)(\rho - 3) \, \rho}$ for $\rho > 4$
MGF: does not exist
CF: $\frac{\rho}{\rho + 1} \, {}_2F_1(1, 1; \rho + 2; e^{it}) \, e^{it}$

Occurrence

The Yule–Simon distribution arose originally as the limiting distribution of a particular model studied by Udny Yule in 1925 to analyze the growth in the number of species per genus in some higher taxa of biotic organisms.[3] The Yule model makes use of two related Yule processes, where a Yule process is defined as a continuous-time birth process that starts with one or more individuals. Yule proved that, as time goes to infinity, the limit distribution of the number of species in a genus selected uniformly at random has a specific form and exhibits power-law behavior in its tail. Thirty years later, the Nobel laureate Herbert A. Simon proposed a discrete-time preferential attachment model to describe the appearance of new words in a large piece of text. The limit distribution of the number of occurrences of each word, when the number of words diverges, coincides with that of the number of species belonging to the randomly chosen genus in the Yule model, for a specific choice of the parameters. This fact explains the designation Yule–Simon distribution that is commonly assigned to that limit distribution. In the context of random graphs, the Barabási–Albert model also exhibits an asymptotic degree distribution that equals the Yule–Simon distribution for a specific choice of the parameters and still presents power-law characteristics for more general choices of the parameters. The same holds for other preferential attachment random graph models.[4]

[Figure: Plot of the Yule–Simon(1) distribution (red) and its asymptotic Zipf's law (blue).]

The preferential attachment process can also be studied as an urn process in which balls are added to a
growing number of urns, each ball being allocated to an urn with probability linear in the number (of balls)
the urn already contains.
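The urn scheme can be simulated directly. The sketch below is one possible reading of it, not a definitive implementation: the probability `alpha` of opening a new urn at each step is an assumed parameter that the text does not name, and the function names are illustrative. For this scheme the limiting urn-size distribution is commonly stated to be Yule–Simon with $\rho = 1 / (1 - \alpha)$; treat that correspondence as an assumption of the sketch.

```python
import random
from collections import Counter


def simulate_urns(steps: int, alpha: float, seed: int = 0) -> list[int]:
    """Simulate the urn scheme and return the number of balls in each urn.

    At every step a new urn with one ball is opened with probability alpha
    (an assumed parameter); otherwise one ball is added to an existing urn
    chosen with probability proportional to its current ball count.
    """
    rng = random.Random(seed)
    urn_sizes = [1]   # start with a single urn holding one ball
    ball_owner = [0]  # ball_owner[i] is the index of the urn holding ball i
    for _ in range(steps):
        if rng.random() < alpha:
            urn_sizes.append(1)
            ball_owner.append(len(urn_sizes) - 1)
        else:
            # A uniformly chosen ball lies in urn u with probability
            # proportional to the size of u, which is the linear rule.
            urn = ball_owner[rng.randrange(len(ball_owner))]
            urn_sizes[urn] += 1
            ball_owner.append(urn)
    return urn_sizes


def size_frequencies(urn_sizes: list[int]) -> dict[int, float]:
    """Empirical fraction of urns holding exactly k balls."""
    counts = Counter(urn_sizes)
    total = len(urn_sizes)
    return {k: c / total for k, c in sorted(counts.items())}


sizes = simulate_urns(steps=100_000, alpha=0.5, seed=1)
freqs = size_frequencies(sizes)
print({k: round(v, 3) for k, v in list(freqs.items())[:5]})
```

Storing one entry per ball makes the proportional choice a single uniform draw, at the cost of memory linear in the number of balls. With `alpha=0.5` the empirical fractions can be compared with the Yule–Simon pmf at $\rho = 2$.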

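The distribution also arises as a compound distribution, in which the parameter of a geometric distribution is treated as a function of a random variable having an exponential distribution. Specifically, assume that $W$ follows an exponential distribution with scale $1/\rho$ or rate $\rho$:

$$W \sim \operatorname{Exponential}(\rho),$$

with density

$$h(w;\rho) = \rho \exp(-\rho w).$$

Then a Yule–Simon distributed variable $K$ has the following geometric distribution conditional on $W$:

$$K \sim \operatorname{Geometric}(\exp(-W)).$$

The pmf of a geometric distribution is

$$g(k;p) = p \, (1 - p)^{k - 1}$$

for $k \in \{1, 2, \dots\}$. The Yule–Simon pmf is then the following exponential-geometric compound distribution:

$$f(k;\rho) = \int_0^{\infty} g(k; \exp(-w)) \, h(w;\rho) \, dw.$$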
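The compound construction gives a direct way to draw samples. This is a minimal sketch, assuming $W$ is exponential with rate $\rho$ and that, given $W$, $K$ is geometric on $\{1, 2, \dots\}$ with success probability $\exp(-W)$; the function names are illustrative. The empirical frequencies are compared with $\rho \, \mathrm{B}(k, \rho + 1)$.

```python
import math
import random
from collections import Counter


def sample_yule_simon(rho: float, rng: random.Random) -> int:
    """Draw W ~ Exponential(rate=rho), then K | W ~ Geometric(exp(-W))."""
    w = rng.expovariate(rho)
    p = math.exp(-w)        # success probability, in (0, 1]
    if p >= 1.0:            # w == 0: the geometric is degenerate at 1
        return 1
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    # Inverse-transform sampling of a geometric variable on {1, 2, ...}.
    # Assumes rho is not so small that exp(-w) underflows to zero.
    return int(math.log(u) / math.log1p(-p)) + 1


def empirical_pmf(rho: float, n: int, seed: int = 0) -> dict[int, float]:
    """Relative frequencies of n compound draws."""
    rng = random.Random(seed)
    counts = Counter(sample_yule_simon(rho, rng) for _ in range(n))
    return {k: c / n for k, c in sorted(counts.items())}


def exact_pmf(k: int, rho: float) -> float:
    """rho * B(k, rho + 1), computed via log-gamma."""
    log_beta = math.lgamma(k) + math.lgamma(rho + 1.0) - math.lgamma(k + rho + 1.0)
    return rho * math.exp(log_beta)


estimate = empirical_pmf(rho=2.0, n=100_000, seed=42)
for k in (1, 2, 3, 4):
    print(k, round(estimate.get(k, 0.0), 4), round(exact_pmf(k, 2.0), 4))
```

Substituting $u = \exp(-w)$ in the compound integral turns it into $\rho \int_0^1 u^{\rho} (1 - u)^{k - 1} \, du = \rho \, \mathrm{B}(k, \rho + 1)$, which is why the two columns should agree up to sampling noise.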
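The maximum likelihood estimator for the parameter $\rho$ given the observations $k_1, k_2, \dots, k_N$ is the solution to the fixed point equation

$$\rho^{(t+1)} = \frac{N + a - 1}{b + \sum_{i=1}^{N} \sum_{j=1}^{k_i} \frac{1}{\rho^{(t)} + j}},$$

where $b = 0$ and $a = 1$ are the rate and shape parameters of the gamma distribution prior on $\rho$.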

This algorithm was derived by Garcia[2] by directly optimizing the likelihood. Roberts and Roberts[5] generalize the algorithm to Bayesian settings using the compound geometric formulation described above. They also use the Expectation Maximisation (EM) framework to show that the fixed point algorithm converges, derive the sub-linearity of its convergence rate, and give two alternate derivations of the standard error of the estimator from the fixed point equation. The variance of the estimator is

$$\operatorname{Var}(\hat{\rho}) = \frac{1}{\frac{N}{\hat{\rho}^2} - \sum_{i=1}^{N} \sum_{j=1}^{k_i} \frac{1}{(\hat{\rho} + j)^2}},$$

that is, the inverse of the observed information of the full sample of $N$ observations; the standard error is its square root.
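The fixed point iteration can be sketched in Python. The update used here, $\rho \leftarrow (N + a - 1) / \bigl(b + \sum_{i} \sum_{j=1}^{k_i} 1 / (\rho + j)\bigr)$, is an assumption obtained by setting the derivative of the log-posterior (Yule–Simon likelihood with a gamma prior of shape $a$ and rate $b$) to zero; $a = 1$, $b = 0$ gives the maximum likelihood case. Function names, the starting value, and the stopping rule are illustrative choices, not taken from the cited papers.

```python
from collections import Counter


def double_sum(counts: Counter, rho: float) -> float:
    """Sum over observations k_i and j = 1..k_i of 1 / (rho + j)."""
    return sum(
        c * sum(1.0 / (rho + j) for j in range(1, k + 1))
        for k, c in counts.items()
    )


def fit_rho(observations, a=1.0, b=0.0, rho0=1.0, tol=1e-12, max_iter=100_000):
    """Iterate the assumed fixed-point update until successive values agree."""
    counts = Counter(observations)  # group equal observations to save work
    n = len(observations)
    rho = rho0
    for _ in range(max_iter):
        new_rho = (n + a - 1.0) / (b + double_sum(counts, rho))
        if abs(new_rho - rho) <= tol * max(1.0, abs(rho)):
            return new_rho
        rho = new_rho
    raise RuntimeError("fixed-point iteration did not converge")


print(fit_rho([1, 1, 1, 2]))
```

For the toy sample `[1, 1, 1, 2]` the stationarity condition reduces to $\rho^2 - 3\rho - 8 = 0$, so the iteration can be checked against the positive root $(3 + \sqrt{41}) / 2$. If every observation equals 1 the likelihood has no finite maximizer, and the sketch raises after `max_iter` steps.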

Generalizations
The two-parameter generalization of the original Yule distribution replaces the beta function with an incomplete beta function. The probability mass function of the generalized Yule–Simon(ρ, α) distribution is defined as

$$f(k;\rho,\alpha) = \frac{\rho}{1 - \alpha^{\rho}} \, \mathrm{B}_{1-\alpha}(k, \rho + 1),$$

with $0 \leq \alpha < 1$. For $\alpha = 0$ the ordinary Yule–Simon(ρ) distribution is obtained as a special case. The use of the incomplete beta function has the effect of introducing an exponential cutoff in the upper tail.
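As a consistency check, the normalization of the generalized pmf can be worked out by hand. The derivation below assumes the form $f(k;\rho,\alpha) = \frac{\rho}{1 - \alpha^{\rho}} \, \mathrm{B}_{1-\alpha}(k, \rho + 1)$ with $0 \leq \alpha < 1$ and the incomplete beta function $\mathrm{B}_x(a, b) = \int_0^x t^{a-1} (1 - t)^{b-1} \, dt$; both the form and the notation are assumptions of this sketch.

```latex
\begin{aligned}
\sum_{k=1}^{\infty} \mathrm{B}_{1-\alpha}(k, \rho + 1)
  &= \int_0^{1-\alpha} \Bigl( \sum_{k=1}^{\infty} t^{k-1} \Bigr) (1 - t)^{\rho} \, dt
   = \int_0^{1-\alpha} (1 - t)^{\rho - 1} \, dt
   = \frac{1 - \alpha^{\rho}}{\rho}, \\
\sum_{k=1}^{\infty} f(k;\rho,\alpha)
  &= \frac{\rho}{1 - \alpha^{\rho}} \cdot \frac{1 - \alpha^{\rho}}{\rho} = 1 .
\end{aligned}
```

Under the same assumptions, bounding $(1 - t)^{\rho} \leq 1$ gives $\mathrm{B}_{1-\alpha}(k, \rho + 1) \leq (1 - \alpha)^k / k$, so for $\alpha > 0$ the pmf decays at least geometrically in $k$, which is the exponential cutoff in the upper tail.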

See also
Zeta distribution
Scale-free network
Beta negative binomial distribution

Bibliography
Colin Rose and Murray D. Smith, Mathematical Statistics with Mathematica. New York:
Springer, 2002, ISBN 0-387-95234-9. (See page 107, where it is called the "Yule
distribution".)

References
1. Simon, H. A. (1955). "On a class of skew distribution functions". Biometrika. 42 (3–4): 425–440. doi:10.1093/biomet/42.3-4.425 (https://doi.org/10.1093%2Fbiomet%2F42.3-4.425).
2. Garcia Garcia, Juan Manuel (2011). "A fixed-point algorithm to estimate the Yule-Simon distribution parameter" (https://zenodo.org/record/848773). Applied Mathematics and Computation. 217 (21): 8560–8566. doi:10.1016/j.amc.2011.03.092 (https://doi.org/10.1016%2Fj.amc.2011.03.092).
3. Yule, G. U. (1924). "A Mathematical Theory of Evolution, based on the Conclusions of Dr. J. C. Willis, F.R.S" (https://doi.org/10.1098%2Frstb.1925.0002). Philosophical Transactions of the Royal Society B. 213 (402–410): 21–87. doi:10.1098/rstb.1925.0002 (https://doi.org/10.1098%2Frstb.1925.0002).
4. Pachon, Angelica; Polito, Federico; Sacerdote, Laura (2015). "Random Graphs Associated to Some Discrete and Continuous Time Preferential Attachment Models". Journal of Statistical Physics. 162 (6): 1608–1638. arXiv:1503.06150 (https://arxiv.org/abs/1503.06150). doi:10.1007/s10955-016-1462-7 (https://doi.org/10.1007%2Fs10955-016-1462-7). S2CID 119168040 (https://api.semanticscholar.org/CorpusID:119168040).
5. Roberts, Lucas; Roberts, Denisa (2017). "An Expectation Maximization Framework for Preferential Attachment Models". arXiv:1710.08511 (https://arxiv.org/abs/1710.08511) [stat.CO (https://arxiv.org/archive/stat.CO)].

Retrieved from "https://en.wikipedia.org/w/index.php?title=Yule–Simon_distribution&oldid=1159560958"
