<?xml version="1.0" encoding="utf-8"?>
<!-- $Revision$ -->
<refentry xml:id="function.levenshtein" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink">
<refnamediv>
<refname>levenshtein</refname>
<refpurpose>Calculate Levenshtein distance between two strings</refpurpose>
</refnamediv>
<refsect1 role="description">
&reftitle.description;
<methodsynopsis>
<type>int</type><methodname>levenshtein</methodname>
<methodparam><type>string</type><parameter>string1</parameter></methodparam>
<methodparam><type>string</type><parameter>string2</parameter></methodparam>
<methodparam choice="opt"><type>int</type><parameter>insertion_cost</parameter><initializer>1</initializer></methodparam>
<methodparam choice="opt"><type>int</type><parameter>replacement_cost</parameter><initializer>1</initializer></methodparam>
<methodparam choice="opt"><type>int</type><parameter>deletion_cost</parameter><initializer>1</initializer></methodparam>
</methodsynopsis>
<para>
The Levenshtein distance is defined as the minimal number of
characters you have to replace, insert or delete to transform
<parameter>string1</parameter> into <parameter>string2</parameter>.
The complexity of the algorithm is <literal>O(m*n)</literal>,
where <literal>n</literal> and <literal>m</literal> are the
lengths of <parameter>string1</parameter> and
<parameter>string2</parameter>. This compares well with
<function>similar_text</function>, which is <literal>O(max(n,m)**3)</literal>,
but is still expensive for long strings.
</para>
<para>
If <parameter>insertion_cost</parameter>, <parameter>replacement_cost</parameter>
and/or <parameter>deletion_cost</parameter> are not equal to <literal>1</literal>,
the algorithm adapts to choose the cheapest transforms.
For example, if <code>$insertion_cost + $deletion_cost &lt; $replacement_cost</code>,
no replacements will be done; a deletion followed by an insertion is used instead.
</para>
</refsect1>
<refsect1 role="parameters">
&reftitle.parameters;
<para>
<variablelist>
<varlistentry>
<term><parameter>string1</parameter></term>
<listitem>
<para>
The first string being evaluated; the distance is the cost of
transforming it into <parameter>string2</parameter>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>string2</parameter></term>
<listitem>
<para>
The second string being evaluated; the target that
<parameter>string1</parameter> is transformed into.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>insertion_cost</parameter></term>
<listitem>
<para>
Defines the cost of insertion.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>replacement_cost</parameter></term>
<listitem>
<para>
Defines the cost of replacement.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>deletion_cost</parameter></term>
<listitem>
<para>
Defines the cost of deletion.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
</refsect1>
<refsect1 role="returnvalues">
&reftitle.returnvalues;
<para>
This function returns the Levenshtein distance between the
two argument strings.
</para>
</refsect1>
<refsect1 role="changelog">
&reftitle.changelog;
<informaltable>
<tgroup cols="2">
<thead>
<row>
<entry>&Version;</entry>
<entry>&Description;</entry>
</row>
</thead>
<tbody>
<row>
<entry>8.0.0</entry>
<entry>
Prior to this version, <function>levenshtein</function> had to be called
with either two or five arguments.
</entry>
</row>
<row>
<entry>8.0.0</entry>
<entry>
Prior to this version, <function>levenshtein</function> would return <literal>-1</literal>
if one of the argument strings was longer than 255 characters.
</entry>
</row>
</tbody>
</tgroup>
</informaltable>
</refsect1>
<refsect1 role="examples">
&reftitle.examples;
<para>
<example>
<title><function>levenshtein</function> example</title>
<programlisting role="php">
<![CDATA[
<?php
// input misspelled word
$input = 'carrrot';

// array of words to check against
$words = array('apple','pineapple','banana','orange',
               'radish','carrot','pea','bean','potato');

// no shortest distance found, yet
$shortest = -1;

// loop through words to find the closest
foreach ($words as $word) {

    // calculate the distance between the input word,
    // and the current word
    $lev = levenshtein($input, $word);

    // check for an exact match
    if ($lev == 0) {

        // closest word is this one (exact match)
        $closest = $word;
        $shortest = 0;

        // break out of the loop; we've found an exact match
        break;
    }

    // if this distance is no greater than the shortest distance found
    // so far, OR if no shortest distance has been found yet
    if ($lev <= $shortest || $shortest < 0) {
        // set the closest match, and shortest distance
        $closest = $word;
        $shortest = $lev;
    }
}

echo "Input word: $input\n";
if ($shortest == 0) {
    echo "Exact match found: $closest\n";
} else {
    echo "Did you mean: $closest?\n";
}
?>
]]>
</programlisting>
&example.outputs;
<screen>
<![CDATA[
Input word: carrrot
Did you mean: carrot?
]]>
</screen>
</example>
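<example>
<title>Effect of custom costs on <function>levenshtein</function></title>
<para>
A minimal sketch of how non-default costs change the result. The sample
words and cost values are chosen purely for illustration. When a replacement
costs more than a deletion plus an insertion, the cheaper pair is used, and
unequal insertion and deletion costs make the result depend on the
argument order.
</para>
<programlisting role="php">
<![CDATA[
<?php
// default costs: one replacement (a -> u)
echo levenshtein('cat', 'cut'), "\n";

// replacement costs 3, so deleting 'a' and inserting 'u' (1 + 1) is cheaper
echo levenshtein('cat', 'cut', 1, 3, 1), "\n";

// insertion costs 5: turning 'cat' into 'cats' needs one insertion
echo levenshtein('cat', 'cats', 5, 1, 1), "\n";

// the reverse direction needs one deletion, which still costs 1
echo levenshtein('cats', 'cat', 5, 1, 1), "\n";
?>
]]>
</programlisting>
&example.outputs;
<screen>
<![CDATA[
1
2
5
1
]]>
</screen>
</example>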
</para>
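<para>
<example>
<title>Sketch of the <literal>O(m*n)</literal> dynamic programming table</title>
<para>
The following userland function is a sketch meant only to show where the
<literal>O(m*n)</literal> cost comes from; it is not the internal
implementation. The name <literal>levenshtein_sketch</literal> is
hypothetical, and the sketch assumes unit costs and compares the strings
byte by byte. Each of the <literal>m*n</literal> table cells is filled in
constant time from three neighbouring cells.
</para>
<programlisting role="php">
<![CDATA[
<?php
// Hypothetical helper for illustration only; not part of PHP.
function levenshtein_sketch(string $string1, string $string2): int
{
    $m = strlen($string1);
    $n = strlen($string2);

    // Row for the empty prefix of $string1: entry $j is the cost of
    // inserting the first $j bytes of $string2.
    $previous = range(0, $n);

    for ($i = 1; $i <= $m; $i++) {
        // Column 0: deleting the first $i bytes of $string1.
        $current = [$i];

        for ($j = 1; $j <= $n; $j++) {
            $cost = ($string1[$i - 1] === $string2[$j - 1]) ? 0 : 1;
            $current[$j] = min(
                $previous[$j] + 1,         // deletion
                $current[$j - 1] + 1,      // insertion
                $previous[$j - 1] + $cost  // replacement, or free match
            );
        }

        $previous = $current;
    }

    return $previous[$n];
}

var_dump(levenshtein_sketch('kitten', 'sitting') === levenshtein('kitten', 'sitting'));
var_dump(levenshtein_sketch('', 'abc'));
?>
]]>
</programlisting>
&example.outputs;
<screen>
<![CDATA[
bool(true)
int(3)
]]>
</screen>
</example>
</para>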
</refsect1>
<refsect1 role="seealso">
&reftitle.seealso;
<para>
<simplelist>
<member><function>soundex</function></member>
<member><function>similar_text</function></member>
<member><function>metaphone</function></member>
</simplelist>
</para>
</refsect1>
</refentry>
<!-- Keep this comment at the end of the file
Local variables:
mode: sgml
sgml-omittag:t
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
indent-tabs-mode:nil
sgml-parent-document:nil
sgml-default-dtd-file:"~/.phpdoc/manual.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:nil
sgml-local-ecat-files:nil
End:
vim600: syn=xml fen fdm=syntax fdl=2 si
vim: et tw=78 syn=sgml
vi: ts=1 sw=1
-->