Interface Documentation

The document describes functions in the MyFun class for standardizing, preprocessing, analyzing, and sorting documents. Key functions include: 1) DocStandardization - Standardizes a document by removing line breaks, repeated spaces, and punctuation, and by converting it to lowercase. 2) RemoveStop - Removes stop words from a document. 3) StatisticsWords - Counts word frequencies, locations, and distances in a document. It returns an array of WORDSFRE objects. 4) QuickSort - Sorts the WORDSFRE array by entropy difference to rank words by importance. The WORDSFRE class represents a word's properties, such as its frequency and the entropy difference calculated by the EntropyDifference_Max or EntropyDifference_Normal function.

Uploaded by

alfonsoable
KEBOED Interface Document

1. Class MyFun

1.1. DocStandardization

Standardize the input document

Statement

string DocStandardization(string TheDoc);

Belong to the Class

MyFun

Return Value

Return the standardized document

Parameter

TheDoc: The document to be standardized

Explanation

DocStandardization standardizes the original document. Standardization removes
line breaks, runs of consecutive spaces, and punctuation symbols, and converts
English text to lowercase so that letter case does not affect keyword extraction.
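
The steps described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the library's implementation: the class StandardizationSketch and the method Standardize are hypothetical names, and the sketch assumes that punctuation is deleted rather than replaced by a space (which matches "H.M.S." appearing as "hms" in the later examples).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of KeywordExtractionAPI.
public static class StandardizationSketch
{
    public static string Standardize(string doc)
    {
        // Delete punctuation and symbols: keep only letters, digits, and whitespace.
        string noPunctuation = Regex.Replace(doc, @"[^\p{L}\p{Nd}\s]", "");
        // Collapse line breaks and runs of whitespace into a single space.
        string singleSpaced = Regex.Replace(noPunctuation, @"\s+", " ").Trim();
        // Lowercase so that letter case does not influence the word statistics.
        return singleSpaced.ToLowerInvariant();
    }

    public static void Main()
    {
        string doc = "When on board H.M.S. Beagle,  as naturalist, \n\nI was much struck.";
        Console.WriteLine(Standardize(doc));
        // prints: when on board hms beagle as naturalist i was much struck
    }
}
```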

Example

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Runtime.InteropServices;
using KeywordExtractionAPI;

namespace KEDLL
{
    class Program
    {
        static void Main(string[] args)
        {
            // The literal is joined with "+" because a regular C# string
            // cannot span several source lines.
            string TheDoc = "When on board H.M.S. Beagle, as naturalist, \n\nI was "
                + "much struck with certain facts in the distribution of the "
                + "organic beings inhabiting South America, and in the geological relations of the "
                + "present to the past inhabitants of that continent. ";
            string TheDoc_Standardization = "";

            TheDoc_Standardization =
                KeywordExtractionAPI.MyFun.DocStandardization(TheDoc);

            // Print the original and the standardized document.
            Console.WriteLine("TheDoc:\n" + TheDoc);
            Console.WriteLine("\nTheDoc_Standardization:\n" +
                TheDoc_Standardization);
            Console.ReadKey();
        }
    }
}

1.2. RemoveStop

Remove the stop words from the input document.

Statement

string RemoveStop(string TheDoc);

Belong to the Class

MyFun

Return Value

On success, returns the document with the stop words removed. On failure, returns
a string that begins with "ERROR", followed by the reason for the failure.

Parameter

TheDoc: The document from which the stop words are to be removed

Explanation

RemoveStop removes the stop words from the document. It reads the list of stop
words and removes every word of the document that appears in that list. This step
is optional: removing stop words may improve the accuracy of keyword extraction,
or it may have no effect.
Cause of error: no stop words are found in the stop-word file.
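
The filtering step and the "ERROR" convention can be sketched as follows. This is a minimal illustration, not the library's code: StopWordSketch and RemoveStopWords are hypothetical names, the stop-word list is passed in as a collection instead of being read from a file, and the exact text after "ERROR" is invented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of KeywordExtractionAPI.
public static class StopWordSketch
{
    public static string RemoveStopWords(string doc, ICollection<string> stopWords)
    {
        // Mirror the documented convention: a failure is reported as a
        // string that starts with "ERROR".
        if (stopWords == null || stopWords.Count == 0)
            return "ERROR: the stop-word list is empty";

        var stopSet = new HashSet<string>(stopWords, StringComparer.Ordinal);
        // Keep every word that is not in the stop-word set.
        var kept = doc
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !stopSet.Contains(word));
        return string.Join(" ", kept);
    }

    public static void Main()
    {
        var stopWords = new[] { "of", "the", "in", "and", "to", "that" };
        string doc = "the geological relations of the present to the past inhabitants of that continent";
        string result = RemoveStopWords(doc, stopWords);

        // Callers should test for the "ERROR" prefix before using the result.
        if (result.StartsWith("ERROR", StringComparison.Ordinal))
            Console.WriteLine("Stop-word removal failed: " + result);
        else
            Console.WriteLine(result);
        // prints: geological relations present past inhabitants continent
    }
}
```

The same prefix check applies to the string returned by the real RemoveStop before it is handed to later steps.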

Example

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Runtime.InteropServices;
using KeywordExtractionAPI;

namespace KEDLL
{
    class Program
    {
        static void Main(string[] args)
        {
            // The literal is joined with "+" because a regular C# string
            // cannot span several source lines.
            string TheDoc = "origin of species introduction when on board hms beagle "
                + "as naturalist i was much struck with certain facts in the distribution of the organic "
                + "beings inhabiting south america and in the geological relations of the present to "
                + "the past inhabitants of that continent";
            string TheDoc_RemoveStop = "";

            TheDoc_RemoveStop = KeywordExtractionAPI.MyFun.RemoveStop(TheDoc);

            // Print the document before and after stop-word removal.
            Console.WriteLine("TheDoc:\n" + TheDoc);
            Console.WriteLine("\nTheDoc_RemoveStop:\n" + TheDoc_RemoveStop);
            Console.ReadKey();
        }
    }
}

1.3. StatisticsWords

Count the word frequency, word location, and word distance in the input document

Statement

WORDSFRE[] StatisticsWords(string TheDoc);


Belong to the Class

MyFun

Return Value

Returns an array of WORDSFRE objects. Each element stores one word's frequency,
its locations, and the distances between its successive occurrences.

Parameter

TheDoc: The document, which has already been segmented into words

Explanation

StatisticsWords counts the frequency, the locations, and the distances between
successive occurrences of each word in the document. The document is split into
words, the information for each distinct word is gathered into one WORDSFRE
object, and the resulting array of WORDSFRE objects is returned.
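
The bookkeeping can be sketched as below. This is an illustration under assumptions, not the library's implementation: WordStats is a hypothetical stand-in for WORDSFRE (only the field names follow this document), positions are taken to be zero-based word indices, and a distance is taken to be the gap between two successive occurrences of the same word.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for WORDSFRE; only the field names follow the document.
public class WordStats
{
    public string Word;
    public int Frequency;
    public List<int> Position = new List<int>();
    public List<int> Distance = new List<int>();
}

public static class StatisticsSketch
{
    public static List<WordStats> Collect(string doc)
    {
        var byWord = new Dictionary<string, WordStats>();
        var ordered = new List<WordStats>();   // preserves first-appearance order
        string[] words = doc.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        for (int i = 0; i < words.Length; i++)
        {
            WordStats stats;
            if (!byWord.TryGetValue(words[i], out stats))
            {
                // First occurrence: create the record for this word.
                stats = new WordStats { Word = words[i] };
                byWord[words[i]] = stats;
                ordered.Add(stats);
            }
            else
            {
                // Assumed definition: gap between this occurrence and the previous one.
                stats.Distance.Add(i - stats.Position[stats.Position.Count - 1]);
            }
            stats.Position.Add(i);
            stats.Frequency++;
        }
        return ordered;
    }

    public static void Main()
    {
        foreach (WordStats s in Collect("origin of species and the origin of life"))
        {
            Console.WriteLine(s.Word + "\t" + s.Frequency
                + "\t[" + string.Join(",", s.Position) + "]"
                + "\t[" + string.Join(",", s.Distance) + "]");
        }
        // first line printed: origin  2  [0,5]  [5]  (tab-separated)
    }
}
```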

Example

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Runtime.InteropServices;
using KeywordExtractionAPI;

namespace KEDLL
{
    class Program
    {
        static void Main(string[] args)
        {
            // The literal is joined with "+" because a regular C# string
            // cannot span several source lines.
            string TheDoc = "origin of species introduction when on board hms beagle "
                + "as naturalist i was much struck with certain facts in the distribution of the organic "
                + "beings inhabiting south america and in the geological relations of the present to "
                + "the past inhabitants of that continent";
            KeywordExtractionAPI.WORDSFRE[] wordsfre;

            wordsfre = KeywordExtractionAPI.MyFun.StatisticsWords(TheDoc);

            // Print each distinct word with its frequency.
            Console.WriteLine("TheDoc:\n" + TheDoc);
            Console.WriteLine("\nwordsfre:");
            foreach (KeywordExtractionAPI.WORDSFRE wf in wordsfre)
            {
                Console.WriteLine(wf.Word + "\t" + wf.Frequency);
            }
            Console.ReadKey();
        }
    }
}

1.4. QuickSort

Sort the WORDSFRE array by entropy difference

Statement

bool QuickSort(WORDSFRE[] array, int left, int right);

Belong to the Class

MyFun

Return Value

Returns true on success; otherwise returns false.

Parameter

array: The array of WORDSFRE objects to sort; its elements hold the entropy difference.

left: The left-most position in the array, generally 0.

right: The right-most position in the array, generally array.Length-1.

Explanation

QuickSort sorts the array so that words with a higher entropy difference come
first.
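
A descending quicksort with the same (array, left, right) shape can be sketched as follows. This is an illustration only, not the library's implementation: ScoredWord is a hypothetical stand-in for WORDSFRE, SortDescending is a hypothetical name, the choice of the middle element as pivot is an assumption, and the ED values in Main are invented.

```csharp
using System;

// Hypothetical stand-in for WORDSFRE with only the fields the sort needs.
public class ScoredWord
{
    public string Word;
    public double ED;
}

public static class QuickSortSketch
{
    // Sorts array[left..right] in place, highest ED first.
    public static bool SortDescending(ScoredWord[] array, int left, int right)
    {
        if (array == null || left < 0 || right >= array.Length)
            return false;               // invalid arguments
        if (left >= right)
            return true;                // zero or one element: nothing to do

        double pivot = array[left + (right - left) / 2].ED;
        int i = left, j = right;
        while (i <= j)
        {
            while (array[i].ED > pivot) i++;   // larger values stay on the left
            while (array[j].ED < pivot) j--;   // smaller values stay on the right
            if (i <= j)
            {
                ScoredWord tmp = array[i];
                array[i] = array[j];
                array[j] = tmp;
                i++;
                j--;
            }
        }
        // Recurse into the two partitions.
        return SortDescending(array, left, j) && SortDescending(array, i, right);
    }

    public static void Main()
    {
        // ED values are made up for the demonstration.
        var words = new[]
        {
            new ScoredWord { Word = "species", ED = 0.2 },
            new ScoredWord { Word = "origin", ED = 1.5 },
            new ScoredWord { Word = "volume", ED = -0.3 },
            new ScoredWord { Word = "hooker", ED = 0.9 },
        };
        if (SortDescending(words, 0, words.Length - 1))
        {
            foreach (ScoredWord w in words)
                Console.WriteLine(w.Word + "\t" + w.ED);
        }
        // words are printed in the order: origin, hooker, species, volume
    }
}
```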

Example

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Runtime.InteropServices;
using KeywordExtractionAPI;

namespace KEDLL
{
    class Program
    {
        static void Main(string[] args)
        {
            // The literal is joined with "+" because a regular C# string
            // cannot span several source lines.
            string TheDoc = "origin species introduction board hms beagle "
                + "naturalist struck distribution organic inhabiting south america geological "
                + "relations past inhabitants continent seen latter chapters volume throw light origin "
                + "species mystery mysteries called philosophers return home occurred 1837 question "
                + "patiently accumulating reflecting sorts possibly bearing five allowed speculate "
                + "subject drew short notes enlarged 1844 sketch conclusions probable period day "
                + "steadily pursued object hope excused entering personal details hasty coming decision "
                + "1859 nearly finished complete health strong urged publish abstract especially "
                + "induced wallace studying natural history malay archipelago arrived exactly "
                + "conclusions origin species 1858 sent memoir subject request forward sir charles "
                + "lyell sent linnean society published third volume journal society sir lyell dr hooker "
                + "latter read sketch 1844 honoured thinking advisable publish wallaces excellent "
                + "memoir brief extracts manuscripts abstract publish necessarily imperfect references "
                + "authorities statements trust reader reposing confidence accuracy doubt errors crept "
                + "hope cautious trusting authorities conclusions arrived illustration hope suffice "
                + "feel sensible necessity hereafter publishing detail references conclusions grounded "
                + "hope future am aware scarcely single discussed volume adduced apparently leading "
                + "conclusions directly opposite arrived fair result obtained stating balancing "
                + "arguments question impossible regret space prevents satisfaction acknowledging "
                + "generous assistance received naturalists personally unknown opportunity pass "
                + "expressing deep obligations dr hooker fifteen aided stores knowledge excellent "
                + "judgment considering origin species conceivable naturalist reflecting mutual "
                + "affinities organic embryological relations geographical distribution geological "
                + "succession conclusion species independently created descended varieties species "
                + "nevertheless conclusion founded unsatisfactory shown innumerable species "
                + "inhabiting world modified acquire perfection structure coadaptation justly excites "
                + "admiration naturalists continually refer external conditions ";
            KeywordExtractionAPI.WORDSFRE[] wordsfre;

            wordsfre = KeywordExtractionAPI.MyFun.StatisticsWords(TheDoc);
            // Compute the entropy difference of every word before sorting.
            foreach (KeywordExtractionAPI.WORDSFRE wf in wordsfre)
            {
                wf.EntropyDifference_Max();
            }
            KeywordExtractionAPI.MyFun.QuickSort(wordsfre, 0, wordsfre.Length - 1);

            // Print the words in sorted order with their entropy difference.
            Console.WriteLine("TheDoc:\n" + TheDoc);
            Console.WriteLine("\nwordsfre:");
            foreach (KeywordExtractionAPI.WORDSFRE wf in wordsfre)
            {
                Console.WriteLine(wf.Word + "\t" + wf.ED);
            }
            Console.ReadKey();
        }
    }
}

2. Class WORDSFRE

2.1. Class variable list

string Word;      // the word
int Frequency;    // word frequency
int[] Position;   // the locations of the word in the document
int[] Distance;   // the distances between successive occurrences of the word
double ED;        // entropy difference

2.2. EntropyDifference_Max

Calculate the entropy difference using the maximum entropy method

Statement

bool EntropyDifference_Max();

Belong to the Class

WORDSFRE

Return Value

Returns true on success; otherwise returns false.

Parameter

None

Explanation

EntropyDifference_Max calculates the word's entropy difference based on the
maximum entropy.

Example

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Runtime.InteropServices;
using KeywordExtractionAPI;

namespace KEDLL
{
    class Program
    {
        static void Main(string[] args)
        {
            // The literal is joined with "+" because a regular C# string
            // cannot span several source lines.
            string TheDoc = "origin species introduction board hms beagle "
                + "naturalist struck distribution organic inhabiting south america geological "
                + "relations past inhabitants continent seen latter chapters volume throw light origin "
                + "species mystery mysteries called philosophers return home occurred 1837 question "
                + "patiently accumulating reflecting sorts possibly bearing five allowed speculate "
                + "subject drew short notes enlarged 1844 sketch conclusions probable period day "
                + "steadily pursued object hope excused entering personal details hasty coming decision "
                + "1859 nearly finished complete health strong urged publish abstract especially "
                + "induced wallace studying natural history malay archipelago arrived exactly "
                + "conclusions origin species 1858 sent memoir subject request forward sir charles "
                + "lyell sent linnean society published third volume journal society sir lyell dr hooker "
                + "latter read sketch 1844 honoured thinking advisable publish wallaces excellent "
                + "memoir brief extracts manuscripts abstract publish necessarily imperfect references "
                + "authorities statements trust reader reposing confidence accuracy doubt errors crept "
                + "hope cautious trusting authorities conclusions arrived illustration hope suffice "
                + "feel sensible necessity hereafter publishing detail references conclusions grounded "
                + "hope future am aware scarcely single discussed volume adduced apparently leading "
                + "conclusions directly opposite arrived fair result obtained stating balancing "
                + "arguments question impossible regret space prevents satisfaction acknowledging "
                + "generous assistance received naturalists personally unknown opportunity pass "
                + "expressing deep obligations dr hooker fifteen aided stores knowledge excellent "
                + "judgment considering origin species conceivable naturalist reflecting mutual "
                + "affinities organic embryological relations geographical distribution geological "
                + "succession conclusion species independently created descended varieties species "
                + "nevertheless conclusion founded unsatisfactory shown innumerable species "
                + "inhabiting world modified acquire perfection structure coadaptation justly excites "
                + "admiration naturalists continually refer external conditions ";
            KeywordExtractionAPI.WORDSFRE[] wordsfre;

            wordsfre = KeywordExtractionAPI.MyFun.StatisticsWords(TheDoc);
            // Compute the maximum-entropy entropy difference of every word.
            foreach (KeywordExtractionAPI.WORDSFRE wf in wordsfre)
            {
                wf.EntropyDifference_Max();
            }
            KeywordExtractionAPI.MyFun.QuickSort(wordsfre, 0, wordsfre.Length - 1);

            // Print the words in sorted order with their entropy difference.
            Console.WriteLine("TheDoc:\n" + TheDoc);
            Console.WriteLine("\nwordsfre:");
            foreach (KeywordExtractionAPI.WORDSFRE wf in wordsfre)
            {
                Console.WriteLine(wf.Word + "\t" + wf.ED);
            }
            Console.ReadKey();
        }
    }
}

2.3. EntropyDifference_Normal

Calculate the entropy difference using the general entropy method

Statement

bool EntropyDifference_Normal();

Belong to the Class

WORDSFRE

Return Value

Returns true on success; otherwise returns false.

Parameter

None

Explanation

EntropyDifference_Normal calculates the word's entropy difference based on the
general entropy.

Example

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Runtime.InteropServices;
using KeywordExtractionAPI;

namespace KEDLL
{
    class Program
    {
        static void Main(string[] args)
        {
            // The literal is joined with "+" because a regular C# string
            // cannot span several source lines.
            string TheDoc = "origin species introduction board hms beagle "
                + "naturalist struck distribution organic inhabiting south america geological "
                + "relations past inhabitants continent seen latter chapters volume throw light origin "
                + "species mystery mysteries called philosophers return home occurred 1837 question "
                + "patiently accumulating reflecting sorts possibly bearing five allowed speculate "
                + "subject drew short notes enlarged 1844 sketch conclusions probable period day "
                + "steadily pursued object hope excused entering personal details hasty coming decision "
                + "1859 nearly finished complete health strong urged publish abstract especially "
                + "induced wallace studying natural history malay archipelago arrived exactly "
                + "conclusions origin species 1858 sent memoir subject request forward sir charles "
                + "lyell sent linnean society published third volume journal society sir lyell dr hooker "
                + "latter read sketch 1844 honoured thinking advisable publish wallaces excellent "
                + "memoir brief extracts manuscripts abstract publish necessarily imperfect references "
                + "authorities statements trust reader reposing confidence accuracy doubt errors crept "
                + "hope cautious trusting authorities conclusions arrived illustration hope suffice "
                + "feel sensible necessity hereafter publishing detail references conclusions grounded "
                + "hope future am aware scarcely single discussed volume adduced apparently leading "
                + "conclusions directly opposite arrived fair result obtained stating balancing "
                + "arguments question impossible regret space prevents satisfaction acknowledging "
                + "generous assistance received naturalists personally unknown opportunity pass "
                + "expressing deep obligations dr hooker fifteen aided stores knowledge excellent "
                + "judgment considering origin species conceivable naturalist reflecting mutual "
                + "affinities organic embryological relations geographical distribution geological "
                + "succession conclusion species independently created descended varieties species "
                + "nevertheless conclusion founded unsatisfactory shown innumerable species "
                + "inhabiting world modified acquire perfection structure coadaptation justly excites "
                + "admiration naturalists continually refer external conditions ";
            KeywordExtractionAPI.WORDSFRE[] wordsfre;

            wordsfre = KeywordExtractionAPI.MyFun.StatisticsWords(TheDoc);
            // Compute the general-entropy entropy difference of every word.
            foreach (KeywordExtractionAPI.WORDSFRE wf in wordsfre)
            {
                wf.EntropyDifference_Normal();
            }
            KeywordExtractionAPI.MyFun.QuickSort(wordsfre, 0, wordsfre.Length - 1);

            // Print the words in sorted order with their entropy difference.
            Console.WriteLine("TheDoc:\n" + TheDoc);
            Console.WriteLine("\nwordsfre:");
            foreach (KeywordExtractionAPI.WORDSFRE wf in wordsfre)
            {
                Console.WriteLine(wf.Word + "\t" + wf.ED);
            }
            Console.ReadKey();
        }
    }
}
