Ics106 Full
The digital age has contributed to the unprecedented growth in the usage of
information in various forms and sources. Due to the complexity of the information
environment, individuals are faced with diverse, abundant information choices in
their academic studies, workplaces and in their personal lives. Such information is
available in libraries, community resources, organizations, the media, the Internet,
CD-ROMs, databases, information centers, etc.
Information Literacy: The ability to search for and hence access appropriate
information across a range of genres, formats and systems. The ability to sift, scan
and sort information.
Most of the definitions of information literacy have been in terms of the information
literate person rather than information literacy itself. The following definitions will be
adopted for the purpose of this course.
Doyle (1992) defines an information literate person as one who does the following:
Uses information in critical thinking and problem solving.
Category six: working with knowledge and personal perspectives to gain novel
insights.
Lenox and Walker (1993) also defined information literacy by characterizing the
information literate person as one who has the analytical and critical skills to
formulate research questions and evaluate results and the skills to search for
and access a variety of information types in order to meet his or her information
needs.
Critical analyses of the various definitions of information literacy show that anybody
who possesses the information literacy skills is the master of his own learning. He
goes from simply finding and learning facts to the process of creating new
information. Knowledge creation includes:
Information literacy skills are skills you will need throughout your lifetime. We are
always seeking information in order to make vital decisions and informed choices and to
communicate effectively. We need to continually improve our searching, evaluating
and communicating skills to remain relevant in a changing information
environment. It should be noted that one cannot become information literate or
communication literate overnight. Just as with speaking or writing skills, your abilities
will improve over time as you gain expertise in the topics you choose to investigate.
This process will give you practice in searching for and evaluating the information you
encounter, and will allow you to create new ideas and communicate them to others using a
variety of technological tools.
The Big Six model, designed by Michael Eisenberg and Robert Berkowitz (2001),
will be adopted because it is one of the most widely used models of information skills. This
model defines how an information literate student can approach research. For
instance, how would you investigate a research topic?
Selecting and evaluating your resources (Examination)
-Critically analyze your sources.
-Examine your sources for relevance, currency, accuracy, credibility,
appropriateness and bias.
-Who is my audience?
-How can I most effectively share this information with this audience?
-Did I meet the guidelines or follow the rubric for the project?
The process:
-Did I explore the full scope of available resources and select the best?
-Did I search electronic resources (the Web and licensed databases) using effective,
efficient, strategic search strategies?
Digital literacy: The ability to use digital technology, communication tools or
networks to locate, evaluate, use and create information. E.g. surfing the internet,
downloading of digital information.
Technological Literacy: The innate ability to discover how a new or evolved technology
operates; recognizing its limitations and benefits. The ability to choose the most appropriate
tool to access and process information and present new knowledge & understanding. E.g.
Scanner, Photocopier, Answering machines, ATM machines etc.
The digital age has redefined the way things are done, especially in the workplace.
Digital information, which has become a valuable asset in organizations, is better
managed. Computers have enabled information to be stored electronically,
manipulated, transferred and retrieved for future use.
The digital age is not without its challenges. These challenges are highlighted
below:
Basic knowledge and skills in the application and usage of ICT application tools.
Unequal access to technology especially in the rural areas- digital divide.
Copyright problems- duplication and copying of other people’s work.
Copyright laws are meant to prevent such acts.
Plagiarism
Lack of electricity and telecommunications infrastructures.
Internet fraud: hacking and online fraud in which criminals obtain personal
details for misuse (security issues).
Lack of IT professionals
ICT is an acronym that stands for Information and Communication Technologies. There is
not a universally accepted definition of ICT because the concepts, methods and
applications involved in ICT are constantly changing and evolving on a daily basis.
In a nutshell, ICT covers any product that will store, retrieve, manipulate, transmit or
receive information electronically in a digital form. Examples include personal
computers, digital television, internet, satellite communication gadgets, e-mail,
robots etc. ICT is concerned with storage, retrieval, manipulation, transmission or
receipt of digital data.
ICT has had a positive impact on all sectors of the economy: health, education,
banking, agriculture, government, and politics.
1.7 IMPACT OF ICT ON THE NIGERIAN ECONOMY
ICT is the bedrock of national development and survival. It provides access to global
information to improve all sectors of the economy (health, education,
agriculture, and finance) and so enhance development.
It enhances National security and law thereby facilitating the efforts in
combating crimes through sophisticated Security monitoring gadgets such as
Closed Circuit Television (CCTV), ICT based security networks, Online
surveillance of street, seaports etc.
Participation in the international market. A lot of online transactions are carried
out via the Internet. For instance, the Internet has changed the way goods and
services are produced, delivered, sold and purchased.
Creation of wealth: IT engineers and software developers have been developing
indigenous IT products to earn foreign exchange.
ICT can create job opportunities: software and IT companies are absorbing
graduates with IT skills. This helps to reduce unemployment and poverty in the
economy.
Competitive advantage for public and private sectors. These sectors are able to
offer value added products and services through faster communication with
their customers, promotional strategies and online distribution of their
products.
It increases their efficiency.
Transparency.
Enriching the lives of the people through the use of modern day technologies
such as Medical treatment databases, cellphones to improve livelihoods.
It enhances policy development that would improve the plight of the people.
Access to Government information.
Information Management and retrieval for future purposes.
Social interaction between the government and the governed- Radio and
Television programmes.
Digital divide- unequal access to information (digital haves and have-nots)
when such facilities are in place difficulties arise when they are poorly
maintained.
They are dependent on skills and capacity necessary to use, manage and
is the need to address the language and cultural barrier in ICT through
HISTORICAL PERSPECTIVES OF THE INTERNET
The Internet traces its origins to 1969, when the Advanced Research Projects Agency
(ARPA) of the Department of Defense, United States of America, established a network
called ARPANET (Advanced Research Projects Agency Network). ARPANET was
developed by the military to combat communication problems that were anticipated
in the event of a nuclear war.
The first services developed on ARPANET were remote Telnet access and file
transfers. The users of ARPANET adapted electronic mail (e-mail) for data sharing
across the globe. From e-mail, a system of discussion groups which became known
as USENET emerged. The TCP/IP which is now used on the internet assisted a great
deal in the transmission process on the ARPANET. The roadmap of Internet
Communication is called TCP/IP or Transmission Control Protocol/Internet Protocol.
TCP/IP was developed in order to allow all of the U.S. military computers to
communicate with each other easily, regardless of the manufacturer of the system.
TCP/IP is the universal translator between all of the different hardware combinations
that might exist.
The Internet can be described as a “network of networks” which links millions of
computer networks and individual computers, from universities, governments
and private individuals, into an electronic web that permits information to
be transferred via telephone lines and cables. In other words, the Internet is a
mechanism for information dissemination and a medium for collaboration and
interaction between individuals and their computers regardless of geographic
location. Being a large-scale network of millions of computers, the Internet allows for
continuous communication across the globe.
World Wide Web (WWW): The web is a collection of all browsers, servers, files
and browser-accessible services available through the Internet. The web was created
by a computer scientist named Tim Berners-Lee. It is the most widely used
service of the Internet, accessed through a web browser. The web can be referred to
as a collection of web pages linked together with hypertext links. The web pages
are multimedia in nature, comprising text, pictures, sounds and graphics.
Non-graphical or text-only browsers were the first browsers developed. These
browsers simply display ASCII text on the computer screen. No pictures can be
included. The main advantage of this type of browser is that it is very fast.
Graphical browsers have the capability of including both text and pictures. These
browsers have the ability to display pictures, play sounds and even show video clips.
Unfortunately, graphical browsers tend to run much more slowly than text-only
browsers.
Electronic Mail: It is currently the most popular use of the Internet. It is the sending
and receiving of electronic messages, with files as attachments. Email is offered by most
commercial online services for a fee. Before you can send an email, you must know
the recipient’s email address. An address is composed of the user’s
identification followed by the @ sign, followed by the location of the recipient’s
computer. The main benefit of email is the near-instantaneous delivery of messages.
Also, identical messages can be sent to different people at the same time. Thirdly, it
saves cost and time.
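The address structure described above can be illustrated with a short Python sketch. The function name split_email_address and the sample address are hypothetical, and the check is deliberately simplistic; real address validation is considerably more involved.

```python
def split_email_address(address: str) -> tuple[str, str]:
    """Split an address into (user identification, location of the mail host)."""
    # rpartition splits on the last "@", since the host part never contains one.
    user, at_sign, host = address.rpartition("@")
    if not at_sign or not user or not host:
        raise ValueError(f"not a valid email address: {address!r}")
    return user, host


# Hypothetical sample address, used only for illustration.
user, host = split_email_address("student@example.edu")
print(user)  # student
print(host)  # example.edu
```

The part before the @ sign identifies the user; the part after it names the computer (mail host) that receives the message.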
File Transfer Protocol (FTP): Files are transferred from one computer to another
using FTP. FTP enables people to share files such as music and videos. FTP
allows users to get (download) files from an FTP server’s file directory. This protocol
also allows users to put (upload) a file from their local machine to the remote FTP
server.
Usenet (Unix User Network): It is a kind of electronic newspaper. It is a means of
communicating news over the Internet. Individuals and organizations can write an article
and then post the article on a news server. These articles can be read by anyone
with access to a newsreader, a piece of software that allows an individual to access a
newsgroup. It is a system of bulletin boards where anyone can post
messages for other people to read and reply to.
Newsgroup participants are expected to abide by the rules of ‘netiquette’, the
unofficial guide to communicating on the Internet.
Gopher: It is a menu based program that enables you to browse for information
without having to know where the material is specifically located. Gopher is one of
the most comprehensive of all browser systems and it allows you to access other
programs including FTP and Telnet. Using Gopher, you can access library catalogs,
files and databases.
Hypertext Transfer Protocol (HTTP): This is the method by which World Wide Web pages
are transferred over the network. Hypertext Markup Language (HTML) is used for
writing pages for the World Wide Web. It allows text to include codes that define
fonts, layout, embedded graphics and hypertext links.
USES OF INTERNET
Social networking (Facebook, Twitter)
Dissemination of information through web mail: Yahoo, Gmail and Hotmail
Teleconferencing (e.g. a meeting can be held at the same time among people in
different countries interactively).
E-learning
International trade
Online outsourcing
Knowledge sharing (corporate organizations, institutions of learning, student
exchange programmes).
Entertainment (Games, downloading movies, music etc)
E-commerce (Online buying and selling, distribution and promotion of products,
auctioning etc)
INFORMATION OVERLOAD
We live in a world full of information being thrown at us, every moment of the day,
constantly demanding our attention. In our everyday lives, we are being constantly
hit with streams of incoming information.
Information overload occurs when we try to receive more information than can be
processed. The noise this effort creates in our minds and our lives can be
overwhelming. The negative effects of information overload are discussed below:
Productivity Loss – In the face of too much information, we can easily get lost in
the details. We waste time focusing on unimportant information and lose sight of our
goal and purpose. The extra data distracts us from our major tasks for the day.
How often have you turned on your computer to check email, and ended up surfing
the net for hours?
Mind Cluttering – The noise created by media, and other sources of information,
clutters our mind and takes away from our inner peace.
Lack of Time – Rich or poor, young or old, we all have the same limited amount of
time in a day. A whole lot of time is expended on sifting, sorting and evaluating
information sources.
Lack of Personal Reflection – This comes when we constantly consume
information and forget to connect with ourselves. Valuable personal reflection
comes when we create a ‘space’ for it in our lives. An example is the person who
constantly has the radio on. If there is always noise, then we won’t have the mental
capacity to reflect within.
Stress & Anxiety – Information inflow creates the illusion that we have more tasks
to fill our lives, than we have time for. Often, we might suddenly feel nervous without
understanding why. Every piece of information carries with it energy, which demands
our time. Even if we consciously ignore it, part of us saw that data and recorded it
within our subconscious. So, we feel that we have lots and lots to do. This can create
stress. Too much of a good thing is never good, and this is especially true of
information. We can’t live without a certain amount of information, and much of it is
unavoidable anyway.
Many of the traditional principles that are being applied to judge the appropriateness
or quality of printed materials are equally applicable to electronic sources.
Information managers need to decide on the criteria for determining the relevance
of Internet/electronic sources.
In recent times, with the advent of the World Wide Web, there has been a massive
influx of digital information and sources. There is a wide difference between what is
found on the web and what is found in traditional print sources. Therefore,
understanding the differences between the types of sources will assist in evaluating
the sources. For instance, most Internet sources do not have a print equivalent, while
sources such as journal or newspaper articles can be found in both print and digital
formats.
External sources of information are clearly stated and identified, e.g. bibliographies
and relevant citations.
Print sources: Publication information such as date of publication, name of
publisher, authors and editors is always indicated in print sources.
Internet sources: Date of publication is questionable on the Internet. The date stated
on the website could be the date posted or the date of last update.
The criteria below will assist you in evaluating web pages for use as academic
sources. These multiple categories should be employed prior to making a decision
regarding the academic quality of a source.
How you located the site can give you a start on your evaluation of the site's
validity as an academic resource.
Was it found via a search conducted through a search engine? Unlike library
databases, the accuracy and/or quality of information located via a search
engine will vary greatly.
The domain of a particular website can be decoded through the Uniform Resource
Locator (URL), or Internet address. The origin of the site can provide indications
of the site's mission or purpose. The most common domains are:
.net: A site from a network organization or an Internet service provider.
.il.us: A state government site; this may also include public schools and
community colleges.
.uk (United Kingdom): A site originating in another country (as indicated by the
two-letter code).
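As a rough illustration of decoding a domain from a URL, the Python sketch below extracts the last label of the host name using the standard urllib.parse module. The DOMAIN_HINTS table, the function names and the sample URLs are hypothetical; the table holds only two of the domains listed above and is not a complete registry.

```python
from urllib.parse import urlparse

# Illustrative lookup table based on two of the domains listed above.
DOMAIN_HINTS = {
    "net": "network organization or Internet service provider",
    "uk": "site originating in the United Kingdom",
}


def domain_suffix(url: str) -> str:
    """Return the last label of the host name, e.g. 'uk' for www.example.ac.uk."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower()


def describe_origin(url: str) -> str:
    """Give a rough hint about who is behind a site, judged from its domain."""
    return DOMAIN_HINTS.get(domain_suffix(url), "unknown domain type")


# Hypothetical sample URLs.
print(domain_suffix("https://www.example.ac.uk/library"))  # uk
print(describe_origin("http://www.example.net/about"))
```

The domain is only a first indication of a site's purpose; the authority, accuracy and currency checks that follow still apply.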
Authority
Who is responsible for the site? Look for information on the
author of the site, because on the Internet anyone can pose as an
authority.
Does the author have an affiliation with an organization or institution?
Does the author list his or her credentials? Are they relevant to the information
presented?
Is there a feedback facility (mailing address, telephone numbers or
e-mail address) to contact the author?
Is the material available in other forms, such as print or CDs?
Accuracy and Objectivity
There are no standards or controls on the accuracy of information available via the
Internet.
The Internet can be used by anyone as a sounding board for their thoughts and
opinions.
Does the author support his idea by citing from other sources?
Determine the nature of the article. e.g. scholarly articles must include
citations and bibliographies.
Are the citations and bibliographies complete enough to find the original sources?
Compare the page to related sources, electronic or print, for assistance in
determining accuracy.
Does the page exhibit a particular point of view or bias? E.g. The implication
of legalizing abortion.
Is the site objective? Is there a reason the site is presenting a particular point
of view on a topic?
Does the page contain advertising? This may impact the content of the
information included. Look carefully to see if there is a relationship between
the advertising and the content, or whether the advertising is simply
providing financial support for the page.
Do you have to have a PIN or credit card to proceed?
Is the page free from grammar and spelling errors?
Are there links to other resources on the site?
Do links take you to "Not found" pages, or to out-of-date or irrelevant pages?
Currency
This is both an indicator of the timeliness of the information and whether or not the
page is currently maintained. The following questions may be asked to determine
currency of the site.
Is the information provided current?
When was the page created?
Are dates included for the last update or modification of the page?
Are the links current and functional?
The ease of use of a site and its ability to help you locate information you are looking
for are examples of the site's functionality.
Is the site easy to navigate? Are options to return to the home page, tops of
pages, etc., provided?
Is the site searchable?
Is there a site map or table of contents?
INFORMATION RETRIEVAL
Information retrieval performance evaluation
• "Recall" and "Precision" are two classic measures of the performance
of information retrieval for a single query.
• Both assume that there is an answer set of documents that contain the answer
to the query.
• Performance is optimal if
– the database returns all the documents in the answer set
– the database returns only documents in the answer set
• Recall is the fraction of the relevant documents that the query result has
captured.
• Precision is the fraction of the retrieved documents that is relevant.
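The two definitions above can be sketched in Python using sets of document identifiers. The function names and the sample identifiers are hypothetical, and returning 0.0 for an empty set is an assumed convention rather than part of the definitions.

```python
def precision(retrieved: set, relevant: set) -> float:
    """Fraction of the retrieved documents that are relevant."""
    if not retrieved:
        return 0.0  # assumed convention when nothing was retrieved
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved: set, relevant: set) -> float:
    """Fraction of the relevant documents that were retrieved."""
    if not relevant:
        return 0.0  # assumed convention when the answer set is empty
    return len(retrieved & relevant) / len(relevant)


# Hypothetical answer set and query result.
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d1", "d2", "d5"}

print(precision(retrieved, relevant))  # 2 of the 3 retrieved documents are relevant
print(recall(retrieved, relevant))     # 2 of the 4 relevant documents were retrieved
```

Performance is optimal when both functions return 1.0, that is, when the retrieved set equals the answer set.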
Evaluation
• Why Evaluate?
• What to Evaluate?
• How to Evaluate?
Why Evaluate?
• Determine if the system is desirable
• Make comparative assessments
• Test and improve IR algorithms
What to Evaluate?
• How much of the information needed is satisfied.
• How much was learned about a topic.
• Incidental learning:
– How much was learned about the collection.
– How much was learned about other topics.
Relevance
• In what ways can a document be relevant to a query?
– Answer precise question precisely.
– Partially answer question.
– Suggest a source for more information.
– Give background information.
– Remind the user of other knowledge.
– Others ...
• How relevant is the document
– for this user for this information need.
• Subjective, but
• Measurable to some extent
– How often do people agree a document is relevant to a query
• How well does it answer the question?
– Complete answer? Partial?
– Background Information?
– Hints for further exploration?
What to Evaluate?
What can be measured that reflects users’ ability to use system? (Cleverdon 66)
– Coverage of Information
– Form of Presentation
– Effort required/Ease of Use
– Time and Space Efficiency
– Recall (effectiveness)
• proportion of relevant material actually retrieved
– Precision (effectiveness)
• proportion of retrieved material actually relevant
Precision
• In the field of information retrieval, precision is the fraction of retrieved
documents that are relevant to the search:
precision = |{relevant documents} ∩ {retrieved documents}| / |{retrieved documents}|
• Precision takes all retrieved documents into account, but it can also be
evaluated at a given cut-off rank, considering only the topmost results returned
by the system. This measure is called precision at n or P@n.
• For example for a text search on a set of documents precision is the number of
correct results divided by the number of all returned results.
• Note that the meaning and usage of "precision" in the field of Information
Retrieval differs from the definition of accuracy and precision within other
branches of science and technology.
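Precision at a cut-off rank (P@n) can be sketched as follows. The function name precision_at_n and the sample ranking are hypothetical; the sketch assumes that results are ordered best first and that, if fewer than n results were returned, the missing positions count as non-relevant.

```python
def precision_at_n(ranked_results: list, relevant: set, n: int) -> float:
    """Precision computed over only the top n results of a ranked list."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    top_n = ranked_results[:n]
    hits = sum(1 for doc in top_n if doc in relevant)
    # Dividing by n (not len(top_n)) treats missing positions as non-relevant.
    return hits / n


# Hypothetical ranked result list and answer set.
ranked = ["d3", "d9", "d1", "d7", "d2"]
relevant = {"d1", "d2", "d3"}

print(precision_at_n(ranked, relevant, 1))  # 1.0
print(precision_at_n(ranked, relevant, 5))  # 0.6
```

A measure of this kind reflects the fact that many searchers look only at the first page of results.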
Recall
• Recall in information retrieval is the fraction of the documents that are relevant
to the query that are successfully retrieved.
• For example for text search on a set of documents recall is the number of
correct results divided by the number of results that should have been returned
• In pattern recognition and information retrieval, precision (also called positive
predictive value) is the fraction of retrieved instances that are relevant, while
recall (also known as sensitivity) is the fraction of relevant instances that are
retrieved. Both precision and recall are therefore based on an understanding
and measure of relevance.
• Suppose a program for recognizing dogs in scenes identifies 7 dogs in a scene
containing 9 dogs and some cats. If 4 of the identifications are correct, but 3
are actually cats, the program's precision is 4/7 while its recall is 4/9. When a
search engine returns 30 pages only 20 of which were relevant while failing to
return 40 additional relevant pages, its precision is 20/30 = 2/3 while its recall
is 20/60 = 1/3.
• Precision can be seen as a measure of exactness or quality, whereas recall is a
measure of completeness or quantity.
• In simple terms, high recall means that an algorithm returned most of the
relevant results, while high precision means that an algorithm returned
substantially more relevant results than irrelevant.
Get as much good stuff while at the same time getting as little junk as
possible.
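The two worked examples above can be checked with a short Python sketch that works from counts rather than sets. The function and parameter names are hypothetical; exact fractions are used so that the results match the figures in the text, and the sketch assumes at least one item was retrieved and at least one relevant item exists.

```python
from fractions import Fraction


def precision_recall(correct: int, wrong: int, missed: int) -> tuple[Fraction, Fraction]:
    """Return (precision, recall) from counts.

    correct: retrieved items that are relevant
    wrong:   retrieved items that are not relevant
    missed:  relevant items that were not retrieved
    """
    precision = Fraction(correct, correct + wrong)
    recall = Fraction(correct, correct + missed)
    return precision, recall


# Dog example: 7 identifications, 4 correct and 3 wrong; 9 dogs in the scene, so 5 missed.
print(precision_recall(4, 3, 5))     # (Fraction(4, 7), Fraction(4, 9))

# Search engine example: 30 pages returned, 20 relevant; 40 relevant pages not returned.
print(precision_recall(20, 10, 40))  # (Fraction(2, 3), Fraction(1, 3))
```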
PLAGIARISM
What is Plagiarism?
Plagiarism is the act of taking another person’s words, ideas or statistics and passing
them off as your own. The complete or partial translation of a text written by
someone else also constitutes plagiarism if you do not acknowledge your source. It
can also be described as an act committed when a writer deliberately uses someone
else’s language, ideas or other original material without acknowledging its source.
Since we cannot always be original it is entirely acceptable to present another
person’s ideas in your work. However, it must be done properly to avoid plagiarism.
What is Unacceptable
Passage: Research has shown that technology has been instrumental in increasing
industrial and agricultural production, improving transportation and
communications, advancing human health care and overall improving many
aspects of human life. However, much of its success is based on the
availability of land, water, energy, and biological resources of the earth.
Comment:
• Other than the first four words, the text has been copied word for word from the
original document without any quotation marks that would indicate that the
passage is a quote.
• The source you are using is not cited.
Passage: Research has shown that the advancement of technology has been
instrumental in increasing industrial and agricultural production, improving
transportation and communications, health care and overall many aspects
of human life. (Pimental, 1998)
Comment:
• Even though you mention your source, you use many of the author’s words
without quotation marks.
Passage: Research has shown that the advancement of science has been
beneficial to the areas of agricultural and industrial production and
communication and transportation fields. Furthermore, science has greatly
improved health care and is the prime factor in a higher standard of life for
many people.
Comment:
• Though most of the words have been changed, the sentence structure has
remained the same.
• This is paraphrasing without indicating the original source.
What is Acceptable
Passage: In his article on the effects of population growth on the environment,
Pimental argues that “technology has been instrumental in increasing
industrial and agricultural production, improving transportation and
communications, advancing human health care and overall improving many
aspects of human life. However, much of its success is based on the
availability of land, water, energy, and biological resources of the earth”
(1998).
Comment:
• The author has been acknowledged, and the quoting technique which has been
used is adequate since this is an Internet source.
• However, when you quote a printed source (book, journal, etc.), be sure to include
the page numbers.
Passage: According to Pimental, “technology has been instrumental in increasing
industrial and agricultural production, improving transportation and
communications, advancing human health care and overall improving many
aspects of human life” (1998). He cautions, however, that technological progress
is dependent on natural resources.
Comment:
• You have properly quoted and paraphrased the author.
Consequences of Plagiarism
The consequences of plagiarism can be personal, professional, ethical, and legal.
With plagiarism detection software so readily available and in use, plagiarists are
being caught at an alarming rate. Once accused of plagiarism, a person will most
likely always be regarded with suspicion. Ignorance is not an excuse. Plagiarists
include academics, professionals, students, journalists, authors, and others.
barred from entering college from high school or another college. Schools, colleges,
and universities take plagiarism very seriously. Most educational institutions have
academic integrity committees who police students. Many schools suspend students
for their first violation. Students are usually expelled for further offences.
Legal Repercussions
The legal repercussions of plagiarism can be quite serious. Copyright laws are
absolute. One cannot use another person’s material without citation and reference.
An author has the right to sue a plagiarist. Some plagiarism may also be deemed a
criminal offense, possibly leading to a prison sentence. Those who write for a living,
such as journalists or authors, are particularly susceptible to plagiarism issues. Those
who write frequently must be ever-vigilant not to err. Writers are well-aware of
copyright laws and ways to avoid plagiarism. As a professional writer, to plagiarize is
a serious ethical and perhaps legal issue.
Plagiarized Research
Plagiarized research is an especially serious form of plagiarism. If the research is
medical in nature, the consequences of plagiarism could mean the loss of people’s
lives.
Assessing Relevance of Search Results with Recall and Precision Ratios
Introduction
The Web has become an ocean of information and resources, which is growing rapidly larger
every microsecond. It has grown from an esoteric system used by a small community of
researchers to the now most used system for obtaining information for billions of digital citizens
or “digizens”. A growing proportion of such people are also digital natives (i.e. young people
born since the digital era began about three decades ago).
The Web is both a huge database of web pages you can search through, as well as a gateway you
can use to get to and search the information systems or databases of various organizations for
information. Such information systems or databases include those of online stores like Amazon
and Jumia, app stores like Google Play or Samsung Galaxy, as well as the online public access
catalogues (OPACs) of libraries. Outside of the Web, people often directly search the information
systems or databases of organizations in which they are staff, students, customers, or permitted
visitors. On the Web however, search engines such as Google or Yahoo often do the searching of
these other systems and databases for people, thereby saving them from having to search these
other systems themselves.
Many people have never encountered, and thus have no interest in, the issues and challenges of
retrieving information from such databases (Oppenheiem, et al., 2000). However, all information
found on the Web through search engines or directly from other information systems or
databases usually needs to be evaluated and filtered, as it may include plenty of non-relevant
information. The Web surfer may not be aware of the many available search engines that can be
used to get information on a topic, and may use different search strategies, some of which might
not be effective (Kumar and Prakash, 2009).
The basic challenges that a searcher faces when searching the Web, other information systems or
databases can be stated in the form of the following questions, which we often ask ourselves
when we search the Web or a database:
(a) Which search engine or database would get me quickly the best search results that
include only or mostly relevant information and also exclude all or most of the irrelevant
information?
(b) What search expression (or query), comprising important words, terms, names, URLs,
etc., best describes my information need and should be input to the search engine or
database?
(c) How can I determine quickly which items in the initial search results provided by the
search engine are most relevant to my needs?
(d) How many items or pages of the search results should I look at before determining if the
results are excellent, good, fair, poor or adequate for my needs?
(e) If the initial search results are not adequate, how do I revise or refine my initial query to
get better subsequent search results?
(f) When should I end the search, satisfied or frustrated?
We all ask and attempt to answer these questions in our minds when we search. Accordingly,
searching is usually a multistep and iterative process. As an iterative process, an initial query
may not do too well initially, and may need to be improved which may entail using new words
and terms identified from the previous search result(s), in order to improve the relevancy of
items in the obtained final search result.
Accordingly, the search process includes the following steps:
Define the search request (i.e. describe the information need) as precisely as possible
1. Choose an appropriate information resource (search engine, full text or bibliographic
database(s), library catalogues, document repository)
2. Identify and list relevant search terms derived from your search request that you would
use for searching.
3. Modify the search terms to suit the chosen information resource (i.e. use the vocabulary
dictionaries of each information resource to get equivalent or other terms used by the
resource)
4. Combine the modified or augmented search terms to create a search query
5. Run your initial search
6. Evaluate your initial search to determine how good the search results are, by examining
some of the retrieved items in the search result
7. Modify your search query based on the previous results and run new searches.
8. Copy, paste and save selected search results in a file or reference management system (a
reference management system is software that provides facilities for organizing
and storing the bibliographic details and content of sources you wish to use later).
You should bear in mind the following points. Firstly, step 6 above requires that you evaluate
each search result, usually while still working with the search engine, in order to determine how
good each search result is overall, and also to determine which items in each search result you
need to copy, paste and save in step 8. Secondly, the quality of the search results provided by a
search engine or information system depends critically on (a) what you yourself do in steps 1 to
6, and (b) how good the search engine is at matching terms in your search queries with words,
terms, and phrases in its database. Because search engines and their databases have usually been
researched and built to search as effectively and efficiently as possible, the
quality of what you get often depends on you!
Let us start from the beginning of the search process by considering the opening step above
(defining the search request), which requires that you define or describe your information need
with appropriate words simply, precisely and adequately. For example, consider an information
need implied by this question: What are the effects of e-books on tertiary education students?
Five different key words or concepts can be drawn out from the question: effects, e-books,
tertiary, education, students. Then, in
step 2 you need to identify other terms that are synonyms of the initial concepts, as shown in
Table 1. In step 3, you need to find out the terms actually used by the chosen information system,
which might or might not be the same as the initial or synonym concepts. Usually, the best
search result is obtained when a searcher uses the same terms to search as the terms that
were used by a search engine or information system when it was indexing its resources.
Finally, steps 4 to 7 also depend on you – how you combine the concepts in the initial and
subsequent queries, and how you evaluate the initial and subsequent search results. In a nutshell,
in order to obtain the best search results, all the steps in the search process must be well
conducted.
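The way concepts and their synonyms are combined into a query can be sketched in a few lines of Python. This is only an illustration: the synonym lists below are hypothetical (they are not the contents of Table 1), and the AND/OR/quoted-phrase syntax is an assumption that should be checked against the help pages of whichever search engine or database you choose.

```python
# Hypothetical synonym groups for the e-books question. The terms are
# illustrative assumptions, not the actual contents of Table 1.
concepts = {
    "effects": ["effects", "impact", "influence"],
    "e-books": ["e-books", "electronic books", "digital books"],
    "tertiary": ["tertiary", "higher", "post-secondary"],
    "education": ["education", "learning"],
    "students": ["students", "undergraduates", "learners"],
}


def quote(term):
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(groups):
    """OR the synonyms within each concept, then AND the concepts together."""
    blocks = []
    for synonyms in groups.values():
        blocks.append("(" + " OR ".join(quote(s) for s in synonyms) + ")")
    return " AND ".join(blocks)


query = build_query(concepts)
print(query)
```

Using OR inside a concept widens the search (more synonyms, more results), while AND between concepts narrows it to items that address every concept in the question.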
Although what you get usually depends on you, various yardsticks or metrics have been
researched and recommended for use to evaluate the search results performance of search
engines and other information systems. The rest of this chapter examines and explains the most
common of these metrics.
The earliest suggested and most commonly mentioned yardsticks are known as the recall and
precision ratios. Recall is the ratio of the number of retrieved relevant records to the total
number of relevant records in the database. Precision is the
ratio of the number of retrieved relevant records to the total number of records retrieved, both
relevant and irrelevant. Both ratios are usually expressed as percentages. A simple yet good
illustration of the ideas of recall and precision is the following real-life scenario.
Imagine that your girlfriend has given you a birthday surprise every year for the last 10
years. One day, she asks you, “Sweetie, do you remember all the
birthday surprises from me?” This simple question is likely to be tough to answer
because you need to recall all 10 surprise events from memory.
Let us suppose your girlfriend has a particular set of 10 surprises in mind, which is what she
wants to be told, that is, her information need.
Recall ratio is the number of events (surprises) you can correctly recall divided by the number
of all the correct events (that she wants you to recall), that is, the total number of events she has
in mind. It measures how effectively you are able to recall correct events out of the total correct
events (i.e. the particular set of surprises she has in mind).
So, (1) if you can recall all 10 events correctly, then, your recall ratio is 10 / 10 = 1.0 (or 100%),
while (2) if you can recall only 7 events correctly, your recall ratio is 7 / 10 = 0.7 (70%).
Precision ratio is the number of events you can correctly recall divided by the number of all
events you are able to recall (usually comprising a mix of correct and wrong answers). In other
words, the precision ratio measures how precise and efficient your recall efforts are.
Suppose that in example (1) above you made exactly 10 attempts in getting the 10 correct events.
Then your precision ratio is 10 correct recalls divided by 10 recall attempts, which is also 1.0
(100%). However, in example (2), you also made 10 recall attempts, but got only 7 correct
recalls. So, the preciseness or efficiency of recalling the events is the 7 correct recalls divided by
10 recall attempts, which is 0.7 (70%).
Suppose you can actually recall many surprise events from the last ten years, some of which are
among the ones she has in mind while others are not. Suppose (3) you eventually told her 16
events in 16 recall attempts, out of which only 8 events are among the particular 10 events she
has in mind. In that case, your recall ratio is the 8 correct events out of the 10 she has in mind,
which is 8 / 10 = 0.8 (80%). Compared with example (2), your recall ratio improved by 10
percentage points, which means that your effectiveness in recalling correct events improved, but
at the cost of 6 (60%) more attempts.
Would you say you are becoming more precise or efficient in example (3)? Actually, your
precision ratio in example (3) is only 8 correct events out of 16 recalled events, which is
8 / 16 = 0.5 (50%). Your recall ratio improved by 10 percentage points, but your precision ratio
decreased by 20 percentage points. So you have become more effective at recalling correctly,
but less efficient in doing so.
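The arithmetic in examples (1) to (3) can be reproduced with a short Python sketch. The function and variable names are illustrative choices made for this example only.

```python
def recall_ratio(correct_recalled, total_correct):
    """Correct events recalled divided by all the events she has in mind."""
    return correct_recalled / total_correct


def precision_ratio(correct_recalled, total_attempts):
    """Correct events recalled divided by all the events you mentioned."""
    return correct_recalled / total_attempts


TOTAL_SURPRISES = 10  # the particular set of surprises she has in mind

# example number -> (correct recalls, recall attempts), taken from the text
scenarios = {1: (10, 10), 2: (7, 10), 3: (8, 16)}

results = {}
for label, (correct, attempts) in scenarios.items():
    r = recall_ratio(correct, TOTAL_SURPRISES)
    p = precision_ratio(correct, attempts)
    results[label] = (r, p)
    print(f"Example ({label}): recall = {r:.0%}, precision = {p:.0%}")
```

Note that the two ratios share the same numerator (the correct recalls) and differ only in the denominator: recall divides by what she has in mind, precision divides by what you actually said.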
Informally, Recall ∝ 1/Precision. This expresses a tendency rather than an exact equality: as
one ratio rises, the other tends to fall. You can and should confirm this using the recall and
precision ratios calculated in examples (2) and (3) above, where recall rose from 0.7 to 0.8
while precision fell from 0.7 to 0.5. Recall ratios range from 0 to 1, as do precision ratios. The
inverse or tradeoff relationship that naturally exists between them, for the search results that an
information system provides in response to different user queries, is illustrated in Figure 2.
In the figure, the two distinct lines may represent the recall-precision graphs of two search
engines or information systems. While the exact slope of the curve may vary between systems,
the general inverse relationship between recall and precision remains for every information
system.
A system can increase its recall by returning more documents, because the
recall ratio is a non-decreasing function of the number of documents retrieved (a non-decreasing
function always rises or stays at the same level). A system that returns all the documents in its
database for a query will surely have 100% recall of all the relevant items in its database!
But the precision ratio of such a response will be very low, because of the large
number of non-relevant items returned along with the relevant items. The converse is also true:
it is possible for a system to aim for high precision, but at the cost of very low recall of
relevant items from its database.
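The non-decreasing behaviour of recall can be seen by computing both ratios as more and more documents are returned. The sketch below uses a made-up ranked result list and an assumed total of 5 relevant documents in the database; both are hypothetical values chosen only for illustration.

```python
# Hypothetical ranked result list: True marks a relevant document.
ranked = [True, True, False, True, False, False, True, False, False, False]
TOTAL_RELEVANT = 5  # assumed number of relevant documents in the whole database


def recall_precision_at(k):
    """Recall and precision when only the top k documents are returned."""
    hits = sum(ranked[:k])  # relevant documents among the top k
    return hits / TOTAL_RELEVANT, hits / k


curve = [recall_precision_at(k) for k in range(1, len(ranked) + 1)]
for k, (r, p) in enumerate(curve, start=1):
    print(f"top {k:2d}: recall = {r:.2f}, precision = {p:.2f}")
```

In this made-up list recall never falls as k grows, whereas precision drifts downwards from its starting value once non-relevant documents begin to appear.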
This naturally occurring inverse relationship between precision and recall ratios forces
information systems designed for general use to go for compromise between them. But, in real
life, some information search tasks particularly need good precision, whereas others need good
recall.
A combined measure that simultaneously captures the recall (R) and precision (P) performance
of the search results from an information system is the F score (weighted harmonic mean). The
F-score combines both recall and precision using a weighting
factor α, where a high α means that precision is more important.
$$F = \frac{1}{\alpha \frac{1}{P} + (1 - \alpha) \frac{1}{R}} = \frac{(\beta^2 + 1) P R}{\beta^2 P + R}, \quad \text{where } \beta^2 = \frac{1 - \alpha}{\alpha} \qquad \ldots (1)$$
The harmonic mean is a very conservative average. People usually use the balanced F1 measure,
i.e. with β = 1 (that is, α = 1/2) … (2)
$$F = \frac{2PR}{P + R} \qquad \ldots (3)$$
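As a quick check of the F-score formulas, here is a minimal Python sketch. It assumes the usual relation β² = (1 − α)/α between the two weighting parameters, and the function names are illustrative. The sample values P = 0.5 and R = 0.8 are those of the birthday example (3).

```python
def f_score(precision, recall, beta=1.0):
    """F score in its beta form: (beta^2 + 1)PR / (beta^2 P + R)."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid dividing by zero when nothing correct is retrieved
    b2 = beta ** 2
    return (b2 + 1) * precision * recall / (b2 * precision + recall)


def f_from_alpha(precision, recall, alpha=0.5):
    """F score in its alpha form: 1 / (alpha/P + (1 - alpha)/R)."""
    return 1 / (alpha / precision + (1 - alpha) / recall)


p, r = 0.5, 0.8
print(f"balanced F1       = {f_score(p, r):.3f}")
print(f"alpha form (0.5)  = {f_from_alpha(p, r):.3f}")
print(f"beta = 0.5 (favours precision) = {f_score(p, r, beta=0.5):.3f}")
print(f"beta = 2   (favours recall)    = {f_score(p, r, beta=2):.3f}")
```

Because precision is the weaker of the two ratios here, the score falls when β < 1 puts more weight on precision and rises when β > 1 puts more weight on recall.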
Concept of Relevance
Relevance is assessed relative to an information need, not a query. An information need varies
with a user's information-seeking behaviour. For example, the information need might be to find
out whether studying core IT-related courses is more versatile at ameliorating book haram
syndrome than studying pure science courses. This might be translated into a query such as: IT AND
courses AND science AND book haram AND syndrome AND versatile. A document is
considered relevant if it addresses the stated information need, not because it contains the exact
words in the query. This distinction is often misunderstood in practice, because the information
need is not stated overtly even though it is always present.
As an illustration, if a user types “Job” into a web search engine, he might intend to
search for available employment or for the story of Prophet Job in the Abrahamic religions. From
a one-word query, it is very difficult for a system to know what the precise information need is.
But the user has one, and only the user can filter the returned results on the basis of their
relevance to that information need. Therefore, to evaluate a system, an overt expression of an
information need is required, which can be used for assessing returned documents as
being relevant or non-relevant. At this point, a simplification is made: relevance could be thought
of as a scale, with some documents highly relevant and others not.
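Accuracy
Mathematically, accuracy is the fraction of all documents that the system classifies correctly:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP (true positives) is the number of relevant documents retrieved, TN (true negatives) is
the number of non-relevant documents not retrieved, FP (false positives) is the number of
non-relevant documents retrieved, and FN (false negatives) is the number of relevant documents
not retrieved.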
Accuracy is regarded in the literature as not being a useful measure for web information
retrieval. This is because only a small fraction of the documents in an IR system's collection are
relevant (i.e. TN >> TP). Even if a good IR system retrieves only relevant
documents, the difference in accuracy between this good system and a poor system (such as one
that always returns nothing) is small, so this measure cannot help us evaluate IR systems
(Schellekens, 2012).
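The weakness of accuracy can be made concrete with a small Python sketch. The collection size and the number of relevant documents below are hypothetical values, chosen only so that TN is far larger than TP.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all documents that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


COLLECTION = 1_000_000  # hypothetical collection size
RELEVANT = 100          # hypothetical number of relevant documents

# A good system: retrieves 80 documents, all of them relevant (20 are missed).
good = accuracy(tp=80, tn=COLLECTION - RELEVANT, fp=0, fn=20)

# A poor system: always returns nothing, so every relevant document is missed.
poor = accuracy(tp=0, tn=COLLECTION - RELEVANT, fp=0, fn=RELEVANT)

print(f"good system accuracy = {good:.5f}")
print(f"poor system accuracy = {poor:.5f}")
print(f"difference           = {good - poor:.5f}")
```

Both systems score above 99.9% accuracy because the huge number of true negatives dominates the calculation, even though the second system is useless to a searcher.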
Term Frequency (TF) – i.e. the frequency of occurrence of the query keyword in a particular
document
Inverse Document Frequency (IDF) – i.e. the inverse of the number of documents
in which the query keyword occurs; the fewer the documents containing the keyword, the more
importance is given to the keyword, and vice versa.
Hyperlinks to documents – i.e. the more links there are to a document, the greater its
importance
Relevance ranking based on Term Frequency and Inverse Document Frequency (TF/IDF)
Term Frequency (TF) indicates the degree of relevance of a document to a query term.
However, the use of term frequencies alone makes “spamming” easy. TF is computed for a
document as follows.
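$$TF(d, t) = \log\left(1 + \frac{n(d, t)}{n(d)}\right) \qquad (4)$$
where n(d, t) is the number of occurrences of term t in document d and n(d) is the number of
terms in document d. The log factor in (4) prevents excessive weight being given to frequent terms.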
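Formula (4) can be tried out with a minimal Python sketch. Two assumptions are made here that the text does not specify: the logarithm is taken as the natural logarithm, and a document is split into terms simply on whitespace after lower-casing. The sample sentences are invented.

```python
import math
from collections import Counter


def term_frequency(document, term):
    """TF(d, t) = log(1 + n(d, t) / n(d)), using the natural logarithm."""
    words = document.lower().split()  # naive whitespace tokenisation (assumption)
    if not words:
        return 0.0
    n_dt = Counter(words)[term.lower()]  # occurrences of the term in the document
    return math.log(1 + n_dt / len(words))


honest = "e-books help students read e-books"
spam = "e-books e-books e-books e-books e-books"

print(f"TF in honest document = {term_frequency(honest, 'e-books'):.3f}")
print(f"TF in spam document   = {term_frequency(spam, 'e-books'):.3f}")
```

The second document contains nothing but the keyword and therefore gets the highest possible TF, which illustrates why ranking on term frequency alone is easy to "spam".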
In essence, each page gets a hub status based on the prestige of the authorities it points to,
while each page gets an authority status based on the prestige of the hubs that point to it.
Calculations
Two information retrieval systems, A and B, are to be compared. Both are given the
same query, which is applied to a collection of 1000 documents. The results show that
System A returns 420 documents, of which only 50 are relevant to the query, while System B
returns 90 documents, of which only 25 are relevant to the query. In the whole collection, 80
documents are relevant to the query.
Tabulate the results for each system, and compute the following:
Recall;
Precision;
Accuracy; and
F score for both A and B.
Solution
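For each system, TP is the number of relevant documents returned, FP = documents returned − TP,
FN = 80 − TP, and TN = 1000 − TP − FP − FN.
System A: TP = 50, FP = 370, FN = 30, TN = 550
System B: TP = 25, FP = 65, FN = 55, TN = 855
$$\text{System A's Recall} = \frac{TP}{TP + FN} = \frac{50}{80} = 0.625 = 62.5\%$$
$$\text{System A's Precision} = \frac{TP}{TP + FP} = \frac{50}{420} = 0.119 = 11.9\%$$
$$\text{System B's Recall} = \frac{TP}{TP + FN} = \frac{25}{80} = 0.313 = 31.3\%$$
$$\text{System B's Precision} = \frac{TP}{TP + FP} = \frac{25}{90} = 0.278 = 27.8\%$$
$$\text{System A's Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{50 + 550}{50 + 550 + 370 + 30} = \frac{600}{1000} = 0.60 = 60\%$$
$$\text{System B's Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{25 + 855}{25 + 855 + 65 + 55} = \frac{880}{1000} = 0.88 = 88\%$$
$$\text{System A's } F = \frac{2PR}{P + R} = \frac{2 \times 0.119 \times 0.625}{0.119 + 0.625} = \frac{0.149}{0.744} = 0.200$$
$$\text{System B's } F = \frac{2PR}{P + R} = \frac{2 \times 0.278 \times 0.313}{0.278 + 0.313} = \frac{0.174}{0.591} = 0.294$$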
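The figures for Systems A and B can be cross-checked with a short Python sketch that derives TP, FP, FN and TN from the four numbers given in the question. The function name and the dictionary layout are illustrative choices, not part of the original exercise.

```python
def evaluate(collection_size, retrieved, relevant_retrieved, relevant_total):
    """Derive the four counts from the question data and compute the metrics."""
    tp = relevant_retrieved                  # relevant and retrieved
    fp = retrieved - tp                      # retrieved but not relevant
    fn = relevant_total - tp                 # relevant but not retrieved
    tn = collection_size - tp - fp - fn      # neither relevant nor retrieved
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "recall": recall,
        "precision": precision,
        "accuracy": (tp + tn) / collection_size,
        "f1": 2 * precision * recall / (precision + recall),
    }


system_a = evaluate(1000, retrieved=420, relevant_retrieved=50, relevant_total=80)
system_b = evaluate(1000, retrieved=90, relevant_retrieved=25, relevant_total=80)

for name, m in (("A", system_a), ("B", system_b)):
    print(f"System {name}: recall = {m['recall']:.3f}, precision = {m['precision']:.3f}, "
          f"accuracy = {m['accuracy']:.2f}, F = {m['f1']:.3f}")
```

System B comes out ahead on precision, accuracy and F, while System A has the higher recall, so which system is "better" depends on whether the search task needs good recall or good precision.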
Do-It-Yourself Exercise
Assume a database contains 800 records on a particular topic. A search was conducted on that
topic and 620 records were retrieved; of the 620 records retrieved, 405 were relevant. Calculate
the precision, recall, accuracy, and F scores for the search.
Conclusion
The two fundamental IR evaluation measures are precision and recall. Both are the foundations
for many other metrics because they are easily understood by all information users.
From the practitioner's point of view, these two measures are essential because they lead to
intuitive interpretations, such as the time people spend reading worthless documents (low
precision) or the number of relevant documents being missed (low recall). This buttresses the
fact that, for a fixed number of relevant documents retrieved, recall is inversely proportional to
the number of relevant documents per topic.
Both precision and recall should be considered when evaluating retrieval
systems. It is not sufficient to pick one at the expense of the other, because
dependence on just one of the two can lead to extreme but unhelpful solutions. For example, a
system that returns every document indiscriminately has 100% recall, while one that returns only
a single correct document is 100% precise. As information retrieval systems, the former is not
useful at all, and the latter is not much better. From the analysis, it can be observed that we
use precision, recall, and F for evaluation, but not accuracy.
References
The Role of ICT in our day to day life
ICT plays a role in education, banking, industry and commerce (business).
1. EDUCATION
- To find useful information, e.g. the Internet
- To manage books, e.g. a library automation system
- E-learning: students and lecturers can communicate with each other when there is a problem or
something to discuss, no matter how far apart they are.
- Internet: we use the Internet to get more information for our learning.
2. BANKING
- Online banking, e.g. Maybank2u.com
- To withdraw or check money, e.g. ATM machine
- E-banking: online services such as transferring money and paying bills online (myBank2U.com).
- ATM machine: to withdraw and to transfer money
3. INDUSTRY
- Automobile manufacturing industry using robotics, e.g. a factory
4. COMMERCE
- E-commerce: buying and selling something over the Internet, e.g. online payment
- Advertising, e.g. billboard, magazine
USAGES OF ICT IN EVERYDAY LIFE