
UNIT_2-PART_1

Uploaded by

teamkiller334

UNIT II CATALOGING AND INDEXING (PART I)

Subtopics :

 History and objectives of indexing (History, Objectives)
 Indexing process (Scope of indexing, Pre-coordination and linkages)
 Automatic indexing
 Information extraction

Cataloguing and indexing :

Indexing :

 Indexing in an information retrieval system is like creating a quick-reference guide (or) a map for a
huge collection of information, such as a library of books (or) a database of documents
(or)
 It is the process of building a map from words to the documents that contain them, so that searches
are faster and more accurate

 Indexing makes searching fast and efficient

Ex :

Books example :

 If you have 100 books and want to find the ones about “dinosaurs”, you would look the word up in an
index instead of reading every book
 The index might list
 “Dinosaurs” – Book 1 : Page 3, Page 45
 “Dinosaurs” – Book 5 : Page 12, Page 89

Digital example :

 In a search engine, when you type “dinosaurs”, the system uses its index to find all the web
pages that mention “dinosaurs” and shows you a list of results in seconds
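
The book example above can be sketched as a small inverted index. This is only an illustration: the `books` data and the `build_index` helper are made up for this sketch, and real systems also normalize punctuation and word forms.

```python
from collections import defaultdict

# Hypothetical mini-collection: book number -> {page number: text on that page}.
books = {
    1: {3: "Dinosaurs ruled the Earth", 45: "Fossils of dinosaurs"},
    5: {12: "Birds descend from dinosaurs", 89: "Dinosaurs and climate"},
    7: {10: "Gardening for beginners"},
}

def build_index(collection):
    """Map each lowercase word to {book: sorted list of pages containing it}."""
    index = defaultdict(lambda: defaultdict(set))
    for book, pages in collection.items():
        for page, text in pages.items():
            for word in text.lower().split():
                index[word][book].add(page)
    # Convert the nested sets into plain dicts with sorted page lists.
    return {w: {b: sorted(p) for b, p in entry.items()} for w, entry in index.items()}

index = build_index(books)
print(index["dinosaurs"])  # {1: [3, 45], 5: [12, 89]}
```

The index is built once, so each later search is a single dictionary lookup instead of a scan through every page.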
History and objectives of indexing :

History :

 Indexing is also called cataloguing (its original name)

 It is one of the oldest methods used to help people find information
 The goal of indexing is to create access points, such as keywords (or) topics, that help users
find what they are looking for in a collection of items such as books (or) documents

Historical background :

Old methods :

 For centuries, indexing was done manually

 Librarians (or) professional indexers would create cards with details about each book, such as
title, author and subject
 These cards were then organized in card catalogs to help users find books on specific
topics

Dewey decimal system :

 In the late 1800s, more structured ways to organize books were introduced, such as the Dewey
Decimal system
 This system organizes books into categories and subcategories, making them easier to find

MARC system :

 In the 1960s, computers began to assist with cataloguing through the MARC (Machine-Readable
Cataloguing) system, which standardized how information about books was stored
electronically
 This made it easier to share catalogs between libraries

DIALOG system :

 In 1965, a system called DIALOG was developed for NASA; it later became a commercial
indexing system that allowed users to search technical publications
Changes with technology :

Manual to digital :

 Originally, indexing was done manually with indexers choosing keywords to represent the
content of books
 With the advent of computers, this process became digital, allowing librarians to share and
manage indexes more efficiently

Full-text search :

 By the 1990s, as computer costs fell and full-text documents (like digital
books and articles) became available, the role of human indexers changed
 Instead of relying only on manually chosen keywords, users could search the full text of documents
directly
 This means users can now search for any word (or) phrase within the entire content, not just in
the index, so relevant items are found even when the indexer did not choose a matching keyword

Ex :

 Consider a library where books are stored on shelves

 In the past, if you wanted to find a book on “dinosaurs”, you would go to the card catalog, look
under “D” for “dinosaurs”, and the card would tell you which shelf the book was on
 A librarian would have written the subject “dinosaurs” on the card
 With modern technology, instead of looking at a card with the word “dinosaurs”, you
can type “dinosaurs” into a computer and it will search the full text of every book to
find any that mention dinosaurs, even if “dinosaurs” was not listed as a subject by the librarian

Objectives :

 The objectives of indexing have evolved with advancements in information retrieval systems (IRS)
 Traditionally, manual indexing involved selecting a few key terms (index terms) from a
controlled vocabulary that represented the main topics of a document
 This helped standardize searches and make finding information easier
 However, with a modern IRS, entire documents are searchable, meaning every word in a
document can potentially be an index term
 This is known as “total document indexing”
 In this system every word in a document is considered when searching for information, making
it easier to find specific content within documents
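
The difference between controlled-vocabulary indexing and total document indexing can be sketched as follows. The vocabulary, the sample sentence and both function names are assumptions made for this illustration.

```python
# Hypothetical controlled vocabulary and document; both are illustrative only.
CONTROLLED_VOCABULARY = {"dinosaurs", "fossils", "geology"}

document = "Fossils show that some dinosaurs had feathers"

def controlled_index_terms(text, vocabulary):
    """Traditional approach: keep only words found in the controlled vocabulary."""
    return {w for w in text.lower().split() if w in vocabulary}

def total_document_index_terms(text):
    """Total document indexing: every word is a potential index term."""
    return set(text.lower().split())

manual_terms = controlled_index_terms(document, CONTROLLED_VOCABULARY)
full_terms = total_document_index_terms(document)

print(sorted(manual_terms))        # ['dinosaurs', 'fossils']
print("feathers" in manual_terms)  # False
print("feathers" in full_terms)    # True
```

A search for “feathers” fails against the controlled terms but succeeds against the full text, which is the practical effect of total document indexing.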
Indexing process :

 There are different types of index files in an information retrieval system

 The different index files are shown in the figure below

Document file :

 It includes all the words and content within a document

 Every word in the document is part of the total document file
 This is the broadest and most detailed level of indexing

Public index file :

 It contains important keywords (or) concepts that are relevant to the general public
 These keywords are selected to help everyone find the most important topics in the document
 It is more focused than the document file because it only includes important terms

Private index file :

 It is even more focused and specific

 It includes only those concepts and keywords that are important to a particular user (or) small
group of users
 It’s like a personalized subset of the public index file that reflects individual needs (or) interests

How they relate :

 Document file contains everything, so it overlaps with both the public and private index files
 Public index file overlaps the document file because it includes some of the key topics but it is
smaller and more focused
 Private index file is the most specific and only overlaps with the document file in areas that
match your personal interests
Ex :

Document file (Entire library collection) :

 Suppose the library has every book on cooking, gardening, science and more
 This collection is like the document file, containing all possible information

Public index file (General guide for all users ) :

 The library creates a general guide for visitors, highlighting popular sections like “cooking books”,
“gardening books” and “science books”
 This guide is like the public index file, helping most people find the major topics they are
interested in

Private index file (Personalized guide for specific needs) :

 Now, let’s say you love Italian cooking

 You create your own list that focuses only on “Italian cooking books”
 Only you can access this private index; other users cannot see (or) use it
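
A minimal sketch of the three files, assuming a tiny made-up library. The titles, the user names `alice` and `bob`, and the `lookup` helper are all hypothetical.

```python
# Hypothetical library: every title is in the document file.
document_file = {
    "Italian Pasta at Home": {"cooking", "italian", "pasta"},
    "Rose Care": {"gardening", "roses"},
    "Intro to Physics": {"science", "physics"},
}

# Public index: broad terms chosen for all users (assumed selection).
PUBLIC_TERMS = {"cooking", "gardening", "science"}
public_index = {
    term: sorted(t for t, words in document_file.items() if term in words)
    for term in PUBLIC_TERMS
}

# Private index: terms one user cares about, visible only to that user.
private_indexes = {"alice": {"italian": ["Italian Pasta at Home"]}}

def lookup(term, user=None):
    """Check the user's private index first, then the public index."""
    private = private_indexes.get(user, {})
    if term in private:
        return private[term]
    return public_index.get(term, [])

print(lookup("cooking"))           # ['Italian Pasta at Home']
print(lookup("italian", "alice"))  # ['Italian Pasta at Home']
print(lookup("italian", "bob"))    # []
```

The term “italian” is in the document file for every user, but only the owner of the private index can search on it.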

Scope of indexing :

 When creating an index for documents (or) articles, the process involves deciding which key
terms (words or concepts) best represent the content of the text
 This can be tricky, especially when done manually, because the person writing the
document (author) and the person creating the index (indexer) might use different words to
describe the same ideas
 The indexer may also not fully understand some of the specialized topics in the document,
leading to differences in how well the document is indexed

Key concepts in manual indexing :

Exhaustivity :

 How many different topics (or) ideas from a document are included in the index

Ex :

 Consider a book about animals

 If you index only “Mammals” and “Birds”, you have low exhaustivity
 But if you include “Mammals”, “Birds”, “Fish” and “Insects”, then you have high exhaustivity
because you are covering more topics from the book
Specificity :

 How detailed (or) precise the index terms are

Ex :

 In the same book about animals, using “Animals” as an index term is low specificity
 But if you use more specific terms like “Elephants”, “Sparrows” and “Lizards”, then you have
high specificity because the terms are more detailed

Note :

 Exhaustivity – How many topics you cover

 Specificity – How detailed the terms you use are
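
One rough way to put numbers on the two ideas is sketched below. These measures are assumptions for illustration, not standard definitions: exhaustivity is taken as the share of a document's topics that are indexed, and specificity as the average depth of the chosen terms in a made-up hierarchy.

```python
# Topics actually discussed in a hypothetical book about animals.
book_topics = {"mammals", "birds", "fish", "insects"}

# Assumed depth of each term in a made-up hierarchy (higher = more specific).
TERM_DEPTH = {"animals": 0, "mammals": 1, "birds": 1, "elephants": 2, "sparrows": 2}

def exhaustivity(index_terms, topics):
    """Fraction of the document's topics that the index covers."""
    return len(set(index_terms) & topics) / len(topics)

def specificity(index_terms):
    """Average hierarchy depth of the chosen index terms."""
    return sum(TERM_DEPTH[t] for t in index_terms) / len(index_terms)

print(exhaustivity(["mammals", "birds"], book_topics))                     # 0.5
print(exhaustivity(["mammals", "birds", "fish", "insects"], book_topics))  # 1.0
print(specificity(["animals"]))                                            # 0.0
print(specificity(["elephants", "sparrows"]))                              # 2.0
```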

Weighting index terms :

 Assigning a weight to each index term based on how important it is in the document

Ex :

 If a document heavily discusses “Pentium processors”, the indexer might give the term
“Pentium” a higher weight
 This helps search systems prioritize documents that are more relevant to that concept

Note :

 Suppose you have an article about “healthy eating”

 The article mentions “Vegetables” many times and “Fruit” a few times
 If you are indexing this article
 “Vegetables” might get a higher weight (for example, a score of 10) because it is
discussed in depth and is a key part of the article
 “Fruit” might get a lower weight (for example, a score of 5) because it is mentioned but
not as thoroughly discussed
 Weighting index terms helps a search system understand which topics (or)
concepts in a document are more important than others, which makes search results more
accurate and relevant
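
A simple frequency-based weighting can be sketched like this. It assumes that a term's weight is its share of all index-term occurrences; a human indexer may instead judge importance directly, and the sample article is invented.

```python
from collections import Counter

def term_weights(text, index_terms):
    """Weight each index term by its share of all index-term occurrences."""
    counts = Counter(w for w in text.lower().split() if w in index_terms)
    total = sum(counts.values())
    return {t: counts[t] / total for t in index_terms} if total else {}

# Hypothetical article text, reduced to a few words for the example.
article = "vegetables vegetables vegetables fruit eat vegetables daily"
weights = term_weights(article, {"vegetables", "fruit"})
print(weights["vegetables"])  # 0.8
print(weights["fruit"])       # 0.2
```

A search system could then rank this article higher for a query on “vegetables” than for one on “fruit”.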
Pre-coordination and linkages :

 In indexing, a decision needs to be made about whether (or) not to create linkages between
related terms that describe the same concept
 This is important to ensure that terms connected to the same idea are grouped together

Pre-coordination : (Linking at index creation)

 When you create the index, you decide how related terms are linked together
(Or)
 This is when related index terms are linked together at the time of indexing (before searching)

Ex :

 Suppose you create an index entry that links an animal and its habitat (natural
environment) together
 Suppose your index entry is “Elephant – Africa”
 Here “Elephant” and “Africa” are linked together in the index at the time of indexing
 If someone searches for “Elephant Africa”, the system finds entries where both terms are
linked and returns books specifically about elephants in Africa

Post-coordination : (Linkage at search time)

 The system only links terms together when you search, usually by combining terms with “AND”
(Or)
 This happens when terms are combined at the search stage, not during indexing

Ex :

 If you search for “climate change” AND “carbon emissions”, the system looks for documents that
contain both terms
 The terms are not linked at the indexing stage, only when the user runs the search
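
The two approaches can be contrasted in a short sketch. The index contents, the document identifiers and the two search functions are hypothetical.

```python
# Pre-coordination: linked terms are stored together as one index entry.
precoordinated_index = {
    ("elephant", "africa"): ["Elephants of the African Savanna"],
    ("elephant", "asia"): ["Asian Elephants"],
}

# Post-coordination: each term has its own posting set; terms are
# combined only when a query arrives.
postcoordinated_index = {
    "climate change": {"doc1", "doc2", "doc3"},
    "carbon emissions": {"doc2", "doc3", "doc4"},
}

def search_pre(*terms):
    """Look up the exact linked entry created at indexing time."""
    return precoordinated_index.get(tuple(terms), [])

def search_post_and(*terms):
    """Intersect posting sets at search time (Boolean AND)."""
    postings = [postcoordinated_index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()

print(search_pre("elephant", "africa"))  # ['Elephants of the African Savanna']
print(sorted(search_post_and("climate change", "carbon emissions")))  # ['doc2', 'doc3']
```

Pre-coordination answers only the combinations the indexer anticipated, while post-coordination can combine any terms at the cost of doing the intersection during the search.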

Factors in linkage process :

Number of terms linked :

 How many terms you can link together

Ex :

 You might link “CITGO”, “Oil drilling” and “Mexico” all together to specify where the drilling is
happening
Ordering constraints :

 Whether the order of the terms matters

Ex :

 If you have “CITGO” first and “oil drilling” second, it might suggest a different meaning than if
they were reversed

Additional descriptors :

 Extra information about the terms

Ex :

 You might add descriptors like “source”, “problem” (or) “affected” to clarify the role of each
term in the context

Note :

 These are extra pieces of information added to the index terms to provide more context (or)
detail
 Suppose the index term is “Gardening – Beginners” and the additional descriptor is “How-To Guide”
 Suppose a user searches for “Gardening – Beginners – How-To Guide”
 The system will find and display an item such as “Gardening Basics” because its entry matches the
combined term and additional descriptor

Positional roles :

 Each term’s position in the list indicates its role

Ex :

 For “CITGO introduces oil refineries in Peru, Bolivia and Argentina” you would list
 Entry 1 : “CITGO introduces oil refineries in Peru”
 Entry 2 : “CITGO introduces oil refineries in Bolivia”
 Entry 3 : “CITGO introduces oil refineries in Argentina”
 Within each entry, every term has a fixed role based on its position

Modifiers :

 Extra words added to terms to describe their role

Ex :

 Instead of separate entries, you could have one entry like “CITGO introduces oil refineries in
Peru, Bolivia, Argentina” with modifiers indicating the affected countries
Note :

 Suppose the index term is “Smartphone – Latest Model” and the modifier is
“Advanced Features”
 Suppose a user searches for “Smartphone – Latest Model” with the modifier “Advanced
Features”
 The system finds and displays products that match both the main term and the modifier
 For example, “Smartphone – Latest Model with Advanced Features”
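
The CITGO example under positional roles and modifiers can be sketched as two data layouts. The class, the function names and the role label “activity” are assumptions for this illustration; “source” and “affected” are the descriptors mentioned earlier.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PositionalEntry:
    """Role is implied by position: (actor, action, location)."""
    actor: str
    action: str
    location: str

def positional_entries(actor, action, locations):
    """Positional roles: one fixed-position entry per location."""
    return [PositionalEntry(actor, action, loc) for loc in locations]

def modifier_entry(actor, action, locations):
    """Modifiers: a single entry whose terms carry explicit role labels."""
    terms = [(actor, "source"), (action, "activity")]
    terms += [(loc, "affected") for loc in locations]
    return terms

countries = ["Peru", "Bolivia", "Argentina"]
entries = positional_entries("CITGO", "introduces oil refineries", countries)
single = modifier_entry("CITGO", "introduces oil refineries", countries)
print(len(entries))  # 3
print(single[2])     # ('Peru', 'affected')
```

Positional roles need three entries for three countries, while modifiers keep everything in one entry and state each term's role explicitly.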

Automatic indexing :

 Automatic indexing is when a computer system selects the keywords (or) index terms that best
represent the main ideas of a document, without human intervention
 Automatic indexing helps to organize and retrieve documents quickly by choosing keywords
automatically
 It can do this by treating all words equally (unweighted) (or) by giving more importance to
certain words (weighted)
 This makes searching for documents more efficient, especially in large databases

Ex :

 Suppose you have a document about “The benefits of exercise for mental health”
 An automatic indexing system would scan the document and might choose keywords like
 “Exercise”
 “Mental Health”
 “Benefits”
 These keywords help you find the document later when you search for related topics

Weighted indexing :

 The system gives more importance (weight) to keywords that appear more frequently (or) are
more central to the document’s main ideas

Ex :

 In a document where “Mental Health” is mentioned many times and “Exercise” only a few times,
“Mental Health” would be given a higher weight, meaning it is considered more important in
representing the document’s content
Unweighted indexing :

 All keywords are treated equally, without trying to figure out which ones are more important

Ex :

 If a document mentions “Exercise” once and “Mental Health” ten times, both keywords would
still be treated the same

Note :

 Automated indexing – The system scans the document and automatically identifies key terms to
include in the index
 Weighted indexing – The system assigns a weight to each term based on its frequency (or)
importance in the document
 Unweighted indexing – The system includes all extracted terms in the index without considering
their importance (or) frequency

Ex :

 The automated indexing system scans the document and extracts terms like “healthy”, “eating”,
“nutrition” and “diet”
 Weighted indexing – The term “eating” appears frequently in the document, while “diet” is
mentioned less often. The weighted index might look like Eating (weight 0.8), Healthy (weight 0.5),
Nutrition (weight 0.4), Diet (weight 0.3)
 The weights help indicate the relative importance of each term, making searches more precise
 Unweighted indexing – In the index, you might see Healthy, Eating, Nutrition, Diet
 All terms are treated equally, so there is no differentiation between how frequently (or)
significantly each term appears in the document
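
A minimal sketch of automatic indexing in both modes follows. It assumes a tiny stop-word list and a weight defined as a term's count divided by the count of the most frequent term, so the weights differ from the illustrative 0.8 and 0.5 values above; the sample text is invented.

```python
import re
from collections import Counter

# Small assumed stop-word list; real systems use much longer ones.
STOP_WORDS = {"a", "an", "and", "for", "is", "of", "the", "to", "your"}

def extract_terms(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def unweighted_index(text):
    """Every extracted term is listed once, all treated equally."""
    return sorted(set(extract_terms(text)))

def weighted_index(text):
    """Weight = term count divided by the count of the most frequent term."""
    counts = Counter(extract_terms(text))
    top = max(counts.values())
    return {term: round(n / top, 2) for term, n in counts.most_common()}

doc = "Healthy eating is the key. Eating well and eating a balanced diet aids nutrition."
print(unweighted_index(doc))
print(weighted_index(doc)["eating"])  # 1.0
print(weighted_index(doc)["diet"])    # 0.33
```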

Advantages of automatic indexing :

Cost and speed :

 After the initial setup, automatic indexing is cheaper and faster than using humans to do the
work

Ex :

 It might take a human several minutes to read and choose keywords for a document but a
computer can do it in seconds
Consistency :

 The computer follows the same rules every time, so the keywords it picks are consistent

Ex :

 If two different human indexers read the same document, they might choose slightly different
keywords
 But an automatic system will pick the same keywords every time

Information extraction :

 Information extraction is the process of automatically identifying and pulling out important
details (or) facts from a large amount of text and organizing them in a structured format like a
table (or) database
 Information extraction uses two key processes: fact extraction and summarization
 Both are used to extract important information from a text,
but they do it in different ways

Fact extraction :

 Fact extraction is like picking specific details (or) facts from a document to update a database
 The goal is to find important information based on set criteria, without trying to understand
the entire document
 The system looks for specific types of information like names, dates, places and so on

Ex :

 Imagine a news article about a company’s CEO changing

 The system’s job is to fill certain slots in a database
 Company name: XYZ Corporation
 Old CEO: John
 New CEO: David
 Date: September 8th 2024
 The system only looks for these specific facts and fills them into the correct slots in the database
 It does not care about the rest of the article that might talk about the CEO’s background (or) the
company’s future plans
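
Slot filling of this kind can be sketched with regular expressions. The patterns and the article text below are assumptions written for this example; they only work for sentences phrased this way, and practical systems need far more robust rules.

```python
import re

# Assumed sentence patterns; each one captures the value for a single slot.
PATTERNS = {
    "company": r"^(?P<value>[A-Z][\w ]+?) announced",
    "old_ceo": r"(?P<value>[A-Z]\w+) will step down",
    "new_ceo": r"(?P<value>[A-Z]\w+) will take over",
    "date": r"on (?P<value>[A-Z]\w+ \d{1,2}(?:st|nd|rd|th)? \d{4})",
}

def extract_facts(text):
    """Fill each slot with the first match, or None if nothing matches."""
    facts = {}
    for slot, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        facts[slot] = match.group("value") if match else None
    return facts

# Hypothetical news text built from the example slots.
article = (
    "XYZ Corporation announced that John will step down as CEO. "
    "David will take over on September 8th 2024. "
    "The company also discussed its future plans."
)
facts = extract_facts(article)
print(facts)
```

The sentence about future plans matches no pattern, so it is ignored, just as the notes describe.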

Summarization :

 Summarization involves taking a long document and creating a shorter version (summary) that
captures the main ideas
 This is more complex than fact extraction because it needs to consider all the major concepts, not just specific facts
Ex :

 Imagine you have a long research paper about climate change

 Summarization would extract the key points such as
 Climate change is causing rising sea levels
 Global temperatures are increasing
 Urgent action is needed to reduce carbon emissions
 This summary gives you the essential ideas of the research paper without having to read the
entire document
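
One simple form of summarization is extractive: keep the sentences that score highest and drop the rest. The sketch below assumes a word-frequency score and a tiny stop-word list. It selects existing sentences rather than rewriting them, so it is a simplification of the key-point summary described above.

```python
import re
from collections import Counter

# Small assumed stop-word list for the example.
STOP_WORDS = {"a", "and", "are", "is", "of", "the", "to", "in", "this", "was"}

def content_words(text):
    """Lowercase words with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def summarize(text, max_sentences=2):
    """Keep the sentences whose words are most frequent across the text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # Restore the original order so the summary reads naturally.
    return [s for s in sentences if s in top]

paper = (
    "Climate change is raising sea levels. "
    "Global temperatures are increasing because of carbon emissions. "
    "The paper was written in spring. "
    "Urgent action is needed to reduce carbon emissions."
)
summary = summarize(paper)
print(summary)
```

Here the two sentences that mention carbon emissions score highest, because those words recur across the text.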

Note :

 In this context, “pulling out” means extracting (or) selecting specific pieces of information from a
larger text
 Summarization is like highlighting: it marks the main ideas of a document
