
Research Report On Bangla Tagset

This report describes the design of a POS tagset for Bangla, based on the Penn Treebank design. The resulting tagset contains 53 morpho-syntactic tags.

Article · September 2010


Source: OAI


Research Report on Bangla Tagset

Altaf Mahmud and Mumit Khan


Center for Research on Bangla Language Processing, BRAC University, Dhaka, Bangladesh.
[email protected], [email protected]

Abstract

This report describes the design of a POS tagset for Bangla, based on the Penn Treebank design. The resulting tagset contains 53 morpho-syntactic tags.

1. Introduction

This report describes the design of a tagset for Bangla, based on the Penn Treebank design. The design is heavily influenced by the work on the Penn Treebank tagset, and follows the same methodology [1, 2, 3].

2. Bangla Tagset

Table 1: Bangla Tagset

#   Level 1                     Level 2                     Tag   Examples

1   Noun                        Proper                      NNP   [illegible]
2                               Common                      NNC   [illegible]
3                               Verbal                      NNV   [illegible]
4                               Temporal                    NNT   [illegible]
5   Pronoun                     First Person                PR1   [illegible]
6                               Second Person               PR2   [illegible]
7                               Third Person                PR3   [illegible]
8                               Non Person                  PRN   [illegible]
9                               Creditable                  PRC   [illegible]
10                              Insignificant               PRD   [illegible]
11                              Possessive                  PR$   [illegible]
12                              TO Pronoun                  PRTO  [illegible]
13  Adjective                   Simple                      AJ    [illegible]
14  Verb                        First Person                VB1   [illegible]
15                              Second Person               VB2   [illegible]
16                              Third Person                VB3   [illegible]
17                              Non Person                  VBN   [illegible]
18                              Creditable                  VBC   [illegible]
19                              Insignificant               VBD   [illegible]
20                              Infinite                    VBIF  [illegible]
21  Adverb                      Adverb                      AV    [illegible]
22  Conjunction                 Co-ordinating               CC    [illegible]
23                              Subordinating               CS    [illegible]
24  Inflectors                  AT                          ICAT  [illegible]
25                              BY                          ICBY  [illegible]/NNC+ICBY
26                              Plural                      ICS   [illegible]
27                              TO                          ICTO  [illegible]
28                              Possessive                  IC$   [illegible]
29                              Determinative               ICDT  [illegible]
30                              Adverbial                   ICAV  [illegible]
31                              Definitive                  ICDF  [illegible]
32  Postposition                Common                      PP    [illegible]
33                              Possessive                  PP$   [illegible]
34  Interjection                Interjection                UH    [illegible]
35  Indeclinables               Simple                      ID    [illegible]
36                              Infinite                    IDIF  [illegible]
37  Particle                    Particle                    PT    [illegible]
38  Onomatopes                  Onomatopes                  ON    [illegible]
39  Cardinal                    Cardinal                    CD    [illegible]
40  Determiner                  Singular                    DT    [illegible]
41                              Plural                      DTS   [illegible]
42                              Predeterminer               DTP   [illegible]/DTP [illegible]/DTI
43  Symbol                      Symbol                      SYM   [illegible]
44  Taka                        Taka                        /=    [illegible]
45  Sentence Final Punctuation  Sentence Final Punctuation  |     |, ?, !
46  Comma                       Comma                       ,     ,
47  Colon, Semi-colon           Colon, Semi-colon           :     :, ;
48  Bracket                     Left Bracket                (     (, [
49                              Right Bracket               )     ), ]
50  Quotation                   Opening Single Quote        '     `
51                              Closing Single Quote        '     '
52                              Opening Double Quote        "     "
53                              Closing Double Quote        "     "
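
As a rough illustration of the two-level design, the tags of Table 1 can be collected into a mapping from Level 1 category to tags. This is only a sketch and not part of the original report: the names `TAGSET`, `ALL_TAGS` and `level1_of` are hypothetical, and the four quotation tags are spelled Penn Treebank style (an assumption, since the opening and closing glyphs cannot be told apart in this copy).

```python
# Tags grouped by Level 1 category, transcribed from Table 1.
TAGSET = {
    "Noun": ["NNP", "NNC", "NNV", "NNT"],
    "Pronoun": ["PR1", "PR2", "PR3", "PRN", "PRC", "PRD", "PR$", "PRTO"],
    "Adjective": ["AJ"],
    "Verb": ["VB1", "VB2", "VB3", "VBN", "VBC", "VBD", "VBIF"],
    "Adverb": ["AV"],
    "Conjunction": ["CC", "CS"],
    "Inflectors": ["ICAT", "ICBY", "ICS", "ICTO", "IC$", "ICDT", "ICAV", "ICDF"],
    "Postposition": ["PP", "PP$"],
    "Interjection": ["UH"],
    "Indeclinables": ["ID", "IDIF"],
    "Particle": ["PT"],
    "Onomatopes": ["ON"],
    "Cardinal": ["CD"],
    "Determiner": ["DT", "DTS", "DTP"],
    "Symbol": ["SYM"],
    "Taka": ["/="],
    "Sentence Final Punctuation": ["|"],
    "Comma": [","],
    "Colon, Semi-colon": [":"],
    "Bracket": ["(", ")"],
    # Assumption: Penn-style spellings stand in for the four quotation tags.
    "Quotation": ["`", "'", "``", "''"],
}

# Flat list of every tag, in table order.
ALL_TAGS = [tag for tags in TAGSET.values() for tag in tags]


def level1_of(tag):
    """Return the Level 1 category of a tag, or None if the tag is not in Table 1."""
    for category, tags in TAGSET.items():
        if tag in tags:
            return category
    return None


print(len(ALL_TAGS))        # 53
print(level1_of("ICAT"))    # Inflectors
```

A lookup such as `level1_of("DTI")` returns `None`, because DTI is used in the examples and the sample text but has no row of its own in Table 1.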

3. Results

A sample text tagged with the tagset is shown below.

[Tagged sample text: the Bangla words are illegible in this copy. Tokens are written as word/TAG, with inflector tags joined by "+" (for example .../NNC+ICAT, .../NNP+IC$, .../NNC+ICDT+ICTO) and punctuation tagged as itself (|/|, ,/,).]
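
The tagged sample writes each token as word/TAG and appends further tags with "+", as in /NNC+ICAT or /NNC+ICDT+ICTO. A minimal sketch of reading that notation is given below. It assumes whitespace-separated tokens; the function names `parse_token` and `parse_line` and the placeholder words `w1`, `w2` are hypothetical and stand in for the Bangla words, which are not legible here.

```python
def parse_token(token):
    """Split one 'word/TAG+TAG...' token into (word, [tags])."""
    # The Taka tag is itself '/=', so 'word//=' needs special handling.
    if token.endswith("//="):
        return token[:-3], ["/="]
    if "/" not in token:
        raise ValueError(f"token has no tag separator: {token!r}")
    # Split on the last '/', so punctuation tokens such as '|/|' also work.
    word, _, tag_part = token.rpartition("/")
    return word, tag_part.split("+")


def parse_line(line):
    """Parse a whitespace-separated line of tagged tokens."""
    return [parse_token(token) for token in line.split()]


# Placeholder words; only the tag patterns are taken from the sample.
print(parse_line("w1/NNP w2/NNC+ICDT+ICTO |/|"))
# [('w1', ['NNP']), ('w2', ['NNC', 'ICDT', 'ICTO']), ('|', ['|'])]
```

Splitting on the last slash is a design choice of this sketch. It does not handle the chained forms that also occur in the sample, where a second word and tag follow an inflector tag within one token (for example .../NNT+IC$+.../PP$+ICDF).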

4. Conclusion
This report presents a Bangla part-of-speech (POS) tagset that is based on the Penn Treebank tagset design. The tagset contains 53 two-level tags. A sample text tagged with this tagset is shown.

5. References

[1] B. Santorini, Part-of-speech tagging guidelines for the Penn Treebank Project. Technical report MS-CIS-90-47, Department of Computer and Information Science, University of Pennsylvania, 1990.

[2] M.P. Marcus, M.A. Marcinkiewicz, and B. Santorini, “Building a large annotated corpus of English: the Penn Treebank”, Comput. Linguist. 19, 2, June, 1993, pp. 313-330.

[3] M. Marcus, G. Kim, M.A. Marcinkiewicz, R. MacIntyre, A. Bies, M. Ferguson, K. Katz, and B. Schasberger, “The Penn Treebank: annotating predicate argument structure”, In Proceedings of the Workshop on Human Language Technology, Human Language Technology Conference, Association for Computational Linguistics, Morristown, Plainsboro, NJ, March 08 - 11, 1994, pp. 114-119.
