Research Report On Bangla Tagset
net/publication/47552512
All content following this page was uploaded by Mumit Khan on 17 December 2014.
Abstract
This report describes the design of a POS tagset
for Bangla, based on the Penn Treebank design. The
resulting tagset contains 53 morpho-syntactic tags.
1. Introduction
This report describes the design of a tagset for
Bangla, based on the Penn Treebank design. The
design is heavily influenced by the work on the Penn
Treebank tagset, and follows the same methodology
[1, 2, 3].
2. Bangla Tagset
#   Level 1                     Level 2                     Tag   Examples
2                               Common                      NNC
3                               Verbal                      NNV
22  Conjunction                 Co-ordinating               CC
23                              Subordinating               CS
24  Inflectors                  AT                          ICAT
25                              BY                          ICBY
29                              Determinative               ICDT
30                              Adverbial                   ICAV
31                              Definitive                  ICDF
34  Interjection                Interjection                UH
35  Indeclinables               Simple                      ID
38  Onomatopes                  Onomatopes                  ON
41                              Plural                      DTS
45  Sentence Final Punctuation  Sentence Final Punctuation  |     |, ?, !
46  Comma                       Comma                       ,     ,
47  Colon, Semi-colon           Colon, Semi-colon           :     :, ;
49                              Right Bracket               )     ) ]
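The two-level organization of the tagset lends itself to a simple lookup structure. The following Python sketch is illustrative only: it covers just the tags legible in the table above, it assumes that a row with no Level 1 entry inherits the category of the row before it, and the names TAGSET and describe are hypothetical rather than part of the report.

```python
# Hypothetical lookup: tag -> (Level 1, Level 2).
# Only rows legible in the table are included; inherited Level 1
# values (e.g. for CS, ICBY) are an assumption, not stated in the source.
TAGSET = {
    "CC": ("Conjunction", "Co-ordinating"),
    "CS": ("Conjunction", "Subordinating"),
    "ICAT": ("Inflectors", "AT"),
    "ICBY": ("Inflectors", "BY"),
    "ICDT": ("Inflectors", "Determinative"),
    "ICAV": ("Inflectors", "Adverbial"),
    "ICDF": ("Inflectors", "Definitive"),
    "UH": ("Interjection", "Interjection"),
    "ID": ("Indeclinables", "Simple"),
    "ON": ("Onomatopes", "Onomatopes"),
}


def describe(tag):
    """Return 'Level 1 / Level 2' for a known tag, or None otherwise."""
    levels = TAGSET.get(tag)
    if levels is None:
        return None
    return f"{levels[0]} / {levels[1]}"
```

A lookup such as describe("ICAT") returns the two category names for that tag, while a tag outside this partial table returns None.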
A sample text tagged with the tagset is shown
below.
/NNT+IC$+i/PP$+ICDF!#/NNC+ICAT
/VBIF $g'
/NNC+IC$ 7 /NNC я/VBC |/|
n /NNP 012 io /NNP+ICAV я
/NNC+ICATя8/VBC,/, !#/NNC+ICAT
/AJ яl- l
/NNC+IC$a/NNC /VBIF ps/AJ o/PRC+ICAV|/| я
/VBIF tt /AJ
/NNC o/CC /NNC /NNC+ICAT я /AJ /NNC+ICAT
/NNC s
/NNC+IC$/NNC+ICAT :;
/NNC+IC$ k=/NNC o/CC s /NNC
p/AJ i /CD !/NNC e#/NNP o/CC > /AV
/VBC|/| u0 /DTI3/NNC #@7 /AJ
$o%/NNP !%&
/NNP+IC$ '-
/AJ /NNC b5/NNC+ICAT
/NNC+ICDT
m /NNC #)/NNC+ICAT !#/NNC 'c/VB3 я/VBC|/|
$я !
/NNC+IC$ +i/PP$+ICDF|/| $o%/NNP !#/NNC+ICAT/VBIFi/CD!
/NNC+I
!%&
/NNP
/AJ m /NNC $b!/NNP C$ps/NNC1 Bn/AJ'o/VB3
я!!/NNP & !/NNT
/NNP #
/NNC+ICAT 3
/NNC+IC$0n/AJs
/NNC+IC$
/NNC+I
e#
/NNP+IC$ '/NNC o/CC s%/AJ C$+/PP$ss
/NNC+IC$
nt%/NNC $b!/NNP n/NNP 012 i 0/NNC3F/VBIF)c/VB3|/|
0n/AJ
яG /AJ!/NNC
/NNC+ICDT+ICTO i /AJ
!/VBIFs&/NNCя8/VB3|/|
a+/ID e/DTI as
/NNC+IC$ +i/PP$+ICDF
$я/NNT
/NNP 1/CD ak
/NNP e#/NNP
o/CC
/PR$
/NNC+ICS #!/NNC
8/VB3
‘/’ 304/ AJp/AJ ‘/ /NNC |/|
/NNC+ICDT+ICTO
%/AJ !/NNC
$o%/NNP !%&/ NNP#!/NNC
8/VB3 ‘/’
!/AJ /NNC ‘/ '/PP |/| !/AJ
яG /AJ
4/NNC
/NNC+ICAT i/CD !
/NNC+IC$
+/PP$ K/NNC 3L
/NNC+IC$ +/PP$
a /PRC $я
/NNT+IC$ ei/DTI
4 /NNC+ICTO #)#1=/ NNC 'i/PP+ICDF
3F8/VBC |/|
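In the sample above, each token has the form word/TAG, with inflector tags appended by a plus sign (for example /NNC+ICAT or /NNC+ICDT+ICTO), and punctuation is tagged as itself (|/|). A minimal Python sketch of how such a token could be split is given below. It assumes that the last slash in a token separates the word from its tags; the function name split_token is hypothetical, and the placeholder "word" in the usage note stands in for a Bangla word.

```python
def split_token(token):
    """Split 'word/TAG+IC1+IC2' into (word, base_tag, [inflector tags])."""
    # rpartition splits on the LAST slash, so '|/|' yields word '|', tag '|'.
    word, sep, tags = token.rpartition("/")
    if not sep:
        raise ValueError(f"token has no '/' separator: {token!r}")
    # The first tag is the base POS tag; any remaining ones are inflectors.
    base, *inflectors = tags.split("+")
    return word, base, inflectors
```

For instance, split_token("word/NNC+ICDT+ICTO") separates the base tag NNC from the inflector tags ICDT and ICTO, and split_token("|/|") returns the punctuation mark with an empty inflector list.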
4. Conclusion
This report presents a Bangla part-of-speech
(POS) tagset based on the Penn Treebank tagset
design. The tagset contains 53 two-level tags. A
sample text tagged with this tagset is also shown.
5. References