Dependency Parser

DEPENDENCY PARSER FOR MALAYALAM

A dependency parser is important for applications such as sentence structure identification, machine translation, clause boundary identification, and question answering systems. No work has been done so far on dependency parsing for Malayalam using machine learning approaches. The main concept of dependency parsing is that the linguistic units (words) of a sentence are connected to each other by direct links. Dependency grammar can handle languages that are morphologically rich and have relatively free word order, that is, languages in which some words, such as adverbs, can be moved freely inside a sentence without affecting its correctness or meaning. A majority of human languages, Indian languages among them, fall into this category. The vast majority of parsers for morphologically rich, free word order (MoR-FWO) languages (for example, Czech and Turkish) have adopted the dependency grammar framework. Malayalam is considered a less free word order language, which means that it is not a completely free word order language, since not all words can be moved freely.

The implementation of dependency parsing involves a sequence of steps. The input sentences are first tokenized. The tokens are sent to a stemmer, which splits them into simple words, and then to the POS tagger to obtain POS-tagged data. The POS-tagged sentences are passed to the chunker to obtain chunked data, so that each word now carries both a POS tag and a chunk tag. This data is converted to the parser's input format and passed to the parser. The parsed output is then converted to a digraph format, which serves as the input to the tree viewer. The tree viewer displays the parsed output as a tree. The main processes involved in building the proposed system are tokenization, POS tagging, phrase tagging, format conversion, parsing, and tree generation. Our model uses a transition-based approach to parsing and focuses on projective dependency parsing. The parser uses the POS tag and the chunk tag as its features when parsing new sentences. The approach uses a greedy algorithm in which a single action is chosen at every point.
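The greedy, transition-based parsing step and the digraph conversion can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes an arc-standard transition system (which yields projective trees), replaces the trained classifier with a hand-written toy policy in the hypothetical function `choose_action`, assumes Graphviz DOT as the digraph format, and uses English glosses with made-up tag names (`N`, `V`, `NP`, `VP`) in place of real Malayalam data.

```python
from dataclasses import dataclass

ROOT = 0  # index of the artificial root node


@dataclass
class Token:
    index: int   # 1-based position in the sentence
    form: str    # word form
    pos: str     # POS tag (tag names are illustrative assumptions)
    chunk: str   # chunk tag (illustrative assumption)


def extract_features(stack, buffer, tokens):
    """Collect (POS, chunk) tags of the top two stack items and the next buffer item."""
    def tags(i):
        if i is None:
            return ("NONE", "NONE")
        if i == ROOT:
            return ("ROOT", "ROOT")
        return (tokens[i - 1].pos, tokens[i - 1].chunk)

    s1 = stack[-1] if stack else None
    s2 = stack[-2] if len(stack) > 1 else None
    b1 = buffer[0] if buffer else None
    return {"s1": tags(s1), "s2": tags(s2), "b1": tags(b1)}


def choose_action(feats):
    """Toy policy standing in for a trained classifier (an assumption of this sketch)."""
    s1_pos, s2_pos, b1_pos = feats["s1"][0], feats["s2"][0], feats["b1"][0]
    if s1_pos == "V" and s2_pos == "N":
        return "LEFT-ARC"    # the noun depends on the verb to its right
    if b1_pos != "NONE":
        return "SHIFT"       # move the next word onto the stack
    return "RIGHT-ARC"       # buffer empty: attach the top item to the one below it


def parse(tokens):
    """Greedy arc-standard parsing: exactly one action is chosen at every step."""
    stack, buffer, heads = [ROOT], [t.index for t in tokens], {}
    while buffer or len(stack) > 1:
        action = choose_action(extract_features(stack, buffer, tokens))
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":
            dep = stack.pop(-2)       # second item becomes a dependent of the top
            heads[dep] = stack[-1]
        else:  # RIGHT-ARC
            dep = stack.pop()         # top item becomes a dependent of the one below
            heads[dep] = stack[-1]
    return heads                      # maps dependent index -> head index


def to_dot(tokens, heads):
    """Render head -> dependent arcs as a DOT digraph for a tree viewer."""
    names = {ROOT: "ROOT", **{t.index: f"{t.index}:{t.form}" for t in tokens}}
    lines = ["digraph G {"]
    for dep in sorted(heads):
        lines.append(f'  "{names[heads[dep]]}" -> "{names[dep]}";')
    lines.append("}")
    return "\n".join(lines)


# Toy verb-final sentence written as English glosses.
sentence = [
    Token(1, "child", "N", "NP"),
    Token(2, "book", "N", "NP"),
    Token(3, "read", "V", "VP"),
]
heads = parse(sentence)
print(heads)
print(to_dot(sentence, heads))
```

On the toy sentence, both nouns are attached to the verb and the verb is attached to ROOT. In a real system, `choose_action` would be a classifier trained on annotated sentences, and the chunk tags collected by `extract_features` would feed that classifier alongside the POS tags.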


Related Projects

Dhwanimam – POS editor

Dhwanimam, a Malayalam POS tag checker, is a web-based application developed by the team at ICFOSS that lets users edit the Parts-of-Speech (POS) tags assigned to Malayalam words.

Lekha OCR

Lekha OCR is a tool that converts Malayalam document images to editable Malayalam text. Lekha OCR version 2.0 is designed to be easy, fast, and simple to use. It has add-on features such as scanning, cropping, rotating, and skew correction of images.

Malayalam ChatBot

ICFOSS developed a Malayalam chatbot that interacts with system users in Malayalam, like a human conversational partner. A chatbot can be used as a conversational agent that provides information to the public.

Malayalam Filthy Comment Detector from Facebook

Social media are interactive computer-mediated technologies that facilitate the creation and sharing of information, ideas, career interests and other forms of expression via virtual communities and networks.

Malayalam Transliteration Tool

Machine transliteration is the practice of converting a character or word written in one language into the writing system of another. It can play an important role in natural language applications such as information retrieval, machine translation, data mining, and cross-language applications, especially for handling proper nouns and technical terms. This tool converts a Malayalam named entity to the corresponding English word.