Language Technology
Social Media Analytics
Social media has dramatically changed how people see the world around them, and it increasingly shapes their opinions and emotions about the events happening around them. This project analyses user posts and comments on social media to determine the sentiment hidden in them, and from that to find the degree of negativity in each post or comment.
The architecture of the proposed system is divided into two parts according to the learning mechanism used: a CRF (conditional random field) learning module and an SVM (support vector machine) learning module. The two are arranged linearly, so that the output of the first module is the input to the second. After the data is pre-processed, a word-level learning stage uses CRF to perform semantic tagging. The output of the CRF model is then passed to the SVM module for sentence-level learning. Sentences in the dataset are categorized according to the sentiment-bearing words they contain. The first stage identifies these words and assigns them well-defined semantic tags, in addition to the BIS tags already associated with the words. These newly defined tags give information about the context in which the sentence appears. The second stage tags the whole sentence according to that context as positive, negative, or neutral.
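The linear arrangement of the two modules can be sketched as follows. This is only an illustration of the data flow, not the project's implementation: the lexicon-based tagger and the counting classifier are simple stand-ins for the trained CRF and SVM models, and the names `TaggedToken`, `run_pipeline`, `HYPOTHETICAL_LEXICON`, the `POS`/`NEG`/`O` semantic tags, and the BIS tag strings are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TaggedToken:
    word: str
    bis_tag: str              # tag already attached to the word (placeholder values)
    semantic_tag: str = "O"   # filled in by the word-level stage; "O" = no sentiment


WordTagger = Callable[[List[TaggedToken]], List[TaggedToken]]
SentenceClassifier = Callable[[List[TaggedToken]], str]


def run_pipeline(sentence: List[TaggedToken],
                 word_tagger: WordTagger,
                 sentence_classifier: SentenceClassifier) -> str:
    """Stage 1 output (semantic tags) is the input to stage 2 (sentence label)."""
    tagged = word_tagger(sentence)
    return sentence_classifier(tagged)


# Stand-in for the CRF module: a tiny hypothetical lexicon lookup.
HYPOTHETICAL_LEXICON = {"good": "POS", "terrible": "NEG"}


def lexicon_tagger(tokens: List[TaggedToken]) -> List[TaggedToken]:
    return [
        TaggedToken(t.word, t.bis_tag,
                    HYPOTHETICAL_LEXICON.get(t.word.lower(), "O"))
        for t in tokens
    ]


# Stand-in for the SVM module: compare counts of positive and negative tags.
def majority_classifier(tokens: List[TaggedToken]) -> str:
    pos = sum(t.semantic_tag == "POS" for t in tokens)
    neg = sum(t.semantic_tag == "NEG" for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```

In a real system the two stand-in functions would be replaced by trained models with the same input and output shapes, which keeps the two learning stages independent of each other.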
The sentiment identification step finds the polarity of each sentence, which depends on the sentiment-bearing words in the sentence; these words are marked with semantic tags. Subjective words are identified and tagged manually. We also look for intensifiers, the words or adverbs that give force or emphasis to the word that follows them. In the sentiment classification step, whole sentences are classified into the target classes positive, negative, and neutral. Two machine learning algorithms are used here, CRF and SVM, and the methodology consists of two levels of learning: word-level semantic labeling followed by sentence-level classification. Word-level tagging can be treated as a sequence labeling problem, and sentence-level classification as a text classification problem. CRF is a popular supervised learning algorithm that is well suited to sequence labeling, while SVM is efficient for text classification. We therefore use CRF for word-level tagging and SVM for sentence-level classification.
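To make the two levels of learning concrete, the sketch below shows one plausible way to build the inputs for each level: a per-word feature dictionary of the kind sequence labelers such as CRF toolkits commonly accept, and a per-sentence numeric vector that a classifier such as an SVM could consume. The feature names, the `INTENSIFIERS` list, and the `POS`/`NEG`/`O` tag names are assumptions for illustration; the project's actual feature set is not described here.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical intensifier list; the real one would come from the annotated data.
INTENSIFIERS = {"very", "extremely", "really"}


def word_features(words: List[str], bis_tags: List[str], i: int) -> Dict[str, object]:
    """Features for word i, including its neighbours (sequence context)."""
    word = words[i]
    return {
        "word.lower": word.lower(),
        "suffix3": word[-3:].lower(),
        "bis_tag": bis_tags[i],
        "is_intensifier": word.lower() in INTENSIFIERS,
        # An intensifier emphasises the word that follows it.
        "prev_is_intensifier": i > 0 and words[i - 1].lower() in INTENSIFIERS,
        "prev.lower": words[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": words[i + 1].lower() if i < len(words) - 1 else "<EOS>",
    }


def sentence_features(words: List[str], semantic_tags: List[str]) -> List[int]:
    """Sentence vector: [positive count, negative count, intensified sentiment words]."""
    counts = Counter(semantic_tags)
    intensified = sum(
        1
        for i, tag in enumerate(semantic_tags)
        if tag != "O" and i > 0 and words[i - 1].lower() in INTENSIFIERS
    )
    return [counts["POS"], counts["NEG"], intensified]
```

The word-level features look at neighbouring words because sequence labeling depends on context, while the sentence-level vector discards word order and summarizes the tags, which matches the split between sequence labeling and text classification described above.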
The SVM outputs the classified form of the input sentences: each sentence belongs to one of three categories, positive, negative, or neutral. The negative sentences then go through a further round of training to find their degree of negativity. They are classified by the strength of their negativity as mildly, highly, or extremely negative, which gives a picture of how negativity is used on social media platforms.
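The second round of classification can be viewed as a cascade in which only sentences labeled negative reach the degree classifier. The sketch below illustrates that routing under stated assumptions: `cascade` and `degree_distribution` are hypothetical helper names, and the two models are passed in as plain callables, so the toy dictionary lookups used here merely stand in for trained classifiers.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

DEGREES = ("mildly negative", "highly negative", "extremely negative")

Result = Tuple[str, str, Optional[str]]


def cascade(sentences: List[str],
            polarity_model: Callable[[str], str],
            degree_model: Callable[[str], str]) -> List[Result]:
    """Label every sentence; grade the degree of negativity only for negative ones."""
    results = []
    for sentence in sentences:
        polarity = polarity_model(sentence)
        degree = degree_model(sentence) if polarity == "negative" else None
        results.append((sentence, polarity, degree))
    return results


def degree_distribution(results: List[Result]) -> Counter:
    """Count how many negative sentences fall into each degree."""
    return Counter(degree for _, _, degree in results if degree is not None)


# Toy stand-ins for the trained models (illustrative labels only).
toy_polarity = {
    "I love this": "positive",
    "It is Monday": "neutral",
    "This is a bit dull": "negative",
    "This is utterly disgusting": "negative",
}.get
toy_degree = {
    "This is a bit dull": DEGREES[0],
    "This is utterly disgusting": DEGREES[2],
}.get
```

Calling `cascade` on a list of sentences and then `degree_distribution` on the result gives the per-degree counts that summarize how much negativity appears in a collection of posts.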
Gitlab Repository