Text Mining

Analytic solutions today revolve primarily around structured information, that is, data that can be arranged in tables or spreadsheets, such as customers' demographic information, registration data and transaction details. With increasing competition and ever more demanding customers, it is imperative to go beyond this and understand what customers say. Increasingly, existing and potential customers call toll-free numbers or write emails to request and receive services in all aspects of their lives. Extracting insights from large volumes of textual information can be cumbersome, and hence much of this valuable information still goes unused.

In a CRM environment, customers' comments, complaints and requests are captured by call center agents and can be mined for valuable insights. This information can be used in numerous ways:

  • Analyze call center transcripts to identify customer concerns, and feed this information back into your understanding of customer actions and segments
  • Improve customer retention by determining which customer complaints are most likely to result in attrition, and take proactive action
  • Discover what drives customers to your customer service call center and identify trends in product defects or areas for service improvement
  • Identify common customer complaints from online customers by analyzing customer e-mail and instant message transcripts. Use this information to identify areas of your site that need improvement

Some of this information could potentially be used to achieve a 10-50% improvement in prediction. According to a Gartner study, unstructured information is estimated to be doubling in quantity every three months (Autonomy, 2005b). Automatic text categorization, until recently confined largely to academic research papers, is increasingly being used by businesses to improve processes.

Text mining involves searching for patterns across textual information. A typical text mining process involves:

  • Text Preprocessing
    • Normalization
    • Tagging
    • Tokenization
    • Dimensionality Reduction
    • Stemming and Lemmatization
  • Feature Extraction and Selection
  • Text Classification
  • Text Summarization

Natural Language Processing (NLP) is a critical component in text mining for processing text. Natural Language Toolkit (NLTK), originally created in 2001 as part of a computational linguistics course in the Department of Computer and Information Science at the University of Pennsylvania, is extensively used for this purpose.

Text Preprocessing

A document is represented as a Bag-of-Words in d-dimensional space, where d is the total number of distinct words. Each possible word forms a dimension, and the value along that dimension is the frequency count of that word in the document. Thus a text is represented as a vector in d-dimensional space. Text preprocessing can be done using the following steps:

Normalization: Documents contain words in both uppercase and lowercase, which causes the same word to be handled differently when only the case differs. The text needs to be normalized to lowercase to make it uniform.

Tagging: Part-of-speech (POS) tagging is the process of assigning a part-of-speech like noun, verb, pronoun, preposition, adverb, adjective or other lexical class marker to each word in a sentence. The input to a tagging algorithm is a string of words of a natural language sentence and a specified tagset (a finite list of Part-of-speech tags). The output is a single best POS tag for each word.

Tokenization: This is the process of reducing a message to its constituent components. These components can be individual words, word pairs, or other small chunks of text. Data generated by the tokenizer is ultimately passed on for analysis, where it is interpreted. How the data is interpreted is important, but not necessarily as important as the quality of the data being passed. In other words, the way a text is tokenized can matter more than what we do with it later; even a simple change in tokenization can affect the accuracy of the downstream analysis.
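
As an illustration, normalization and tokenization into words and word pairs can be sketched with the Python standard library alone. This is a minimal sketch, not a production tokenizer: the regular expression, the function names and the sample comment are assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    # [a-z0-9']+ keeps simple contractions such as "can't" together.
    return re.findall(r"[a-z0-9']+", text.lower())

def word_pairs(tokens):
    """Return adjacent word pairs from a token list."""
    return list(zip(tokens, tokens[1:]))

# Hypothetical customer comment used only for illustration.
comment = "My bill is wrong. The bill shows a charge I can't explain!"
tokens = tokenize(comment)
pairs = word_pairs(tokens)
bag_of_words = Counter(tokens)   # word -> frequency count in this document
```

The `bag_of_words` counter is the frequency-count representation described at the start of this section; changing the regular expression changes the tokens, and with them every count that later steps rely on.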

Dimensionality Reduction: The idea behind dimensionality reduction is to remove the non-content words that occur in a document, such as “the”, “that”, “this”, etc. These words, called stop words, occur with very high frequency in most documents and carry no semantic meaning for categorization. The list of stop words is not fixed; it changes from data set to data set, since a word may be significant for one data set and insignificant for another. A classifier should be tuned to the real characteristics of the training data, so these words are better removed to improve accuracy. Removing them also decreases processing time while classifying text.
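
Because the stop word list depends on the data set, one simple option is to derive it from the documents themselves. The sketch below assumes that a word occurring in at least 80% of the documents is a stop word; the threshold, the function names and the sample documents are illustrative assumptions, not a recommended setting.

```python
from collections import Counter

def derive_stop_words(documents, min_doc_fraction=0.8):
    """Treat words found in at least min_doc_fraction of the documents as stop words."""
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))          # count each word once per document
    threshold = min_doc_fraction * len(documents)
    return {word for word, n in doc_freq.items() if n >= threshold}

def remove_stop_words(tokens, stop_words):
    """Drop every token that appears in the stop word set."""
    return [t for t in tokens if t not in stop_words]

# Hypothetical, already tokenized call center comments.
documents = [
    "the phone keeps dropping the call".split(),
    "the bill for this month is wrong".split(),
    "the agent fixed the billing error".split(),
    "please cancel the data plan".split(),
]
stop_words = derive_stop_words(documents)
filtered = [remove_stop_words(doc, stop_words) for doc in documents]
```

In practice a fixed list of common function words is often combined with a data-driven list like this one.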

Stemming and Lemmatization: Stemming is the process of reducing inflected (or sometimes derived) words to their stem, base or root form, generally a written word form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root. Porter's stemming algorithm consists of 5 phases of word reductions, applied sequentially. Within each phase there are various conventions for selecting rules, such as selecting the rule from each rule group that applies to the longest suffix. Stemming is not a well-defined process, and we typically pick the stemmer that best suits the application we have in mind. The Porter Stemmer is a good choice if you are indexing some texts and want to support search using alternative forms of words.

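The goal of both stemming and lemmatization is to reduce inflectional forms, and sometimes derivationally related forms, of a word to a common base form. Lemmatization does so using a vocabulary and morphological analysis of words, returning the dictionary form of a word, known as the lemma.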
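
The longest-suffix convention can be illustrated with a single, heavily simplified rule group. This is not the Porter algorithm: the rule list, the minimum stem length and the function name are assumptions chosen to keep the sketch short.

```python
# Hypothetical rule group of (suffix, replacement) pairs; not the Porter rule set.
SUFFIX_RULES = [("sses", "ss"), ("ies", "i"), ("ing", ""), ("ed", ""), ("s", "")]

def simple_stem(word, min_stem_length=3):
    """Apply the matching rule with the longest suffix, if the stem stays long enough."""
    matches = [(suffix, repl) for suffix, repl in SUFFIX_RULES if word.endswith(suffix)]
    if not matches:
        return word
    suffix, replacement = max(matches, key=lambda rule: len(rule[0]))
    stem = word[: len(word) - len(suffix)] + replacement
    return stem if len(stem) >= min_stem_length else word

words = ["complaints", "complained", "complaining", "caresses", "ponies", "is"]
stems = [simple_stem(w) for w in words]
```

Here "complained" and "complaining" both map to "complain", while "ponies" maps to "poni", which shows that a stem need not be a valid word.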

Feature Extraction and Selection 

After text preprocessing, each document can be characterized by the most important features in the entire text collection, with the highest-weighted terms expressing the central concepts. There are multiple weighting schemes, based on the frequency of words, the distance between words, the position of words in the document, and the discrimination between documents or document categories. While weighting terms, the length of the document and the presence of nouns, adjectives, verbs, etc. need to be considered. Proper names are identified using heuristic rules that take into account patterns of capitalized words in a text and the recurrence of these patterns in the text. Feature extraction also includes n-gram analysis to identify keyword combinations.
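
One widely used scheme that combines word frequency with discrimination between documents is TF-IDF weighting. The sketch below assumes one common variant (term frequency normalized by document length, multiplied by the natural log of the inverse document frequency); other variants exist, and the function name and sample documents are illustrative.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))          # number of documents containing each term
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        weights.append({
            # Frequent in this document, rare across documents -> high weight.
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical tokenized documents.
documents = [
    ["network", "outage", "network", "outage", "slow"],
    ["bill", "wrong", "network"],
    ["bill", "refund", "request"],
]
weights = tf_idf(documents)
top_term = max(weights[0], key=weights[0].get)   # highest-weighted term of document 0
```

In the first document "network" and "outage" are equally frequent, but "outage" receives the higher weight because it appears in no other document.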

Text Classification

Text classification deals with the classification (categorization) of documents into classes based on some similarity measure. The first successful method in this field was the Bag-of-Words approach, which weights the occurrence of each word with a function that depends on term frequency and inverse document frequency. These counts help in assigning a significance score to a term in a given context. A number of algorithms can be used for text classification; some of them are discussed below:

Probabilistic Classifiers: A probabilistic classifier finds the class of a document using Bayes' theorem:

      P(ci|dj) = P(ci) P(dj|ci) / P(dj)

Thus it finds the probability that a given document dj belongs to class ci. P(ci) is the probability that a randomly picked document belongs to class ci. P(dj) is the probability that a randomly picked document has representation dj; it remains constant for all classes and so need not be calculated. P(dj|ci) is the probability that a document of class ci has vector representation dj. The Naive Bayes classifier is a well-known example of this approach.
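
A minimal sketch of a Naive Bayes classifier follows. It assumes the multinomial variant with add-one (Laplace) smoothing and works with log probabilities to avoid underflow; the function names and the tiny training set are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_docs):
    """labeled_docs: list of (tokens, class_label) pairs. Returns the model counts."""
    class_counts = Counter()                 # documents per class, for P(ci)
    word_counts = defaultdict(Counter)       # word counts per class, for P(dj|ci)
    vocabulary = set()
    for tokens, label in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary

def classify_naive_bayes(tokens, model):
    """Return the class ci that maximizes log P(ci) + sum of log P(word|ci)."""
    class_counts, word_counts, vocabulary = model
    n_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / n_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token not in vocabulary:
                continue                     # ignore words never seen in training
            # Add-one smoothing keeps unseen (word, class) pairs from zeroing the product.
            score += math.log((word_counts[label][token] + 1)
                              / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

training = [
    (["bill", "wrong", "charge"], "billing"),
    (["refund", "bill", "overcharge"], "billing"),
    (["network", "slow", "outage"], "network"),
    (["call", "dropped", "network"], "network"),
]
model = train_naive_bayes(training)
predicted = classify_naive_bayes(["bill", "charge", "wrong"], model)
```

P(dj) never appears in the code because, as noted above, it is the same for every class and does not change which class scores highest.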

Decision Tree Classifiers: A decision tree classifier is a tree in which internal nodes are labeled by terms, branches departing from them are labeled by tests on the weight that the term has in the test document, and leaves are labeled by categories. Such a classifier categorizes a test document dj by recursively testing the weights that the terms labeling the internal nodes have in vector dj until a leaf node is reached; the label of this node is then assigned to the document.
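
The recursive traversal can be sketched as follows. The tree here is hand-built for illustration rather than learned from data, and its terms, thresholds and category names are hypothetical.

```python
# Hypothetical hand-built tree: each internal node tests one term's weight.
TREE = {
    "term": "bill", "threshold": 0.5,
    "high": {"label": "billing"},
    "low": {
        "term": "network", "threshold": 0.5,
        "high": {"label": "network"},
        "low": {"label": "other"},
    },
}

def classify_with_tree(doc_weights, node):
    """Walk from the root to a leaf, testing one term weight at each internal node."""
    if "label" in node:                       # leaf node: its label is the category
        return node["label"]
    weight = doc_weights.get(node["term"], 0.0)
    branch = "high" if weight > node["threshold"] else "low"
    return classify_with_tree(doc_weights, node[branch])

category = classify_with_tree({"network": 1.0, "slow": 1.0}, TREE)
```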

Neural Network Classifiers: A neural network classifier is a network of units in which the input units represent terms, the output units represent the categories of interest, and the weights on the edges connecting units represent dependence relations. To classify a document vector dj, its term weights are loaded into the input units, the activation of these units is propagated forward through the network, and the values of the output units determine the categorization decision.
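
The forward propagation step can be sketched for the simplest case, a network with no hidden layer. The edge weights below are set by hand purely for illustration; in a real classifier they would be learned from training documents, and the terms and categories are hypothetical.

```python
import math

TERMS = ["bill", "refund", "network", "outage"]      # one input unit per term
CATEGORIES = ["billing", "network"]                  # one output unit per category
# WEIGHTS[i][j]: weight on the edge from input term i to output category j (hand-set).
WEIGHTS = [
    [2.0, -1.0],
    [1.5, -0.5],
    [-1.0, 2.0],
    [-0.5, 1.5],
]
BIASES = [0.0, 0.0]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(term_weights):
    """Load term weights into the input units and propagate them to the output units."""
    inputs = [term_weights.get(term, 0.0) for term in TERMS]
    outputs = []
    for j in range(len(CATEGORIES)):
        activation = BIASES[j] + sum(inputs[i] * WEIGHTS[i][j] for i in range(len(TERMS)))
        outputs.append(sigmoid(activation))
    return dict(zip(CATEGORIES, outputs))

scores = forward({"network": 1.0, "outage": 1.0})
decision = max(scores, key=scores.get)               # category with the highest output
```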

Example Based Classifiers: These classifiers do not build an explicit representation of class ci, but rely on the class labels attached to the training documents most similar to the test document. The nearest neighbor classifier is a well-known example of this type.
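
A minimal nearest neighbor sketch is shown below. It assumes cosine similarity over raw word counts and a majority vote among the k most similar training documents; the choice of k, the similarity measure and the training data are assumptions for the example.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_classify(tokens, training, k=3):
    """Label a document by majority vote among its k most similar training documents."""
    query = Counter(tokens)
    ranked = sorted(training,
                    key=lambda item: cosine_similarity(query, Counter(item[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled training documents.
training = [
    (["bill", "wrong", "charge"], "billing"),
    (["refund", "bill", "overcharge"], "billing"),
    (["network", "slow", "outage"], "network"),
    (["call", "dropped", "network"], "network"),
    (["bill", "payment", "late"], "billing"),
]
predicted = knn_classify(["wrong", "bill", "refund"], training, k=3)
```

No model is built in advance: all the work happens at classification time, when the test document is compared with the stored training documents.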

Support Vector Machine Classifiers: An SVM tries to find, among all the surfaces that separate the positive from the negative training examples, the one that does so by the widest possible margin. With more than two classes, a separate surface is typically found for each class.
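
The selection criterion can be illustrated by comparing the margins of two candidate separating surfaces. This sketch does not train an SVM, which requires solving an optimization problem; it only computes the margin that such training would maximize. The two-term document vectors and both candidate hyperplanes are hypothetical.

```python
import math

def geometric_margin(w, b, examples):
    """Smallest signed distance from any example to the hyperplane w.x + b = 0.

    examples: list of (vector, label) pairs with label +1 or -1. A negative
    result means the hyperplane misclassifies at least one example.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
               for x, y in examples)

# Two-term document vectors: (weight of "bill", weight of "network").
examples = [((3.0, 0.0), +1), ((2.0, 1.0), +1), ((0.0, 3.0), -1), ((1.0, 2.0), -1)]

# Hypothetical candidate surfaces (w, b); both separate the training examples.
candidates = {"A": ((1.0, -1.0), 0.0), "B": ((1.0, 0.0), -1.5)}
margins = {name: geometric_margin(w, b, examples) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)     # surface with the widest margin
```

Both candidates classify every training example correctly, but surface A leaves more room between the surface and the closest examples, so it is the one an SVM would prefer of the two.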

Other noteworthy approaches include regression classifiers, genetic algorithms and decision rule classifiers.

Text Summarization

In order to generate a summary, we have to identify the most important pieces of information in the document, omitting irrelevant information and minimizing details, and assemble them into a compact, coherent report. This, however, is easier said than done, as it involves some of the main problems of natural language processing. Producing a domain-independent system would require work in natural language understanding, semantic representation, discourse models, world knowledge, and natural language generation. Successes in domain-independent systems are few and are limited to identifying key passages and sentences of the document. More successful systems have been produced for limited-domain applications such as report generation for weather, financial and medical databases.
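
Identifying key sentences, the approach that domain-independent systems are limited to, can be sketched as a simple extractive summarizer. The scoring rule used here (average frequency of a sentence's content words), the stop word list and the sample text are assumptions made for the example; the output is a selection of existing sentences, not a generated report.

```python
import re
from collections import Counter

# Illustrative stop word list; a real one would be tuned to the data set.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "to", "of", "for", "my", "i", "it", "on"}

def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [
        [w for w in re.findall(r"[a-z']+", s.lower()) if w not in STOP_WORDS]
        for s in sentences
    ]
    freq = Counter(w for words in tokenized for w in words)
    # Score each sentence by the average document-wide frequency of its content words.
    scores = [sum(freq[w] for w in words) / len(words) if words else 0.0
              for words in tokenized]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n_sentences]
    return [sentences[i] for i in sorted(top)]

text = ("My bill was wrong again. The agent promised a refund for the wrong bill. "
        "Thanks for listening.")
summary = summarize(text, 1)
```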

