One of the things that surprised me most in this early study of the Python language is the ease of string manipulation. Anyone coming from C++ or Java knows how complicated it is to slice sentences, sort word frequencies, and so on. In this example we will talk about sentiment mining in texts. Imagine the social network of a large company, with millions of followers, likes, and comments, and imagine the work of evaluating user satisfaction from those messages; for problems of this type we have sentiment mining.
In addition to the libraries we have already used, Pandas, NumPy and Matplotlib, we will also import NLTK (Natural Language Toolkit), a library for working with human language. We download PUNKT, which is one of the dependencies of our project, and finally FreqDist, an object that helps us count the frequency of each unique word in the text.
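A minimal setup sketch of these imports (assuming NLTK, Pandas, NumPy and Matplotlib are installed; the download line fetches the PUNKT tokenizer data):

```python
# Libraries used throughout the example
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import nltk
from nltk import FreqDist
from nltk.tokenize import word_tokenize

nltk.download('punkt')  # PUNKT tokenizer models, needed by word_tokenize
```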
In [160] we have the text to be analyzed; in this case I am typing it myself, but there are APIs that let me fetch it from social networks or news feeds. I then apply the lower() method to the text to make everything lowercase, since Python distinguishes between uppercase and lowercase letters; notice that in the output Out [160] the 'O' is lowercase.
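The original cell is not reproduced here, so the sample text below is hypothetical; the relevant step is the lower() call:

```python
# Hypothetical sample text; in a real project it could come from a social-network API
texto = "O atendimento foi otimo, mas a entrega atrasou. Otimo produto!"
texto = texto.lower()  # normalize case so 'Otimo' and 'otimo' count as the same word
```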
The word_tokenize() method separates the words and stores each value at an index of the vector P.
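A sketch of the tokenization step, reusing the texto variable from the previous snippet:

```python
# Split the lower-cased text into individual tokens and store them in P
P = word_tokenize(texto)
# e.g. ['o', 'atendimento', 'foi', 'otimo', ',', 'mas', ...]
```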
In [167] we create the frequency vector and use the FreqDist object we already imported to count the word frequencies of the vector P created in [166].
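A sketch of the frequency count, using the P list from above:

```python
# Count how many times each token of P appears
frequencia = FreqDist(P)
```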
In [174] we create the vector palavras, keeping only the keys of the frequency vector; notice that this removes the repeated words.
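Something along these lines, where palavras keeps only the distinct tokens:

```python
# Keep only the keys of the frequency distribution: each word appears a single time
palavras = list(frequencia.keys())
```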
In [175] we display a graph that shows the words and their number of repetitions.
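FreqDist offers a plot() helper (it relies on Matplotlib), which is one way to produce such a graph:

```python
# Plot the tokens against their number of repetitions
frequencia.plot(20, cumulative=False)  # show the 20 most frequent tokens
```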
This alone does not solve the problem, because we are not mining anything yet; we have only separated and counted the words. Within machine learning we have a technique called supervised learning, where we train the machine; in this case we use keywords taken from comments on social networks.
In [162] we categorize 3 types of text: positive, negative and doubtful. A good sentiment-mining setup should have between 200 and 500 keywords; here we use only a few to illustrate the example. We also initialize 3 counters to be incremented each time we find a word from the sentiment vectors.
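The keyword lists below are illustrative only, since the original cell is not shown; a real project would register a few hundred words per class:

```python
# Illustrative keyword lists for each sentiment class
Positivo = ['otimo', 'bom', 'excelente', 'gostei', 'recomendo']
Negativo = ['ruim', 'pessimo', 'atrasou', 'horrivel', 'decepcao']
Duvida = ['talvez', 'duvida', 'sera', 'como', 'quando']

# Counters, incremented whenever a token of P matches one of the lists
cont_positivo = 0
cont_negativo = 0
cont_duvida = 0
```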
We iterate the vector P against the Duvida, Negativo and Positivo vectors, incrementing their counters.
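A sketch of that comparison loop, using the lists and counters defined above:

```python
# Compare every token of P with the three keyword lists and update the counters
for palavra in P:
    if palavra in Positivo:
        cont_positivo += 1
    elif palavra in Negativo:
        cont_negativo += 1
    elif palavra in Duvida:
        cont_duvida += 1
```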
In [181] we have the final output of the program.
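One possible way to print the final output and pick the dominant sentiment (the exact contents of cell [181] are not shown in the text):

```python
# Print the counts and pick the dominant sentiment as the program's final answer
print('Positivo:', cont_positivo)
print('Negativo:', cont_negativo)
print('Duvida:', cont_duvida)

contagens = {'positivo': cont_positivo, 'negativo': cont_negativo, 'duvida': cont_duvida}
print('Sentimento predominante:', max(contagens, key=contagens.get))
```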
Training process
We insert texts from clients, or simulate human texts, to train our system; at first the system will not have the keywords registered.
In [9] we register the words in the vector and execute again, repeating this training until we reach a reasonable quantity of words and can infer a satisfactory result.
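Registering new keywords can be as simple as extending the lists defined earlier; the words below are placeholders:

```python
# Placeholder words only: extend the lists with keywords observed in new client texts,
# then run the counting loop again
Positivo += ['maravilhoso', 'perfeito']
Negativo += ['demorou', 'defeito']
```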
Then we run one more training round, repeating the same steps.