Need help understanding this Python Viterbi algorithm
I am trying to translate a Python implementation of the Viterbi algorithm into Ruby. The full script, annotated with my observations, is at the bottom of this question.
Unfortunately, I know very little about Python, so the translation is proving very difficult for me. Even so, I have made some progress. The only line that is completely melting my mind is this one:
```python
prob_k, k = max((probs[j] * word_prob(text[j:i]), j) for j in range(max(0, i - max_word_length), i))
```

Can anyone please tell me what this is doing? Here is the full Python script:

```python
import re
from itertools import groupby

# text will be a 'compound word' such as 'wickedweather'.
def viterbi_segment(text):
    probs, lasts = [1.0], [0]
    # Iterate over the letters in the compound,
    # e.g. [w, ickedweather], [wi, ckedweather], and so on.
    for i in range(1, len(text) + 1):
        # I have no idea what this line is doing.
        prob_k, k = max((probs[j] * word_prob(text[j:i]), j)
                        for j in range(max(0, i - max_word_length), i))
        # Append the values to the arrays.
        probs.append(prob_k)
        lasts.append(k)
    words = []
    i = len(text)
    while 0 < i:
        words.append(text[lasts[i]:i])
        i = lasts[i]
    words.reverse()
    return words, probs[-1]

# Probability of a word existing in the dictionary.
def word_prob(word):
    # dictionary.get(key) returns the value for the specified key; in this
    # case, the number of times the word occurs in the word list.
    # The second argument is a default value returned if the word is not found.
    return dictionary.get(word, 0) / total

# Ensures we deal with whole words rather than each separate letter,
# and normalizes the words to lowercase.
def words(text):
    return re.findall('[a-z]+', text.lower())

# This gives us a hash whose keys are words and whose values are
# the number of occurrences in the dictionary.
# /usr/share/dict/words is a file of newline-delimited words.
dictionary = dict((w, len(list(ws)))
                  for w, ws in groupby(sorted(words(open('/usr/share/dict/words').read()))))

# Assign the length of the longest word in the dictionary.
max_word_length = max(map(len, dictionary))

# Assign the total number of words in the dictionary. It is a float
# because we are going to divide by it later.
total = float(sum(dictionary.values()))

# Run the algorithm over a file of newline-delimited compound words.
compounds = words(open('compounds.txt').read())
for comp in compounds:
    print(comp, ":", viterbi_segment(comp))
```
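For experimenting, or for checking a Ruby port against, here is a minimal self-contained sketch of the same dynamic program. It assumes a tiny hypothetical word-count dictionary, `WORD_COUNTS`, in place of the word file, and passes it in as a parameter instead of relying on the script's globals. `probs[i]` holds the probability of the best segmentation of `text[:i]`, and `lasts[i]` records where the last word of that segmentation starts, which is what the `while` loop walks back through.

```python
def viterbi_segment(text, word_counts):
    """Split text into the most probable sequence of dictionary words.

    probs[i] is the probability of the best segmentation of text[:i];
    lasts[i] is the start index of the last word in that segmentation.
    """
    total = float(sum(word_counts.values()))
    max_word_length = max(map(len, word_counts))

    def word_prob(word):
        # Unknown words get probability 0.
        return word_counts.get(word, 0) / total

    probs, lasts = [1.0], [0]
    for i in range(1, len(text) + 1):
        # Try every split point j so that text[j:i] is the last word,
        # and keep the (probability, j) pair with the highest probability.
        prob_k, k = max((probs[j] * word_prob(text[j:i]), j)
                        for j in range(max(0, i - max_word_length), i))
        probs.append(prob_k)
        lasts.append(k)

    # Walk back through lasts to recover the words, last word first.
    words = []
    i = len(text)
    while 0 < i:
        words.append(text[lasts[i]:i])
        i = lasts[i]
    words.reverse()
    return words, probs[-1]


# Hypothetical toy counts standing in for /usr/share/dict/words.
WORD_COUNTS = {"wicked": 1, "weather": 1, "we": 1, "at": 1, "her": 1, "he": 1}

print(viterbi_segment("wickedweather", WORD_COUNTS))
```

With these toy counts every word has probability 1/6, so `wicked` + `weather` (1/36) beats `wicked` + `we` + `at` + `her` (1/1296), and the call should return `['wicked', 'weather']` with a probability of about 1/36.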
The expanded version looks like this:
```python
all_probs = []
for j in range(max(0, i - max_word_length), i):
    all_probs.append((probs[j] * word_prob(text[j:i]), j))
prob_k, k = max(all_probs)
```
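The part that usually trips people up is that `max` compares tuples element by element: it picks the pair with the highest probability, and only when two probabilities are equal does it compare the second element, so the larger `j` wins the tie. The sketch below checks that the one-liner and the expanded loop agree. The helper names and the `probs`/`word_probs` values are hypothetical, and for brevity it scans every `j` instead of the window `range(max(0, i - max_word_length), i)`.

```python
def best_split_generator(probs, word_probs):
    # One-liner form: max() over a generator of (probability, j) tuples.
    return max((probs[j] * word_probs[j], j) for j in range(len(probs)))


def best_split_loop(probs, word_probs):
    # Expanded form: build the list of tuples first, then take its max().
    all_probs = []
    for j in range(len(probs)):
        all_probs.append((probs[j] * word_probs[j], j))
    return max(all_probs)


# Hypothetical values: probs[j] plays the role of the best score for
# text[:j], and word_probs[j] stands in for word_prob(text[j:i]).
probs = [1.0, 0.5, 0.25]
word_probs = [0.0, 0.5, 0.5]

prob_k, k = best_split_generator(probs, word_probs)
print(prob_k, k)  # 0.25 1  (j=1 gives 0.5 * 0.5, the largest product)
assert (prob_k, k) == best_split_loop(probs, word_probs)

# Equal probabilities are broken by comparing j, so the larger j wins.
print(max([(0.0, 0), (0.0, 1), (0.0, 2)]))  # (0.0, 2)
```

In the real script this tie-breaking matters when none of the candidate pieces is a known word: every product is `0.0`, so `max` returns `(0.0, i - 1)` and the last piece becomes a single character.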
I hope that helps explain it. If it doesn't, feel free to edit your question and point out the statements you don't understand.