"creatures"
"we"
Sort Words Based on Frequency
Find the unique words in sonnetWords. Count them and sort them based on their
frequency.
To count words that differ only by case as the same word, convert sonnetWords to
lowercase. For example, The and the count as the same word. Find the unique words
using the unique function. Then, count the number of times each unique word occurs
using the histcounts function.
sonnetWords = lower(sonnetWords);
[words,~,idx] = unique(sonnetWords);
numOccurrences = histcounts(idx,numel(words));
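The counting step can be checked on a small input. The following is a minimal sketch on hypothetical toy data (toyWords and the other toy* names are illustrative and not part of the Sonnets example). It repeats the unique and histcounts calls from above and, as a cross-check, counts the same indices with accumarray, which is a swapped-in alternative rather than the method used in this example.

```matlab
% Hypothetical toy data, not taken from the Sonnets
toyWords = ["The" "rose" "the" "Rose" "the" "thorn"];

% Fold case so that "The" and "the" are counted together
toyWords = lower(toyWords);

% toyUnique holds the sorted unique words; toyIdx maps each
% element of toyWords to its position in toyUnique
[toyUnique,~,toyIdx] = unique(toyWords);

% Same call pattern as in the example above
toyCounts = histcounts(toyIdx,numel(toyUnique));

% Cross-check (alternative technique): sum a 1 for every index
toyCountsCheck = accumarray(toyIdx(:),1)';   % [2 3 1] for rose, the, thorn
```

Because toyIdx contains only the integers 1 through numel(toyUnique), either approach yields one count per unique word.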
Sort the words in sonnetWords by number of occurrences, from most to least common.
[rankOfOccurrences,rankIndex] = sort(numOccurrences,'descend');
wordsByFrequency = words(rankIndex);
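To inspect the sorted result, it can help to list the most common words next to their counts. The following is a minimal sketch on hypothetical data (exampleWords, exampleCounts, and topN are illustrative names, not variables from this example). It applies the same sort call and index pattern as above and collects the top entries in a table.

```matlab
% Hypothetical unique words and their counts
exampleWords = ["rose"; "the"; "thorn"];
exampleCounts = [2 3 1];

% Sort counts from most to least common and reorder the words to match
[sortedCounts,order] = sort(exampleCounts,'descend');
sortedWords = exampleWords(order);

% Keep at most the two most common words for display
topN = min(2,numel(sortedWords));
topTable = table(sortedWords(1:topN),sortedCounts(1:topN)', ...
    'VariableNames',{'Word','Count'});
disp(topTable)
```

The second output of sort is what ties each count back to its word, so the same index vector must be applied to both arrays.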
Plot Word Frequency
Plot the occurrences of words in the Sonnets from the most to least common words. Zipf's
Law states that the distribution of occurrences of words in a large body of text follows a
power-law distribution, which appears as an approximately straight line on logarithmic axes.
loglog(rankOfOccurrences);
xlabel('Rank of word (most to least common)');
ylabel('Number of Occurrences');
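One way to quantify how closely a set of counts follows a power law is to fit a straight line to the logarithm of the counts against the logarithm of the rank. The following is a minimal sketch on synthetic data that follows an exact power law by construction (wordRank, syntheticCounts, and coeffs are hypothetical names). It is not part of the original example, and real word counts would only follow the line approximately.

```matlab
% Synthetic counts proportional to 1/rank (hypothetical data)
wordRank = (1:100)';
syntheticCounts = 1000 ./ wordRank;

% Fit log(count) = slope*log(rank) + intercept with a first-degree polynomial
coeffs = polyfit(log(wordRank),log(syntheticCounts),1);
slope = coeffs(1);       % exponent of the fitted power law
intercept = coeffs(2);   % log of the fitted scale factor

% Overlay the fitted line on the log-log plot of the counts
loglog(wordRank,syntheticCounts,'o');
hold on
loglog(wordRank,exp(intercept)*wordRank.^slope,'-');
hold off
xlabel('Rank of word (most to least common)');
ylabel('Number of Occurrences');
```

For this synthetic input the fitted slope is close to -1. Applying the same fit to rankOfOccurrences would give an estimate of the exponent for the Sonnets.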
Analyze Text Data with String Arrays