Given a text, how could I count the number and density of words of each length, so that I get output like this:
- 1 letter words : 52 / 1%
- 2 letter words : 34 / 0.5%
- 3 letter words : 67 / 2%
I found this, but only for Python.
You could start by splitting your text into words, using either explode() (a very simple, arguably too simple, solution) or preg_split() (which allows for something a bit more powerful):
$text = "this is some kind of text with several words";
$words = explode(' ', $text);
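As a rough sketch of the preg_split() route: explode(' ', ...) leaves punctuation attached to words and produces empty entries on repeated spaces. The version below assumes a "word" is a run of letters, digits and apostrophes; that definition and the sample text are my own assumptions, so adjust the pattern to your data.

```php
<?php
// Sample text with punctuation and repeated spaces (illustrative only).
$text = "This is some text, with punctuation... and   extra spaces!";

// Split on any run of characters that are NOT letters, digits or apostrophes.
// The /u modifier makes the pattern UTF-8 aware; PREG_SPLIT_NO_EMPTY drops
// the empty strings that would otherwise appear at the edges.
$words = preg_split('/[^\p{L}\p{N}\']+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($words);
```

With this, "text," and "spaces!" come out as "text" and "spaces", so their lengths are not inflated by punctuation.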
Then iterate over the words, getting the length of each one with strlen() and counting how many times each length occurs in an array:
$results = array();
foreach ($words as $word) {
    $length = strlen($word);
    if (isset($results[$length])) {
        $results[$length]++;
    } else {
        $results[$length] = 1;
    }
}
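If you're working with UTF-8, see mb_strlen(): strlen() counts bytes rather than characters, so words with accented or other multi-byte characters would be reported as longer than they are.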
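To illustrate the difference, here is a small sketch. It assumes the mbstring extension is loaded and PHP 7 or later (for the "\u{...}" escape, used here so the example does not depend on how the file is saved).

```php
<?php
// "café" with the é written as a Unicode escape (encoded as 2 bytes in UTF-8).
$word = "caf\u{E9}";

$bytes = strlen($word);              // counts bytes
$chars = mb_strlen($word, 'UTF-8');  // counts characters

echo "strlen: $bytes, mb_strlen: $chars\n";
```

Here strlen() reports 5 while mb_strlen() reports 4, so for UTF-8 text you would swap mb_strlen($word, 'UTF-8') into the loop.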
At the end of that loop, $results would look like this:
array
4 => int 5
2 => int 2
7 => int 1
5 => int 1
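If you prefer something more compact, the same counts can probably be obtained without an explicit loop. This is just an alternative sketch using the built-in array_map(), array_count_values() and ksort(), not a change to the approach above:

```php
<?php
$text = "this is some kind of text with several words";
$words = explode(' ', $text);

// Map each word to its length, then count how often each length occurs.
$results = array_count_values(array_map('strlen', $words));

// Sort by word length so the shortest words come first.
ksort($results);

print_r($results);
```

The keys are the word lengths and the values are the number of words of that length, exactly as in the loop version, only sorted by length.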
The total number of words, which you'll need to calculate the percentages, can be found either:
- by incrementing a counter inside the foreach loop,
- or by calling array_sum() on $results after the loop is done.
As for calculating the percentages, it's just a bit of maths (each count divided by the total, times 100), so I won't be much more help there ^^
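Putting the pieces together, a minimal sketch of the whole thing might look like this. The output format mimics the one in the question; rounding to one decimal place and sorting by length are my own choices.

```php
<?php
$text = "this is some kind of text with several words";
$words = explode(' ', $text);

// Count how many words there are of each length.
$results = array();
foreach ($words as $word) {
    $length = strlen($word);
    if (isset($results[$length])) {
        $results[$length]++;
    } else {
        $results[$length] = 1;
    }
}

// Total number of words, needed for the percentages.
$total = array_sum($results);

// Shortest words first.
ksort($results);

$lines = array();
foreach ($results as $length => $count) {
    // Share of all words that have this length, to one decimal place.
    $percent = number_format($count / $total * 100, 1);
    $lines[] = "- $length letter words : $count / $percent%";
}

echo implode("\n", $lines), "\n";
```

For the sample sentence this prints:

```
- 2 letter words : 2 / 22.2%
- 4 letter words : 5 / 55.6%
- 5 letter words : 1 / 11.1%
- 7 letter words : 1 / 11.1%
```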