php - word frequency in text

The Python script below counts how often each word occurs in a text (a word being a continuous run of letters, with punctuation marks excluded) and prints a table of the results.

It works correctly. The question is: can the same thing be done more simply (for example, in fewer lines of code) in Python, Bash, PHP, or Perl, or is this already the best way?

# Python 2
import sys
import string
file = open(sys.argv[1], "r")
text = file.read()
file.close()
# Identity table: translate() is used only to delete punctuation
table = string.maketrans("", "")
words = text.lower().split(None)
frequencies = {}
for word in words:
  trimmed = word.translate(table, string.punctuation)
  frequencies[trimmed] = frequencies.get(trimmed, 0) + 1
keys = sorted(frequencies.keys())
for word in keys:
  print "%-32s %d" % (word, frequencies[word])

1, Authority 100%

In your example, a line such as “AA,BB,CC” is counted as “AABBCC 1”, whereas it should be:
AA 1
BB 1
CC 1

So, here is my version in Perl:

#!/usr/bin/perl
use strict;
my %result;
while (<>) {
  # Count every run of word characters, lowercased
  $result{lc $_}++ for /(\w+)/g;
}
printf "%-32s %d\n", $_, $result{$_} for sort keys %result;

Of course, all of this can be squeezed into a single line, but then it would be unreadable.
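The difference between the two tokenizations can be illustrated with a small Python 3 sketch. The function names here are hypothetical and chosen only for this illustration: one splits on whitespace and then deletes punctuation, as in the question; the other takes every run of word characters, as the Perl script does.

```python
import re
import string

def tokens_by_stripping(line):
    # Approach from the question: split on whitespace, then delete punctuation.
    table = str.maketrans("", "", string.punctuation)
    return [w.translate(table) for w in line.lower().split()]

def tokens_by_regex(line):
    # Approach from this answer: take every run of word characters.
    return re.findall(r"\w+", line.lower())

line = "AA,BB,CC"
print(tokens_by_stripping(line))  # ['aabbcc']
print(tokens_by_regex(line))      # ['aa', 'bb', 'cc']
```

With no whitespace after the commas, the first approach glues the three words together, which is the behaviour described above.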

2, Authority 67%

Python, with regular expressions:

# Python 2
import re
import sys
import operator
file = open(sys.argv[1], "r")
text = file.read().decode("utf8")
file.close()
# re.UNICODE makes \w match non-ASCII letters as well
words = re.findall(r"(\w+)", text, re.UNICODE)
stats = {}
for word in words:
  stats[word] = stats.get(word, 0) + 1
# Sort by count, lowest first
stats_list = sorted(stats.iteritems(), key=operator.itemgetter(1))
for word, count in stats_list:
  print "%-32s %d" % (word, count)
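For comparison, a minimal Python 3 sketch of the same idea, assuming `collections.Counter` is acceptable in place of the hand-written dictionary. The function names and the sample string are made up for the illustration; it is not a drop-in replacement for the script above.

```python
import re
from collections import Counter

def word_frequencies(text):
    # In Python 3, \w matches Unicode word characters by default for str patterns.
    return Counter(re.findall(r"\w+", text))

def format_table(stats):
    # Sort by count, lowest first, as the script above does.
    rows = sorted(stats.items(), key=lambda item: item[1])
    return ["%-32s %d" % (word, count) for word, count in rows]

sample = "to be or not to be"  # hypothetical input text
for row in format_table(word_frequencies(sample)):
    print(row)
```

To process a file instead, the text could be read with `open(path, encoding="utf8")` and passed to `word_frequencies`.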

3, Authority 50%

My version:

#!/usr/bin/perl
use strict;
my %frec;
sub calc {
  # Count every run of non-whitespace characters delimited by word boundaries
  $frec{$1}++ while ($_[0] =~ /\b(\S+)\b/g);
}
my $filename = shift or die("Usage: $0 filenameWithText");
open FF, $filename;
calc($_) for (<FF>);
# Print the most frequent words first
foreach (sort { $frec{$b} <=> $frec{$a} } keys %frec) {
  printf("%-32s %d\n", $_, $frec{$_});
}
close FF;
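The descending sort by frequency used here can be sketched in Python 3 as follows. The alphabetical tie-break is an assumption added for a stable, repeatable order; the Perl script above does not specify how equal counts are ordered. The function name is hypothetical.

```python
from collections import Counter

def by_descending_count(words):
    counts = Counter(words)
    # Negate the count for descending order; the word itself breaks ties alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

print(by_descending_count("b a b c a b".split()))
# [('b', 3), ('a', 2), ('c', 1)]
```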

4, Authority 50%

A Bash/AWK script, for the collection:

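#!/bin/bash
if [ -z "$1" ]
then
 echo "Usage: `basename $0` FILENAME"
 exit 1
fi
# Replace runs of non-word characters with spaces, then emit one word per line
for x in $(sed -rn 's/\W+/ /gp' $1);
do
 echo $x
done | awk '{print tolower($0)}' | sort | awk '
{
 if (!word) {
  word = $1
  num = 0
 } else if (word == $1) {
  num++
 } else {
  print word, num + 1
  word = $1
  num = 0
 }
}
# Without this, the last word in the sorted list would never be printed
END { if (word) print word, num + 1 }'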
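The final AWK stage relies on the input being sorted: equal words are then adjacent, so counting reduces to measuring the length of each run. A minimal Python 3 sketch of that technique, using `itertools.groupby` (the function name and sample sentence are invented for the illustration):

```python
from itertools import groupby

def count_sorted_runs(words):
    # After sorting, equal words sit next to each other, so each group is one run.
    return [(word, sum(1 for _ in run)) for word, run in groupby(sorted(words))]

words = "the cat and the hat and the bat".split()
for word, count in count_sorted_runs(words):
    print(word, count)
```

Unlike a dictionary-based count, this needs only the current run in memory once the input is sorted, which is why it suits a shell pipeline.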

5

A PHP version of counting word frequency in a text:

// Collapse whitespace and basic punctuation into single spaces
$text = preg_replace("/[\s\.\,\!]+/", " ", $text);
$text = explode(" ", $text);
foreach (array_count_values($text) as $key => $value)
{
   echo '<br>' . $key . '-->' . $value;
}
