The Python script below calculates the frequency of words in a text (contiguous sequences of letters, ignoring punctuation marks) and prints a table of the results.
It works correctly. My question: can the same thing be done more simply (for example, in fewer lines of code) in Python, Bash, PHP or Perl, or is this already the best way?
import sys
import string

# Python 2: read the whole file named on the command line
file = open(sys.argv[1], "r")
text = file.read()
file.close()

# Identity translation table; used only so translate() can delete punctuation
table = string.maketrans("", "")
words = text.lower().split(None)

frequencies = {}
for word in words:
    # Remove punctuation characters from the token
    trimmed = word.translate(table, string.punctuation)
    frequencies[trimmed] = frequencies.get(trimmed, 0) + 1

keys = sorted(frequencies.keys())
for word in keys:
    print "%-32s %d" % (word, frequencies[word])
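For comparison, here is a minimal Python 3 sketch of the same approach (lowercase, split on whitespace, delete punctuation), assuming Python 3's three-argument `str.maketrans` and `collections.Counter`; the function name `count_words` is hypothetical. Unlike the script above, it skips tokens that consisted only of punctuation instead of counting an empty string. Like the original, it deletes punctuation inside a token, so `aa,bb` still becomes `aabb`, and `string.punctuation` covers ASCII punctuation only.

```python
import string
import sys
from collections import Counter

# Translation table that deletes every ASCII punctuation character
# (in Python 3 the third argument of str.maketrans lists characters to delete)
STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def count_words(text):
    """Count lowercase whitespace-separated tokens with punctuation removed."""
    tokens = (word.translate(STRIP_PUNCT) for word in text.lower().split())
    # Skip tokens that were made only of punctuation (e.g. "--")
    return Counter(token for token in tokens if token)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        frequencies = count_words(f.read())
    for word in sorted(frequencies):
        print("%-32s %d" % (word, frequencies[word]))
```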
Answer 1
In your example, a string like "aa,bb,cc" is counted as "aabbcc 1", but it should be:
aa 1
bb 1
cc 1
So, here is my version in Perl:
#!/usr/bin/perl
use strict;

my %result;
while (<>) {
    $result{lc $_}++ for /(\w+)/g;
}
printf "%-32s %d\n", $_, $result{$_} for sort keys %result;
Of course, it could be written as a one-liner, but that would be unreadable.
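The same idea as the Perl version (every run of word characters is a word, so `aa,bb,cc` yields three words) can be sketched in Python 3 with `re.findall` and `collections.Counter`; the helper name `count_regex_words` is hypothetical. Note that `\w` also matches digits and the underscore, so `x_1` counts as a single word.

```python
import re
from collections import Counter

# A "word" is a maximal run of word characters (letters, digits, underscore);
# for str patterns in Python 3, \w matches Unicode letters by default
WORD_RE = re.compile(r"\w+")


def count_regex_words(text):
    """Count lowercase words, splitting on anything that is not a word character."""
    return Counter(WORD_RE.findall(text.lower()))


counts = count_regex_words("aa,bb,cc\nAA bb")
for word in sorted(counts):
    print("%-32s %d" % (word, counts[word]))
```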
Answer 2
Python, with regular expressions:
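import re
import sys
import operator

file = open(sys.argv[1], "r")
# Python 2: decode the bytes so that \w with re.UNICODE matches non-ASCII letters
text = file.read().decode("utf8")
file.close()

words = re.findall(r"(\w+)", text, re.UNICODE)
stats = {}
for word in words:
    stats[word] = stats.get(word, 0) + 1

# Sort by count, least frequent first
stats_list = sorted(stats.iteritems(), key=operator.itemgetter(1))
for word, count in stats_list:
    # Encode explicitly so printing to a pipe does not fail on non-ASCII words
    print (u"%-32s %d" % (word, count)).encode("utf8")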
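The script above sorts in ascending order of frequency, so the most common words come last. If the most frequent words should come first, a small Python 3 sketch (the helper `by_frequency` is hypothetical) can sort by descending count and break ties alphabetically so the output is deterministic. `Counter.most_common()` also returns counts in descending order, but it orders equal counts by first occurrence rather than alphabetically.

```python
from collections import Counter


def by_frequency(counts):
    """Return (word, count) pairs, most frequent first, ties broken alphabetically."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


counts = Counter("the cat and the hat and the bat".split())
for word, count in by_frequency(counts):
    print("%-32s %d" % (word, count))
```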
Answer 3
My version:
#!/usr/bin/perl
use strict;

my %frec;

# Count every word found in the line passed as the first argument
sub calc {
    $frec{$1}++ while ($_[0] =~ /\b(\w+)\b/g);
}

my $filename = shift or die("Usage: $0 filenameWithText");
open(FF, '<', $filename) or die("Cannot open $filename: $!");
calc($_) for (<FF>);
# Most frequent words first
foreach (sort { $frec{$b} <=> $frec{$a} } keys %frec) {
    printf("%-32s %d\n", $_, $frec{$_});
}
close FF;
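If the input is large, the file can be processed one line at a time instead of being held in memory whole. (In Perl, `while (<FF>)` reads one line per iteration, whereas `for (<FF>)` evaluates `<FF>` in list context and reads all lines first.) A minimal Python 3 sketch, assuming UTF-8 input; `count_stream` is a hypothetical name, and an in-memory `io.StringIO` stands in for a real file.

```python
import io
import re
from collections import Counter

WORD_RE = re.compile(r"\w+")


def count_stream(stream):
    """Update the counts line by line, so the whole file is never held in memory."""
    counts = Counter()
    for line in stream:
        counts.update(WORD_RE.findall(line.lower()))
    return counts


# With a real file: with open(path, encoding="utf-8") as f: counts = count_stream(f)
sample = io.StringIO("One fish, two fish.\nRed fish, blue fish.\n")
for word, count in sorted(count_stream(sample).items(), key=lambda kv: (-kv[1], kv[0])):
    print("%-32s %d" % (word, count))
```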
Answer 4
And a Bash/AWK script, to complete the collection:
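#!/bin/bash
if [ -z "$1" ]
then
    echo "Usage: `basename $0` FILENAME"
    exit 1
fi

# Replace every run of non-word characters with a space, then print one word per line
for x in $(sed -r 's/\W+/ /g' "$1");
do
    echo $x
done | awk '{print tolower($0)}' | sort | awk '
{
    # Count runs of identical adjacent words in the sorted stream
    if (!word) {
        word = $1
        num = 0
    } else if (word == $1) {
        num++
    } else {
        print word, num + 1
        word = $1
        num = 0
    }
}
# Print the last run, which the main block never reaches
END {
    if (word) print word, num + 1
}'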
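The second awk stage counts runs of identical adjacent lines in the sorted stream (the same job `uniq -c` does). The run-length idea can be sketched in Python 3 with `itertools.groupby`, which groups only adjacent equal items and therefore also needs sorted input; `count_runs` is a hypothetical name.

```python
import re
from itertools import groupby


def count_runs(text):
    """Sort the lowercase words, then count each run of equal adjacent words."""
    words = sorted(w.lower() for w in re.findall(r"\w+", text))
    # groupby yields (key, iterator) for each run of equal consecutive items
    return [(word, sum(1 for _ in run)) for word, run in groupby(words)]


for word, count in count_runs("b a B c a b"):
    print(word, count)
```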
Answer 5
Counting word frequencies in a text with PHP:
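<?php
// $text must hold the input text, e.g. read from the file given on the command line
$text = file_get_contents($argv[1]);
// Collapse runs of whitespace, '.', ',' and '!' into single spaces
$text = preg_replace("/[\s\.\,\!]+/", " ", $text);
// Split into words, dropping empty strings at the edges
$words = array_filter(explode(" ", $text), 'strlen');
foreach (array_count_values($words) as $key => $value)
{
    echo '<br>' . $key . ' --> ' . $value;
}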
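PHP's `array_count_values` maps each distinct value to its number of occurrences, which is what `collections.Counter` does in Python. A minimal Python 3 sketch of the same split-on-separators approach, assuming the separator set `[\s.,!]` from the PHP code (so other punctuation such as `?` or `;` stays attached to words, and case is preserved); `count_split_words` is a hypothetical name.

```python
import re
from collections import Counter

# Same separator class as the PHP snippet: whitespace, '.', ',' and '!'
SEPARATORS = re.compile(r"[\s.,!]+")


def count_split_words(text):
    """Split on runs of separators and count the pieces, dropping empty strings."""
    return Counter(piece for piece in SEPARATORS.split(text) if piece)


print(count_split_words("Hi, hi! Hi.  ok?"))
```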