
Word frequency in text


The Python script below calculates the frequency of words in a text (continuous sequences of letters, ignoring punctuation marks) and prints the results as a table.

It works correctly. My question is: can the same thing be done more simply (for example, in fewer lines of code) in Python, Bash, PHP or Perl, or is this already the best way?

import sys
import string
# Read the whole file named on the command line (Python 2)
file = open(sys.argv[1], "r")
text = file.read()
file.close()
# Identity translation table: translate() is used here only to delete punctuation
table = string.maketrans("", "")
words = text.lower().split(None)
frequencies = {}
for word in words:
    trimmed = word.translate(table, string.punctuation)
    frequencies[trimmed] = frequencies.get(trimmed, 0) + 1
keys = sorted(frequencies.keys())
for word in keys:
    print "%-32s%d" % (word, frequencies[word])
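For anyone running this on Python 3, where string.maketrans no longer exists and print is a function, the same approach can be sketched as below. This is a port under the assumption that the input file is UTF-8; it keeps the original behaviour (whitespace split, then punctuation deleted from each chunk) and uses collections.Counter in place of the manual dictionary. The function name count_words is illustrative.

```python
import string
import sys
from collections import Counter

# Python 3 replacement for string.maketrans("", "") + translate(table, deletechars):
# a translation table that deletes every ASCII punctuation character.
STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def count_words(text):
    """Split on whitespace, delete punctuation from each chunk, and count."""
    words = (word.translate(STRIP_PUNCTUATION) for word in text.lower().split())
    return Counter(words)


def main():
    # Assumption: the input file is UTF-8 encoded.
    with open(sys.argv[1], encoding="utf-8") as f:
        counts = count_words(f.read())
    for word in sorted(counts):
        print("%-32s%d" % (word, counts[word]))


# Only run the script part when a filename is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Run it as python3 wordfreq.py input.txt (the script name is hypothetical).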

Answer 1

In your version, a string like “AA,BB,CC” (without spaces) is counted as a single word, “aabbcc 1”, when the result should be:
aa 1
bb 1
cc 1
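The difference comes from how the text is tokenised. Splitting on whitespace first and then deleting punctuation glues “AA,BB,CC” into one token, whereas treating every run of word characters as a word keeps the three parts apart. A small Python 3 sketch of both strategies (the function names are illustrative):

```python
import re
import string

# Deletes ASCII punctuation (the question's approach, ported to Python 3).
_DELETE_PUNCT = str.maketrans("", "", string.punctuation)


def split_and_strip(text):
    """Whitespace split, then remove punctuation inside each chunk."""
    return [w.translate(_DELETE_PUNCT) for w in text.lower().split()]


def regex_words(text):
    """Treat every maximal run of word characters as a separate word."""
    return re.findall(r"\w+", text.lower())


print(split_and_strip("AA,BB,CC"))  # ['aabbcc']
print(regex_words("AA,BB,CC"))      # ['aa', 'bb', 'cc']
```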

So here is my version in Perl:

#!/usr/bin/perl
use strict;
use warnings;
my %result;
# Count every run of word characters, lowercased
while (<>) {
    $result{lc $_}++ for /(\w+)/g;
}
printf "%-32s%d\n", $_, $result{$_} for sort keys %result;

Of course, it could be written as a one-liner, but it would be unreadable.
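For comparison, the same idea in Python 3 might look like the sketch below: re.findall(r"\w+", ...) plays the role of Perl's /(\w+)/g in list context, and collections.Counter replaces the %result hash. The function name word_table is hypothetical, and the script part reads the files named on the command line through fileinput, which is only a rough analogue of Perl's <>.

```python
import fileinput
import re
import sys
from collections import Counter


def word_table(lines):
    """Count lowercased word-character runs over all lines; rows sorted by word."""
    counts = Counter()
    for line in lines:
        counts.update(match.lower() for match in re.findall(r"\w+", line))
    return ["%-32s%d" % (word, counts[word]) for word in sorted(counts)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # fileinput.input() iterates over the lines of every file named in sys.argv[1:]
    for row in word_table(fileinput.input()):
        print(row)
```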


Answer 2

Python, with regular expressions:

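import re
import sys
import operator
# Read the file and decode it from UTF-8 (Python 2)
file = open(sys.argv[1], "r")
text = file.read().decode("utf8")
file.close()
# With re.UNICODE, \w also matches non-ASCII letters
words = re.findall(r"(\w+)", text, re.UNICODE)
stats = {}
for word in words:
    stats[word] = stats.get(word, 0) + 1
# Sort by count, least frequent first
stats_list = sorted(stats.iteritems(), key=operator.itemgetter(1))
for word, count in stats_list:
    # Encode explicitly so non-ASCII words can also be printed to a pipe
    print ("%-32s%d" % (word, count)).encode("utf8")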
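This answer decodes the bytes and passes re.UNICODE so that \w matches non-ASCII letters as well. In Python 3, patterns applied to str are Unicode-aware by default, and re.ASCII restores ASCII-only matching. The sketch below keeps the answer's least-frequent-first ordering; the function name and the Cyrillic sample are only for illustration. Note that \w also matches digits and the underscore, so “x_1” counts as one word.

```python
import re
from collections import Counter
from operator import itemgetter


def count_sorted_by_frequency(text, flags=0):
    """Count word-character runs; return (word, count) pairs, least frequent first."""
    counts = Counter(re.findall(r"\w+", text, flags))
    # sorted() is stable, so words with equal counts keep their first-seen order
    return sorted(counts.items(), key=itemgetter(1))


# count_sorted_by_frequency("ёлка, ёлка, tree")           -> [('tree', 1), ('ёлка', 2)]
# count_sorted_by_frequency("ёлка, ёлка, tree", re.ASCII) -> [('tree', 1)]
```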

Answer 3

My version:

#!/usr/bin/perl
use strict;
use warnings;
my %frec;
# Count every word in the string passed as the first argument
sub calc {
    $frec{$1}++ while ($_[0] =~ /\b(\w+)\b/g);
}
my $filename = shift or die "Usage: $0 filenameWithText\n";
open(my $fh, '<', $filename) or die "Cannot open $filename: $!\n";
calc($_) for (<$fh>);
close($fh);
# Most frequent words first
foreach (sort { $frec{$b} <=> $frec{$a} } keys %frec) {
    printf("%-32s%d\n", $_, $frec{$_});
}
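This answer sorts with { $frec{$b} <=> $frec{$a} }, so words with the same count come out in hash order, which can differ from run to run. If a deterministic table is wanted, a secondary key helps; in Perl that means extending the comparison to $frec{$b} <=> $frec{$a} or $a cmp $b. A Python 3 sketch of the same ordering (by_frequency is a hypothetical name). Counter.most_common() also sorts by count, but it breaks ties by first occurrence rather than alphabetically.

```python
from collections import Counter


def by_frequency(counts):
    """Most frequent first; equal counts ordered alphabetically."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


counts = Counter("c b a a".split())
for word, n in by_frequency(counts):
    print("%-32s%d" % (word, n))
```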

Answer 4

A Bash/AWK script, for the collection:

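#!/bin/bash
if [ -z "$1" ]
then
    echo "Usage: $(basename "$0") FILENAME"
    exit 1
fi
# Replace every run of non-word characters with a space, then print one word per line
for x in $(sed -r 's/\W+/ /g' "$1")
do
    echo "$x"
done | awk '{ print tolower($0) }' | sort | awk '
# Input is sorted, so equal words are adjacent: count each run
{
    if (!word) {
        word = $1
        num = 0
    } else if (word == $1) {
        num++
    } else {
        print word, num + 1
        word = $1
        num = 0
    }
}
# Print the last run, which the main block never reaches
END {
    if (word) print word, num + 1
}'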
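The second AWK program relies on sort having placed identical words next to each other, so it only has to count runs of equal lines (the classic shell shortcut for this step is sort | uniq -c). The same run-counting idea in Python 3 uses itertools.groupby, which likewise groups only adjacent equal items; here the last run is produced like any other, whereas in AWK it has to be printed from an END block. The function name is illustrative.

```python
from itertools import groupby


def count_sorted_runs(words):
    """Sort, then count each run of adjacent equal words (like sort | awk above)."""
    return [(word, sum(1 for _ in run)) for word, run in groupby(sorted(words))]


for word, n in count_sorted_runs("b a b c".split()):
    print(word, n)
```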

Answer 5

A PHP option for counting word frequencies in a text:

// $text holds the input text; separators become single spaces, edges are trimmed
$text = trim(preg_replace("/[\s\.\,\!]+/", " ", $text));
$text = explode(" ", $text);
foreach (array_count_values($text) as $key => $value)
{
    echo '<br>' . $key . ' --> ' . $value;
}
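Here array_count_values() does all the counting in one call. A rough Python 3 equivalent of this snippet uses re.split with the same character class and collections.Counter; the empty strings that appear when the text starts or ends with a separator are filtered out. Like the PHP, it is case-sensitive and treats only whitespace, '.', ',' and '!' as separators. The function name is illustrative.

```python
import re
from collections import Counter


def count_like_php(text):
    """Split on runs of whitespace, '.', ',' or '!' and count the pieces."""
    # re.split yields '' at the edges, e.g. after a trailing "!"; drop those.
    pieces = [piece for piece in re.split(r"[\s.,!]+", text) if piece]
    return Counter(pieces)


for word, n in count_like_php("Hi, there! Hi.").items():
    print(word, "-->", n)
```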
