Quick item search in the Python list

Two files are given: the first contains text, and the second contains words together with coefficients that characterize those words, with exactly one word and one coefficient per line. It looks roughly like this:

21230 Apple
23121 Pillow

The text files contain more than 30,000 words. The file with the words and coefficients is always the same one and its contents never change, but it has more than a million rows.

I need to get the coefficients of all words from the first file while spending as little time as possible.
At this stage there is a list (call it firstarr) of all words from the text and a list of words with coefficients (call it secarr), in which the word at index i is accessed via

secarr[i][1]

and its coefficient via

secarr[i][0]

I have never used Python before and do not know many aspects of the language, so all I came up with is a check in a nested loop:

coef = 0
for word in firstarr:
    for i in range(len(secarr)):
        if word == secarr[i][1]:
            coef += int(secarr[i][0])
            break

There is also an idea to do the same thing, but with the in operator:

for word in firstarr:
    if word in secarr:
        coef += int(secarr[secarr.index(word)][0])

But I doubt that this shortens the search time (though I may be wrong about that). Are there more elegant ways to reduce the time spent on the search?


Answer 1, Authority 100%

Looking up coefficients in a million-item list by scanning it from the start is one of the least efficient approaches. Fetching values from a dictionary (hash table) is the solution to your task.
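
To illustrate the difference, here is a minimal sketch on made-up data. The sample pairs, the word list, and the function names total_by_scan and total_by_dict are assumptions for illustration only; the layout (coefficient first, word second) follows the question.

```python
# Hypothetical sample data in the shape described in the question:
# each entry is (coefficient, word), both as strings.
secarr = [("21230", "Apple"), ("23121", "Pillow"), ("500", "Chair")]
firstarr = ["Pillow", "Apple", "Unknown", "Apple"]


def total_by_scan(words, pairs):
    """Linear scan: up to len(pairs) comparisons for every word."""
    total = 0
    for word in words:
        for coef, known in pairs:
            if word == known:
                total += int(coef)
                break
    return total


def total_by_dict(words, pairs):
    """Build a hash table once; each lookup is then constant time on average."""
    table = {known: int(coef) for coef, known in pairs}
    # Words that are missing from the table contribute 0.
    return sum(table.get(word, 0) for word in words)


print(total_by_scan(firstarr, secarr))
print(total_by_dict(firstarr, secarr))
```

Both functions return the same sum, but with N words and M pairs the scan does on the order of N * M comparisons, while the dictionary version does roughly N + M operations.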

Build a dictionary (dict) from the second file and save it to a file. For example, like this:

import pickle
# Dictionary with coefficients
d = {
  'Word1': 101,
  'Word2': 102,
  'Word3': 103,
  'Word4': 104,
# ...
}
# The dictionary can be built from your "second" file, but for that you need to know its structure
with open('dict.pickle', 'wb') as dictfile:
    pickle.dump(d, dictfile)
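
If the second file really has the structure shown in the question (a coefficient, whitespace, then the word, one pair per line), the dictionary could be built roughly as follows. This is only a sketch under those assumptions: the function names parse_coefficients and build_pickle, the UTF-8 encoding, and the integer coefficients are not confirmed by the question.

```python
import pickle


def parse_coefficients(lines):
    """Turn lines like '21230 Apple' into {'Apple': 21230}.

    Assumes the coefficient comes first and is an integer.
    """
    table = {}
    for line in lines:
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        coef, word = parts
        # If a word occurs twice, the last coefficient wins.
        table[word.strip()] = int(coef)
    return table


def build_pickle(source_path, pickle_path):
    """Read the word file once and store the resulting dict with pickle."""
    with open(source_path, encoding="utf-8") as source:
        table = parse_coefficients(source)
    with open(pickle_path, "wb") as target:
        pickle.dump(table, target)
    return table
```

Because the word file never changes, build_pickle only needs to run once; later runs can load the pickled dictionary directly.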

Then, before calculating the coefficients, load the dictionary from the file:

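import pickle
# Load the previously saved dictionary of coefficients
with open('dict.pickle', 'rb') as dictfile:
    secarr = pickle.load(dictfile)
coef = 0.
for word in firstarr:
    # Dictionary membership test and lookup take constant time on average
    if word in secarr:
        coef += secarr[word]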
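
The question asks for the coefficients of the words in the text, so it may help to keep the per-word values as well as the sum. The sketch below assumes the text is split on whitespace and that words must match the dictionary keys exactly (same case, no punctuation stripping); coefficients_for_text is a hypothetical helper name.

```python
def coefficients_for_text(text, table):
    """Return (found, total) for a text and a word -> coefficient dict.

    found maps each distinct known word to its coefficient;
    total adds the coefficient once per occurrence in the text.
    """
    words = text.split()
    found = {word: table[word] for word in set(words) if word in table}
    total = sum(found[word] for word in words if word in found)
    return found, total


# Hypothetical usage with the sample values from the question.
table = {"Apple": 21230, "Pillow": 23121}
found, total = coefficients_for_text("Apple Pillow Apple pear", table)
print(found, total)
```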
