java - Runtime of this implementation to find k most frequently recurring words in a string -

April 15, 2014

here's algorithm:

(a) make wordoccurrence class contains string sentence , int numberofoccurrences. let's assume sentence has n words. make class implement comparable, , make comparator numberofoccurrences first, , (arbitrarily,) alphabetical (natural) ordering of word second.

(b) iterate through sentence .split(‘ ‘) (or in place iteration save space). create new wordoccurrence object each unique word, updating occurrence , placing wordoccurrence objects treemap<wordoccurrence>.

(c) create new wordoccurrence object each unique word, placing wordoccurrence objects treemap<wordoccurrence> (and updating words' occurrences along way).

(d) call highestkey() on treemap, , place returned word resultant list. call lowerkey returned word (k - 1) times, placing words same resultant list.

(e) return resultant list.

my question: runtime of this? understanding:

step (b) takes o(n) time.

step (c) takes o(n*log(n)), each of n words, insertion o(log n).

step (d) takes o(k*log(n)), each call of highestkey() or lowerkey() takes o(log n) time.

so total runtime be: o(n + n*log(n) + k*log(n), o(n*log n).

is there tighter bound algorithm, or way o(n)?

thanks.

yours poor approach several reasons:

swaddling in object going make constant factor here unreasonably high. you're creating @ least 2 objects per word in string; given average english word 6 letters, can figure out average blowup.
you can use treemap<string, integer> instead of using wordoccurrence thingamajig. still horrible first reason, reduces amount of code have write.
your reasoning not take consideration fact keys strings , hence variable-length. not think affects anything, mark down if teaching assistant.
for data structure, use hash table instead. can use hashmap in java instead , better. might @ trove or hashmap values ints instead of integers.

also, there data structure called suffix tree. can build suffix tree on string in linear time , explore figure out distinct words , count of each. can linear-time selection find top k. loses hash table in practise, in theory avoids constant-time-lookup hash tables.

Search This Blog

My

java - Runtime of this implementation to find k most frequently recurring words in a string -

Comments

Post a Comment

Popular posts from this blog

javascript - RequestAnimationFrame not working when exiting fullscreen switching space on Safari -

c# - How do I get the Nth largest element from a list with duplicates, using LINQ? -

jsf - How to ajax update an item in the footer of a PrimeFaces dataTable? -