2022-07-11

Why a next-word prediction application?

Around the world, people are spending an increasing amount of time on their mobile devices for email and social networking. But typing on mobile devices can be a serious pain. This is the incentive to develop an algorithm that predicts the next word from the preceding words, making typing faster and easier for users.

How does it work?

The user types a string of words, and the application predicts the next word from the preceding words using the Stupid Backoff algorithm [1]. It first takes the last four words and looks for a matching fifth word. If there is no match, it backs off to the last three words and looks for a matching fourth word, and so on, until it has found seven matching words. If there is no match at all, the app returns the seven most frequent words in the corpus.

[1] T. Brants et al., “Large Language Models in Machine Translation,” in EMNLP/CoNLL 2007. http://www.aclweb.org/anthology/D07-1090.pdf
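The backoff lookup described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the app's actual code: the function names `build_ngram_tables` and `predict_next`, the whitespace tokenization, and the in-memory count tables are all hypothetical. It follows the fill-until-seven procedure described here, ranking candidates by raw count within each backoff level; the scoring in [1] additionally multiplies lower-order scores by a fixed backoff factor, which this sketch omits.

```python
from collections import Counter, defaultdict


def build_ngram_tables(sentences, max_order=5):
    """Count which words follow every context of 0..max_order-1 words.

    The empty context () holds plain word frequencies (unigrams).
    """
    tables = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()  # assumption: whitespace tokenization
        for i, word in enumerate(tokens):
            for k in range(max_order):
                if i - k < 0:
                    break  # not enough preceding words for this context length
                context = tuple(tokens[i - k:i])
                tables[context][word] += 1
    return tables


def predict_next(text, tables, top_n=7, max_order=5):
    """Return up to top_n candidate next words, longest context first."""
    tokens = text.lower().split()
    suggestions = []
    # Try the last 4 words, then the last 3, ... down to no context at all.
    for k in range(max_order - 1, -1, -1):
        if k > len(tokens):
            continue  # the input is shorter than this context length
        context = tuple(tokens[len(tokens) - k:])
        for word, _count in tables.get(context, Counter()).most_common():
            if word not in suggestions:
                suggestions.append(word)
                if len(suggestions) == top_n:
                    return suggestions
    return suggestions


# Tiny made-up corpus, for illustration only.
corpus = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the cat sat on the mat",
    "a dog sat on the rug",
]
tables = build_ngram_tables(corpus)
print(predict_next("the cat sat on the", tables))
```

With an unseen input, no context longer than zero words matches, so the function falls through to the most frequent words in the corpus, which mirrors the fallback described above.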

Data

The data comes from a large database of text from three sources (blogs, news and tweets) in four languages: English, German, Russian and Finnish. The English database was chosen for training the predictive algorithm.

The English corpus files:
File name          Size (MB)  Lines
en_US.blogs.txt    200.4242   899288
en_US.news.txt     196.2775   77259
en_US.twitter.txt  159.3641   2360148
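A summary like the one above could be produced with a short helper such as the following. This is only a sketch: the function name `corpus_file_stats` is hypothetical, the size is assumed to be measured in units of 1024² bytes, and the line count here is the number of newline-delimited lines read in binary mode, which may differ from counts produced by other tools.

```python
import os


def corpus_file_stats(path):
    """Return (size in MB, number of lines) for one text file."""
    # Assumption: 1 MB = 1024 * 1024 bytes.
    size_mb = os.path.getsize(path) / 1024 ** 2
    # Binary mode avoids decoding errors on unusual characters.
    with open(path, "rb") as handle:
        lines = sum(1 for _ in handle)
    return size_mb, lines
```

For example, `corpus_file_stats("en_US.blogs.txt")` would return the size and line count for the blogs file, assuming it is in the working directory.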

Data product

The application is hosted on shinyapps.io. The user writes a string of words, and the app plots a word cloud of the expected next words. Word size is meaningful: larger words are the more accurate predictions. The colors of the words are random. Shiny app link: Next word prediction application