Show simple item record

dc.contributor.advisorPu, Ken
dc.contributor.authorDeighton, Richard A.
dc.date.accessioned2013-01-10T15:19:07Z
dc.date.accessioned2022-03-29T17:30:12Z
dc.date.available2013-01-10T15:19:07Z
dc.date.available2022-03-29T17:30:12Z
dc.date.issued2012-12-01
dc.identifier.urihttps://hdl.handle.net/10155/295
dc.description.abstractThis thesis represents the results of a study into using fingerprints generated according to the Rabin-Karp Algorithm, and a database LevelDB to achieve Text Search times below GREP, which is a standard command-line UNIX text search tool. Text Search is a set of algorithms that find a string of characters called a Search Pattern in a much larger string of characters in a document we call a text file. The Rabin-Karp Algorithm iterates through a text file converting character strings into fingerprints at each location. A fingerprint numerically represents a window length string of characters to the left of its location. The algorithm compares the calculated fingerprint to the Search Pattern’s fingerprint. When fingerprints are not equal, we can guarantee the corresponding strings will not match. Whereas when fingerprints are, the strings probably match. A verification process confirms matches by checking respective characters. Our application emerges after making the following major changes to the Rabin-Karp Algorithm. First, we employ a two-step technique rather than one. During step 1, the preprocessing step, we calculate and store fingerprints in a LevelDB database called an Index Database. This is our first major change unique to us. Step 2, the matching step, is our second unique change. We use the Index Database to look-up the Search Pattern’s fingerprint and gather its set of locations. Finally, we allow the pattern to be any length relative to the window length. We even created an equation to check if the difference in length is too long for the fingerprint’s number system base. We facilitated our performance experiments by first building our application and testing it against GREP for a wide range of different parameters. Our conclusions and recommendations determine that although we currently only outperform GREP in about half the cases, we identify some promising opportunities to modify some parts of our application so that we can outperform GREP in all instances.en
dc.description.sponsorshipUniversity of Ontario Institute of Technologyen
dc.language.isoenen
dc.subjectRabin-Karp fingerprintsen
dc.subjectText searchen
dc.subjectLevelDBen
dc.titleUsing Rabin-Karp fingerprints and LevelDB for faster searchesen
dc.typeThesisen
dc.degree.levelMaster of Science (MSc)en
dc.degree.disciplineComputer Scienceen


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record