Intrinsic Plagiarism Detection in Digital Data

Paper Topic :

Pattern Recognition

Author Name :

Netra Charya

Abstract :

The steady growth of research is leading to the publication of many research papers and articles on the World Wide Web, so the chances of content being repeated are high. This leads to plagiarism in research papers, which undermines the authenticity of achievements in the affected research field. Much progress has been made in creating tools that detect data plagiarized from web sources. We present novel software that can instead identify plagiarized sections in digital data taken from sources that are unavailable on the internet. The main idea behind this software is to analyze the grammar usage and sentence constructions of the author. Sentences are compared with each other to determine the deviation among them: grammar trees are formed for the sentences in the submitted data snippet, pq-gram distances are computed between pairs of these trees, and further mathematical calculations are performed. In this way, the possibly plagiarized sentences in digital data are determined. For a thorough examination of the authenticity of digital data on the World Wide Web, the proposed system can be used as a complementary tool to the available online tools.
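
The pq-gram comparison mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the tuple-based tree encoding, the function names pq_profile and pq_gram_distance, the defaults p=2 and q=3, and the hand-built example trees are all assumptions made for illustration. A real system would presumably obtain the grammar trees from a syntactic parser. The distance used is the commonly cited normalized form, 1 - 2 * |P1 ∩ P2| / (|P1| + |P2|), computed over bags of pq-grams.

```python
from collections import Counter

# A tree node is (label, [children]); this encoding is an assumption of the sketch.

def pq_profile(tree, p=2, q=3):
    """Return the bag (Counter) of pq-grams of a tree, padding with '*'."""
    profile = Counter()

    def visit(node, ancestors):
        label, children = node
        # Stem: the last p labels on the path from the root, ending at this node.
        stem = ancestors[1:] + (label,)
        window = ("*",) * q
        if not children:
            # A leaf contributes one pq-gram whose base is all padding.
            profile[stem + window] += 1
            return
        for child in children:
            # Slide the base window across the children, left to right.
            window = window[1:] + (child[0],)
            profile[stem + window] += 1
            visit(child, stem)
        for _ in range(q - 1):
            # Flush the window with trailing padding.
            window = window[1:] + ("*",)
            profile[stem + window] += 1

    visit(tree, ("*",) * p)
    return profile


def pq_gram_distance(tree_a, tree_b, p=2, q=3):
    """Normalized pq-gram distance: 0.0 for identical profiles, 1.0 for disjoint ones."""
    a = pq_profile(tree_a, p, q)
    b = pq_profile(tree_b, p, q)
    shared = sum((a & b).values())  # size of the bag intersection
    total = sum(a.values()) + sum(b.values())
    return 1 - 2 * shared / total


# Hypothetical grammar trees for two sentences with different structure.
sentence_1 = ("S", [("NP", [("DT", []), ("NN", [])]), ("VP", [("VBZ", [])])])
sentence_2 = ("S", [("VP", [("VB", [])]), ("NP", [("NN", [])])])

print(pq_gram_distance(sentence_1, sentence_1))  # identical trees
print(pq_gram_distance(sentence_1, sentence_2))  # structurally different trees
```

In an intrinsic setting, such distances would be computed for pairs of sentences within the same document, and sentences whose distances to the rest deviate strongly would be treated as candidates for closer inspection. How that deviation is measured is not specified in the abstract.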
