About: A significant amount of the textual content available on the Web is stored in PDF files. These files are typically converted into plain text before they can be processed by information retrieval or text mining systems. Automatic conversion typically introduces various errors, especially if OCR is needed. In this empirical study, we simulate OCR errors and investigate the impact that misspelled words have on retrieval accuracy. In order to quantify such impact, errors were systematically inserted at varying rates in an initially clean IR collection. Our results showed that significant impacts are noticed starting at a 5% error rate. Furthermore, stemming has proven to make systems more robust to errors.
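The error-insertion step described above can be illustrated with a minimal sketch. The abstract does not say how errors were generated, so everything here is an assumption made for illustration: the error rate is taken to be the fraction of whitespace-separated words that get corrupted, each corrupted word has exactly one character substituted, and the names `corrupt_word` and `inject_errors` are hypothetical rather than taken from the study.

```python
import random
import string


def corrupt_word(word: str, rng: random.Random) -> str:
    """Replace one randomly chosen character with a different lowercase letter."""
    if not word:
        return word
    pos = rng.randrange(len(word))
    # Exclude the original character so the word is guaranteed to change.
    choices = [c for c in string.ascii_lowercase if c != word[pos]]
    return word[:pos] + rng.choice(choices) + word[pos + 1:]


def inject_errors(text: str, error_rate: float, seed: int = 0) -> str:
    """Corrupt round(error_rate * n) of the n whitespace-separated words.

    Assumption: "error rate" means the share of misspelled words, not the
    share of wrong characters. Original whitespace is normalized to single
    spaces in the returned text.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed keeps each noisy copy reproducible
    words = text.split()
    n_errors = round(error_rate * len(words))
    # Sample distinct positions so no word is corrupted twice.
    for i in rng.sample(range(len(words)), n_errors):
        words[i] = corrupt_word(words[i], rng)
    return " ".join(words)
```

Under these assumptions, a set of degraded copies of a clean document could be produced with something like `{rate: inject_errors(doc, rate) for rate in (0.01, 0.05, 0.10, 0.20)}`, after which each copy would be indexed and evaluated against the same queries. The specific rates listed are illustrative only; the source states just that effects became significant from a 5% error rate.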



An Entity of Type : fabio:Abstract, within Data Space : wasabi.inria.fr associated with source document(s)

Attributes	Values
type	abstract
value	A significant amount of the textual content available on the Web is stored in PDF files. These files are typically converted into plain text before they can be processed by information retrieval or text mining systems. Automatic conversion typically introduces various errors, especially if OCR is needed. In this empirical study, we simulate OCR errors and investigate the impact that misspelled words have on retrieval accuracy. In order to quantify such impact, errors were systematically inserted at varying rates in an initially clean IR collection. Our results showed that significant impacts are noticed starting at a 5% error rate. Furthermore, stemming has proven to make systems more robust to errors.
Subject	Computer-related introductions in 1993; Unicode; Information retrieval; Artificial intelligence applications; Computational linguistics
part of	Assessing the Impact of OCR Errors in Information Retrieval
is abstract of	Assessing the Impact of OCR Errors in Information Retrieval
is hasSource of	covid:ann/target/4a0c72dbb7cbfe8b423b7cdd00a156d00428d34f covid:ann/target/12d0de9b2095f7f19d177bdd8b09d7d353bcde4f covid:ann/target/dc2bffabc891a478c9f902cdc5455c657ad50863 covid:ann/target/5e35536509d1a127cf830003d54514442f00f59d covid:ann/target/9100382affea93e2270870e8e196ba52f0ec4b55 covid:ann/target/a19ddb6262e1200975293b746286a467bf441bba covid:ann/target/eb756eeab613cfc8289716e5a6f47d48f82c475a covid:ann/target/42446da1fc73383b07f4d5069aafc2df7cf1bcde covid:ann/target/34479f547c9ff6f3199c09af87b139ee991d67ea covid:ann/target/a7702c7d95a9132743028ad00dab6a32b6636705 covid:ann/target/b23443bfe7d0f09aa4365d22caa1f58501c92735 covid:ann/target/a17d50d6ff3609bf9b3851ff0972dc691242a5b6 covid:ann/target/6b53da2d6a2ab3283528346b085588d4aecd916e covid:ann/target/ed2fa54e8f9e03e02d36def26db6225cf754ce94 covid:ann/target/1b352b5f4f34e64a042d43e2d367691b96a2e4aa covid:ann/target/5172863c5eb88e92f811afca78ef0cfd2081eb9f covid:ann/target/79b6df08b45c80c165e2c3d27108f662eb732914 covid:ann/target/6acf44a437a323df5d733e057f54efc90d3b0810 covid:ann/target/b90a1678585523b4351572373150f987de9960f4 covid:ann/target/913bd70b2711ec62465bf159bb4ebce62d62b878 covid:ann/target/8c6e207391d791801e83b1e98d6533cf7d790c55 covid:ann/target/4c79ee8c6acf90fe2b4e6ab5b852289b9e75a222 covid:ann/target/b3f6b8da2699b492195b24884580e22d7f2952f9 covid:ann/target/8edc77af630a88e234dfa8f618bca5a7b4e2ad80 covid:ann/target/b9f1309e6ed7189399b5b7fe725dbac74f88de00 covid:ann/target/0e976cff1615b4baab6d889cc48e0ce3ce4ab24c covid:ann/target/ef818486937290f0ff2975bf4053724d99a6946d covid:ann/target/473ddaeb6a0f127ff854882e60d13d06519644bb covid:ann/target/a2391138627a670fd7ab91fb9cd04ea45ad395f5 covid:ann/target/143925c092a6cdfb9cf16d9708bbd58c5f0bf88b covid:ann/target/53eb2a0b099d0e6c9660f79867c2ad15c440bb42 covid:ann/target/0ba745f11ffdc11ac2e801078980ed3a60cc4072 covid:ann/target/fface75b5df97c2349736097b17c7c8b2f121610 covid:ann/target/3229d3c3e8131249d5ba04cb8d8b5334a53ff734 
covid:ann/target/b0e1dff4a06d96244d41acf11fc026b98acf5e76 covid:ann/target/c0d19a1187133b933f24f2a28531a198d263b4ac covid:ann/target/5324475275b21d30142fc201b3cb73deb82784c7 covid:ann/target/3672abf961a64387270e075808953df173f19abe covid:ann/target/09c808df8eddd0ff4c2cbfb7db8335bca446b46a covid:ann/target/18bedb707d8bdebf5e6336346cabe545db728a7e covid:ann/target/1e8d4f8de3a88aff9e02f3bc1ddcdea623dd6764 covid:ann/target/a23849ffa8e918f6017919e5718763df265d6cb0 covid:ann/target/25d6c8736581a8a92cd9a21d265d2f9a034af293 covid:ann/target/194e06bb07e12f0c676c5e1c0326af4a16d8c8ca covid:ann/target/2b27d7d2f737fef0759aa860185f9279870480b2 covid:ann/target/5516d42a49d0ac0dd4fc996eb3487ba95e9d3d41 covid:ann/target/834f1e055eadf9e4a56d2846064aa38a312bdb6f covid:ann/target/d97d6c914c0b802f72a9c77b950f55d40d5d22c8 covid:ann/target/fb79b3c30c66449a63e1567ce89348919f1b26cb covid:ann/target/2aed08c03c1d51a50829819f0324a170378f0dcb covid:ann/target/1db4d11d619f246a23ba980e9f7b5370e1086ad8 covid:ann/target/cad7880993dceb155fa71676086046ea20d8400b covid:ann/target/e9bfffdf094b374047900e1c80ac505ab1ccfc22 covid:ann/target/e439418ca2dfca30517e184e608ac19e18b29b34 covid:ann/target/27b13048e36fa30dc3d7d6f73ac28c5d71ca708a covid:ann/target/95c5712861cbfe9bf3bd784a650c56eea11b76d3 covid:ann/target/2f659acc43c35fbf4ea0640bda0a2917e915061d covid:ann/target/98a4173a67a17afa69561cce402392a1a452b3a4 covid:ann/target/fee1d166edd959aa7abc6ce74979279fbfb5d417 covid:ann/target/7d8376e8b0cf9e673acd39a645021b7e0ebcaae0 covid:ann/target/159c56e0ecb47279ffe553f251d78beac508d1a6 covid:ann/target/34d6328c834d4ea6b389375f24eb5dad792c599e covid:ann/target/50fffed1ae061c52e3bc0ec0fe4a665bfc6a1dd0 covid:ann/target/ecae4396f4ffe98831d4ee53bbca03413188a3d5 covid:ann/target/357975d6141d6f1a34df540842357012fb57d2a1 covid:ann/target/f2d745f08c0bf35571026c3325cfea7711b8f68f covid:ann/target/88569fff3efa7cdbf4d83bdbddedacdebbc238bd covid:ann/target/9771bdabeed4e0b2f9cee31fb8615e8a7276e06c 
covid:ann/target/9bf27107747e47a4f452c6b9316d9652002bd61f covid:ann/target/c89e9a5ebb8a60c746d608d9e0ac1bf60e0d1926 covid:ann/target/78e414c470417c22dbad14ebf80ee67399007dc5 covid:ann/target/c4406f968e7158fbf78f9a6e30343ce851438b52 covid:ann/target/ca2a182a7d8183c66b1c6cfebc2b5d75b58b775e covid:ann/target/cd66fb1b89b6cfc7cc2946d3174ba7e0c20bcaeb covid:ann/target/ce41dd7e4bf9decae9abf1a0c5f0383f1efedb65 covid:ann/target/f171bc04e76ea91a0153056560f99b7f73f536a7 covid:ann/target/3b602e7eaf01e4115c6afe206cfda16facb69b0c covid:ann/target/5bb098772e7cb18fcb756c050a4bbb53b3012385 covid:ann/target/dff5fe1f7ab8df3f9dc6c0455c9abd20e7bb11bc covid:ann/target/fd396bf434fcc52341485f0da49a7c1fdcb32795 covid:ann/target/c3d2bc9dbb69ec2a18f5b26e1f34b52397e45045 covid:ann/target/b988f105d6a7a7f71fd2bc96eb81b746ad90d31e covid:ann/target/d28cc89b64a9ee3cc5ec5d612f24e4a74ce81e56 covid:ann/target/f88be36841c6b1369cbae742f4f13db3e6cc4fc3 covid:ann/target/03907d4a44093b76c659cd3b93a856dfcc080433 covid:ann/target/0b460499a3ad1b524e551d9b63a49942200cb919 covid:ann/target/4c99cdcedb7b893c4f35227af17fa553161d6a3c covid:ann/target/4f68ca424aca7aebfb4f6353814db6bc9a621916 covid:ann/target/ebc55e80bb87964620f145cb72dbd9c73ce47a22 covid:ann/target/f5f61ae21ba03e31938ee162222ae25038918f08
