About: In order to combat the COVID-19 pandemic, society can benefit from various natural language processing applications, such as dialog-based medical diagnosis systems and information retrieval engines calibrated specifically for COVID-19. These applications rely on the ability to measure semantic textual similarity (STS), making STS a fundamental task that can benefit several downstream applications. However, existing STS datasets and models fail to translate their performance to a domain-specific environment such as COVID-19. To overcome this gap, we introduce the CORD19STS dataset, which includes 13,710 annotated sentence pairs collected from the COVID-19 Open Research Dataset (CORD-19) challenge. Specifically, we generated one million sentence pairs using different sampling strategies. We then used a fine-tuned BERT-like language model, which we call Sen-SCI-CORD19-BERT, to calculate similarity scores between the sentence pairs and to select a subset balanced across the different semantic similarity levels, which gives us a total of 32K sentence pairs. Each sentence pair was annotated by five Amazon Mechanical Turk (AMT) crowd workers, where the labels represent different semantic similarity levels between the sentence pairs (i.e., related, somewhat-related, and not-related). After employing rigorous qualification tasks to verify the collected annotations, our final CORD19STS dataset includes 13,710 sentence pairs.
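
The two selection steps described in the abstract (balancing pairs by model similarity score, then combining five crowd labels per pair) can be illustrated with a minimal sketch. The abstract does not say how either step was implemented, so everything here is an assumption: scores are taken to lie in [0, 1], balancing uses equal-width score bins, labels are combined by majority vote, and the names `balance_by_score`, `majority_label`, `n_bins`, `per_bin`, and `min_agreement` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def balance_by_score(scored_pairs, n_bins=3, per_bin=2, seed=0):
    """Sample up to `per_bin` pairs from each equal-width score bin.

    Assumption: every score lies in [0, 1]. The abstract does not
    specify the binning scheme; equal-width bins are illustrative only.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for pair, score in scored_pairs:
        # Clamp so that score == 1.0 falls into the last bin.
        idx = min(int(score * n_bins), n_bins - 1)
        bins[idx].append((pair, score))
    sample = []
    for idx in range(n_bins):
        bucket = bins[idx]
        rng.shuffle(bucket)
        sample.extend(bucket[:per_bin])
    return sample


def majority_label(votes, min_agreement=3):
    """Return the label chosen by at least `min_agreement` annotators.

    Returns None when no label reaches the threshold. Majority voting
    is an assumed aggregation rule, not one stated in the abstract.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agreement else None
```

With five annotators and three labels, a threshold of three votes means a pair with a 2-2-1 split gets no label; in this sketch such pairs would simply be set aside.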


An Entity of Type : fabio:Abstract, within Data Space : wasabi.inria.fr associated with source document(s)

Attributes	Values
type	abstract
value	In order to combat the COVID-19 pandemic, society can benefit from various natural language processing applications, such as dialog-based medical diagnosis systems and information retrieval engines calibrated specifically for COVID-19. These applications rely on the ability to measure semantic textual similarity (STS), making STS a fundamental task that can benefit several downstream applications. However, existing STS datasets and models fail to translate their performance to a domain-specific environment such as COVID-19. To overcome this gap, we introduce the CORD19STS dataset, which includes 13,710 annotated sentence pairs collected from the COVID-19 Open Research Dataset (CORD-19) challenge. Specifically, we generated one million sentence pairs using different sampling strategies. We then used a fine-tuned BERT-like language model, which we call Sen-SCI-CORD19-BERT, to calculate similarity scores between the sentence pairs and to select a subset balanced across the different semantic similarity levels, which gives us a total of 32K sentence pairs. Each sentence pair was annotated by five Amazon Mechanical Turk (AMT) crowd workers, where the labels represent different semantic similarity levels between the sentence pairs (i.e., related, somewhat-related, and not-related). After employing rigorous qualification tasks to verify the collected annotations, our final CORD19STS dataset includes 13,710 sentence pairs.
subject	Science in society; Language modeling; 2019 disasters in China
part of	CORD19STS: COVID-19 Semantic Textual Similarity Dataset
is abstract of	CORD19STS: COVID-19 Semantic Textual Similarity Dataset
is hasSource of	covid:ann/target/19126e336466385d74761c765c5e1f363a47fd25 covid:ann/target/0b2c876c7a22bc65eb21878d2398c8f86edd61a7 covid:ann/target/3074d78115c1ecb107ad282dbef1a37a323c7096 covid:ann/target/f452ed8cc27149d1d225c51f2aa9ee9752b413ff covid:ann/target/9c132e9f6cb04cfd12161c8d49b9cb0850132fb2 covid:ann/target/188942d6d6d376d0f6138f34e5d13cb8b56f424b covid:ann/target/442fcd33993f6251738d4be9bd2b572b5138b7c9 covid:ann/target/8c1e44d554c2c8e80dacfab369c6a7f3fe53a04c covid:ann/target/9c4b2bd2913f3292f44c2dae23e8ac326b8246b6 covid:ann/target/a0f62d840dc718b322fd387da60f59af6e980e98 covid:ann/target/9b11a6e9b427bc197abd5a5bb90ca6c3df68a188 covid:ann/target/d1225ed2596fc6b620bd19ccda99b93968af178d covid:ann/target/ba7f3ce04c01571ed956f910b552c521c5cc42f1 covid:ann/target/4897c80b2573ef889a8bed3a492914f1763796f7 covid:ann/target/99b831a2f26bd6be724bbbd59dd0f627f169437f covid:ann/target/e1c3e17e5285c3105c835be06b83ca90d3a8eb25 covid:ann/target/aac5200ae020b62717f860e295825f1d27c14848 covid:ann/target/11b470da1e03384c9001ac55082873d108eb73ae covid:ann/target/4318ad4fa82eaa59efb7fb217194f22dccdbbb31 covid:ann/target/96d996f84b24ad60bb88c1422f3332974a3cfad5 covid:ann/target/9dc5908d744517549823b1ec9929ed1a4a7050fc covid:ann/target/b746dd3fe94190c8961d389b01d97f3259797933 covid:ann/target/c40309a10017c7f4f307b7a7b5d684fcfe087c25 covid:ann/target/012f551ce1ba95c7cb442114125c0be548f14c52 covid:ann/target/358b9b8a2161dd887c586746dcccdcc6e981e7cf covid:ann/target/3de710c5e1c79e7db28fb590095dd4a4c7039fdf covid:ann/target/98d6765dc5775db1b045ee5d5be823ade7f73b65 covid:ann/target/99f1de444ef7f1c70adf0297be919b9b53239de8 covid:ann/target/dca1d8beaa29cde7541d5bcd1911eda6a1216629 covid:ann/target/7cb9be4f1abc816ab585d872fb6aa31d91474d10 covid:ann/target/8aa2cf6c413ac6da8205ac970431ca8742980947 covid:ann/target/ae11e8624bac6ba54fd52a4d31e605cab4b841db covid:ann/target/af62348a0436d2998c533dfa0ad26e7918d0e591 covid:ann/target/a43f79ee6e63040e481bcf2f5144259628a8bcaf 
covid:ann/target/bdc2972d35b957f684c6670b2f25a36cecb774e3 covid:ann/target/0fcba307f9273543dda18031e5d9a7a883bd6019 covid:ann/target/d8685dc3f90aab26f3e3dbe7d47c9b5ec00bd81d covid:ann/target/0022e96ef165e93a0074c652fcbd4085e3e83a34 covid:ann/target/2a84eaf245d27c7ff35d7cbb30bd830884d45c8a covid:ann/target/320d42dae386a9f84328c005b630b6e4ee054b6b covid:ann/target/8fb3bb760fa38d83ce59094790b010b45cbf7d55 covid:ann/target/a9f9f036d02b6bc48a850f9b271ab11475272266 covid:ann/target/ca291505c5ad35b359119df3021372415aab9cd5 covid:ann/target/2f6ee9653e10cd996bef537c9d8be93105f24ca1 covid:ann/target/9abe9bcaf98f41593c65b0e285c6758c7acfc643 covid:ann/target/b56da9d7ba6602538ae5378e6ca9f2076718dadc covid:ann/target/f9f4f7cdc272a92aca010d6d6fa10e8068d469e7 covid:ann/target/0211e788501089323a4cfc0571d031323fe0e146 covid:ann/target/0fc6c224540c670e323a28b68b1f24e890b5a337 covid:ann/target/2311d2d42d62adc6c3f063fc1c32ed922c823d07 covid:ann/target/4bc9747595c4ea1af514c12f109ba8bdff2c8028 covid:ann/target/13bfe18decbfc2c50957931f7b807bde8107c6d6 covid:ann/target/88cc75515055634ab251e194453eb19e2a267f52 covid:ann/target/b0d0b3faae018b5cde5c028f1094e16b2c30fbd5 covid:ann/target/8a7e8aec9338c6bb8970fc403453c737e2314f0c covid:ann/target/06d9d1d346ad9f14c223e0abf36933f6f3fad9e1 covid:ann/target/2abdb4fab03d30ec5b419e92e56e69b1d51c19df covid:ann/target/3ded0d4e606b5ef7d9b98eb42836acced9dcece2 covid:ann/target/6ed7ae9ed069443fba62839dfda14aedd301755d covid:ann/target/a46f9444b188c8a67e72d68e0c68ae89aef355b3 covid:ann/target/a4dd45f2a66687475a747381a5a7104beb8bb7cc covid:ann/target/c0d2e06d06ff13c15302a44e01816eb4320cfc19 covid:ann/target/f93f51d45bdd86e13bbdd3fb3a640cfeae189c67 covid:ann/target/24bce7221efec48cf216505ed68d1de2da9c33f7 covid:ann/target/9871db924be0899e50622e0f480f3d2ce4932f65 covid:ann/target/b81cee343b9c3a1fec16e90fdb98904274a4bad9 covid:ann/target/b9492b1f3bea8dc82435986869166c7ee8be86fb covid:ann/target/ed4c2be6ec9c7a909879d55e72fcc5870998c321 
covid:ann/target/1de2f8c5c02924e74d1b45e97d359da587c957f4 covid:ann/target/42735816a02afa552e662cf8e8ebf8037777ae48 covid:ann/target/70d9aa7426b18a499d1070e5f5a1ad45a0e258b1 covid:ann/target/8f910d07a901a4e53ee89ac80179e203ebc7f5ff covid:ann/target/8a6ed30e184a41ab4e8ab34caf5b7351c1f1b1e3 covid:ann/target/c5a70848bd7cee81e48b7ab800790931cbf128d1 covid:ann/target/2c36cc8d6603c91b9e200fccaf369f2b0fcf8d25 covid:ann/target/507749e7b4cc9922137871476514492796b3dc83 covid:ann/target/5da9c594a5284697cebb4ff0c5ce7aa7af859174 covid:ann/target/15a52aa8cef64dbea686a29e7f961ae18211eb02 covid:ann/target/4d2c12e22f9617ae54972d9f288c18cd3406e08e covid:ann/target/a4be24a8d9f95c50048e5c48b0a6bde049ffb455 covid:ann/target/bb83f3ee43c542d4c351d1571c3e06453ab75701 covid:ann/target/ecf3e25158bd482d17db279422221cf97d4c9628 covid:ann/target/7cc5ae6b76d0163d4410dba651b89ddf87a907f4 covid:ann/target/8b29d2f14b535229d38e56aa7bf2896070cab2c4 covid:ann/target/baf09b203cdd302345a4e6bd504f8179eff53635 covid:ann/target/65229baa84c20ecad84004927638608ad8f65c3c covid:ann/target/703fda811d1cbc16d606b7caf596ebb8b53eaab4 covid:ann/target/723a9e0dc8879e1bb4de3ea37934bbf541467226 covid:ann/target/16e0f93bbf388c9da0f813e02b3d94bd34009edc covid:ann/target/474f43bef458a80c068519f8b248396f729000d4 covid:ann/target/68e7d2baddc06a915c2e9f9ef7bedc621daeedc2 covid:ann/target/744abd6e6409b2fe1c90eb5c15765387bbffe67f covid:ann/target/b2a5d33cb17cd33fb6e44f32e5cab989da3aea25
