About: This paper compares different models for multilabel text classification, using information collected from Crunchbase, a large database that holds information about more than 600,000 companies. Each company is labeled with one or more categories, from a subset of 46 possible categories, and the proposed models predict the categories based solely on the company's textual description. A number of natural language processing strategies have been tested for feature extraction, including stemming, lemmatization, and part-of-speech tags. This is a highly unbalanced dataset, where the frequency of each category ranges from 0.7% to 28%. Our findings reveal that the description text of each company contains features that make it possible to predict its area of activity, expressed by its corresponding categories, with about 70% precision and 42% recall. In a second set of experiments, a multiclass problem that attempts to find the most probable category, we obtained about 67% accuracy using SVM and Fuzzy Fingerprints. The resulting models may constitute an important asset for the automatic classification of texts, not only company descriptions but also other texts, such as web pages, text blogs, and news pages.
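As an illustration of how precision and recall can be read in a multilabel setting, the following is a minimal sketch that scores predicted category sets against gold category sets. The abstract does not state which averaging scheme was used; micro-averaging is an assumption here, and the function name, category names, and example data are hypothetical.

```python
def micro_precision_recall(true_sets, pred_sets):
    """Micro-averaged precision and recall over per-item label sets.

    Assumption: micro-averaging (pooling label decisions across all items);
    the source abstract does not specify the averaging scheme.
    """
    # True positives: labels that appear in both the gold and predicted sets.
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    n_pred = sum(len(p) for p in pred_sets)  # all predicted labels
    n_true = sum(len(t) for t in true_sets)  # all gold labels
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    return precision, recall


# Hypothetical gold and predicted category sets for three companies.
true_sets = [{"software", "mobile"}, {"health care"}, {"finance", "software"}]
pred_sets = [{"software"}, {"health care", "software"}, {"finance"}]

precision, recall = micro_precision_recall(true_sets, pred_sets)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy data the classifier predicts few labels per company, so most predictions are correct but several gold labels are missed. That is the same pattern as a result with precision well above recall.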


An Entity of Type : fabio:Abstract, within Data Space : wasabi.inria.fr associated with source document(s)

Attributes	Values
type	abstract
value	This paper compares different models for multilabel text classification, using information collected from Crunchbase, a large database that holds information about more than 600,000 companies. Each company is labeled with one or more categories, from a subset of 46 possible categories, and the proposed models predict the categories based solely on the company's textual description. A number of natural language processing strategies have been tested for feature extraction, including stemming, lemmatization, and part-of-speech tags. This is a highly unbalanced dataset, where the frequency of each category ranges from 0.7% to 28%. Our findings reveal that the description text of each company contains features that make it possible to predict its area of activity, expressed by its corresponding categories, with about 70% precision and 42% recall. In a second set of experiments, a multiclass problem that attempts to find the most probable category, we obtained about 67% accuracy using SVM and Fuzzy Fingerprints. The resulting models may constitute an important asset for the automatic classification of texts, not only company descriptions but also other texts, such as web pages, text blogs, and news pages.
subject	Data mining; Blogs; Non-fiction genres
part of	Creating Classification Models from Textual Descriptions of Companies Using Crunchbase
is abstract of	Creating Classification Models from Textual Descriptions of Companies Using Crunchbase
is hasSource of	covid:ann/target/36a3f47e4ce409ba3b6a34905532e69eab8e9807 covid:ann/target/c02dcdae00be43486d1933a2b08ce177b337a4de covid:ann/target/0a1ed88e010d4ab4a90c328abcc9e373550b6049 covid:ann/target/e0be3ce9d1aabfdc17254f30ff5b56325cdcc750 covid:ann/target/a836ffa23880e7a00f6069f8be6e0b235bd3b320 covid:ann/target/8bcd6595d36a6bbc9f2b00802f181c0aa7e08a4c covid:ann/target/028f5ee3717b1f7d7996b4d1433db81a1cb2c758 covid:ann/target/1a0b9d37dc5fa3d922e91a8da2f22fd07e3e7398 covid:ann/target/4bfb8fe4aa92e450d4ce2d6cc78fa2e5a590e070 covid:ann/target/63bad4d41e4f61ee98d8b07c1fd92da0086edf16 covid:ann/target/c17595f5d917d22c1143302dc01ce17afe6510b8 covid:ann/target/d8d37bfcd277a514b6e2dba1fa40ad3e8f9190a7 covid:ann/target/0b90bfc88e1053b7dd8b6ce6deed70a7afae1890 covid:ann/target/6925511800bec3cf7292c2f8deda48b0355fb581 covid:ann/target/86e2aa9381f8457c68330b08c22b6deccefaa0e6 covid:ann/target/ac4cd1f15edb8e1a352fab90384c43f1612c0352 covid:ann/target/af3f8e908330f2afaf991f536c3e787dcba85b03 covid:ann/target/7808a17cd4cb98c3809c97e5b1f44dfe2269081e covid:ann/target/5f18df87f0e72efd203f0ad4359acdee1c09960c covid:ann/target/ae82002f1d7c75f9a2676d16891c9bc576de1ecb covid:ann/target/e7d497700fda69386a008cf8ef1aa98a8ae02455 covid:ann/target/9ecbf6c7e2396ddbf85d8bfb8638a067cc03f870 covid:ann/target/ea6fe06b486257b11b580c25d7f96e54437ebe18 covid:ann/target/40ceaa141008dfda6e21cb95587d577c04ac8a74 covid:ann/target/2b36f76ac31bd67ea0741975461ed060c48f15cf covid:ann/target/7177e1d978df4c9ef2f50d793badd0fa0eed4e55 covid:ann/target/66c12cf30c000c801c4d6c8665fe5a470ffbdb06 covid:ann/target/7599095f58f4639078a1e1c9fe2e787e22581857 covid:ann/target/7e237f101fe9faf78a04a6bf138210c243a3ce37 covid:ann/target/51b27c1a6aa10b16493d7474c102649ea27f53c0 covid:ann/target/e40c9e54c7d3fd529be8cfb6a8eaaed645280c80 covid:ann/target/884da730a36d72df323db6b49ce491a46623d559 covid:ann/target/ac132542ae27243ad9b7708bcbf5c2a8a2096b9e covid:ann/target/eec0ee8bf90e0eefc05111844b873b0cd0873c30 
covid:ann/target/e2a2d4eadc8e529cbb89c051f14a564deb117c73 covid:ann/target/6de2f1fc472072b830ee71d008519ec1753774a5 covid:ann/target/c2f2669930aad7c7b00bf0d177ec8bc1601558d5 covid:ann/target/205494242b3532695969d43ac59dfba5c0b93289 covid:ann/target/30c428301afc7603d117b52ac5a8f4ba44365753 covid:ann/target/5101bcfcd46456cf419dc49a842f49771222e292 covid:ann/target/0d3b6aea9f629e7f54931147342f493bc669a087 covid:ann/target/258258c0b58f95702d955acbfa46081ea1a7ed69 covid:ann/target/d5e4e85aa6c5cb92605a1c6f6de174fd3beee2b1 covid:ann/target/01e3901048f2f3aa22689179b99daf55ec31da07 covid:ann/target/445c6359cda287e0983af45583ce61eb562896a4 covid:ann/target/b41ed2946f5fc0495226ce386dab5de6ae250ed0 covid:ann/target/0d6b54507cb7ef54e7efac912dc1fabb5e21aa3d covid:ann/target/4160ebd511a15e9a6d7098a09d3f847f53e129bf covid:ann/target/3d86b610d41a1f37cbc67d0190f151a8d7c3f544 covid:ann/target/2d3de7fd01cfd93331abda4b5826475de82c267e covid:ann/target/da2b007107e155ebedc4d1def94056944f292810 covid:ann/target/41dd494eceed5088e102b9fdbc04d4ef64a7865e covid:ann/target/e58a1466082054be647ea207819c061b1a98365d covid:ann/target/58cf4cf6361de6e93a1ec1cff1d60a676b90bc17 covid:ann/target/ae3c3246163aa8599aa68cec0a5868c28ce6eab2 covid:ann/target/603534afd9d98a28e02ad08ac618a63002fc6448 covid:ann/target/318f7ff56e63789da246a604ba53143f8f7f610f covid:ann/target/517680f07245d4604b41a44f5081b4df6bfaac20 covid:ann/target/7f7e10467716c4fc0a3608039c5cebfd5d723975 covid:ann/target/9f2bf6f1fc8518e810ea7e3f9f18129c561047b6 covid:ann/target/a7af2612fa66bdcb7ff1bc9b1bb7ef5341f0f171 covid:ann/target/00e3d55bb6fbb345d612f025c8632fe9387c8d41 covid:ann/target/d0ca00d58aefe354167eb7f7c47eb2bbb0f6129d covid:ann/target/e5ee360f7623af9af962891798588e18d84eecab covid:ann/target/4891fe16dca61a833b138f78ead5a45d152d07d7 covid:ann/target/2dc428a29277cc6bd44af4dbe375785d9b1840c7 covid:ann/target/18f63f7f041fc61c14c371732803113570e8c6eb covid:ann/target/89c09d1d06ab533febef57e681b629ab76029ad5 
covid:ann/target/64be5b748fe43619504f101f7ccb5197fb075b56 covid:ann/target/10fb5f3ade04142768bbd6284b9b8ba5a28fdce5 covid:ann/target/5a3fa6bb45356516b823516c716053a45ceb99ba covid:ann/target/c9dd8e9161a1b04feeb414b87c649f58a9965bad covid:ann/target/9282495c7238a5838168c7bb9bdd554f054fecea covid:ann/target/b0b77c8e34e55bd7c112e8e473f555239d6bafd6 covid:ann/target/bd7de9806944241e387ad6c5ad65014f823418c0 covid:ann/target/c1a4ff1dc6872f13d0fc9bd5d3b0ecfcef392f6a covid:ann/target/fb38fde33e7a1ea73cf6b15efa4e544ab180ee26 covid:ann/target/2a7caf13a6fff8d7383562308be5fa5844dbec28 covid:ann/target/2b9611a3399820f9e60519ea3c5e6c60ab621d5f covid:ann/target/5b3d1145c1a57087a32bf1d92adc596ad18efbde covid:ann/target/7e2108a935e11890a3441f778896984f2da2d888 covid:ann/target/86625b847e9299a3b76ac6eab82cbc6766977de0 covid:ann/target/da03cf053c2a50f5ac6133773b8eaf2b513b1c8d covid:ann/target/034529f4d29ee8d77110f634ff4e252cce239e47 covid:ann/target/265bd52c49316c626a66fabddde81eaeed91e537 covid:ann/target/36b92c89e8a1c2080547558cb518c14fa7558712 covid:ann/target/39bc70ff5cb0ffeb1f2c8937cb97ed996cef7c95 covid:ann/target/684725262b00d6ca6af8d4f336118e82e0bf6bd5 covid:ann/target/751b666e420796567010e22ab4eef85969206aaf covid:ann/target/9a97e44b0ca3dec57a6caa6193ac85abeb995baa covid:ann/target/a28dcb95d8341b43a77436aaf63a50df49279f7d covid:ann/target/a39d3447eda2b15686bfbe6b1008d552e4c5a809 covid:ann/target/8374c04dde04544c1cdf5d1bce128eb548aa8d0c
