Araneum Hispanicum
Spanish Web Corpus
Crawled in September 2013 by SpiderLing 0.72
No top-level domain restriction
Language similarity threshold set to 0.5
Tagged by Tree Tagger using the Spanish parameter file based on the simplified CRATER Tagset
Native tagset mapped to Araneum Universal Tagset
Paragraph-level deduplicated by Onion, tokens in duplicate paragraphs marked
Versions available
Araneum Hispanicum Maius: 1,200,000,617 tokens, 892,069,964 unmarked words
Araneum Hispanicum Minus: 121,570,580 tokens, 103,801,505 unmarked words
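
The two counts given for each version can be compared directly. The following is a minimal sketch, assuming that "unmarked words" counts the words lying outside paragraphs flagged as duplicates; the resulting ratio is only a rough indicator, since the token total presumably also includes punctuation. The names `VERSIONS` and `unmarked_share` are hypothetical and are not part of any corpus tooling.

```python
# Token and unmarked-word counts copied from the version list above.
VERSIONS = {
    "Araneum Hispanicum Maius": (1_200_000_617, 892_069_964),
    "Araneum Hispanicum Minus": (121_570_580, 103_801_505),
}


def unmarked_share(tokens: int, unmarked_words: int) -> float:
    """Return unmarked words as a fraction of all tokens (hypothetical helper)."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return unmarked_words / tokens


if __name__ == "__main__":
    for name, (tokens, unmarked) in VERSIONS.items():
        share = unmarked_share(tokens, unmarked)
        print(f"{name}: {share:.1%} of tokens are unmarked words")
```

Under the stated assumption, a lower share suggests that a larger part of the version consists of material marked as duplicate (or of non-word tokens).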
Revision history
14.07 Initial publicly released version