SVMTool 1.3
A generator of sequential taggers based on Support Vector Machines (SVM)


Part-of-speech Tagging Demo

Input Text

Tagging Options: Language, Strategy, Direction, Weight filtering


Please type the text to analyze in the text area above. Input text that exceeds the maximum length allowed in this demo is rejected.

Tagging Results

The demo tokenizes the input with a language-specific tokenizer and tags it with SVMTagger, using the WSJTP model for English, 3LB.SPA for Spanish, or 3LB.CAT for Catalan. The selected direction (-S), strategy (-T), and weight filtering (-K and -U) settings are passed to the tagger, and each token is printed with its tag.

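The demo's pipeline (tokenizer, then SVMTagger, then a step that keeps the token and tag columns) can be sketched in Python as follows. This is only an illustration under stated assumptions: the paths in TOKENIZER and TAGGER are hypothetical, the helper names are invented for this sketch, and the option values shown in the usage note are placeholders. The flags themselves (-S, -V, -T, -K, -U) are the ones the demo passes to SVMTagger.

```python
import subprocess
from typing import List, Tuple

# Hypothetical locations; the demo's real paths are specific to its server.
TOKENIZER = "./tokenizer.pl"
TAGGER = "./SVMTagger"


def build_tagger_argv(model: str, direction: str, strategy: str,
                      weight_filter: str) -> List[str]:
    """Build the SVMTagger argument list with the flags the demo passes."""
    return [
        TAGGER,
        "-S", direction,      # tagging direction
        "-V", "0",            # verbosity, fixed to 0 as in the demo
        "-T", strategy,       # tagging strategy
        "-K", weight_filter,  # the demo passes the same value to -K and -U
        "-U", weight_filter,
        model,
    ]


def parse_tagged_lines(output: str) -> List[Tuple[str, str]]:
    """Keep the first two whitespace-separated columns (token, tag)."""
    pairs = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            pairs.append((fields[0], fields[1]))
    return pairs


def tag_text(text: str, model: str, direction: str, strategy: str,
             weight_filter: str) -> List[Tuple[str, str]]:
    """Run tokenizer and tagger without a shell, feeding the text on stdin."""
    tokens = subprocess.run(
        [TOKENIZER], input=text,
        capture_output=True, text=True, check=True,
    ).stdout
    tagged = subprocess.run(
        build_tagger_argv(model, direction, strategy, weight_filter),
        input=tokens,
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_tagged_lines(tagged)
```

A call might look like `tag_text("The dog barks.", "WSJTP", "LR", "0", "0")`, where the model path and option values are placeholders. Because the text is sent on standard input instead of being spliced into a shell command string, quotes and shell metacharacters in the input are not interpreted by a shell.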
Tagset descriptions: Penn Treebank Tagset, Parole Reduced Tagset


The SVMTool is a very simple and effective generator of sequential taggers based on Support Vector Machines. It has been successfully applied to a number of NLP problems, such as Part-of-speech Tagging and Base Phrase Chunking, for different languages.

By means of a rigorous experimental evaluation, we conclude that the proposed SVM-based tagger is robust and flexible for feature modelling (including lexicalization), trains efficiently with almost no parameters to tune, and is able to tag thousands of words per second, which makes it practical for real NLP applications.

Regarding evaluation, the SVM-based tagger significantly outperforms the TnT PoS tagger under exactly the same conditions, and achieves a very competitive accuracy of 97.2% on the WSJ corpus, which is comparable to the best PoS taggers reported to date.
