SVMTool 1.3
A generator of sequential taggers based on Support Vector Machines (SVM)
<?php
// Reads the demo form fields: $frase is the input text and $llengua the
// language code ("en", "es" or "ca"). Missing fields default to "".
$frase     = $_POST['frase'] ?? '';
$llengua   = $_POST['llengua'] ?? '';
$strategy  = $_POST['strategy'] ?? '';
$direction = $_POST['direction'] ?? '';
$weightf   = $_POST['weightf'] ?? '';
// $MAXLENGTH (maximum input length, in characters) is assumed to be set earlier in the page.
$base = "/home/operador/public_html/svmtool/demo";
// Tokenizer script and SVMTool model for each supported language.
$resources = [
  "en" => ["$base/TOK/tokenizer.pl",     "$base/SVMTool/eng/WSJTP"],
  "es" => ["$base/TOK/spa/tokespa.perl", "$base/SVMTool/spa/3LB.SPA"],
  "ca" => ["$base/TOK/cat/tokecat.perl", "$base/SVMTool/cat/3LB.CAT"],
];
if (trim($frase) === '') { # sentence is empty
  print "Please type the text to analyze in the text area above.";
}
else if (strlen($frase) > $MAXLENGTH) { # sentence is too long
  print "Input text exceeds the maximum length allowed in this demo ($MAXLENGTH characters)";
}
else if (!isset($resources[$llengua])) { # unknown language code
  print "Unsupported language.";
}
else {
  print "
Tagging Results
";
  [$tokenizer, $model] = $resources[$llengua];
  // escapeshellarg() quotes every user-supplied value, so characters such as
  // $, ` or ; cannot inject shell commands. printf is used instead of echo
  // because some shells' echo interprets a leading "-n" or backslash escapes.
  $command = "printf '%s\\n' " . escapeshellarg($frase)
    . " | " . $tokenizer
    . " | $base/SVMTool/SVMTagger -S " . escapeshellarg($direction)
    . " -V 0 -T " . escapeshellarg($strategy)
    . " -K " . escapeshellarg($weightf) . " -U " . escapeshellarg($weightf)
    . " " . $model
    . " | /usr/bin/gawk '{print \"\" \$1 \" \" \$2 \"\"; }'";
  // print "$command\n";
  system($command);
}
?>
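The gawk stage at the end of the demo pipeline keeps only the first two whitespace-separated fields of each output line, that is, the word and its tag. If that step were done in PHP instead (for example, to escape the text before embedding it in HTML), a minimal sketch could look like the one below. It assumes, as the gawk filter implies, that SVMTagger writes one token per line with the word in the first column and the tag in the second; the function name parseTaggerOutput and the sample output are hypothetical.

```php
<?php
// Hypothetical helper, roughly equivalent to the gawk filter '{print $1 " " $2}'.
// Assumes one token per line: word in column 1, tag in column 2.
function parseTaggerOutput(string $output): array {
    $pairs = [];
    foreach (preg_split('/\R/', $output) as $line) {
        // Split on runs of whitespace, as awk does by default.
        $fields = preg_split('/\s+/', trim($line));
        if (count($fields) < 2) {
            continue; // unlike gawk, skip blank lines and lines without a tag
        }
        $pairs[] = [$fields[0], $fields[1]];
    }
    return $pairs;
}

// Made-up sample output, not produced by a real SVMTagger run.
$sample = "The DT\ndog NN\nbarks VBZ\n. .\n";
foreach (parseTaggerOutput($sample) as [$word, $tag]) {
    // htmlspecialchars() makes the word and tag safe to embed in an HTML page.
    print htmlspecialchars($word) . " " . htmlspecialchars($tag) . "\n";
}
```

Returning an array of pairs rather than printing directly keeps parsing separate from presentation, so the same helper could feed a plain-text listing or an HTML table.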
SVMTool is a simple and effective generator of sequential taggers
based on Support Vector Machines. It has been successfully applied to a number of NLP
problems, such as part-of-speech (PoS) tagging and base phrase chunking,
in several languages.
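A sequential tagger labels the tokens of a sentence one at a time, and each decision is made by a classifier that looks at features of the surrounding words and of the tags already assigned. The sketch below illustrates that idea as greedy left-to-right tagging with one linear classifier per tag (one-vs-rest) over binary features. It is an illustration under stated assumptions, not SVMTool's implementation: the feature templates, the function names and the toy weights are all hypothetical, and in a real system of this kind the weights would be learned by training linear SVMs on an annotated corpus.

```php
<?php
// Hypothetical feature templates for token $i: the word, its neighbours,
// a 3-letter suffix, and the tag already assigned to the previous token
// (available because tokens are tagged from left to right).
function extractFeatures(array $words, int $i, array $tags): array {
    $n = count($words);
    return [
        'w0=' . strtolower($words[$i]),
        'w-1=' . ($i > 0 ? strtolower($words[$i - 1]) : '<s>'),
        'w+1=' . ($i + 1 < $n ? strtolower($words[$i + 1]) : '</s>'),
        'p-1=' . ($i > 0 ? $tags[$i - 1] : '<s>'),
        'suf3=' . substr(strtolower($words[$i]), -3),
    ];
}

// One linear model per tag: score = bias + sum of the weights of the active
// binary features. The highest-scoring tag wins ($models must be non-empty).
function bestTag(array $features, array $models): string {
    $best = '';
    $bestScore = -INF;
    foreach ($models as $tag => $model) {
        $score = $model['bias'];
        foreach ($features as $f) {
            $score += $model['w'][$f] ?? 0.0;
        }
        if ($score > $bestScore) {
            $bestScore = $score;
            $best = $tag;
        }
    }
    return $best;
}

// Greedy left-to-right decoding: each decision feeds the next one.
function tagSentence(array $words, array $models): array {
    $tags = [];
    for ($i = 0; $i < count($words); $i++) {
        $tags[$i] = bestTag(extractFeatures($words, $i, $tags), $models);
    }
    return $tags;
}

// Toy weights chosen by hand for illustration, not trained values.
$models = [
    'DT'  => ['bias' => -1.0, 'w' => ['w0=the' => 3.0]],
    'NN'  => ['bias' => 0.0,  'w' => ['p-1=DT' => 2.0]],
    'VBZ' => ['bias' => -0.5, 'w' => ['suf3=rks' => 2.0, 'p-1=NN' => 1.0]],
];
print implode(' ', tagSentence(['The', 'dog', 'barks'], $models)) . "\n"; // prints "DT NN VBZ"
```

Because each decision depends on tags assigned earlier, the processing order matters: a right-to-left variant would iterate from the last token and use the following token's tag as a feature instead.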
Based on a rigorous experimental evaluation, we conclude that the
proposed SVM-based tagger is robust and flexible for feature modelling
(including lexicalization), trains efficiently with almost no parameters
to tune, and can tag thousands of words per second, which makes it
practical for real NLP applications.
In the evaluation, the SVM-based tagger significantly outperforms the
TnT PoS tagger under exactly the same conditions and achieves a very competitive
accuracy of 97.2% on the WSJ corpus, comparable to the best
PoS taggers reported to date.
NOTES:
- This version runs slowly because of the initialization time.
- Features and bugs of this software, as well as upcoming updates, are discussed on the SVMTool group, which you can subscribe to at the SVMTool Group; messages can be posted at .