UIMA Ruta: regular expressions in a WORDLIST
Is there a way to have regular expressions in a WORDLIST? I need to implement the same thing that is mentioned in https://issues.apache.org/jira/browse/uima-3382.
Or is there an alternative way to solve it?
Edit: A WORDLIST is defined as a list of text items. What if I have a list of regular expressions that I want to mark with the same type? Is there a way to do that?
For example, I want to find dates in a document. There are a number of date formats, and regular expressions are a more concise way to cover the possible cases. I am trying to use the syntax below, but it only matches cases where the entry is a single word without special regex syntax.
DECLARE Date;
WORDLIST DateFormatList = 'dateformat.regex';
Document{-> MARKFAST(Date, DateFormatList, true, 1)};
What can I change in the rules so that the items in DateFormatList are treated as regular expressions?
Thanks
Regular expressions in wordlists will not be supported in the near future unless a volunteer implements them. The problem is that wordlists use a trie, not an FST, for the lookup process, which makes the desired functionality not straightforward to implement.
It is possible to simulate the desired functionality with wordlists in rare situations, e.g., for optional sequences.
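To illustrate why a trie lookup treats every entry literally, here is a minimal sketch in Python. It is not Ruta's actual implementation; the names `build_trie` and `contains` are hypothetical and the structure is deliberately simplified. The point is only that the lookup follows one character at a time, so an entry such as `\d{4}` is stored as the six literal characters it consists of.

```python
END = object()  # sentinel key that marks the end of a stored entry


def build_trie(entries):
    """Build a character trie from a list of literal strings."""
    root = {}
    for entry in entries:
        node = root
        for ch in entry:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def contains(trie, text):
    """Follow the text character by character; no pattern syntax is interpreted."""
    node = trie
    for ch in text:
        if ch not in node:
            return False
        node = node[ch]
    return END in node


trie = build_trie(["January", r"\d{4}"])
found_word = contains(trie, "January")     # literal entry is found
found_year = contains(trie, "2014")        # the "regex" entry does not match digits
found_literal = contains(trie, r"\d{4}")   # it only matches itself, character by character
```

Supporting patterns would require a lookup that can branch on character classes and repetitions, which is the kind of work an automaton-based matcher does.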
If you want to detect dates, I actually recommend using normal rules in UIMA Ruta. It's easier to combine and exploit stuff. A common example is a simple rule like this:
ANY{INLIST(MonthsList) -> MARK(Month), MARK(Date,1,3)} PERIOD? NUM{REGEXP(".{2,4}") -> MARK(Year)};
If you want to stick with regular expressions, you can use a list of simple regexp rules:
"regexp1" -> Date;
"regexp2" -> Date;
"regexp3" -> Date;
These rules also support feature assignments and capturing groups. The difference from the functionality you want to use lies in the syntax (several rules instead of a simple list) and in the performance (the regular expressions are applied sequentially).
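The "applied sequentially" behaviour can be pictured with a small Python sketch. This is not Ruta code, and the patterns and the function name `mark_dates` are assumptions made up for illustration. Each pattern makes its own pass over the text, so the cost grows with the number of patterns.

```python
import re

# Hypothetical date patterns; a real list could be read from a file, one per line.
DATE_PATTERNS = [
    r"(\d{4})-(\d{2})-(\d{2})",        # e.g. 2014-03-07
    r"(\d{1,2})\.(\d{1,2})\.(\d{4})",  # e.g. 7.3.2014
]


def mark_dates(text, patterns=DATE_PATTERNS):
    """Apply each pattern in turn and collect (begin, end, covered_text) spans."""
    spans = []
    for pattern in patterns:  # one full pass over the text per pattern
        for match in re.finditer(pattern, text):
            spans.append((match.start(), match.end(), match.group(0)))
    return sorted(spans)


text = "Released 2014-03-07, updated 7.3.2014."
covered = [span[2] for span in mark_dates(text)]
```

Capturing groups stay available per pattern (`match.groups()` in this sketch), which corresponds loosely to the capturing-group support of the regexp rules mentioned above.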
(I am a developer of UIMA Ruta)