java - Regular expression to preserve quotes, single quotes, hyphens and split at white space
I use the Java Pattern class and specify the regex as a string.
For example, given the input spider-man: "Peter Parker"
I should list spider-man and "Peter Parker" as separate tokens. Thanks
    try {
        // Read the whole file into a single string
        BufferedReader br = new BufferedReader(new FileReader(f));
        StringBuilder sb = new StringBuilder();
        String line = br.readLine();
        while (line != null) {
            sb.append(line);
            line = br.readLine();
        }
        String everything = sb.toString();
        List<String> result = new ArrayList<String>();
        Pattern pat = Pattern.compile("([\"'].*?[\"']|[^ ]+)");
        // Lucene's PatternTokenizer: group 0 emits the whole match as a token
        PatternTokenizer pt = new PatternTokenizer(new StringReader(everything), pat, 0);
        while (pt.incrementToken()) {
            result.add(pt.getAttribute(CharTermAttribute.class).toString());
        }
    } catch (Exception e) {
        throw new RuntimeException(e);
    }

I think I know why "some words" is not working: each token is a string. Any hints?

This should be what you want:

    "([\"'].*?[\"']|(?<=[ :]|^)[a-zA-Z0-9-]+(?=[ :]|$))"

I am assuming you do not have a (single/double) quote inside a (single/double) quoted string.
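As a rough illustration of how this pattern behaves on the input from the question, here is a minimal sketch using plain java.util.regex rather than the tokenizer from the question. The class name StrictTokenizerSketch and the method tokenize are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original post.
class StrictTokenizerSketch {
    // Either a quoted run, or a word of letters/digits/hyphens that is
    // bounded by a space, a colon, or the start/end of the input.
    static final Pattern TOKEN = Pattern.compile(
            "([\"'].*?[\"']|(?<=[ :]|^)[a-zA-Z0-9-]+(?=[ :]|$))");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group(1)); // group 1 wraps the whole alternation
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [spider-man, "Peter Parker"]
        System.out.println(tokenize("spider-man: \"Peter Parker\""));
    }
}
```

The hyphen stays inside the token because it is part of the character class, and the colon is dropped because it only appears in the look-ahead.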
There is also an assumption about the delimiters: only space and : are recognized as delimiters, so nothing will be matched in "foo_bar". If you want to add more delimiters, such as ; . , ? then add them to the character class in both the look-behind and the look-ahead assertions, like this:

    "([\"'].*?[\"']|(?<=[ :;.,?]|^)[a-zA-Z0-9-]+(?=[ :;.,?]|$))"

Not tested on every input yet, but I have tested it on this input:

    "sdfsdf \" sdfs sdfsdfs \" \"sdfsdf\" sdfsdf sdfsd dsfshj sdfsdfsdf 'sdfsdfsdf sd f' "
    // replaceAll is used to check the captured group
    .replaceAll("([\"'].*?[\"']|(?<=[ :]|^)[a-zA-Z0-9-]+(?=[ :]|$))", "X$1Y")

And it works fine for me.
If you want more liberal capturing, still under the same assumption about quoting:

    "([\"'].*?[\"']|[^ ]+)"

To extract the matches:

    Matcher m = Pattern.compile(regex).matcher(inputString);
    List<String> tokens = new ArrayList<String>();
    while (m.find()) {
        tokens.add(m.group(1));
    }
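To see what "more liberal" means in practice, here is a small self-contained sketch of the extraction loop with this pattern. The class name LiberalTokenizerSketch and the sample input are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original post.
class LiberalTokenizerSketch {
    // Either a quoted run, or any run of non-space characters.
    static final Pattern TOKEN = Pattern.compile("([\"'].*?[\"']|[^ ]+)");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group(1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [spider-man:, "Peter Parker", foo_bar]
        System.out.println(tokenize("spider-man: \"Peter Parker\" foo_bar"));
    }
}
```

Unlike the stricter pattern, this one keeps the trailing colon on spider-man: and accepts foo_bar, since only the space acts as a delimiter.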