Part of the Sus Filter Tools: Scans a text for keywords and outputs a sus file.
text2sus [<option>] [<label>|<label>=<regexp>|<regexp>] ...
'-')
'-')
--next=\n
~/.labrc and ./labrc
@FILE or @ FILE (some) command-line options are read
from FILE (see section
).
There are three ways to define a label:
"label",
"label=label",
"label:label[:= \t]*(\w+)", and
"label[:= \t]*(?P<label>\w+)"
"label[:= ]*(?P<label>[+-]?+?([eE][+-]?+)?|+)"
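As a rough illustration of the named-group form, the following Python sketch applies a pattern of the shape "label[:= \t]*(?P<label>\w+)" to a sample text. The label name `temperature', the sample text, and the use of Python's re module are assumptions made for this example; how text2sus itself applies the pattern may differ.

```python
import re

# Hypothetical input text; "temperature" is an illustrative label name.
text = "temperature = 42\npressure: 7\n"

# Pattern of the shape label[:= \t]*(?P<label>\w+), written out for one label.
pattern = re.compile(r"temperature[:= \t]*(?P<temperature>\w+)")

match = pattern.search(text)
# The named group holds the word that follows the label and its separator.
value = match.group("temperature") if match else None
print(value)
```

The named group makes the captured value retrievable under the label's own name instead of by position.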
A regular expression (or REGEXP) specifies a set of strings that matches it.
Regular expressions can be concatenated to form new regular
expressions; if A and B are both regular expressions,
then AB is
also a regular expression. If a string p matches A and another
string q matches B, the string *pq* will match AB.
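The concatenation rule can be checked directly. A small sketch, assuming Python's re module; the patterns and strings are arbitrary examples:

```python
import re

a = r"ab+"   # A: 'a' followed by one or more 'b's
b = r"c?d"   # B: an optional 'c' followed by 'd'
p, q = "abb", "cd"

p_matches_a = re.fullmatch(a, p) is not None
q_matches_b = re.fullmatch(b, q) is not None
# AB is the plain concatenation of the two pattern strings.
pq_matches_ab = re.fullmatch(a + b, p + q) is not None
print(p_matches_a, q_matches_b, pq_matches_ab)
```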
A brief explanation of part of the format of regular expressions follows. For further information and a gentler presentation, consult the Regular Expression HOWTO, accessible from http://www.python.org/doc/howto/.
Regular expressions can contain both special and ordinary characters.
Most ordinary characters, like `A', `a', or `0', are the
simplest regular expressions; they simply match themselves. You can
concatenate ordinary characters, so "last" matches the string
`last'. (In the rest of this section, we'll write REGEXPs in
"this special style", usually without quotes, and strings to be
matched `in single quotes'.)
Some characters, like `|' or `(', are special. Special
characters either stand for classes of ordinary characters, or affect how the
regular expressions around them are interpreted.
The special characters are:
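`^'
(Caret.) Matches the start of the string and
immediately after each newline.
`$'
Matches the end of the string.
"foo" matches both 'foo' and 'foobar',
while the regular expression "foo$" matches only
'foo'.
`*'
Matches 0 or more repetitions of the preceding REGEXP.
"ab*" will
match 'a', 'ab', or 'a' followed by any number of 'b's.
`+'
Matches 1 or more repetitions of the preceding REGEXP.
"ab+" will match 'a' followed by any non-zero
number of 'b's; it will not match just 'a'.
`?'
Matches 0 or 1 repetitions of the preceding REGEXP.
"ab?" will match either 'a' or 'ab'.
`*?', `+?', `??'
The `*', `+', and `?' qualifiers are all "greedy"; they match as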
much text as possible. Sometimes this behaviour isn't desired; if
the REGEXP "<.*>" is matched against `<H1>title</H1>', it will match
the entire string, and not just `<H1>'. Adding `?' after the
qualifier makes it perform the match in "non-greedy" or "minimal"
fashion; as few characters as possible will be matched. Using
".*?" in the previous expression will match only `<H1>'.
`\'
Either escapes special characters (permitting you to match
characters like `*', `?', and so forth), or signals a special
sequence; special sequences are discussed below.
If you are not using a command-line file (see section
), remember
that most shells also use the backslash as an escape character on
the command line; therefore you have to enclose the regular expression
in single quotes to prevent interpretation by the shell.
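When a command line is assembled by a program rather than typed by hand, the quoting can be generated. A minimal sketch using shlex.quote from the Python standard library; the regular expression is only an example, and the command line is built but not executed:

```python
import shlex

# Example regular expression containing characters the shell would interpret.
regexp = r"label[:= \t]*(\w+)"
argv = ["text2sus", regexp]

# shlex.quote wraps an argument in single quotes when it contains
# characters that are special to a POSIX shell.
command_line = " ".join(shlex.quote(arg) for arg in argv)
print(command_line)

# Splitting the quoted line again should give back the original arguments.
round_trip = shlex.split(command_line)
```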
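`[]'
Used to indicate a set of characters. Characters can be listed
individually, or a range of characters can be indicated by giving
two characters and separating them by a `-'. Special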
characters are not active inside sets. For example, "[!akm]" will
match any of the characters `a', `k', `m', or `!'; "[a-z]" will
match any lowercase letter, and "[a-zA-Z0-9]" matches any letter
or digit. Character classes such as `\w' or `\S' (defined below)
are also acceptable inside a range. If you want to include a `]'
or a `-' inside a set, precede it with a backslash, or place it as
the first character. The pattern "[]]" will match `]', for
example.
You can match the characters not within a range by "complementing"
the set. This is indicated by including a `^' as the first
character of the set; `^' elsewhere will simply match the `^'
character. For example, "[^5]" will match any character except
`5'.
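The set rules above can be verified with a few short matches. A sketch assuming Python's re module; the input strings are arbitrary:

```python
import re

# Listed characters: '!' has no special meaning inside a set.
listed = re.findall(r"[!akm]", "make! it")

# A range of characters.
lowercase = re.findall(r"[a-z]", "aB3z")

# ']' placed first in the set is taken literally.
bracket = re.fullmatch(r"[]]", "]") is not None

# '^' as the first character complements the set ...
not_five = re.findall(r"[^5]", "1525")
# ... while '^' elsewhere in the set is an ordinary character.
caret_literal = re.findall(r"[5^]", "5^6")
print(listed, lowercase, bracket, not_five, caret_literal)
```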
`|'
`A|B', where A and B can be arbitrary REGEXPs, creates a regular
expression that will match either A or B. This can be used inside
groups (see below) as well. To match a literal `|', use "\|", or
enclose it inside a character class, as in "[|]".
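A short sketch of alternation and of the two ways to match a literal `|', assuming Python's re module; the sample strings are arbitrary:

```python
import re

# Either alternative may match.
animals = re.findall(r"cat|dog", "cat, dog, cow")

# Alternation inside a group.
grouped = re.fullmatch(r"gr(a|e)y", "grey") is not None

# Two ways to match a literal '|'.
escaped = re.findall(r"\|", "a|b|c")
in_class = re.findall(r"[|]", "a|b|c")
print(animals, grouped, escaped, in_class)
```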
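`(...)'
Matches whatever regular expression is inside the parentheses, and
indicates the start and end of a group. To match the literals
`(' or `)', use "\(" or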
"\)", or enclose them inside a character class: "[(] [)]".
`(?...)'
This is an extension notation (a `?' following a `(' is not
meaningful otherwise). The first character after the `?'
determines what the meaning and further syntax of the construct is.
Extensions usually do not create a new group; "(?P<NAME>...)" is
the only exception to this rule. Following are some of the currently
supported extensions.
`(?=...)'
Matches if "..." matches next, but doesn't consume any of the
string. This is called a lookahead assertion. For example,
"Isaac (?=Asimov)" will match `Isaac ' only if it's followed by
`Asimov'.
`(?!...)'
Matches if "..." doesn't match next. This is a negative lookahead
assertion. For example, "Isaac (?!Asimov)" will match
`Isaac '
only if it's not followed by `Asimov'.
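Both lookahead assertions can be tried on the example above. A sketch assuming Python's re module:

```python
import re

positive = re.compile(r"Isaac (?=Asimov)")
negative = re.compile(r"Isaac (?!Asimov)")

# The lookahead is checked but not consumed: only 'Isaac ' is matched.
pos_hit = positive.match("Isaac Asimov")
pos_miss = positive.match("Isaac Newton")

neg_hit = negative.match("Isaac Newton")
neg_miss = negative.match("Isaac Asimov")
print(repr(pos_hit.group()), pos_miss, repr(neg_hit.group()), neg_miss)
```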
The special sequences consist of `\' and a character from the list
below. If the ordinary character is not on the list, then the
resulting REGEXP will match the second character. For example, "\$"
matches the character `$'.
`\A'
Matches only at the start of the string.
`\b'
Matches the empty string, but only at the beginning or end of a
word. A word is defined as a sequence of alphanumeric characters,
so the end of a word is indicated by whitespace or a
non-alphanumeric character.
`\B'
Matches the empty string, but only when it is not at the
beginning or end of a word.
`\d'
Matches any decimal digit; this is equivalent to the set "[0-9]".
`\D'
Matches any non-digit character; this is equivalent to the set
"[^0-9]".
`\s'
Matches any whitespace character; this is equivalent to the set
"[ \t\n\r\f\v]".
`\S'
Matches any non-whitespace character; equivalent to the
set "[^ \t\n\r\f\v]".
`\w'
Matches any alphanumeric character; this is equivalent to the set
"[a-zA-Z0-9_]".
`\W'
Matches any non-alphanumeric character; this is equivalent to the set
"[^a-zA-Z0-9_]".
`\Z'
Matches only at the end of the string.
`\\'
Matches a literal backslash.
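Several of the special sequences can be compared side by side. The sketch below assumes Python's re module. In that module the re.ASCII flag restricts `\d', `\w', and `\s' to the sets listed above, and the re.MULTILINE flag is needed for `^' to match after each newline as described earlier. The sample text is arbitrary.

```python
import re

text = "foo 12\nfoobar x_1"

digits = re.findall(r"\d+", text, re.ASCII)
words = re.findall(r"\w+", text, re.ASCII)
fields = re.split(r"\s+", text, flags=re.ASCII)

# \b restricts the match to whole words, so 'foobar' is skipped.
whole_foo = [m.start() for m in re.finditer(r"\bfoo\b", text)]

# '^' matches after each newline (with re.MULTILINE); \A only at the start.
caret = [m.start() for m in re.finditer(r"^foo", text, re.MULTILINE)]
start_only = [m.start() for m in re.finditer(r"\Afoo", text, re.MULTILINE)]

# \Z matches only at the very end of the string.
at_end = re.search(r"x_1\Z", text) is not None

# "\\" matches a single literal backslash.
backslash_pos = re.search(r"\\", r"a\b").start()
print(digits, words, fields, whole_foo, caret, start_only, at_end, backslash_pos)
```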
A sus file containing data from the text.