xnedit regex - infoBAG

  <prompt>
  <input_data>
    <source_text>
      <![CDATA[[[
~~~source_text
placeholder
~~~
      ]]]]>
    </source_text>
    <target_text>
      <![CDATA[[[
~~~target_text
placeholder
~~~
      ]]]]>
    </target_text>
    <xnedit_documentation_text>
      <![CDATA[[[
```xnedit_documentation_text
Regular Expressions
===================

Basic Regular Expression Syntax
-------------------------------

  Regular expressions (regex's) are useful as a way to match inexact sequences
  of characters.  They can be used in the `Find...' and `Replace...' search
  dialogs and are at the core of Color Syntax Highlighting patterns.  To specify
  a regular expression in a search dialog, simply click on the `Regular
  Expression' radio button in the dialog.

  A regex is a specification of a pattern to be matched in the searched text.
  This pattern consists of a sequence of tokens, each being able to match a
  single character or a sequence of characters in the text, or assert that a
  specific position within the text has been reached (the latter is called an
  anchor.)  Tokens (also called atoms) can be modified by adding one of a number
  of special quantifier tokens immediately after the token.  A quantifier token
  specifies how many times the previous token must be matched (see below.)

  Tokens can be grouped together using one of a number of grouping constructs,
  the most common being plain parentheses.  Tokens that are grouped in this way
  are also collectively considered to be a regex atom, since this new larger
  atom may also be modified by a quantifier.

  A regex can also be organized into a list of alternatives by separating each
  alternative with pipe characters, `|'.  This is called alternation.  A match
  will be attempted for each alternative listed, in the order specified, until a
  match results or the list of alternatives is exhausted (see Alternation_
  section below.)

3>The 'Any' Character

  If a dot (`.') appears in a regex, it means to match any character exactly
  once.  By default, dot will not match a newline character, but this behavior
  can be changed (see help topic Parenthetical_Constructs_, under the
  heading, Matching Newlines).

3>Character Classes

  A character class, or range, matches exactly one character of text, but the
  candidates for matching are limited to those specified by the class.  Classes
  come in two flavors as described below:

     [...]   Regular class, match only characters listed.
     [^...]  Negated class, match only characters ~not~ listed.

  As with the dot token, by default negated character classes do not match
  newline, but can be made to do so.

  The characters that are considered special within a class specification are
  different than the rest of regex syntax as follows. If the first character in
  a class is the `]' character (second character if the first character is `^')
  it is a literal character and part of the class character set.  This also
  applies if the first or last character is `-'.  Outside of these rules, two
  characters separated by `-' form a character range which includes all the
  characters between the two characters as well.  For example, `[^f-j]' is the
  same as `[^fghij]' and means to match any character that is not `f', `g',
  `h', `i', or `j'.

3>Anchors

  Anchors are assertions that you are at a very specific position within the
  search text.  XNEdit regular expressions support the following anchor tokens:

     ^    Beginning of line
     $    End of line
     <    Left word boundary
     >    Right word boundary
     \B   Not a word boundary

  Note that the \B token ensures that neither the left nor the right character
  are delimiters, **or** that both left and right characters are delimiters.
  The left word anchor checks whether the previous character is a delimiter and
  the next character is not. The right word anchor works in a similar way.

  Note that word delimiters are user-settable, and defined by the X resource
  wordDelimiters, cf. X_Resources_.

3>Quantifiers

  Quantifiers specify how many times the previous regular expression atom may
  be matched in the search text.  Some quantifiers can produce a large
  performance penalty, and can in some instances completely lock up XNEdit.  To
  prevent this, avoid nested quantifiers, especially those of the maximal
  matching type (see below.)

  The following quantifiers are maximal matching, or "greedy", in that they
  match as much text as possible (but don't exclude shorter matches if that
  is necessary to achieve an overall match).

     *   Match zero or more
     +   Match one  or more
     ?   Match zero or one

  The following quantifiers are minimal matching, or "lazy", in that they match
  as little text as possible (but don't exclude longer matches if that is
  necessary to achieve an overall match).

     *?   Match zero or more
     +?   Match one  or more
     ??   Match zero or one

  One final quantifier is the counting quantifier, or brace quantifier. It
  takes the following basic form:

     {min,max}  Match from `min' to `max' times the
                previous regular expression atom.

  If `min' is omitted, it is assumed to be zero.  If `max' is omitted, it is
  assumed to be infinity.  Whether specified or assumed, `min' must be less
  than or equal to `max'.  Note that both `min' and `max' are limited to
  65535.  If both are omitted, then the construct is the same as `*'.   Note
  that `{,}' and `{}' are both valid brace constructs.  A single number
  appearing without a comma, e.g. `{3}' is short for the `{min,min}' construct,
  or to match exactly `min' number of times.

  The quantifiers `{1}' and `{1,1}' are accepted by the syntax, but are
  optimized away since they mean to match exactly once, which is redundant
  information.  Also, for efficiency, certain combinations of `min' and `max'
  are converted to either `*', `+', or `?' as follows:

     {} {,} {0,}    *
     {1,}           +
     {,1} {0,1}     ?

  Note that {0} and {0,0} are meaningless and will generate an error message at
  regular expression compile time.

  Brace quantifiers can also be "lazy".  For example {2,5}? would try to match
  2 times if possible, and will only match 3, 4, or 5 times if that is what is
  necessary to achieve an overall match.

3>Alternation

  A series of alternative patterns to match can be specified by separating them
  with vertical pipes, `|'.  An example of _alternation would be `a|be|sea'.
  This will match `a', or `be', or `sea'. Each alternative can be an
  arbitrarily complex regular expression. The alternatives are attempted in
  the order specified.  An empty alternative can be specified if desired, e.g.
  `a|b|'.  Since an empty alternative can match nothingness (the empty string),
  this guarantees that the expression will match.

3>Comments

  Comments are of the form `(?#<comment text>)' and can be inserted anywhere
  and have no effect on the execution of the regular expression.  They can be
  handy for documenting very complex regular expressions.  Note that a comment
  begins with `(?#' and ends at the first occurrence of an ending parenthesis,
  or the end of the regular expression... period.  Comments do not recognize
  any escape sequences.
   ----------------------------------------------------------------------

Metacharacters
--------------

3>Escaping Metacharacters

  In a regular expression (regex), most ordinary characters match themselves.
  For example, `ab%' would match anywhere `a' followed by `b' followed by `%'
  appeared in the text.  Other characters don't match themselves, but are
  metacharacters. For example, backslash is a special metacharacter which
  'escapes' or changes the meaning of the character following it. Thus, to
  match a literal backslash would require a regular expression to have two
  backslashes in sequence. XNEdit provides the following escape sequences so
  that metacharacters that are used by the regex syntax can be specified as
  ordinary characters.

     \(  \)  \-  \[  \]  \<  \>  \{  \}
     \.  \|  \^  \$  \*  \+  \?  \&  \\

3>Special Control Characters

  There are some special characters that are  difficult or impossible to type.
  Many of these characters can be constructed as a sort of metacharacter or
  sequence by preceding a literal character with a backslash. XNEdit recognizes
  the following special character sequences:

     \a  alert (bell)
     \b  backspace
     \e  ASCII escape character (***)
     \f  form feed (new page)
     \n  newline
     \r  carriage return
     \t  horizontal tab
     \v  vertical tab

     *** For environments that use the EBCDIC character set,
         when compiling XNEdit set the EBCDIC_CHARSET compiler
         symbol to get the EBCDIC equivalent escape
         character.)

3>Octal and Hex Escape Sequences

  Any ASCII (or EBCDIC) character, except null, can be specified by using
  either an octal escape or a hexadecimal escape, each beginning with \0 or \x
  (or \X), respectively.  For example, \052 and \X2A both specify the `*'
  character.  Escapes for null (\00 or \x0) are not valid and will generate an
  error message.  Also, any escape that exceeds \0377 or \xFF will either cause
  an error or have any additional character(s) interpreted literally. For
  example, \0777 will be interpreted as \077 (a `?' character) followed by `7'
  since \0777 is greater than \0377.

  An invalid digit will also end an octal or hexadecimal escape.  For example,
  \091 will cause an error since `9' is not within an octal escape's range of
  allowable digits (0-7) and truncation before the `9' yields \0 which is
  invalid.

3>Shortcut Escape Sequences

  XNEdit defines some escape sequences that are handy shortcuts for commonly
  used character classes.

   \d  digits            0-9
   \l  letters           a-z, A-Z, and locale dependent letters
   \s  whitespace        \t, \r, \v, \f, and space
   \w  word characters   letters, digits, and underscore, `_'

  \D, \L, \S, and \W are the same as the lowercase versions except that the
  resulting character class is negated.  For example, \d is equivalent to
  `[0-9]', while \D is equivalent to `[^0-9]'.

  These escape sequences can also be used within a character class.  For
  example, `[\l_]' is the same as `[a-zA-Z@_]', extended with possible locale
  dependent letters. The escape sequences for special characters, and octal
  and hexadecimal escapes are also valid within a class.

3>Word Delimiter Tokens

  Although not strictly a character class, the following escape sequences
  behave similarly to character classes:

     \y   Word delimiter character
     \Y   Not a word delimiter character

  The `\y' token matches any single character that is one of the characters
  that XNEdit recognizes as a word delimiter character, while the `\Y' token
  matches any character that is ~not~ a word delimiter character.  Word
  delimiter characters are dynamic in nature, meaning that the user can change
  them through preference settings.  For this reason, they must be handled
  differently by the regular expression engine.  As a consequence of this,
  `\y' and `\Y' cannot be used within a character class specification.
   ----------------------------------------------------------------------

Parenthetical Constructs
------------------------

3>Capturing Parentheses

  Capturing Parentheses are of the form `(<regex>)' and can be used to group
  arbitrarily complex regular expressions.  Parentheses can be nested, but the
  total number of parentheses, nested or otherwise, is limited to 50 pairs.
  The text that is matched by the regular expression between a matched set of
  parentheses is captured and available for text substitutions and
  backreferences (see below.)  Capturing parentheses carry a fairly high
  overhead both in terms of memory used and execution speed, especially if
  quantified by `*' or `+'.

3>Non-Capturing Parentheses

  Non-Capturing Parentheses are of the form `(?:<regex>)' and facilitate
  grouping only and do not incur the overhead of normal capturing parentheses.
  They should not be counted when determining numbers for capturing parentheses
  which are used with backreferences and substitutions.  Because of the limit
  on the number of capturing parentheses allowed in a regex, it is advisable to
  use non-capturing parentheses when possible.

3>Positive Look-Ahead

  Positive look-ahead constructs are of the form `(?=<regex>)' and implement a
  zero width assertion of the enclosed regular expression.  In other words, a
  match of the regular expression contained in the positive look-ahead
  construct is attempted.  If it succeeds, control is passed to the next
  regular expression atom, but the text that was consumed by the positive
  look-ahead is first unmatched (backtracked) to the place in the text where
  the positive look-ahead was first encountered.

  One application of positive look-ahead is the manual implementation of a
  first character discrimination optimization.  You can include a positive
  look-ahead that contains a character class which lists every character that
  the following (potentially complex) regular expression could possibly start
  with.  This will quickly filter out match attempts that cannot possibly
  succeed.

3>Negative Look-Ahead

  Negative look-ahead takes the form `(?!<regex>)' and is exactly the same as
  positive look-ahead except that the enclosed regular expression must NOT
  match.  This can be particularly useful when you have an expression that is
  general, and you want to exclude some special cases.  Simply precede the
  general expression with a negative look-ahead that covers the special cases
  that need to be filtered out.

3>Positive Look-Behind

  Positive look-behind constructs are of the form `(?<=<regex>)' and implement
  a zero width assertion of the enclosed regular expression in front of the
  current matching position.  It is similar to a positive look-ahead assertion,
  except for the fact that the match is attempted on the text preceding the
  current position, possibly even in front of the start of the matching range
  of the entire regular expression.

  A restriction on look-behind expressions is the fact that the expression
  must match a string of a bounded size.  In other words, `*', `+', and `{n,}'
  quantifiers are not allowed inside the look-behind expression. Moreover,
  matching performance is sensitive to the difference between the upper and
  lower bound on the matching size.  The smaller the difference, the better the
  performance.  This is especially important for regular expressions used in
  highlight patterns.

  Positive look-behind has similar applications as positive look-ahead.

3>Negative Look-Behind

  Negative look-behind takes the form `(?<!<regex>)' and is exactly the same as
  positive look-behind except that the enclosed regular expression must
  ~not~ match. The same restrictions apply.

  Note however, that performance is even more sensitive to the distance
  between the size boundaries: a negative look-behind must not match for
  **any** possible size, so the matching engine must check **every** size.

3>Case Sensitivity

  There are two parenthetical constructs that control case sensitivity:

     (?i<regex>)   Case insensitive; `AbcD' and `aBCd' are
                   equivalent.

     (?I<regex>)   Case sensitive;   `AbcD' and `aBCd' are
                   different.

  Regular expressions are case sensitive by default, that is, `(?I<regex>)' is
  assumed.  All regular expression token types respond appropriately to case
  insensitivity including character classes and backreferences.  There is some
  extra overhead involved when case insensitivity is in effect, but only to the
  extent of converting each character compared to lower case.

3>Matching Newlines

  XNEdit regular expressions by default handle the matching of newlines in a way
  that should seem natural for most editing tasks.  There are situations,
  however, that require finer control over how newlines are matched by some
  regular expression tokens.

  By default, XNEdit regular expressions will ~not~ match a newline character for
  the following regex tokens: dot (`.'); a negated character class (`[^...]');
  and the following shortcuts for character classes:

     `\d', `\D', `\l', `\L', `\s', `\S', `\w', `\W', `\Y'

  The matching of newlines can be controlled for the `.' token, negated
  character classes, and the `\s' and `\S' shortcuts by using one of the
  following parenthetical constructs:

     (?n<regex>)  `.', `[^...]', `\s', `\S' match newlines

     (?N<regex>)  `.', `[^...]', `\s', `\S' don't match
                                            newlines

  `(?N<regex>)' is the default behavior.

3>Notes on New Parenthetical Constructs

  Except for plain parentheses, none of the parenthetical constructs capture
  text.  If that is desired, the construct must be wrapped with capturing
  parentheses, e.g. `((?i<regex))'.

  All parenthetical constructs can be nested as deeply as desired, except for
  capturing parentheses which have a limit of 50 sets of parentheses,
  regardless of nesting level.

3>Back References

  Backreferences allow you to match text captured by a set of capturing
  parenthesis at some later position in your regular expression.  A
  backreference is specified using a single backslash followed by a single
  digit from 1 to 9 (example: \3).  Backreferences have similar syntax to
  substitutions (see below), but are different from substitutions in that they
  appear within the regular expression, not the substitution string. The number
  specified with a backreference identifies which set of text capturing
  parentheses the backreference is associated with. The text that was most
  recently captured by these parentheses is used by the backreference to
  attempt a match.  As with substitutions, open parentheses are counted from
  left to right beginning with 1.  So the backreference `\3' will try to match
  another occurrence of the text most recently matched by the third set of
  capturing parentheses.  As an example, the regular expression `(\d)\1' could
  match `22', `33', or `00', but wouldn't match `19' or `01'.

  A backreference must be associated with a parenthetical expression that is
  complete.  The expression `(\w(\1))' contains an invalid backreference since
  the first set of parentheses are not complete at the point where the
  backreference appears.

3>Substitution

  Substitution strings are used to replace text matched by a set of capturing
  parentheses.  The substitution string is mostly interpreted as ordinary text
  except as follows.

  The escape sequences described above for special characters, and octal and
  hexadecimal escapes are treated the same way by a substitution string. When
  the substitution string contains the `&' character, XNEdit will substitute the
  entire string that was matched by the `Find...' operation. Any of the first
  nine sub-expressions of the match string can also be inserted into the
  replacement string.  This is done by inserting a `\' followed by a digit from
  1 to 9 that represents the string matched by a parenthesized expression
  within the regular expression.  These expressions are numbered left-to-right
  in order of their opening parentheses.

  The capitalization of text inserted by `&' or `\1', `\2', ... `\9' can be
  altered by preceding them with `\U', `\u', `\L', or `\l'.  `\u' and `\l'
  change only the first character of the inserted entity, while `\U' and `\L'
  change the entire entity to upper or lower case, respectively.
   ----------------------------------------------------------------------

Advanced Topics
---------------

3>Substitutions

  Regular expression substitution can be used to program automatic editing
  operations.  For example, the following are search and replace strings to find
  occurrences of the `C' language subroutine `get_x', reverse the first and
  second parameters, add a third parameter of NULL, and change the name to
  `new_get_x':

     Search string:   `get_x *\( *([^ ,]*), *([^\)]*)\)'
     Replace string:  `new_get_x(\2, \1, NULL)'

3>Ambiguity

  If a regular expression could match two different parts of the text, it will
  match the one which begins earliest.  If both begin in the same place but
  match different lengths, or match the same length in different ways, life
  gets messier, as follows.

  In general, the possibilities in a list of alternatives are considered in
  left-to-right order.  The possibilities for `*', `+', and `?' are considered
  longest-first, nested constructs are considered from the outermost in, and
  concatenated constructs are considered leftmost-first. The match that will be
  chosen is the one that uses the earliest possibility in the first choice that
  has to be made.  If there is more than one choice, the next will be made in
  the same manner (earliest possibility) subject to the decision on the first
  choice.  And so forth.

  For example, `(ab|a)b*c' could match `abc' in one of two ways.  The first
  choice is between `ab' and `a'; since `ab' is earlier, and does lead to a
  successful overall match, it is chosen.  Since the `b' is already spoken for,
  the `b*' must match its last possibility, the empty string, since it must
  respect the earlier choice.

  In the particular case where no `|'s are present and there is only one `*',
  `+', or `?', the net effect is that the longest possible match will be
  chosen.  So `ab*', presented with `xabbbby', will match `abbbb'.  Note that
  if `ab*' is tried against `xabyabbbz', it will match `ab' just after `x', due
  to the begins-earliest rule.  (In effect, the decision on where to start the
  match is the first choice to be made, hence subsequent choices must respect
  it even if this leads them to less-preferred alternatives.)

3>References

  An excellent book on the care and feeding of regular expressions is

          Mastering Regular Expressions, 3rd Edition
          Jeffrey E. F. Friedl
          August 2006, O'Reilly & Associates
          ISBN 0-596-52812-4

  The first end second editions of this book are still useful for basic
  introduction to regexes and contain many useful tips and tricks.
   ----------------------------------------------------------------------

Example Regular Expressions
---------------------------

  The following are regular expression examples which will match:

* An entire line.
!       ^.*$

* Blank lines.
!       ^$

* Whitespace on a line.
!       \s+

* Whitespace across lines.
!       (?n\s+)

* Whitespace that spans at least two lines. Note minimal matching `*?' quantifier.
!       (?n\s*?\n\s*)

* IP address (not robust).
!       (?:\d{1,3}(?:\.\d{1,3}){3})

* Two character US Postal state abbreviations (includes territories).
!       [ACDF-IK-PR-W][A-Z]

* Web addresses.
!       (?:http://)?www\.\S+

* Case insensitive double words across line breaks.
!       (?i(?n<(\S+)\s+\1>))

* Upper case words with possible punctuation.
!       <[A-Z][^a-z\s]*>

   ----------------------------------------------------------------------
```
      ]]]]>
    </xnedit_documentation_text>
  </input_data>
  <purpose>
    You are an expert in XNEdit regular expressions and XNEdit “Find…” / “Replace…” operations.
    Given an original text sample [[source_text]] and a desired transformed text sample [[target_text]], produce exactly the two dialog inputs needed for XNEdit:
    - “String to find:” (a single regular expression)
    - “Replace with:” (a single substitution string)

    Success criteria:
    1) Applying Replace (or Replace All, if requested) transforms [[source_text]] into [[target_text]] for all intended occurrences.
    2) The regex is as specific as possible to avoid unintended matches.
    3) The pattern is performance-safe (no broad nested greedy quantifiers).
  </purpose>

  <constraints>
    <constraint>Use ONLY XNEdit-supported regex features (as described in the provided documentation). No named groups, no PCRE-only tokens.</constraint>
    <constraint>Prefer explicit character classes and bounded patterns over “.*” wherever feasible.</constraint>
    <constraint>Avoid catastrophic backtracking: do not use nested greedy quantifiers over broad atoms (e.g., (.*)+ ).</constraint>
    <constraint>Use (?:...) for grouping unless the group is referenced in the replacement string.</constraint>
    <constraint>If multi-line matching is required, wrap ONLY the necessary part in (?n...). Keep it tightly bounded (e.g., [^\n]* rather than .*).</constraint>
    <constraint>Output MUST follow the exact output format specification section. Do not output anything else.</constraint>
  </constraints>

  <instructions>
    <instruction>1. Compare [[source_text]] vs [[target_text]] and infer the minimal transformation rule(s): what is deleted, inserted, reordered, or reformatted; what stays identical.</instruction>
    <instruction>2. Decide the intended scope:
      - If [[match_mode]] is present, follow it.
      - Else assume global_replace.
      - If [[scope_notes]] indicates a smaller scope (e.g., “only in a block”, “only on lines starting with X”), incorporate that into the FIND regex using anchors, delimiters, or lookarounds.</instruction>
    <instruction>3. Choose stable anchors and delimiters around the changing region (start/end tokens, surrounding punctuation, tag names, line starts/ends). Prefer anchors that are present in BOTH source and target.</instruction>
    <instruction>4. Draft the FIND regex:
      - Use capturing groups only for text that must be reused.
      - Use [^\n]* instead of .* when staying on one line.
      - Use lazy quantifiers only when you have an unambiguous terminator.
      - If multiple near-identical candidates exist, narrow using more context rather than widening the match.</instruction>
    <instruction>5. Draft the REPLACE string:
      - Rebuild the exact target structure using literal text + & and/or \\1..\\9.
      - Apply case modifiers (\\U/\\L/\\u/\\l) only if the target explicitly changes case.</instruction>
    <instruction>6. Sanity-check for safety and correctness:
      - Unintended matches: identify at least one plausible false-positive in [[source_text]] and explain how the regex avoids it.
      - Performance: confirm there are no nested greedy quantifiers over broad patterns.
      - Multi-line: if (?n...) is used, confirm the matched span is bounded.</instruction>
    <instruction>7. Provide 2–4 concrete test cases, each being a short excerpt from [[source_text]] and its expected post-replacement excerpt.</instruction>
    <instruction>8. If [[source_text]] and [[target_text]] are placeholders or insufficient to infer a unique rule, do NOT guess a regex. Instead output NEED_MORE_INPUT with exactly one short bullet list of what is missing.</instruction>
  </instructions>

  <output_format_specification>
    <format><![CDATA[
String to find:
<REGEX>

Replace with:
<REPLACEMENT>

Rationale:
- <why this regex is specific>
- <why this replacement yields the target>
- <safety/performance note>

Test cases:
1) Input: <excerpt>
   Output: <excerpt>
2) Input: <excerpt>
   Output: <excerpt>

(If insufficient info)
NEED_MORE_INPUT:
- <missing item>
- <missing item>
]]></format>
  </output_format_specification>

  <examples>
    <example>
      <input_data>
        <source_text><![CDATA[
get_x(foo, bar)
get_x( a, b + c )
]]></source_text>
        <target_text><![CDATA[
new_get_x(bar, foo, NULL)
new_get_x(b + c, a, NULL)
]]></target_text>
        <match_mode>global_replace</match_mode>
      </input_data>
      <output><![CDATA[
String to find:
get_x\s*\(\s*([^,\s]+)\s*,\s*([^\)]*?)\s*\)

Replace with:
new_get_x(\2, \1, NULL)

Rationale:
- Anchors on literal "get_x" and parentheses, and captures param1 and param2 while tolerating whitespace.
- Uses non-greedy capture for param2 up to the closing parenthesis, then trims trailing whitespace via \s* before \).
- Pattern is linear and avoids nested quantifiers over broad atoms.

Test cases:
1) Input: get_x(foo, bar)
   Output: new_get_x(bar, foo, NULL)
2) Input: get_x( a, b + c )
   Output: new_get_x(b + c, a, NULL)
]]></output>
    </example>

    <example>
      <input_data>
        <source_text><![CDATA[
Name: Alice Smith
Name: Bob Jones
]]></source_text>
        <target_text><![CDATA[
Smith, Alice
Jones, Bob
]]></target_text>
        <match_mode>per_line</match_mode>
      </input_data>
      <output><![CDATA[
String to find:
^Name:\s+(\l+)\s+(\l+)$

Replace with:
\2, \1

Rationale:
- Line-anchored so it only rewrites lines beginning with "Name:".
- Captures two letter tokens and swaps them; whitespace normalized via \s+.
- No broad multi-line tokens; no nested quantifiers.

Test cases:
1) Input: Name: Alice Smith
   Output: Smith, Alice
2) Input: Name: Bob Jones
   Output: Jones, Bob
]]></output>
    </example>

    <example>
      <input_data>
        <source_text><![CDATA[
BEGIN
alpha
beta
END
]]></source_text>
        <target_text><![CDATA[
BEGIN
alpha;beta
END
]]></target_text>
        <match_mode>single_occurrence</match_mode>
      </input_data>
      <output><![CDATA[
String to find:
(?n(^BEGIN\n)([^\n]*)\n([^\n]*)(\nEND$))

Replace with:
\1\2;\3\4

Rationale:
- Enables newline matching only for the bounded 4-line block and anchors the boundary lines.
- Uses [^\n]* to avoid accidental spanning across multiple lines.
- Avoids .* inside (?n...) to reduce backtracking risk.

Test cases:
1) Input: BEGIN\nalpha\nbeta\nEND
   Output: BEGIN\nalpha;beta\nEND
]]></output>
    </example>
  </examples>

  <evaluation_checklist>
    <item>Format adherence: output matches the exact required fields and labeling.</item>
    <item>Specificity: regex is anchored and avoids matching unrelated text.</item>
    <item>Correctness: replacement reconstructs target exactly for provided test cases.</item>
    <item>Safety: no nested greedy quantifiers over broad atoms; bounded multi-line matching if used.</item>
  </evaluation_checklist>
</prompt>
URL: https://ib.bsb.br/regeXnedit