String
Supported platforms:
Available since: Gideros 2011.6
Description
This library provides generic functions to manipulate strings, such as finding and extracting substrings, and pattern matching.
When indexing a string in Lua, the first character is at position 1 (not at 0, as in C). Indices are allowed to be negative and are interpreted as indexing backwards, from the end of the string. Thus, the last character is at position -1, and so on.
The string library provides all its functions inside the table string. It also sets a metatable for strings where the __index field points to the string table. Therefore, you can use the string functions in object-oriented style. For instance, string.byte(s, i) can be written as s:byte(i).
The string library assumes one-byte character encodings.
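The indexing rules and the method-call style described above can be tried with a short sketch. The sample string and variable names here are illustrative only; string.sub, string.byte, and string.upper are standard library functions.

<syntaxhighlight lang="lua">
local s = "Gideros"

-- Positive indices count from the start, negative indices from the end
print(string.sub(s, 1, 3))   -- Gid
print(string.sub(s, -3, -1)) -- ros
print(string.sub(s, -1))     -- s

-- Method-call style goes through the string metatable, so both forms are equivalent
print(s:sub(1, 3) == string.sub(s, 1, 3)) -- true
print(s:byte(1))                          -- 71 (the code for "G")
print(("abc"):upper())                    -- ABC
</syntaxhighlight>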
String Patterns
A string pattern is a combination of characters that you can use with string.match(), string.gmatch(), and other functions to find a piece, or substring, of a longer string.
Direct Matches
Any character that is not a magic character matches itself, so you can pass literal text as the pattern to a function like string.match(). For example, these commands look for the word Gideros within a string:
<syntaxhighlight lang="lua">
local match1 = string.match("Welcome to Gideros!", "Gideros")
local match2 = string.match("Welcome to my awesome game!", "Gideros")
print(match1) -- Gideros
print(match2) -- nil
</syntaxhighlight>
Notice that the first string contains a match, so Gideros is printed to the Output window. The second string has no match, so nil is printed instead.
Character Classes
Character classes are essential for more advanced string searches. You can use them to search for something that isn't necessarily character-specific but fits within a known category (class), including letters, digits, spaces, punctuation, and more.
The following table shows the official character classes for Lua string patterns:
Class | Represents | Example Match |
---|---|---|
. | Any character | 32kasGJ1%fTlk?@94 |
%a | An uppercase or lowercase letter | aBcDeFgHiJkLmNoPqRsTuVwXyZ |
%l | A lowercase letter | abcdefghijklmnopqrstuvwxyz |
%u | An uppercase letter | ABCDEFGHIJKLMNOPQRSTUVWXYZ |
%d | Any digit (number) | 0123456789 |
%p | Any punctuation character | !@#;,. |
%w | An alphanumeric character (either a letter or a number) | aBcDeFgHiJkLmNoPqRsTuVwXyZ0123456789 |
%s | A space or whitespace character | (space), \n, and \r |
%c | A special control character | |
%x | A hexadecimal character | 0123456789ABCDEF |
%z | The NULL character (\0) | |
For single-letter character classes such as %a and %s, the corresponding uppercase letter represents the "opposite" of the class. For instance, %p represents a punctuation character while %P represents all characters except punctuation.
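As a rough illustration of classes and their uppercase "opposites", the sketch below picks out single characters and then counts digits and non-digits. The sample text is made up for this example; the counting relies on string.gsub returning the number of substitutions as its second result.

<syntaxhighlight lang="lua">
local text = "Level 42: Boss!"

print(string.match(text, "%u")) -- L (first uppercase letter)
print(string.match(text, "%d")) -- 4 (first digit)
print(string.match(text, "%p")) -- : (first punctuation character)

-- %d and %D are complements, so their counts add up to the string length
local _, digits = string.gsub(text, "%d", "")
local _, nonDigits = string.gsub(text, "%D", "")
print(digits, nonDigits, #text) -- 2 13 15
</syntaxhighlight>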
Magic Characters
There are 12 "magic characters" which are reserved for special purposes in patterns:
- $ % ^ * ( ) . [ ] + - ?
You can escape and search for magic characters using the % symbol. For example, to search for giderosmobile.com, escape the . (period) symbol by preceding it with a %, as in %..
<syntaxhighlight lang="lua">
-- Incorrect: "giderosmobile.com" matches "giderosmobile#com" because the period is interpreted as "any character"
local match1 = string.match("What is giderosmobile#com?", "giderosmobile.com")
-- Correct: Escape the period with % so it is interpreted as a literal period character
local match2 = string.match("I love giderosmobile.com!", "giderosmobile%.com")
print(match1) -- giderosmobile#com
print(match2) -- giderosmobile.com
</syntaxhighlight>
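When the text to search for comes from elsewhere (user input, for instance), escaping by hand is not practical. The sketch below shows two possible approaches: a hypothetical helper, escapePattern (not part of the string library), that prefixes each magic character with %, and the standard fourth argument of string.find, which turns pattern matching off entirely.

<syntaxhighlight lang="lua">
-- Hypothetical helper: prefix every magic character with % so it matches literally.
-- The extra parentheses drop gsub's second result (the substitution count).
local function escapePattern(text)
	return (string.gsub(text, "[%^%$%(%)%%%.%[%]%*%+%-%?]", "%%%0"))
end

print(escapePattern("giderosmobile.com")) -- giderosmobile%.com
print(escapePattern("50% (approx.)"))     -- 50%% %(approx%.%)

-- Alternative: pass true as the fourth argument of string.find for a plain search
print(string.find("What is giderosmobile#com?", "giderosmobile.com", 1, true)) -- nil
print(string.find("I love giderosmobile.com!", "giderosmobile.com", 1, true))  -- 8 24
</syntaxhighlight>

The escaped result can be passed to string.match() or string.gmatch(), which have no plain-search option.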
Anchors
You can search for a pattern at the beginning or end of a string by using the ^ and $ symbols.
<syntaxhighlight lang="lua">
-- Using ^ to match the beginning of a string
local start1 = string.match("first second third", "^first") -- Matches because "first" is at the beginning
local start2 = string.match("third second first", "^first") -- Doesn't match because "first" isn't at the beginning
print(start1) -- first
print(start2) -- nil

-- Using $ to match the end of a string
local end1 = string.match("first second third", "third$") -- Matches because "third" is at the end
local end2 = string.match("third second first", "third$") -- Doesn't match because "third" isn't at the end
print(end1) -- third
print(end2) -- nil
</syntaxhighlight>
You can also use both ^ and $ together to ensure a pattern matches only the full string and not just some portion of it.
<syntaxhighlight lang="lua">
-- Using both ^ and $ to match across a full string
local match1 = string.match("Gideros", "^Gideros$") -- Matches because "Gideros" is the entire string (equality)
local match2 = string.match("I play Gideros", "^Gideros$") -- Doesn't match because "Gideros" isn't at the beginning AND end
local match3 = string.match("I play Gideros", "Gideros") -- Matches because "Gideros" is contained within "I play Gideros"
print(match1) -- Gideros
print(match2) -- nil
print(match3) -- Gideros
</syntaxhighlight>
Class Modifiers
By itself, a character class only matches one character in a string. For instance, the following pattern ("%d") starts reading the string from left to right, finds the first digit (2), and stops.
<syntaxhighlight lang="lua">
local match = string.match("The Cloud Kingdom has 25 power gems", "%d")
print(match) -- 2
</syntaxhighlight>
You can use modifiers with any character class to control the result:
Quantifier | Meaning |
---|---|
+ | Match 1 or more of the preceding character class, as many as possible |
- | Match 0 or more of the preceding character class, as few as possible |
* | Match 0 or more of the preceding character class, as many as possible |
? | Match 0 or 1 of the preceding character class |
%n | For n between 1 and 9, match a substring equal to the n-th captured string |
%bxy | Match a balanced pair: x, y, and everything between them (for example, %b() matches a pair of parentheses and everything between them) |
Adding a modifier to the same pattern ("%d+" instead of "%d") outputs 25 instead of 2:
<syntaxhighlight lang="lua">
local match1 = string.match("The Cloud Kingdom has 25 power gems", "%d")
print(match1) -- 2
local match2 = string.match("The Cloud Kingdom has 25 power gems", "%d+")
print(match2) -- 25
</syntaxhighlight>
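The remaining modifiers can be compared side by side. The following is a small sketch with made-up sample strings; it contrasts the greedy * with the lazy -, and shows ?, %b() and a %1 back-reference.

<syntaxhighlight lang="lua">
local markup = "<b>bold</b> and <i>italic</i>"

-- * is greedy: ".*" runs on to the last ">" in the string
print(string.match(markup, "<.*>")) -- <b>bold</b> and <i>italic</i>
-- - is lazy: ".-" stops at the first ">" it can
print(string.match(markup, "<.->")) -- <b>

-- ? makes the preceding character optional
print(string.match("colour", "colou?r")) -- colour
print(string.match("color", "colou?r"))  -- color

-- %b() matches from an opening parenthesis to its balancing closing one
print(string.match("sum(a, max(b, c)) + 1", "%b()")) -- (a, max(b, c))

-- %1 must repeat whatever the first capture matched (here, a doubled letter)
print(string.match("hello world", "(%a)%1")) -- l
</syntaxhighlight>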
Class Sets
Sets should be used when a single character class can't do the whole job. For instance, you might want to match both lowercase letters (%l) and punctuation characters (%p) using a single pattern.
Sets are defined by brackets [] around them. In the following example, notice the difference between using a set ("[%l%p]+") and not using a set ("%l%p+").
<syntaxhighlight lang="lua">
local match1 = string.match("Hello!!! I am another string.", "[%l%p]+") -- Set
local match2 = string.match("Hello!!! I am another string.", "%l%p+") -- Non-set
print(match1) -- ello!!!
print(match2) -- o!!!
</syntaxhighlight>
The first command (set) tells Lua to find both lowercase characters and punctuation. With the + quantifier added after the entire set, it finds all of those characters (ello!!!), stopping when it reaches the space.
In the second command (non-set), the + quantifier applies only to the %p class before it, so Lua matches a single lowercase character (o) followed by the run of punctuation (!!!).
Like character classes, sets can be "opposites" of themselves. This is done by adding a ^ character at the beginning of the set, directly after the opening [. For instance, "[%p%s]+" represents both punctuation and spaces, while "[^%p%s]+" represents all characters except punctuation and spaces.
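A negated set is a convenient way to split text into words. The sketch below, using a made-up sentence, collects every run of characters that are neither punctuation nor whitespace.

<syntaxhighlight lang="lua">
local text = "Hello, world! How are you?"
local words = {}

-- "[^%p%s]+" matches one or more characters that are NOT punctuation or whitespace
for word in string.gmatch(text, "[^%p%s]+") do
	words[#words + 1] = word
end

print(#words)                   -- 5
print(table.concat(words, "|")) -- Hello|world|How|are|you
</syntaxhighlight>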
Sets also support ranges, which let you match any character between a starting and an ending character (for example, [0-9] or [a-f]). This is an advanced feature that is covered in more detail in the Lua 5.1 Manual.
String Captures
String captures are sub-patterns within a pattern. These are enclosed in parentheses () and are used to get (capture) matching substrings and save them to variables. For example, the following pattern contains two captures, (%a+) and (%d+), which return two substrings upon a successful match.
<syntaxhighlight lang="lua">
local pattern = "(%a+)%s?=%s?(%d+)"
local key1, val1 = string.match("TwentyOne = 21", pattern)
print(key1, val1) -- TwentyOne 21
local key2, val2 = string.match("TwoThousand= 2000", pattern)
print(key2, val2) -- TwoThousand 2000
local key3, val3 = string.match("OneMillion=1000000", pattern)
print(key3, val3) -- OneMillion 1000000
</syntaxhighlight>
In the previous pattern, the ? quantifier that follows both of the %s classes is a safe addition because it makes the space on either side of the = sign optional. That means the match succeeds if one (or both) spaces are missing around the equal sign.
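The same key/value pattern also works with other string functions. As a sketch (the settings string and table name are invented for this example), string.gmatch can collect every pair into a table, and string.gsub can refer to the captures as %1 and %2 in its replacement string.

<syntaxhighlight lang="lua">
local pattern = "(%a+)%s?=%s?(%d+)"

-- Collect every key/value pair found in a longer string
local settings = {}
for key, value in string.gmatch("width=320, height = 480, fps= 60", pattern) do
	settings[key] = tonumber(value) -- captures are always strings, so convert
end
print(settings.width, settings.height, settings.fps) -- 320 480 60

-- In a gsub replacement, %1 and %2 stand for the first and second capture
local swapped = string.gsub("TwentyOne = 21", pattern, "%2 = %1")
print(swapped) -- 21 = TwentyOne
</syntaxhighlight>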
String captures can also be nested, as in the following example:
<syntaxhighlight lang="lua">
local places = "The Cloud Kingdom is heavenly, The Forest Kingdom is peaceful"
local pattern = "(The%s(%a+%sKingdom)[%w%s]+)"

for description, kingdom in string.gmatch(places, pattern) do
	print(description)
	print(kingdom)
end

--[[ Expected Output:
The Cloud Kingdom is heavenly
Cloud Kingdom
The Forest Kingdom is peaceful
Forest Kingdom
]]
</syntaxhighlight>
This pattern search works as follows:
The string.gmatch() iterator looks for a match on the entire "description" pattern defined by the outer pair of parentheses. This stops at the first comma and captures the following:
# | Pattern | Capture |
---|---|---|
1 | (The%s(%a+%sKingdom)[%w%s]+) | The Cloud Kingdom is heavenly |
Using its successful first capture, the iterator then looks for a match on the "kingdom" pattern defined by the inner pair of parentheses. This nested pattern simply captures the following:
# | Pattern | Capture |
---|---|---|
2 | (%a+%sKingdom) | Cloud Kingdom |
The iterator then backs out and continues searching the full string, capturing the following:
# | Pattern | Capture |
---|---|---|
3 | (The%s(%a+%sKingdom)[%w%s]+) | The Forest Kingdom is peaceful |
4 | (%a+%sKingdom) | Forest Kingdom |
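Put another way, captures are returned in the order of their opening parentheses, so an outer capture always comes before the captures nested inside it. A short sketch with an invented date string shows this, along with string.find, which returns the start and end positions of the match before any captures.

<syntaxhighlight lang="lua">
local sample = "Saved on 2024-05-17"

-- Opening parentheses, left to right: whole date, year, month, day
local full, year, month, day = string.match(sample, "((%d+)%-(%d+)%-(%d+))")
print(full)             -- 2024-05-17
print(year, month, day) -- 2024 05 17

-- string.find returns the match positions first, then the captures
local first, last, kingdom = string.find("The Cloud Kingdom is heavenly", "The%s(%a+)%sKingdom")
print(first, last, kingdom) -- 1 17 Cloud
</syntaxhighlight>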
String literals
Luau implements support for hexadecimal (\x), Unicode (\u) and \z escapes for string literals.
These escapes follow the Lua 5.3 syntax:
- \xAB inserts a character with the code 0xAB into the string
- \u{ABC} inserts the UTF-8 byte sequence that encodes the character U+0ABC into the string (note that the braces are mandatory)
- \z at the end of the line inside a string literal ignores all following whitespace including newlines, which can be helpful for breaking long literals into multiple lines
<syntaxhighlight lang="lua">
local str = "My icon \u{2590} !!!"
</syntaxhighlight>
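The three escapes can be checked with a brief sketch. It assumes a runtime that accepts these escapes (Luau, or Lua 5.3 and later); the variable names are illustrative. Because the string library works on bytes, the length operator reports the number of bytes in the UTF-8 sequence, not the number of visible characters.

<syntaxhighlight lang="lua">
-- \x inserts a single byte given as two hexadecimal digits
local letters = "\x41\x42\x43"
print(letters) -- ABC

-- \u inserts a UTF-8 sequence; U+2590 takes three bytes
local icon = "\u{2590}"
print(#icon) -- 3

-- \z skips the line break and the indentation that follow it
local long = "one \z
              two \z
              three"
print(long) -- one two three
</syntaxhighlight>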
Methods
string.byte returns numerical code
Events
Constants