TextMateLib 1.0
Modern C++ implementation of the TextMate syntax highlighting engine
UTF-16 Tokenization API

Tokenize text and return indices as UTF-16 code unit offsets.


Functions

TML_API TextMateTokenizeResult *textmate_tokenize_line_utf16(TextMateGrammar grammar, const char *lineText, TextMateStateStack prevState)
 Tokenize a single line with decoded scopes, returning UTF-16 indices.
 
TML_API TextMateTokenizeResult2 *textmate_tokenize_line2_utf16(TextMateGrammar grammar, const char *lineText, TextMateStateStack prevState)
 Tokenize a single line with encoded tokens, returning UTF-16 indices.
 
TML_API TextMateTokenizeMultiLinesResult *textmate_tokenize_lines_utf16(TextMateGrammar grammar, const char **lines, int32_t lineCount, TextMateStateStack initialState)
 Tokenize multiple lines in a single call, returning UTF-16 indices.
 

Detailed Description

Tokenize text and return indices as UTF-16 code unit offsets.

Use these from language bindings whose strings are UTF-16 encoded (for example C# and JavaScript). The original (non-UTF-16) tokenization functions return UTF-8 byte offsets, which are correct for C/C++.
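For the same position in a line, a UTF-8 byte offset and a UTF-16 code unit offset diverge as soon as a non-ASCII character appears. The standalone helper below illustrates the conversion; utf8_to_utf16_offset is a hypothetical name, not part of TextMateLib, and the sketch assumes valid UTF-8 input and an offset that falls on a code point boundary.

```c
#include <stddef.h>

/* Hypothetical helper (not a TextMateLib function): convert a UTF-8 byte
 * offset within `s` to the equivalent UTF-16 code unit offset.
 * Assumes `s` is valid UTF-8 and `byte_offset` is on a code point boundary. */
size_t utf8_to_utf16_offset(const char *s, size_t byte_offset) {
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0, units = 0;
    while (i < byte_offset && p[i] != '\0') {
        unsigned char c = p[i];
        if (c < 0x80)      { i += 1; units += 1; } /* U+0000..U+007F: 1 byte, 1 unit */
        else if (c < 0xE0) { i += 2; units += 1; } /* U+0080..U+07FF: 2 bytes, 1 unit */
        else if (c < 0xF0) { i += 3; units += 1; } /* U+0800..U+FFFF: 3 bytes, 1 unit */
        else               { i += 4; units += 2; } /* U+10000 and up: 4 bytes, surrogate pair */
    }
    return units;
}
```

For "héllo", byte offset 3 (the first "l") corresponds to UTF-16 offset 2, and a character outside the Basic Multilingual Plane such as U+1F600 occupies 4 bytes but 2 UTF-16 code units.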

Function Documentation

◆ textmate_tokenize_line2_utf16()

TML_API TextMateTokenizeResult2 *textmate_tokenize_line2_utf16(TextMateGrammar grammar,
                                                               const char *lineText,
                                                               TextMateStateStack prevState)

Tokenize a single line with encoded tokens, returning UTF-16 indices.

Parameters
grammar: Valid grammar handle (from textmate_registry_load_grammar())
lineText: The text to tokenize (UTF-8, null-terminated)
prevState: The state from the previous line (or the initial state for the first line)
Returns
Pointer to tokenization result on success, NULL on error
Note
Start offsets in the encoded tokens are UTF-16 code unit offsets
The returned result must be freed with textmate_free_tokenize_result2()
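The note above states only that start offsets are UTF-16 code units; this page does not describe how the encoded token array is laid out. The sketch below assumes, hypothetically, that tokens is a flat array of uint32_t pairs (start offset, metadata), which is the layout vscode-textmate's tokenizeLine2() uses. Under that assumption, a token's UTF-16 length is the next token's start minus its own start, or the line's UTF-16 length minus its start for the last token. Both the helper name and the layout are assumptions; check the real TextMateTokenizeResult2 definition before relying on this.

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL layout: `tokens` holds 2 * token_count values,
 * [start0, meta0, start1, meta1, ...], mirroring vscode-textmate's
 * tokenizeLine2(). Not confirmed for TextMateTokenizeResult2. */
uint32_t encoded_token_length_utf16(const uint32_t *tokens, size_t token_count,
                                    size_t index, uint32_t line_len_utf16) {
    uint32_t start = tokens[2 * index];
    /* A token ends where the next one starts; the last one runs to end of line. */
    uint32_t end = (index + 1 < token_count) ? tokens[2 * (index + 1)]
                                             : line_len_utf16;
    return end - start;
}
```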

Definition at line 842 of file c_api.cpp.

References TextMateTokenizeResult2::ruleStack, TextMateTokenizeResult2::stoppedEarly, TextMateTokenizeResult2::tokenCount, and TextMateTokenizeResult2::tokens.

◆ textmate_tokenize_line_utf16()

TML_API TextMateTokenizeResult *textmate_tokenize_line_utf16(TextMateGrammar grammar,
                                                             const char *lineText,
                                                             TextMateStateStack prevState)

Tokenize a single line with decoded scopes, returning UTF-16 indices.

Parameters
grammar: Valid grammar handle (from textmate_registry_load_grammar())
lineText: The text to tokenize (UTF-8, null-terminated)
prevState: The state from the previous line (or the initial state for the first line)
Returns
Pointer to tokenization result on success, NULL on error
Note
Token startIndex/endIndex are UTF-16 code unit offsets
The returned result must be freed with textmate_free_tokenize_result()
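Because startIndex and endIndex count UTF-16 code units, a binding can slice its native string directly with them. The C sketch below imitates that: it decodes the UTF-8 line into a UTF-16 buffer and takes a token's text as the range [startIndex, endIndex) of that buffer. DemoToken is a hypothetical stand-in for TextMateToken with assumed field types, utf8_to_utf16 and token_slice are illustrative helpers rather than TextMateLib functions, and the decoder assumes valid UTF-8.

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL stand-in for TextMateToken: only the two offset fields this
 * example needs. Field types are assumed, not taken from the real header. */
typedef struct {
    uint32_t startIndex; /* UTF-16 code unit offset, inclusive */
    uint32_t endIndex;   /* UTF-16 code unit offset, exclusive */
} DemoToken;

/* Decode null-terminated, valid UTF-8 into UTF-16 code units.
 * Returns the number of units written (never more than `cap`). */
size_t utf8_to_utf16(const char *s, uint16_t *out, size_t cap) {
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;
    while (*p != '\0') {
        uint32_t cp;
        if (p[0] < 0x80) {
            cp = p[0]; p += 1;
        } else if (p[0] < 0xE0) {
            cp = ((uint32_t)(p[0] & 0x1F) << 6) | (p[1] & 0x3F); p += 2;
        } else if (p[0] < 0xF0) {
            cp = ((uint32_t)(p[0] & 0x0F) << 12) | ((uint32_t)(p[1] & 0x3F) << 6)
               | (p[2] & 0x3F); p += 3;
        } else {
            cp = ((uint32_t)(p[0] & 0x07) << 18) | ((uint32_t)(p[1] & 0x3F) << 12)
               | ((uint32_t)(p[2] & 0x3F) << 6) | (p[3] & 0x3F); p += 4;
        }
        if (cp >= 0x10000) {            /* outside the BMP: emit a surrogate pair */
            if (n + 2 > cap) break;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 + (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 + (cp & 0x3FF));
        } else {
            if (n + 1 > cap) break;
            out[n++] = (uint16_t)cp;
        }
    }
    return n;
}

/* With UTF-16 indices, a token's text is a direct slice of the UTF-16
 * buffer, which is what a C# or JavaScript string stores internally. */
const uint16_t *token_slice(const uint16_t *line16, DemoToken tok, size_t *len) {
    *len = tok.endIndex - tok.startIndex;
    return line16 + tok.startIndex;
}
```

Using UTF-8 byte offsets on the same buffer would point past the intended characters whenever the line contains non-ASCII text.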

Definition at line 794 of file c_api.cpp.

References TextMateToken::endIndex, TextMateTokenizeResult::ruleStack, TextMateToken::scopeDepth, TextMateToken::scopes, TextMateToken::startIndex, TextMateTokenizeResult::stoppedEarly, TextMateTokenizeResult::tokenCount, and TextMateTokenizeResult::tokens.

◆ textmate_tokenize_lines_utf16()

TML_API TextMateTokenizeMultiLinesResult *textmate_tokenize_lines_utf16(TextMateGrammar grammar,
                                                                        const char **lines,
                                                                        int32_t lineCount,
                                                                        TextMateStateStack initialState)

Tokenize multiple lines in a single call, returning UTF-16 indices.

Parameters
grammar: Valid grammar handle
lines: Array of line strings (UTF-8, null-terminated; no line should include a trailing newline)
lineCount: Number of lines in the array
initialState: The state to start from (typically INITIAL, or a state obtained from the Session API)
Returns
Pointer to batch result on success, NULL on error
Note
Token startIndex/endIndex are UTF-16 code unit offsets
The returned result must be freed with textmate_free_tokenize_lines_result()
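The lines parameter expects each line as its own null-terminated string with no newline. One way to prepare it from a whole document buffer is to split the buffer in place, as in the sketch below. split_lines_in_place is a hypothetical helper, and stripping the "\r" of CRLF line endings is an assumption about what the grammar should see.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split `buf` in place into null-terminated lines with
 * no trailing "\n" or "\r\n". Stores up to `cap` pointers in `lines` and
 * returns how many were stored. The pointers alias `buf`. */
int32_t split_lines_in_place(char *buf, const char **lines, int32_t cap) {
    int32_t count = 0;
    char *start = buf;
    for (char *p = buf; ; ++p) {
        if (*p == '\n' || *p == '\0') {
            int at_end = (*p == '\0');
            if (p > start && p[-1] == '\r') p[-1] = '\0'; /* drop CR of CRLF */
            *p = '\0';                                    /* terminate this line */
            if (count < cap) lines[count++] = start;
            if (at_end) break;
            start = p + 1;
        }
    }
    return count;
}
```

A buffer that ends in a newline yields a final empty line; whether to pass it to textmate_tokenize_lines_utf16() is left to the caller. Because the pointers alias buf, the buffer must stay alive for the duration of the call.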

Definition at line 886 of file c_api.cpp.

References TextMateToken::endIndex, TextMateTokenizeMultiLinesResult::lineCount, TextMateTokenizeMultiLinesResult::lineResults, TextMateTokenizeResult::ruleStack, TextMateToken::scopeDepth, TextMateToken::scopes, TextMateToken::startIndex, TextMateTokenizeResult::stoppedEarly, TextMateTokenizeResult::tokenCount, and TextMateTokenizeResult::tokens.