TextMateLib 1.0
Modern C++ implementation of the TextMate syntax highlighting engine
Tokenization API

Tokenize text using a grammar, handling stateful line-by-line parsing.

Modules

 UTF-16 Tokenization API
 Tokenize text and return indices as UTF-16 code unit offsets.
 

Functions

TML_API TextMateStateStack textmate_get_initial_state ()
 Get the initial parsing state.
 
TML_API TextMateTokenizeResult * textmate_tokenize_line (TextMateGrammar grammar, const char *lineText, TextMateStateStack prevState)
 Tokenize a single line of text with decoded scopes.
 
TML_API TextMateTokenizeResult2 * textmate_tokenize_line2 (TextMateGrammar grammar, const char *lineText, TextMateStateStack prevState)
 Tokenize a single line of text with encoded tokens (more efficient)
 
TML_API TextMateTokenizeMultiLinesResult * textmate_tokenize_lines (TextMateGrammar grammar, const char **lines, int32_t lineCount, TextMateStateStack initialState)
 Tokenize multiple lines in a single call.
 
TML_API void textmate_free_tokenize_result (TextMateTokenizeResult *result)
 Free a line tokenization result.
 
TML_API void textmate_free_tokenize_result2 (TextMateTokenizeResult2 *result)
 Free an encoded line tokenization result.
 
TML_API void textmate_free_tokenize_lines_result (TextMateTokenizeMultiLinesResult *result)
 Free a batch tokenization result.
 
TML_API const char * textmate_grammar_get_scope_name (TextMateGrammar grammar)
 Get the scope name (language identifier) of a grammar.
 
TML_API void textmate_oniglib_dispose (TextMateOnigLib onigLib)
 Free the Oniguruma library.
 

Detailed Description

Tokenize text using a grammar, handling stateful line-by-line parsing.

Function Documentation

◆ textmate_free_tokenize_lines_result()

TML_API void textmate_free_tokenize_lines_result ( TextMateTokenizeMultiLinesResult * result)

Free a batch tokenization result.

Parameters
result: Valid result pointer (from textmate_tokenize_lines()), or NULL (no-op)
Warning
Do not use result after calling this function
Note
Safe to call with NULL

Definition at line 778 of file c_api.cpp.

References TextMateTokenizeMultiLinesResult::lineCount, TextMateTokenizeMultiLinesResult::lineResults, and textmate_free_tokenize_result().

◆ textmate_free_tokenize_result()

TML_API void textmate_free_tokenize_result ( TextMateTokenizeResult * result)

Free a line tokenization result.

Parameters
result: Valid result pointer (from textmate_tokenize_line()), or NULL (no-op)
Warning
Do not use result after calling this function
Note
Safe to call with NULL

Definition at line 692 of file c_api.cpp.

References TextMateToken::scopeDepth, TextMateToken::scopes, TextMateTokenizeResult::tokenCount, and TextMateTokenizeResult::tokens.

Referenced by textmate_free_tokenize_lines_result().

◆ textmate_free_tokenize_result2()

TML_API void textmate_free_tokenize_result2 ( TextMateTokenizeResult2 * result)

Free an encoded line tokenization result.

Parameters
result: Valid result pointer (from textmate_tokenize_line2()), or NULL (no-op)
Warning
Do not use result after calling this function
Note
Safe to call with NULL

Definition at line 710 of file c_api.cpp.

References TextMateTokenizeResult2::tokens.

◆ textmate_get_initial_state()

TML_API TextMateStateStack textmate_get_initial_state ( )

Get the initial parsing state.

Returns
The INITIAL state stack (first line of a document)
Note
This is used as the prevState parameter for the first line
The returned state is read-only and should not be freed

Definition at line 610 of file c_api.cpp.

References tml::INITIAL.

◆ textmate_grammar_get_scope_name()

TML_API const char * textmate_grammar_get_scope_name ( TextMateGrammar  grammar)

Get the scope name (language identifier) of a grammar.

Parameters
grammar: Valid grammar handle (from textmate_registry_load_grammar())
Returns
Scope name string (e.g., "source.javascript"), valid for lifetime of grammar
NULL if grammar is invalid

Definition at line 947 of file c_api.cpp.

◆ textmate_oniglib_dispose()

TML_API void textmate_oniglib_dispose ( TextMateOnigLib  onigLib)

Free the Oniguruma library.

Parameters
onigLib: Valid Oniguruma handle (from textmate_oniglib_create()), or NULL (no-op)
Warning
Do not use onigLib after calling this function
All registries and grammars created with this lib become invalid
Note
Safe to call with NULL

Definition at line 963 of file c_api.cpp.

◆ textmate_tokenize_line()

TML_API TextMateTokenizeResult * textmate_tokenize_line ( TextMateGrammar  grammar,
const char *  lineText,
TextMateStateStack  prevState 
)

Tokenize a single line of text with decoded scopes.

Parameters
grammar: Valid grammar handle (from textmate_registry_load_grammar())
lineText: The text to tokenize (should not include newline)
prevState: The state from the previous line (or initial state for first line)
Returns
Pointer to tokenization result on success, NULL on error
Note
The returned result must be freed with textmate_free_tokenize_result()
Use the ruleStack from the result as prevState for the next line
See also
textmate_tokenize_line2() for encoded token format (more efficient)
textmate_get_initial_state()

Definition at line 615 of file c_api.cpp.

References TextMateToken::endIndex, TextMateTokenizeResult::ruleStack, TextMateToken::scopeDepth, TextMateToken::scopes, TextMateToken::startIndex, TextMateTokenizeResult::stoppedEarly, TextMateTokenizeResult::tokenCount, and TextMateTokenizeResult::tokens.
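
The stateful loop described in the notes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's reference usage. So that the block compiles on its own, it defines stand-in typedefs and stub versions of textmate_get_initial_state(), textmate_tokenize_line() and textmate_free_tokenize_result(). The handle typedefs, the field types and the stub bodies are assumptions; only the function and field names come from this page. The helper tokenize_document() is hypothetical. In real code the stand-in section would be replaced by the TextMateLib C API header.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* ---- Stand-ins so this sketch compiles on its own. ----
 * Real code would include the TextMateLib C API header instead. The handle
 * typedefs, field types, and stub bodies are assumptions; only the function
 * and field names come from this page. */
typedef void *TextMateGrammar;
typedef void *TextMateStateStack;

typedef struct {
    int32_t startIndex;
    int32_t endIndex;
} TextMateToken; /* scopes and scopeDepth left out of the stub */

typedef struct {
    TextMateToken *tokens;
    int32_t tokenCount;
    TextMateStateStack ruleStack;
} TextMateTokenizeResult; /* stoppedEarly left out of the stub */

static int stub_initial_state; /* its address plays the role of INITIAL */

static TextMateStateStack textmate_get_initial_state(void) {
    return &stub_initial_state;
}

/* Stub: one token spanning the whole line; state is passed through unchanged. */
static TextMateTokenizeResult *textmate_tokenize_line(TextMateGrammar grammar,
                                                      const char *lineText,
                                                      TextMateStateStack prevState) {
    if (grammar == NULL || lineText == NULL) return NULL;
    TextMateTokenizeResult *r = malloc(sizeof *r);
    if (r == NULL) return NULL;
    r->tokens = malloc(sizeof *r->tokens);
    if (r->tokens == NULL) { free(r); return NULL; }
    r->tokens[0].startIndex = 0;
    r->tokens[0].endIndex = (int32_t)strlen(lineText);
    r->tokenCount = 1;
    r->ruleStack = prevState;
    return r;
}

static void textmate_free_tokenize_result(TextMateTokenizeResult *result) {
    if (result == NULL) return; /* documented no-op */
    free(result->tokens);
    free(result);
}
/* ---- End of stand-ins. ---- */

/* Hypothetical helper: tokenize lines in order, threading each ruleStack into
 * the next call. Returns the total token count, or -1 if any line fails. */
int tokenize_document(TextMateGrammar grammar, const char **lines, int32_t lineCount) {
    TextMateStateStack state = textmate_get_initial_state();
    TextMateTokenizeResult *prev = NULL;
    int total = 0;

    for (int32_t i = 0; i < lineCount; i++) {
        TextMateTokenizeResult *res = textmate_tokenize_line(grammar, lines[i], state);
        if (res == NULL) {
            textmate_free_tokenize_result(prev);
            return -1;
        }
        for (int32_t t = 0; t < res->tokenCount; t++) {
            printf("line %d: token [%d, %d)\n", (int)i,
                   (int)res->tokens[t].startIndex, (int)res->tokens[t].endIndex);
        }
        total += (int)res->tokenCount;
        state = res->ruleStack; /* becomes prevState for the next line */

        /* This page does not say whether ruleStack outlives its result, so the
         * previous result is released only after the next line has used it. */
        textmate_free_tokenize_result(prev);
        prev = res;
    }
    textmate_free_tokenize_result(prev);
    return total;
}
```

Keeping the previous result alive for one extra iteration is a conservative choice. If the header guarantees that a ruleStack remains valid after its result is freed, each result could be freed immediately after its state is copied.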

◆ textmate_tokenize_line2()

TML_API TextMateTokenizeResult2 * textmate_tokenize_line2 ( TextMateGrammar  grammar,
const char *  lineText,
TextMateStateStack  prevState 
)

Tokenize a single line of text with encoded tokens (more efficient)

Parameters
grammar: Valid grammar handle (from textmate_registry_load_grammar())
lineText: The text to tokenize (should not include newline)
prevState: The state from the previous line (or initial state for first line)
Returns
Pointer to tokenization result on success, NULL on error
Note
The returned result must be freed with textmate_free_tokenize_result2()
Tokens are encoded as 32-bit values for better performance
Prefer this over textmate_tokenize_line() for performance-critical code

Definition at line 658 of file c_api.cpp.

References TextMateTokenizeResult2::ruleStack, TextMateTokenizeResult2::stoppedEarly, TextMateTokenizeResult2::tokenCount, and TextMateTokenizeResult2::tokens.

◆ textmate_tokenize_lines()

TML_API TextMateTokenizeMultiLinesResult * textmate_tokenize_lines ( TextMateGrammar  grammar,
const char **  lines,
int32_t  lineCount,
TextMateStateStack  initialState 
)

Tokenize multiple lines in a single call.

Parameters
grammar: Valid grammar handle
lines: Array of line strings (none should include newline)
lineCount: Number of lines in the array
initialState: The state to start with (typically INITIAL or from Session API)
Returns
Pointer to batch result on success, NULL on error
Note
The returned result must be freed with textmate_free_tokenize_lines_result()
Reduces FFI call overhead when tokenizing multiple lines (important for language bindings)
Each result's ruleStack is automatically passed to the next line
See also
textmate_free_tokenize_lines_result()

Definition at line 720 of file c_api.cpp.

References TextMateToken::endIndex, TextMateTokenizeMultiLinesResult::lineCount, TextMateTokenizeMultiLinesResult::lineResults, TextMateTokenizeResult::ruleStack, TextMateToken::scopeDepth, TextMateToken::scopes, TextMateToken::startIndex, TextMateTokenizeResult::stoppedEarly, TextMateTokenizeResult::tokenCount, and TextMateTokenizeResult::tokens.
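
A batch call might be used along the following lines. This is a sketch under stated assumptions rather than confirmed usage. The block again defines stand-in types and stubs so that it compiles alone. In particular, treating lineResults as an array of per-line result pointers is an assumption that should be checked against the actual header, as are the handle typedefs and field types. The helper tokenize_block() is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* ---- Stand-ins so this sketch compiles on its own. ----
 * Real code would include the TextMateLib C API header instead. Layouts,
 * field types, and stub bodies are assumptions; names come from this page. */
typedef void *TextMateGrammar;
typedef void *TextMateStateStack;

typedef struct {
    int32_t tokenCount;
    TextMateStateStack ruleStack;
} TextMateTokenizeResult; /* tokens and stoppedEarly left out of the stub */

typedef struct {
    TextMateTokenizeResult **lineResults; /* assumed: one result pointer per line */
    int32_t lineCount;
} TextMateTokenizeMultiLinesResult;

static int stub_initial_state; /* its address plays the role of INITIAL */

static TextMateStateStack textmate_get_initial_state(void) {
    return &stub_initial_state;
}

static void textmate_free_tokenize_lines_result(TextMateTokenizeMultiLinesResult *result) {
    if (result == NULL) return; /* documented no-op */
    for (int32_t i = 0; i < result->lineCount; i++) free(result->lineResults[i]);
    free(result->lineResults);
    free(result);
}

/* Stub: reports one token for a non-empty line and zero for an empty one. */
static TextMateTokenizeMultiLinesResult *textmate_tokenize_lines(
        TextMateGrammar grammar, const char **lines, int32_t lineCount,
        TextMateStateStack initialState) {
    if (grammar == NULL || lines == NULL || lineCount <= 0) return NULL;
    TextMateTokenizeMultiLinesResult *batch = malloc(sizeof *batch);
    if (batch == NULL) return NULL;
    batch->lineCount = lineCount;
    batch->lineResults = calloc((size_t)lineCount, sizeof *batch->lineResults);
    if (batch->lineResults == NULL) { free(batch); return NULL; }
    for (int32_t i = 0; i < lineCount; i++) {
        TextMateTokenizeResult *r = malloc(sizeof *r);
        if (r == NULL) { textmate_free_tokenize_lines_result(batch); return NULL; }
        r->tokenCount = (lines[i][0] != '\0') ? 1 : 0;
        r->ruleStack = initialState;
        batch->lineResults[i] = r;
    }
    return batch;
}
/* ---- End of stand-ins. ---- */

/* Hypothetical helper: tokenize a block of lines with a single API call.
 * Returns the total token count, or -1 on error. */
int tokenize_block(TextMateGrammar grammar, const char **lines, int32_t lineCount) {
    TextMateTokenizeMultiLinesResult *batch =
        textmate_tokenize_lines(grammar, lines, lineCount, textmate_get_initial_state());
    if (batch == NULL) return -1;

    int total = 0;
    for (int32_t i = 0; i < batch->lineCount; i++) {
        const TextMateTokenizeResult *line = batch->lineResults[i];
        printf("line %d: %d token(s)\n", (int)i, (int)line->tokenCount);
        total += (int)line->tokenCount;
    }

    /* One free call releases the whole batch, per the note above. */
    textmate_free_tokenize_lines_result(batch);
    return total;
}
```

Compared with calling textmate_tokenize_line() in a loop, the caller makes one call and one free per block of lines, which is where the reduced FFI overhead mentioned in the notes would come from.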