Tokenize text and return indices as UTF-16 code unit offsets.

Functions
TML_API TextMateTokenizeResult *	textmate_tokenize_line_utf16 (TextMateGrammar grammar, const char *lineText, TextMateStateStack prevState)
	Tokenize a single line with decoded scopes, returning UTF-16 indices.

TML_API TextMateTokenizeResult2 *	textmate_tokenize_line2_utf16 (TextMateGrammar grammar, const char *lineText, TextMateStateStack prevState)
	Tokenize a single line with encoded tokens, returning UTF-16 indices.

TML_API TextMateTokenizeMultiLinesResult *	textmate_tokenize_lines_utf16 (TextMateGrammar grammar, const char **lines, int32_t lineCount, TextMateStateStack initialState)
	Tokenize multiple lines in a single call, returning UTF-16 indices.

Detailed Description

Tokenize text and return indices as UTF-16 code unit offsets.

Use these functions from language bindings whose strings are UTF-16 encoded (for example C# and JavaScript). The original tokenization functions return UTF-8 byte offsets, which are correct for C/C++ callers that index the UTF-8 text directly.
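The two kinds of offset diverge as soon as a line contains non-ASCII text. The following is a minimal sketch of that relationship, not the library's implementation: utf8_offset_to_utf16() is a hypothetical helper written for illustration only, and it assumes the input is valid UTF-8 and that the byte offset falls on a character boundary.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the library): converts a UTF-8 byte
 * offset into the equivalent UTF-16 code unit offset for the same string.
 * Assumes `text` is valid UTF-8 and `byteOffset` is on a character boundary. */
int32_t utf8_offset_to_utf16(const char *text, size_t byteOffset)
{
    int32_t units = 0;
    size_t i = 0;

    while (i < byteOffset && text[i] != '\0') {
        unsigned char lead = (unsigned char)text[i];

        if (lead < 0x80) {          /* 1-byte sequence (ASCII): 1 code unit */
            i += 1; units += 1;
        } else if (lead < 0xE0) {   /* 2-byte sequence: 1 code unit */
            i += 2; units += 1;
        } else if (lead < 0xF0) {   /* 3-byte sequence (rest of BMP): 1 code unit */
            i += 3; units += 1;
        } else {                    /* 4-byte sequence: surrogate pair, 2 code units */
            i += 4; units += 2;
        }
    }
    return units;
}
```

For the UTF-8 string "héllo", the character é occupies two bytes, so the byte offset 3 (the first "l") corresponds to UTF-16 offset 2. A character outside the Basic Multilingual Plane, such as an emoji, occupies four UTF-8 bytes but two UTF-16 code units.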

Function Documentation

◆ textmate_tokenize_line2_utf16()

TML_API TextMateTokenizeResult2 * textmate_tokenize_line2_utf16	(	TextMateGrammar	grammar,
		const char *	lineText,
		TextMateStateStack	prevState
	)

Tokenize a single line with encoded tokens, returning UTF-16 indices.

Parameters

grammar	Valid grammar handle (from textmate_registry_load_grammar())
lineText	The text to tokenize (UTF-8, null-terminated)
prevState	The state from the previous line (or initial state for first line)

Returns: Pointer to tokenization result on success, NULL on error

Note: Start offsets in the encoded tokens are UTF-16 code unit offsets. The returned result must be freed with textmate_free_tokenize_result2().

Definition at line 842 of file c_api.cpp.

References TextMateTokenizeResult2::ruleStack, TextMateTokenizeResult2::stoppedEarly, TextMateTokenizeResult2::tokenCount, and TextMateTokenizeResult2::tokens.

◆ textmate_tokenize_line_utf16()

TML_API TextMateTokenizeResult * textmate_tokenize_line_utf16	(	TextMateGrammar	grammar,
		const char *	lineText,
		TextMateStateStack	prevState
	)

Tokenize a single line with decoded scopes, returning UTF-16 indices.

Parameters

grammar	Valid grammar handle (from textmate_registry_load_grammar())
lineText	The text to tokenize (UTF-8, null-terminated)
prevState	The state from the previous line (or initial state for first line)

Returns: Pointer to tokenization result on success, NULL on error

Note: Token startIndex/endIndex are UTF-16 code unit offsets. The returned result must be freed with textmate_free_tokenize_result().

Definition at line 794 of file c_api.cpp.

References TextMateToken::endIndex, TextMateTokenizeResult::ruleStack, TextMateToken::scopeDepth, TextMateToken::scopes, TextMateToken::startIndex, TextMateTokenizeResult::stoppedEarly, TextMateTokenizeResult::tokenCount, and TextMateTokenizeResult::tokens.

◆ textmate_tokenize_lines_utf16()

TML_API TextMateTokenizeMultiLinesResult * textmate_tokenize_lines_utf16	(	TextMateGrammar	grammar,
		const char **	lines,
		int32_t	lineCount,
		TextMateStateStack	initialState
	)

Tokenize multiple lines in a single call, returning UTF-16 indices.

Parameters

grammar	Valid grammar handle
lines	Array of line strings (UTF-8, null-terminated; no line should contain a newline character)
lineCount	Number of lines in the array
initialState	The state to start with (typically INITIAL or from Session API)

Returns: Pointer to batch result on success, NULL on error

Note: Token startIndex/endIndex are UTF-16 code unit offsets. The returned result must be freed with textmate_free_tokenize_lines_result().

Definition at line 886 of file c_api.cpp.

References TextMateToken::endIndex, TextMateTokenizeMultiLinesResult::lineCount, TextMateTokenizeMultiLinesResult::lineResults, TextMateTokenizeResult::ruleStack, TextMateToken::scopeDepth, TextMateToken::scopes, TextMateToken::startIndex, TextMateTokenizeResult::stoppedEarly, TextMateTokenizeResult::tokenCount, and TextMateTokenizeResult::tokens.
