ClassiCube/doc/strings.md
2023-11-30 07:20:30 +11:00

9.7 KiB

Introduction

ClassiCube uses a custom string type rather than the standard C char* string in most places

ClassiCube strings (cc_string) are a struct with the following fields:

  • buffer -> Pointer to 8 bit characters (unsigned code page 437 indices)
  • length -> Number of characters currently used
  • capacity -> Maximum number of characters (i.e buffer size)

Note: This means STRINGS MAY NOT BE NULL TERMINATED (and are not in most cases)

You should also read the Strings section in the style guide

Memory management

Some general guidelines to keep in mind when it comes to cc_string strings:

  • String buffers can be allocated on either the stack or heap
    (i.e. make sure you don't return strings that are using stack allocated buffers)
  • Strings are fixed capacity (strings do not grow when length reaches capcity)
    (i.e. make sure you allocate a large enough buffer upfront)
  • Strings are not garbage collected or reference counted
    (i.e. you are responsible for managing the lifetime of strings)

Usage examples

Initialisating a string from readonly text:

cc_string str = String_FromConst("ABC");

Initialising a string from temporary memory on the stack:

// str will be able to store at most 200 characters in it
char strBuffer[200];
cc_string str = String_FromArray(strBuffer);

Initialising a string from persistent memory on the heap:

// str will be able to store at most 200 characters in it
char* str = Mem_Alloc(1, 200, "String buffer");
cc_string str = String_Init(str, 0, 200);

Converting to/from other string representations

C String conversion

C string -> cc_string

Creating a cc_string string from a C string is straightforward:

From a constant C string

void Example(void) {
    cc_string str = String_FromConst("test");
}

From a C string

void Example(const char* c_str) {
    cc_string str = String_FromReadonly(c_str);
}

Note: String_FromReadonly can also be used with constant C strings, it's just a bit slower

From a C fixed size string

struct Something { int value; char name[50]; };

void Example(struct Something* some) {
    cc_string str = String_FromRawArray(some->name);
}

cc_string -> C string

The buffer field should not be treated as a C string, because cc_string strings MAY NOT BE NULL TERMINATED

The general way to achieve this is to

  1. Initialise capacity with 1 less than actual buffer size (e.g. use String_InitArray_NT instead of String_InitArray)
  2. Perform various operations on the cc_string string
  3. Add null terminator to end (i.e. buffer[length] = '\0';
  4. Use buffer as a C string now

For example:

void PrintInt(int value) {
    cc_string str; char strBuffer[128];
    String_InitArray_NT(str, strBuffer);
    String_AppendInt(&str, value);
    str.buffer[str.length] = '\0';
    puts(str.buffer);
}

OS String conversion

cc_string strings cannot be directly used as arguments for operating system functions and must be converted first.

The following functions are provided to convert cc_string strings into operating system specific encoded strings:

cc_string -> Windows string

Platform_EncodeString converts a cc_string into a null terminated WCHAR and CHAR string

Example

void SetWorkingDir(cc_string* title) {
    cc_winstring str;
    Platform_EncodeUtf16(&str, title);
    SetCurrentDirectoryW(str.uni);
	
    // it's recommended that you DON'T use the ansi format whenever possible
    //SetCurrentDirectoryA(str.ansi); 
}

cc_string -> UTF8 string

String_EncodeUtf8 converts a cc_string into a null terminated UTF8-encoded char* string

Example

void SetWorkingDir(cc_string* title) {
    char buffer[NATIVE_STR_LEN];
    String_EncodeUtf8(buffer, title);
    chdir(buffer);
}

API

I'm lazy so I will just link to String.h

If you'd rather I provided a more detailed reference here, please let me know.

TODO

Comparisons to other string implementations

C comparison

A rough mapping of C string API to ClassiCube's string API:

atof    -> Convert_ParseFloat
strtof  -> Convert_ParseFloat
atoi    -> Convert_ParseInt
strtoi  -> Convert_ParseInt

strcat  -> String_AppendConst/String_AppendString
strcpy  -> String_Copy
strtok  -> String_UNSAFE_Split

strlen  -> str.length
strcmp  -> String_Equals/String_Compare
strchr  -> String_IndexOf
strrchr -> String_LastIndexOf
strstr  -> String_IndexOfConst

sprintf -> String_Format1/2/3/4
    %d  -> %i
  %04d -> %p4
    %i  -> %i
    %c  -> %r
  %.4f  -> %f4
    %s  -> %s (cc_string)
    %s  -> %c (char*)
    %x  -> %h

C# comparison

A rough mapping of C# string API to ClassiCube's string API:

byte.Parse   -> Convert_ParseUInt8
ushort.Parse -> Convert_ParseUInt16
float.Parse  -> Convert_ParseFloat
int.Parse    -> Convert_ParseInt
ulong.Parse  -> Convert_ParseUInt64
bool.Parse   -> Convert_ParseBool

a += "X";     -> String_AppendString
b = a;        -> String_Copy
string.Insert -> String_InsertAt
string.Remove -> String_DeleteAt

string.Substring -> String_UNSAFE_Substring/String_UNSAFE_SubstringAt
string.Split     -> String_UNSAFE_Split/String_UNSAFE_SplitBy
string.TrimStart -> String_UNSAFE_TrimStart
string.TrimEnd   -> String_UNSAFE_TrimEnd

a.Length           -> str.length
a == b             -> String_Equals
string.Equals      -> String_CaslessEquals (StringComparison.OrdinalIgnoreCase)
string.IndexOf     -> String_IndexOf/String_IndexOfConst
string.LastIndexOf -> String_LastIndexOf
string.StartsWith  -> String_CaselessStarts (StringComparison.OrdinalIgnoreCase)
string.EndsWith    -> String_CaselessEnds   (StringComparison.OrdinalIgnoreCase)
string.CompareTo   -> String_Compare

string.Format -> String_Format1/2/3/4

Note: I modelled cc_string after C# strings, hence the similar function names

C++ comparison

A rough mapping of C++ std::string API to ClassiCube's string API:

std::stof  -> Convert_ParseFloat
std::stoi  -> Convert_ParseInt
std::stoul -> Convert_ParseUInt64

string::append -> String_AppendString/String_AppendConst
b = a;         -> String_Copy
string::insert -> String_InsertAt
string::erase  -> String_DeleteAt

string::substr -> String_UNSAFE_Substring/String_UNSAFE_SubstringAt

string::length  -> str.length
a == b          -> String_Equals
string::find    -> String_IndexOf/String_IndexOfConst
string::rfind   -> String_LastIndexOf
string::compare -> String_Compare

std::sprintf -> String_Format1/2/3/4

Detailed lifetime examples

Managing the lifetime of strings is important, as not properly managing them can cause issues.

For example, consider the following function:

const cc_string* GetString(void);

void PrintSomething(void) {
	cc_string* str = GetString();
	// .. other code ..
	Chat_Add(str);
}

Without knowing the lifetime of the string returned from GetString, using it might either:

  • Work just fine
  • Sometimes work fine
  • Cause a subtle issue
  • Cause a major problem ptodo rearrange

Constant string return example

const cc_string* GetString(void) {
	static cc_string str = String_FromConst("ABC");
	return &str;
}

This will work fine - as long as the caller does not modify the returned string at all

Stack allocated string return example

const cc_string* GetString(void) {
	char strBuffer[1024];
	cc_string str = String_FromArray(strBuffer);
	
	String_AppendConst(&str, "ABC");
	return &str;
}

This will almost certainly cause problems - after GetString returns, the contents of both str and strBuffer may be changed to arbitary values (as once GetString returns, their contents are then eligible to be overwritten by other stack allocated variables)

As a general rule, you should NEVER return a string allocated on the stack

Dynamically allocated string return example

const cc_string* GetString(void) {
	char* buffer   = Mem_Alloc(1024, 1, "string buffer");
	cc_string* str = Mem_Alloc(1, sizeof(cc_string), "string"); 
	
	*str = String_Init(buffer, 0, 1024);
	String_AppendConst(str, "ABC");
	return str;
}

This will work fine - however, now you also need to remember to Mem_Free both the string and its buffer to avoid a memory leak

As a general rule, you should avoid returning a dynamically allocated string

UNSAFE mutable string return example

char global_buffer[1024];
cc_string global_str = String_FromArray(global_buffer);

const cc_string* GetString(void) {
	return &global_str;
}

Depending on what functions are called in-between GetString and Chat_Add, global_str or its contents may be modified - which can result in an unexpected value being displayed in chat

This potential issue is not just theoretical - it has actually resulted in several real bugs in ClassiCube itself

As a general rule, for unsafe functions returning a string that may be mutated behind your back, you should try to maintain a reference to the string for as short of time as possible

Reducing string lifetime issues

In general, for functions that produce strings, you should try to leave the responsibility of managing the string's lifetime up to the calling function to avoid these pitfalls

The example from before could instead be rewritten like so:

void GetString(cc_string* str);

void PrintSomething(void) {
	char strBuffer[256];
	cc_string str = String_InitArray(strBuffer);
	GetString(&str);
	
	// .. other code ..
	Chat_Add(&str);
}