ClassiCube/doc/strings.md
2023-11-30 07:20:30 +11:00

## Introduction
ClassiCube uses a custom string type rather than the standard C `char*` string in most places.

ClassiCube strings (`cc_string`) are a struct with the following fields:
- `buffer` -> Pointer to 8 bit characters (unsigned [code page 437 indices](https://en.wikipedia.org/wiki/Code_page_437#Character_set))
- `length` -> Number of characters currently used
- `capacity` -> Maximum number of characters (i.e. buffer size)

Note: This means **STRINGS MAY NOT BE NULL TERMINATED** (and are not in most cases)

You should also read the **Strings** section in the [style guide](/doc/style.md)
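
As a rough illustration of the layout described above, here is a minimal stand-alone sketch. `sketch_string` and `Sketch_ToCString` are hypothetical names invented for this example (the real definition lives in `String.h` and may use different field types). The point is only that the characters are read by `length`, never by searching for a null terminator.

```C
#include <stdio.h>

/* Hypothetical stand-in for cc_string, for illustration only */
typedef struct sketch_string_ {
    char* buffer;  /* characters, NOT necessarily null terminated */
    int length;    /* number of characters currently used */
    int capacity;  /* maximum number of characters (buffer size) */
} sketch_string;

/* Copies the used characters into dst (which must hold length + 1 bytes),
   adds a null terminator, and returns the number of characters copied */
static int Sketch_ToCString(const sketch_string* str, char* dst) {
    /* "%.*s" reads at most 'length' characters, so no terminator is needed in buffer */
    return sprintf(dst, "%.*s", str->length, str->buffer);
}
```

Note that only the first `length` characters are meaningful; anything between `length` and `capacity` is unused space.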
## Memory management
Some general guidelines to keep in mind when it comes to `cc_string` strings:
- String buffers can be allocated on either the stack or heap<br>
(i.e. make sure you don't return strings that are using stack allocated buffers)
- Strings are fixed capacity (strings do not grow when length reaches capacity)<br>
(i.e. make sure you allocate a large enough buffer upfront)
- Strings are not garbage collected or reference counted<br>
(i.e. you are responsible for managing the lifetime of strings)
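
To make the fixed capacity rule concrete, the sketch below shows one way an append can behave once the buffer is full. `sketch_string` and `Sketch_AppendConst` are hypothetical stand-ins written for this example, not ClassiCube's implementation. Here characters that do not fit are simply dropped, which is an assumption; check `String.h` for the actual behaviour.

```C
/* Hypothetical stand-in for cc_string, for illustration only */
typedef struct sketch_string_ { char* buffer; int length; int capacity; } sketch_string;

/* Appends characters from a C string until the buffer is full;
   returns how many characters were actually appended */
static int Sketch_AppendConst(sketch_string* str, const char* src) {
    int added = 0;
    while (*src && str->length < str->capacity) {
        str->buffer[str->length++] = *src++;
        added++;
    }
    return added;
}
```

With a 4 character buffer, appending `"ABCDEF"` stores only `ABCD`, which is why the buffer needs to be large enough upfront.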
## Usage examples
Initialising a string from readonly text:
```C
cc_string str = String_FromConst("ABC");
```
Initialising a string from temporary memory on the stack:
```C
// str will be able to store at most 200 characters in it
char strBuffer[200];
cc_string str = String_FromArray(strBuffer);
```
Initialising a string from persistent memory on the heap:
```C
// str will be able to store at most 200 characters in it
char* strBuffer = Mem_Alloc(200, 1, "String buffer");
cc_string str = String_Init(strBuffer, 0, 200);
```
# Converting to/from other string representations
## C String conversion
### C string -> cc_string
Creating a `cc_string` string from a C string is straightforward:
#### From a constant C string
```C
void Example(void) {
    cc_string str = String_FromConst("test");
}
```
#### From a C string
```C
void Example(const char* c_str) {
    cc_string str = String_FromReadonly(c_str);
}
```
Note: `String_FromReadonly` can also be used with constant C strings; it's just a bit slower, as the length is calculated at runtime
#### From a C fixed size string
```C
struct Something { int value; char name[50]; };
void Example(struct Something* some) {
    cc_string str = String_FromRawArray(some->name);
}
```
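
A fixed size field like `name` may or may not contain a null terminator, so its length has to be found with a bounded scan. The sketch below shows the idea with hypothetical names (`sketch_string`, `Sketch_FromRaw`). It illustrates the general technique under that assumption and is not a copy of what `String_FromRawArray` does internally.

```C
/* Hypothetical stand-in for cc_string, for illustration only */
typedef struct sketch_string_ { char* buffer; int length; int capacity; } sketch_string;

/* Builds a string over a fixed size array: length is the number of
   characters before the first '\0', or the whole array if there is none */
static sketch_string Sketch_FromRaw(char* buffer, int capacity) {
    sketch_string str;
    int len = 0;
    while (len < capacity && buffer[len] != '\0') len++;

    str.buffer   = buffer;
    str.length   = len;
    str.capacity = capacity;
    return str;
}
```

The scan never reads past `capacity`, so a field that is completely full (and so has no terminator) is still handled safely.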
### cc_string -> C string
The `buffer` field **should not** be treated as a C string, because `cc_string` strings **MAY NOT BE NULL TERMINATED**

The general way to obtain a C string is to:
1. Initialise `capacity` with 1 less than actual buffer size (e.g. use `String_InitArray_NT` instead of `String_InitArray`)
2. Perform various operations on the `cc_string` string
3. Add null terminator to end (i.e. `buffer[length] = '\0';`)
4. Use `buffer` as a C string now

For example:
```C
void PrintInt(int value) {
    cc_string str; char strBuffer[128];
    String_InitArray_NT(str, strBuffer);
    String_AppendInt(&str, value);
    str.buffer[str.length] = '\0';
    puts(str.buffer);
}
```
## OS String conversion
`cc_string` strings cannot be directly used as arguments for operating system functions and must be converted first.
The following functions are provided to convert `cc_string` strings into operating system specific encoded strings:
### cc_string -> Windows string
`Platform_EncodeString` converts a `cc_string` into a null terminated `WCHAR` and `CHAR` string
#### Example
```C
void SetWorkingDir(cc_string* title) {
    cc_winstring str;
    Platform_EncodeString(&str, title);
    SetCurrentDirectoryW(str.uni);
    // it's recommended that you avoid the ansi format whenever possible
    //SetCurrentDirectoryA(str.ansi);
}
```
### cc_string -> UTF8 string
`String_EncodeUtf8` converts a `cc_string` into a null terminated UTF8-encoded `char*` string
#### Example
```C
void SetWorkingDir(cc_string* title) {
    char buffer[NATIVE_STR_LEN];
    String_EncodeUtf8(buffer, title);
    chdir(buffer);
}
```
# API
I'm lazy so I will just link to [String.h](/src/String.h)
If you'd rather I provided a more detailed reference here, please let me know.
TODO
# Comparisons to other string implementations
## C comparison
A rough mapping of C string API to ClassiCube's string API:
```
atof -> Convert_ParseFloat
strtof -> Convert_ParseFloat
atoi -> Convert_ParseInt
strtol -> Convert_ParseInt
strcat -> String_AppendConst/String_AppendString
strcpy -> String_Copy
strtok -> String_UNSAFE_Split
strlen -> str.length
strcmp -> String_Equals/String_Compare
strchr -> String_IndexOf
strrchr -> String_LastIndexOf
strstr -> String_IndexOfConst
sprintf -> String_Format1/2/3/4
    %d -> %i
  %04d -> %p4
    %i -> %i
    %c -> %r
  %.4f -> %f4
    %s -> %s (cc_string)
    %s -> %c (char*)
    %x -> %h
```
## C# comparison
A rough mapping of C# string API to ClassiCube's string API:
```
byte.Parse -> Convert_ParseUInt8
ushort.Parse -> Convert_ParseUInt16
float.Parse -> Convert_ParseFloat
int.Parse -> Convert_ParseInt
ulong.Parse -> Convert_ParseUInt64
bool.Parse -> Convert_ParseBool
a += "X"; -> String_AppendString
b = a; -> String_Copy
string.Insert -> String_InsertAt
string.Remove -> String_DeleteAt
string.Substring -> String_UNSAFE_Substring/String_UNSAFE_SubstringAt
string.Split -> String_UNSAFE_Split/String_UNSAFE_SplitBy
string.TrimStart -> String_UNSAFE_TrimStart
string.TrimEnd -> String_UNSAFE_TrimEnd
a.Length -> str.length
a == b -> String_Equals
string.Equals -> String_CaselessEquals (StringComparison.OrdinalIgnoreCase)
string.IndexOf -> String_IndexOf/String_IndexOfConst
string.LastIndexOf -> String_LastIndexOf
string.StartsWith -> String_CaselessStarts (StringComparison.OrdinalIgnoreCase)
string.EndsWith -> String_CaselessEnds (StringComparison.OrdinalIgnoreCase)
string.CompareTo -> String_Compare
string.Format -> String_Format1/2/3/4
```
*Note: I modelled cc_string after C# strings, hence the similar function names*
## C++ comparison
A rough mapping of C++ std::string API to ClassiCube's string API:
```
std::stof -> Convert_ParseFloat
std::stoi -> Convert_ParseInt
std::stoul -> Convert_ParseUInt64
string::append -> String_AppendString/String_AppendConst
b = a; -> String_Copy
string::insert -> String_InsertAt
string::erase -> String_DeleteAt
string::substr -> String_UNSAFE_Substring/String_UNSAFE_SubstringAt
string::length -> str.length
a == b -> String_Equals
string::find -> String_IndexOf/String_IndexOfConst
string::rfind -> String_LastIndexOf
string::compare -> String_Compare
std::sprintf -> String_Format1/2/3/4
```
# Detailed lifetime examples
Managing the lifetime of strings is important, as a string that is used after its buffer has gone away (or been changed) can cause hard to find bugs.
For example, consider the following function:
```C
const cc_string* GetString(void);
void PrintSomething(void) {
    const cc_string* str = GetString();
    // .. other code ..
    Chat_Add(str);
}
```
Without knowing the lifetime of the string returned from `GetString`, using it might either:
* Work just fine
* Sometimes work fine
* Cause a subtle issue
* Cause a major problem
### Constant string return example
```C
const cc_string* GetString(void) {
    static cc_string str = String_FromConst("ABC");
    return &str;
}
```
This will work fine - as long as the caller does not modify the returned string at all
### Stack allocated string return example
```C
const cc_string* GetString(void) {
    char strBuffer[1024];
    cc_string str = String_FromArray(strBuffer);
    String_AppendConst(&str, "ABC");
    return &str;
}
```
This will **almost certainly cause problems** - once `GetString` returns, both `str` and `strBuffer` are eligible to be overwritten by other stack allocated variables, so their contents may change to arbitrary values.
As a general rule, you should **NEVER** return a string allocated on the stack
### Dynamically allocated string return example
```C
const cc_string* GetString(void) {
    char* buffer = Mem_Alloc(1024, 1, "string buffer");
    cc_string* str = Mem_Alloc(1, sizeof(cc_string), "string");
    *str = String_Init(buffer, 0, 1024);
    String_AppendConst(str, "ABC");
    return str;
}
```
This will work fine - however, now you also need to remember to `Mem_Free` both the string and its buffer to avoid a memory leak
As a general rule, you should avoid returning a dynamically allocated string
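
For illustration, the cleanup side of such a function might look like the sketch below. It uses plain `malloc`/`free` and hypothetical names (`sketch_string`, `Sketch_NewString`, `Sketch_FreeString`) so that it is self-contained; in ClassiCube code the corresponding calls would be `Mem_Alloc` and `Mem_Free`.

```C
#include <stdlib.h>

/* Hypothetical stand-in for cc_string, for illustration only */
typedef struct sketch_string_ { char* buffer; int length; int capacity; } sketch_string;

/* Allocates both the string struct and its buffer on the heap;
   returns NULL if either allocation fails */
static sketch_string* Sketch_NewString(int capacity) {
    sketch_string* str = malloc(sizeof(sketch_string));
    if (!str) return NULL;

    str->buffer = malloc((size_t)capacity);
    if (!str->buffer) { free(str); return NULL; }

    str->length   = 0;
    str->capacity = capacity;
    return str;
}

/* Two allocations were made, so two frees are needed (buffer first,
   because the struct holds the only pointer to it) */
static void Sketch_FreeString(sketch_string* str) {
    if (!str) return;
    free(str->buffer);
    free(str);
}
```

Every caller has to remember to call the matching free function exactly once, which is the bookkeeping burden this section warns about.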
### UNSAFE mutable string return example
```C
char global_buffer[1024];
cc_string global_str = String_FromArray(global_buffer);
const cc_string* GetString(void) {
    return &global_str;
}
```
Depending on what functions are called in-between `GetString` and `Chat_Add`, `global_str` or its contents may be modified - which can result in an unexpected value being displayed in chat
This potential issue is not just theoretical - it has actually resulted in several real bugs in ClassiCube itself
As a general rule, for unsafe functions returning a string that may be mutated behind your back, you should try to keep a reference to the string for as short a time as possible
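
One way to keep such a reference short lived is to copy the contents into a buffer you own straight away. The sketch below is self-contained and uses hypothetical names (`sketch_string`, `Sketch_Copy`); in ClassiCube code `String_Copy` plays this role.

```C
/* Hypothetical stand-in for cc_string, for illustration only */
typedef struct sketch_string_ { char* buffer; int length; int capacity; } sketch_string;

/* Replaces the contents of dst with those of src,
   truncating if src is longer than dst's capacity */
static void Sketch_Copy(sketch_string* dst, const sketch_string* src) {
    int i;
    int len = src->length < dst->capacity ? src->length : dst->capacity;

    for (i = 0; i < len; i++) dst->buffer[i] = src->buffer[i];
    dst->length = len;
}
```

After the copy, later changes to the shared string no longer affect the local one, at the cost of needing a large enough local buffer.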
### Reducing string lifetime issues
In general, for functions that produce strings, you should try to leave the responsibility of managing the string's lifetime up to the calling function to avoid these pitfalls
The example from before could instead be rewritten like so:
```C
void GetString(cc_string* str);
void PrintSomething(void) {
    char strBuffer[256];
    cc_string str = String_FromArray(strBuffer);
    GetString(&str);
    // .. other code ..
    Chat_Add(&str);
}
```
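
A matching implementation of `GetString` would then only write into the string it is given. The following self-contained sketch shows the shape of such a function; `sketch_string` and `Sketch_GetString` are hypothetical stand-ins, and the `"ABC"` contents are simply carried over from the earlier examples.

```C
/* Hypothetical stand-in for cc_string, for illustration only */
typedef struct sketch_string_ { char* buffer; int length; int capacity; } sketch_string;

/* Appends "ABC" to the caller's string (as much of it as fits).
   The caller owns the buffer, so nothing created here outlives the call */
static void Sketch_GetString(sketch_string* str) {
    static const char text[] = "ABC";
    int i;

    for (i = 0; text[i] != '\0' && str->length < str->capacity; i++) {
        str->buffer[str->length++] = text[i];
    }
}
```

Because the caller decides where the buffer lives, it also decides how long the string remains valid.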