diff options
author | Nicholas <nbnoll@eml.cc> | 2021-11-20 10:53:19 -0800 |
---|---|---|
committer | Nicholas <nbnoll@eml.cc> | 2021-11-20 10:53:19 -0800 |
commit | a9bfe650038afea8b751175cac16f6027345e45f (patch) | |
tree | 9a7f9feb76a64bb3efe573036d80b7bdbf8a59a5 /src/base/utf/internal.h | |
parent | 1c8d4e69205fd875f6bec3fa3bd929c2e7f52f62 (diff) |
Chore: reorganize libutf and libfmt into base
I found the split to be arbitrary. Better to include the functionality
in the standard library. I also split the headers to allow for more
granular inclusion (but the library is still monolithic). The only
ugliness is the circular dependency introduced with libutf's generated
functions. We put explicit prereqs with the necessary object files
instead.
Diffstat (limited to 'src/base/utf/internal.h')
-rw-r--r-- | src/base/utf/internal.h | 37 |
1 files changed, 37 insertions, 0 deletions
diff --git a/src/base/utf/internal.h b/src/base/utf/internal.h new file mode 100644 index 0000000..49945dd --- /dev/null +++ b/src/base/utf/internal.h @@ -0,0 +1,37 @@ +#pragma once + +#include <u.h> +#include <base.h> + +/* + * NOTE: we use the preprocessor to ensure we have unsigned constants. + * UTF-8 code: + * 1 byte: + * 0xxxxxxx + * 2 byte: + * 110xxxxx 10xxxxxx + * 3 byte: + * 1110xxxx 10xxxxxx 10xxxxxx + * 4 byte: + * 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx + */ + +#define Tx 0x80u // 0b10000000 transfer header +#define TMask 0x3Fu // 0b00111111 transfer mask + +#define TByte1 0xC0u // 0b11000000 +#define TByte2 0xE0u // 0b11100000 +#define TByte3 0xF0u // 0b11110000 +#define TByte4 0xF8u // 0b11111000 + +#define RuneMask 0x1FFFFFu + +#define Rune1Byte 0x000080u // 1 << 8 (1 byte) +#define Rune2Byte 0x001000u // 1 << 12 (2 bytes) +#define Rune3Byte 0x020000u // 1 << 17 (3 bytes) +#define Rune4Byte 0x400000u // 1 << 22 (4 bytes) + + +/* UTF-16 nonsense */ +#define RuneSurrogateMin 0x0D8000 +#define RuneSurrogateMax 0x0D8FFF |