char - Type that represents a character

char is a type that represents a character. It is intended to hold ASCII character codes (0 to 127). Depending on the implementation, char is either an 8-bit signed integer type or an 8-bit unsigned integer type.

#Type that represents a character
char

char sample code

This is sample code that uses char.

#include <stdio.h>
#include <stdint.h>

int main (void) {
  char ch = 'a';
  
  printf("%c\n", ch);
}

"%C" is used in the format specifier of printf function.

Output result.

a

Why is it unclear whether the char type is signed or unsigned?

For historical reasons, some implementations treated char as an 8-bit signed integer and others as an 8-bit unsigned integer. The C language specification therefore leaves the choice to the implementation: char may be either an 8-bit signed integer or an 8-bit unsigned integer.
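
As a minimal sketch, you can check which choice your compiler made by looking at CHAR_MIN from <limits.h>:

#include <stdio.h>
#include <limits.h>

int main (void) {
  // CHAR_MIN is 0 when char is unsigned, and negative when char is signed.
  if (CHAR_MIN < 0) {
    printf("char is signed on this implementation\n");
  }
  else {
    printf("char is unsigned on this implementation\n");
  }
}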

Is the width guaranteed to be 8 bits?

Strictly speaking, the C specification only guarantees that char is at least 8 bits wide (CHAR_BIT >= 8). In practice, char is 8 bits on virtually all modern platforms.
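
You can see the actual width on your implementation by printing CHAR_BIT from <limits.h>; a minimal sketch:

#include <stdio.h>
#include <limits.h>

int main (void) {
  // CHAR_BIT is the number of bits in char; 8 on virtually all modern platforms.
  printf("char is %d bits wide\n", CHAR_BIT);
}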

Use int8_t and uint8_t when dealing with 8-bit numbers

When you want an 8-bit number rather than a character, use the fixed-width types defined in <stdint.h>: int8_t for signed values and uint8_t for unsigned values.
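
As a small sketch, int8_t and uint8_t can be used together with the PRId8 / PRIu8 format macros from <inttypes.h>:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main (void) {
  int8_t  s = -100;   // signed 8-bit integer: -128 to 127
  uint8_t u = 200;    // unsigned 8-bit integer: 0 to 255
  
  printf("%" PRId8 " %" PRIu8 "\n", s, u);
}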

Should I use int8_t and uint8_t everywhere instead of char?

Ideally, yes. However, for historical reasons, the argument types of C functions that receive characters and strings are "char" and "char *".

When passing a character or string as a function argument, representing the character as "char" and the string as "char *" is concise because no typecast is required, and it is easier to understand.

Also, in the case of characters, not being able to tell whether char is signed or unsigned rarely causes problems. When it does, you can simply cast to int8_t or uint8_t, as in the sketch below.
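
For example, a byte from a multi-byte UTF-8 character may come out negative when char is signed; a minimal sketch of such a cast (the value 0xE3 is just an assumed example):

#include <stdio.h>
#include <stdint.h>

int main (void) {
  char ch = (char)0xE3;              // e.g. the first byte of a multi-byte UTF-8 character
  
  printf("%d\n", (int)(uint8_t)ch);  // always 227, regardless of whether char is signed
  printf("%d\n", (int)ch);           // -29 or 227, depending on the implementation
}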

Rather than replacing everything, respecting the widely used conventions while adopting new, useful features where they are needed is one of the better ways to reconcile past, present, and future code.

Is it possible to express Unicode in C?

First of all, distinguish between Unicode code points and the actual encodings: UTF-8, UTF-16, and UTF-32.

Unicode code point

Unicode (in its de facto UCS-4 form) expresses one character with at most 31 bits. In other words, representing any code point requires 4 bytes, that is, an int32_t. Because the specification guarantees that the highest (32nd) bit is never used, the uint32_t type is not needed.

You can also use char32_t added in C11.
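
A minimal sketch of holding one code point in char32_t, assuming a C11 compiler and the <uchar.h> header:

#include <stdio.h>
#include <uchar.h>

int main (void) {
  char32_t cp = U'\u3042';   // code point U+3042 (HIRAGANA LETTER A, 'あ')
  
  printf("U+%04lX\n", (unsigned long)cp);
}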

UTF-8

UTF-8 is an encoding format that encodes Unicode code points with a variable length of 1 to 4 bytes. UTF-8 is just a byte string, so endianness does not matter. This means a character or string can be represented as an array of char.

// UTF-8 representation in C (source code saved in UTF-8)
const char *ch = "あ";
const char *string = "あいうえお";

UTF-8 strings can be output normally with the printf function.

printf("%s\n", string);
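
Note that, because UTF-8 is handled as a plain byte array, the standard string functions count bytes rather than characters. A small sketch, assuming the source file is saved in UTF-8:

#include <stdio.h>
#include <string.h>

int main (void) {
  const char *string = "あいうえお";      // 5 characters, 15 bytes in UTF-8
  
  printf("%s\n", string);
  printf("%zu bytes\n", strlen(string));  // prints 15: strlen counts bytes, not characters
}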

UTF-32

UTF-32 encodes each Unicode code point directly as a 32-bit value, so it is essentially the same as the code point itself. See the Unicode code point description above.

With UTF-32, you need to be conscious of endianness.
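
A minimal sketch of a UTF-32 string as an array of char32_t, assuming a C11 compiler:

#include <stdio.h>
#include <uchar.h>

int main (void) {
  const char32_t *string = U"\u3042\u3044\u3046\u3048\u304A";   // "あいうえお"
  
  // Each element is exactly one code point, so counting elements counts characters.
  size_t len = 0;
  while (string[len] != 0) {
    len++;
  }
  printf("%zu characters\n", len);   // prints 5
}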

UTF-16

UTF-16 is an encoding format that expresses Unicode characters in 2 bytes. However, Unicode contains more characters than can be represented in 2 bytes; those characters are expressed in 4 bytes using a mechanism called a surrogate pair.

Since one character is normally 2 bytes, it can be represented in C by a uint16_t; a 4-byte character can be represented by two uint16_t values.

You can also use char16_t added in C11.

With UTF-16, you also need to be conscious of endianness.
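
A minimal sketch with char16_t, assuming a C11 compiler: a character inside the Basic Multilingual Plane takes one code unit, while a character above U+FFFF takes two (a surrogate pair):

#include <stdio.h>
#include <uchar.h>

int main (void) {
  const char16_t bmp[]  = u"\u3042";       // 'あ' (U+3042): 1 code unit
  const char16_t supp[] = u"\U0001F600";   // U+1F600: 2 code units (surrogate pair)
  
  // sizeof includes the terminating zero code unit, so subtract 1.
  printf("%zu code units\n", sizeof(bmp)  / sizeof(char16_t) - 1);   // prints 1
  printf("%zu code units\n", sizeof(supp) / sizeof(char16_t) - 1);   // prints 2
}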
