C language string

I will explain about C language strings . In C language, there is no type called a character string, and a character string is expressed using a array of characters. C language strings end with "\ 0". You can print a string in the format "%s" of the printf function.

#include <stdio.h>

int main (void) {
  // The string is represented by an array of characters and ends with "\ 0".
  char string [4];
  string [0] ='a';
  string [1] ='b';
  string [2] ='c';
  string [3] ='\ 0';
  
  // Output string
  printf("%s\n", string);
}

This is the output result.

abc

char type is a type that expresses characters

char type is a type that expresses characters in C language and is defined as "signed or unsigned 8-bit integer". It is thought that it is supposed to assign an ASCII character code.

Character literals

"'A'" is a character literal. Converted to the corresponding ASCII code. If it is "'a'", it will be converted to 97.

String literal

You can easily represent strings using string literals. A string literal returns the address at the beginning of the string array, so char * type You will receive it at>. It's a good idea to add the const modifier so that the string doesn't change. The following is the recommended C language expression for expressing a character string.

#include <stdio.h>

int main (void) {
  // String literal
  const char * string = "abc";
  
  // Output string
  printf("%s\n", string);
}

Dynamically create strings

To create a string dynamically, use the calloc function to allocate memory. When you are done using it, release it with the free function.

#include <stdio.h>
#include <stdlib.h>

int main (void) {
  // Allocate memory for strings of length 3.
  // Since the end of the C language string must be "\ 0", reserve 1 byte more.
  char * string = calloc (4, sizeof (char));
  
  string [0] ='a';
  string [1] ='b';
  string [2] ='c';
  string [3] ='\ 0';
  
  printf("%s\n", string);

  // Release when finished using
  free (string);
}

This is the output result.

abc

Since the C language character string ends with "\ 0", there is no problem even if you reserve extra memory. It can take up a lot of memory space if you need to add it instead of a constant string. This is an example of allocating a memory area of ​​100 bytes.

#include <stdio.h>
#include <stdlib.h>

int main (void) {
  // Allocate memory for strings of length 3.
  // Since the end of the C language string must be "\ 0", reserve 1 byte more.
  char * string = calloc (100, sizeof (char));
  
  string [0] ='a';
  string [1] ='b';
  string [2] ='c';
  string [3] ='\ 0';
  
  printf("%s\n", string);
  
  // Release when finished using
  free (string);
}

This is the output result.

abc

Reallocate memory area for strings

To reallocate the memory area of ​​the string, allocate new memory and use the memcpy function to copy the previous string. Frees the old memory area and reassigns the new string.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

int main (void) {
  // Allocate memory for strings of length 3.
  // Since the end of the C language string must be "\ 0", reserve 1 byte more.
  int32_t capacity = 4;
  char * string = calloc (capacity, sizeof (char));
  
  string [0] ='a';
  string [1] ='b';
  string [2] ='c';
  string [3] ='\ 0';
  
  printf("Before%s\n", string);

  // I want the length to be 5
  int32_t new_capacity = 6;
  char * new_string = calloc (new_capacity, sizeof (char));
  
  // Copy the previous string
  memcpy (new_string, string, capacity --1);
  
  // Add new characters
  new_string [3] ='d';
  new_string [4] ='e';
  new_string [5] ='\ 0';
  
  // Free the old string
  free (string);
  
  // Replace with a new string
  string = new_string;
  new_string = NULL;
  
  printf("After%s\n", string);

  free (string);
}

This is the output result.

Before abc
After abcde

Reallocating the memory area can also be done using the realloc function, which is fast, but realloc is , Implementation differs depending on the processing system, and I have experienced that bugs are difficult to reproduce, so if there is no performance problem, I feel that it is safer to reallocate memory using calloc.

String length

Use the strlen function to get the length of the string. You can use the strlen function by reading string.h.

#include <string.h>
size_t strlen (const char * s);

In C language, there is a promise that strings end with "\ 0". The strlen function assumes this convention and calculates the length of the string. In other words, it loops and counts the number of characters until "\ 0" is found. Conversely, if the string does not end with "\ 0", it will go to an unintended memory area and a buffer overrun will occur.

When using strlen, make sure that you are using it for strings ending in "\ 0".

This is a sample to find the length of a character string with the strlen function.

#include <string.h>
#include <stdint.h>
#include <stdio.h>

int main (void) {
  const char * string = "Hello";
  
  int32_t string_length = strlen (string);
  
  printf("%d\n", string_length);
}

This is the output result.

Five

Character search

Use the strchr function to search for characters. There is also a strrchr function that searches for characters from the back.

String search

Use the strstr function to search for a string. This is a sample to check if a string is included with the strstr function. It is OK if you confirm that the return value is not NULL.

#include <string.h>
#include <stdint.h>
#include <stdio.h>

int main (void) {
  const char * message = "I like orange\n";
  const char * match = "orange";
  
  if (strstr (message, match)! = NULL) {
    printf("Match\n");
  }
  else {
    printf("Not Match\n");
  }
}

This is the output result.

Match

Copy of string

Let's copy the string. strlen function, and allocate the memory area with the calloc function , memcpy function to copy the string.

#include <string.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main (void) {
  const char * string1 = "Hello";
  
  int32_t string1_length = strlen (string1);
  
  char * string2_tmp = calloc (string1_length + 1, sizeof (char));
  memcpy (string2_tmp, string1, string1_length);
  
  const char * string2 = string2_tmp;
  string2_tmp = NULL;
  
  printf("%s\n", string2);
  
  free ((void *) string2);
}

This is the output result.

Hello

Other ways to copy strings

To copy a string, strcpy function or strncpy function at //www9.plala.or.jp/sgwr-t/lib/strncpy.html".

Concatenation of strings

How do you concatenate strings in C? Concatenate two strings and return the result. Well, I'm worried. "Well, do you do it yourself?" Yes. that's right. Calculate the length of the two strings, calculate the length of the string after concatenation, and copy the original strings in order.

#include <string.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main (void) {
  const char * string1 = "ab";
  const char * string2 = "cde";
  
  int32_t string1_length = strlen (string1);
  int32_t string2_length = strlen (string2);
  
  int32_t string3_length = string1_length + string2_length;
  char * string3_tmp = calloc (string3_length + 1, sizeof (char));
  
  memcpy (string3_tmp, string1, string1_length);
  memcpy (string3_tmp + string1_length, string2, string2_length);
  
  const char * string3 = string3_tmp;
  string3_tmp = NULL;
  
  printf("%s\n", string3);
  
  free ((void *) string3);
}

This is the output result.

abcde

"Do you do this every time?" "Yes. If necessary." "Why don't you do this kind of processing many times?" "It comes out many times (laughs)."

Other ways to concatenate strings

If you want to add a string to the end of yourself, strcat function or You can also use a href = "http://www9.plala.or.jp/sgwr-t/lib/strncat.html"> strncat function.

String replacement

How do you replace strings in C? Well, I'm worried. "Well, do you do it yourself?" Yes. that's right.

Let's write a process to replace "Foo::Bar::Baz" with "Foo / Bar / Baz".

#include <string.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main (void) {
  const char * module_name = "Foo::Bar::Baz";
  int32_t module_name_length = strlen (module_name);
  
  // It is OK to secure a large amount
  char * module_path_tmp = calloc (module_name_length, module_name_length);
  
  int32_t module_path_tmp_index = 0;
  for (int32_t i = 0; i <module_name_length; i ++) {
    
    if (module_name [i] ==':' && module_name [i + 1] ==':') {
      module_path_tmp [module_path_tmp_index] ='/';
      i ++;
      module_path_tmp_index ++;
    }
    else {
      module_path_tmp [module_path_tmp_index] = module_name [i];
      module_path_tmp_index ++;
    }
  }
  
  const char * module_path = module_path_tmp;
  module_path_tmp = NULL;

  printf("%s\n", module_path);
  
  free ((void *) module_path);
}

This is the output result.

Foo / Bar / Baz

String function

Functions for strings in C can be found in string.h.

Can Unicode be handled in C?

I will introduce utf8proc as a library that can handle Unicode in C language. UTF-8, UTF-16, UTF-32 conversion is possible.

I won't go into details about how to handle Unicode, but I'll briefly describe what I'm thinking about.

It is difficult to handle Unicode only with C language standard functions and C language specifications.

About UTF-8

The character string output can be output as it is if the source code character string is UTF-8 and the environment is UNIX / Linux / Mac.

UTF-8 is upward compatible with ASCII code, so you can use the C language library created in the past as it is.

When using C language for the purpose of the Web, it seems better to handle character strings in UTF-8. UTF-8 is just a byte string and you don't have to be endian.

A simple byte string means that in C language, a UTF-8 string can be expressed as follows. Suppose you saved the source code in UTF-8.

// UTF-8 representation in C
const char * string = "aiueo";

About multibyte characters, UTF-16, UTF-32

The C language has a mechanism for handling multibyte characters. There are two types of multibyte characters, double-byte characters and 4-byte characters.

Double-byte characters are, for example, a code that expresses one character in two bytes such as Shift_JIS, and a Unicode encoding method that expresses one character in principle two bytes such as UTF-16LE and UTF-16BE. It is expected to be used.

4-byte characters are supposed to store UTF-32, a Unicode coding system, or Unicode code points. UTF-32 has a one-to-one correspondence with Unicode code points. In other words, UTF-32 is a Unicode code point.

Double-byte characters have to be used on Windows. This is because the internal character code of Windows is UTF-16LE, and the Windows API compiled in the Unicode environment receives the UTF-16LE string.

Unlike UTF-8, double-byte characters and 4-byte characters cannot be output easily. It's like "I want to see the contents, but I can't see it."

UTF-8 is good. But the reality is that it's not the only thing, it needs conversion. This is the reality ~. Like.

Multibyte characters aren't used to represent UTF-8, so don't be afraid to keep them in your head.

Associated Information