UTF-8 string

This is a simple implementation of UTF-8 strings in C .

Implementation

UTF8string is based on std::string provided by the standard C library but has been implemented to support UTF-8 encoded strings.

Some functions have been adapted for utf8 strings :

utf8_length : get number of characters in a string (number of codepoints).
utf8_size : get get the memory size of the string (in byte).
utf8_find : find a utf8 substring in the current string.
utf8_substr : get a utf8 substring of the current string.
utf8_at : get the codepoint at a specified position.
utf8_pop : remove the last codepoint of the string.

Usage

You just need to include all of the .hpp and .cpp files from src/ in your project. For each file that uses UTF8string, include this piece of code :

#include "utf8_string.hpp"

Code example

UTF8string u8("がんばつて Gumichan");
UTF8string sub = u8.utf8_substr(0,5);
size_t pos = u8.utf8_find(UTF8string("chan"));
size_t sz  = u8.utf8_size();
size_t l   = u8.utf8_length();

std::cout << "u8 string: " << u8 << "\n";
std::cout << "utf8 substring from 0 to 5: " << sub << "\n";
std::cout << "utf8 codepoint at 2: " << u8.utf8_at(2) << "\n";
std::cout << "utf8 string \"chan\" at " << pos << "\n";
std::cout << "u8 string - memory size: " << sz << "; length: " << l << "\n\n";

for (auto s: sub)    // or for (const std::string& s: u8)
{
    std::cout << "-> " << s << "\n";
}

Output :

utf8 string: がんばつて Gumichan
utf8 substring from 0 to 5: がんばつて
utf8 codepoint at 2: ば
utf8 string "chan" at 10
u8 string - memory size: 24; length: 14

-> が
-> ん
-> ば
-> つ
-> て

Project that uses UTF8string

LunatiX

License

This library is under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 247 Commits
src		src
test		test
.gitignore		.gitignore
.gitlab-ci.yml		.gitlab-ci.yml
.travis.yml		.travis.yml
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

UTF-8 string

Implementation

Usage

Code example

Project that uses UTF8string

License

About

Releases

Packages

Languages

License

Gumichan01/utf8_string

Folders and files

Latest commit

History

Repository files navigation

UTF-8 string

Implementation

Usage

Code example

Project that uses UTF8string

License

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages