Jump to content

ProcessWire MySQL utf8mb4


Raymond Geerts
 Share

Recommended Posts

I'm importing data from the freebase.com database and having some troubles with 4 byte characters.

SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xF0\x9F\x92\x93\xF0\x9F...' for column 'data' at row 1

Doing some research i found out the database fields that use now utf8_general_ci collation all need to be utf8mb4_unicode_ci. But also the connection needs utf8mb4 and SET NAMES need to be utf8mb4.

mysql_query("SET CHARACTER SET utf8mb4");
mysql_query("SET NAMES utf8mb4");

1) Would ProcessWire (PDO) be able to make connection with these settings? There is a setting for this in the config but there its mentioned its depricated. Seems the PDO by default SET NAMES utf8

From the config file:

/**
 * Optional 'set names utf8' for sites that need it (this option is deprecated)
 *
 * This may be used instead of the $config->dbCharset = 'utf8' option, and exists here only for
 * backwards compatibility with existing installations. Otherwise, this option is deprecated.
 *
 * $config->dbSetNamesUTF8 = true;
 *
 */
$config->dbSetNamesUTF8 = true;

/**
 * Optional DB socket config for sites that need it (for most you should exclude this)
 *
 * $config->dbSocket = '';
 *
 */

2) I have ProcessWire installed, would it be possible to convert it from utf8_general_ci to utf8mb4_unicode_ci

I found this in the config file that is located in the wire folder:

/**
 * Database character set. utf8 recommended.
 *
 * Note that you should probably not add/change this on an existing site. i.e. don't add this to
 * an existing ProcessWire installation without asking how in the ProcessWire forums.
 *
 */
$config->dbCharset = 'utf8';

Probably need to change this to utf8mb4 then but as it mentions i should ask here, does anybody have some more information what to do to make this work?

Edit: At the moment i converted the database and fields by modifying the mysql export and re-importing it again.

Now it has:

CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci

  • Like 1
Link to comment
Share on other sites

OK it seems to work now. I have done the following (from a already installed ProcessWire):

- Export database
- Modified the lines in the exported file that have

ENGINE=MyISAM DEFAULT CHARSET=utf8;

to be

ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

- Imported file in to the database
- Created a MySQL config file (in my case /etc/my.cfg) and added the following lines

[client]
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4

[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci

- Restarted MySQL

- Modified the following ProcessWire files:

/wire/config.php
line 47: $config->dbCharset = 'utf8mb4';

/wire/core/Database.php
line 72: else if($config->dbSetNamesUTF8) $this->query("SET NAMES 'utf8mb4'");

/wire/core/Fieldtype.php
line 421: 'xtra' => 'ENGINE=MyISAM DEFAULT CHARSET=utf8mb4',

/wire/core/WireDatabasePDO.php
line 82: PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'utf8mb4'",


This seems to be working, i didnt get the MySQL PDO error anymore about the \xF0\x9F\x92\x93\xF0\x9F character

Altough i'm not so happy to have modified the files in the wire folder, since these are core files and will be overwritten as soon as i update the wire folder by replacing the files inside. Would be nice to be able to set this value somewhere in a config file.

Update: a recent update in the PW DEV branche includes the ability to set the charset in the configuration file.

The issue is discussed here:

https://github.com/ryancramerdesign/ProcessWire/issues/452

I can confirm this works fine, i have put the following line in the config file located in the site folder: $config->dbCharset = 'utf8mb4';
  • Like 2
  • Thanks 1
Link to comment
Share on other sites

  • 9 years later...

I had the same problem today. I can confirm that the solution above works. Just add $config->dbCharset = 'utf8mb4'; to the config.php file in the site folder.

My DB collate is set on utf8mb4_general_ci and it works perfectly. I can past emoji character in a TinyMCE field. Thank you @Raymond Geerts

  • Like 1
Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...