Emoji has become an essential part of communication in our digital lives. Hence, as developers, our applications should provide first-class support to them. In this blog post, we will explore how to store Emoji's in MySQL database.
In one of our internal Ruby on Rails web application using MySQL database, we were facing an issue whenever a user tried to insert an emoji in the text of an article. The "save" functionality in UI ran infinitely (ajax) and logs were reporting following errors:
ActiveRecord::StatementInvalid (Mysql2::Error: Incorrect string value: '\xF0\x9F\x98\x8A ...' for column 'content' at row 1: INSERT INTO posts (content, subject, user_id, created_at, updated_at, url_token) VALUES
Issue was, our MySQL database was using encoding UTF-8
which supports only 3 bytes per character
. But to save the emoji we require 4 bytes per character
and because of that UTF-8
character set is not enough to store the emoji. The workaround to support 4 bytes character set is utf8mb4
which was introduced by MySql in 2010. To support emoji's, following is a simple step by step approach that helps in modifying the encoding from utf8
to utf8mb4
.
- For safety purpose, take backup of your existing database.
- Add the following configuration setting in section
[mysqld]
of filemy.cnf
[mysqld]
default-storage-engine=InnoDB
innodb_file_format=barracuda
innodb_file_per_table=1
innodb_large_prefix=1
init_connect='SET collation_connection = utf8mb4_unicode_ci'
init_connect='SET NAMES utf8mb4'
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
Add following mysql configuration in section [client]
of my.cnf
[client]
default-character-set = utf8mb4
Add following mysql configuration in section [mysql]
of my.cnf
[mysql]
default-character-set = utf8mb4
Now restart the mysql
server.
3. Change the database.yml
settings to have following multibyte characters support:
encoding: utf8mb4
collation: utf8mb4_unicode_ci
utf8mb4
allows the string to support 4 bytes per character.utf8mb4_unicode_ci
uses the Unicode Collation Algorithm as defined in the Unicode standards.
4. Then add a migration to alter the database and table. Add following code in the migration
def db
ActiveRecord::Base.connection
end
def alter_column_to_utf8mb(table, column)
default = column.default.blank? ? '' : "DEFAULT '#{column.default}'"
null = column.null ? '' : 'NOT NULL'
execute "ALTER TABLE `#{table}` MODIFY `#{column.name}` #{column.sql_type.upcase}
CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci #{default} #{null};"
end
def up
execute "ALTER DATABASE `#{db.current_database}` CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_ci;"
db.tables.each do |table|
execute "ALTER TABLE `#{table}` ROW_FORMAT=DYNAMIC CHARACTER SET
utf8mb4 COLLATE utf8mb4_unicode_ci;"
db.columns(table).each do |column|
case column.sql_type
when /([a-z]*)text/i
alter_column_to_utf8mb(table, column)
when /varchar\(([0-9]+)\)/i
alter_column_to_utf8mb(table, column)
end
end
end
end
First ALTER
command changes the encoding and the COLLATE
for the database for each table, we need to change the CHARACTER SET
and COLLATE
which we are doing in second ALTER
command. After that, we need to ALTER
the text and varchar columns to set CHARACTER SET
and COLLATE
with default and null attributes of the column.
5. Add initializer file config/initializers/ar_innodb_row_format.rb
require 'active_record/connection_adapters/abstract_mysql_adapter'
module ActiveRecord
module ConnectionAdapters
class AbstractMysqlAdapter
NATIVE_DATABASE_TYPES[:string] = { name: 'varchar', limit: 191 }
end
end
end
This initializer ensures to set column limit to 191
for the column type varchar at the time of executing ALTER
or CREATE
command on the table. Why length 191
? MySQL indexes are limited to 768 bytes
because the InnoDB
storage engine has a maximum index length of 767 bytes
. This means that if we increase VARCHAR(255)
from 3 bytes per character
to 4 bytes per character
, the index key is smaller in terms of characters. That's why we need to change the length from 255
to 191
.
😊 Hurray! we are ready to use emoji. Hope this will help!
Reference link: https://mathiasbynens.be/notes/mysql-utf8mb4#utf8-to-utf8mb4