Posted

Hey guys,

As I'm completely refactoring one of our websites, I thought it would be nice to start from scratch and use the new multi-instance support to import data from the old site.

Most of it works nicely so far, except I'm running into a small (I guess) UTF-8 issue: the imported content in the new site contains black diamonds with question marks in them.

I tried utf8_encode/utf8_decode in every combination and order, $sanitizer->purify / ->entities / ->unentities, htmlspecialchars and various other possible solutions, with no luck.
Then I tried reading the body field directly from the old db and writing it directly to the new one, with no difference.
Both dbs are utf8_general_ci and PW is at 3.0.33.

As a side note, there are no question mark diamonds in the old (or the new) db, and the old page content displays as expected, of course.
Oh, I also tried setting charsets using PHP's header and ini_set functions.

Hopefully someone has another idea :)

Posted

Hi @can, I suggest you use an editor like Geany, which supports multiple character sets, and open your .sql dump. If it's too big, make a smaller dump that contains the problematic data.

Then in Geany, reload it as ISO-8859-15 and see what happens. If you see accents, ñ and other such characters, then you've found the right encoding for the file.

Sometimes the problem is at the SQL file level; maybe the dump was created from a non-UTF terminal or something.

 

Do you speak Spanish? I see you're in Peru. I speak Spanish. I used to live in Venezuela.

Regards.

Posted

How are you doing the import?

I remember I had a tough time with UTF-8 and German umlauts when I implemented a spreadsheet content importer.

I had to get the CSV file into a particular format (UTF-8 with BOM, i.e. Byte Order Mark) for it to work.

Below is what I wrote in the header of that spreadsheet code.
 

Quote

*  Important : To get this working, the utf8.csv file that it imports must be in UTF8-BOM (Byte Order Marker) format
*  Best to save the XLS file into TSV, then open it with Sublime Text and encode it with UTF8-BOM before
*  saving it down again for processing.
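
For anyone who would rather script that re-encoding step than do it by hand in an editor, here is a minimal PHP sketch. The helper names (hasUtf8Bom, withUtf8Bom, withoutUtf8Bom) are made up for illustration and are not part of the importer mentioned above; the only fixed fact is that the UTF-8 BOM is the byte sequence EF BB BF.

```php
<?php
// Hypothetical helpers for illustration only; not taken from the original importer.

// The UTF-8 byte order mark is the three bytes EF BB BF.
const UTF8_BOM = "\xEF\xBB\xBF";

// True if the data starts with the UTF-8 BOM.
function hasUtf8Bom(string $data): bool
{
    return strncmp($data, UTF8_BOM, 3) === 0;
}

// Prepend the BOM unless it is already there.
function withUtf8Bom(string $data): string
{
    return hasUtf8Bom($data) ? $data : UTF8_BOM . $data;
}

// Remove a leading BOM, e.g. before parsing the first CSV header cell.
function withoutUtf8Bom(string $data): string
{
    return hasUtf8Bom($data) ? substr($data, 3) : $data;
}
```

Something like file_put_contents($path, withUtf8Bom(file_get_contents($path))) would then add the marker to an existing file, assuming $path points to a file that is already UTF-8 encoded.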

 

Posted

I'm going to try your suggestions, guys. Actually, the German umlauts work. The places where I'm getting the question marks seem to be just whitespace in both the old and the new database.

The import happens using the API (I should've posted the question in the API section), so I'm bootstrapping the old instance like this:
 

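// Bootstrap the old site as a second ProcessWire instance (path, URL)
$old = new ProcessWire('../old/', 'http://old.dev');

// Copy each forum category from the old instance into a new 'post' page
$categories = $old->pages->find("template=forum-category");
foreach ($categories as $cat) {
	$p = new Page();
	$p->template = 'post';
	$p->parent = $parent; // $parent: target parent page in the new site (not shown)
	$p->title = $cat->title;
	// ..
}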

and so on...

Oh, and I also tried turning output formatting on and off, e.g. $p->of(false)

@Francesco Bortolussi I'm still learning.. little by little ;) Ah yes, and do you live in the United States now, or whereabouts? Have you read about our project? Link in my signature...

Posted

A couple more suggestions:

1) Have you tried grabbing one of the fields with the problematic "spaces" and pasting it into a text editor like Sublime Text? Are you definitely sure it's just 'whitespace'?


2) Have you tried using functions like str_replace when doing the copying, like replacing the space with a space?

Posted

@can, that's great. I'm in Italy.

And yes, Latin America lends itself well to projects like yours: lots of land, climate and people.

Where I used to live in Venezuela, your project would be a godsend, since there isn't even food.

Good luck.

PS: sorry to all the other forum members for the off-topic.

Posted
15 hours ago, FrancisChung said:

A couple more suggestions:

1) Have you tried grabbing one of the fields with the problematic "spaces" and pasting it into a text editor like Sublime Text? Are you definitely sure it's just 'whitespace'?


2) Have you tried using functions like str_replace when doing the copying, like replacing the space with a space?

1) Pasting from the db (Adminer) into Sublime, it looks the same, just like whitespace.

2) Like str_replace(' ', ' ', $body)? No difference.

Aha.. thanks to your suggestion I then tried preg_replace('/\s+/', ' ', $body) and it worked! :D

Thanks guys :)

So what exactly happened here? What are those mysterious faulty white spaces in reality?
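
One way to find out is to look at the raw bytes instead of the rendered text. The sketch below is only a diagnostic aid, and inspectNonAscii is a hypothetical helper name, not a ProcessWire API. A common suspect, though unconfirmed here, is the non-breaking space U+00A0: in valid UTF-8 it is the two bytes C2 A0, while a lone A0 byte (its Latin-1 form) is invalid UTF-8 and is typically rendered as the diamond replacement character.

```php
<?php
// Hypothetical diagnostic helper; the name is illustrative, not a ProcessWire API.
// Reports whether the text is valid UTF-8 and lists every distinct run of
// non-ASCII bytes as lowercase hex, so "invisible" characters become visible.
function inspectNonAscii(string $text): array
{
    // No /u modifier on purpose: PCRE then matches byte by byte,
    // which also works on strings that are not valid UTF-8.
    preg_match_all('/[\x80-\xFF]+/', $text, $matches);
    $runs = array_values(array_unique(array_map('bin2hex', $matches[0])));

    return [
        // With /u, preg_match() returns false on an invalid UTF-8 subject.
        'valid_utf8' => preg_match('//u', $text) === 1,
        'runs'       => $runs,
    ];
}
```

Calling print_r(inspectNonAscii($cat->body)) inside the import loop (assuming the field is called body) would show, for example, c2a0 for a proper UTF-8 non-breaking space, or a0 together with valid_utf8 being false for a stray Latin-1 byte.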

 

@Francesco Bortolussi we've met some Italians here in Peru haha :D

Posted

I cheered too soon^^ It's not properly solved yet. Instead of the "diamond" question mark icons I'm now left with regular question marks, which I can't just search and replace because the content itself contains question marks.

Any ideas?
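
A possible explanation, offered as an assumption rather than a confirmed diagnosis: without the /u modifier, preg_replace works on single bytes, and on some platforms \s also matches the byte A0. If the odd whitespace is a UTF-8 non-breaking space (bytes C2 A0), only the A0 gets replaced and an orphaned C2 byte remains, which is invalid UTF-8. The sketch below uses /u so that whole characters are matched; normalizeSpaces is a hypothetical helper name.

```php
<?php
// Hypothetical helper; assumes the culprit is the non-breaking space U+00A0.
// Collapses runs of ASCII whitespace and non-breaking spaces into one space.
// Returns null if $text is not valid UTF-8, because preg_replace() fails
// on an invalid subject when the /u modifier is set.
function normalizeSpaces(string $text): ?string
{
    // /u makes PCRE treat pattern and subject as UTF-8, so a multi-byte
    // character is matched as a whole and never split into single bytes.
    return preg_replace('/[\s\x{00A0}]+/u', ' ', $text);
}
```

A null result is itself a useful signal: it means the source string already contains invalid UTF-8, so the encoding has to be repaired before any whitespace cleanup.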

Posted
5 hours ago, BitPoet said:

Have a look here, your problem sounds quite similar to what is described in the article.

Thanks for the article, BitPoet.
Bookmarked it and skimmed through it.

Here's hoping I won't ever need it in the future.

Posted
On 21.9.2016 at 11:37 PM, BitPoet said:

Have a look here, your problem sounds quite similar to what is described in the article.

Thanks @BitPoet, I haven't had time to follow the instructions so far..

But all dbs/tables are utf8 already and were installed using utf8. I'm going to check this further and work through the instructions in your linked article.
