Importing csv with line breaks


Hello all. Thanks to the wonderful help from this forum I've gotten much further in my project in a lot less time than expected. I'm now trying to figure out the best method of importing the records that will make up the bulk of my site.

I'm using the ImportPagesCSV module and for the most part it is doing exactly what I need. I'm just running into a couple of issues. 

The major one is a lack of line breaks in the body field. I can see the line breaks if I open the .csv in Excel or Notepad, but they don't seem to carry over when the page is imported. Is there a particular way to format the .csv file that will get the line breaks to transfer? I have searched around for solutions but have not found anything that seems to apply, as the solutions seem to vary by application.

The other issue is that when I output the body of the page in my template, the content of the body has no paragraph tags, which screws up the styles. If I go into the page record and edit the body field manually (merely clicking the "Source" button and then saving, for example), then paragraph tags will appear, but since I expect to import about 600 records, I'm hoping not to have to do this for all of them.

Any insight on how to fix these issues is appreciated.



In CSV linebreaks are used to separate individual rows, so a linebreak inside a cell may cause problems. Depending on your CSV parser it could result in an error altogether, it may silently discard the newline, or it may handle it fine. As far as I know fgetcsv can handle newlines as long as the cell is quoted properly. Can you post an example of your CSV? Make sure the fields are properly quoted.
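
To illustrate the point about quoting, here is a minimal sketch of how fgetcsv treats a line break inside a quoted cell. The sample data and variable names are made up for the example, and it reads from an in-memory stream rather than a real file, so treat it as a way to test your own CSV formatting rather than as what ImportPagesCSV does internally.

```php
<?php
// Hypothetical sample data: the "body" cell contains a line break inside double quotes.
$csv = "title,body\n" .
       "\"First page\",\"Paragraph one.\nParagraph two.\"\n";

// Use an in-memory stream so the sketch needs no file on disk.
$handle = fopen('php://memory', 'w+');
fwrite($handle, $csv);
rewind($handle);

// Read every record; a quoted cell may span several physical lines.
$rows = [];
while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
    $rows[] = $row;
}
fclose($handle);

// $rows[0] is the header record, $rows[1] is the single data record.
var_dump($rows[1][1]);
```

If the body cell were not wrapped in double quotes, the same input would be read as two separate records, which is one way the line breaks can get lost.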

Regarding the paragraph tags, that problem is the result of bypassing the admin interface and importing directly into the database. The textarea field doesn't care whether it contains HTML or plain text with linebreaks. The HTML structure is created by CKEditor during editing, so if you don't want to edit all imported pages manually, you will have to convert your body fields programmatically. This just involves using str_replace or a regular expression to replace newlines with paragraph tags. You can do that before the import (by modifying your CSV), during the import (apparently you can hook ImportPagesCSV::importPageValue) or after the import (by using $pages->find to find all imported pages, iterating through them and changing the body field). Here's a quick and dirty code sample:

// Wrap the body in <p> tags; each run of line breaks becomes a paragraph boundary.
$text = trim($page->body);
$textHTML = '<p>' . preg_replace("/[\n\r]+/", "</p><p>", $text) . '</p>';
$page->body = $textHTML;

You can get very sophisticated with that, like converting single line breaks into <br> tags instead and multiple linebreaks into <p> tags, but it depends on your requirements and source data.
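
As a rough sketch of that more sophisticated variant: the helper below turns blank-line-separated blocks into <p> tags and single line breaks into <br> tags. The function name textToParagraphs is hypothetical (it is not part of ProcessWire or ImportPagesCSV), and it assumes the source text is trusted, since it does no HTML escaping.

```php
<?php
// Hypothetical helper: converts plain text with line breaks into simple HTML.
// Assumes trusted input; existing HTML in the text is passed through unescaped.
function textToParagraphs(string $text): string
{
    // Normalize Windows (\r\n) and old Mac (\r) line endings to \n.
    $text = str_replace(["\r\n", "\r"], "\n", trim($text));
    if ($text === '') {
        return '';
    }

    // One or more blank lines separate paragraphs.
    $paragraphs = preg_split('/\n\s*\n/', $text);

    $html = [];
    foreach ($paragraphs as $paragraph) {
        // Remaining single line breaks become <br> tags inside the paragraph.
        $html[] = '<p>' . str_replace("\n", '<br>', trim($paragraph)) . '</p>';
    }

    return implode("\n", $html);
}

echo textToParagraphs("Line one\r\nLine two\r\n\r\nSecond paragraph\n");
```

In the sample above you would use it as $page->body = textToParagraphs($page->body); in place of the preg_replace line.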


I suspected that the paragraph tag issue was something like that. I wasn't sure that angle brackets could be safely parsed by the import module, but if including HTML tags in the string is not an issue, it shouldn't be too difficult for me to add them. This might actually solve the line break issue at the same time.

For reference here is the test CSV that I am using currently.

Thanks for the advice! 

