
Running a Daily Newspaper website with ProcessWire


Crssp


Hey all, I've been asking this question around in different CMS forums.

Discovering a newbie on the block, aka Statamic, sort of clarified what I need to do, which is to work with flat files for some of the data.

Is there a way with ProcessWire to dynamically pull in or include flat text files for the news articles from the folder structure where they are stored?

Our Text file stories get archived in folders with paths such as:

/issues/2013/Jan/08/ar_news_010813_story1

/issues/2013/Jan/08/ar_news_010813_story2 etc.

/issues/2013/Jan/08/ar_sports_010813_story1 and so on.

Is there any way you can see to use ProcessWire to pull the articles, wrapped in <!--Text--> tags, into other templates?

The story text files only include <strong>, <b>, <i>, or <em> tags and line-break HTML tags.

From there you have a daily index, also archived, with read-more links to get to each individual story from a combined index/front page.

Complicated beast of a project, at least for me... Any thoughts, advice, strategies, or recommendations are welcome.

If there's anything I can clarify, let me know...

-Crssp


Hi Crssp

I certainly would not use flat files for a news website. News websites work best with a relational database like MySQL, which is what ProcessWire runs on.

I have worked on several news sites over the years and the one question I always start by asking is:

How many authors/editors will there be and what sort of editorial process does there need to be?

The problem with a lot of CMSs is that they kind of assume you have one pile of people who write, edit, and publish articles, and then some others who just write. And that is just like a newspaper!

Of course, it is nothing like a newspaper at all. Most news sites are really not much different to a blog on steroids.

Once you can work out whether your site will have a complicated editorial structure or whether it is going to be very few people and a very simple editorial structure, it becomes much easier to work out how to put the site together.

The beauty of ProcessWire is that it does not assume anything - it is designed to be moulded to exactly how you want to work.

Once you have worked that out, then it is a case of mapping out the actual structure, how you want to handle different types of articles, how you want to handle images and so on, and then creating the fields and templates to achieve your aims. Sorry, that is a bit oversimplified, but I think you will see what I mean!

Joss


The news articles all get pushed out from another system entirely, and shoved to a server on the network.

We still have a healthy print product, believe it or not.

They could be imported into a database, but my designer brain blows up at that point, lol.

Thanks for your reply Joss.

Most Stories have no image associated in this hard-boiled news product.

The stories begin their life in Adobe InCopy before being pushed into server land.

Sorry for the vague answers; it's only because my understanding of parts of the process is that vague.


Not a problem!

It may be that one way or another you will have to deal with each article individually because of categorising, adding comments, adding meta descriptions, adding a summary for listings, author by-lines or other reasons. And you will probably want the final result to be searchable, and that means it needs to be in a searchable form.

Articles in a print magazine and articles on a site have some distinct differences simply because the "user interface" is different. In the print world, NOTHING is automated - most articles are presented in a complete form, with only some getting a lead-in from the front page. On a website, often too much is automated - I have a personal dislike for summaries on listing pages that are just the first few words of an intro that is then repeated in the article; unprofessional and unimaginative. Unless you want a huge staff you need some automation, but that means the articles may need to be broken down into useful bits.

It strikes me that you could have two issues here - how best to present the news information on a site starting from this point in time, and what to do with existing articles.

Depending on how they are laid out in a text file, one of the clever chaps here could probably write you a small PHP script that can import text files in batches into the database (there is now a jobs board in the forum).

With new articles, "import" is probably a case of copying and pasting so that you have those articles in exactly the form you want them.


Our archive goes back digitally to 1998 or something; I'm not sure we will want all of that.

Manually copy-pasting isn't practical at all.

Hence I was liking the flat-file way of access.

Site search is a great point.

We have an existing website product, but the whole thing there is built on .asp technologies.

We've lost the keys to that car, you might say, through staff attrition.

Thanks again!


It shouldn't be too difficult to come up with a script to import the text in the flat files to PW pages.

Years ago I had a similar problem and a friend knocked up about 20 lines of Perl that did the job. We were helped by the fact that the text files had a very precise layout, though, which he could get the script to recognise.

Joss


All of this brings me to another point: there could be a database somewhere; if there is, I don't think it's going to be MySQL though.

One of the IT guys here might be able to answer that for me.

It's a bit of a nightmare picking the website we have now apart.

It's got classic .asp components and .NET article-redirector doo-dads all built in.

Everybody poo-pooed PHP and/or open-source technologies in the past. MS technologies escape me totally, and they reinvent the wheel all the time over there at that $$$ shop [insert more Microsoft rants, lol...].

I don't seem to be able to get my head around any of the data-bits regardless.

Everything magically works on the current site, but we are getting tasked with updating the behemoth.

The current web head thinks we can pull what we need via the working RSS feed; I have my doubts about that.


Hi Crssp

It does sound like a nightmare waiting to happen and probably needs a bit of a unified approach.

There is a good chance that it is backed by a database - depending on how old it is and how much money was thrown at it, it might be MySQL or it might be Microsoft's solution - very powerful, but at a painful licensing cost!

It will be interesting to see what your IT people say.


From my brief scan of this topic (I keep falling behind) I'm with diogo in that it wouldn't be too hard to import them with a script, using the API to put them into ProcessWire as pages with the date from the folder as the published date. Then you have a nice, structured database for your news stories with search functions and a date-based archive.

I've imported something that sounds a lot harder than what you're after, so if you have those nice tags like <!-- Text --> to delimit different parts of content it should be relatively simple.
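
For what it's worth, pulling the body out from between markers like that is mostly one regular expression. A rough sketch, assuming each story sits between <!--Text--> and a closing <!--/Text--> marker (the closing marker is a guess on my part; adjust the pattern to whatever the real files use):

$raw = file_get_contents('/issues/2013/Jan/08/ar_news_010813_story1');

if (preg_match('/<!--\s*Text\s*-->(.*?)<!--\s*\/Text\s*-->/s', $raw, $m)) {
    $storyText = trim($m[1]); // the article body, inline tags like <em> intact
} else {
    $storyText = trim($raw);  // no markers found: fall back to the whole file
}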

Is there any chance you can provide an example article? Also, if there are associated images with some of them, are the image files named sensibly, or are they just in the folder with the text file (i.e. one folder per article, which I think they would be, looking at your first post)? If you don't want to post something on the forums but don't mind sending it my way then feel free to PM me and I can have a quick look.


Back to my original question about including text files for articles.

It was probably poor practice of me to mention another CMS product if I wanted a creative solution.

But does anybody have anything regarding pulling in text files dynamically? There must be something.

If the old boys could do it in .asp, there must be something; not saying it's best practice or anything.

Thanks again; importing to a DB is the best idea so far. I'm sure there is currently no DB, at least not for loading the articles in a live page.

@diogo: "It shouldn't be too difficult to come up with a script to import the text in the flat files to PW pages."

You were suggesting importing to the database or to pages then, diogo?


I am no coder, but I imagine you could list all the files in a folder and retrieve them with PHP. You could then store them in an array and hopefully sort them by date or alphabetically, depending how they are titled or how you are identifying blocks of text within the files.

But this would be a huge array if there are a lot of them, and I am not sure how that would all pan out.

You do the same with an image gallery, if you think about it, where the images are stored in folders.  But even then, if you are dealing with a large number of images, it can be more efficient to list them in a database and then retrieve what you need specifically.

Maybe you can generate a one-off "index" of files in a folder and then use that to retrieve a specific file - that is sort of working the same way a database does.
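
To illustrate that folder-listing idea, here is a rough sketch using the path from the first post (sorting is alphabetical, which the date-coded file names happen to keep in a sensible order):

$files = glob('/issues/2013/Jan/08/ar_*'); // all story files for one day
sort($files);

foreach ($files as $file) {
    echo basename($file) . "\n"; // e.g. ar_news_010813_story1
}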

But by the time you do all that, you might as well have imported all the files, to be honest - if that can be automated, then it would take very little time, and once it is done, that is that!


XML feeds would be a valid solution, without knowing the details of what the content is. Ryan is doing it as well on some of his sites to share content between installations.

Another solution would be to import them from text or HTML files. I'm doing a reports website for a company where we get static HTML files, and I have a script to import them in one click.

It's a matter of reading and parsing the files, creating a new page, putting the content in the text field, and saving:

// using http://simplehtmldom.sourceforge.net/manual.htm
require_once 'simple_html_dom.php';

$content = file_get_contents($url_to_file); // raw HTML of the source file
$html = str_get_html($content);             // parse it into a queryable DOM

$p = new Page();
$p->template = 'basic-page';                   // template must already exist
$p->parent = '/news/';                         // parent page, given by path
$p->title = $html->find('h1', 0)->plaintext;   // first <h1> becomes the title
$p->body = $html->find('#main', 0)->innertext; // contents of #main become the body
$p->save();                                    // PW generates the page name from the title

It's ProcessWire API code, which is basically PHP functions for the most part.

Best thing to do is share an example news post like I mentioned, and one of us can probably give you a better idea.


"How would I run such a script then?"

Like any other PHP script: just access the PHP file through the URL. Which in the PW world means accessing the page that has this in its template.
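
To make that concrete, here is a minimal sketch of a template file that runs the import when you view the page. The superuser check is my own precaution, not something from the snippet above; without it, any visitor would trigger the import:

if ($user->isSuperuser()) {
    // ... import code like diogo's snippet goes here ...
    echo "Import finished.";
} else {
    echo "Sorry, superusers only.";
}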


While there are certainly more and better things one could do with access to text files (i.e. importing to DB), it certainly wouldn't be hard to pull in a text file directly from your template and display it on every page view. At the simplest level: 

echo file_get_contents("/path/to/file.txt");

How do I make that path dynamic? Will the docs show any good examples?

There's quite a few variables in the path that will increment, etc.

Is there some way I could just pull the date string that creates the path portion, such as 2013/Jan/08/?

That is why I had the paths in my first post.

/issues/2013/Jan/08/ar_news_010813_story1

Thanks one and all for the suggestions and schooling.


It's really easy to do the path stuff - you can split out that information by running PHP's explode() function on the path name, picking out the year, month and day, and then have whatever structure you like in PW.
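
Something like this, using the path from your first post (just a sketch; adapt it to the real layout):

$path = '/issues/2013/Jan/08/ar_news_010813_story1';

$parts = array_values(array_filter(explode('/', $path)));
// $parts is now: ['issues', '2013', 'Jan', '08', 'ar_news_010813_story1']

list(, $year, $month, $day, $filename) = $parts;
$bits = explode('_', $filename); // ['ar', 'news', '010813', 'story1']
$section = $bits[1];             // "news", "sports", etc.

// Turn the folder date into something a PW datetime field can store
$published = date('Y-m-d', strtotime("$day $month $year")); // "2013-01-08"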

The main thing I keep asking though is more important and harder work - do you have an example of one of the articles so I can see the contents and suggest how you would parse all that?

Basically if I could see that, I can pretty much give you the code to do it as I've got a converter script on my PC here :)

Another question I asked earlier was how images are stored for each article (when they have images). Are images just in the folder with the file that contains the text?

The paths can stay the same as the text versions if you like - /issues/2013/Jan/08/ for example, but then have a more meaningful title on the end, like "news-story-title" if you like. You could also do away with the date in the path altogether and just have /issues/news-story-title and people can use an archive page to go through the days/months/years.

But I'll rewind a bit, as the harder part will be importing the articles. Iterating through your current directories, assuming the path you have for the files is consistent back through the months and years, is easy, but I'm itching to see the content of one of the text files to see how hard that side of things might be :)
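
For the iteration itself, PHP's SPL iterators do the heavy lifting. A sketch, assuming a hypothetical importStory() helper that wraps the parse-and-save code shown earlier in the thread:

$dir = new RecursiveDirectoryIterator('/issues', FilesystemIterator::SKIP_DOTS);

foreach (new RecursiveIteratorIterator($dir) as $file) {
    // every story file in your examples starts with "ar_"
    if (strpos($file->getFilename(), 'ar_') === 0) {
        importStory($file->getPathname()); // hypothetical helper
    }
}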


Turns out I was a bit off on these, but I think the true text exists elsewhere in the same format; I just have to locate that server and hence the text files.

These are from an asp page that has just a bit of other stuff (junk) that gets in the way for sure, but here's the article text.

The headline is pulled in as below:

School leaders: Casino tax money doesn't offset cuts

I've learned though that a second program adds some of these templating items, so I may be able to get even more raw text files than that. Those would only have line breaks and/or tags.

I got brave and asked one of the IT gents, where this directory is lurking on the network.

I'll get back when I know for sure what I've got to work with. The other, more raw text will probably not be saved with the story numbers and all the file names; it might be easier to work with, each article on its own line.

Stay tuned, and thanks for pushing me one and all.

:)


Thanks all, there was no attachment.

I realized the articles at that path, while they are live, contain some asp templatized code bits (the bad news).

The good news is you all have been a great help already.

The article title was wrapped in HTML span tags, so it should be accessible to a script.

Maybe today I can find the non-templatized archive.

The forum works great on mobile by the way :)

