Posted

Hello again!

I'm moving my website, a Norwegian to Polish dictionary at http://lizus.net (ca. 28k pages), from Glossword to ProcessWire. Everything looks great except one thing: an empty subfolder is created in site/assets/files for each page. So I've got ca. 28k empty, unused folders, which take around 4 kB each on my server (sic!). The same thing happens in site/assets/cache :/ Is there any way to improve this?

Best regards,

Remi Turala.

Posted

The ones in /site/assets/cache/Page are optional, as you can disable the cache on any templates where you don't want it to create cache files. But the ones under /site/assets/files/ are required at present, as every page is guaranteed to have a directory placeholder on the file system. So if every one takes up 4k, I'm estimating that's about 109 megabytes (?) that will be required to maintain the 28k pages on the file system.
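
The estimate can be checked with a few lines of plain PHP. This is only a sketch of the arithmetic: the page count and the 4 kB per directory are the figures quoted in this thread (assuming 4 kB means one 4096-byte block), not measurements.

```php
<?php
// Figures quoted in this thread, not measured values.
$pageCount   = 28000;    // "ca. 28k pages"
$bytesPerDir = 4 * 1024; // assumed: one 4096-byte block per empty directory

$totalBytes = $pageCount * $bytesPerDir;
$totalMiB   = $totalBytes / (1024 * 1024);

// Print the total in mebibytes, rounded to one decimal place.
printf("%d empty directories use about %.1f MiB\n", $pageCount, $totalMiB);
```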

Posted

Yes, empty pages folders take 109MB :/

I understand that the /files folder is necessary, but in my opinion it would be better if the per-page subfolders were created only when they are needed, for example when I add images to a page. Otherwise I don't see the sense in the current solution.

Maybe pages using the same template could share one common subfolder?

Space on my server is not a big problem, but it's hard to make backups over FTP (too many folders) :/

Posted

PW guarantees a directory for each page so that module developers always know there is a dedicated space for every page on the disk. Though I'll take a closer look, perhaps I can modify this, as it may not be that important. You could go in and manually delete the empty directories if you wanted to, but PW would re-create one for the page the next time the individual page is saved. I'll see what I can do.

We can't use a common directory for a template just because PW wipes out the entire contents of the page's directory when the page is deleted. In this manner, PW will ensure the page really is deleted and not have leftover files from some module or something else. Keep in mind that a page's template can also change, which adds more variables to the mix, so it's best that we don't mix files from multiple pages into the same directory.

The cache files are kept separate from the page's files just so that the cache can be wiped out in one shot.

Posted

It's a great idea (a dedicated folder for every page), but it only makes sense when something is stored there. I hope that you will fix that in the next release of PW, because it's a very powerful and fast CMS (I tried many other systems before I decided to switch to PW).

Posted

"You could go in and manually delete the empty directories if you wanted to, but PW would re-create one for the page the next time the individual page is saved"

Why don't you do this? Since your site is a dictionary, I can imagine that each entry will be untouched most of the time, and the folders won't be re-created until you save the page.

Posted

At the same time though, Linux can have a problem with many files in the same folder, so it's a tricky one to address.

Posted

"Why don't you do this? Since your site is a dictionary, I can imagine that each entry will be untouched most of the time, and the folders won't be re-created until you save the page."

I'll do that after importing the pages (I've already imported them with the ImportPagesCSV module, but I'm still editing them because I want to improve my dictionary).

All in all, I'm impressed by PW, especially by its speed. And I haven't switched on caching yet!

Posted

I think it makes sense for us to not keep around empty directories. I'm planning to make an update that will provide a config option to disable automatic directory creation, limiting the creation to when a file needs to go there. If all seems good after trying that out for a while, we'll probably make it the default.

Posted

That sounds great ryan - could you also add a cleanup script to remove empty folders on current sites? I guess this would be something separate since it wouldn't be required for new installs.

I suppose the way to do that now I think about it is: Fetch every template via the API, work out which ones have an image or file field, iterate through every page using those templates and remove any folders for pages that have no content in those fields (via a simple $field->count() maybe?).
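
A simpler variant of that cleanup can be sketched in plain PHP without the ProcessWire API. Note that this swaps the field-by-field check described above for a direct look at the disk: any immediate subfolder that contains nothing gets removed. The function name removeEmptyPageDirs() and the path in the usage comment are made up for illustration. As mentioned earlier in the thread, PW would re-create a page's folder the next time that page is saved.

```php
<?php
/**
 * Remove the empty immediate subdirectories of $filesPath.
 * Hypothetical helper, not part of ProcessWire.
 * Returns the names of the directories that were removed.
 */
function removeEmptyPageDirs(string $filesPath): array
{
    $removed = [];
    foreach (scandir($filesPath) as $name) {
        if ($name === '.' || $name === '..') {
            continue;
        }
        $dir = $filesPath . DIRECTORY_SEPARATOR . $name;
        // Skip regular files and symlinks; only real directories are candidates.
        if (!is_dir($dir) || is_link($dir)) {
            continue;
        }
        // An empty directory lists only "." and "..".
        if (count(scandir($dir)) === 2 && rmdir($dir)) {
            $removed[] = $name;
        }
    }
    return $removed;
}

// Usage (the path is an assumption, adjust it to your install):
// $removed = removeEmptyPageDirs('/var/www/site/assets/files');
```

Running it on a copy of the files folder first would be a sensible precaution, since it deletes directories in place.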

Posted

This will actually be a problem for huge sites, since Linux file systems limit the number of folders inside a folder at some point.

The maximum number of folders depends on the file system used:

http://superuser.com/questions/66331/what-is-the-maximum-number-of-folders-allowed-in-a-folder-in-linux

So if you use ext2, the maximum folder count inside a folder is about 32,000. In ext4 it is 64k, but it is possible to raise it. Currently these are probably also the maximum number of pages for a PW installation.
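
To see how close an existing install is to such a limit, the subfolders can simply be counted. A minimal sketch in plain PHP: the helper names are made up, and the limit is passed in by the caller because it depends on the file system.

```php
<?php
/** Count the immediate subdirectories of $path (hypothetical helper). */
function countSubdirs(string $path): int
{
    $count = 0;
    foreach (scandir($path) as $name) {
        if ($name !== '.' && $name !== '..' && is_dir($path . DIRECTORY_SEPARATOR . $name)) {
            $count++;
        }
    }
    return $count;
}

/** Fraction of the subdirectory limit already used (1.0 means full). */
function subdirUsage(string $path, int $limit): float
{
    return countSubdirs($path) / $limit;
}

// Usage (path and limit are assumptions; 64000 stands for the ext4 figure quoted above):
// printf("%.0f%% of the limit used\n", 100 * subdirUsage('/var/www/site/assets/files', 64000));
```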

Posted

I don't think the limitations are necessarily due to Linux then as I had a short-lived experience with a web host that had imposed a 4,000 file limit (files OR folders) in a given folder, so maybe the limits I'm thinking of are imposed by a setting in Apache.

Something like that anyway. Or maybe the file system was ext 0.5 ;)

Posted

32k-64k is way too small a limit for a PW site. I like to think PW would be happy with up to a million or more pages, so we'll have to make sure the file system isn't interfering with the scalability. It seems to make more and more sense that we don't keep empty directories.

Posted

Yep, that would be a good solution. Another thing that people do is to use a deeper structure:

/1/1001/image.jpg
/1/1002/another_image.jpg
...
/2/2001/filedump.zip
/2/2002/image.gif
...
/10/10201/file2.jpg
/11/11390/fileX.zip
...
/23/231021/file3.jpg
...
/191/1912621/scalesforovermillionpages.jpg
... etc

Changing the file structure might be a hard thing to do, but it would prevent scaling issues on sites where you host a large number of files and use pages as containers for them. A deeper folder structure combined with "folders created when the first file is uploaded" would be the best solution, I think.
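
Both ideas can be sketched together in plain PHP. This is not ProcessWire code: the function names are made up, and the sketch assumes the first level is the page ID divided by 1000 with the remainder dropped, which is one reading of the first examples above (1001 under /1/, 10201 under /10/).

```php
<?php
/**
 * Build a two-level relative path for a page's files.
 * Assumption: pages are grouped into buckets of $perBucket consecutive IDs,
 * so a bucket directory never holds more than $perBucket page folders.
 */
function shardedPagePath(int $pageId, int $perBucket = 1000): string
{
    if ($pageId < 1 || $perBucket < 1) {
        throw new InvalidArgumentException('Page ID and bucket size must be positive');
    }
    $bucket = intdiv($pageId, $perBucket);
    return "/$bucket/$pageId/";
}

/** Create the page's directory only when a file is about to be stored in it. */
function ensurePageDir(string $root, int $pageId): string
{
    $dir = rtrim($root, '/') . shardedPagePath($pageId);
    if (!is_dir($dir)) {
        mkdir($dir, 0755, true); // recursive: also creates the bucket directory
    }
    return $dir;
}
```

With 1000 pages per bucket, a million pages would need about 1000 bucket directories at the top level, each holding at most 1000 page folders, which stays well below the limits discussed earlier. A different bucket size can be passed as the second argument.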

Posted

I've seen that sort of thing too, Antti, with an online shop: they had folders A-Z, with each page storing files randomly in subfolders inside one of the letter folders. Your example is much better, as it scales a lot further, is far more logical and is a lot easier to program (I don't mean that this would be easy to implement, but it removes the random element I just mentioned in the other software I'd used).

ryan - Antti's suggestion above does make a lot of sense although it sounds a bit crazy if you've never dealt with that many pages before, but even if you were getting to a more comfortable 20,000 pages it would make sense.

I think the thing to think about with this many pages is how various programs interact with files and folders. There are a lot of different web server setups that can run PW, but they probably all have some sort of limit in this regard which you can safely assume will be different, so something like Antti's solution makes sense to me, even with not keeping empty folders around.

Posted

I like that approach. I think that's something we'll probably want to offer as a config option in the future.

Posted

I'm glad to see that everybody here sees this problem :) I hope that it will be resolved as soon as possible.

Posted

"I'm glad to see that everybody here sees this problem :) I hope that it will be resolved as soon as possible."

agreed! while this isn't seemingly a large issue now - i personally have a site right now with over 140 GB of images, from an online image generator (nameless at this time) that has a fairly (ahem!) significant Alexa ranking - and while this is all fine and dandy, i used a filesystem folder ( a single folder! ) to store all those (between 20k and 100k images... ) and at roughly 1,000,000+ files - the simple linux call of "ls -lsa" on that folder takes up to 15 min. to show a listing... and using single php call to bring a single - known filename, takes up to 7 minutes as well... LET alone doing any sort of 'searching for specific values' (ugh!)

So, while i'm no darned twitter[tm] it DOES behoove this to get 'handled' - that's how i fell on this topic to begin with.

[mind you - when i coded that file system NIGHT-mare - i was a complete N00b, idi0t and over-zealous fruitcake, but after launch it blew up in popularity for a few years and - ahem! i would'a had to change a TON to get it 'better' so i left it there. migrations to a new server have been difficult as well, for that very reason above]

Posted

I totally agree with all that's been said. No need for PW to be creating directories unless they are going to be used. I would get this updated today if I could. Just dealing with a real time crunch here, so expecting to get this in place in April.

  • 3 months later...
Posted

Ryan, any news regarding this?

I am thinking about my discussion module and one upcoming project where I will probably use it (of course, more polish and tweaks will come). There are already about 50,000 posts on the old forum that will probably be converted, so the folder limit would be a real issue here. In the Discussion module it would be fine to adjust the module to remove the folder right after saving, but of course a nicer and more native solution to this problem would be welcome.

There is another simple way to do "folder folding": http://superuser.com/a/66341

  • 2 weeks later...
Posted

I'm so waiting for this! Every time I start a project with PW this limitation pops into my mind. :) Glad you'll address it soon.

  • 1 month later...
Posted

I don't know if it's related, so pardon me if I'm taking the subject off its original path. My question is: how can I quickly refresh all image versions that were generated with the size function?

Posted

From the API you can call the removeVariations() function from any image:

// Remove every generated size variation of each image on the page
foreach($page->images as $image) {
    $image->removeVariations();
}

That actually removes them at the time the function is called, so you don't need to do a $page->save().
