Remi Posted February 24, 2012
Hello again! I'm moving my website, a Norwegian to Polish dictionary at http://lizus.net (ca. 28k pages), from Glossword to ProcessWire. Everything looks great except one thing: an empty subfolder is created in site/assets/files for each page. So I've got ca. 28k empty and unused folders, each taking around 4 kB on my server (sic!). The same thing happens in site/assets/cache :/ Is there any way to improve this? Best regards, Remi Turala.
ryan Posted February 24, 2012
The ones in /site/assets/cache/Page are optional, as you can disable the cache on any templates where you don't want it to create cache files. But the ones under /site/assets/files/ are required at present, as every page is guaranteed to have a directory placeholder on the file system. So if every one takes up 4k, I'm estimating that's about 109 megabytes (?) that will be required to maintain the 28k pages on the file system.
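As a quick check of that estimate, here is a minimal Python sketch. It assumes exactly 28,000 directories at 4 KiB each, which are the round figures quoted in the posts above rather than measured values.

```python
# Round figures taken from the thread; both are assumptions, not measurements.
PAGES = 28_000                    # "ca. 28k pages"
BYTES_PER_EMPTY_DIR = 4 * 1024    # "around 4kB each"

total_bytes = PAGES * BYTES_PER_EMPTY_DIR
total_mib = total_bytes / (1024 * 1024)

print(f"{total_bytes} bytes is about {total_mib:.1f} MiB")
```

Under those assumptions the result lands at roughly 109 MiB, which matches the figure in the post.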
Remi (Author) Posted February 24, 2012
Yes, the empty page folders take 109 MB :/ I understand that the /files folder is necessary, but in my opinion it would be better if the per-page subfolders were created only when they are needed, for example when I add images to a page. Otherwise I don't see the point of the current solution. Maybe pages using the same template could share one common subfolder? Space on my server is not a big problem, but it's hard to make backups over FTP (too many folders) :/
ryan Posted February 24, 2012
PW guarantees a directory for each page so that module developers always know there is a dedicated space on disk for every page. I'll take a closer look, though; perhaps I can modify this, as it may not be that important. You could go in and manually delete the empty directories if you wanted to, but PW would re-create one for the page the next time the individual page is saved. I'll see what I can do. We can't use a common directory for a template, because PW wipes out the entire contents of the page's directory when the page is deleted. That way, PW ensures the page really is deleted and leaves no leftover files from some module or something else. Keep in mind that a page's template can also change, which adds more variables to the mix, so it's best that we don't mix files from multiple pages in the same directory. The cache files are kept separate from the page's files just so that the cache can be wiped out in one shot.
Remi (Author) Posted February 24, 2012
It's a great idea (a dedicated folder for every page), but it only makes sense when something is stored there. I hope you will fix that in the next release of PW, because it's a very powerful and fast CMS (I tried many other systems before I decided to switch to PW).
diogo Posted February 25, 2012
ryan wrote: "You could go in and manually delete the empty directories if you wanted to, but PW would re-create one for the page the next time the individual page is saved."
Why don't you do this? Since your site is a dictionary, I imagine each entry will stay untouched most of the time, and the folders won't be re-created until you save the page.
apeisa Posted February 25, 2012
This will actually be a problem for huge sites, since Linux has a folder limit at some point.
Pete Posted February 25, 2012
At the same time though, Linux can have a problem with many files in the same folder, so it's a tricky one to address.
Remi (Author) Posted February 25, 2012
diogo wrote: "Why don't you do this? Since your site is a dictionary, I imagine each entry will stay untouched most of the time, and the folders won't be re-created until you save the page."
I'll do that after importing the pages (I've already imported them with the ImportPagesCSV module, but I'm still editing them because I want to improve my dictionary). All in all, I'm impressed by PW, especially by the speed of this CMS. And I haven't switched on caching yet!
ryan Posted February 27, 2012
I think it makes sense for us to not keep empty directories around. I'm planning to make an update that will provide a config option to disable automatic directory creation, limiting the creation to when a file needs to go there. If all seems good after trying that out for a while, we'll probably make it the default.
Pete Posted February 27, 2012
That sounds great, ryan. Could you also add a cleanup script to remove empty folders on existing sites? I guess this would be something separate, since it wouldn't be required for new installs. Now that I think about it, I suppose the way to do that is: fetch every template via the API, work out which ones have an image or file field, iterate through every page using those templates, and remove the folders of pages that have no content in those fields (via a simple $field->count() maybe?).
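A cleanup along those lines can also be approximated purely on the file system, without the ProcessWire API. The following is a minimal Python sketch, not the approach described in the post: it simply deletes immediate subdirectories that contain nothing at all. The function name `remove_empty_page_dirs` is hypothetical, and as noted earlier in the thread, PW at this point would re-create a page's directory the next time that page is saved.

```python
import os
import tempfile
from typing import List


def remove_empty_page_dirs(files_root: str) -> List[str]:
    """Delete empty immediate subdirectories of files_root.

    Returns the names of the directories that were removed.
    Directories that contain any file or folder are left alone.
    """
    with os.scandir(files_root) as entries:
        subdirs = [e.path for e in entries if e.is_dir(follow_symlinks=False)]

    removed = []
    for path in sorted(subdirs):
        if not os.listdir(path):   # nothing inside, safe to remove
            os.rmdir(path)         # os.rmdir refuses non-empty directories
            removed.append(os.path.basename(path))
    return removed


if __name__ == "__main__":
    # Demonstration on a throwaway directory, not on a real site.
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "1001"))   # empty page folder
        os.mkdir(os.path.join(root, "1002"))   # page folder with a file
        open(os.path.join(root, "1002", "image.jpg"), "w").close()
        print(remove_empty_page_dirs(root))
```

On a real site the argument would be the site/assets/files directory mentioned in the thread; running it against a backup copy first would be the cautious choice.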
apeisa Posted February 28, 2012
apeisa wrote earlier: "This will actually be a problem for huge sites, since Linux has a folder limit at some point."
The maximum number of folders depends on the file system used: http://superuser.com/questions/66331/what-is-the-maximum-number-of-folders-allowed-in-a-folder-in-linux So if you use ext2, the maximum folder count inside a folder is 32,768. In ext4 it is 64k, but it is possible to raise it. Currently these are probably also the maximum number of pages for a PW installation.
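To see how close an existing directory is to such a limit, one can count its immediate subdirectories. This is a minimal Python sketch; the limit is passed in by the caller, and the 32,768 used in the demonstration is just the ext2 figure quoted in the post, not something the code verifies. The function names are hypothetical.

```python
import os
import tempfile


def count_subdirs(path: str) -> int:
    """Count the immediate subdirectories of path (symlinks are not followed)."""
    with os.scandir(path) as entries:
        return sum(1 for entry in entries if entry.is_dir(follow_symlinks=False))


def remaining_headroom(path: str, limit: int) -> int:
    """Return how many more subdirectories fit before reaching limit."""
    return limit - count_subdirs(path)


if __name__ == "__main__":
    # Demonstration on a throwaway directory with three page folders.
    with tempfile.TemporaryDirectory() as root:
        for page_id in (1001, 1002, 1003):
            os.mkdir(os.path.join(root, str(page_id)))
        # 32_768 is the ext2 figure quoted in the post above (an assumption here).
        print(remaining_headroom(root, 32_768))
```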
Pete Posted February 28, 2012
I don't think the limitations are necessarily due to Linux then. I had a short-lived experience with a web host that had imposed a 4,000 file limit (files OR folders) in a given folder, so maybe the limits I'm thinking of are imposed by a setting in Apache. Something like that anyway. Or maybe the file system was ext 0.5
ryan Posted February 29, 2012
32k-64k is way too small a limit for a PW site. I like to think PW would be happy with up to a million or more pages, so we will have to make sure the file system isn't interfering with the scalability. It seems to make more and more sense that we don't keep empty directories.
apeisa Posted February 29, 2012
Yep, that would be a good solution. Another thing people do is use a deeper structure:
/1/1001/image.jpg
/1/1002/another_image.jpg
...
/2/2001/filedump.zip
/2/2002/image.gif
...
/10/10201/file2.jpg
/11/11390/fileX.zip
...
/23/231021/file3.jpg
...
/191/1912621/scalesforovermillionpages.jpg
...
etc.
Changing the file structure might be hard to do, but it would prevent scaling issues on sites where you host a large number of files and use pages as containers for them. A deeper folder structure combined with "folders created when the first file is uploaded" would be the best solution, I think.
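The deeper structure above can be sketched as a small path function. This is a minimal Python illustration under one stated assumption: the top-level bucket is the page ID divided by 1,000 (integer division). That rule reproduces the smaller examples in the post (/1/1001, /2/2002, /10/10201, /11/11390); the larger ones (/23/231021, /191/1912621) do not follow it exactly, so treat the rule and the name `sharded_dir` as hypothetical rather than as the scheme the post intends.

```python
def sharded_dir(page_id: int, bucket_size: int = 1000) -> str:
    """Return a two-level directory path for a page ID.

    The first level groups page IDs into buckets of bucket_size,
    so no bucket ever holds more than bucket_size page folders.
    """
    if page_id < 0:
        raise ValueError("page_id must be non-negative")
    bucket = page_id // bucket_size
    return f"/{bucket}/{page_id}/"


if __name__ == "__main__":
    for page_id in (1001, 1002, 2001, 10201, 11390):
        print(page_id, "->", sharded_dir(page_id))
```

With buckets of 1,000, a site whose page IDs run up to 999,999 would need at most 1,000 top-level folders, each holding at most 1,000 page folders, which stays well below the per-directory limits discussed earlier in the thread.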
Pete Posted February 29, 2012
I've seen that sort of thing too, Antti, with an online shop. They had folders A-Z, with each page storing files randomly in subfolders inside one of the letter folders. Your example is much better, as it scales a lot further, is far more logical and is a lot easier to program (I don't mean that this would be easy to implement, but it removes the random element I just mentioned in the other software I'd used). ryan - Antti's suggestion above does make a lot of sense. It sounds a bit crazy if you've never dealt with that many pages before, but it would make sense even at a more comfortable 20,000 pages. I think the thing to consider with this many pages is how various programs interact with files and folders. There are a lot of different web server setups that can run PW, and they probably all have some sort of limit in this regard, which you can safely assume will differ from one setup to the next. So something like Antti's solution makes sense to me, even when empty folders are no longer kept around.
ryan Posted March 1, 2012
I like that approach. I think that's something we'll probably want to offer as a config option in the future.
Remi (Author) Posted March 1, 2012
I'm glad to see that everybody here sees this problem. I hope that it will be resolved as soon as possible.
Bill Posted March 6, 2012
Remi wrote: "I'm glad to see that everybody here sees this problem. I hope that it will be resolved as soon as possible."
agreed! while this isn't seemingly a large issue now, i personally have a site right now with over 140 GB of images, from an online image generator (nameless at this time) that has a fairly (ahem!) significant Alexa ranking. that is all fine and dandy, but i used a filesystem folder (a single folder!) to store all those (between 20k and 100k images...), and at roughly 1,000,000+ files the simple linux call of "ls -lsa" on that folder takes up to 15 min. to show a listing, and a single php call to fetch a single, known filename takes up to 7 minutes as well... LET alone doing any sort of 'searching for specific values' (ugh!) So, while i'm no darned twitter[tm], it DOES behoove this to get 'handled'. that's how i fell on this topic to begin with. [mind you, when i coded that file system NIGHT-mare i was a complete N00b, idi0t and over-zealous fruitcake, but after launch it blew up in popularity for a few years and (ahem!) i would'a had to change a TON to get it 'better', so i left it there. migrations to a new server have been difficult as well, for that very reason]
ryan Posted March 7, 2012
I totally agree with all that's been said. No need for PW to be creating directories unless they are going to be used. I would get this updated today if I could. I'm just dealing with a real time crunch here, so I'm expecting to get this in place in April.
apeisa Posted June 7, 2012
Ryan, any news regarding this? I am thinking about my discussion module and one upcoming project where I will probably use it (of course, more polish and tweaks will come). There are about 50,000 posts already on the old forum that will probably be converted, so the folder limit would be a real issue here. In the Discussion module it would be fine to adjust the module to remove the folder right after saving, but of course a nicer and more native solution to this problem would be welcome. There is another simple way to do "folder folding": http://superuser.com/a/66341
ryan Posted June 8, 2012
This is on the list for 2.3, so should be coming soon!
Soma Posted June 18, 2012
I'm so waiting for this! Every time I start a project with PW, this limitation pops into my mind. Glad you'll address it soon.
ptjedi Posted July 31, 2012
I don't know if it's related, so pardon me if I am diverting the subject from its original path. My question is: how do I quickly refresh all the image versions that were generated with the size function?
ryan Posted July 31, 2012
From the API you can call the removeVariations() function from any image:
    foreach($page->images as $image) {
        $image->removeVariations();
    }
That actually removes them at the time the function is called, so you don't need to do a $page->save().