
Problem with assets/files folder


Remi


Hello again!

I'm moving my website, the Norwegian-to-Polish dictionary http://lizus.net (ca. 28k pages), from Glossword to ProcessWire. Everything looks great except one thing: an empty subfolder is created in site/assets/files for each page. So I've got ca. 28k empty, unused folders, each taking around 4 kB on my server (sic!). The same happens with site/assets/cache :/ Is there any way to improve this?

Best regards,

Remi Turala.


The ones in /site/assets/cache/Page are optional, as you can disable the cache on any template where you don't want it to create cache files. But the ones under /site/assets/files/ are required at present, as every page is guaranteed to have a directory placeholder on the file system. So if each one takes up 4k, I estimate that about 109 megabytes (?) will be required to maintain the 28k pages on the file system.
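For reference, the estimate works out as follows, assuming each empty directory occupies one 4 KiB block (the figure reported in the first post) and 1024-based units:

```latex
28\,000 \times 4\ \text{KiB} = 112\,000\ \text{KiB} = \frac{112\,000}{1024}\ \text{MiB} \approx 109.4\ \text{MiB}
```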


Yes, the empty page folders take 109 MB :/

I understand that the /files folder is necessary, but in my opinion it would be better if the per-page subfolders were created only when they are needed, for example when I add images to a page. Otherwise I don't see the point of the current solution.

Maybe pages using the same template could share one common subfolder?

Space on my server is not a big problem, but it's hard to make backups over FTP (too many folders) :/


PW guarantees a directory for each page so that module developers always know there is a dedicated space for every page on the disk. Though I'll take a closer look, perhaps I can modify this, as it may not be that important. You could go in and manually delete the empty directories if you wanted to, but PW would re-create one for the page the next time the individual page is saved. I'll see what I can do.

We can't use a common directory for a template because PW wipes out the entire contents of the page's directory when the page is deleted. This way, PW ensures the page really is deleted and that no leftover files from some module or something else remain. Keep in mind that a page's template can also change, which adds more variables to the mix, so it's best that we don't mix files from multiple pages in the same directory.

The cache files are kept separate from the page's files just so that the cache can be wiped out in one shot.


It's a great idea (a dedicated folder for every page), but it only makes sense when something is stored there. I hope you will fix that in the next release of PW, because it's a very powerful and fast CMS (I tried many other systems before I decided to switch to PW).


You could go in and manually delete the empty directories if you wanted to, but PW would re-create one for the page the next time the individual page is saved

Why don't you do this? Since your site is a dictionary, I imagine each entry will be untouched most of the time, and the folders won't be re-created until you actually save the page.


Why don't you do this? Since your site is a dictionary, I imagine each entry will be untouched most of the time, and the folders won't be re-created until you actually save the page.

I'll do that once I've finished with the imported pages (I've already imported them with the ImportPagesCSV module, but I'm still editing them because I want to improve my dictionary).

All in all, I'm impressed by PW, especially by the speed of this CMS. And I haven't even switched on caching yet!


I think it makes sense for us not to keep empty directories around. I'm planning an update that will provide a config option to disable automatic directory creation, limiting creation to when a file actually needs to go there. If all seems good after trying that out for a while, we'll probably make it the default.
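A minimal sketch of that lazy behaviour in plain PHP, outside ProcessWire: the page directory is created only when the first file is written, and a lookup never creates it. savePageFile() and pageFilesDir() are hypothetical helpers, and $filesRoot stands in for /site/assets/files/; this is an illustration of the idea, not ProcessWire's actual implementation.

```php
<?php
/**
 * Hypothetical helper: write a file for a page, creating the page's
 * directory only at the moment the first file is stored.
 */
function savePageFile(string $filesRoot, int $pageId, string $name, string $contents): string
{
    $dir = rtrim($filesRoot, '/') . '/' . $pageId;

    // Create the directory lazily; pages that never get a file never get a folder.
    // The second is_dir() check tolerates another process creating it first.
    if (!is_dir($dir) && !mkdir($dir, 0755, true) && !is_dir($dir)) {
        throw new RuntimeException("Could not create $dir");
    }

    $path = $dir . '/' . basename($name);
    if (file_put_contents($path, $contents) === false) {
        throw new RuntimeException("Could not write $path");
    }
    return $path;
}

/**
 * Hypothetical helper: return the page's directory if it exists, else null.
 * Unlike savePageFile(), this never creates anything on disk.
 */
function pageFilesDir(string $filesRoot, int $pageId): ?string
{
    $dir = rtrim($filesRoot, '/') . '/' . $pageId;
    return is_dir($dir) ? $dir : null;
}
```

Code that only reads files (templates, modules listing attachments) would call pageFilesDir() and treat null as "no files", so browsing 28k pages leaves the disk untouched.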


That sounds great, ryan. Could you also add a cleanup script to remove empty folders on existing sites? I guess this would be something separate, since it wouldn't be required for new installs.

Now that I think about it, I suppose the way to do that is: fetch every template via the API, work out which ones have an image or file field, iterate through every page using those templates, and remove the folders of pages that have no content in those fields (via a simple $field->count() maybe?).
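A filesystem-only variant of such a cleanup is sketched below. Unlike the API approach just described, it does not look at templates or fields at all; it simply removes page directories that contain nothing. It assumes the flat /site/assets/files/{page ID}/ layout discussed in this thread, and removeEmptyPageDirs() is a hypothetical name, not part of ProcessWire. As noted earlier in the thread, PW will re-create a page's directory the next time that page is saved.

```php
<?php
/**
 * Sketch: remove empty per-page directories directly under $filesRoot.
 * Only directories named with digits only (page IDs) are considered.
 * With $dryRun = true nothing is deleted; the candidates are just returned.
 *
 * @return string[] names of directories removed (or that would be removed)
 */
function removeEmptyPageDirs(string $filesRoot, bool $dryRun = false): array
{
    // First pass: collect candidates so we never delete while iterating.
    $candidates = [];
    foreach (new DirectoryIterator($filesRoot) as $entry) {
        if ($entry->isDot() || !$entry->isDir()) {
            continue;
        }
        $name = $entry->getFilename();
        if (preg_match('/^\d+$/', $name) !== 1) {
            continue; // not a page-ID directory; leave it alone
        }
        $contents = scandir($entry->getPathname());
        // scandir() always lists '.' and '..'; anything more means content.
        if ($contents !== false && count($contents) === 2) {
            $candidates[] = $name;
        }
    }
    sort($candidates, SORT_NUMERIC);

    if ($dryRun) {
        return $candidates;
    }

    // Second pass: rmdir() refuses to delete a non-empty directory,
    // so a file added in the meantime is never lost.
    $removed = [];
    foreach ($candidates as $name) {
        if (@rmdir($filesRoot . DIRECTORY_SEPARATOR . $name)) {
            $removed[] = $name;
        }
    }
    return $removed;
}
```

Running it once with $dryRun = true first shows what would be deleted before anything is touched.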


This will actually be a problem for huge sites, since Linux file systems limit the number of folders at some point.

The maximum number of folders depends on the file system used:

http://superuser.com/questions/66331/what-is-the-maximum-number-of-folders-allowed-in-a-folder-in-linux

So if you use ext2, the maximum number of folders inside one folder is 32,768. In ext4 it is 64k, but it is possible to raise it. Currently these are probably also the maximum number of pages for your PW installation.
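To see how close an existing install is to such a limit, a small PHP sketch like the one below can count the immediate subdirectories of /site/assets/files/. The function names are hypothetical, and the limit is a value you supply for your own file system and host; the code does not detect it.

```php
<?php
/** Count the immediate subdirectories of $dir (files are ignored). */
function countSubdirectories(string $dir): int
{
    $count = 0;
    foreach (new DirectoryIterator($dir) as $entry) {
        if (!$entry->isDot() && $entry->isDir()) {
            $count++;
        }
    }
    return $count;
}

/** Fraction of a caller-supplied per-directory limit already used. */
function subdirectoryUsage(string $dir, int $limit): float
{
    if ($limit < 1) {
        throw new InvalidArgumentException('limit must be >= 1');
    }
    // Divide by a float so the result is always a float, even when exact.
    return countSubdirectories($dir) / (float) $limit;
}
```

Example: `printf("%.1f%% of the limit used\n", subdirectoryUsage('/path/to/site/assets/files', 32768) * 100);` where the path is a placeholder for your own install.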


I don't think the limitations are necessarily due to Linux, then: I had a short-lived experience with a web host that imposed a 4,000-file limit (files OR folders) per folder, so maybe the limits I'm thinking of are imposed by an Apache setting.

Something like that anyway. Or maybe the file system was ext 0.5 ;)


32k-64k is way too small a limit for a PW site. I like to think PW would be happy with a million or more pages, so we will have to make sure the file system isn't limiting scalability. It makes more and more sense not to keep empty directories.


Yep, that would be a good solution. Another thing people do is use a deeper structure:

/1/1001/image.jpg
/1/1002/another_image.jpg
...
/2/2001/filedump.zip
/2/2002/image.gif
...
/10/10201/file2.jpg
/11/11390/fileX.zip
...
/23/231021/file3.jpg
...
/191/1912621/scalesforovermillionpages.jpg
... etc

Changing the file structure might be a hard thing to do, but it would prevent scaling issues on sites that host a large number of files and use pages as containers for them. A deeper folder structure combined with "folders created when the first file is uploaded" would be the best solution, I think.
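One way to compute such a path is sketched below. It is an assumption, not an existing ProcessWire function: the hypothetical bucketedPagePath() puts each page directory under intdiv(ID, $perBucket), 1000 by default, so no bucket ever holds more than 1,000 page directories and a million pages need only about 1,000 top-level folders. Some of the example paths above (e.g. /23/231021/) would land in a different bucket under this rule, which is why the divisor is a parameter.

```php
<?php
/**
 * Map a page ID to a two-level relative path, e.g. 1001 -> "1/1001".
 * Assumption: bucket = intdiv($pageId, $perBucket), so each bucket holds
 * at most $perBucket page directories.
 */
function bucketedPagePath(int $pageId, int $perBucket = 1000): string
{
    if ($pageId < 0 || $perBucket < 1) {
        throw new InvalidArgumentException('Page ID must be >= 0 and bucket size >= 1');
    }
    return intdiv($pageId, $perBucket) . '/' . $pageId;
}
```

Example: `'/site/assets/files/' . bucketedPagePath(1001)` gives `/site/assets/files/1/1001`. Keeping the full page ID as the leaf folder name means a page's files can still be located from the ID alone.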


I've seen that sort of thing too, Antti, with an online shop: they had folders A-Z, with each page storing files randomly in subfolders inside one of the letter folders. Your example is much better: it scales a lot further, is far more logical and is a lot easier to program (I don't mean it would be easy to implement, but it removes the random element of the other software I'd used).

ryan, Antti's suggestion above does make a lot of sense. It sounds a bit crazy if you've never dealt with that many pages before, but it would make sense even at a more comfortable 20,000 pages.

I think the thing to consider with this many pages is how various programs interact with files and folders. There are a lot of different web server setups that can run PW, and they probably all have some sort of limit in this regard, which you can safely assume will differ from setup to setup. So something like Antti's solution makes sense to me, even combined with not keeping empty folders around.


I'm glad to see that everybody here sees this problem :) I hope it will be resolved as soon as possible.

Agreed! While this doesn't seem like a large issue now, I personally have a site with over 140 GB of images from an online image generator (nameless at this time) that has a fairly (ahem!) significant Alexa ranking. While that is all fine and dandy, I used a filesystem folder (a single folder!) to store all those images (between 20k and 100k images...), and at roughly 1,000,000+ files the simple Linux call "ls -lsa" on that folder takes up to 15 minutes to show a listing... and a single PHP call to fetch one known filename takes up to 7 minutes as well... let alone doing any sort of 'searching for specific values' (ugh!)

So, while I'm no darned Twitter[tm], it DOES behoove us to get this 'handled'; that's how I stumbled on this topic to begin with.

[Mind you, when I coded that file system NIGHT-mare I was a complete N00b, idi0t and over-zealous fruitcake, but after launch it blew up in popularity for a few years and, ahem, I would've had to change a TON to make it 'better', so I left it there. Migrations to a new server have been difficult as well, for that very reason.]


I totally agree with all that's been said. There's no need for PW to create directories unless they are going to be used. I would get this updated today if I could, but I'm dealing with a real time crunch here, so I expect to get this in place in April.


  • 3 months later...

Ryan, any news regarding this?

I am thinking about my discussion module and an upcoming project where I will probably use it (of course, more polish and tweaks will come). But there are already about 50,000 posts on the old forum that will probably be converted, so the folder limit would be a real issue here. The Discussion module could be adjusted to remove the folder right after saving, but a nicer, more native solution to this problem would of course be welcome.

There is another simple way to do "folder folding": http://superuser.com/a/66341


  • 2 weeks later...
  • 1 month later...

I don't know if it's related, so pardon me if I'm steering the topic off its original path. My question is: how can I quickly refresh all image versions that were generated with the size function?


From the API, you can call the removeVariations() function on any image:

// Delete every resized variation generated from each image on this page
foreach($page->images as $image) {
    $image->removeVariations();
}

That actually removes them at the time the function is called, so you don't need to do a $page->save().
