Leaderboard

Popular Content

Showing content with the highest reputation on 04/12/2021 in all areas

  1. Hi, a bit of background. I am creating a website which lets you navigate through a protein database with 20 million proteins grouped into 50 thousand categories. The database is fixed in size, meaning there is no need to update/add information in the near future. Queries to the database are pretty standard. The problem I am currently having is the time it takes to create the pages for the proteins (right now around a week). Pages are created by reading the data from a CSV file. Based on previous posts I found on this forum (link1, link2) I decided to use $database transactions to load the data from a PHP script (InnoDB engine required). This really boosts performance, from 25 pages per second to 200 pages per second. The problem is that performance drops as a function of the number of pages created (see attached image). Is this behavior expected? I tried using gc_collect_cycles() but haven't noticed any difference. Is there a way to avoid the degradation in performance? A stable 200 pages per second would be good enough for me.

     Pseudo code:

     $handle = fopen($file, "r");
     $trans_size = 200; // commit to database every _ pages
     try {
         $database->beginTransaction();
         for ($i = 0; $row = fgetcsv($handle, 0, " "); ++$i) {
             // fields from data
             $title = $row[0];
             $size = $row[1];
             $len_avg = $row[2];
             $len_std = $row[3];
             // create page
             $page = new Page();
             $page->template = "protein";
             $page->title = $title;
             $page->size = $size;
             $page->len_avg = $len_avg;
             $page->len_std = $len_std;
             $page->save();
             if (($i+1)%$trans_size == 0) {
                 $database->commit();
                 // $pages->uncacheAll();
                 // gc_collect_cycles();
                 $database->beginTransaction();
             }
         }
         $database->commit();
     } catch (\Exception $e) {
         // undo the uncommitted batch if anything fails
         $database->rollBack();
         throw $e;
     }

     I am quite new to ProcessWire so feel free to criticize me. Thanks in advance
    4 points
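The commit-every-N pattern from the post above can be sketched without any ProcessWire dependency. This is only an illustration: importInBatches() and its callable parameters are hypothetical names standing in for $database->beginTransaction(), $page->save(), $database->commit() and a rollback, and are not part of the ProcessWire API.

```php
<?php
/**
 * Process rows in batches, committing after every $batchSize rows.
 * All four callables are hypothetical stand-ins for the real database
 * and page-saving calls. Returns the number of commits performed.
 */
function importInBatches(
    iterable $rows,
    int $batchSize,
    callable $begin,
    callable $save,
    callable $commit,
    callable $rollback
): int {
    $commits = 0;
    $pending = 0; // rows saved since the last commit
    $begin();
    try {
        foreach ($rows as $row) {
            $save($row);
            if (++$pending === $batchSize) {
                $commit();
                $commits++;
                $pending = 0;
                $begin();
            }
        }
        // Commit whatever is left in the final, possibly partial, batch.
        $commit();
        $commits++;
    } catch (Throwable $e) {
        // Discard the uncommitted batch and let the caller see the error.
        $rollback();
        throw $e;
    }
    return $commits;
}
```

Passing the transaction steps in as callables keeps the loop logic testable on its own; in a real import they would wrap the corresponding $database and $page calls.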
  2. TBH, neither have I*. However, I have had to migrate pages, sometimes with images. For example, pages that are used to hold site settings. Also, I intended that the module might be used in 'rescue' mode as explained in the original post, which might involve migrating 'content'. Since the module does allow migrating pages, prompted by @adrian, I thought I would try and include RTE fields if I could. *Correction - I meant RTE fields with images. Even the migration pages themselves have an RTE field, but I hadn't expected to put images in it, although that is possible.
    3 points
  3. Good point @horst about using getallheaders() to ensure the fingerprint matches. I have combined that with @Robin S's code and it looks like we have a winner. Your combined brilliance seems to have got us through. I'll test a bit more tomorrow, including on a production server with https, but hopefully we'll be good. Here is the version I am using:

     // make URLs in links panel root relative and get titles if not supplied
     $existingConfig = $this->wire('modules')->getModuleConfigData($this);
     $existingLinks = isset($existingConfig['linksCode']) ? $existingConfig['linksCode'] : '';
     $savedLinks = isset($data['linksCode']) ? $data['linksCode'] : '';
     if($savedLinks !== $existingLinks) {
         $this->addHookAfter('ProcessWire::finished', null, function($event) {
             // Make URLs in links panel root relative and get titles if not supplied
             $tracyConfig = $this->wire('modules')->getModuleConfigData($this);
             // Close existing session to avoid session blocking
             $allHeaders = getallheaders();
             session_write_close();
             $allLinks = array();
             $http = new WireHttp();
             foreach($allHeaders as $header => $value) {
                 if('Host' == $header) continue;
                 $http->setHeader($header, $value);
             }
             foreach(explode("\n", $tracyConfig['linksCode']) as $link) {
                 $link_parts = explode('|', $link);
                 $url = trim($link_parts[0]);
                 $title = isset($link_parts[1]) ? trim($link_parts[1]) : '';
                 $url = str_replace($this->wire('config')->urls->httpRoot, '/', $url);
                 if($title == '') {
                     $fullUrl = strpos($url, 'http') === false ? $this->wire('config')->urls->httpRoot . $url : $url;
                     $html = $http->get($fullUrl);
                     libxml_use_internal_errors(true);
                     $dom = new \DOMDocument();
                     $dom->loadHTML($html);
                     $list = $dom->getElementsByTagName('title');
                     libxml_use_internal_errors(false);
                     $title = $list->length ? str_replace('|', ':', $list->item(0)->textContent) : $url;
                 }
                 $finalLink = $url . ' | ' . $title;
                 $allLinks[] = $finalLink;
             }
             $tracyConfig['linksCode'] = implode("\n", $allLinks);
             // Calling saveModuleConfigData with underscores because we don't need hooks to run again
             $this->wire('modules')->___saveModuleConfigData($this, $tracyConfig);
         });
     }
    3 points
  4. Because it's the only reliable way to parse HTML properly and it is much easier to query and replace things - I just wish it didn't mess with the html when saving. Some people say to use saveXML() but that has other problems. The new line issue is easily fixed with a trim(). As for the self closing tags and the slash - I guess that doesn't bother me too much. This shouldn't be so hard.
    2 points
  5. I'm not totally following what you are doing, but session fingerprinting should not be an issue if you look at my code in the post above, which uses the PHP function getallheaders(). This allows a complete 1:1 copy of the headers sent from the browser to be passed on through WireHttp as well. It includes the same UA header, same cookies, same everything.

     array(12) {
       ["Host"] string(32) "pw-change-default-language.local"
       ["Connection"] string(10) "keep-alive"
       ["Upgrade-Insecure-Requests"] string(1) "1"
       ["User-Agent"] string(132) "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36 OPR/75.0.3969.149"
       ["Accept"] string(135) "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9"
       ["Sec-Fetch-Site"] string(4) "none"
       ["Sec-Fetch-Mode"] string(8) "navigate"
       ["Sec-Fetch-User"] string(2) "?1"
       ["Sec-Fetch-Dest"] string(8) "document"
       ["Accept-Encoding"] string(17) "gzip, deflate, br"
       ["Accept-Language"] string(35) "de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7"
       ["Cookie"] string(166) "wire=8jfqut56c2ag66iaquvb27jdpu; wire_challenge=72ie9BYKDjbw4h0%2FhBYu1wmngAyUoA3F; wires=5u0b7v49ecudnj39bmdpat4dje; wires_challenge=TmFvvRVgTxBQYsyta4tST6yQJjkzymPd"
     }
    2 points
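The header-copying idea discussed above (forward everything the browser sent except Host) can be isolated into a small helper. This is a minimal sketch: forwardableHeaders() is a hypothetical name, and matching header names case-insensitively is an assumption added here, whereas the posted code compares against 'Host' exactly.

```php
<?php
/**
 * Return the headers that are safe to copy onto an outgoing request.
 * Hypothetical helper: header names listed in $skip are dropped,
 * compared case-insensitively (an assumption, not the posted behaviour).
 */
function forwardableHeaders(array $headers, array $skip = ['Host']): array
{
    $skipLower = array_map('strtolower', $skip);
    $out = [];
    foreach ($headers as $name => $value) {
        if (in_array(strtolower((string) $name), $skipLower, true)) {
            continue; // e.g. Host must describe the target, not the original request
        }
        $out[$name] = $value;
    }
    return $out;
}
```

In the posted hook, the result of such a helper could be looped over and passed to $http->setHeader($name, $value) in place of the inline check.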
  6. Minimal Fieldset Adds a config option to Fieldset/FieldsetGroup/FieldsetPage to render the fieldset without label or padding in Page Edit. When a neighbouring field in the same row is taller than the fieldset the extra height is distributed evenly among rows within the fieldset. Requires ProcessWire v3 and AdminThemeUikit. Why? This module allows you to create layouts in Page Edit that would not be possible without it. It's useful when you want a layout that has two or more fields as rows that are themselves within a row in Page Edit. It's also useful when you have some fields that you want to add to a template as a group (i.e. via FieldsetGroup or FieldsetPage) but having a heading and visible wrapper for the fieldset in Page Edit would be redundant. Example: Installation Install the Minimal Fieldset module. Usage In the field settings for any Fieldset/FieldsetGroup/FieldsetPage, tick the "Remove label and padding for this fieldset" checkbox. https://github.com/Toutouwai/MinimalFieldset https://modules.processwire.com/modules/minimal-fieldset/
    1 point
  7. It's spring break here and my kids are going back to school next week after being out for more than a year. Since it's a break week, the weather is great, and it's also the last week of the year-long covid break from school, I've spent a little less time at the computer this week. I've focused on some smaller module projects rather than the core. More specifically: posted a major update and refactor of the TextformatterHannaCode module, and a completely rewritten TextformatterVideoEmbed module. While making these updates, I've also made note of and attempted to resolve any reported issues in the GitHub repositories. Next week, it's back to the core, with both issue resolutions and pull requests scheduled for upcoming versions. Next week I also get my 2nd shot of covid vaccine, and I'm told it may slow me down a bit for a day, but will be well worth it. I had a day of tiredness from the 1st shot, but it was greatly outweighed by feelings of gratitude and reduction of worry. I highly recommend it as soon as you can get it, if you haven't already.
    1 point
  8. @adrian, I have an idea for a different solution to the problem. I'll report back once I've done some work on it.
    1 point
  9. Thanks for pointing out those solutions: they really rock! For CKEditor fields they are the first solution we'll consider in future. At this point in our production code, page references didn't suit us well either (we would rather keep a simple textarea and HC). We will surely change our templates and code to use these powerful tools in future projects. For now I think the first idea, even if less powerful, is flexible enough to be adopted in many ways, for many ready-made sites, and with different solutions without much effort. E.g. for food-related sites we could use it as the list of allergens; for many others it could be used as a docs repo or a shortcut to a help desk, tutorials or more... pdf-docs.mp4
    1 point
  10. @fedeb That's the largest quantity of pages I've heard of anyone creating in ProcessWire, by a pretty large margin. So you are in somewhat uncharted territory. But that's really cool you are doing that. I would be curious how different the graph would be if you split it up into batches so that you aren't creating more than a certain quantity per execution/runtime. For instance, maybe you create 10k in one execution and another 10k in the next, etc., or something like that. Would the same slowdown still occur? If so, I would start to think it might be the database index and increased overhead in maintaining that index as the quantity increases. On the flip side, if restarting the process to create each set in batches solves the slowdown, then I would think it might be memory or resource related.

      A couple things you can do to potentially (?) improve your page creation time:

      1. At the top of your code (before the loop) put: $template = $templates->get('protein'); Then within the loop set: $page->template = $template;
      2. I don't see a parent page assignment. How are you doing that? Double check that you aren't asking PW to load the parent page every time in the loop and instead handle it like with the template in #1 above.
      3. What kind of fields are on your "protein" template? Depending on their type, there may be potential optimizations. Especially if any are Page references. Can you paste in a line or two from the CSV?
      4. If you can assign a $page->name = "protein" . $i; rather than having PW auto-generate a name from the title, that will save some resources too.
    1 point
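The suggestion above to create a fixed quantity of pages per execution implies that each run reads only its own slice of the CSV. A minimal sketch of that slicing follows; readCsvBatch(), its parameters and the comma delimiter are hypothetical choices for illustration, not anything from the original script.

```php
<?php
/**
 * Read at most $limit rows from a CSV file, starting at row $offset (0-based).
 * Hypothetical helper: each execution of an import script would call it with
 * the next offset (0, 10000, 20000, ...) and then exit.
 */
function readCsvBatch(string $file, int $offset, int $limit, string $delimiter = ','): array
{
    $handle = fopen($file, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $file");
    }
    $rows = [];
    $i = 0;
    while (count($rows) < $limit
        && ($row = fgetcsv($handle, 0, $delimiter, '"', '\\')) !== false) {
        if ($i++ < $offset) {
            continue; // skip rows handled by earlier executions
        }
        $rows[] = $row;
    }
    fclose($handle);
    return $rows;
}
```

Skipping rows by reading them is linear in the offset; for a file with millions of lines, remembering the byte position with ftell() and resuming with fseek() would avoid re-reading, at the cost of storing that position between runs.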
  11. Version 0.0.2 now on GitHub https://github.com/MetaTunes/ProcessDbMigrate This version more fully allows for different page ids in source and target systems. A meta value (idMap) maintains the mapping. This allows the replacement of links in RTE fields provided the relevant pages are all in the migration. Also, all existing image variants are migrated. EDIT: Now 0.0.3 fixes install problem and adds upgrade via modules -> refresh.
    1 point
  12. Yeah, I did that and then came across the /> issue. Trouble is my diff method will report any diffs in the source unless I exempt them. I don't like it if you can't predict what will be returned. I think I'll stick with preg_replace for now since the parsing is very limited and see if it works out OK.
    1 point
  13. There were other issues too, like '/>' vs '>' as the img tag end. Eventually I decided to ditch the DOMDocument and just use a simple preg_replace:

      protected function replaceImgSrc($html, $idMapArray) {
          if (strpos($html,'<img') === false) return $html; //return early if no images are embedded in html
          foreach ($idMapArray as $origId => $destId) {
              bd([$origId, $destId], 'Id pair');
              $re = '/(<img.*\/files\/)' . $origId . '(\/.*>)/m';
              $html = preg_replace($re, '${1}' . $destId . '$2', $html);
          }
          return $html;
      }

      Any reason @adrian why you went the DOMDocument route? I'll post an updated script to GitHub shortly, then maybe someone will find some holes in it!
    1 point
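One possible hole in a per-id preg_replace loop is chaining: if one pair's new id is also another pair's old id, an image can be rewritten twice. The sketch below is a variant, not the module's code. It uses preg_replace_callback so that each img tag is rewritten at most once, and [^>]* so that a match cannot run across several tags. The function name replaceImgFileIds() and the assumption that file ids are purely numeric are illustrative.

```php
<?php
/**
 * Rewrite the numeric /files/<id>/ segment inside <img> tags using $idMap
 * (oldId => newId). Hypothetical helper; ids missing from the map are left alone.
 */
function replaceImgFileIds(string $html, array $idMap): string
{
    if (strpos($html, '<img') === false) {
        return $html; // return early if no images are embedded
    }
    $result = preg_replace_callback(
        '/(<img\b[^>]*\/files\/)(\d+)(\/[^>]*>)/',
        function (array $m) use ($idMap): string {
            // Single pass: the looked-up id is never matched again.
            $id = $idMap[$m[2]] ?? $m[2];
            return $m[1] . $id . $m[3];
        },
        $html
    );
    return $result ?? $html; // preg_replace_callback returns null on error
}
```

For example, with the map [10 => 20, 20 => 30] an image under /files/10/ ends up under /files/20/, not /files/30/.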
  14. Oh @Robin S, you are right. Sadly the value of $_SERVER['REMOTE_ADDR'] changes between the (remote) browser and the (local) webserver that is used by WireHttp. And if $_SERVER['REMOTE_ADDR'] is part of the session fingerprinting (which it is by default) it only matches if the browser and the webserver are on the same machine.
    1 point
  15. Just tested now and unfortunately it's logging me out again. I tried moving things to __destruct and removing the session_write_close but that just resulted in the timeout issue.
    1 point
  16. @Robin S excellent, thank you. That's what I was looking for.
    1 point
  17. I did try your suggestion but the default for $config->sessionFingerprint is "10", which means the session fingerprint is going to include the IP from $_SERVER['REMOTE_ADDR']. When WireHttp requests the page this is going to be the server IP and so it's not going to match the IP in the user fingerprint and a logout will occur. Maybe it would be possible to spoof $_SERVER['HTTP_CLIENT_IP'] or $_SERVER['HTTP_X_FORWARDED_FOR'] but this would mean users have to change from the default session fingerprinting setting. @adrian, when you say you got it working did you test yet on a remote server?
    1 point
  18. Looking at the recent discussion in this thread I wonder if the distinction between your module and mine is not about having a UI or not but is more about where and how you want to use it. I have never ever had the need for "migrating" content of an RTE field to another site for example. And I have never ever had the problem of changing ids in that process. Why? Because my module relates to everything BUT content. Content should be part of the site and not part of the migration. That means I can develop a system that I can set up multiple instances of (for example a local dev and a live production system, or as another example this could also be one setup for sports clubs that is used by multiple sports clubs running the same system and getting the same updates - but keeping their content). Your module seems to be targeted at another world. Migrating content (and maybe also necessary config fields/templates) from one site to another? Am I right or did I get a wrong impression here?
    1 point
  19. @adrian, your suggestions have been invaluable! I think I have it working OK using ids - basically the 'new' pages all store a meta value for the related old page id so that mapping is possible (of course all pages with the source images must be included in the migration). That means that I only have to do one 'translation' - in the target system, replacing the old id's with the new ones. I used the code in your nameImagePathId() method for this - amended as required:

      protected function replaceImgSrc($page, $field, $idMapArray) {
          $files = $this->wire()->config->urls->files;
          $html = $page->$field;
          if (strpos($html,'<img') === false) return $html; //return early if no images are embedded in html
          $dom = new DOMDocument();
          @$dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
          foreach ($dom->getElementsByTagName('img') as $img) {
              $src = $img->getAttribute('src');
              bd($src, 'Image src for ' . $page);
              $origId = basename(dirname($src));
              $destId = (isset($idMapArray[$origId])) ? $idMapArray[$origId] : $origId;
              $img->setAttribute('src', $files . $destId . '/' . basename($img->getAttribute('src')));
              bd($img->getAttribute('src'), 'reset img src');
          }
          return preg_replace('/^<!DOCTYPE.+?>/', '', str_replace(
              array('<html>', '</html>', '<body>', '</body>'),
              array('', '', '', ''),
              $dom->saveHTML()
          ));
      }

      $idMapArray is just an array of oldId => newId pairs. The only slight problem is that this introduces line breaks ( \n ) at the start and end of the html and I can't see why.
    1 point
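The path handling at the heart of the method above (take the page id from the directory part of the src, look it up, rebuild the URL) can be checked on its own. A minimal sketch, where remapFileSrc() and the '/site/assets/files/' base URL are assumptions used only for illustration:

```php
<?php
/**
 * Rebuild an image URL under the files directory of the mapped page id.
 * Hypothetical helper: $idMap is an array of oldId => newId pairs and
 * $filesUrl is the files base URL with a trailing slash.
 */
function remapFileSrc(string $src, array $idMap, string $filesUrl): string
{
    // For "/site/assets/files/1234/photo.jpg" the parent directory name is "1234".
    $origId = basename(dirname($src));
    // Fall back to the original id when the page is not part of the mapping.
    $destId = $idMap[$origId] ?? $origId;
    return $filesUrl . $destId . '/' . basename($src);
}
```

Keeping this step separate from the DOM handling makes it easy to test the id mapping without involving DOMDocument and its output quirks.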
  20. Arrgh, I think it's due to session fingerprinting - the WireHttp request is failing Session::isValidSession(). Will try some more experimentation a bit later.
    1 point
  21. Hmmm, I was testing locally on Windows but I just tested on my Linux remote hosting and it worked there too. PW itself will call session_write_close() at shutdown so I don't think that in itself should cause a logout. Perhaps you have other hooks after ProcessWire::finished that have the effect of starting a new session? Maybe try setting a high number for the hook priority - the idea being that this hook to get the links should be the last thing that runs before shutdown. I just realised that I haven't yet dealt with the HTTPS situation - I've only been testing on HTTP so far. Will report back if/when I can make some progress with HTTPS.
    1 point
  22. Thanks for looking into improving the autodesc - much appreciated. Interesting - I haven't changed the default MySQL settings, so not sure what has changed there. The reason I am using it is that if I change to %= then a search for "stream banks" won't find instances of "streambanks" which I think is pretty important - sometimes it's hard to know whether these sorts of things are one word or two. Fair enough - I am not too worried about this at all - it's just what I thought made sense when I built that example 18 years ago, which BTW was still in production until a couple of weeks ago.
    1 point
  23. Hello @bernhard, playing around a bit with your example I came up with: https://codepen.io/3fingers/pen/dyNdvyx Note I uncommented line 63 in my example, which looks like the culprit for the blank result, so there is something related to time values I haven't figured out (even though I've tried different formats).
    1 point
  24. @markus_blue_tomato Great, glad to hear it's working well! @StanLindsey This would be very simple to add, I'll plan to add it this week. Question: would just an array of DB hosts be adequate, or would it need separate configuration (host plus db name, user, pass, port, etc.) for each of the readonly db hosts?
    1 point
  25. I made some progress! The problem is session blocking: a session is already started and not yet closed when Tracy (or any code in a template file, like I was using for testing) attempts to get the response for the link using the same session cookie. So we need to close the existing session before using WireHttp. It wouldn't be good to close the session during the Tracy module config save because it's probably still needed at that point, so I'm using a hook after ProcessWire::finished when it should be safe to close the existing session. I also made some other minor tweaks for efficiency: only processing the links if they have changed from the existing config data, and moving the instantiation of WireHttp outside the foreach loop. I replaced this code with the following:

      $existingConfig = $this->wire('modules')->getModuleConfigData($this);
      $existingLinks = isset($existingConfig['linksCode']) ? $existingConfig['linksCode'] : '';
      $savedLinks = isset($data['linksCode']) ? $data['linksCode'] : '';
      if($savedLinks !== $existingLinks) {
          $this->addHookAfter('ProcessWire::finished', null, function($event) {
              // Make URLs in links panel root relative and get titles if not supplied
              $tracyConfig = $this->wire('modules')->getModuleConfigData($this);
              // Close existing session to avoid session blocking
              session_write_close();
              $allLinks = array();
              $http = new WireHttp();
              $http->setHeader('Cookie', "wire={$this->wire('input')->cookie->wire}; wire_challenge={$this->wire('input')->cookie->wire_challenge}");
              foreach(explode("\n", $tracyConfig['linksCode']) as $link) {
                  $link_parts = explode('|', $link);
                  $url = trim($link_parts[0]);
                  $title = isset($link_parts[1]) ? trim($link_parts[1]) : '';
                  $url = str_replace($this->wire('config')->urls->httpRoot, '/', $url);
                  if($title == '') {
                      $fullUrl = strpos($url, 'http') === false ? $this->wire('config')->urls->httpRoot . $url : $url;
                      $html = $http->get($fullUrl);
                      libxml_use_internal_errors(true);
                      $dom = new \DOMDocument();
                      $dom->loadHTML($html);
                      $list = $dom->getElementsByTagName('title');
                      libxml_use_internal_errors(false);
                      $title = $list->length ? str_replace('|', ':', $list->item(0)->textContent) : $url;
                  }
                  $finalLink = $url . ' | ' . $title;
                  $allLinks[] = $finalLink;
              }
              $tracyConfig['linksCode'] = implode("\n", $allLinks);
              // Calling saveModuleConfigData with underscores because we don't need hooks to run again
              $this->wire('modules')->___saveModuleConfigData($this, $tracyConfig);
          });
      }
    1 point
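The "url | title" line format handled in the hook above can be exercised in isolation. This is a sketch under assumptions: parseLinkLine() and formatLinkLine() are hypothetical names, and splitting with a limit of 2 (so extra pipes stay in the title) differs slightly from the posted code, which splits on every pipe and uses only the first two parts.

```php
<?php
/**
 * Split one "url | title" line into its parts. The title is '' when absent.
 * Hypothetical helper mirroring the trimming done in the posted hook.
 */
function parseLinkLine(string $line): array
{
    $parts = explode('|', $line, 2);
    return [
        'url'   => trim($parts[0]),
        'title' => isset($parts[1]) ? trim($parts[1]) : '',
    ];
}

/**
 * Build a "url | title" line. Because '|' is the field separator, any pipe
 * inside the title is replaced with ':' as in the posted code.
 */
function formatLinkLine(string $url, string $title): string
{
    return $url . ' | ' . str_replace('|', ':', $title);
}
```

A line with an empty title is the case where the hook fetches the page and reads its title element before formatting the line again.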
  26. This module is sort of an upgrade to my earlier ImageToMarkdown module, and might be useful to anyone working with Markdown in ProcessWire. Copy Markdown Adds icons to images and files that allow you to copy a Markdown string to the clipboard. When you click the icon a message at the top left of the screen notifies you that the copying has occurred. Screencast Note: in the screencast an EasyMDE inputfield is used to preview the Markdown. It's not required to use EasyMDE - an ordinary textarea field could be used. Usage: Images When you hover on an item in an Images field an asterisk icon appears on the thumbnail. Click the icon to copy an image Markdown string to clipboard. If the "Description" field is populated it is used as the alt text. You can also open the "Variations" modal for an image and click the asterisk icon to copy an image Markdown string for an individual variation. Usage: Files When you hover on an item in a Files field an asterisk icon appears next to the filename. Click the icon to copy a link Markdown string to the clipboard. If the "Description" field is populated it is used as the link text, otherwise the filename is used. https://github.com/Toutouwai/CopyMarkdown https://processwire.com/modules/copy-markdown/
    1 point
  27. Well, in this case, it's Teppo who is amazing.
    1 point
  28. Ryan's ProCache module is the other obvious candidate to mention here.
    1 point
  29. I didn't mean to say which approach is better. I think RockMigrations is a good tool. SolidWire is not a replacement for any migration strategy or tool. It's just an idea to ease communication and prototyping between developers and other people. It's in the same area as https://plantuml.com/ - just a tool for getting a bird's-eye view of the system. The JSON output is just an idea similar to https://doc.mapeditor.org/en/stable/manual/introduction/ Any other proper migration tool could use it to generate a proper output, but that is not mandatory :).
    1 point