Leaderboard
Popular Content
Showing content with the highest reputation on 04/12/2022 in all areas
-
Back with more! Prepare for an incoming wall of text... I mentioned adding custom directives to our .htaccess file and wanted to share some more detail on that, as well as some other tips.

I was reviewing our 404s as a matter of maintenance, so to speak, to ensure that we had redirects in place where necessary. While reviewing them I found a lot (a lot) of hits that were bogus: clearly bots, and even web crawlers for engines we have no interest in being listed on. In just 48 hours we had 700 total 404s, and I imagine on some websites that number could be higher. By analyzing that log and writing custom directives I was able to take the 700 404s logged by ProcessWire down to 200 which are "legitimate", in that it's traffic that needs to be redirected to a proper destination page.

I'm sharing my additional directives here as an example. Again, ANY bot/security directives should be at the very top of your .htaccess file. As always, test test test, and modify for your use case.

```apache
# Declare this at the top of your .htaccess file and remove or comment out
# all other instances of this directive elsewhere
RewriteEngine On

# Block known bad URLs
# Directories including sub-directories
RedirectMatch 404 "\/(wp-includes|wp-admin|wp-content|wordpress|wp|xxxss|cms|ALFA_DATA|functionRouter|rss|feed|feeds|TKVNP|QXXLZ|data\/admin)"

# Top level directories only - There are no assets served from these directories
# in root, only from /site/assets & /site/templates
RedirectMatch 404 "^/(js|scripts|css|styles|img|images|e|video|media|shwtv|assets|files|123|tvshowbiz)\/"

# Explicit file matching
RedirectMatch 404 "(1index|s_e|s_ne|media-admin|xmlrpc|trafficbot|FileZilla|app-ads|beence|defau1t|legion|system_log|olux|doc)\.(php|xml|life|txt)$"

# Additional filetypes & extensions
RedirectMatch 404 "(\.bak|inc\.)"

# Additional User Agent blocking not present in 7G Firewall
<IfModule mod_rewrite.c>
  # Chinese crawlers that cause significant traffic to bad URLs
  RewriteCond %{HTTP_USER_AGENT} Mb2345Browser|LieBaoFast|zh-CN|MicroMessenger|zh_CN|Kinza|Datanyze|serpstatbot|spaziodati|OPPO\sA33|AspiegelBot|aspiegel|PetalBot [NC]
  RewriteRule .* - [F,L]
</IfModule>
```

Details on this additional config:

- It blocks some WP requests that get past 7G.
- My added directives respond with a 404, which tells the bot the URL flat out doesn't exist, rather than a 403 Forbidden, which could indicate that it may exist. I read somewhere that a 404 is more likely to get cached as a URL not to be revisited (wish I could remember the source; it's not a major issue).
- Blocks a lot of very specific URLs/files we were seeing.
- Blocks Chinese search engine bots, because we don't operate in China. These amounted to a lot of traffic.
- Blocks common dev files like .bak and .inc.* which aren't protected by default. Obvs you want to eliminate .bak altogether in production, but this is an added safety fallback. I have not seen this cause any issues in the Admin. Also consider whether directives like these could cause problems in another language.
- Customize by reviewing your logs.

Additional measures

7G and the directives I created are a healthy amount of prevention against malicious traffic. Another resource I use is a Bad Bot gist that blocks numerous crawlers that add traffic to your site but may or may not generate 400-500 HTTP statuses. This expands on 7G's basic list.
Bad Bot recommendations:

- Comment out: SetEnvIfNoCase User-Agent "^AdsBot-Google.*" bad_bot. There's not really a good reason to block a specific Google bot.
- If you make curl requests to your server, then comment out: SetEnvIfNoCase User-Agent "^Curl.*" bad_bot. Reason: this will block all curl requests to your server, including those made by your own code. Be sure that you don't need curl available if leaving this active. It is included in the list to prevent some types of website scrapers. If you want to leave this active and still need to use curl, then consider changing your User Agent.
- Comment out: SetEnvIfNoCase User-Agent "^Mediapartners-Google.*" bad_bot. Again, it's not necessary to block Google's bots, and it might even be a bad idea for SEO or exposure (only they know, right?).

Testing

There's no such thing as too much testing. These directives are powerful and, while written well, may have edge cases (like 'null' mentioned previously). There's no replacement for manual testing; specifically, it would be a good idea to test any marketing UTMs or URLs with GET strings you may have out there, just in case.

For automated testing I use broken-link-checker, which can be called from the terminal or as a JS module. I prefer this method to using some random site-scanning service. It detects both 404s and 403s by scanning every link on your page and checking the response, which is useful for ensuring that your existing URLs have not been affected by your .htaccess directives.

broken-link-checker recommendations:

- Consider rate limiting your requests using the --requests flag to set the number of concurrent requests. If you don't, you could run into rate limits that your managed hosting company, CDN, or you (if you're like me) have built into your own server. This terminal app runs fast, so if you have a lot of links or pages those requests can stack up quickly.
- Consider using the -e flag, at least initially while testing your directives. This excludes external URLs, which will help your test complete faster and prevent false positives from broken external links (which you can handle separately).
- Consider using the -g flag, which switches the request method to GET, which is what browsers do.

Shortcut, just copy and paste my command:

```
blc https://www.yoursite.com -roegv --requests 5
```

If you have access to your Apache access log via a bash/terminal instance, then you may consider watching that file for new 404/403 entries for a little while. You can do this by navigating to the directory with your access log and executing the following command (switch out the name of your log as needed):

```
tail apache.access.log -f | grep "404 "
```

You may also consider checking for 403s by changing out the HTTP status in that command.

"This seems excessive"

I think this is good for every site, and once you get it dialed in to your needs it can be replicated on others. There's no downside to increasing the security and performance of your hosting server. Consider that any undesirable traffic you block frees up resources for good traffic, and of course reduces your attack surface. If you need to think about scalability then this becomes even more important. The company I work for is looking to expand into 2 additional regions and I'd prefer my server was ready for it! If you get into high-traffic circumstances then blocking this traffic may prevent you from needing to "throw money at the problem" by upgrading server specs when your server starts running slower.
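One last log-related footnote: if you'd rather tally your 404s than watch them scroll by with tail, a throwaway script can summarise which URLs are the worst offenders before you write directives for them. A minimal sketch, assuming a combined-format Apache log (the log filename is a placeholder):

```php
<?php
// Hypothetical helper: count 404'd request paths in an Apache access log
// so you can decide which bogus URLs to target with RedirectMatch rules.
$lines = file('apache.access.log') ?: [];
$counts = [];
foreach($lines as $line) {
    // combined log format: ... "GET /some/path HTTP/1.1" 404 ...
    if(preg_match('~"(?:GET|POST) (\S+) HTTP/[\d.]+" 404 ~', $line, $m)) {
        $counts[$m[1]] = ($counts[$m[1]] ?? 0) + 1;
    }
}
arsort($counts);
// print the 20 most frequent offenders
foreach(array_slice($counts, 0, 20, true) as $path => $n) {
    echo str_pad((string) $n, 6) . $path . PHP_EOL;
}
```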
Outside of that, it's just cool knowing that you have a deeper understanding of how this works, and knowing you've expanded your developer expertise further. This isn't meant to be an exhaustive guide, but I hope I've helped some people pick up some extra knowledge and saved everyone a few hours on Google looking this up. If I've missed anything or presented inaccurate/incomplete information, please let me know and I will update this comment to make it better.
-
Padloper 2 has been released

I will write a better post later. Need to know for now:

1. Get it from here.
2. We will be transitioning to a new website and/or shop in due course. For now, please get it from that old site.
3. For now, and to allow it to bow out gracefully, initial purchases will be powered by Padloper 1. All hail Padloper 1 :-).
4. Due to #2 and #3, and due to non-SCA compliance on that particular site, there might be Stripe issues. Apologies. Please contact me for help.
5. Support for Padloper 1 has ceased. Security fixes will continue. Things I have promised to look at will be looked at :-).
6. The support subscription period for beta testers' purchases commences today - 30 March 2022.

It is a special day... in more than one way...

Update 3 April 2022
Frequently Asked Questions

A few questions are coming up with respect to this release. For now, I'll answer them here, but I might start a new thread dedicated to FAQs.

Q: Will beta testers have to purchase a new licence for this release?
A: No. Your licence and download link are still valid. The only difference is that the countdown of your VIP support commences on 30 March 2022. Subscriptions and updates are valid for one year. Your download link was emailed to you when you purchased Padloper 2. Please contact me if you do not have a download link.

Q: I have shops built in Padloper 1 that I'd like to migrate to Padloper 2. I don't have the expertise and/or time to do the migration myself. Is there paid support for this?
A: Yes, paid support is available. We can migrate your Padloper 1 shop to Padloper 2 per your specifications. This includes both backend and frontend migration; you can purchase either or both. Please contact me to discuss. We will soon add this custom work information to our website.

Q: Is there a backend/admin demo of Padloper 2 that I can test pre-purchase?
A: Yes. We are currently putting the final touches to it. An announcement will be made here once it is ready.

Q: Which is now the official Padloper support forum?
A: This forum will eventually become the official Padloper support forum. Although it is a public forum, we will still be able to offer VIP support. The only difference is that downloads will not be available in the forums, since they are open. Support via email will still be available as well.

Q: I am getting a "Call to undefined function bcmul()" error after install. What does this mean?
A: Please see the minimum requirements for Padloper 2. You will need to install the PHP extension bcmath on your server. This is needed in order to round currencies accurately.
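For anyone hitting that bcmul() error, here is a quick way to check for the extension and see why bcmath matters for currency maths. This is a minimal sketch using only standard PHP functions; none of it is Padloper code:

```php
<?php
// Is the bcmath extension available?
if(!extension_loaded('bcmath')) {
    exit("bcmath is missing - install/enable the PHP bcmath extension\n");
}
// bcmath works on numeric strings with a fixed scale, so it avoids the
// binary float rounding errors that plague currency arithmetic:
var_dump(0.1 + 0.2 === 0.3);        // bool(false) with native floats
echo bcadd('0.1', '0.2', 2), "\n";  // "0.30"
echo bcmul('19.99', '3', 2), "\n";  // "59.97"
```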
-
The problem: how to refresh a page's frontend content in real time, without a page refresh, when the page has been edited in the backend.

This is an alternative approach to the one discussed in the topic below. That topic attempts to solve the problem using SSE. The solution in this topic instead relies on Local Storage and htmx. There is no ajax polling. Please note that auto-refresh here refers to a page refresh during the same session and on the same domain. It is not cross-site.

The solution can be summarised as follows:

1. Use a hook in ready.php to inject custom JavaScript into the ProcessWire backend admin, specifically in ProcessPageEdit. This can be a JavaScript file or an inline script. The purpose of this script is to update a given Local Storage key after a page has been edited. Instead of listening for the Save button click, the script just checks on load whether a certain hidden input, whose value matches the page currently being edited, is present. This hidden input is placed inside the ProcessPageEdit form using a hook as detailed below. The hook for this step can be: $this->addHookAfter('AdminTheme::getExtraMarkup', null, 'hookAddCustomScriptsToAdmin');
2. In ready.php, add a hook to listen to saveReady(), e.g. $this->addHookAfter('Pages::saveReady', null, 'hookNotifyFrontendOfBackendPageSave');. When the page is saved, the handler method hookNotifyFrontendOfBackendPageSave will check if there are changed fields. It will then write the names of the changed fields to a temporary session variable for the current page.
3. On page load/reload, a hook in ready.php monitors ProcessPageEdit, i.e. $this->addHookAfter('ProcessPageEdit::execute', null, 'hookAddMarkupToNotifyFrontendOfBackendPageSave');. The handler hookAddMarkupToNotifyFrontendOfBackendPageSave checks whether the session variable set in #2 is present, indicating that the page has had recent changes in the field names stored in that session variable. If so, the hook simply appends the hidden input from #1 to the ProcessPageEdit form (see the sketch after this list). The JavaScript from #1 will detect the presence of the hidden input and amend the Local Storage value. It cannot simply set the value to the ID of the page being edited, as that would not be construed as a change; instead, it appends a timestamp to the page ID and sets that as the value. This enables a listener in the frontend, described in the following steps, to detect the change.
4. In the frontend, we have a simple JavaScript file that listens for changes via the Local Storage 'storage' event.
5. The frontend also has a simple hidden input in the template file of the pages we want to refresh. This hidden input has htmx attributes. Its trigger is a custom event, 'autorefreshpage'; it listens for this event before firing a message to the server. The hidden input also has hx-swap=none, as nothing will be swapped into it. Instead, the server will send back the results of the changed fields as various markup with hx-swap-oob=true. That enables swapping markup into any parts of the frontend HTML in one go.
6. When the script in #4 detects that the given Local Storage value has changed, it first checks whether the page ID in that value matches the ID of the page currently being viewed in the frontend. If it does, it fires the custom event autorefreshpage. This makes htmx send a message to the server whose only value is the page ID it has been passed.
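Here is that sketch: an untested paraphrase of steps #2 and #3 in ready.php, not the author's actual code. The 'autorefresh' session namespace and the hidden input ID are illustrative names:

```php
<?php namespace ProcessWire;
// ready.php (sketch)

// Step #2: on saveReady, remember which fields changed on this page
$wire->addHookAfter('Pages::saveReady', function(HookEvent $event) {
    $page = $event->arguments(0);
    $changes = $page->getChanges();
    if(count($changes)) {
        $event->wire('session')->setFor('autorefresh', "page{$page->id}", $changes);
    }
});

// Step #3: on the next ProcessPageEdit load, emit the hidden input the
// injected admin JS looks for; the JS then writes "{id}-{timestamp}" to
// a Local Storage key, which frontend tabs can react to
$wire->addHookAfter('ProcessPageEdit::execute', function(HookEvent $event) {
    $page = $event->object->getPage();
    if(!$page || !$page->id) return;
    if($event->wire('session')->getFor('autorefresh', "page{$page->id}")) {
        $event->return .= "<input type='hidden' id='autorefresh' value='{$page->id}'>";
    }
});
```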
On the server, in home.php, we have code that listens for ajax requests. If it detects an ajax request, it checks whether the page ID that htmx sent matches an available session that was recently updated. This would be the temporary session detailed in #2 above. If a match is found, it calls a function in _func.php that sets variables for the fields of that page. These use partial templates (optional). The partial templates are for the different outputs of the page, e.g. images, headline, etc., and all their markup has hx-swap-oob='true' in the appropriate places. All this content is sent back as one output to htmx, which reads the hx-swap-oob attribute and finds the markup to replace in the DOM using the HTML IDs of the elements carrying hx-swap-oob. This approach gives the developer maximum control over what to send back to htmx and over the markup. In this step, we also delete the temporary session, ready for the next updates.

The result is as demonstrated below: a simple and effective DOM swap that places very little strain on the server. I'll clean up the code for this and upload it to GitHub. This approach will not work with Save + Exit and similar :-).
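And a matching sketch of the home.php listener, continuing the same assumptions as the hook sketch above. The 'page_id' parameter name and the headline fragment are illustrative; in practice you would render the partial templates here:

```php
<?php namespace ProcessWire;
// home.php (sketch): respond to the htmx request with hx-swap-oob fragments

if($config->ajax) {
    $id = (int) $input->post('page_id');
    $changes = $session->getFor('autorefresh', "page{$id}");
    if($id && $changes) {
        $p = $pages->get($id);
        $out = '';
        // one fragment per changed field; hx-swap-oob makes htmx swap each
        // fragment into the existing element with the matching HTML ID
        if(in_array('title', $changes)) {
            $out .= "<h1 id='headline' hx-swap-oob='true'>{$p->title}</h1>";
        }
        // ... fragments for other changed fields / partial templates ...
        $session->removeFor('autorefresh', "page{$id}"); // ready for next update
        echo $out;
        return;
    }
}
```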
-
@rick messaged me asking for advice about how to use dependent selects in a module config; that is, where the options of one select change depending on what is selected in another select. I thought others might also find this info useful, so I've put together a demonstration module: https://github.com/Toutouwai/DemoDependentSelects

The general idea is that the dependent (target) select needs to include all the possible options, and then the options are hidden or shown by JavaScript depending on what is selected in the source select. See the source code for comments.
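As a rough illustration of that idea in a module config (my own sketch, not code from the demo module; the option values and the data attribute name are made up):

```php
<?php namespace ProcessWire;
// e.g. inside getModuleConfigInputfields() - illustrative only

$source = $modules->get('InputfieldSelect');
$source->name = 'source_select';
$source->label = 'Source';
$source->addOptions(['fruit' => 'Fruit', 'vegetables' => 'Vegetables']);

$target = $modules->get('InputfieldSelect');
$target->name = 'target_select';
$target->label = 'Target';
// the target holds EVERY possible option; a data attribute records which
// source value each option belongs to, so JS can show/hide them on change
$target->addOption('apple', 'Apple', ['data-source' => 'fruit']);
$target->addOption('pear', 'Pear', ['data-source' => 'fruit']);
$target->addOption('carrot', 'Carrot', ['data-source' => 'vegetables']);

$inputfields->add($source);
$inputfields->add($target);
```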
-
GistPad
Manage your code snippets and developer notes using GitHub Gists and repositories.
https://marketplace.visualstudio.com/items?itemName=vsls-contrib.gistfs

Stumbled upon this today. I have watched the demo below by the creator. This is a really awesome tool! Other features include creating code documentation/notes, creating code playgrounds right within VSC, interoperating with CodePen, etc. Watch the video below, right to the end. The quality is not the best (it was a live stream) but it is worth it.
-
I am receiving questions with respect to the release of Padloper 2. I have added an FAQ section to the first post here. Please have a look. Thanks.
-
Ooh, that looks useful. I'm away from my desktop at the moment, but it looks like it might be handy to make my FieldtypeMeasurement module (even?) slicker, i.e. no need to save the first config field before choosing the dependent one.
-
Wow! Thank you! I was hoping for a suggestion of where to look, not a complete working example. That is the cool thing about you, @Robin S: you always go above and beyond anyone's request for assistance. Much respect!
-
Well, please try the module and see if that works first. Then try a custom approach and see if it breaks. The reason I created the modules is exactly to avoid hiccups from copy/pasting things from this thread or placing things in the wrong spot... so that everybody can give SSE a quick test with a single click (install the module) and proceed from there.
-
No, the issue is not the JS; the issue is that the data received is not properly formatted, so the client thinks the request failed and fires a new request, which is the same as if it were ajax polling. If the received data is properly formatted, the stream can be read and the connection stays open. This is my JS:

```js
const evtSource = new EventSource(url, { withCredentials: true });
evtSource.onmessage = function(event) {
  console.log(event);
  if(event.data === 'DONE') {
    evtSource.close();
    if(modal) modal.hide();
    if($button.data('reload')) RockGrid2.getGrid($button).reload();
  } else if($progress) {
    $progress.fadeIn();
    $progress.html(event.data);
  }
};
```
-
I've had the same problem: the script seemed to work, but it sent several requests just like ajax polling. But I've managed to get it working correctly, and the result is a nice stream of data and updating feedback for the user while bulk-editing data in RockGrid. There's no reload and no ajax polling.

The key is to echo a correctly formatted message. I've built a method for this in my module:

```php
/**
 * Send SSE message to client
 * @return void
 */
public function sse($msg) {
    // "data: ...\n\n" is the message framing the EventSource spec expects
    echo "data: $msg\n\n";
    // pad the output so it gets past server/proxy buffers and flushes immediately
    echo str_pad('', 8186)."\n";
    flush();
}
```

If you do that, you can do something like this:

```php
<?php
header("Cache-Control: no-cache");
header("Content-Type: text/event-stream");
$i = 0;
while(true) {
    $this->sse("value of i = $i");
    $i++;
    if($i > 30) return; // manual break after 30s for testing
    while(ob_get_level() > 0) ob_end_flush();
    if(connection_aborted()) break;
    sleep(1);
}
```

Not sure if the ob_get_level and ob_end_flush are necessary...