
Personal data security


Loges


Hi all,

 

I'm building my first PW site that will contain user personal info (it's a job matching site where people sign up to create a profile where some is public, some is used by admins in the back end).  Now obviously people are signing up and it's all transparent what info they're providing and what we're doing with it.  My concern is that this info is stored in fields in the database in clear text so if a MySQL database dump goes missing it's got a lot of personal information stored in it.  Frankly there's nothing too private (we don't collect DOB, financial data etc) but it is definitely personal data.  This is in Australia only so not as onerous as GDPR but the new privacy breach reporting/notification laws (and general good practice) mean I want to minimise any potential issue.

 

On other sites I've built (non PW) that handle personal data I do a basic encrypt/decrypt for those fields so the database dump is gibberish (not perfect as obviously with the PHP files it can be decrypted but the SQL dump by itself is safer).

 

I figure I could create a new fieldtype/inputfield in PW where I do a similar thing, but then that of course means any $pages->find() requests on those fields won't work.

 

Has anyone dealt with a similar issue and (hopefully) come up with an elegant solution?

 

Thanks 


Hi there. Not a complete answer, but first of all I'd suggest taking a closer look at this post, which has a link to an article hopefully resolving some worries regarding encryption of data:

Other than what you've described above I don't have a rock-solid solution for you. I would assume that, as long as you're taking care of other parts of the regulation – such as breach reporting, removal of data on request, and so on – and you encrypt all traffic carrying personal data (HTTPS), you should be safe.

But then again: IANAL, so please don't take my word for it. I'm just assuming some common sense, really. With current technology it would be next to impossible to encrypt all personal data while still collecting all the necessary data in the first place. Think of things like server log files etc. :)


@Loges: Somewhere in the middle of my (unfortunately growing) to-do list is a field encryption module (or, more precisely, a whole set of them for different scenarios). The bad news is that it's been there for a while and has regularly been overtaken by reality (read: crypto API changes, libsodium, which is badly under-documented, broken algorithms, etc.), so I've been hesitant to roll out something that might not be future-proof.

You should be able to implement something quickly though if you don't aim for a generic, fool-proof-in-any-environment solution. With PW's hooks api, you could even add encryption to existing field types like FieldtypeText and its descendants. Here's a short rundown of my thoughts that might get you started with your own module:

  • Use symmetric encryption, store your key in a property in site/config.php, and make sure to avoid insecure modes such as ECB (e.g. AES256 in ECB mode)
  • You need at a minimum the following hooks:
    • FieldtypeText::sleepValue where you encrypt the field values for storage in the DB. Be aware that the value passed to that might either be a string or an array of strings (in multi language sites), each of which you need to encrypt.
    • FieldtypeText::wakeupValue where you decrypt the db values for use inside PW. You get either a string value or a LanguagesPageFieldValue object (multi language sites) for which you need to decrypt the value for every language (use getLanguageValue/setLanguageValue).
    • FieldtypeText::getConfigInputfields to add a property (InputfieldCheckbox) that determines whether to use encryption (and is checked inside sleepValue/wakeupValue) in the field's configuration
  • Use a truly random initialization vector (IV) for encryption, which is, depending on PHP version and configuration, sometimes harder than it sounds.
  • Of course, back up your encryption/decryption key really well :-)
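A minimal PHP sketch of the encryption part of the list above, assuming the OpenSSL extension, AES-256 in CBC mode and a 32-byte key. The function names (encryptFieldValue, decryptFieldValue, encryptSleepValue) are hypothetical, and the ProcessWire hook wiring is left out so the snippet runs on its own.

```php
<?php
// Sketch only: AES-256 in CBC mode (never ECB) with a fresh random IV per value.
// The IV is stored in front of the ciphertext and the pair is base64-encoded so it
// fits in a text column. $key must be 32 random bytes, e.g. loaded from a
// hypothetical $config->fieldEncryptionKey property in site/config.php.
// CBC on its own does not detect tampering; no MAC is added here.

const FIELD_CIPHER = 'aes-256-cbc';

function encryptFieldValue(string $plain, string $key): string
{
    $iv = random_bytes(openssl_cipher_iv_length(FIELD_CIPHER)); // new IV every time
    $cipher = openssl_encrypt($plain, FIELD_CIPHER, $key, OPENSSL_RAW_DATA, $iv);
    if ($cipher === false) {
        throw new RuntimeException('Encryption failed');
    }
    return base64_encode($iv . $cipher);
}

function decryptFieldValue(string $stored, string $key): string
{
    $ivLength = openssl_cipher_iv_length(FIELD_CIPHER);
    $raw = base64_decode($stored, true);
    if ($raw === false || strlen($raw) <= $ivLength) {
        throw new RuntimeException('Malformed encrypted value');
    }
    $iv = substr($raw, 0, $ivLength);
    $plain = openssl_decrypt(substr($raw, $ivLength), FIELD_CIPHER, $key, OPENSSL_RAW_DATA, $iv);
    if ($plain === false) {
        throw new RuntimeException('Decryption failed');
    }
    return $plain;
}

// What a sleepValue hook would apply: a plain string, or (multi-language sites)
// an array of strings, one per language, with its keys preserved.
function encryptSleepValue($value, string $key)
{
    if (is_array($value)) {
        return array_map(function ($v) use ($key) {
            return encryptFieldValue((string) $v, $key);
        }, $value);
    }
    return encryptFieldValue((string) $value, $key);
}
```

The helpers are what sleepValue/wakeupValue hooks would call; registering them as hooks on FieldtypeText is not shown.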

Thanks @BitPoet

 

I'll have a crack at a basic version that really just makes it a bit harder to get into. I'm really just looking to make it hard enough that it's not worth an opportunist's hassle to crack. I wouldn't be comfortable with (or capable of) more than that, and given the low-level personal data, that's all it'll need anyway.

 

It sounds like you live and breathe this stuff, whereas it's not an area I've spent much time in.  Do you have a recommendation for settings ie I take on board not using AES256 with ECB - are there standard settings that you'd recommend?  I'm assuming we're talking about the openssl encryption module (maybe showing my ignorance here)?

 

Given my low level requirements what's my best bang for buck IV generator?

 

Thanks


6 hours ago, BitPoet said:

Here's a short rundown of my thoughts that might get you started with your own module:

Thanks for the post! How would getMatchQuery() work if data is encrypted, or is that the compromise, i.e. you lose search functionality?


4 hours ago, Loges said:

Do you have a recommendation for settings ie I take on board not using AES256 with ECB - are there standard settings that you'd recommend?  I'm assuming we're talking about the openssl encryption module (maybe showing my ignorance here)?

 

Given my low level requirements what's my best bang for buck IV generator?

OpenSSL is just one possibility. I've also used phpseclib and am just in the process of figuring out libsodium, but all should be viable tools here. I guess standard AES256 in CBC mode should be sufficiently secure and is reasonably well documented, and openssl_random_pseudo_bytes should be safe enough for the IV in an up-to-date setup.
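For the IV question, a small sketch using openssl_random_pseudo_bytes() as suggested above. The makeIv() name is hypothetical; the main point is checking the second, by-reference argument, which reports whether OpenSSL used a cryptographically strong source.

```php
<?php
// Sketch: generate an IV for AES-256-CBC and refuse to continue if OpenSSL
// reports that it could not use a cryptographically strong random source.
function makeIv(string $cipher = 'aes-256-cbc'): string
{
    $length = openssl_cipher_iv_length($cipher); // 16 bytes for AES in CBC mode
    $strong = false;
    $iv = openssl_random_pseudo_bytes($length, $strong);
    if ($iv === false || $strong !== true) {
        throw new RuntimeException('No cryptographically strong random source');
    }
    return $iv;
}

// On PHP 7+, random_bytes($length) is a simpler alternative: it uses the OS
// CSPRNG and throws an exception rather than returning weak bytes.
```

Whichever generator you use, the IV is not secret; it just has to be unpredictable and fresh for each encrypted value, and it is usually stored next to the ciphertext.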

3 hours ago, kongondo said:

How would getMatchQuery() work if data is encrypted, or is that the compromise, i.e. you lose search functionality?

Yes, you can either encrypt or search. I guess offloading en-/decryption to the database server would be possible with specialized field types, but searches would be absolute performance killers (all such fields would have to be decrypted in-memory, then flat table scans would have to be performed).
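A quick way to see why getMatchQuery() can't help here, assuming AES-256-CBC with a random IV as recommended: the same plaintext encrypts to a different string every time, so there is nothing stable for an SQL comparison to match. encryptWithRandomIv() is a hypothetical helper.

```php
<?php
// Sketch: with a random IV, the same plaintext never encrypts to the same string,
// so an exact-match SQL comparison against the stored value cannot succeed.

function encryptWithRandomIv(string $plain, string $key): string
{
    $iv = random_bytes(openssl_cipher_iv_length('aes-256-cbc'));
    $cipher = openssl_encrypt($plain, 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);
    return base64_encode($iv . $cipher);
}

$key = random_bytes(32);
$stored = encryptWithRandomIv('Sydney', $key); // what sits in the field's table
$probe = encryptWithRandomIv('Sydney', $key);  // the search term, encrypted again

var_dump($stored === $probe); // bool(false): a "city=..." lookup would miss the row
```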


Yes I envisage the fields to encrypt would not be searchable.  Eg I wouldn't encrypt City, Gender, Broad Age Range, but I WOULD encrypt street address, date of birth.  So the broad demographic data is searchable (and not overly sensitive personal data), whereas the specific personal data is encrypted (and not searchable - and not needed to be).

 

Would another option be to just basically hash the values (with a key to unhash them), so the search would look to match the hash string rather than decrypting everything? That would only work for exact string matches I suppose, and would need some intervention in the find query, e.g. $pages->find("template=person, city=" . hashStr('London')) where the hashStr() function converts to the hashed value?

 

That would still make the MySQL dump/tables unreadable to the average or low level person whilst retaining some level of searchability?
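One caveat and a sketch: a hash can't be "unhashed", but the idea works if the hash is stored alongside the encrypted value rather than instead of it. A common form is a keyed hash (HMAC) used only for exact-match lookups, sometimes called a blind index. The two-argument hashStr(), the city_hash field and the separate index key below are assumptions for illustration. Equal values still produce equal hashes, so a dump would reveal which records share a city even though it doesn't reveal the city itself.

```php
<?php
// Sketch of a keyed-hash ("blind index") variant of the hashStr() idea above.
// This hashStr() is a hypothetical helper that takes the key explicitly.
// HMAC-SHA256 cannot be reversed, so the readable value still lives in an
// encrypted field; the hash only supports exact-match lookups.

function hashStr(string $value, string $indexKey): string
{
    // Normalise so "London" and " london " hash identically
    // (strtolower() only reliably lowercases ASCII).
    return hash_hmac('sha256', strtolower(trim($value)), $indexKey);
}

// Use a separate key from the one used for encryption.
$indexKey = random_bytes(32);

// On save: store hashStr($city, $indexKey) in a hypothetical city_hash field.
$storedHash = hashStr('London', $indexKey);

// On search: hash the term the same way, e.g. in a selector such as
// "template=person, city_hash=" . hashStr($term, $indexKey)
var_dump(hash_equals($storedHash, hashStr(' london ', $indexKey))); // bool(true)
```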


Disclaimer: Complete noob in security here.

So, at the risk of sounding silly, would it make sense to keep the key in a different server?

I mean, if the site's server is compromised, the key would be visible in the code. So, I'm thinking the key could be stored in a different server that's "completely airtight", and the only thing it does is listen to a key request from the main site's server, checks the IP and lends it the key. So any site scripts that needed to handle an encrypted field would have to make that request first.

Does this make sense? Or would a breach where someone can access the DB + PHP files be so far gone that they'd also easily make the server request and expose the key?


Yes that's what I do for the most sensitive data (disclaimer also a relative noob).  For those sensitive fields there's nothing readable on the PW server and it calls a remote server to decrypt - basically exactly as you've outlined - checks the IP address of the incoming request then does the decryption and returns as required.
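A sketch of the IP check the remote server could run before decrypting anything. The allowlisted address is a placeholder from the documentation range (203.0.113.0/24), and isAllowedClient() is a hypothetical name. An IP check on its own assumes the source address can be trusted, so treat it as one layer rather than the whole gate.

```php
<?php
// Sketch of an allowlist check for the remote decryption endpoint.
// 203.0.113.10 is a placeholder (documentation range), standing in for the PW server.

const ALLOWED_CLIENTS = ['203.0.113.10'];

function isAllowedClient(string $remoteAddr, array $allowed = ALLOWED_CLIENTS): bool
{
    // Reject anything that isn't a syntactically valid IPv4/IPv6 address first.
    if (filter_var($remoteAddr, FILTER_VALIDATE_IP) === false) {
        return false;
    }
    // Strict comparison against the exact allowlisted addresses.
    return in_array($remoteAddr, $allowed, true);
}

// In the (hypothetical) endpoint script, before any decryption happens:
// if (!isAllowedClient($_SERVER['REMOTE_ADDR'] ?? '')) {
//     http_response_code(403);
//     exit;
// }
```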

 

It just means that any MySQL dump from either server in isolation is useless, and even if the MySQL dump and the PHP files from the main server are available, it's one more step away. (The remote server's MySQL database only contains a PW ID key and the encrypted data, so nothing identifiable really.) So it's not perfect, but it would require someone gaining full access to the server in situ (ie on its current IP address) rather than just hacking a backup somewhere. Given the data is identifiable but relatively innocuous, I hope that's enough to get anyone malicious to give up and go after one of the millions of WordPress sites :-)

 

Backups are (maybe wrongly) really my biggest concern.  Goodness knows how many there are floating around with web hosts doing auto backups, sitting on my own hard drives, backups to Dropbox, sitting in PW database backup folders on the server etc

 

Again, I'm not dealing with catastrophic data (eg credit cards) so I figure a level of hackery is OK.  For one site we store a person's bank account number so we can generate batch bank payments to them (ie it's nothing that's not sent around on an invoice - we can pay into the account but not draw from it obviously) but I really didn't want hundreds of names, addresses and bank numbers sitting in a MySQL dump in clear text anywhere.


11 hours ago, Loges said:

Backups are (maybe wrongly) really my biggest concern.  Goodness knows how many there are floating around with web hosts doing auto backups, sitting on my own hard drives, backups to Dropbox, sitting in PW database backup folders on the server etc

I think that's one of the biggest challenges I face: it's practically impossible to remove data retrospectively.

As an illustration of the digital footprint user data can have, here's the kind of headache involved:

  1. server backs up a copy of a site. 
  2. gets uploaded to dropbox which stores multiple archived versions of backups
  3. my NAS syncs with my Dropbox account and downloads the same backup.
  4. NAS does both snapshots and incremental backups
  5. NAS also backs up the backups to Amazon S3 and/or Amazon Glacier
  6. My own Mac runs CrashPlan or Syncplicity or Time Machine

That's not even considering the backups my host is doing internally :-/

If a client's customer asked me to remove their data from the database, it'd be a real challenge to do so thoroughly.

I'll probably look at streamlining my backup strategy, but we lose a certain amount of redundancy in the process.

Having read the GDPR guidelines, particularly the part about anonymising user data, I'd love to see this become a priority for ProcessWire.

Maybe FormBuilder could get a way of collecting data that is only visible to logged-in users, so that a compromised database would reveal nothing.

Maybe, when creating a field called FirstName, there could be a new setting that anonymises the entries for anyone without the proper permissions.

 

 

 

 


Here's a short (or not so) proof-of-concept implementation that adds symmetric encryption to FieldtypeText and derived types. Supports multi language fields and uses either AES256 from phpseclib or sodium's sodium_crypto_secretbox if available on PHP >= 7.2. Key creation is done in the module's configuration settings. There's also a hookable loadKey method to retrieve the key from somewhere else (needs to return the base64 encoded key).
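For anyone not using the module, a rough sketch of the sodium path it describes: a base64-encoded key, a random nonce per value, and sodium_crypto_secretbox()/sodium_crypto_secretbox_open(). This is not the module's code; createKeyBase64(), sealValue() and openValue() are hypothetical names, and PHP >= 7.2 with the bundled sodium extension is assumed.

```php
<?php
// Sketch with libsodium's secretbox (bundled with PHP >= 7.2). Secretbox
// authenticates as well as encrypts, so a wrong key or a tampered value makes
// decryption fail instead of returning garbage. Function names are hypothetical.

function createKeyBase64(): string
{
    // Generated once; the base64 form is what gets stored or returned by a key loader.
    return base64_encode(sodium_crypto_secretbox_keygen());
}

function sealValue(string $plain, string $keyBase64): string
{
    $key = base64_decode($keyBase64, true);
    $nonce = random_bytes(SODIUM_CRYPTO_SECRETBOX_NONCEBYTES); // 24 bytes, fresh per value
    return base64_encode($nonce . sodium_crypto_secretbox($plain, $nonce, $key));
}

function openValue(string $stored, string $keyBase64): string
{
    $key = base64_decode($keyBase64, true);
    $raw = base64_decode($stored, true);
    $minLength = SODIUM_CRYPTO_SECRETBOX_NONCEBYTES + SODIUM_CRYPTO_SECRETBOX_MACBYTES;
    if ($raw === false || strlen($raw) < $minLength) {
        throw new RuntimeException('Malformed encrypted value');
    }
    $nonce = substr($raw, 0, SODIUM_CRYPTO_SECRETBOX_NONCEBYTES);
    $plain = sodium_crypto_secretbox_open(substr($raw, SODIUM_CRYPTO_SECRETBOX_NONCEBYTES), $nonce, $key);
    if ($plain === false) {
        throw new RuntimeException('Decryption failed: wrong key or tampered value');
    }
    return $plain;
}
```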


5 hours ago, BitPoet said:

Here's a short (or not so) proof-of-concept implementation that adds symmetric encryption to FieldtypeText and derived types.

This is brilliant! Definitely deserves its own thread in the modules sub-forum.

I only tested it briefly so far but I'm amazed at how fast it is - I thought the decryption would cause a noticeable delay.


On the topic of searchability of encrypted fields, as long as the number of encrypted pages you want to search across is not too huge you can load the pages to a PageArray and then search the PageArray. Using the auto-join tip mentioned recently here by @thetuningspoon...

$decrypted = $pages->find("template=some_template", ['loadOptions' => ['joinFields' => ['encrypted_field_name']]]);
$items = $decrypted->find("encrypted_field_name%=foo");
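Outside ProcessWire, the same idea boils down to decrypting each candidate once and filtering in PHP. This plain-PHP sketch is not the PageArray API; findContaining() and the $decrypt callback are hypothetical, and the case-insensitive substring match only approximates the %= operator. The loop touches every record, which is why it only suits modest page counts.

```php
<?php
/**
 * Sketch: decrypt every candidate value once, then filter in memory.
 *
 * @param array<int,string> $encryptedById page ID => stored ciphertext
 * @param callable          $decrypt       takes a ciphertext, returns plaintext
 * @param string            $needle        text to look for
 * @return int[]                           IDs whose decrypted value contains $needle
 */
function findContaining(array $encryptedById, callable $decrypt, string $needle): array
{
    $ids = [];
    foreach ($encryptedById as $id => $ciphertext) {
        // Case-insensitive "contains", similar in spirit to a %= selector.
        if (stripos($decrypt($ciphertext), $needle) !== false) {
            $ids[] = $id;
        }
    }
    return $ids;
}
```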

 


7 hours ago, Robin S said:

I only tested it briefly so far but I'm amazed at how fast it is - I thought the decryption would cause a noticeable delay.

Glad to hear that. I also toyed with asymmetric encryption using phpseclib's pure PHP implementation, and that really did a number on the CPU. Symmetric encryption (especially using native code via sodium or OpenSSL) thankfully behaves a bit more predictably in that regard.


It stays encrypted. Same when you set an encrypted field back to unencrypted. I'm still pondering how to tackle that properly (if at all), since decrypting every occurrence of a field in one go is likely to run into timeouts when the count gets high.
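One way around the timeout problem, sketched in plain PHP: convert existing values in small batches and return an offset so a later request or cron run can resume. convertInBatches() and its $fetchBatch/$saveValue callbacks are hypothetical stand-ins for whatever reads and writes the field's rows, and the sketch assumes $fetchBatch pages through a stable ordering (e.g. by page ID).

```php
<?php
// Sketch: convert existing field values in batches so no single request runs long.
// $fetchBatch(int $offset, int $limit) returns [pageId => storedValue, ...] from a stable
// ordering; $transform encrypts or decrypts one value; $saveValue(pageId, value) writes it.
// All three callbacks are hypothetical placeholders.

function convertInBatches(
    callable $fetchBatch,
    callable $transform,
    callable $saveValue,
    int $offset = 0,
    int $batchSize = 100,
    int $maxBatches = 10
): int {
    for ($i = 0; $i < $maxBatches; $i++) {
        $rows = $fetchBatch($offset, $batchSize);
        if (count($rows) === 0) {
            return -1; // nothing left: conversion finished
        }
        foreach ($rows as $pageId => $value) {
            $saveValue($pageId, $transform($value));
        }
        $offset += count($rows);
    }
    return $offset; // not finished: resume from this offset next time
}
```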


On 19.2.2018 at 1:18 PM, heldercervantes said:

I mean, if the site's server is compromised, the key would be visible in the code.

There are also plenty of ways to leak data without a full server compromise, such as publicly accessible SQL dumps (left somewhere in the webroot) or permission issues on shared hosting. So encryption still has value even if the secret key and the encrypted content are on the same machine.
