Data Governance for AI Data


Because data is the cornerstone of Artificial Intelligence, it is important to understand the ethical and legal ramifications of obtaining and using it. Governance is a term that has been applied to many areas of technology. It is about ensuring that processes follow the highest ethical standards while complying with both the letter and the spirit of the law. We can capture large amounts of data today. Most of it comes from customers’ devices and equipment, and the vast majority of internal data is usually customer data as well. The bigger problem that comes with storing vast amounts of data, whatever the source, is keeping it secure. As the holder of data, an organization is responsible for not disclosing any user’s data to third parties without that user’s consent. For sound theft prevention and regulatory compliance, data privacy cannot be an afterthought that is addressed only once information has been stolen. Data storage systems designed with security in mind can go a long way toward thwarting would-be attackers.

Data governance involves ensuring that data is used to further the goals of the organization while remaining compliant with local laws and ethical requirements. On the ethical side, data should not be obtained without the consent of the individuals featured in it. Additionally, the individuals featured, as well as the proprietors of the data, should all be aware of what the data will be used for. Data collected without the express consent of the user should not be used and should not have been collected in the first place. “Do Not Track” requests sent by users’ browsers should also be honored. Many browsers allow this option to be set, but honoring it is left to the integrity of individual websites.
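
As a rough illustration of honoring a “Do Not Track” request on the server side, the sketch below checks the DNT request header, which browsers send as "DNT: 1" when the option is enabled. The function names, the shape of the headers mapping, and the in-memory analytics log are assumptions made for this example and are not tied to any particular web framework. The absence of the header is not consent; it only means this particular signal was not sent.

```python
from typing import Mapping


def tracking_allowed(headers: Mapping[str, str]) -> bool:
    """Return False when the request carries a Do Not Track signal."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    # A browser with the option enabled sends the header "DNT: 1".
    return normalized.get("dnt") != "1"


def record_page_view(headers: Mapping[str, str], event: dict, analytics_log: list) -> bool:
    """Append the event to the (hypothetical) analytics log only if allowed."""
    if not tracking_allowed(headers):
        return False  # honor the request: store nothing about this visit
    analytics_log.append(event)
    return True


if __name__ == "__main__":
    log = []
    record_page_view({"DNT": "1"}, {"page": "/pricing"}, log)  # skipped
    record_page_view({}, {"page": "/pricing"}, log)            # recorded
    print(len(log))
```

Keeping the check in one small function makes it easier to apply the same rule everywhere that tracking data is written.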

While complete data governance should be the goal, it will likely take some time for the organization’s thinking to mature, so the initial focus should be on improving processes and avoiding repeated mistakes. A proactive approach is always better in data management, because abuse of data can have a huge impact. Companies have gone bankrupt in the wake of critical data breaches.

Sample Data Governance Policies

Listed below are some data policy measures that can act as a good starting point:

  • Data Collection Policies: Users should be made aware of what data is being collected and how long it will be stored. Dark patterns, meaning designs that imply consent rather than asking the user for it explicitly, should not be implemented. Data collection should be “opt-in” rather than “opt-out”; in other words, data collection should not be turned on by default, and no data should be collected without specific user approval. If data will be sent to third parties for processing or storage, the user should also be informed of this upfront. A sound data collection policy goes a long way toward establishing goodwill and customer satisfaction.
  • Encryption: Encrypting sensitive information such as credit card details has become standard industry practice, although other user data often remains unprotected. The impact of a data breach can be lessened if the data is encrypted. Encryption alone could be the difference between a bankrupting event and a mere PR issue. The keys and passwords used to encrypt the data must, of course, be protected as well. If the keys are exposed, they should be revoked, and passwords should be changed for good measure.
  • User Password Hashing: In some scenarios, such as storing user passwords, a technique called hashing can be applied instead of encryption. Hashing takes a user’s password and turns it into a fixed-length text string that is, for practical purposes, unique to that password. The process only works in one direction, meaning there is no practical way to compute the original password from the password hash. For example, the password password123 (which is a terrible password) could be converted by a hash function into a string such as a8b3423a93e0d248c849d. This hashed password is then stored in the database instead of the real password. The next time the user logs into the system, the provided password is hashed and compared with the stored hash. In this way, hackers would only be able to steal password hashes, which are of far less use to them than the original passwords, particularly when a slow, salted hash function is used so that common passwords cannot simply be guessed and checked in bulk. This extra layer of protection especially aids users who reuse the same password on multiple sites (also not a recommended practice), since once a hacker has a password, they can try the same email and password combination on other popular Internet sites, hoping to get lucky.
  • Access Control Systems: All data should be classified based on an assessment of factors like its importance to the user and the company, whether it contains personal data of users or company secrets, etc. For example, the data could be classified as “public,” “internal,” “restricted,” or “top secret.” Based on the classification assigned to the data, appropriate security measures should be established and followed. Access to data must be controlled, and only approved users should be granted access.
  • Anonymizing the Data: If the data needs to be sent to third parties, or even to other less secure internal groups, all potentially identifying information, such as names, addresses, telephone numbers, and IP addresses, should be scrubbed from it. If a unique number has been allotted to an individual, it should be replaced with a newly randomized one as well. No data should be shared with third parties without sufficient consent from the users who are featured in the data being shared.
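
The store-then-compare flow described under User Password Hashing above can be sketched with Python’s standard library. This is a minimal illustration and not a production design: the choice of PBKDF2-HMAC-SHA256, the iteration count, the 16-byte salt, and the "iterations$salt$digest" storage format are all assumptions made for the example, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Assumed work factor for this sketch; tune it for your own hardware.
DEFAULT_ITERATIONS = 600_000


def hash_password(password: str, iterations: int = DEFAULT_ITERATIONS) -> str:
    """Return a string that is safe to store in place of the password."""
    # A fresh random salt means two users with the same password get different hashes.
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Store the parameters next to the digest so the hash can be recomputed at login.
    return f"{iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Hash the supplied password the same way and compare with the stored value."""
    iterations_text, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256",
        password.encode("utf-8"),
        bytes.fromhex(salt_hex),
        int(iterations_text),
    )
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))


if __name__ == "__main__":
    stored = hash_password("password123")
    print(verify_password("password123", stored))
    print(verify_password("letmein", stored))
```

Only the stored string ever reaches the database; the original password is discarded as soon as it has been hashed.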
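
The Anonymizing the Data policy above could be approached along the following lines before records leave the organization. The field names, the "user_id" key, and the function name are hypothetical and chosen only for illustration. Note that removing direct identifiers does not by itself guarantee anonymity, since the remaining fields can sometimes be combined to re-identify a person.

```python
import secrets

# Assumed names of the directly identifying fields in this example's records.
IDENTIFYING_FIELDS = {"name", "address", "phone", "ip_address"}


def anonymize_records(records, id_field="user_id"):
    """Return copies of the records with identifying fields removed and IDs replaced."""
    pseudonyms = {}  # original ID -> random token; never shared with the recipient
    cleaned = []
    for record in records:
        # Drop every field that directly identifies the person.
        row = {key: value for key, value in record.items() if key not in IDENTIFYING_FIELDS}
        if id_field in row:
            original_id = row[id_field]
            # Reuse one random token per individual so their rows stay linked.
            if original_id not in pseudonyms:
                pseudonyms[original_id] = secrets.token_hex(8)
            row[id_field] = pseudonyms[original_id]
        cleaned.append(row)
    return cleaned


if __name__ == "__main__":
    sample = [
        {"user_id": 1001, "name": "A. Example", "ip_address": "192.0.2.7", "plan": "pro"},
        {"user_id": 1001, "name": "A. Example", "ip_address": "192.0.2.7", "plan": "free"},
    ]
    for row in anonymize_records(sample):
        print(row)
```

Because the mapping from original IDs to random tokens exists only inside the function call, each export receives a fresh set of tokens that cannot be matched against earlier exports.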

Creating a Data Governance Board

As we’ve seen, data governance is a critical part of any organization’s data strategy. To develop the initial data governance policies, a data governance board can be constituted. The board will develop the organization’s policies by looking at regulations and best practices from around the globe, such as the GDPR and HIPAA provisions. It should be made up of people who can drive these big decisions. The board may need to push through difficult decisions that are at odds with the aims of the organization in order to protect the rights of the people whose data is at risk.

In most cases, it is easier to start with an existing set of data governance rules and then adapt them to fit your organization. Your data governance board will help make key decisions where policies have not yet been established, setting precedents and then instituting new policies as the organization evolves and grows to handle more data. Such a process will help ensure that the costs of governing the data do not exceed the benefits derived from it.
