Where is my data?


Lately I have been playing around with Windows Azure, a cloud-based operating system that lets me build applications that “live” in the cloud provided by Microsoft.

In my opinion cloud computing has huge potential for companies to change their IT structure in a way that is more scalable, more reliable and cheaper. But there is one big question that keeps me thinking...

“Where is my data?”

Cloud-based services are highly distributed and accessible from multiple locations, by multiple users and, when needed, even by multiple third-party service providers. That is what they are designed for.

As a result, the data of your application can be located on one server or on many. That data can be hosted, cached and backed up at different locations, depending on usage patterns and on which cloud service you have chosen.

As Google's chief privacy counsel, Peter Fleischer, recently said, it is very hard to answer the apparently simple question:

“Where's my data?”

You cannot locate your data, but you can still access it. This raises a very interesting problem, especially when your application stores and uses not only your company’s production data but also that of your customers and your personnel.

First of all, you need to be aware that local laws may apply to your data stored on servers within the cloud. Considering the Patriot Act and other US legislation, you have to think about what it means for access to your data when that data is stored in the USA.

Second, you need to be aware that cloud computing, in order to benefit from the optimized use of infrastructure and resources, assumes that your data will be moved geographically. You will therefore hardly see any cloud computing offering that guarantees your data will never be transferred out of a specific geographical region.

For example, Principle 8 of the Data Protection Act in United Kingdom law states: “Personal data shall not be transferred to a country or territory outside the European Economic Area unless that country or territory ensures an adequate level of protection for the rights and freedoms of data subjects in relation to the processing of personal data.” The EEA includes all countries in the European Union, together with Iceland, Liechtenstein and Norway.

So when all your data is stored and processed within the EEA, there will be no issue. Equally, there will be no data protection issue if the service is provided only within the approved jurisdictions (i.e. Argentina, Guernsey, Isle of Man, Jersey and Switzerland, together with Canada and the USA in certain circumstances).

This sounds good in theory, but in reality you will have to deal with a situation where your data is stored and processed on any number of servers in any number of jurisdictions worldwide.
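To make the Principle 8 rule a bit more concrete, here is a minimal sketch in Python of how you could flag the storage locations that need a closer look. It is only an illustration: the names `Transfer`, `classify`, `flag_locations` and `SAMPLE_EEA` are my own inventions, the approved and conditional lists are simply copied from the paragraph above, and `SAMPLE_EEA` is a deliberately incomplete sample of EEA countries that you would have to replace with a complete, current list.

```python
from enum import Enum


class Transfer(Enum):
    ALLOWED = "allowed"            # inside the EEA or an approved jurisdiction
    CONDITIONAL = "conditional"    # allowed only in certain circumstances
    NEEDS_REVIEW = "needs review"  # no blanket approval, check adequacy yourself


# Lists taken from the post; treat them as assumptions that may be out of date.
APPROVED = frozenset({"Argentina", "Guernsey", "Isle of Man", "Jersey", "Switzerland"})
CONDITIONAL = frozenset({"Canada", "USA"})

# Incomplete sample of EEA countries, for illustration only.
SAMPLE_EEA = frozenset({"Netherlands", "Germany", "Iceland", "Liechtenstein", "Norway"})


def classify(country, eea):
    """Classify one storage location against the rule described above."""
    if country in eea or country in APPROVED:
        return Transfer.ALLOWED
    if country in CONDITIONAL:
        return Transfer.CONDITIONAL
    return Transfer.NEEDS_REVIEW


def flag_locations(countries, eea):
    """Return only the locations that are not plainly allowed."""
    flagged = {}
    for country in countries:
        status = classify(country, eea)
        if status is not Transfer.ALLOWED:
            flagged[country] = status
    return flagged
```

Calling `flag_locations(["Germany", "USA", "Singapore"], SAMPLE_EEA)` would leave Germany out and flag the other two. The catch is the input: the check is only as good as the list of locations you feed it, and that list is exactly what a cloud provider usually cannot give you.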

In the end you will have the responsibility to ensure that you comply with national and international law and that adequate protection is given to your data within the cloud. But it will be very difficult to do this when you do not have the answer to the apparently simple question:

“Where is my data?”

About the Author:

My name is Henrico Dolfing (@henricodolfing). Dutch by birth, I have lived in the Netherlands, Germany and the USA. Currently I am living in Zurich, Switzerland. I'm working as a consultant on Enterprise Collaboration and CRM solutions. henricodolfing.com is my personal blog where I talk about SharePoint, Enterprise Collaboration, Presentations and Visualization. In my spare time I read, run, SCUBA dive and sail. I try to learn new things every day, and this guest post at Jeff's blog is part of that process. Not to mention great fun on its own.

Comments

Saqib Ali said…
The beauty of the Cloud Computing paradigm is that the customer doesn't have to worry about the exact location of the data, as long as the provider can guarantee Confidentiality, Integrity and Availability. In fact I prefer it if the contract between the customer and the cloud computing provider doesn't include a clause about the exact location of the data. This gives the provider agility and nimbleness during disaster recovery.


Confidentiality needs to be addressed. But I don't think "home-grown" clouds are the way to go. There are other ways to address the issue of confidentiality.

One way would be to use the Host-Proof Hosting pattern. Though it requires a considerable amount of computing power on the client side, it can certainly be used for small amounts of data, e.g. Personally Identifiable Information.
Anonymous said…
There are some major differences between Google's cloud and Microsoft's cloud. Whereas the question "Where's my data?" is a tough question to answer with Google, you have the ability to "pin" your data to datacenters in specific areas in Microsoft's cloud. This would resolve most of the problems mentioned in your article.
Thanks for clearing that up, especially the part about the Data Protection Act. It's a tricky business to make your information accessible to whoever needs it (within your workforce, of course) and still keep it from falling into the wrong hands.