Google plans to host open access scientific data, according to a blog post at Wired Science.
I work with social science data sets, which are generally not as large as hard science data sets can be, but there are some similar issues.
First among them is getting the researchers who are collecting the data excited about sharing that data. The National Institutes of Health (NIH) mandate data sharing for research projects they fund, but sharing is often the last step of the research process, carried out when funds are running out or have been exhausted.
Even if the researcher is interested in sharing his or her data, it is not a simple process. Just putting raw data online doesn’t do any good without appropriate documentation, description, and access tools. In addition, social science data sets usually involve human participants, so any identifying information must be stripped from the data, or the data must be restricted to researchers who have signed usage agreements and put appropriate security measures in place to protect participants’ personal information. Google does not exactly have a spotless record when it comes to privacy protection.
Archiving and preservation, however, are where I’m really not sure I trust Google. As I mentioned, data dissemination often comes at a stage in the research when funding has run out, so free looks good. But will Google’s free service continue to exist if Google someday decides that it is not a good business investment? Or what happens if (gasp!) Google goes out of business or is sold to another company?
Libraries and institutional archives have a good track record on privacy and on long-term preservation. Google may provide increased open access, but I don’t think it can eliminate the need for solid, continually funded institutional data archiving.