Research Computing Environment at IQSS

HGSE students and faculty conducting research can apply for access to the Research Computing Environment (RCE), hosted by Harvard’s Institute for Quantitative Social Science (IQSS). The RCE is a remote data storage and analysis platform designed to support social science research. It provides free, persistent, remote access to a powerful cluster of servers running statistical software, so researchers can work on data analysis from wherever they like. Users are provided with a minimum of 50GB of storage space and have free access to statistical applications such as Stata, R, SAS, Matlab, Mathematica, Gauss, and Stat/Transfer.

  1. Getting an RCE Account
  2. Accessing the RCE
  3. Statistical applications available on the RCE
  4. Using the RCE - FAQs
  5. Other RCE Resources
  6. Read what other HGSE users of the RCE had to say!

I.  Getting an RCE Account

To gain access to the RCE, researchers begin the process by emailing a basic request message to support@help.hmdc.harvard.edu. Applicants are required to have all IRB paperwork completed before receiving access to the RCE. Researchers working with secure data should also note that the RCE has been approved for secure data up to Level 3. For more information about the Harvard Research Data Security Policy (HRDSP) or about research data security in general, please review the Research Security section of the Information Technology website, or contact HGSE’s Information Security Officer, Sarah Pruski, with questions or for guidance.

After you send an initial inquiry, you will be prompted to submit the following information:

1. Your Harvard affiliation and contact information for your account sponsor (e.g., your research project’s PI). If you are seeking access to the HMDC Research Computing Environment independently, provide contact information for the director of your research (e.g., your thesis advisor).

2. A very short description of your project or the nature of your work.

3. Which statistical programs (R, Stata, SAS, etc.) you are interested in using.

4. Anticipated total disk space needed.

5. Whether you require backups of your data.

6. Whether you are working with any confidential data (for example, identifiable human data). If so, documentation from the appropriate Harvard IRB classifying the data as Level 3 or lower is required.

Harvard Confidential Information includes any of the following:   

  • A person's name + state, federal or financial identifiers   
  • Business information specifically designated by the School as confidential 
  • Identifiable business information that puts individuals at risk if disclosed   
  • Research data containing private information about identifiable individuals 
  • Student records (such as collections of grades, correspondence)

7. Expected project duration.

8. If you plan to use cluster resources to run your jobs, expected job duration.

9. Number of concurrent jobs you expect to run.

10. Additional concerns you may have about the handling of your data or jobs.

II.  Accessing the RCE

Once access is granted, you will receive a link to set up a unique passcode. You will then receive a separate email with connection information:

·      Mac OS X and Linux users: You will be asked to download OpenNX, which can be found here: http://opennx.net/download.html

·      Windows users: You will be asked to download and install the NoMachine NX3 client from HMDC: https://downloads.hmdc.harvard.edu/nx/nxclient-3.5.0-9.exe

·      For more detailed connection information, including how to access the RCE from a variety of different computer settings once you have an account, see the IQSS guide:
http://projects.iq.harvard.edu/rce/book/accessing-rce-0

You can request assistance getting started with the RCE by contacting stathelp@gse.harvard.edu.

III.  Statistical applications available on the RCE:

IV.  Using the RCE - FAQs

Q:  How do I get my data into the RCE?

A:  This can be done in a variety of ways and largely depends on specific data use agreements. Some users upload the data via a direct file transfer from a hard disk or storage device, some bring it in from a cloud-based service such as Google Drive, and some use secure file transfer services.
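
Whichever transfer method your data use agreement permits, it is worth confirming that a file arrived intact. One common approach, shown here as a minimal Python sketch rather than an RCE-specific procedure, is to compute a SHA-256 checksum of the file on your own computer and again inside the RCE, then compare the two values. The function name is hypothetical, and the sketch assumes Python is available in both places.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so a large dataset does not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the digest computed before the transfer matches the one computed afterward, the two copies are byte-for-byte identical.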

Q:  How can I print or save results from my analysis?

A:  Again, this depends on the specific restrictions of the data use agreement. Students can copy results from models and email them to themselves to print from another computer. Some users manually transcribe model results into external software (e.g., MS Word or Excel) or onto paper.

Q:  Can I save new versions of my data as I clean and edit files? 

A:  Yes.  You just need to make sure to save files in the appropriate directory, which will likely be your shared folder. 

Q:  What happens if I exceed the data quota? 

A:  You will get temporarily locked out of your account and will need to contact the RCE helpdesk.
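
To reduce the risk of a lockout, it can help to check how much space a folder is using before saving large files. The following is a minimal Python sketch, not an official RCE tool. The function names are hypothetical, and the quota you pass in is your own assumption: the 50GB minimum mentioned in the overview may not match the limit that applies to a particular folder.

```python
import os


def directory_size_bytes(root):
    """Return the total size in bytes of all regular files under `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links so linked files are not counted here.
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; leave it out.
                continue
    return total


def usage_report(root, quota_bytes):
    """Return (bytes used, fraction of the assumed quota used) for `root`."""
    used = directory_size_bytes(root)
    return used, used / quota_bytes
```

For example, calling `usage_report(os.path.expanduser("~"), 50 * 1024**3)` reports usage of your home directory against an assumed 50GB quota. Substitute the folder and the quota figure that IQSS gave you.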

Q: Whom do I contact if I have questions?

A:  HGSE researchers can contact IQSS directly at support@help.hmdc.harvard.edu or check with our in-house Research Technology support staff by emailing stathelp@gse.harvard.edu.

V.  Other RCE Resources

IQSS already has a lot of useful information about the RCE, such as:

- A detailed guide about how to access the RCE from a variety of different computer settings, once you have an account:

http://projects.iq.harvard.edu/rce/book/accessing-rce-0

- A guide to working in the RCE, once you have an account:

http://projects.iq.harvard.edu/rce/book/rce-basics

- A guide for running batch jobs in the RCE, once you have an account:

http://projects.iq.harvard.edu/rce/book/batch-processing-basics

- A highly technical FAQ section:

http://projects.iq.harvard.edu/rce/faq

VI.  Read what other HGSE users of the RCE had to say!

The following comments from current and former HGSE researchers and students—collected between fall 2013 and fall 2015—provide some interesting insights into the opportunities and limitations of working with the RCE. 

  • I've been using the RCE for 5 years now, and I like it a lot. It has allowed me to keep several large datasets secure, while still accessing them from the locations (including several states) that I have been working. 
  • I used the RCE for about two years.  In general, I thought it was an easy way to gain access to a lot of statistical software without purchasing it or downloading a temporary license from FAS.  However, really large programs such as Matlab did not run well.  I’m not sure if that was a problem with the strength of my internet connection, available space on my hard drive, or something else about the RCE.  Also, if I increased the size of a dataset and overshot my allotted data storage space, I would be locked out of my account the next time I tried to log on and one of the support staff would have to help me compress data and get back into my account.  
  • Thanks to RCE I have been able to conduct part of my analysis while I collect data abroad. It is a little complicated to learn how to use it and get used to it. I have had some problems when uploading the data but have been able to figure it out eventually.
  • I've been using the RCE for the past 8 months or so. The only real issue I have encountered has been around saving data and output to a personal folder. The HMDC allocates very limited space to users for this, and so you need to be sure to save to a shared space. This wasn't immediately apparent to me when I began using the RCE and so I very quickly exceeded my quota and was locked out of the environment. I had to delete files using Linux code in the default (i.e. non-intuitive) Linux environment. After that I've made sure to always only save to the shared folder and I haven't encountered any problems.
  • I've used a bit for the past few months. I didn't have much trouble connecting (I use Linux and Windows), and while sometimes it’s a bit slow to load I've had good experiences with it. On the collaborative side of things I can see it working well if everyone is working with files largely within the RCE, though I can imagine there might be issues if people are constantly downloading/uploading the same files for work on their own machines. As most of my work on it has been its use as a repository for static files shared across a large number of users.
  • I’m a big fan of the RCE. Having access to a persistent connection that I can access to at work, at home, and on the road is vital to my work. There are a few Linux commands that people should know how to use, just to deal with the relatively few glitches that might arise. A quick tutorial about it would be helpful.
  • “If the RCE goes away, I’ll need to retire!”