Sorry, I meant to talk to you all about this at some point. Jeremy, Jim, and I bought into Discovery a few years ago. The process and pricing have changed a little over the years. It definitely depends on what you need, but I highly recommend buying into Discovery, and I recommend joining the DBIC group rather than doing something on your own, because pooling resources is highly advantageous.
The standard nodes are about $5k and typically have 16 cores with 8 GB of RAM per core (128 GB total). Research Computing is now continually buying them and negotiating package deals, which allows them to provision new resources to us immediately when we need them. The more of us that buy in on the same group, the more cores we can pool together, and we get a 4x multiplier on the cores we buy. I think we have at least 10 at the moment, but I can't remember exactly. We get them for about 5 years, until the warranty expires; then they are cycled out and we have to buy more, which is why I'm recommending that you not rush to buy a bunch right away, but add them as needed.

We also bought a high-RAM node (1.5 TB) with 24 cores called ndoli. These have actually gotten more expensive since we originally purchased it, so we decided to buy an extended warranty for a few years rather than a new node. The CPU nodes are accessed through the main scheduler, while ndoli is accessed by connecting to it directly, though we may revisit this in the future. Anyone who is part of the DBIC group, which is basically the entire department, has access to these resources. We are hoping to continue to augment this system with funds from the imaging center, individual PIs' startup funds, and Tor's funds. There are GPU nodes on Discovery that we can use, but to my knowledge no lab has bought their own on Discovery; people typically just build local workstations for this.
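To make the pooling math concrete, here's a back-of-envelope sketch in Python. The node specs and the 4x multiplier come from the description above; the example group size and the function itself are just illustrative, not actual DBIC accounting.

```python
# Back-of-envelope sketch of the group buy-in math described above.
# Specs (cost, cores, RAM, multiplier) are the approximate figures
# from this note; everything else is illustrative.
NODE_COST_USD = 5000     # approximate cost per standard node
CORES_PER_NODE = 16
RAM_PER_CORE_GB = 8
CORE_MULTIPLIER = 4      # schedulable cores per purchased core

def pooled_allocation(nodes_bought: int) -> dict:
    """Return the effective shared allocation for a group purchase."""
    cores = nodes_bought * CORES_PER_NODE
    return {
        "purchased_cores": cores,
        "schedulable_cores": cores * CORE_MULTIPLIER,
        "ram_gb": cores * RAM_PER_CORE_GB,
        "cost_usd": nodes_bought * NODE_COST_USD,
    }

# e.g. a group that has pooled 10 standard nodes
print(pooled_allocation(10))
# {'purchased_cores': 160, 'schedulable_cores': 640, 'ram_gb': 1280, 'cost_usd': 50000}
```

The point of the multiplier is visible in the numbers: 10 nodes bought together gives everyone in the group access to 640 schedulable cores, far more than any one lab would get buying alone.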
Storage is a little more complicated. There are several tiers of storage on DartFS; I can't remember the exact differences or prices, but it is roughly $100 per TB. There are slower and faster options, each with or without snapshots. We originally bought about 40 TB for the imaging center, but now we recommend that every lab just buy what they need; it was too hard to keep track, and some labs (like ours) were using a lot more than others. I think our lab probably has about 30-40 TB currently, and we recently switched to the faster storage with snapshots. I'm personally not a huge fan of DartFS: it is insanely slow, and it does not use standard Unix permissions. Instead it uses Access Control Lists (ACLs), which are way more complicated and need to be managed from a Windows computer. I highly recommend you request to have your storage use standard POSIX permissions, as the ACL setup is a nightmare and conflicts with a lot of software we use. It's a long story why storage ended up this way, but it will likely change in the future. The short answer is that they decided to make it possible to mount this storage directly on any computer, so you could mount your Discovery storage on your laptop and work with your data anywhere. Cool idea; in practice, however, it creates a host of problems, and I strongly recommend that you not do this, as it automatically converts your storage to ACL permissions rather than POSIX, and it is a huge pain to switch it back. None of this storage is currently backed up other than the RAID 5/6 they are using and the snapshots, so definitely make sure you have anything important (like data) backed up elsewhere. DBIC storage is currently not HIPAA compliant, so make sure you aren't storing anything that is PHI, though I understand this will change soon.
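For what it's worth, the standard POSIX setup I'm recommending is just the familiar owner/group/other mode bits, which you can manage yourself with `chmod`. A minimal Python sketch (using a throwaway temp directory as a stand-in, not an actual DartFS path):

```python
import os
import stat
import tempfile

# Stand-in for a shared lab directory (a temp dir, not a real DartFS path).
shared_dir = tempfile.mkdtemp()

# Typical shared-lab POSIX setup: read/write/execute for owner and group,
# nothing for others. On ACL-managed DartFS shares, chmod alone wouldn't
# control access -- it would be governed by NFSv4 ACL entries instead,
# which is exactly the complication I'm suggesting you avoid.
os.chmod(shared_dir, stat.S_IRWXU | stat.S_IRWXG)

mode = stat.S_IMODE(os.stat(shared_dir).st_mode)
print(oct(mode))  # 0o770
```

With POSIX permissions this is the whole story; group membership plus three sets of bits. With ACLs, every file can carry its own list of access-control entries, and most of the Unix tooling we use day to day doesn't understand or preserve them.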
Additional Computing Resources
In addition to everything on Discovery, our lab has purchased a few more things. We have a ton of laptops for students and RAs to work on and collect data with, a few Windows and Linux computers for various tasks, and a workstation for RAs. We have about 80 TB of direct-attached storage configured as RAID 5. We also have a webserver in the Moore server room, which hosts a number of web apps that we have developed, including http://neuro-learn.org/.
There are some legacy servers in the Moore server room that were used by other labs before many of us started migrating over to Discovery. I believe these are still active, and Andy Connolly is maintaining and managing them. There isn't much storage (maybe 30 TB), plus a few rack-mounted compute nodes from Microway (probably at least two, with about 64 cores), but they are very old and pretty slow (I think Dr. Zeus and Hydra). Research Computing at Dartmouth was historically not great, which is why many departments, including PBS, built and managed everything on their own; I think CS still does. However, about 5 years ago, they started to make some substantial investments, hiring a bunch of new people and growing the infrastructure. They've been trying to get departments like ours to move back to centralized computing. We've been very happy working with them and encourage all of the new labs to do the same rather than manage your own resources.