How to Get Data

Step 1: Install DataLad

RBC is accessible via DataLad. Follow the instructions here to get it installed.

Step 2: Pick a dataset to clone

The unprocessed BIDS MRI data, the processed functional derivatives and the processed structural derivatives each live in their own git repository.

The git repositories (you can find them all here) are consistently named such that:

If you’re looking for BIDS MRI, the repo will be named <study>_BIDS
If you’re looking for processed functional data, the repo will be named <study>_CPAC
If you’re looking for processed structural data, the repo will be named <study>_FreeSurfer

where <study> is replaced with HBN, NKI, PNC, BHRC or CCNP.
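
The naming rule above can be sketched as a small shell helper. The function name rbc_repo_url is hypothetical (it is not part of DataLad or RBC), and the URL pattern assumes the repositories live under the ReproBrainChart GitHub organization, as in the clone command in Step 3.

```shell
# Hypothetical helper: build the repository URL for a study and content type.
# Assumes the <study>_<content> naming rule described above.
rbc_repo_url() {
    study="$1"    # HBN, NKI, PNC, BHRC or CCNP
    content="$2"  # BIDS, CPAC or FreeSurfer
    case "$content" in
        BIDS|CPAC|FreeSurfer) ;;
        *) echo "unknown content type: $content" >&2; return 1 ;;
    esac
    printf 'https://github.com/ReproBrainChart/%s_%s.git\n' "$study" "$content"
}

rbc_repo_url PNC FreeSurfer
```

For PNC and FreeSurfer this prints https://github.com/ReproBrainChart/PNC_FreeSurfer.git, which is the URL used in the cloning example.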

Step 3: Clone the data and take a look

Cloning

Getting data onto your system will involve running a command like this:

$ datalad clone https://github.com/ReproBrainChart/<study>_<content>.git

Suppose I’d like to get some processed anatomical data from PNC. I would replace <study> with PNC and <content> with FreeSurfer. My command would be

$ datalad clone https://github.com/ReproBrainChart/PNC_FreeSurfer.git

You will see some warnings such as these:

[INFO   ] Unable to parse git config from origin
[INFO   ] Remote origin does not have git-annex installed; setting annex-ignore
[INFO   ] This could be a problem with the git-annex installation on the remote. Please make sure that git-annex-shell is available in PATH when you ssh into the remote. Once you have fixed the git-annex installation, run: git annex enableremote origin
install(ok): /home/cieslakm/data/PNC_FreeSurfer (dataset)

but you now have a DataLad dataset with everything you need to look at some data!

Getting data

That command probably finished a lot faster than you were expecting: PNC FreeSurfer has 1592 subjects in it! This is normal for DataLad, because cloning retrieves the list of files rather than their contents. You will see a PNC_FreeSurfer directory that you can browse to find the files you might want to copy to your workspace.

For example, let’s take a look at the data we might want from one of the PNC subjects:

$ cd PNC_FreeSurfer/freesurfer/sub-192413932
$ ls
sub-192413932_brainmeasures.json@  sub-192413932_fsaverage.tar.xz@
sub-192413932_brainmeasures.tsv@   sub-192413932_fsLR_den-164k.tar.xz@
sub-192413932_freesurfer.tar.xz@   sub-192413932_regionsurfacestats.tsv@
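
The trailing @ in the listing suggests these entries are symbolic links (as printed by ls -F or an equivalent alias). In a freshly cloned dataset such links typically point at content that has not been downloaded yet. Here is a minimal sketch for checking whether a file's content is present, assuming the dataset stores files as symlinks (the usual git-annex behaviour on Linux and macOS filesystems). The function name has_content is hypothetical.

```shell
# Hypothetical helper: succeed only if the file's content is on disk.
# [ -e ] follows symlinks, so a link whose target is missing fails the test.
has_content() {
    [ -e "$1" ]
}

# Report the state of every TSV file in the current directory.
for f in *.tsv; do
    if has_content "$f"; then
        echo "present: $f"
    else
        echo "not yet downloaded: $f"
    fi
done
```

Right after cloning, both TSV files would be expected to show up as not yet downloaded.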

We know from the data dictionary that the tabular data is available in the two TSV files, so let’s get a copy of them that we can open:

$ datalad get *.tsv
get(ok): freesurfer/sub-192413932/sub-192413932_regionsurfacestats.tsv (file) [from output-storage...]
get(ok): freesurfer/sub-192413932/sub-192413932_brainmeasures.tsv (file) [from output-storage...]
action summary:
  get (ok: 2)

Shell glob patterns can be used to get whichever files you might need.
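
As a sketch of how glob patterns extend beyond a single directory, the function below fetches the TSV files for several subjects from the dataset root. The function name get_subject_tsvs is hypothetical, and it assumes every subject has a freesurfer/<subject>/ directory laid out like the one shown above.

```shell
# Hypothetical helper: fetch every TSV file for the given subject IDs.
# Run it from the root of the cloned dataset (e.g. PNC_FreeSurfer).
get_subject_tsvs() {
    for sub in "$@"; do
        # The shell expands the glob before datalad sees the file names.
        datalad get "freesurfer/${sub}/"*.tsv
    done
}

# Usage (requires a cloned dataset and network access):
# get_subject_tsvs sub-192413932
```

Because the shell expands the pattern, only files that already appear in the cloned directory tree are matched.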