
DataHub - import data

Both the Data Portal and DataHub work with many different types and formats of data. This makes a fully automatic data loader impractical, as the data model is still changing and data keep arriving in multiple formats. At the moment (Aug 2019) we work with data stored in JSON and .xlsx formats. To handle the import of these data, the EBI-Parser was implemented. You can import data into an instance of DataHub through its CI job. Currently, three deploy targets are set up, and custom targets can be added if needed. The data themselves are stored in GitLab LFS storage. As of June 2020 a large part of the requested data is already in the Data Portal, which makes the import into DataHub easier: only a few records need to be imported.
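Because the data files are kept in GitLab LFS, a fresh clone contains only pointer files until the LFS objects are fetched. A minimal sketch of that step follows; the repository URL is a placeholder, and the actual fetch commands are left commented out so the sketch stays a dry run.

```shell
#!/bin/sh
# Sketch: fetching the LFS-tracked .JSON / .xlsx data files.
# REPO_URL is a placeholder, not the real repository address.
REPO_URL="${REPO_URL:-https://gitlab.example.org/group/ebi-parser.git}"

echo "Would run: git clone ${REPO_URL} && git lfs install && git lfs pull"

# git clone "${REPO_URL}"
# cd ebi-parser
# git lfs install   # one-time setup of the git-lfs hooks
# git lfs pull      # downloads the actual data files behind the LFS pointers
```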

Import process

  • After a successful initialization you can choose the desired deploy target in the EBI-Parser CI job. If you want to import data locally, you can replicate the same sequence of commands and change the database target. Alternatively, you can add the address of your virtual server to the CI job, commit the change, and run the job through the GitLab CI interface. The upload takes about 30 minutes.
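The local-import variant of this step can be sketched as a short script. The variable names and the commented-out parser invocation below are assumptions for illustration only; the authoritative command sequence is the one in the EBI-Parser CI job definition, with the database target swapped for a local one.

```shell
#!/bin/sh
# Sketch of a local import run, mirroring the EBI-Parser CI job.
# DB_* defaults and the parser call are assumptions -- copy the
# real commands from the CI job definition.

# Point the import at a local database instead of a deploy target.
DB_HOST="${DB_HOST:-localhost}"
DB_PORT="${DB_PORT:-5432}"
DB_NAME="${DB_NAME:-datahub}"

echo "Importing JSON/.xlsx data into ${DB_HOST}:${DB_PORT}/${DB_NAME}"

# Hypothetical parser invocation (replace with the real CI commands;
# expect the upload to take roughly 30 minutes):
# python ebi_parser.py --target "postgresql://${DB_HOST}:${DB_PORT}/${DB_NAME}" data/
```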

  • If desired, you can also upload the old (Feb 2018) expression data. To do this, run make import-expressions in the datahub-docker folder. After the import finishes, run the build_cache CI job in the DataHub repository. Again, you may have to add your own virtual machine as a target.
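The expression-data step as a command sequence, written as a dry run. The checkout path is an assumption, and the GitLab pipeline-trigger call is just one possible way to start build_cache from the command line (project ID and token are placeholders); clicking the job in the GitLab web interface works just as well.

```shell
#!/bin/sh
# Sketch of the optional expression-data import (Feb 2018 data).
# DATAHUB_DOCKER_DIR is an assumption; point it at your checkout.
DATAHUB_DOCKER_DIR="${DATAHUB_DOCKER_DIR:-$HOME/datahub-docker}"

echo "Step 1: run 'make import-expressions' in ${DATAHUB_DOCKER_DIR}"
# cd "${DATAHUB_DOCKER_DIR}" && make import-expressions

echo "Step 2: run the build_cache CI job in the DataHub repository"
# Either start the job from the GitLab web UI, or trigger a pipeline
# via the API (project ID, host, and token are placeholders):
# curl -X POST --form "token=${TRIGGER_TOKEN}" --form "ref=master" \
#   "https://gitlab.example.org/api/v4/projects/<id>/trigger/pipeline"
```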

  • Currently (Jun 2020) the datahub-docker repository contains a CI job that can deploy and populate a DataHub instance (from any project branch). This is done in 4 stages, which have to be run manually from the GitLab web interface. The import of the expression data is optional.
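The 4-stage pipeline described above might look roughly like the following `.gitlab-ci.yml` fragment. This is a hypothetical sketch: the stage and job names are assumptions, and the real definitions live in the datahub-docker repository.

```yaml
# Hypothetical sketch of the datahub-docker pipeline; the actual
# .gitlab-ci.yml in the repository is authoritative.
stages:
  - deploy
  - populate
  - import-expressions
  - build-cache

deploy_instance:
  stage: deploy
  when: manual          # each stage is started from the GitLab web UI
  script:
    - echo "deploy a DataHub instance for $CI_COMMIT_REF_NAME"

populate_data:
  stage: populate
  when: manual
  script:
    - echo "import JSON/.xlsx data via EBI-Parser"

import_expressions:
  stage: import-expressions
  when: manual          # this stage is optional
  script:
    - echo "import the Feb 2018 expression data"

build_cache:
  stage: build-cache
  when: manual
  script:
    - echo "rebuild the DataHub cache"
```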