Data Scientist (Remote)



Main Tasks & Requirements:

  • Design of an R package and a Python module to simplify the reuse of the harmonised data packages and their metadata, starting from the case study materials provided as input and considering additional possible use cases;
  • Preparation of a plan to manage an Agile project for the development and maintenance of the R package and the Python module within the client's open-source repository, incorporating feedback from their user community until both components reach maturity;
  • Development of the R package and the Python module, including versioning of the code in the client's repository, and creation of tutorials with practical examples for user self-training;
  • Frequent reporting on progress in the design, development and tutorials of the R package and the Python module, following the Scrum and PM2 Agile methodologies for managing iterations and tasks;
  • Assistance with refining the [meta]data harmonisation and packaging standards, including proposing improvements related to the data principles and their implementation;
  • Interaction with the technical coordinator, the scientific coordinator, the scientists using the data and the data harmonisation and integration expert, including by writing technical reports to be reviewed jointly with the project team and, where relevant, external stakeholders.

Experience and Technical Requirements:

  • Excellent knowledge of R (RStudio) and Python, including packages for manipulating different [meta]data formats (CSV, JSON, JSON-LD, RDF, XML, SQLite, etc.);
  • Ability to design data models and to transform data efficiently with R and Python in support of data analysis and visualisation (e.g. R Markdown, ggplot2, Shiny, pandas);
  • Ability to design and develop R packages and Python modules, applying high quality standards and managing code versioning with Git;
  • Sound knowledge of reference R packages and Python modules for retrieving and manipulating data, preferably those developed by communities implementing authoritative data packaging and publication standards;
  • Ability to adapt to evolving FAIR data modelling and formatting standards;
  • Very good communication skills with technical and non-technical audiences;
  • Analytical and problem-solving skills;
  • Capability to write clear and structured technical documents;
  • Ability to participate in technical meetings;
  • Ability to give business and technical presentations;
  • Master's degree (preferred) and a minimum of 5 years of experience in data science with R and Python;
  • Certifications in PM2 Agile or Scrum are a plus;
  • Good command of English (minimum B2 level).

