DIVA in Jupyter notebooks

class: center, middle, titlepage

#  Introduction to Data-Interpolating Variational Analysis
Alexander Barth, Charles Troupin, Aida Alvera-Azcárate, and Jean-Marie Beckers

Link to these slides:
https://tinyurl.com/DIVAnd-VRE

GHER, University of Liège, Belgium

![logo](Fig/logo_ulg2.svg)
![logo](Fig/GHER.svg)
![logo](Fig/seadatacloud.png)

---

# What is DIVA?

![right300:](Fig/Divand_realistic_example.svg)
* DIVA: Data Interpolating Variational Analysis
* Objective: __derive a gridded climatology from in situ observations__
* The variational inverse methods aim to derive a continuous field which is:
 * __close to the observations__ (it should not necessarily pass through all observations because observations have errors)
 * "__smooth__"

---
# Draftsman's spline

* Splines are a type of curve
* Originally developed for ship-building and aircraft design (before computer modelling)
* Purpose: draw a smooth curve through a set of points
* Metal weights (called knots or ducks) are placed at the control points
* A thin metal or wooden rod (the spline) is bent through the weights
* Source: http://pages.cs.wisc.edu/~deboor/draftspline.html

---
# Cost function

* Formalized via a cost function:
$$
J[\varphi] = \sum\_{j=1}^{N\_d} \mu\_{j}[d\_{j}-\varphi({\mathbf x}\_{j})]^{2} +  \|| \varphi- \varphi\_{b} \|| ^{2}
$$
 where $d\_{j}$ are the measurements at the locations ${\mathbf x}\_j$, $\mu\_j$ are their weights, and $\varphi\_{b}$ is a background estimate of the field.

The norm in the previous equation has a particular form:

$$
\|| \varphi \|| ^{2} = \int_\Omega \left[ L^4 (\nabla^2 \varphi)^2 + 2 L^2 (\nabla \varphi)^2 + \varphi^2 \right] \; dx
$$

where $L$ is the correlation length-scale.
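
As a sketch of how such a cost function can be minimised on a grid: the matrices $\mathbf H$, $\mathbf M$ and $\mathbf B$ below are notation assumed here for illustration, not taken from the slides. $\mathbf H$ interpolates the gridded field to the observation locations, $\mathbf M$ is the diagonal matrix of the weights $\mu\_j$, and $\mathbf B$ is a symmetric positive-definite matrix discretising the norm. The cost function then becomes a quadratic form:

$$
J(\boldsymbol{\varphi}) = (\mathbf d - \mathbf H \boldsymbol{\varphi})^{T} \mathbf M (\mathbf d - \mathbf H \boldsymbol{\varphi}) + (\boldsymbol{\varphi} - \boldsymbol{\varphi}\_{b})^{T} \mathbf B (\boldsymbol{\varphi} - \boldsymbol{\varphi}\_{b})
$$

Setting its gradient with respect to $\boldsymbol{\varphi}$ to zero gives a linear system for the analysis:

$$
(\mathbf B + \mathbf H^{T} \mathbf M \mathbf H) (\boldsymbol{\varphi} - \boldsymbol{\varphi}\_{b}) = \mathbf H^{T} \mathbf M (\mathbf d - \mathbf H \boldsymbol{\varphi}\_{b})
$$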

---
# Simple example

![right450:](Fig/simple_example.png)
* Two observations at the locations (-0.5,0) and (0.5,0)
* Both values are equal to 1, but the relative expected error variances are 0.5 and 1, respectively
* The correlation length-scale is 0.2
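
A one-dimensional analogue of this example can be sketched in plain Julia. This is not the DIVAnd implementation. It assumes a uniform grid, a nearest-grid-point observation operator, a zero background field, weights equal to the inverse of the relative error variances, and a smoothness term of the form $L^4 (\varphi'')^2 + 2 L^2 (\varphi')^2 + \varphi^2$. The function name `analysis1d` is hypothetical.

```julia
using LinearAlgebra

# Hypothetical helper: minimise a 1D finite-difference version of the cost
# function on the uniform grid `x`, with observations `d` at positions `xobs`,
# weights `mu` and correlation length `L` (zero background assumed).
function analysis1d(x, xobs, d, mu, L)
    n = length(x)
    h = x[2] - x[1]                       # uniform grid spacing
    # first-derivative operator (forward differences)
    D1 = zeros(n - 1, n)
    for i in 1:n-1
        D1[i, i] = -1 / h
        D1[i, i+1] = 1 / h
    end
    # second-derivative operator (centred differences)
    D2 = zeros(n - 2, n)
    for i in 1:n-2
        D2[i, i] = 1 / h^2
        D2[i, i+1] = -2 / h^2
        D2[i, i+2] = 1 / h^2
    end
    # quadratic form approximating the integral of the smoothness term
    B = h * (L^4 * (D2' * D2) + 2 * L^2 * (D1' * D1) + I)
    # observation operator: pick the grid point nearest to each observation
    H = zeros(length(xobs), n)
    for (j, xo) in enumerate(xobs)
        H[j, argmin(abs.(x .- xo))] = 1
    end
    M = Diagonal(mu)
    # zero gradient of the cost function: (B + H'MH) phi = H'M d
    return (B + H' * M * H) \ (H' * M * d)
end

x = range(-1, stop = 1, length = 201)
xobs = [-0.5, 0.5]
d = [1.0, 1.0]
epsilon2 = [0.5, 1.0]                     # relative error variances
phi = analysis1d(x, xobs, d, 1 ./ epsilon2, 0.2)
```

In this sketch the analysis stays below the observed value of 1 at both locations, and it is closer to 1 at the first observation, which has the smaller error variance.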

---
# Topography

* The analysis decouples basins based on __topography__

---
# Ocean currents

![right450:](Fig/orca_test_divand_adv_point_2d.svg)
* __ocean currents__ can be taken into account
* Background covariance (left panels) relative to the location marked by a cross and surrounding grid points, and background variance (right panels). The upper (lower) panels correspond to the case without (with) the advection constraint.

---
# Error variance estimation

<img src="Fig/cpme.png" style="width: 400px; float: left; margin-right: 40px">
<img src="Fig/scaled_error.png" style="width: 400px; float: left; margin-right: 40px">
 * the data distribution is generally uneven
 * the expected error of the climatology is thus not constant
 * where we have more data, a smaller error is expected
 * for some applications, it is preferable to mask areas where the expected error is high
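
Masking by expected error can be sketched as follows. The helper name `mask_high_error` and the threshold of 0.5 are assumptions chosen for illustration, not DIVAnd functions or recommended values. `analysis` and `relative_error` stand for two arrays defined on the same grid.

```julia
# Hypothetical helper (not part of DIVAnd): replace analysis values by NaN
# wherever the relative error exceeds the chosen threshold.
function mask_high_error(analysis, relative_error; max_error = 0.5)
    # broadcasting `float` creates a new array, so the input is left unchanged
    masked = float.(analysis)
    masked[relative_error .> max_error] .= NaN
    return masked
end

# Toy 2x2 fields, invented for this example
analysis = [1.0 2.0; 3.0 4.0]
relative_error = [0.1 0.7; 0.4 0.9]
masked = mask_high_error(analysis, relative_error)
```

Plotting routines typically leave `NaN` cells blank, so the masked field only shows areas with sufficient data coverage.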

---
# Outliers
<img src="Fig/DIVAnd_qc0.png" style="width: 350px; float: left">
<img src="Fig/DIVAnd_qc1.png" style="width: 350px; float: left">
<img src="Fig/DIVAnd_qc2.png" style="width: 350px; float: left; margin-right: 40px">
* outliers generally have values quite different from other data points in the vicinity
* the analysis represents the mean state smoothed over a certain length-scale
* the residual is the difference between the observations and the analysis
* outliers often have large residuals
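
The residual-based idea can be sketched as below. This is a deliberately simple criterion (a robust spread estimate from the median absolute deviation), not the quality-control criterion implemented in DIVAnd. The function name `flag_outliers` and the threshold of 3 are assumptions. `analysis_at_obs` stands for the analysis interpolated to the observation locations.

```julia
using Statistics

# Hypothetical helper: flag observations whose residual is unusually large
# compared with the typical spread of all residuals.
function flag_outliers(d, analysis_at_obs; threshold = 3.0)
    residual = d .- analysis_at_obs
    center = median(residual)
    # median absolute deviation, scaled to be comparable to a standard deviation
    spread = 1.4826 * median(abs.(residual .- center))
    return abs.(residual .- center) .> threshold * spread
end

# Toy data, invented for this example: the fifth value is far from the analysis
d = [1.0, 1.1, 0.9, 1.05, 5.0, 0.95]
analysis_at_obs = fill(1.0, 6)
flags = flag_outliers(d, analysis_at_obs)
```

Flagged observations are candidates for inspection rather than automatic removal, since a large residual can also reflect a real feature that the smoothed analysis does not resolve.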
---

# Ways to use DIVAnd

![right450:](Fig/example_jupyterhub.png)
* Open source: https://github.com/gher-ulg/DIVAnd.jl
* Integration with __Jupyter notebooks__ (SeaDataCloud Virtual Research Environment)
* DIVA is integrated in __Ocean Data View__
* __REST interface__ in development
* Play with DIVAnd: http://data-assimilation.net/Tools/divand_demo/html/

---
# Example: Chlorophyll-a data product

![right450:](Fig/DIVA_chla.png)
* 6-year running average (previously 10-year running average) analysis
* Developed by AU-BIOS (Denmark), HCMR (Greece), Ifremer (France), NIMRD (Romania), SMHI (Sweden)
* Only the interpolated field in the proximity of the observations is shown
* Interpolated field on the full domain is available
* Used/developed in projects: SeaDataNet/SeaDataCloud, EMODnet Chemistry, EMODnet Physics, EMODnet Biology
* HPC application: HPC Phidias
* Well suited for HPC environments (to satisfy CPU and memory requirements)

---
# DIVAnd and DIVA

* DIVA: Fortran tool with shell scripts
* DIVAnd: rewrite of DIVA in __Julia__ (DIVAnd.jl)
* Julia: good trade-off between __efficiency__ of a compiled language and __flexibility__ of a dynamic language
* Facilitating the installation:
  * __Jupyter notebooks__ provide a fully configured environment for DIVAnd.jl
  * A __Docker container__ allows one to easily replicate these environments
* *Are you familiar with a programming language? If yes, which?*

---
![full:](Fig/julia-origin.svg)

---
# Jupyter notebooks

![right300:](Fig/example_jupyterhub.png)
* Integrated web environment
   * __Computing__
       * Interactive
       * *Ju*lia, *Py*thon, *R*,...
   * __Visualization__
   * __Documentation__
       * High-quality typesetting and equations (LaTeX)
       * Export to HTML and PDF (among others)
* Easy to __share__, e.g. on nbviewer.jupyter.org and github.com
* Facilitate __reproducibility__ and peer-review (of DIVA climatologies in particular)
* Significant community around Jupyter notebooks
* Players outside the scientific community are also involved (Google, Microsoft with Azure ML)
* Jupyter notebooks: __single__ user

---
# Jupyter architecture
![](Fig/jupyter.svg)

<!--
# Jupyterhub
* __Multiple users__
* Web-proxy in front of jupyter
* __Authentication__:
   * OAuth, LDAP, ...
* __Isolation__:
   * Systemd-nspawn (light-weight namespace containers in Linux), Docker containers, ...

-->

---
# JupyterHub architecture
JupyterHub: __multiple__ users
![jh:](Fig/jupyterhub.svg)

---
# JupyterLab
![jl:](Fig/jupyterlab.png)
* JupyterLab interface: more similar to an Integrated Development Environment (IDE)
* Compatible with the same notebook format

---
# Overview

* Overview of the main components of the Virtual Research Environment used during this workshop

---
# JupyterHub

* __Docker containers__, preinstalled with Julia and various Julia packages:
   * Plotting library (PyPlot) and a more specialized library for ocean data
   * DIVAnd
   * ...
* Julia packages are precompiled
* Transfer files via __WebDAV__ in Julia:
   * Using explicit download and upload requests
```julia
# download from NextCloud to Jupyter Hub
get("file_in_nextcloud.nc","file_in_jupyterhub.nc")
# upload from Jupyter Hub to NextCloud
put("file_in_jupyterhub.nc","file_in_nextcloud.nc")
```

---
class: middle
# Conclusions

* DIVAnd and DIVA are open source and available under the GPL licence
* New approach to generate DIVA climatologies using a cloud computing infrastructure
* Template Jupyter notebooks are provided, which users can adapt
  * Improve the __consistency__ between products
  * Facilitate __reproducibility__
* Jupyter notebooks are not specific to SeaDataCloud
  * Users might already be familiar with Jupyter notebooks
  * But if not, learning to work with Jupyter notebooks can also be useful in other contexts
  * Jupyter can easily be installed on a local machine
* JupyterHub:
  * Docker makes it possible to provide a __standardized computing environment__ to all users
  * The Jupyter notebook can be used to fully __document the generation of the climatology__

---
class: middle
# Organization

* Only a subset of the notebooks will be covered
* Notebooks (directory __work/DIVAnd-Workshop/Exercises/__)
    * Presentation notebooks (new notebook from scratch)
    * 02-Julia-introduction.ipynb
    * 06-topography.ipynb
    * 09-ODV-data-import.ipynb
    * 90-full-analysis.ipynb
* Exercises for the workshop are in the folder __work/DIVAnd-Workshop/Exercises/__
* Solutions are in the corresponding files in work/DIVAnd-Workshop/