class: center, middle, titlepage

# Introduction to Data-Interpolating Variational Analysis

Alexander Barth, Charles Troupin, Aida Alvera-Azcárate, and Jean-Marie Beckers

Link to these slides: https://tinyurl.com/DIVAnd-VRE

GHER, University of Liège, Belgium

![logo](Fig/logo_ulg2.svg) ![logo](Fig/GHER.svg) ![logo](Fig/seadatacloud.png)

---

# What is DIVA?

![right300:](Fig/Divand_realistic_example.svg)

* DIVA: Data Interpolating Variational Analysis
* Objective: __derive a gridded climatology from in situ observations__
* Variational inverse methods aim to derive a continuous field which is:
    * __close to the observations__ (it should not necessarily pass through all observations, because observations have errors)
    * "__smooth__"

---

# Draftsman's spline

* Splines are a type of curve
* Originally developed for ship-building and aircraft design (before computer modelling)
* Used to draw a smooth curve through a set of points by:
    * placing metal weights (called knots or ducks) at the control points
    * bending a thin metal or wooden rod (the spline) through the weights

http://pages.cs.wisc.edu/~deboor/draftspline.html

---

# Cost function

* Formalized via a cost function:

$$
J[\varphi] = \sum\_{j=1}^{N\_d} \mu\_{j}[d\_{j}-\varphi({\mathbf x}\_{j})]^{2} + \|| \varphi- \varphi\_{b} \|| ^{2}
$$

where $d\_{j}$ are the measurements at the locations ${\mathbf x}\_j$, $\mu\_j$ are their weights, and $\varphi\_{b}$ is a background estimate of the field.

The norm in the previous equation has a particular form:

$$
\|| \varphi \|| ^{2} = \int_\Omega \left[ L^4 (\nabla^2 \varphi)^2 + 2 L^2 (\nabla \varphi)^2 + \varphi^2 \right] \; dx
$$

where $L$ is the correlation length-scale. Each term has the same units as $\varphi^2$, and a larger $L$ penalizes gradients and curvature more strongly, which yields a smoother field.

---

# Simple example

![right450:](Fig/simple_example.png)

* Two observations at the locations (-0.5,0) and (0.5,0)
* Both values are equal to 1, but the relative expected error variances are 0.5 and 1 respectively
* Correlation length-scale is 0.2

---

# Topography

* decouples basins based on __topography__
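
On a grid, topography is commonly represented by a land-sea mask. The following is a minimal sketch in plain Julia, assuming a made-up bathymetry `depth` and analysis level `level` (both hypothetical and not taken from these slides or from the DIVAnd.jl API). It only illustrates how a land barrier separates two basins:

```julia
# Hypothetical bathymetry in metres (positive downward, 0 on land).
# The middle column is a land barrier between a western and an eastern basin.
depth = [100.0 0.0 80.0;
         120.0 0.0 90.0;
         110.0 0.0 70.0]

# Depth level of the analysis (assumed value, here the surface)
level = 0.0

# true where the grid cell is sea at this level, false where it is land
mask = depth .> level

# No sea cell in the western column has a sea neighbour in the eastern column,
# so a smoothness constraint restricted to sea cells cannot couple them directly.
println(mask[:, 2])    # the barrier column
println(count(mask))   # number of sea cells
```

In such a setup, an observation in one basin should not influence the analysis in the other basin across the barrier.
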
---

# Ocean currents

![right450:](Fig/orca_test_divand_adv_point_2d.svg)

* __ocean currents__ can be taken into account
* Background covariance (left panels) relative to the location marked by a cross and surrounding grid points, and background variance (right panels). The upper (lower) panels correspond to the case without (with) the advection constraint.

---

# Error variance estimation
* the data distribution is generally uneven
* the expected error of the climatology is thus not constant
* where we have more data, a smaller error is expected
* for some applications, it is preferable to mask areas where the expected error is high

---

# Outliers
* outliers generally have values quite different from other data points in the vicinity
* the analysis represents the mean state smoothed over a certain length-scale
* the residual is the difference between the observations and the analysis
* outliers often have large residuals

---

# Ways to use DIVAnd

![right450:](Fig/example_jupyterhub.png)

* Open source: https://github.com/gher-ulg/DIVAnd.jl
* Integration with __Jupyter notebooks__ (SeaDataCloud Virtual Research Environment)
* DIVA is integrated in __Ocean Data View__
* __REST interface__ in development
* Play with DIVAnd: http://data-assimilation.net/Tools/divand_demo/html/

---

# Example: Chlorophyll-a data product

![right450:](Fig/DIVA_chla.png)

* 6-year running average (previously 10-year running average) analysis
* Developed by AU-BIOS (Denmark), HCMR (Greece), Ifremer (France), NIMRD (Romania), SMHI (Sweden)
* Only the interpolated field in the proximity of the observations is shown
* Interpolated field on the full domain is available
* Used/developed in projects: SeaDataNet/SeaDataCloud, EMODnet Chemistry, EMODnet Physics, EMODnet Biology
* HPC application: HPC Phidias
* Well suited for HPC environments (to satisfy CPU and memory requirements)

---

# DIVAnd and DIVA

* DIVA: Fortran tool with shell scripts
* DIVAnd: rewrite of DIVA in __Julia__ (DIVAnd.jl)
* Julia: good trade-off between the __efficiency__ of a compiled language and the __flexibility__ of a dynamic language
* Facilitate the installation:
    * __Jupyter notebooks__ provide a fully configured environment for DIVAnd.jl
    * __Docker containers__ allow one to easily replicate these environments
* *Are you familiar with a programming language? If yes, which?*

---

![full:](Fig/julia-origin.svg)

---

# Jupyter notebooks

![right300:](Fig/example_jupyterhub.png)

* Integrated web environment
* __Computing__
    * Interactive
    * *Ju*lia, *Py*thon, *R*,...
* __Visualization__
* __Documentation__
    * High-quality typesetting and equations (LaTeX)
    * Export to HTML and PDF (among others)
* Easy to __share__, e.g. on nbviewer.jupyter.org and github.com
* Facilitate __reproducibility__ and peer review (of DIVA climatologies in particular)
* Significant community around Jupyter notebooks
    * Also involvement of players outside of the scientific community (Google, Microsoft with Azure ML)
* Jupyter notebooks: __single__ user

---

# Jupyter architecture

![](Fig/jupyter.svg)

---

# Jupyterhub architecture

Jupyterhub: __multiple__ users

![jh:](Fig/jupyterhub.svg)

---

# Jupyter lab

![jl:](Fig/jupyterlab.png)

* Jupyter lab interface: more similar to an Integrated Development Environment (IDE)
* Compatible with the same notebook format

---

# Overview

* Overview of the main components of the Virtual Research Environment used during this workshop
---

# Jupyterhub

* __Docker containers__, preinstalled with Julia and various Julia packages:
    * Plotting library (PyPlot) and a more specialized library for ocean data
    * DIVAnd
    * ...
* Julia packages are precompiled
* Transfer files via __WebDAV__ in Julia:
    * Using explicit download and upload requests

```julia
# download from NextCloud to Jupyter Hub
get("file_in_nextcloud.nc","file_in_jupyterhub.nc")
# upload from Jupyter Hub to NextCloud
put("file_in_jupyterhub.nc","file_in_nextcloud.nc")
```

---

class: middle

# Conclusions

* DIVAnd and DIVA are open source and available under the GPL licence
* New approach to generate DIVA climatologies using a cloud computing infrastructure
* Template Jupyter notebooks are provided, which users can adapt
    * Improve the __consistency__ between products
    * Facilitate __reproducibility__
* Jupyter notebooks are not software specific to SeaDataCloud
    * Users might already be familiar with Jupyter notebooks
    * But if not, learning to work with Jupyter notebooks can also be useful in other contexts
    * Jupyter can easily be installed on a local machine
* Jupyterhub:
    * Docker makes it possible to provide a __standardized computing environment__ to all users
    * The Jupyter notebook can be used to fully __document the generation of the climatology__

---

class: middle

# Organization

* Only a subset of the notebooks will be covered
* Notebooks (directory __work/DIVAnd-Workshop/Exercises/__)
    * Presentation notebooks (new notebook from scratch)
    * 02-Julia-introduction.ipynb
    * 06-topography.ipynb
    * 09-ODV-data-import.ipynb
    * 90-full-analysis.ipynb
* Exercises for the workshop are in the folder __work/DIVAnd-Workshop/Exercises/__
* Solutions are in the corresponding files in work/DIVAnd-Workshop/