Cambridge-ICCS · jackdfranklin · May 7, 2026 · Apr 1, 2026 · Apr 15, 2026 · Apr 15, 2026
diff --git a/.github/workflows/deploy.yml b/.github/workflows/deploy.yml
@@ -0,0 +1,43 @@
+name: Build and deploy slides
+
+on:
+  pull_request:
+    branches: [ "main" ]
+  push:
+    branches: [ "main" ]
+
+  # Allows manual run
+  workflow_dispatch:
+
+jobs:
+  # Builds slides with quarto and deploys them to a branch
+  build:
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Set up Quarto
+        uses: quarto-dev/quarto-actions/setup@v2
+
+      - name: Render Quarto Project
+        run: |
+          cd src
+          quarto render slides.qmd
+          cd ../
+
+      - name: Test pages build
+        if: github.ref != 'refs/heads/main'
+        uses: JamesIves/github-pages-deploy-action@v4
+        with:
+          branch: test-pages
+          folder: src
+          dry-run: true
+
+      - name: Deploy pages for main
+        if: github.ref == 'refs/heads/main'
+        uses: JamesIves/github-pages-deploy-action@v4
+        with:
+          branch: gh-pages
+          folder: src
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,2 @@
+*.html
+src/slides_files
diff --git a/src/dependencies.qmd b/src/dependencies.qmd
@@ -0,0 +1,43 @@
+# Dependencies
+
+## Dependencies
+
+- All software has dependencies
+- Some are more obvious than others:
+  - Data/input
+  - Packages/libraries e.g. numpy, Eigen
+  - System libraries
+  - Compiler/Interpreter
+- If your code can't run without it, it's a dependency!
+
+## How to discover dependencies
+
+- Some dependencies may be "implicit"
+- For example, you may have a library installed on your system
+- Since the code "just works", you may not be aware of the dependency
+- To find these, try running on a different system (or multiple) and see what breaks
+
+## How to declare dependencies
+
+- List them in a tracked file in the repository
+  - e.g. add a "Dependencies" section to your README.md
+- Specify:
+  - Versions of each dependency e.g. numpy 2.3.9
+  - Where/how to acquire the dependency
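+
+One way to follow these points, sketched here under the assumption of a Python
+project managed with Conda, is a tracked `environment.yml`. The environment
+name, packages and version pins below are illustrative, not part of this
+project:
+
+``` yaml
+# environment.yml (hypothetical example, tracked in the repository)
+name: my-analysis            # assumed environment name
+channels:
+  - conda-forge              # where to acquire the packages
+dependencies:
+  - python=3.12              # interpreter version
+  - numpy=1.26.4             # pinned package versions
+  - scipy=1.13.0
+```
+
+Running `conda env create -f environment.yml` should then recreate the
+environment on another machine.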
+
+## Dependency metadata
+
+- There are automated ways of resolving dependencies
+- Usually language/tool specific
+- Some tools automatically update dependency metadata
+  - e.g. Rust's Cargo, Julia's Pkg, uv for Python
+  - Project file: Dependencies and compatible versions
+  - Lock file: Exact version (plus other metadata e.g. source) of *every*
+    dependency you are using
+  - Important to track both - lock files record the exact environment you use
+
+## System dependencies
+
+- Conda
+- Docker
+- Nix/Guix
diff --git a/src/documentation.qmd b/src/documentation.qmd
@@ -0,0 +1,30 @@
+# Documentation
+
+## Documentation
+
+- Not all information can be conveyed in code
+- We need to tell other people how to use our projects
+- And sometimes ourselves!
+- Documentation covers anything outside of the code/metadata
+
+## README
+
+- Markdown file at the project root
+- Should contain:
+  - Description of project
+  - Dependencies
+  - Instructions on building/running
+
+## Comments
+
+- Comments in code are another form of documentation
+- Comments should:
+  - Explain *why* the code is doing something
+  - Give context that is external to the scope
+
+## Generating Docs
+
+- Use tools that generate docs from source code
+- Single source of truth
+- Comments/Docstrings embedded in code
+- Reduce separation between code and docs
diff --git a/src/fair_principles.qmd b/src/fair_principles.qmd
@@ -0,0 +1,24 @@
+# FAIR Principles
+
+---
+
+- Findable: Software, and its metadata, are easy for humans and machines to
+  find.
+
+---
+
+- Accessible: Software, and its metadata, are retrievable via standardised
+  protocols.
+
+---
+
+- Interoperable: Software interoperates with other software by exchanging
+  data and/or metadata, and/or through interaction via application
+  programming interfaces (APIs), described through standards.
+
+---
+
+- Reusable: Software is both usable (can be executed) and reusable (can be
+  understood, modified, built upon, or incorporated into other software).
+
+See: https://www.nature.com/articles/s41597-022-01710-x
diff --git a/src/introduction.qmd b/src/introduction.qmd
@@ -0,0 +1,29 @@
+## What is reproducibility?
+
+For this course we will take the following definition:
+
+- *Reproducible*:
+  Performing the same analysis on the same data produces the same results
+
+## Why is reproducibility important?
+
+In the context of scientific computing/analysis, we want to be able to:
+
+- Verify our own results
+- Verify the results of others
+
+By making our work reproducible, we ensure that both these things are not just
+possible, but straightforward.
+
+## Additional benefits
+
+- Safely implement changes
+- Run the workflow on different inputs more easily
+- Simpler for new team members to get started
+- Better collaboration
+
+## Where do we go from here...
+
+Throughout the rest of this session, we will walk through the steps that we can
+take to go from an ad hoc collection of scripts to a reproducible scientific
+workflow!
diff --git a/src/introduction_walkthrough.qmd b/src/introduction_walkthrough.qmd
@@ -0,0 +1,7 @@
+## A likely scenario
+
+- You have just joined a new research group as a Student/Researcher/PI.
+- The group use a custom pipeline/setup to perform their data analysis/simulations.
+- You try to get the setup working on your local system/a new HPC system and...
+    *It doesn't work!*
+
diff --git a/src/slides.qmd b/src/slides.qmd
@@ -0,0 +1,57 @@
+---
+title: Reproducibility in Scientific Computing
+
+format:
+  revealjs:
+    theme: night
+    logo: https://iccs.cam.ac.uk/sites/default/files/iccs_ucam_combined_reverse_colour.png
+
+authors:
+  - name: Jack Franklin
+  - name: Marion Weinzierl
+---
+
+{{< include introduction.qmd >}}
+
+{{< include version_control.qmd >}}
+
+{{< include dependencies.qmd >}}
+
+{{< include testing.qmd >}}
+
+{{< include documentation.qmd >}}
+
+{{< include fair_principles.qmd >}}
+
+# Conclusion/Outlook
+
+## Reproducibility is important
+
+Primary benefits:
+
+- Confidence in scientific results
+- Peer review/cross analysis
+
+Additional benefits:
+
+- Allows for code reuse
+- Better collaboration
+
+## Ingredients for reproducibility:
+
+- Version Control
+- Dependency Metadata
+- Public Accessibility
+
+## Even better if
+
+- Testing for:
+  - Verification
+  - Regression checks
+
+## Make it easy!
+
+- When starting from scratch, much easier to implement these as you go
+- For a large project:
+  - Add to VC
+  - Document dependencies
+  - Follow best practice for new code
+  - Implement small improvements whenever modifying
diff --git a/src/testing.qmd b/src/testing.qmd
@@ -0,0 +1,45 @@
+# Testing
+
+## Testing
+
+- Important to test code
+- Check that code does what it should
+- Test on inputs outside of the "normal" range
+- Verify that results of code do not change
+- Can also be used to check dependency changes
+
+## Unit tests
+
+- Test the smallest logical unit of the code
+- Ensure each component works as intended
+- Test functions for known results
+- Compare to previously produced results
+
+## Integration tests
+
+- Test that components work together
+- Try to have a range of complexity of tests
+- Can use previous results to validate model
+- Ensure no regression of results
+
+## Adding tests to a project
+
+- Often we inherit large projects with no unit tests
+- How do we improve test coverage in this case?
+
+## Adding tests to a project
+
+  1. Create integration tests - use previous results or create "golden outputs"
+  2. Identify and extract parts of the code which can be split apart
+  3. Create unit tests for the new functions
+  4. Run the integration tests to ensure results have not changed
+  5. Repeat 2-4 until all code has unit tests
+
+- Whenever you change a part of the code, try to use this method
+- Code coverage will slowly improve, with less extra work
+
+## Automating tests (CI etc)
+
+- Automate testing to ensure tests pass for every commit
+- Also useful for tests that can take a long time/need lots of resources
+- If hosting code on e.g. GitHub, GitLab etc, can use Continuous Integration (CI)
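+
+As a minimal sketch of what this could look like on GitHub, assuming a Python
+project whose tests run with `pytest` and whose dependencies are listed in a
+`requirements.txt` (both assumptions for illustration), a workflow file such as
+`.github/workflows/tests.yml` might contain:
+
+``` yaml
+name: Run tests
+
+# Run on every push and pull request
+on: [push, pull_request]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      # Install the declared dependencies plus the test runner
+      - run: pip install -r requirements.txt pytest
+      - run: pytest
+```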
diff --git a/src/version_control.qmd b/src/version_control.qmd
@@ -0,0 +1,51 @@
+# Version Control
+
+## Version Control
+
+- The first thing we should do is move our project into version control (VC)
+- This way we never lose the original state of the project
+- We can then try things without worrying about breaking anything!
+- This will also benefit any later development, so the sooner the better
+
+## What to add to VC
+
+- DON'T do this:
+``` bash
+git add .
+```
+
+- Our repository should only contain:
+  - Code/scripts
+  - Documentation
+  - Metadata
+  - i.e. just text files
+
+There will be some exceptions to this rule, but for the vast majority of cases
+it will be true.
+
+## What to add to VC
+
+- Large datafiles should be hosted separately (e.g. on Zenodo)
+- External dependencies should be declared
+  - e.g. link to Zenodo dataset in docs and code
+- Use .gitignore to automatically ignore any unwanted files
+  - e.g. build outputs
+
+## Aside - testing with worktrees
+
+- git worktrees are like "local clones" of a repository
+- Create a worktree:
+``` bash
+git worktree add -b <new-branch-name> <path>
+```
+- Will make a new directory, with only files that are tracked
+- Can use as a cleanroom to ensure all dependencies are there
+- For more info: `git worktree add --help`
+
+## What to do next?
+
+- The repository can then also be hosted on a remote service (e.g. GitHub, GitLab, Codeberg, Bitbucket)
+- This will make collaboration with other people a lot easier!
+- It will also mean that any work done can be accessed by collaborators
+
+