diff --git a/.gitignore b/.gitignore
index 69bd11ab..99a81a6b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -13,10 +13,11 @@ site_libs/
*.rmarkdown
*_cache/
*.html
-*.egg-info/
-**/*.quarto_ipynb
-**/*.ipynb_checkpoints
+renv
+*.code-workspace
+*.egg-info/
**/*.quarto_ipynb
diff --git a/README.md b/README.md
index f6fcd3a8..85248a4f 100644
--- a/README.md
+++ b/README.md
@@ -1,54 +1,137 @@
-[](#contributors-)
+[![All Contributors][ac_badge]](#contributors-)
+
+[ac_badge]: https://img.shields.io/badge/all_contributors-10-orange.svg?style=flat-square
+
# Data science team repo
-If you have any questions or need help, please contact anyone from [the Data Science team](https://the-strategy-unit.github.io/data_science/about.html).
+If you have any questions or need help, please contact anyone from
+[the Data Science team][about].
+
+[about]: https://the-strategy-unit.github.io/data_science/about.html
This repo features:
* Presentations the team have delivered
-* A website/ blog that the data science team can use to blog and compile other forms of resources
+* A website/blog that the data science team can use to blog and compile other
+ forms of resources
* Guidance on good coding practices, also known as a "style guide"
## Contributing to this repo
-All members of the Strategy Unit organisation on Github should be able to contribute to this repository.
+All members of the Strategy Unit organisation on GitHub should be able to
+contribute to this repository.
1. Create an issue for the thing you want to add on GitHub
-2. Clone the repository (in RStudio, File > New Project > Checkout a project from a version controlled repository). Paste in the URL of this repository.
-3. Check out the main branch and check it's up to date in the RStudio Terminal (type `git checkout main && git pull` in terminal)
+2. Clone the repository (in RStudio, File > New Project > Checkout a project
+ from a version controlled repository). Paste in the URL of this repository.
+3. Check out the main branch and check it's up to date in the RStudio Terminal
+ (type `git checkout main && git pull` in terminal)
4. Check out a new branch (`git checkout -b issue-number` in terminal)
+5. Follow instructions in the {renv} section.
-### How to create a new presentation
+### {renv}
+
+Please note this project uses {renv}.
+This is a way of managing the different packages that are required for each
+blog post and presentation.
-1. Make the presentation with quarto, and put it in `presentations/` in a `YYYY-MM-DD_Talk-title` folder. Your presentation should conform to the SU branding. It should have the filename `index.qmd`
-2. Copy and edit the yaml header from another post to ensure you have the correct metadata (e.g. theming, author, date)
+1. If you are on Windows, install [Rtools][rtools], which is needed to compile
+ some of the packages from source.
+2. Install {renv} (`install.packages("renv")` in Console)
+3. Check that you are on the version of R recorded in the `renv.lock` file in
+ the project's root directory, then run `renv::restore()` in Console to
+ install all the required packages.
+
+[rtools]: https://cran.r-project.org/bin/windows/Rtools/
+
-### How to create a new blogpost
+Each blogpost/presentation has its own renv lockfile, so rendering the whole
+website in one go is difficult.
+
+To work on an already published blogpost/presentation:
+
+1. Run `renv::use(lockfile = "/path/to/page/renv.lock")` in Console
+2. Edit the .qmd file that you are working on.
+ To preview your changes, run `quarto preview path/to/page.qmd` in terminal.
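+
+As a minimal sketch of the R version check in the {renv} setup steps above,
+the shell function below prints the R version recorded in a lockfile so you can
+compare it with the R you have installed.
+`lock_r_version` is a hypothetical helper, not part of this repo, and it
+assumes the lockfile's `"R"` block comes before any package entries, so that
+the first `"Version"` field in the file is the R version.
+A JSON-aware tool would be more robust.
+
+```sh
+#!/bin/sh
+# Hypothetical helper: print the R version recorded in an renv lockfile.
+# Assumption: the top-level "R" block holds the first "Version" field in
+# the file; later matches (package versions) are discarded by head.
+lock_r_version() {
+  sed -n 's/.*"Version": *"\([^"]*\)".*/\1/p' "$1" | head -n 1
+}
+
+# Example, run from the folder that contains renv.lock:
+#   echo "renv.lock expects R $(lock_r_version renv.lock)"
+#   Rscript -e 'cat(as.character(getRversion()), "\n")'
+```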
+
+### How to create a new presentation
+
+1. Make the presentation with Quarto, and put it in `presentations/` in a
+ `YYYY-MM-DD_Talk-title` folder.
+ Your presentation should conform to the SU branding.
+ It should have the filename `index.qmd`.
+2. If your presentation requires any specific packages, capture them in a
+ lockfile in the presentation folder with `renv::snapshot("PATH_TO_THE_FOLDER")`,
+ and include the following code chunk at the top of your `.qmd` file (after the
+ YAML header):
+
+````
+```{r lockfile}
+#| include: FALSE
+renv::use(lockfile = "renv.lock")
+```
+````
+
+### How to create a new blogpost
1. Navigate to the `blogs/posts` folder
-2. Create a folder for your blogpost, following the naming convention `YYYY-MM-DD_title-of-post`
-3. Copy a previous blogpost index.qmd file into your folder and use that as your template
-4. Write your blogpost. To preview changes, run `quarto preview path/to/index.qmd` in terminal.
+2. Create a folder for your blogpost, following the naming convention
+ `YYYY-MM-DD_title-of-post`
+3. Copy a previous blogpost's `index.qmd` file into your folder and use that as
+ your template
+4. Write your blogpost. To preview changes, run `quarto preview path/to/index.qmd`
+ in terminal.
+5. If your blogpost requires any specific packages, capture them in a lockfile
+ in the blogpost folder with `renv::snapshot("PATH_TO_THE_FOLDER")`, and
+ include the following code chunk at the top of your `.qmd` file (after the
+ YAML header):
+
+````
+```{r lockfile}
+#| include: FALSE
+renv::use(lockfile = "renv.lock")
+```
+````
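+
+Step 2 of the blogpost instructions can also be done from the terminal.
+As a sketch, the hypothetical `post_dir_name` function below (not part of this
+repo) builds a folder name following the `YYYY-MM-DD_title-of-post`
+convention, using today's date unless one is given.
+It assumes titles are plain ASCII: any run of characters other than lowercase
+letters and digits becomes a single hyphen.
+
+```sh
+#!/bin/sh
+# Hypothetical helper: build a blog post folder name of the form
+# YYYY-MM-DD_title-of-post from a free-text title.
+post_dir_name() {
+  title=$1
+  day=${2:-$(date +%Y-%m-%d)}   # default to today's date
+  # Lowercase, turn each run of non-alphanumerics into one "-",
+  # then trim any leading/trailing "-".
+  slug=$(printf '%s' "$title" |
+    tr '[:upper:]' '[:lower:]' |
+    sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//')
+  printf '%s_%s\n' "$day" "$slug"
+}
+
+# Example: create the folder, then copy a previous post in as a template.
+#   dir="blogs/posts/$(post_dir_name 'My new post')"
+#   mkdir -p "$dir" && cp path/to/previous/post/index.qmd "$dir/"
+```
+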
### How to create/edit pages on the website
-1. Find the `.qmd` file that you wish to edit. For example, if you want to add to the Style Guide page, edit the `style/style_guide.qmd` file.
-2. If creating a new page, copy an existing `.qmd` file and use that as a template.
-3. If you want to see how it looks before pushing to GitHub, click Render in RStudio. The HTML version of your new post should open in your browser.
+1. Find the `.qmd` file that you wish to edit.
+ For example, if you want to add to the Style Guide page, edit the
+ `style/style_guide.qmd` file.
+2. If creating a new page, copy an existing `.qmd` file and use that as a
+ template.
+3. If you want to see how it looks before pushing to GitHub, click "Render" in
+ RStudio.
+ The HTML version of your new post should open in your browser.
### Pushing your blog post/presentation/page to GitHub
-1. Save your file, then add and commit it (`git add file.qmd` and `git commit -m "Add blog post/presentation about x"`). If you have any computed blocks in your content, ensure that you have run the code locally; this should generate files in the `_freeze` directory. You must ensure that these files are added to version control.
-2. Push your content to your branch in GitHub (`git push origin branchname`).
-3. Then, on GitHub, make a pull request to main. Put any member of the Data Science team down as a reviewer. Link your pull request with your issue by typing `Closes #issuenumber` in the comment field of your pull request.
-4. When approved and merged to main, the Quarto page will automatically be rendered thanks to the GitHub action that has been set up.
+1. Save your file, then add and commit it (`git add file.qmd` and
+ `git commit -m "Add blog post/presentation about x"`).
+2. Push your content to your branch in GitHub (`git push origin branchname`).
+3. Then, on GitHub, make a pull request to `main`.
+ Put any member of the Data Science team down as a reviewer.
+ Link your pull request with your issue by typing `Closes #issuenumber` in the
+ comment field of your pull request.
+4. When approved and merged to `main`, the Quarto page will automatically be
+ rendered thanks to the GitHub action that has been set up.
+
+### Potential issues
+
+The GitHub action runner does not have R installed on it, so all computations
+must be run locally, and the files they generate in the `_freeze` folder must
+be added to version control.
+
+#### code-fold blocks
+
+If you have an `R` code block that has `#| code-fold: true`, this can cause
+issues: Quarto will need to run `R` with `{rmarkdown}` and `{knitr}` even when
+the computations have been frozen.
+You can get around this by setting `code-fold` for the entire post (via the
+document's `format: html` options), or by doing something like:
+
+```
+<details>
+<summary>Your code block title</summary>
+
+[your code chunk here]
+
+</details>
+```
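+
+Quarto stores a page's frozen results under `_freeze/`, mirroring the page's
+path without the `.qmd` extension: for example, the results for
+`blogs/posts/2024-05-13_one-year-coffee-code/index.qmd` live in
+`_freeze/blogs/posts/2024-05-13_one-year-coffee-code/index/`.
+The hypothetical `freeze_dir_for` helper below (not part of this repo) sketches
+that mapping so you can find the folder to commit; it assumes the path is given
+relative to the project root.
+
+```sh
+#!/bin/sh
+# Hypothetical helper: map a .qmd path (relative to the project root) to
+# the folder its frozen results live in, following the layout of this
+# repo's _freeze directory.
+freeze_dir_for() {
+  qmd=${1#./}                       # drop a leading "./" if present
+  printf '_freeze/%s\n' "${qmd%.qmd}"
+}
+
+# Example: run the code locally, then stage the page and its frozen output.
+#   page=blogs/posts/YYYY-MM-DD_title-of-post/index.qmd
+#   quarto render "$page" && git add "$page" "$(freeze_dir_for "$page")"
+```
+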
### Potential issues
@@ -68,7 +151,9 @@ If you have an `R` code block that has `#| code-fold: true`, then this can cause
# Contributors ✨
-Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):
+Thanks goes to these wonderful people ([emoji key][key]):
+
+[key]: https://allcontributors.org/docs/en/emoji-key
@@ -100,4 +185,7 @@ Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/d
-This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind welcome!
+This project follows the [all-contributors][ac_spec] specification.
+Contributions of any kind are welcome!
+
+[ac_spec]: https://github.com/all-contributors/all-contributors
diff --git a/_freeze/blogs/posts/2024-05-13_one-year-coffee-code/index/execute-results/html.json b/_freeze/blogs/posts/2024-05-13_one-year-coffee-code/index/execute-results/html.json
new file mode 100644
index 00000000..26f1b3c5
--- /dev/null
+++ b/_freeze/blogs/posts/2024-05-13_one-year-coffee-code/index/execute-results/html.json
@@ -0,0 +1,15 @@
+{
+ "hash": "f110de93ff9c1ac607ec6f6fb82fcce9",
+ "result": {
+ "engine": "knitr",
+ "markdown": "---\ntitle: \"One year of coffee & coding\"\nauthor:\n - Rhian Davies\ndate: \"2024-05-13\"\ncategories: [learning]\nreference-location: margin\ncitation-location: margin\n---\n\n\n\n\nThe data science team have been running coffee & coding sessions for just over a year now. When I joined that Strategy Unit, I was really pleased to see these sessions running as I think making time to discuss and share technical knowledge is highly valuable, especially as an organisation grows. \n\nCoffee and coding sessions run every two weeks and usually take the form of a short presentation, followed by a discussion. Although we have had a variety of different sessions including live coding demos and show and tell for projects.\n\nWe figured it would be a good idea to do a quick survey of attendees to make sure that the sessions were beneficial and see if there were any suggestions for future sessions. We had 11 responses, all of which were really positive, with 90% agreeing that the sessions are interesting, and over 80% saying that they learn new things. Respondents said that the sessions were well varied across the technical spectrum and that they \"almost always learn something useful\".\n\nThe two main themes of the results were that sessions were _inclusive_ and _sparked collaboration._ ✨\n\n> I like that everyone can contribute\n\n> It's great seeing what else people are doing\n\n> I get more ideas for future projects\n\nSome of the main suggestions included more content for newer programmers and encouraging the wider analytical team to share real project examples. \n\nSo with that, why not consider presenting? The sessions are informal and everyone is welcome to contribute. If you've got something to share, please let a member of the data science team know.\n\nAs a reminder, materials for our previous sessions are available under [Presentations](https://the-strategy-unit.github.io/data_science/presentations/).\n",
+ "supporting": [],
+ "filters": [
+ "rmarkdown/pagebreak.lua"
+ ],
+ "includes": {},
+ "engineDependencies": {},
+ "preserve": {},
+ "postProcess": true
+ }
+}
\ No newline at end of file
diff --git a/_freeze/blogs/posts/2024-11-12_coffee-coding-github-planner/index/execute-results/html.json b/_freeze/blogs/posts/2024-11-12_coffee-coding-github-planner/index/execute-results/html.json
new file mode 100644
index 00000000..45951c64
--- /dev/null
+++ b/_freeze/blogs/posts/2024-11-12_coffee-coding-github-planner/index/execute-results/html.json
@@ -0,0 +1,15 @@
+{
+ "hash": "d4d9d86a461207456c3b3f6e99fa1072",
+ "result": {
+ "engine": "knitr",
+ "markdown": "---\ntitle: \"Using GitHub to plan and organise Coffee & Coding\"\nauthor: \"YiWen Hon\"\ndate: \"2024-11-12\"\ncategories: [GitHub, learning]\n---\n\n\n\n\n## Coffee & Coding\n\nCoffee & Coding is a fortnightly hour-long session organised by the Data Science team, open to all members of the Strategy Unit with an interest in coding. It's been [well received](../2024-05-13_one-year-coffee-code/index.qmd) and is a valued source of professional development and general geekery in the team.\n\nWe've been experimenting with using [GitHub](https://github.com/) as an organisational tool for our team's work, and are testing the same approach for Coffee & Coding sessions as well. Previously, future Coffee & Coding sessions were haphazardly listed in a Google Doc that was only accessible to members of the Data Science team, and we wanted a more open approach. We also didn't have a good record of topics that were previously covered.\n\nYou'll need a GitHub account to enjoy the full functionality of the planner. If you need help setting this up, get in touch with any member of the Data Science team.\n\nAny feedback on this new system for organising and planning Coffee & Coding is very welcome! Hope you enjoy using it.\n\n## Viewing upcoming sessions\n\nWe have created [a fully open GitHub project for tracking Coffee & Coding sessions](https://github.com/orgs/The-Strategy-Unit/projects/14/views/1). Any sessions with scheduled dates can be seen in the \"Upcoming sessions\" view. Clicking on a session title brings up more information, including a brief overview of the session and the people running it. Users with GitHub accounts can make comments or post emoji reactions.\n\n{fig-alt=\"A short clip showing a person clicking on an upcoming session title. A pop up box appears with more information\"}\n\n## Adding session ideas\n\nTo add a session idea:\n\n1. 
[Create a new issue](https://github.com/The-Strategy-Unit/data_science/issues/new?template=Blank+issue) on the [data_science repository](https://github.com/The-Strategy-Unit/data_science). Provide a useful title and description for the session.\n2. Give your new issue the label C&C☕\n3. If you would like to run or contribute to the session, assign yourself to it.\n4. Click \"Create\" to save your session idea as a GitHub issue. You should then be able to see it listed as a \"Potential session\" on the planner, and others will be able to view, vote for, and comment on your session idea.\n\n{fig-alt=\"A short clip showing a person creating a new session idea as a GitHub issue, and giving it a title, description, and label\"}\n\n## Voting for session ideas \n\nWe will use thumbs up (👍) emoji reactions to suggested sessions as a voting system to help us with planning and scheduling.\n\nIf you see any potential sessions that you are interested in, react to them with a thumbs up emoji. You can see all planned sessions, in order of votes received, [listed here](https://github.com/The-Strategy-Unit/data_science/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22C%26C%20%E2%98%95%22%20sort%3Areactions-%2B1-desc).\n\n{fig-alt=\"A short clip showing a person reacting to a GitHub issue with a thumbs up emoji\"}\n\n",
+ "supporting": [],
+ "filters": [
+ "rmarkdown/pagebreak.lua"
+ ],
+ "includes": {},
+ "engineDependencies": {},
+ "preserve": {},
+ "postProcess": true
+ }
+}
\ No newline at end of file
diff --git a/_freeze/blogs/posts/2024-11-29-mapping-my-r-learning/index/execute-results/html.json b/_freeze/blogs/posts/2024-11-29-mapping-my-r-learning/index/execute-results/html.json
new file mode 100644
index 00000000..293b105e
--- /dev/null
+++ b/_freeze/blogs/posts/2024-11-29-mapping-my-r-learning/index/execute-results/html.json
@@ -0,0 +1,15 @@
+{
+ "hash": "9df55ebf84ff3c91d6b6095676631027",
+ "result": {
+ "engine": "knitr",
+ "markdown": "---\ntitle: \"Mapping my R journey so far: ten things that I have done along the way\"\nauthor: \"Sheila Ali\"\ndate: \"2025-03-10\"\ncategories: [learning]\n---\n\n\n\n\nThis blog post follows up from a talk I gave last year at coffee and coding about my experiences of learning how to code using Rstudio. Here I build on that talk to share some more reflections and advice for others who are starting out on their R learning journey.\n\n1. **I faced up to my fears**\n\nI have tried to learn R a few times over several years, with mixed success. When I first tried learning it a few years ago, I only managed to learn some basics. The second time, I was going through a crisis of confidence about my ability, and so when I had difficulties with learning R, I thought it was more evidence to show that I couldn't do it. I tried again, and got to the stage of making a plot with some of the data that was included with Rstudio. Soon after that I got swept up in the demands of everyday life, and gradually my work moved away from the world of quantitative data into qualitative research, and I had fewer opportunities to use R. Still, in the back of my mind I had this strange feeling of both wanting to avoid R, but also wondering what it would have been like if I had persisted with learning it.\n\nA couple of years later, when I started my current job, I heard about the NHS-R community, and felt encouraged to learn R again. I tried to join my colleagues who were participating in [Advent of Code](https://adventofcode.com/2024/about). But I couldn't understand a lot of what was going on, and when I tried to participate in some of the exercises, I immediately hit some hurdles with the basics, which was discouraging.\n\nIt seemed important to try and change my approach, so that learning R didn't seem so daunting. 
I came across the [aRtsy](https://cran.r-project.org/web/packages/aRtsy/readme/README.html) package and was amazed by the colourful and intricate artwork that it could produce. But better still, all of the code was open-source. I experimented with the code, making very small changes to see what kind of images it would create.\n\n package and the [canvas_nebula](https://koenderks.github.io/aRtsy/#nebula) function](Nebula.jpg)\n\nI also discovered colour palettes such as those in the [wesanderson](https://github.com/karthik/wesanderson) package, and tried experimenting with those along with the generative art functions. I soon found that my fear of R was quickly replaced by a geeky fascination with all of the beautiful artwork that could be created with only a few lines of code. It felt like a low-stakes situation, because the worst that could happen was that the code wouldn't work. Suddenly, the process of coding felt less intimidating, and it had opened up a wealth of possibilities[^1].\n\n[^1]: If this topic is of interest, I would recommend getting involved in the [Tidy Tuesday community activity](https://github.com/rfordatascience/tidytuesday) and also having a look at [Nicola Rennie's data visualisations](https://nrennie.rbind.io/tidytuesday-shiny-app/).\n\n2. **I found a supportive community**\n\nThe great thing about R is that it is free and open source. I believe this lends itself well to a culture of shared learning. When I joined the SU's [Coffee and Coding](https://the-strategy-unit.github.io/data_science/blogs/posts/2024-05-13_one-year-coffee-code/) sessions and [NHS-R community's Coffee and Code](https://nhsrway.nhsrcommunity.com/community-handbook.html#coffee-and-coding), I felt like a child asking very silly questions, but to my surprise, all of the people I have met have been keen to answer my questions. 
I learned to recognise and value the people in those communities who would encourage me and fellow learners by making time to answer our questions and help us learn.\n\n3. **I approached learning R like I would approach learning any other language**\n\nThis meant learning some of the key words and phrases, and getting exposure to the language in various ways: reading learning materials, watching tutorials, and spending time with people who were using it, and writing my own code. This had an incremental effect and over time, the more information I absorbed, the more familiar I became with the terminology.\n\n4. **I set myself a goal and structured my learning to help me reach it**\n\nIn my day job, I was working on a qualitative case study and wanted to illustrate my findings using geospatial and population density data in the form of a choropleth map. Unfortunately this was one of the most challenging tasks I could have chosen as an R novice, but luckily, I had kind mentors who both believed I could achieve the task and were also on hand to help me learn the skills I needed. So I set myself the goal of trying to learn how to create a choropleth map by the end of the year. This involved breaking the task down into steps, and learning skills which I could build on along the way. I celebrated my small wins, even the tiny ones, until I achieved the goals I set for myself.\n\n5. **I figured out how I learn best**\n\nThis involved watching tutorials on YouTube, working through books (such as [R for Data Science](https://r4ds.hadley.nz/) and [R for non-programmers](https://r4np.com/), trying out online coding courses, using search engines and forums, and asking my colleagues and mentors for advice about what resources I should look at as well as what to avoid.\n\nAlthough learning resources were plentiful, I faced some common barriers when trying to use them. 
Often tutorials were not always written in a way that I could reproduce the code or access the data they cited, or were written in very technical language, which meant that I had to go away and learn some key concepts to be able to understand them properly. Therefore an important part of the learning journey for me has been to gradually build up a vocabulary of words and concepts in Rstudio. This has enabled me to better understand what key concepts I need to learn, and to understand the content of any training materials or tutorials. I realised that chipping away at it, spending an hour here and there, several times a week, was the best approach for me specifically, with some bigger blocks of time set aside occasionally for more difficult tasks where I could just spend a couple of hours trying out different things or understanding the problem in more depth.\n\n6. **I applied what I was learning to real data**\n\nWhen I became more confident with trying out some packages and functions in R, I decided to find opportunities to apply my learning to real data. I practiced using the inbuilt datasets in Rstudio, the palmerpenguins dataset, and the datasets that were referred to in the books and learning resources I was using. For creating my choropleth maps, I then used data from the UK [Census](https://www.ons.gov.uk/census) as well as geographical data about local authority geographical boundaries. Applying my learning to real data was an essential step in learning some of the key data wrangling skills.\n\n7. **I embraced failure and started using it as a tool for learning**\n\nOver time I understood that failure is part of the learning journey, and a helpful tool for the learning process itself. If I could figure out what didn't work, that often gave me information about what had gone wrong. This was useful as it either pointed me towards what I needed to fix, or gave me the words and concepts I could look into to help me solve the problem. 
Sometimes the process of trying to learn different functions accidentally produced hilariously terrible results[^2]\n\n[^2]: This amused me greatly, as a fan of the [Terrible Maps social media pages](https://www.instagram.com/terriblemap/p/DCh2NhfB2JX/).\n\nAs well as providing some humour to contrast with the often frustrating process of learning to code, these failures also helped me to get unstuck. More often than not, they were a catalyst for problem-solving as they provided useful information about what specific aspect of the code had gone wrong, which would give me a clue about what I needed to look into to fix the problem.\n\n{fig-align=\"center\"}\n\n8. **I looked for inspiration to encourage me to keep going**\n\nOne of my worries about trying to learn R was that learning new things took more time, now I was years older than the last time I tried. But I was fairly confident that there must have been other people out there who had successfully learned how to code when they were my age or older. This led to a fascinating rabbit hole of learning about people who had successfully learned to code later in their life and the hidden history of [women in coding](https://www.codecademy.com/resources/blog/eniac-six-women-programmed-computer/). I bookmarked these stories so that I could revisit them on the days where I was having a difficult time understanding a particular concept or getting my code to work.\n\n9. **I *made it sew***\n\nThroughout my R learning journey, I have found that coding has been a useful conduit for my creativity, and similarly, my creative projects outside of work have been a catalyst for learning some key concepts related to coding[^3].\n\n[^3]: This has also worked the other way around, with my R learning journey helping me with learning new crafts. I have recently begun learning sewing and dressmaking. I have quickly found that the learning journey is just as intimidating, meticulous and complicated as it was for learning R. 
I have also unintentionally chosen a very complicated project for a beginner, which has resulted in a very steep learning curve and lots of failures and mistakes along the way. Throughout the process, I have applied some of the same principles as I did for learning coding. For example, one of the key parts of my journey of learning sewing and dressmaking has been the process of embracing and learning from failure. This has been essential both in terms of knowing what not to do next time, but also to learn how to fix mistakes, ideally early on in a practice situation (e.g. when creating a mock-up). Luckily there is a large community of supportive fellow learners and patient mentors, who are keen to help with fixing mistakes and to pass on their knowledge to new learners. I’m pleased to say, with a lot of help (and many failures) along the way, I did eventually manage to produce three choropleth maps and submitted them with the report late last year.\n\nI realised this a few months ago when my friend got me a beginner's embroidery kit, and as I followed the pattern and learned how to create the different types of embroidery stitch, I reflected that just like with the embroidery pattern I was working on, I needed to structure the coding for the map in [layers](https://ggplot2.tidyverse.org/reference/layer_geoms.html#:~:text=In%20ggplot2%2C%20a%20plot%20in,displayed%2C%20not%20what%20is%20displayed.). This led me to approach the process like I would for an art project[^4] to identify what I needed to do to adequately visualise both types of data that I wanted to include in the map.\n\n[^4]: Throughout the journey I have realised that thinking about the problem like an artist has been very helpful, because it allows me to use a similarly iterative approach. I wanted my choropleth maps to show both the population density and the underlying terrain when superimposed. 
To do this, I used the [colorbrewer2](https://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3) tool to test out different colour palettes, and changed the opacity and terrain to identify which colours would clearly to show the population data and the terrain underneath. The tool let me test this on an example map and showed me the hexadecimal colour codes for the colours in the palettes. Once I had found some combinations that would likely work for my particular map, I then iteratively adjusted the aesthetics in my R code until I found a combination that worked for my data. \n\n{fig-alt=\"Textile art piece showing a map with the letter R - for decorative purposes only\" fig-align=\"center\" width=\"384\"}\n\n10. **I started learning about how to stay involved in the community**\n\nAs I write this, it has been over a year since I re-started my R learning journey in earnest. Early on in the journey, I remember feeling overwhelmed by the kindness and helpfulness of the community. I decided to channel these feelings into learning as best I could, so that I could then pass the learning on. I was reminded of this when I attended the most recent [RPYSOC conference](https://nhsrcommunity.com/conference24.html) where I once again experienced the warm sense of collaboration and community in NHS-R and NHS.pycom. Therefore my aim for 2025 and beyond is to continue my R learning journey (and become more familiar with GitHub), so that I can give back to the wonderful communities that helped me to find my way.\n",
+ "supporting": [],
+ "filters": [
+ "rmarkdown/pagebreak.lua"
+ ],
+ "includes": {},
+ "engineDependencies": {},
+ "preserve": {},
+ "postProcess": true
+ }
+}
\ No newline at end of file
diff --git a/_freeze/presentations/2023-02-01_what-is-data-science/index/execute-results/html.json b/_freeze/presentations/2023-02-01_what-is-data-science/index/execute-results/html.json
new file mode 100644
index 00000000..ec469248
--- /dev/null
+++ b/_freeze/presentations/2023-02-01_what-is-data-science/index/execute-results/html.json
@@ -0,0 +1,15 @@
+{
+ "hash": "1f49c60f0198f354a7603787001870e0",
+ "result": {
+ "engine": "knitr",
+ "markdown": "---\ntitle: Everything you ever wanted to know about data science\nsubtitle: but were too afraid to ask\nauthor: \"[Chris Beeley](mailto:chris.beeley1@nhs.net)\"\ndate: 2023-08-02\ndate-format: \"MMM D, YYYY\"\nformat:\n revealjs:\n theme: [default, ../su_presentation.scss]\n transition: none\n chalkboard:\n buttons: false\n footer: |\n view slides at [the-strategy-unit.github.io/data_science/presentations][ds_presentations]\n preview-links: auto\n slide-number: false\n auto-animate: true\n---\n\n\n\n\n[ds_presentations]: https://the-strategy-unit.github.io/data_science/\n\n## What is data science?\n\n* \"A data scientist knows more about computer science than the average statistician, and more about statistics than the average computer scientist\"\n\n## Drew Conway's famous Venn diagram\n\n\n\n[Source](http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram)\n\n## Around the web...\n\n:::: {.columns}\n\n::: {.column width=\"50%\"}\n\n* [The difference between a statitician and a data scientist? About $30,000](https://www.reddit.com/r/datascience/comments/ndpft6/comment/gyccuo7/?utm_source=share&utm_medium=web2x&context=3)\n* [... an actual definition of data science. Taking a database and making it do something else.](https://chrisbeeley.net/?p=1495) (warning: this quote is me! 
:wink:)\n* Statistics done on a Mac\n\n:::\n\n::: {.column width=\"50%\"}\n\n\n\n:::\n\n::::\n\n## What are the skills of data science?\n\n* Analysis\n * ML\n * Stats\n * Data viz\n* Software engineering\n * Programming\n * SQL/ data\n * DevOps\n * RAP\n\n## What are the skills of data science?\n \n* Domain knowledge\n * Communication\n * Problem formulation\n * Dashboards and reports\n \n## ML\n\n\n\n[Source](http://www.nlcssteam.com/blog/machine-learning)\n\n## Inevitable XKCD\n\n:::: {.columns}\n\n::: {.column width=\"50%\"}\n\n\n\n[Source](https://xkcd.com/1838/)\n\n:::\n\n::: {.column width=\"50%\"}\n\n* [Google flu trends](https://www.wired.com/2015/10/can-learn-epic-failure-google-flu-trends/)\n\n:::\n\n::::\n\n\n## Stats and data viz\n\n* ML leans a bit more towards atheoretical prediction\n* Stats leans a bit more towards inference (but they both do both)\n* Data scientists may use different visualisations\n * Interactive web based tools\n * Dashboard based visualisers e.g. [{stminsights}](https://github.com/cschwem2er/stminsights)\n \n## Software engineering\n\n* Programming\n * No/ low code data science?\n* SQL/ data\n * Tend to use reproducible automated processes\n* DevOps\n * Plan, code, build, test, release, deploy, operate, monitor\n* RAP\n * I will come back to this\n\n## Domain knowledge\n\n* Do stuff that matters\n * The best minds of my generation are thinking about how to make people click ads. That sucks. 
[Jeffrey Hammerbacher](https://www.fastcompany.com/3008436/why-data-god-jeffrey-hammerbacher-left-facebook-found-cloudera)\n* Convince other people that it matters\n* This is the hardest part of data science\n* Communicate, communicate, communicate!\n* Many of you are expert at this\n\n## Reproducibility\n\n* Reproducibility in science\n* The [$6B spreadsheet error](https://baselinescenario.com/2013/02/09/the-importance-of-excel/)\n* [George Osbourne's austerity was based on a spreadsheet error](https://www.theguardian.com/politics/2013/apr/18/uncovered-error-george-osborne-austerity)\n* For us, reproducibility also means we can do the same analysis 50 times in one minute\n * Which is why I started down the road of data science\n \n## What is RAP\n\n* a process in which code is used to minimise manual, undocumented steps, and a clear, properly documented process is produced in code which can reliably give the same result from the same dataset\n* RAP should be:\n\n> the core working practice that must be supported by all platforms and teams; make this a core focus of NHS analyst training\n\n:::{.footer}\n[Goldacre review](https://www.gov.uk/government/publications/better-broader-safer-using-health-data-for-research-and-analysis)\n:::\n\n## Levels of RAP- Baseline\n\n* Data produced by code in an open-source language (e.g., Python, R, SQL).\n* Code is version controlled (see Git basics and using Git collaboratively guides).\n* Repository includes a README.md file (or equivalent) that clearly details steps a user must follow to reproduce the code\n* Code has been peer reviewed.\n* Code is published in the open and linked to & from accompanying publication (if relevant).\n\n:::{.footer}\n[Source: NHS Digital RAP community of practice](https://nhsdigital.github.io/rap-community-of-practice/introduction_to_RAP/levels_of_RAP/)\n:::\n\n## Levels of RAP- Silver\n\n* Code is well-documented...\n* Code is well-organised following standard directory format\n* Reusable 
functions and/or classes are used where appropriate\n* Pipeline includes a testing framework\n* Repository includes dependency information (e.g. requirements.txt, PipFile, environment.yml)\n* Data is handled and output in a [Tidy data](https://vita.had.co.nz/papers/tidy-data.pdf) format\n\n:::{.footer}\n[Source: NHS Digital RAP community of practice](https://nhsdigital.github.io/rap-community-of-practice/introduction_to_RAP/levels_of_RAP/)\n:::\n\n## Levels of RAP- Gold\n\n* Code is fully packaged\n* Repository automatically runs tests etc. via CI/CD or a different integration/deployment tool e.g. GitHub Actions\n* Process runs based on event-based triggers (e.g., new data in database) or on a schedule\n* Changes to the RAP are clearly signposted. E.g. a changelog in the package, releases etc. (See gov.uk info on Semantic Versioning)\n\n[Source: NHS Digital RAP community of practice](https://nhsdigital.github.io/rap-community-of-practice/introduction_to_RAP/levels_of_RAP/)\n\n## The data science \"Unicorn\"\n\n* The maybe-mythical data science \"Unicorn\" has mastered:\n * Domain knowledge\n * Stats and ML\n * Software engineering\n \n## Data science is a team sport\n\n* In my extended DS team I have:\n* Stats and DevOps (and rabble rousing) [this one is me :wink:]\n* SQL, data, and training\n* DevOps and programming\n* Text mining, Python, and APIs\n* Bilingual R/ Python, Shiny dashboards\n\n## Data science is an MMO\n\n* Data scientists need help with:\n * Stakeholder communication and engagement\n * Qualitative analysis\n * Translating models and prediction into the real world\n * Evidence review and problem definition\n\n## Data science is an MMO\n\n* Data scientists are an excellent help when you:\n * Need a lot of pretty graphs\n * Need the same analysis done 50+ times with different data\n * Have too much text and not enough time to analyse it\n * Want to carefully document your analysis and make it reproducible\n * Have a hideously messy, large dataset that 
you can't hack together yourself\n \n## The team\n\n* We will be organising code review and pair coding sessions\n* We will be running coffee and coding sessions\n* We can be relied on to get very excited about thorny data problems, especially if they involve:\n * Drawing pretty graphs\n * NHS-R and other communities and events\n * Spending long hours in a bunker writing open source code\n * Processing text\n * Documenting and version controlling analyses\n\n## Note\n\nAll copyrighted material is reused under [Fair Dealing](https://www.gov.uk/guidance/exceptions-to-copyright#fair-dealing)\n",
+ "supporting": [],
+ "filters": [
+ "rmarkdown/pagebreak.lua"
+ ],
+ "includes": {},
+ "engineDependencies": {},
+ "preserve": {},
+ "postProcess": true
+ }
+}
\ No newline at end of file
diff --git a/_freeze/presentations/2023-02-23_coffee-and-coding/index/execute-results/html.json b/_freeze/presentations/2023-02-23_coffee-and-coding/index/execute-results/html.json
new file mode 100644
index 00000000..cea0bdce
--- /dev/null
+++ b/_freeze/presentations/2023-02-23_coffee-and-coding/index/execute-results/html.json
@@ -0,0 +1,15 @@
+{
+ "hash": "0f874769ceba05f26d0461e9d465e78a",
+ "result": {
+ "engine": "knitr",
+ "markdown": "---\ntitle: Coffee and coding\nsubtitle: Intro session\nauthor: \"[Chris Beeley](mailto:chris.beeley1@nhs.net)\"\ndate: 2023-02-23\ndate-format: \"MMM D, YYYY\"\nformat:\n revealjs:\n theme: [default, ../su_presentation.scss]\n transition: none\n chalkboard:\n buttons: false\n footer: |\n view slides at [the-strategy-unit.github.io/data_science/presentations][ds_presentations]\n preview-links: auto\n slide-number: false\n auto-animate: true\n---\n\n\n\n\n[ds_presentations]: https://the-strategy-unit.github.io/data_science/\n\n## Welcome to coffee and coding\n\n* Project demos, showcasing work from a particular project \n* Method demos, showcasing how to use a particular method/tool/package \n* Surgery and problem solving sessions\n* Defining code standards and SOP\n\n## What are we trying to achieve?\n\n* Legibility\n* Reproducibility\n* Accuracy\n* Laziness\n\n## What are some of the fundamental principles?\n\n* Predictability, reducing mental load, and reducing truck factor\n* Making it easy to collaborate with yourself and others on different computers, in the cloud, in six months' time...\n* DRY\n\n## What is RAP\n\n* a process in which code is used to minimise manual, undocumented steps, and a clear, properly documented process is produced in code which can reliably give the same result from the same dataset\n* RAP should be:\n\n> the core working practice that must be supported by all platforms and teams; make this a core focus of NHS analyst training\n\n[Goldacre review](https://www.gov.uk/government/publications/better-broader-safer-using-health-data-for-research-and-analysis)\n\n## The road to RAP\n\n* We're roughly using NHS Digital's RAP stages\n* There is an incredibly large amount to learn!\n* Confession time! (everything I do not know...)\n* You don't need to do it all at once\n* You don't need to do it all at all ever\n* Each thing you learn will incrementally help you\n* Remember- that's why we learnt this stuff. Because it helped us. 
And it can help you too\n\n## Levels of RAP- Baseline\n\n* Data produced by code in an open-source language (e.g., Python, R, SQL).\n* Code is version controlled (see Git basics and using Git collaboratively guides).\n* Repository includes a README.md file (or equivalent) that clearly details steps a user must follow to reproduce the code\n* Code has been peer reviewed.\n* Code is published in the open and linked to & from accompanying publication (if relevant).\n\n[Source: NHS Digital RAP community of practice](https://nhsdigital.github.io/rap-community-of-practice/introduction_to_RAP/levels_of_RAP/)\n\n## Levels of RAP- Silver\n\n* Code is well-documented...\n* Code is well-organised following standard directory format\n* Reusable functions and/or classes are used where appropriate\n* Pipeline includes a testing framework\n* Repository includes dependency information (e.g. requirements.txt, PipFile, environment.yml)\n* Data is handled and output in a [Tidy data](https://vita.had.co.nz/papers/tidy-data.pdf) format\n\n[Source: NHS Digital RAP community of practice](https://nhsdigital.github.io/rap-community-of-practice/introduction_to_RAP/levels_of_RAP/)\n\n## Levels of RAP- Gold\n\n* Code is fully packaged\n* Repository automatically runs tests etc. via CI/CD or a different integration/deployment tool e.g. GitHub Actions\n* Process runs based on event-based triggers (e.g., new data in database) or on a schedule\n* Changes to the RAP are clearly signposted. E.g. a changelog in the package, releases etc. 
(See gov.uk info on Semantic Versioning)\n\n[Source: NHS Digital RAP community of practice](https://nhsdigital.github.io/rap-community-of-practice/introduction_to_RAP/levels_of_RAP/)\n\n\n## A learning journey to get us there\n\n* Code style, organising your files\n* Functions and iteration\n* Git and GitHub\n* Packaging your code\n* Testing\n* Package management and versioning\n\n## How we can help each other get there\n\n* Work as a team!\n* Coffee and coding!\n* Ask for help!\n* Do pair coding!\n* Get your code reviewed!\n* Join the NHS-R/ NHSPycom communities\n\n",
+ "supporting": [],
+ "filters": [
+ "rmarkdown/pagebreak.lua"
+ ],
+ "includes": {},
+ "engineDependencies": {},
+ "preserve": {},
+ "postProcess": true
+ }
+}
\ No newline at end of file
diff --git a/_freeze/presentations/2023-03-09_midlands-analyst-rap/index/execute-results/html.json b/_freeze/presentations/2023-03-09_midlands-analyst-rap/index/execute-results/html.json
new file mode 100644
index 00000000..e8e1b884
--- /dev/null
+++ b/_freeze/presentations/2023-03-09_midlands-analyst-rap/index/execute-results/html.json
@@ -0,0 +1,15 @@
+{
+ "hash": "50cdc4f7ea1b7e3900f63feeab13a16e",
+ "result": {
+ "engine": "knitr",
+ "markdown": "---\ntitle: RAP\nsubtitle: what is it and how can my team start using it effectively?\nauthor: \"[Chris Beeley](mailto:chris.beeley1@nhs.net)\"\ndate: 2023-03-09\ndate-format: \"MMM D, YYYY\"\nformat:\n revealjs:\n theme: [default, ../su_presentation.scss]\n transition: none\n chalkboard:\n buttons: false\n footer: |\n view slides at [the-strategy-unit.github.io/data_science/presentations][ds_presentations]\n preview-links: auto\n slide-number: false\n auto-animate: true\n---\n\n\n\n\n[ds_presentations]: https://the-strategy-unit.github.io/data_science/\n\n## What is RAP\n\n* a process in which code is used to minimise manual, undocumented steps, and a clear, properly documented process is produced in code which can reliably give the same result from the same dataset\n* RAP should be:\n\n> the core working practice that must be supported by all platforms and teams; make this a core focus of NHS analyst training\n\n[Goldacre review](https://www.gov.uk/government/publications/better-broader-safer-using-health-data-for-research-and-analysis)\n\n## What are we trying to achieve?\n\n* Legibility\n* Reproducibility\n* Accuracy\n* Laziness\n\n## What are some of the fundamental principles?\n\n* Predictability, reducing mental load, and reducing truck factor\n* Making it easy to collaborate with yourself and others on different computers, in the cloud, in six months' time...\n* DRY\n\n## The road to RAP\n\n* We're roughly using NHS Digital's RAP stages\n* There is an incredibly large amount to learn!\n* Confession time! (everything I do not know...)\n* You don't need to do it all at once\n* You don't need to do it all at all ever\n* Each thing you learn will incrementally help you\n* Remember- that's why we learnt this stuff. Because it helped us. 
And it can help you too\n\n## Levels of RAP- Baseline\n\n* Data produced by code in an open-source language (e.g., Python, R, SQL).\n* Code is version controlled (see Git basics and using Git collaboratively guides).\n* Repository includes a README.md file (or equivalent) that clearly details steps a user must follow to reproduce the code\n* Code has been peer reviewed.\n* Code is published in the open and linked to & from accompanying publication (if relevant).\n\n[Source: NHS Digital RAP community of practice](https://nhsdigital.github.io/rap-community-of-practice/introduction_to_RAP/levels_of_RAP/)\n\n## Levels of RAP- Silver\n\n* Code is well-documented...\n* Code is well-organised following standard directory format\n* Reusable functions and/or classes are used where appropriate\n* Pipeline includes a testing framework\n* Repository includes dependency information (e.g. requirements.txt, PipFile, environment.yml)\n* Data is handled and output in a [Tidy data](https://vita.had.co.nz/papers/tidy-data.pdf) format\n\n[Source: NHS Digital RAP community of practice](https://nhsdigital.github.io/rap-community-of-practice/introduction_to_RAP/levels_of_RAP/)\n\n## Levels of RAP- Gold\n\n* Code is fully packaged\n* Repository automatically runs tests etc. via CI/CD or a different integration/deployment tool e.g. GitHub Actions\n* Process runs based on event-based triggers (e.g., new data in database) or on a schedule\n* Changes to the RAP are clearly signposted. E.g. a changelog in the package, releases etc. 
(See gov.uk info on Semantic Versioning)\n\n[Source: NHS Digital RAP community of practice](https://nhsdigital.github.io/rap-community-of-practice/introduction_to_RAP/levels_of_RAP/)\n\n\n## A learning journey to get you there\n\n* Code style, organising your files\n* Functions and iteration\n* Git and GitHub\n* Packaging your code\n* Testing\n* Package management and versioning\n\n## How we can help each other get there\n\n* Work as a team!\n* Coffee and coding!\n* Ask for help!\n* Do pair coding!\n* Get your code reviewed!\n* Join the NHS-R/ NHSPycom communities\n\n## HACA\n\n* The first national analytics conference for health and care\n* Insight to action!\n* July 11th and 12th, University of Birmingham\n* Accepting abstracts for short and long talks and posters\n* Abstract deadline 27th March\n* Help is available (with abstract, poster, preparing presentation...)!\n",
+ "supporting": [],
+ "filters": [
+ "rmarkdown/pagebreak.lua"
+ ],
+ "includes": {},
+ "engineDependencies": {},
+ "preserve": {},
+ "postProcess": true
+ }
+}
\ No newline at end of file
diff --git a/_freeze/presentations/2023-03-23_collaborative-working/index/execute-results/html.json b/_freeze/presentations/2023-03-23_collaborative-working/index/execute-results/html.json
new file mode 100644
index 00000000..7f01e66f
--- /dev/null
+++ b/_freeze/presentations/2023-03-23_collaborative-working/index/execute-results/html.json
@@ -0,0 +1,15 @@
+{
+ "hash": "079480e23c4477f75a48ee14704b94a3",
+ "result": {
+ "engine": "knitr",
+ "markdown": "---\ntitle: \"Collaborative working\"\nauthor: \"[Chris Beeley](mailto:chris.beeley1@nhs.net)\"\ndate: 2023-03-23\ndate-format: \"MMM D, YYYY\"\nformat:\n revealjs:\n theme: [default, ../su_presentation.scss]\n transition: none\n chalkboard:\n buttons: false\n footer: |\n view slides at [the-strategy-unit.github.io/data_science/presentations][ds_presentations]\n preview-links: auto\n slide-number: false\n auto-animate: true\n---\n\n\n\n\n[ds_presentations]: https://the-strategy-unit.github.io/data_science/\n\n## Introduction\n\n* This is definitely an art and not a science\n* I do not claim to have all, or even most of, the answers\n* How you use these tools is way more important than the tools themselves\n* This is a culture and not a technique\n\n## Costs\n\n* Delay and time\n* Stress and disagreement\n* Committee thinking\n* Learning and effort\n\n## Benefits\n\n* \"From each according to their ability\"\n* Learning\n* Reproducibility and reduced truck factor\n* Fun!\n\n## GitHub as an organising principle behind work\n\n* A project is just a set of milestones\n* A milestone is just a set of issues\n* An issue is just a set of commits\n* A commit is just text added and removed\n\n## The repo owner\n\n* Review milestones\n* Review issues\n * Discuss the issue on the issue- NOT on email!\n* Review pull requests and get your pull requests reviewed!\n\n## Asynchronous communication\n\n* Involve others *before* you pull request\n* Involve others *when* you pull request\n* Read issues!\n* Comment on issues!\n* File issues- suggestions/ bug reports/ questions\n * NOT in emails\n\n## Asynchronous work\n\n* Every piece of work has an issue associated with it\n* Every piece of work associated with an issue lives on its own branch\n* Every branch is incorporated into the main repo by a pull request\n* Every pull request is reviewed\n\n## Iteration and documentation\n\n* Analyse early, analyse often (using RAPs!)\n* Write down what you did\n* Write down what 
you did but then changed your mind about\n* Favour Quarto/ RMarkdown\n * Clean sessions\n * Documentation and graphics\n\n## Data and .gitignore\n\n* Your repo needs to be reproducible but also needs to be safe\n* The main branch should be reproducible by anyone at any time\n * Document package dependencies (using renv)\n * Document data loads if the data isn't in the repo\n\n",
+ "supporting": [],
+ "filters": [
+ "rmarkdown/pagebreak.lua"
+ ],
+ "includes": {},
+ "engineDependencies": {},
+ "preserve": {},
+ "postProcess": true
+ }
+}
\ No newline at end of file
diff --git a/_freeze/presentations/2023-05-15_text-mining/index/execute-results/html.json b/_freeze/presentations/2023-05-15_text-mining/index/execute-results/html.json
new file mode 100644
index 00000000..8286685c
--- /dev/null
+++ b/_freeze/presentations/2023-05-15_text-mining/index/execute-results/html.json
@@ -0,0 +1,15 @@
+{
+ "hash": "9c8dbc2c89696f1394402c4e580d33fd",
+ "result": {
+ "engine": "knitr",
+ "markdown": "---\ntitle: \"Text mining of patient experience data\"\nauthor: \"[Chris Beeley](mailto:chris.beeley1@nhs.net)\"\ndate: 2023-05-15\ndate-format: \"MMM D, YYYY\"\nformat:\n revealjs:\n theme: [default, ../su_presentation.scss]\n transition: none\n chalkboard:\n buttons: false\n footer: |\n view slides at [the-strategy-unit.github.io/data_science/presentations][ds_presentations]\n preview-links: auto\n slide-number: false\n auto-animate: true\n---\n\n\n\n\n[ds_presentations]: https://the-strategy-unit.github.io/data_science/\n\n## Patient experience\n\n* The NHS collects a lot of patient experience data\n* Rate the service 1-5 (Very poor... Excellent) but also give written feedback\n * \"Parking was difficult\"\n * \"Doctor was rude\"\n * \"You saved my life\"\n* Many organisations lack the staffing to read all of the feedback in a systematic way\n\n## Text mining\n\n* We have built an algorithm to read it\n * Theme\n * \"Criticality\"\n* Fits alongside other work happening within NHSE\n * A framework for understanding patient experience\n\n## Patient experience 101\n\n* Tick box scoring is not useful (or accurate)\n* Text based data is _complex_ and built on _human experience_\n* We're not making word clouds!\n* We're not classifying movie reviews or Reddit posts\n* The tool should enhance, not replace, human understanding\n* \"A recommendation engine for feedback data\"\n\n## Everything open, all the time\n\n* This project was [coded in the open](https://transform.england.nhs.uk/key-tools-and-info/digital-playbooks/open-source-digital-playbook/cdu-data-science-team/) and is MIT licensed\n* Engage with the organisations as we find them\n * Do they want code or a docker image?\n * Do they want to fetch their own themes from an API?\n * Do they want to use our dashboard?\n \n## Phase 1\n\n* 10 categories and moderate performance on criticality analysis\n* scikit-learn\n* Shiny\n* Reticulate\n* R package of Python code\n\n## Golem all the things!\n\n* 
Opinionated way of building Shiny\n* Allows flexibility in deployed versions using YAML\n* Agnostic to deployment\n* Emphasises dependency management and testing\n* Separate \"reactive\" and \"business\" logic (see the [accompanying book](https://engineering-shiny.org/))\n\n## Phase 2\n\n* 30-50 categories and excellent criticality performance\n* scikit-learn/ BERT\n* More Shiny\n* Separate the code bases\n* FastAPI\n* Inspired by the [Royal College of Paediatrics and Child Health API](https://www.rcpch.ac.uk/resources/growth-charts/digital)\n* [Documentation, documentation, documentation](https://github.com/CDU-data-science-team/PatientExperience-QDC)\n\n## Making it useful\n\n* Accurately rating low frequency categories\n* Per category precision and recall\n* Speed versus accuracy\n* Representing the thematic structure\n\n## The future\n\n* Off the shelf, proprietary data collection systems dominate\n* They often offer bundled analytic products of low quality\n* The DS team can't and doesn't want to offer a complete data system\n* How can we best contribute to improving patient experience for patients in the NHS?\n * If the patient experience data won't come to the mountain...\n\n## Open source FTW!\n\n* Often individuals in the NHS don't want private companies to \"benefit\" from open code\n* But if they make their products better with open code the patients win\n* [Best practice as code](https://www.rcpch.ac.uk/news-events/news/royal-colleges-30-best-practice-code)\n\n## The projects\n\n* https://github.com/CDU-data-science-team/pxtextmining\n* https://github.com/CDU-data-science-team/experiencesdashboard\n* https://github.com/CDU-data-science-team/PatientExperience-QDC\n\n## The team\n\n* YiWen Hon (Python & Machine learning)\n* Oluwasegun Apejoye (Shiny)\n\nContact:\n\n* chris.beeley1@nhs.net\n* https://fosstodon.org/@chrisbeeley\n",
+ "supporting": [],
+ "filters": [
+ "rmarkdown/pagebreak.lua"
+ ],
+ "includes": {},
+ "engineDependencies": {},
+ "preserve": {},
+ "postProcess": true
+ }
+}
\ No newline at end of file
diff --git a/_freeze/presentations/2023-05-23_data-science-for-good/index/execute-results/html.json b/_freeze/presentations/2023-05-23_data-science-for-good/index/execute-results/html.json
new file mode 100644
index 00000000..f8582ee0
--- /dev/null
+++ b/_freeze/presentations/2023-05-23_data-science-for-good/index/execute-results/html.json
@@ -0,0 +1,15 @@
+{
+ "hash": "5c6d09c617b0b6b60d7e196c6062f13d",
+ "result": {
+ "engine": "knitr",
+ "markdown": "---\ntitle: \"What good data science looks like\"\nauthor: \"[Chris Beeley](mailto:chris.beeley1@nhs.net)\"\ndate: 2023-05-23\ndate-format: \"MMM D, YYYY\"\nformat:\n revealjs:\n theme: [default, ../su_presentation.scss]\n transition: none\n chalkboard:\n buttons: false\n footer: |\n view slides at [the-strategy-unit.github.io/data_science/presentations][ds_presentations]\n width: 1920\n height: 1080\n preview-links: auto\n slide-number: false\n auto-animate: true\n---\n\n\n\n\n[ds_presentations]: https://the-strategy-unit.github.io/data_science/\n\n## Patient experience\n\n* The NHS collects a lot of patient experience data\n* Rate the service 1-5 (Very poor... Excellent) but also give written feedback\n * \"Parking was difficult\"\n * \"Doctor was rude\"\n * \"You saved my life\"\n* Many organisations lack the staffing to read all of the feedback in a systematic way\n* Produce an algorithm to rate theme and \"criticality\"\n\n## Help people to do their jobs\n\n* Text based data is _complex_ and built on _human experience_\n* The tool should enhance, not replace, human understanding\n* Enhancing search and filtering\n * If they read 100 comments today, which should they read?\n* \"A recommendation engine for feedback data\"\n\n## Reflect what users want\n\n* I have worked with this data since before it existed \n* I came to realise that people were struggling to read all of their data \n* Fits alongside other work happening within NHSE\n * A framework for understanding patient experience\n\n## Useful\n\n* A fundamental principle is that everyone can use it\n* If you can run the code, run it\n* If you can use the API, use it \n* If you just want the dashboard, use it\n* Credit to the [growth charts API](https://www.rcpch.ac.uk/resources/growth-charts/digital/about)\n\n## Understandable\n\n* Tuned to the users' needs\n* Not simply tuning accuracy scores\n* Look at the type of mistake the model is making\n* Look at the category it's predicting\n * We can 
lose a few common unimportant categories \n * We need to get every rare and important category\n \n## Iterative\n\n* Year one\n * 10 categories\n * Moderate criticality performance\n * No deep learning\n * Weak dashboard\n * Positive evaluation\n \n## Iterative\n\n* Year two\n * 30-50 categories\n * Strong criticality performance\n * Deep learning\n * Improved dashboard\n * WIP\n* Overall five minor versions of algorithm and seven of dashboard\n\n## Documented\n\n* We've documented in the way you usually would\n* We were asked in year 1 to provide plain English documentation\n* We made [a website](https://cdu-data-science-team.github.io/PatientExperience-QDC/) with all the product details\n\n## Develop skills of the staff, technical and otherwise\n\n* Year one created a Python programmer\n* Year two created an R/ Shiny programmer\n* The team has learned: \n * Static website generation\n * Text cleaning/ searching/ mining\n * Collaborative coding practices\n * Working with and communicating with users\n * Linux, databases, APIs...\n \n## Benefits from, and benefits, the community\n\n](golem.png)\n\n## Benefits from, and benefits, the community\n\n* We benefit and benefit from\n * NHS-R\n * NHS-Pycom\n * Government Digital Service\n * Colleagues and friends\n\n## Open and reproducible\n\n* Off the shelf, proprietary data collection systems dominate\n* They often offer bundled analytic products of low quality\n* The DS team can't and doesn't want to offer a complete data system\n* How can we best contribute to improving patient experience for patients in the NHS?\n * If the patient experience data won't come to the mountain...\n\n## Open source FTW!\n\n* Often individuals in the NHS don't want private companies to \"benefit\" from open code\n* But if they make their products better with open code the patients win\n* [Best practice as code](https://www.rcpch.ac.uk/news-events/news/royal-colleges-30-best-practice-code)\n\n## Fun!\n\n* Combing through spreadsheets 
looking for one comment is not fun\n* Doing things the same way you did them last year is not fun\n* Trying to implement a project that is too complicated is not fun\n\n \n\n* Working with a diverse team with different skills is fun\n* Accessing high quality documentation to understand a project better is fun*\n\n## Team and code\n\n* Andreas Soteriades (Y1)\n* YiWen Hon, Oluwasegun Apejoye (Y2)\n\n \n\n* [pxtextmining](https://github.com/CDU-data-science-team/pxtextmining)\n* [experiencesdashboard](https://github.com/CDU-data-science-team/experiencesdashboard)\n* [Documentation](https://github.com/CDU-data-science-team/PatientExperience-QDC)\n\n
\n\n* chris.beeley1@nhs.net\n* https://fosstodon.org/@chrisbeeley\n",
+ "supporting": [],
+ "filters": [
+ "rmarkdown/pagebreak.lua"
+ ],
+ "includes": {},
+ "engineDependencies": {},
+ "preserve": {},
+ "postProcess": true
+ }
+}
\ No newline at end of file
diff --git a/_freeze/presentations/2023-07-11_haca-nhp-demand-model/index/figure-html/unnamed-chunk-6-1.png b/_freeze/presentations/2023-07-11_haca-nhp-demand-model/index/figure-html/unnamed-chunk-6-1.png
new file mode 100644
index 00000000..7825e69f
Binary files /dev/null and b/_freeze/presentations/2023-07-11_haca-nhp-demand-model/index/figure-html/unnamed-chunk-6-1.png differ
diff --git a/_freeze/presentations/2023-07-11_haca-nhp-demand-model/index/figure-html/unnamed-chunk-7-1.png b/_freeze/presentations/2023-07-11_haca-nhp-demand-model/index/figure-html/unnamed-chunk-7-1.png
new file mode 100644
index 00000000..2423f59b
Binary files /dev/null and b/_freeze/presentations/2023-07-11_haca-nhp-demand-model/index/figure-html/unnamed-chunk-7-1.png differ
diff --git a/_freeze/presentations/2023-07-11_haca-nhp-demand-model/index/figure-html/unnamed-chunk-8-1.png b/_freeze/presentations/2023-07-11_haca-nhp-demand-model/index/figure-html/unnamed-chunk-8-1.png
new file mode 100644
index 00000000..c55c7d49
Binary files /dev/null and b/_freeze/presentations/2023-07-11_haca-nhp-demand-model/index/figure-html/unnamed-chunk-8-1.png differ
diff --git a/_freeze/presentations/2023-08-02_mlcsu-ksn-meeting/index/execute-results/html.json b/_freeze/presentations/2023-08-02_mlcsu-ksn-meeting/index/execute-results/html.json
new file mode 100644
index 00000000..cc03aca7
--- /dev/null
+++ b/_freeze/presentations/2023-08-02_mlcsu-ksn-meeting/index/execute-results/html.json
@@ -0,0 +1,15 @@
+{
+ "hash": "3af94e31c4288f3e980d7b976637a5d9",
+ "result": {
+ "engine": "knitr",
+ "markdown": "---\ntitle: Travels with R and Python\nsubtitle: the power of data science in healthcare\nauthor: \"[Chris Beeley](mailto:chris.beeley1@nhs.net)\"\ndate: 2023-08-02\ndate-format: \"MMM D, YYYY\"\nformat:\n revealjs:\n theme: [default, ../su_presentation.scss]\n transition: none\n chalkboard:\n buttons: false\n footer: |\n view slides at [the-strategy-unit.github.io/data_science/presentations][ds_presentations]\n preview-links: auto\n slide-number: false\n auto-animate: true\n---\n\n\n\n\n[ds_presentations]: https://the-strategy-unit.github.io/data_science/presentations/2023-08-02_mlcsu-ksn-meeting\n\n## What is data science?\n\n* \"A data scientist knows more about computer science than the average statistician, and more about statistics than the average computer scientist\"\n\n[(Josh Wills, a former head of data engineering at Slack)](https://medium.com/odscjournal/data-scientists-versus-statisticians-8ea146b7a47f)\n\n## Drew Conway's famous Venn diagram\n\n{fig-alt=\"Data science Venn diagram\" fig-align=\"center\"}\n\n[Source](http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram)\n\n## What are the skills of data science?\n\n* Analysis\n * ML\n * Stats\n * Data viz\n* Software engineering\n * Programming\n * SQL/ data\n * DevOps\n * RAP\n\n## What are the skills of data science?\n \n* Domain knowledge\n * Communication\n * Problem formulation\n * Dashboards and reports\n\n## Stats and data viz\n\n* ML leans a bit more towards atheoretical prediction\n* Stats leans a bit more towards inference (but they both do both)\n* Data scientists may use different visualisations\n * Interactive web based tools\n * Dashboard based visualisers e.g. 
[{stminsights}](https://github.com/cschwem2er/stminsights)\n \n## Software engineering\n\n* Programming\n * No/ low code data science?\n* SQL/ data\n * Tend to use reproducible automated processes\n* DevOps\n * Plan, code, build, test, release, deploy, operate, monitor\n* RAP\n * I will come back to this\n\n## Domain knowledge\n\n* Do stuff that matters\n * The best minds of my generation are thinking about how to make people click ads. That sucks. [Jeffrey Hammerbacher](https://www.fastcompany.com/3008436/why-data-god-jeffrey-hammerbacher-left-facebook-found-cloudera)\n* Convince other people that it matters\n* This is the hardest part of data science\n\n## RAP\n\n* Data science isn't RAP\n* RAP isn't data science\n* They are firm friends\n\n## Reproducibility\n\n* Reproducibility in science\n* The [$6B spreadsheet error](https://baselinescenario.com/2013/02/09/the-importance-of-excel/)\n* [George Osborne's austerity was based on a spreadsheet error](https://www.theguardian.com/politics/2013/apr/18/uncovered-error-george-osborne-austerity)\n* For us, reproducibility also means we can do the same analysis 50 times in one minute\n * Which is why I started down the road of data science\n \n## What is RAP\n\n* a process in which code is used to minimise manual, undocumented steps, and a clear, properly documented process is produced in code which can reliably give the same result from the same dataset\n* RAP should be:\n\n> the core working practice that must be supported by all platforms and teams; make this a core focus of NHS analyst training\n\n:::{.footer}\n[Goldacre review](https://www.gov.uk/government/publications/better-broader-safer-using-health-data-for-research-and-analysis)\n:::\n\n## Levels of RAP- Baseline\n\n* Data produced by code in an open-source language (e.g., Python, R, SQL)\n* Code is version controlled\n* Repository includes a README.md file that clearly details steps a user must follow to reproduce the code\n* Code has been peer reviewed\n* 
Code is published in the open and linked to & from accompanying publication (if relevant)\n\n:::{.footer}\n[Source: NHS Digital RAP community of practice](https://nhsdigital.github.io/rap-community-of-practice/introduction_to_RAP/levels_of_RAP/)\n:::\n\n## Levels of RAP- Silver\n\n* Code is well-documented...\n* Code is well-organised following standard directory format\n* Reusable functions and/or classes are used where appropriate\n* Pipeline includes a testing framework\n* Repository includes dependency information (e.g. requirements.txt, PipFile, environment.yml)\n* Data is handled and output in a [Tidy data](https://vita.had.co.nz/papers/tidy-data.pdf) format\n\n:::{.footer}\n[Source: NHS Digital RAP community of practice](https://nhsdigital.github.io/rap-community-of-practice/introduction_to_RAP/levels_of_RAP/)\n:::\n\n## Levels of RAP- Gold\n\n* Code is fully packaged\n* Repository automatically runs tests etc. via CI/CD or a different integration/deployment tool e.g. GitHub Actions\n* Process runs based on event-based triggers (e.g., new data in database) or on a schedule\n* Changes to the RAP are clearly signposted. E.g. a changelog in the package, releases etc. 
(See gov.uk info on Semantic Versioning)\n\n:::{.footer}\n[Source: NHS Digital RAP community of practice](https://nhsdigital.github.io/rap-community-of-practice/introduction_to_RAP/levels_of_RAP/)\n:::\n\n## Data science in healthcare\n\n* Forecasting\n * Stats versus ML\n* Text mining\n * R versus Python\n* Demand modelling\n * DevOps as a way of life\n \n## Get involved!\n\n* [NHS-R community](https://nhsrcommunity.com/)\n * Webinars, training, conference, Slack\n* [NHS Pycom](https://nhs-pycom.net/)\n * ditto...\n* MLCSU GitHub?\n* Build links with the other CSUs\n\n## Contact\n\n::: {.columns}\n\n:::: {.column}\n::: {.no-bullets}\n- {{< fa envelope >}} [strategy.unit@nhs.net](mailto:strategy.unit@nhs.net)\n- {{< fa brands github size=1x >}} [The-Strategy-Unit](https://github.com/The-Strategy-Unit)\n:::\n::::\n\n:::: {.column}\n::: {.no-bullets}\n- {{< fa envelope >}} [chris.beeley1@nhs.net](mailto:chris.beeley1@nhs.net)\n- {{< fa brands github size=1x >}} [chrisbeeley](https://github.com/ChrisBeeley)\n:::\n::::\n\n:::\n\n\n\n",
+ "supporting": [],
+ "filters": [
+ "rmarkdown/pagebreak.lua"
+ ],
+ "includes": {},
+ "engineDependencies": {},
+ "preserve": {},
+ "postProcess": true
+ }
+}
\ No newline at end of file
diff --git a/_freeze/presentations/2023-08-24_coffee-and-coding_geospatial/index/figure-html/icb-metrics-1.png b/_freeze/presentations/2023-08-24_coffee-and-coding_geospatial/index/figure-html/icb-metrics-1.png
new file mode 100644
index 00000000..83e7f090
Binary files /dev/null and b/_freeze/presentations/2023-08-24_coffee-and-coding_geospatial/index/figure-html/icb-metrics-1.png differ
diff --git a/_freeze/presentations/2023-08-24_coffee-and-coding_geospatial/index/figure-html/icb-plot-1.png b/_freeze/presentations/2023-08-24_coffee-and-coding_geospatial/index/figure-html/icb-plot-1.png
new file mode 100644
index 00000000..19066e33
Binary files /dev/null and b/_freeze/presentations/2023-08-24_coffee-and-coding_geospatial/index/figure-html/icb-plot-1.png differ
diff --git a/_freeze/presentations/2023-08-24_coffee-and-coding_geospatial/index/figure-html/london-icbs-1-1.png b/_freeze/presentations/2023-08-24_coffee-and-coding_geospatial/index/figure-html/london-icbs-1-1.png
new file mode 100644
index 00000000..1c8636fd
Binary files /dev/null and b/_freeze/presentations/2023-08-24_coffee-and-coding_geospatial/index/figure-html/london-icbs-1-1.png differ
diff --git a/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-11-1.png b/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-11-1.png
new file mode 100644
index 00000000..806e095f
Binary files /dev/null and b/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-11-1.png differ
diff --git a/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-11-2.png b/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-11-2.png
new file mode 100644
index 00000000..528fa13a
Binary files /dev/null and b/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-11-2.png differ
diff --git a/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-11-3.png b/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-11-3.png
new file mode 100644
index 00000000..28deb11e
Binary files /dev/null and b/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-11-3.png differ
diff --git a/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-11-4.png b/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-11-4.png
new file mode 100644
index 00000000..0d4ca455
Binary files /dev/null and b/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-11-4.png differ
diff --git a/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-11-5.png b/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-11-5.png
new file mode 100644
index 00000000..8dd8e53d
Binary files /dev/null and b/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-11-5.png differ
diff --git a/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-11-6.png b/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-11-6.png
new file mode 100644
index 00000000..f612070e
Binary files /dev/null and b/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-11-6.png differ
diff --git a/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-3-1.png b/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-3-1.png
new file mode 100644
index 00000000..806e095f
Binary files /dev/null and b/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-3-1.png differ
diff --git a/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-4-1.png b/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-4-1.png
new file mode 100644
index 00000000..806e095f
Binary files /dev/null and b/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-4-1.png differ
diff --git a/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-6-1.png b/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-6-1.png
new file mode 100644
index 00000000..806e095f
Binary files /dev/null and b/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-6-1.png differ
diff --git a/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-9-1.png b/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-9-1.png
new file mode 100644
index 00000000..528fa13a
Binary files /dev/null and b/_freeze/presentations/2023-09-07_coffee_and_coding_functions/index/figure-html/unnamed-chunk-9-1.png differ
diff --git a/_freeze/presentations/2023-10-17_conference-check-in-app/index/execute-results/html.json b/_freeze/presentations/2023-10-17_conference-check-in-app/index/execute-results/html.json
new file mode 100644
index 00000000..bf77202f
--- /dev/null
+++ b/_freeze/presentations/2023-10-17_conference-check-in-app/index/execute-results/html.json
@@ -0,0 +1,15 @@
+{
+ "hash": "9c9d75d48c388ca7332f5f3afb62a6d7",
+ "result": {
+ "engine": "knitr",
+ "markdown": "---\ntitle: \"Conference Check-in App\"\nsubtitle: \"NHS-R/NHS.pycom 2023\"\nauthor: \"[Tom Jemmett](mailto:thomas.jemmett@nhs.net)\"\ndate: 2023-10-17\ndate-format: \"MMM D, YYYY\"\nformat:\n revealjs:\n theme: [default, ../su_presentation.scss]\n transition: none\n chalkboard:\n buttons: false\n preview-links: auto\n slide-number: false\n auto-animate: true\n footer: |\n view slides at [https://tinyurl.com/nhsr23tj][ds_presentations]\n---\n\n\n\n\n[ds_presentations]: https://the-strategy-unit.github.io/data_science/presentations/2023-10-17_conference-check-in-app\n\n# or, on why you should [ignore]{.yellow} your boss, [play]{.yellow} about, and [have fun]{.yellow} {.inverse}\n\n## {background-image=\"1.jpg\"}\n\n:::{.footer}\n[digital.library.unt.edu/ark:/67531/metadc1039451/m1/1/](https://digital.library.unt.edu/ark:/67531/metadc1039451/m1/1/)\n:::\n\n:::{.notes}\n Clark, Junebug. [Registration Desk for the LPC Conference], photograph, 2016-03-17/2016-03-19; (https://digital.library.unt.edu/ark:/67531/metadc1039451/m1/1/: accessed October 16, 2023), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT Libraries Special Collections. \n:::\n\n# {background-image=\"2.jpg\"}\n\n:::{.footer}\n[unsplash.com/photos/MldQeWmF2_g](https://unsplash.com/photos/MldQeWmF2_g)\n:::\n\n# Can we not do better?\n\n## QR codes are great\n\n{fig-align=\"center\"}\n\n## and can be easily generated in R\n\n```r\ninstall.packages(\"qrcode\")\nlibrary(qrcode)\n\nqr_code(\"https://www.youtube.com/watch?v=dQw4w9WgXcQ\")\n```\n\n# But can we build a shiny app to read QR codes? {.inverse}\n\n# No (probably)\n\n## Why not?\n\n- `{shiny}` would be doing all the processing on the server side\n- we would need to read from a camera client side\n- then stream video to the server for `{shiny}` to detect and decode the QR codes\n\n# Well, can we do it client side? {.inverse}\n\n# Yes\n\n. . 
.\n\n[github.com/nhs-r-community/conf-23-checkin](https://github.com/nhs-r-community/conf-23-checkin)\n\n## How does this work?\n\n:::{.columns}\n\n::::{.column width=70%}\n### Front-end\n:::{.incremental}\n- uses the [React](https://react.dev) JavaScript framework\n- [@yudiel/react-qr-scanner](https://github.com/yudielcurbelo/react-qr-scanner)\n- the app scans a QR code, then sends it to our back-end\n- a window pops up to say who has checked in, or shows an error message\n:::\n::::\n\n::::{.column width=30%}\n\n::::\n\n:::\n\n## How does this work?\n\n### Back-end\n\nUses the `{plumber}` R package to build the API, with endpoints for\n\n- getting the list of all of the attendees for that day\n- uploading a list of attendees in bulk\n- adding an attendee individually\n- getting an attendee\n- checking the attendee in\n\n\n## How does this work?\n\n### More Back-end Stuff\n\n- uses a simple SQLite DB that will be thrown away at the end of the conference\n- we send personalised emails using `{blastula}` to the attendees with their QR codes\n- the QR codes are just random IDs (UUIDs) that identify each attendee\n- uses WebSockets to update all of the clients when a user checks in (to update the list of attendees)\n\n# I've wanted to play about with React for a while... {.inverse}\n\n. . .\n\nThis was a silly, inconsequential project to get to grips with something new\n\n# Was it worth it? {.inverse}\n\n. . .\n\nYes!\n\n# {background-image=5.jpg}\n\n:::{.footer}\n[unsplash.com/photos/WfUxLpncYwI](https://unsplash.com/photos/WfUxLpncYwI)\n:::\n\n## Learning different tools can show you the light {background-image=6.jpg}\n\n:::{.footer}\n[unsplash.com/photos/tMGMINwFOtI](https://unsplash.com/photos/tMGMINwFOtI)\n:::\n\n# Go away, learn something new {.inverse}\n\nThanks!\n\n[thomas.jemmett@nhs.net](mailto:thomas.jemmett@nhs.net)\n",
+ "supporting": [],
+ "filters": [
+ "rmarkdown/pagebreak.lua"
+ ],
+ "includes": {},
+ "engineDependencies": {},
+ "preserve": {},
+ "postProcess": true
+ }
+}
\ No newline at end of file
diff --git a/_freeze/presentations/2024-05-16_store-data-safely/index/execute-results/html.json b/_freeze/presentations/2024-05-16_store-data-safely/index/execute-results/html.json
new file mode 100644
index 00000000..a66e1306
--- /dev/null
+++ b/_freeze/presentations/2024-05-16_store-data-safely/index/execute-results/html.json
@@ -0,0 +1,15 @@
+{
+ "hash": "e044f1078f8a7b59a99581dcbf0e13de",
+ "result": {
+ "engine": "knitr",
+ "markdown": "---\ntitle: \"Store Data Safely\"\nsubtitle: \"Coffee & Coding\"\nauthor:\n - \"[YiWen Hon](mailto:yiwen.hon1@nhs.net)\"\n - \"[Matt Dray](mailto:matt.dray@nhs.net)\"\ndate: 2024-05-16\ndate-format: \"MMM D, YYYY\"\nformat:\n revealjs:\n theme: [default, ../su_presentation.scss]\n transition: none\n chalkboard:\n buttons: false\n preview-links: auto\n slide-number: false\n auto-animate: true\n footer: |\n Learn more about [Data Science at The Strategy Unit ](https://the-strategy-unit.github.io/data_science/)\n---\n\n\n\n# Avoid storing data on GitHub {.inverse}\n\n## Why?\n\nBecause:\n\n* data may be sensitive\n* GitHub was designed for source control of _code_\n* GitHub has repository file-size limits\n* it makes data independent from code\n* it prevents repetition\n\n## Other approaches\n\nTo prevent data commits:\n\n* use [a .gitignore file](https://github.com/github/gitignore/blob/main/R.gitignore) (*.csv, etc)\n* use [Git hooks](https://www.atlassian.com/git/tutorials/git-hooks)\n* avoid 'add all' (`git add .`) when staging\n* ensure thorough reviews of (small) pull-requests\n\n## What if I committed data?\n\n'It depends', but if it's sensitive:\n\n* 'undo' the commit with [git reset](https://www.atlassian.com/git/tutorials/undoing-changes/git-reset)\n* use a [tool like BFG](https://rtyley.github.io/bfg-repo-cleaner/) to expunge the file from Git history\n* delete the repo and restart 🔥\n\nA data security breach [may have to be reported](https://csucloudservices.sharepoint.com/SitePages/Report-a-breach.aspx).\n\n## Data-hosting solutions\n\nWe'll talk about two main options for The Strategy Unit: \n\n1. Posit Connect and the {pins} package\n2. Azure Data Storage\n\nWhich to use? 
It depends.\n\n# {pins} 📌 {.inverse}\n\n## A platform by Posit\n\n\n\n:::{.footer}\n[https://connect.strategyunitwm.nhs.uk/](https://connect.strategyunitwm.nhs.uk/)\n:::\n\n## A package by Posit\n\n\n\n:::{.footer}\n[https://pins.rstudio.com/](https://pins.rstudio.com/)\n:::\n\n## Basic approach\n\n```r\ninstall.packages(\"pins\")\nlibrary(pins)\n\nboard <- board_connect()\npin_write(board, data, \"pin_name\")\npin_read(board, \"user_name/pin_name\")\n```\n\n## Live demo\n\n1. Link RStudio to Posit Connect (authenticate)\n1. Connect to the board\n1. Write a new pin\n1. Check pin status and details\n1. Pin versions\n1. Use pinned data\n1. Unpin your pin\n\n## Should I use it?\n\n:::: {.columns}\n\n::: {.column width='50%'}\n⚠️ {pins} is not great because:\n\n* you should not upload sensitive data!\n* there's a file-size upload limit\n* pin organisation is a bit awkward (no subfolders)\n:::\n\n::: {.column width='50%'}\n{pins} is helpful because:\n\n* authentication is straightforward\n* data can be versioned\n* you can control permissions\n* there are R and Python versions of the package\n:::\n\n::::\n\n# Azure Data Storage 🟦 {.inverse}\n\n## What is Azure Data Storage?\n\nMicrosoft cloud storage for unstructured data or 'blobs' (Binary Large Objects): data objects in binary form that do not necessarily conform to any file format.\n\nHow is it different?\n\n* No hierarchy – although you can make pseudo-'folders' with the blob names.\n* Authenticates with your Microsoft account.\n\n## Authenticating to Azure Data Storage\n\n* You are all part of the “strategy-unit-analysts” group; this gives you [read/write access to specific Azure storage containers](https://portal.azure.com/#view/Microsoft_AAD_IAM/GroupDetailsMenuBlade/~/SubscriptionResources/groupId/7d67c846-34cf-4a1f-97d1-fcd40c6b3a86).\n* You can store sensitive information like the container ID in a local .Renviron or .env file that should be ignored by Git.\n* Using {AzureAuth}, {AzureStor} and your credentials, you can 
connect to the Azure storage container, upload files and download them, or read the files directly from storage!\n\n## Step 1: load your environment variables\n\nStore sensitive info in an .Renviron file that's kept out of your Git history! The info can then be loaded in your script.\n\n.Renviron: \n\n```\nAZ_STORAGE_EP=https://STORAGEACCOUNT.blob.core.windows.net/\n```\n\nScript: \n\n```r\nep_uri <- Sys.getenv(\"AZ_STORAGE_EP\")\n```\n\nTip: reload .Renviron with `readRenviron(\".Renviron\")`\n\n## Step 1: load your environment variables\n\nIn the demo script we are providing, you will need these environment variables:\n\n```r\nep_uri <- Sys.getenv(\"AZ_STORAGE_EP\")\napp_id <- Sys.getenv(\"AZ_APP_ID\")\ncontainer_name <- Sys.getenv(\"AZ_STORAGE_CONTAINER\")\ntenant <- Sys.getenv(\"AZ_TENANT_ID\")\n```\n\n## Step 2: Authenticate with Azure\n\n\n\n:::: {.columns}\n\n::: {.column width='70%'}\n```r\ntoken <- AzureAuth::get_azure_token(\n \"https://storage.azure.com\",\n tenant = tenant,\n app = app_id,\n auth_type = \"device_code\"\n)\n```\n\nThe first time you do this, you will get a link to open in your browser and a code in your terminal to enter there. 
Use the browser that works best with your \\@mlcsu.nhs.uk account!\n:::\n\n::: {.column width='30%'}\n{fig-alt=\"A Microsoft authentication screen asking if the user is trying to sign into a named Azure container.\"}\n:::\n\n::::\n\n## Step 3: Connect to container\n\n```r\nendpoint <- AzureStor::blob_endpoint(ep_uri, token = token)\ncontainer <- AzureStor::storage_container(endpoint, container_name)\n\n# List files in container\nblob_list <- AzureStor::list_blobs(container)\n```\n\nIf you get a 403 error, delete your token and re-authenticate, or try a different browser or an incognito window.\n\nTo clear Azure tokens: `AzureAuth::clean_token_directory()`\n\n## Interact with the container\n\nIt’s possible to interact with the container via your browser!\n\nYou can upload and download files using the Graphical User Interface (GUI); log in with your \\@mlcsu.nhs.uk account: [https://portal.azure.com/#home](https://portal.azure.com/#home)\n\nBut it’s cooler to interact via code… 😎\n\n## Interact with the container\n\n```r\n# Upload contents of a local directory to container\nAzureStor::storage_multiupload(\n container,\n \"LOCAL_FOLDERNAME/*\",\n \"FOLDERNAME_ON_AZURE\"\n)\n\n# Upload specific file to container\nAzureStor::storage_upload(\n container,\n \"data/ronald.jpeg\",\n \"newdir/ronald.jpeg\"\n)\n```\n\n## Load csv files directly from Azure container\n\n```r\ndf_from_azure <- AzureStor::storage_read_csv(\n container,\n \"newdir/cats.csv\",\n show_col_types = FALSE\n)\n\n# Load file directly from Azure container (by storing it in memory)\n\nparquet_in_memory <- AzureStor::storage_download(\n container, src = \"newdir/cats.parquet\", dest = NULL\n)\n\nparq_df <- arrow::read_parquet(parquet_in_memory)\n```\n\n## Interact with the container\n\n```r\n# Delete from Azure container (!!!)\nAzureStor::delete_storage_file(container, BLOB_NAME)\n```\n\n## What does this achieve?\n\n* Data is not in the repository; instead, it is stored in a secure location\n* Code can be open 
– sensitive information like the Azure container name is stored in environment variables\n* Large file sizes are possible, and other people can access the same container\n* Naming conventions can help to keep blobs organised (these create pseudo-folders)",
+ "supporting": [],
+ "filters": [
+ "rmarkdown/pagebreak.lua"
+ ],
+ "includes": {},
+ "engineDependencies": {},
+ "preserve": {},
+ "postProcess": true
+ }
+}
\ No newline at end of file
diff --git a/_freeze/presentations/2024-05-23_github-team-sport/index/execute-results/html.json b/_freeze/presentations/2024-05-23_github-team-sport/index/execute-results/html.json
new file mode 100644
index 00000000..e2a32333
--- /dev/null
+++ b/_freeze/presentations/2024-05-23_github-team-sport/index/execute-results/html.json
@@ -0,0 +1,15 @@
+{
+ "hash": "70852f095b399b9dcd255d10de2e33cb",
+ "result": {
+ "engine": "knitr",
+ "markdown": "---\ntitle: \"GitHub as a team sport\"\nsubtitle: \"DfT QA Month\"\nauthor: \"[Matt Dray](mailto:matt.dray@nhs.net)\"\ndate: 2024-05-23\ndate-format: \"MMM D, YYYY\"\nformat:\n revealjs:\n theme: [default, ../su_presentation.scss]\n transition: none\n chalkboard:\n buttons: false\n preview-links: auto\n slide-number: false\n auto-animate: true\n footer: |\n Learn more about [The Strategy Unit](https://www.strategyunitwm.nhs.uk/)\n---\n\n\n\n\n## tl;dr\n\n:::: {.columns}\n\n::: {.column width='60%'}\n\n* 'Quality' isn't just good code\n* _Teamwork makes the dream work_\n* GitHub is a communication tool\n:::\n\n:::{.column width='40%'}\n\n{width=\"100%\"}\n:::\n\n::::\n\n\n::: {.notes}\n\n* 'Too long; didn't read'.\n* GitHub isn't just a dumping ground for code and Git history.\n* It's a platform for working with teammates to get things done.\n* Quality is improved by good communication, organisation and reduction of something called the 'bus factor' that I'll get to in a minute.\n\n:::\n\n# About us {.inverse}\n\n::: {.notes}\n\n* I'll start with some context about the organisation so you understand how we work.\n* There's a link to the website at the bottom of each slide.\n\n:::\n\n## The Strategy Unit (SU)\n\n:::: {.columns}\n\n::: {.column width='60%'}\n\n* An 'internal consultancy'\n* Hosted by [NHS Midlands and Lancashire](https://www.midlandsandlancashirecsu.nhs.uk/)\n* Growing in size and reputation\n\n:::\n\n::: {.column width='40%'}\n\n{width=\"100%\"}\n\n:::\n\n::::\n\n::: {.notes}\n\n* Initially a 'start-up' style operation that has expanded to 70+ staff.\n* 'We produce high-quality, multi-disciplinary analytical work – and we help people apply the results.'\n* A lot of our work is on the important New Hospital Programme (NHP).\n* 'Our proposition is simple: better evidence, better decisions, better outcomes.'\n* Expansion is tricky; how can we maintain quality?\n\n:::\n\n## The [Data 
Science](https://the-strategy-unit.github.io/data_science/) Team\n\n{width=\"15%\"}\n{width=\"15%\"}\n{width=\"15%\"}\n{width=\"15%\"}\n{width=\"15%\"}\n{width=\"15%\"}\n\n* Expanded to 6, all remote\n* Modelling, Quarto, Shiny\n* [New Hospital Programme](https://connect.strategyunitwm.nhs.uk/nhp/project_information/) (NHP)\n\n\n::: {.notes}\n\n* A new team, expanding rapidly from 2 to 6 in about a year.\n* Remote across England.\n* Experience from across the NHS and consultancy. I spent a decade in five central government departments before this.\n* We're helping to model and design apps for the NHP to help build hospitals.\n* So: growing team, different experiences, important work, but few standardised processes. What to do?\n\n:::\n\n## GitHub at the SU\n\n{width=\"100%\"}\n\n* We should be exemplars\n* Aiming for open by default\n* GitHub is [on the homepage](https://www.strategyunitwm.nhs.uk/) and there's [a Data Science site](https://the-strategy-unit.github.io/data_science/)\n\n::: {.notes}\n\n* It's not just the DS team.\n* We have many other analysts eager to learn and contribute.\n* How can we set good standards and encourage use across the organisation?\n* We're running Coffee & Coding sessions, teaching and encouraging talks and blogs on our site.\n* We want to drive up quality by making code open too.\n* It's a statement of intent that the SU homepage links to our GitHub organisation.\n\n:::\n\n# This talk {.inverse}\n\n::: {.notes}\n\n* So that's the context: we're experienced, but fledgling as a team, and keen to do things well.\n* These slides are some of the ways we've been working so far with a focus on GitHub, specifically.\n\n:::\n\n## What this is\n\n* Low-tech, no code\n* Tips and etiquette, not directives\n* What's been working for us\n\n::: {.notes}\n\n* But this is not a technical talk about how to use Git for version control. 
\n* Mostly it's about planning, workflows, standards and communication.\n* It's things that our team have been doing and the ideas are evolving.\n* I've worked mostly alone on GitHub projects in my career and never worked in a data science team of even this size. So at worst these slides are a way for me to write down what I'm learning.\n\n:::\n\n## The '[bus factor](https://en.wikipedia.org/wiki/Bus_factor)' 🚍\n\n* We should maintain quality\n* We need redundancy\n* Standardised processes can help\n\n::: {.notes}\n\n* Why do we care about discussing and 'formalising' these ideas?\n* We should encourage standard practices in case someone is ill or away.\n* This also makes it easier when new team members join.\n* This helps us maintain quality.\n\n:::\n\n## 'Rules'\n\n* It's the spirit that counts\n* Do as I say, not as I do\n* Know _why_ you're breaking the rules\n\n::: {.notes}\n\n* To be clear though, nothing here is etched into stone.\n* There will be times where rules can be broken.\n* But we shouldn't be complacent.\n\n:::\n\n# What we do {.inverse}\n\n::: {.notes}\n\n* So, let's move onto some specific examples of how we've been using GitHub for our work.\n* I haven't included everything. \n* Many of our best examples are currently on closed repos that will be opened with time, due to various sensitivities.\n* Some ideas are literal features of the platform, others are more like 'suggested best practice'.\n* Hopefully there'll be at least one thing that's new to you and that you might want to use in your own team.\n\n:::\n\n## GitHub flow\n\n1. Create a repository\n1. Write issues\n1. Plan \n1. Create a branch\n1. Make a pull request\n1. Review\n1. 
Release\n\n::: {.notes}\n\n* This is a fairly generic GitHub flow.\n* I'll talk through a few things in each of these categories.\n\n:::\n\n## Repositories\n\n* Assign '[owner](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners)' and 'deputy' roles\n* Add [README](https://www.makeareadme.com/) and [.gitignore](https://github.com/github/gitignore)\n* Store data elsewhere\n\n::: {.notes}\n\n* Easy starter: tell people what the purpose of the repo is and how to use it. This is what a README is for. This is an absolute must to lower the bus factor.\n* We should prevent accidental file uploads from the outset. Use a .gitignore to exclude likely data files (as well as other unnecessary files). We're thinking about common templates/cookiecutters.\n* Communicative files (README, .gitignore) are good, but so is vigilance (code review).\n* Owners/deputies are in charge of 'GitHub gardening' (keeping issues in order, labelling, milestones, etc). \n* Deputies help reduce the bus factor.\n* The owner can be auto-selected as the reviewer. We're experimenting with this, especially for repos with external contributors.\n* Data is stored elsewhere, on Azure or Posit Connect, due to sensitivity and size. This should be planned before you begin and recorded in the README.\n\n:::\n\n## Issues\n\n:::: {.columns}\n\n::: {.column width='40%'}\n\n* Aren't just 'problems'\n* Use [labels](https://docs.github.com/en/issues/using-labels-and-milestones-to-track-work/managing-labels), including [MoSCoW](https://en.wikipedia.org/wiki/MoSCoW_method)\n* Explain the need, be informative\n\n:::\n\n::: {.column width='60%'}\n\n\n\n:::\n\n::::\n\n::: {.notes}\n\n* Issues can be reminders or questions for further discussion, not just features to build.\n* Tickets should get two labels. 
We use a topic like 'enhancement', 'bug', 'documentation', 'techdebt', etc., _plus_ MoSCoW (must, should, could, won't) to help with prioritisation.\n* Issue templates can ensure certain info is provided, which is especially good for external contributors.\n* Refer to related issues and PRs by number (e.g. `#1`), which stops you repeating the same information.\n* Prefer reopening an issue if the fix doesn't actually work.\n* Issues can track separate sub-issues.\n* You can add checklists with markdown checkboxes: `- [ ]` (these appear in the issue preview).\n* You can 'hide' comments if they're out of date, etc.\n\n:::\n\n## Plan\n\n\n\n* Talk, review and reflect\n* Use labels to prioritise\n* Sort into [milestones](https://docs.github.com/en/issues/using-labels-and-milestones-to-track-work/about-milestones)\n\n::: {.notes}\n\n* We have a repo and issues; what do we do now? Where to start?\n* We've begun working in sprints of about 4 weeks. We have sprint planning meetings to plan things out.\n* Consider what _needs_ to be done in the sprint period. What other issues support those goals?\n* Is there time for other tasks, like clearing techdebt? \n* All issues should be assigned to a milestone.\n* Issues in milestones should be sorted in priority order/order of expected completion (MoSCoW labels will help with this).\n* This helps focus the goals of the sprint and keeps us on track.\n\n:::\n\n## Branches\n\n\n\n* [One issue, one branch](https://docs.github.com/en/issues/tracking-your-work-with-issues/creating-a-branch-for-an-issue), one [assigned person](https://docs.github.com/en/issues/tracking-your-work-with-issues/assigning-issues-and-pull-requests-to-other-github-users)\n* Name them sensibly\n* [Burn them](https://docs.github.com/en/issues/tracking-your-work-with-issues/closing-an-issue)\n\n::: {.notes}\n\n* Only one person works on a branch at a time. This person is the one assigned to the relevant issue.\n* Branch names should be numbered to match their issue, e.g. 
'123-add-filter'. This makes it obvious what issue is being fixed by that branch and should help identify if more than one person has a branch open for the same issue.\n* If commits from someone else are required, then all parties must communicate about the current state of the branch to ensure they pull changes and avoid merge conflicts.\n* Branches are ephemeral and die when the PR is merged. They should be deleted (this can be done automatically).\n* The only branches to exist at all times should be main and a deployment branch, if necessary. All others should be active branches so it's clear what's being worked on.\n\n:::\n\n## Commits\n\n\n\n* Don't commit to main!\n* 'Small, early and often'\n* Make messages meaningful\n\n::: {.notes}\n\n* There's not a lot of earth-shattering advice to give here; this stuff is fairly standard.\n* Do not commit directly to main. Your work must be independently checked first to limit the chance of mistakes.\n* Make your commits small in terms of code and files touched, if possible. This makes the Git history easier to read and makes reviews easier too.\n* Commit and push early and often into your branch. This can help others see progress and helps reduce the bus factor.\n* Don't dump your work into a commit because it's the end of the day.\n* Make your commit messages meaningful. What does the commit do? Start with a verb in present tense ('adds', not 'added'). Or maybe use 'conventional' commits.\n\n:::\n\n## Pull requests (PRs)\n\n:::: {.columns}\n\n::: {.column width='50%'}\n\n* Small and [closes an issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue)\n* Select the [assignee](https://docs.github.com/en/issues/tracking-your-work-with-issues/assigning-issues-and-pull-requests-to-other-github-users) and reviewer\n* The assignee merges\n\n:::\n\n::: {.column width='50%'}\n\n\n\n:::\n\n::::\n\n::: {.notes}\n\n* PRs should solve the issue they're related to. 
Occasionally one fix may solve another.\n* They should be named to explain what they do. The issue might be 'the red button doesn't work'; the PR might be 'fix the red button'.\n* They should be small in terms of lines of code and files touched. This will make it easier and faster to understand and assess the changes.\n* The submitter should mark themself as the 'assignee' and choose a reviewer. You may want to chat with the reviewer first to check that they have time.\n* For context, link to the issue(s) being closed with the magic words ('closes', 'fixes', etc), which will also close those issues as completed.\n* Include a short explanation or bullet points of what the PR does. Provide any extra information to make the reviewer's life easier (areas of focus, maybe) or to ask a question about some aspect of what you've written.\n* The PR submitter is the one who clicks the merge button. This is in case the submitter realises there's something they need to add or change before the merge.\n\n:::\n\n## Reviewing PRs\n\n:::: {.columns}\n\n::: {.column width='45%'}\n\n* Be helpful, be kind\n* Use [GitHub suggestions](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/incorporating-feedback-in-your-pull-request)\n* Discuss if unclear\n\n:::\n\n::: {.column width='55%'}\n\n\n\n:::\n\n::::\n\n::: {.notes}\n\n* The reviewer should typically check that the changes result in the issue being fixed. This may require pulling the branch and then testing it, but may not be necessary for small changes.\n* The reviewer should seek clarification and add comments where something isn't clear.\n* Use 'suggestions' as a reviewer rather than committing to someone else's branch.\n* When working at pace (when aren't we?), we should err towards approving once the issue is resolved, rather than entering an endless cycle of requests for small changes. 
The submitter and reviewer should decide whether smaller things like code style or changes in approach should be added as a new issue with a 'techdebt' label.\n\n:::\n\n## Releases\n\n* Use [semantic](https://semver.org/) versioning (1.2.3)\n* [Autofill notes](https://docs.github.com/en/repositories/releasing-projects-on-github/automatically-generated-release-notes) with PR names\n* Don't release on a Friday 🙃\n\n::: {.notes}\n\n* Tag the history and release on GitHub concurrently to keep them in sync (this is done automatically if the release is done from the GitHub interface).\n* Semantic versioning: x.y.z, where x is for breaking changes, y is for new features and z is for bug-fix patches.\n* We typically just autofill the release description with the constituent PR titles, which means it's important to give them meaningful names.\n* We align releases with sprints, though patches may occur more frequently.\n* We link releases to deployment in many cases. Don't release to prod on a Friday, lol.\n\n:::\n\n# Surprise twist... {.inverse}\n\n## GitHub is a team member\n\n\n\n* Automate with [Actions](https://docs.github.com/en/actions/learn-github-actions)\n* Provide [issue](https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/configuring-issue-templates-for-your-repository) and [repo](https://docs.github.com/en/repositories/creating-and-managing-repositories/creating-a-template-repository) templates\n* An all-in-one [planner](https://docs.github.com/en/issues/planning-and-tracking-with-projects)?\n\n::: {.notes}\n\n* I lied: we have 6 _human_ team members. GitHub itself has features that can automate away some boring things and help prevent accidents or forgetfulness.\n* GitHub Actions for continuous integration: at least R CMD check for R projects. 
Start with r-lib examples as a basis.\n* We're looking towards things like templates at the issue and repo levels; again to remove drudgery.\n* We use Trello to plan things and have to link to GitHub repos and issues in Trello cards. Can we use GitHub as our planner across multiple repos instead? Seems possible.\n\n:::\n\n# Be a good sport {.inverse}\n\n## Are we [curling](https://en.wikipedia.org/wiki/Curling)? 🥌\n\n:::: {.columns}\n\n::: {.column width='50%'}\n\nWe:\n\n* are a small team\n* assume specialist roles\n* work in sync\n\n:::\n\n::: {.column width='50%'}\n\n\n\n:::\n\n::::\n\n::: {.notes}\n\n* You have been wondering: if this is a 'team sport', what sport is it?\n* This is a terrible metaphor. _But think about it._\n\n:::\n\n## The bottom line, actually\n\n:::: {.columns}\n\n::: {.column width='70%'}\n\n{width='100%'}\n\n:::\n\n::: {.column width='30%'}\n\n1. Communicate\n2. Help each other\n3. Be kind\n\n:::\n\n::::\n\n::: {.notes}\n\n* The ideas in this talk are things that have helped us, and could help you, to drive up and maintain quality. Some were obvious, some were specific features you might not have known about.\n* But none of these are replacements for being good team members.\n* GitHub just provides some affordances to help you.\n* I am the guy falling over, the stones are tasks, my team mates are picking me up and dusting me off. \n* Did you learn at least one thing? What has your team been doing? What works for you?\n\n:::\n",
+ "supporting": [],
+ "filters": [
+ "rmarkdown/pagebreak.lua"
+ ],
+ "includes": {},
+ "engineDependencies": {},
+ "preserve": {},
+ "postProcess": true
+ }
+}
\ No newline at end of file
diff --git a/_freeze/presentations/2024-10-10_what-is-ai-chris/index/figure-html/unnamed-chunk-1-1.png b/_freeze/presentations/2024-10-10_what-is-ai-chris/index/figure-html/unnamed-chunk-1-1.png
new file mode 100644
index 00000000..9cb581a3
Binary files /dev/null and b/_freeze/presentations/2024-10-10_what-is-ai-chris/index/figure-html/unnamed-chunk-1-1.png differ
diff --git a/_freeze/presentations/2024-10-10_what-is-ai-tom/index/figure-html/unnamed-chunk-1-1.png b/_freeze/presentations/2024-10-10_what-is-ai-tom/index/figure-html/unnamed-chunk-1-1.png
new file mode 100644
index 00000000..401614f0
Binary files /dev/null and b/_freeze/presentations/2024-10-10_what-is-ai-tom/index/figure-html/unnamed-chunk-1-1.png differ
diff --git a/_freeze/presentations/2024-10-10_what-is-ai-tom/index/figure-html/unnamed-chunk-2-1.png b/_freeze/presentations/2024-10-10_what-is-ai-tom/index/figure-html/unnamed-chunk-2-1.png
new file mode 100644
index 00000000..a30a5613
Binary files /dev/null and b/_freeze/presentations/2024-10-10_what-is-ai-tom/index/figure-html/unnamed-chunk-2-1.png differ
diff --git a/_freeze/presentations/2024-11-22_github-team-sport-rpysoc/index/execute-results/html.json b/_freeze/presentations/2024-11-22_github-team-sport-rpysoc/index/execute-results/html.json
new file mode 100644
index 00000000..3f77a7fe
--- /dev/null
+++ b/_freeze/presentations/2024-11-22_github-team-sport-rpysoc/index/execute-results/html.json
@@ -0,0 +1,15 @@
+{
+ "hash": "75e8b9a87e71c6f6dd5d3ffb4f65b24b",
+ "result": {
+ "engine": "knitr",
+ "markdown": "---\ntitle: \"GitHub as a team sport\"\nsubtitle: \"NHS RPySOC 2024\"\nauthor: \"[Matt Dray](mailto:matt.dray@nhs.net)\"\ndate: 2024-11-22\ndate-format: \"D MMMM YYYY\"\nformat:\n revealjs:\n theme: [default, ../su_presentation.scss]\n transition: none\n chalkboard:\n buttons: false\n preview-links: auto\n slide-number: false\n auto-animate: true\n footer: |\n Learn more about [The Strategy Unit](https://www.strategyunitwm.nhs.uk/)\n---\n\n\n\n\n## tl;dr\n\n:::: {.columns}\n\n::: {.column width='60%'}\n\n* GitHub organises code\n* GitHub can help organise _people_\n* We're learning as we go\n\n:::\n\n:::{.column width='40%'}\n\n{width=\"100%\" fig-alt=\"The GitHub logo, which is the silhouette of a cat-octopus hybrid.\"}\n:::\n\n::::\n\n::: {.notes}\n\n* 'Too long; didn't read'.\n* GitHub isn't just a dumping ground for code and version histories.\n* There are features that can help with communication and collaboration.\n\n* We've been learning what works for us as our team continues to grow.\n\n:::\n\n# Context {.inverse}\n\n::: {.notes}\n\n* I'll start with background and motivation.\n* What's the problem we're trying to solve?\n\n:::\n\n## The [Data Science](https://the-strategy-unit.github.io/data_science/) Team\n\n{width=\"11%\" fig-alt=\"Profile photo of Chris, bearded and jacketed.\"}\n{width=\"11%\" fig-alt=\"Profile photo of Tom with a natty jumper and an intense gaze.\"}\n{width=\"11%\" fig-alt=\"Profile photo of YiWen in a sea of books.\"}\n{width=\"11%\" fig-alt=\"Profile photo of Rhian, smiling in a liminal space.\"}\n{width=\"11%\" fig-alt=\"Profile photo of Matt, seemingly on his first day of school.\"}\n{width=\"11%\" fig-alt=\"The helmet of the blue Power Ranger, which represents Ozayr.\"}\n{width=\"11%\" fig-alt=\"The helmet of the red Power Ranger, which represents a new team member.\"}\n{width=\"11%\" fig-alt=\"The helmet of the yellow Power Ranger, which represents a new team member.\"}\n\n* Expanding to 8, all remote\n* Complex 
[New Hospital Programme](https://connect.strategyunitwm.nhs.uk/nhp/project_information/) (NHP)\n* How should we work together?\n\n::: {.notes}\n\n* We're a growing team (soon to be 8).\n* We've got different backgrounds and experiences.\n* We do modelling, data pipelines, apps, etc.\n* We work largely on a big, complicated project with lots of stakeholders and tasks.\n* We want to bring other teams in the SU along with us.\n* Are there tools or approaches we can use to help us?\n\n:::\n\n## The dream\n\n:::: {.columns}\n\n::: {.column width='40%'}\n\n* Order from chaos\n* Good communication\n* '[Bus factor](https://en.wikipedia.org/wiki/Bus_factor)' reduction\n\n:::\n\n::: {.column width='60%'}\n\n{width=\"100%\" fig-alt=\"The 'this is fine' meme. A cartoon dog in a little hat is sat in a room that's on fire, saying 'this is fine'.\"}\n\n:::\n\n::::\n\n::: {.notes}\n\n* We have a big project with lots of repositories. We have lots of different tasks and goals.\n* We want to improve clarity and reduce the chance of misunderstanding and error.\n* We don't want information locked up in one person's brain.\n\n:::\n\n## Living the dream\n\n:::: {.columns}\n\n::: {.column width='40%'}\n\n* This works (for now)\n* New folks are joining\n* Things ~~can~~ will change\n\n:::\n\n::: {.column width='60%'}\n\n{fig-alt=\"The 'this is fine' meme but in reverse. Normally the meme is a cartoon dog sat in a room that's on fire, saying 'this is fine'. In this version, a cartoon flame says 'this is fine' surrounded by a room full of dogs.\"}\n\n:::\n\n::::\n\n::: {.notes}\n\n* We've been slowly changing how we work and the tools we use.\n* Our standards will make it easier for new starters, but they should also have an influence on how we do things.\n* Nothing is set in stone. 
We're continually thinking about what works and what doesn't.\n\n:::\n\n# So, GitHub {.inverse}\n\n::: {.notes}\n\n* This is a talk about a widely used tool and how we're making use of its features to meet our needs.\n* I'll give a few examples of some of the things we're doing.\n* I'll start broad and get narrower.\n* Examples, because there's not enough time to talk about everything.\n* We follow some basic GitHub like tenets 'one issue, one branch, one pull request' and 'commits should be small', but there's some other things I wanted to mentionin particular.\n* A lot of this will apply to other tools, like GitLab.\n* I hope there's one thing, big or small, that you might consider for your next project.\n\n:::\n\n## GitHub [Projects](https://docs.github.com/en/issues/planning-and-tracking-with-projects/learning-about-projects/about-projects)\n\n:::: {.columns}\n\n::: {.column width='50%'}\n\n* We're '[agile](https://en.wikipedia.org/wiki/Agile_software_development)'\n* Many tasks/respositories\n* We want to show progress\n\n:::\n\n::: {.column width='50%'}\n\n{width=\"100%\" fig-alt=\"An excerpt from the side-panel of a GitHub issue. It'sa box showing how the issue fits into the project. There are labels to show the status, the sprint it belongs to, its planning state, due date, priority, level and size.\"}\n\n:::\n\n::::\n\n::: {.notes}\n\n* We work in sprints.\n* There's lots to keep track of: the model, a couple of apps, a documentation site, etc.\n* We want to show others how things are progressing.\n* GitHub Projects helps us by arranging individual tasks from across lots of different repositories.\n* We can also add custom labelling to help us organise and track.\n\n:::\n\n## {background-image=\"projects.png\" alt=\"A kanban-style board made of tasks. We're in a tab named 'curent sprint' and there are tasks on cards with labels and an assigned person's avatar. 
There are columns for 'backlog', 'in progress' and 'in review'.\"}\n\n::: {.notes}\n\n* We can show the tasks in kanban style, or as a list or as a calendar.\n* We can filter down to show only certain labels, statuses or assigned people.\n* This is helps us find, organise and focus during sprint planning and weekly sprint catch-ups.\n\n:::\n\n## Division of labour\n\n:::: {.columns}\n\n::: {.column width='50%'}\n\n* The '[scrum](https://en.wikipedia.org/wiki/Scrum_(software_development)) master'\n* [Owners and deputies](https://the-strategy-unit.github.io/data_science/style/git_and_github.html#repository-organisation) ([CODEOWNERS](https://docs.github.com/en/enterprise-cloud@latest/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners))\n* Issue and pull-request [assignees](https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/assigning-issues-and-pull-requests-to-other-github-users)\n\n:::\n\n::: {.column width='50%'}\n\n{width=\"100%\" fig-alt=\"A PR request showing one person labelled as the assignee and one person identified as the reviewer.\"}\n\n:::\n\n::::\n\n::: {.notes}\n\n* At the level of the sprint, we have a scrum master that oversees the movement of tasks from the backlog and takes us through the GitHub Project in weekly sprint catch-ups.\n* Within each repository we have an owner and deputy on each repository, with the goal of keeping the it shipshape (e.g. 
good docs, no stale branches, PRs are reviewed).\n* And we have people assigned to issues and PRs, which signals the tasks that people are working on.\n* Having an identifiable person in charge makes it easier to identify ownership and for others to talk to the right person.\n\n:::\n\n## Task sorting\n\n:::: {.columns}\n\n::: {.column width='33%'}\n\n* [MoSCoW method](https://en.wikipedia.org/wiki/MoSCoW_method)\n* Release-aligned [milestones](https://docs.github.com/en/issues/using-labels-and-milestones-to-track-work/about-milestones)\n* Efficient triage\n\n:::\n\n::: {.column width='67%'}\n\n{width=\"100%\" fig-alt=\"Examples of repository labels and their descriptions: 'bug', 'could', 'documentation'. Each has a colour to help identify it.\"}\n\n:::\n\n::::\n\n::: {.notes}\n\n* Organising repositories at a higher level doesn't preclude organisation at the repository level, which is foundational.\n* We typically include the labels Must, Should, Could, Won't (MoSCoW) to filter tasks and to help assess importance.\n* The issues associated with the current sprint are added to a milestone with the upcoming version number. This makes it easier to focus, but also release the code with auto-generated notes.\n* These approaches signal intent and help the team to more efficiently decide what to do next.\n\n:::\n\n## Pull requests (PRs)\n\n:::: {.columns}\n\n::: {.column width='40%'}\n\n* Talk!\n* Use [suggestions](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/incorporating-feedback-in-your-pull-request)\n* The assignee merges the PR\n\n:::\n\n::: {.column width='60%'}\n\n{width=\"100%\" fig-alt=\"Rhian has used the GitHub suggestions feature to fix a typo, which Matt can commit as part of his pull request. The fix removes a rogue letter 'e' from the end of the word mitigator. Matt suggests that 'mitigator' with an 'e' is probably Italian. 
What a joker.\"}\n\n:::\n\n::::\n\n::: {.notes}\n\n* Respect each others' time. 'Closes #10' isn't always enough.\n* GitHub comments don't replace talking. Discuss if unclear.\n* Suggestions are efficient and respect the submitter. \n* The submitter owns the PR. They're responsible for closing it.\n\n:::\n\n# Surprise twist... {.inverse}\n\n## GitHub is a team member\n\n{fig-alt=\"Confirmation that a GitHub Action workflow has completed successfully. In this case, it was to build and deploy a website.\"}\n\n* Automate with [Actions](https://docs.github.com/en/actions/learn-github-actions)\n* [Issue templates](https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/configuring-issue-templates-for-your-repository)\n* [Repo templates](https://docs.github.com/en/repositories/creating-and-managing-repositories/creating-a-template-repository)\n\n::: {.notes}\n\n* Most of the team is human.\nGitHub itself has features that can automate away some boring things and help prevent accidents or forgetfulness.\n* GitHub Actions for continuous integration. R-CMD check at least for R projects. Start with r-lib examples as a basis.\n* We're looking towards things like templates at the issue and repo levels; again to remove drudgery.\n\n:::\n\n# Be a good sport {.inverse}\n\n## Are we [curling](https://en.wikipedia.org/wiki/Curling)? 🥌\n\n:::: {.columns}\n\n::: {.column width='45%'}\n\nWe:\n\n* are a small team\n* assume specialist roles\n* work in sync\n\n:::\n\n::: {.column width='55%'}\n\n{fig-alt=\"Terrible puns in the comments of a pull request. Rhian says 'you're still pushing curling then' (emphasis on 'pushing'). Chris responds 'as analogies go, I think it's nice' (emphasis on 'ice'). Matt mentions 'sweeping' statements.\"}\n\n:::\n\n::::\n\n::: {.notes}\n\n* You have been wondering: if this is a 'team sport', what sport is it?\n* This is a terrible metaphor. 
_But think about it._\n\n:::\n\n## The bottom line, actually\n\n:::: {.columns}\n\n::: {.column width='70%'}\n\n{width=\"100%\" fig-alt=\"A curling stone heads rapidly across the ice towards some stationary stones. A ricochet knocks a competitor over onto the ice. Teammates rush in to help.\"}\n\n:::\n\n::: {.column width='30%'}\n\n1. Communicate\n2. Help each other\n3. Be kind\n\n:::\n\n::::\n\n::: {.notes}\n\n* The features of GitHub should help you do the things you should already be doing.\n* I am the guy falling over, the stones are tasks, my team mates are picking me up and dusting me off. \n* What has your team been doing? What works for you?\n\n:::\n",
+ "supporting": [],
+ "filters": [
+ "rmarkdown/pagebreak.lua"
+ ],
+ "includes": {},
+ "engineDependencies": {},
+ "preserve": {},
+ "postProcess": true
+ }
+}
\ No newline at end of file
diff --git a/about.qmd b/about.qmd
index de691954..fa75e57b 100644
--- a/about.qmd
+++ b/about.qmd
@@ -7,18 +7,46 @@ The Data Science team at the Strategy Unit comprises the following team members:
- [Chris Beeley](https://github.com/ChrisBeeley)
- [Claire Welsh](https://github.com/DCEW)
- [Francis Barton](https://github.com/francisbarton)
-- [Matt Dray](https://github.com/matt-dray)
- [Rhian Davies](https://github.com/statsRhian)
-- [Tom Jemmett](https://github.com/tomjemmett)
+- [Matt Dray](https://github.com/matt-dray)
- [YiWen Hon](https://github.com/yiwen-h)
+- [Tom Jemmett](https://github.com/tomjemmett)
- [Eirini Komninou](https://github.com/ai-mindset)
+- [Ozayr Mohammed](https://github.com/O-Mohammed)
- [Natasha Stephenson](https://github.com/Nat-Stephenson)
- [Zoë Turner](https://github.com/Lextuga007)
-The team has a wealth of experience in deploying models and other products to the cloud for use by a wide range of users across health and care. This is particularly demonstrated in our work with the New Hospital Programme, where we built and deployed a sophisticated probabilistic demand and capacity model and supported the use of its outputs across the decision-making stages that lead to the construction of a new hospital. The data science team possesses expertise across the breadth of data science activity — for example, statistics, machine learning, natural language processing, and real-time evidence mapping. We also have significant experience in sharing our methods and code as open-source, as well as in training others to use these tools and understand foundational data science concepts and practices. Our experience in developing scalable data science solutions, and open-sourcing them for the benefit of users across health and care, enables us to contribute meaningfully at every stage of a project’s life cycle — from design through to deployment and adoption.
+The team has a wealth of experience in deploying models and other products to
+the cloud for use by a wide range of users across health and care.
+This is particularly demonstrated in our work with the New Hospital Programme,
+where we built and deployed a sophisticated probabilistic demand and capacity
+model and supported the use of its outputs across the decision-making stages
+that lead to the construction of a new hospital.
+
+The data science team possesses expertise across the breadth of data science
+activity — for example, statistics, machine learning, natural language
+processing, and real-time evidence mapping.
+We also have significant experience in sharing our methods and code as
+open-source, as well as in training others to use these tools and understand
+foundational data science concepts and practices.
+
+Our experience in developing scalable data science solutions, and
+open-sourcing them for the benefit of users across the health and care sector,
+enables us to contribute meaningfully at every stage of a project’s life cycle —
+from design through to deployment and adoption.
+
+Current and past projects of note include:
+
+- Work supporting the New Hospital Programme, including building a
+ [model for predicting demand and capacity requirements for hospitals][nhpinfo]
+ in the future, and [a tool for mapping the evidence on this topic][ev_maps].
+- The [Patient Experience Qualitative Data Categorisation project][peqdc]
+- Work supporting the wider analytical community, through events/communities
+ such as [NHS-OA][nhsoa] and [HACA][haca].
-Current and previous projects of note include:
-- Work supporting the New Hospitals Programme, including building [a model for predicting the demand and capacity requirements of hospitals in the future](https://connect.strategyunitwm.nhs.uk/nhp/project_information/), and [a tool for mapping the evidence on this topic](https://github.com/The-Strategy-Unit/nhp_evidence_maps).
-- The [Patient Experience Qualitative Data Categorisation project](https://the-strategy-unit.github.io/PatientExperience-QDC/)
-- Work supporting the wider analytical community, through events/communities such as [NHS-R](https://nhsrcommunity.com/) and [HACA](https://haca-conference.nhs.uk/).
+[nhpinfo]: https://connect.strategyunitwm.nhs.uk/nhp/project_information/
+[ev_maps]: https://github.com/The-Strategy-Unit/nhp_evidence_maps
+[peqdc]: https://the-strategy-unit.github.io/PatientExperience-QDC/
+[nhsoa]: https://nhsrcommunity.com/
+[haca]: https://haca-conference.nhs.uk/
diff --git a/blogs/posts/2025-12-31_getting-my-head-round-mocking/index.qmd b/blogs/posts/2025-12-31_getting-my-head-round-mocking/index.qmd
new file mode 100644
index 00000000..42b70e0f
--- /dev/null
+++ b/blogs/posts/2025-12-31_getting-my-head-round-mocking/index.qmd
@@ -0,0 +1,273 @@
+---
+title: Getting my head round 'mocking'
+author:
+ - name: Fran Barton
+ orcid: 0000-0002-5650-1176
+ email: francis.barton@nhs.net
+ affiliation:
+ - name: The Strategy Unit
+ url: https://strategyunitwm.nhs.uk/
+date: 2025-12-31
+categories: [testing, software development, mocking, learning, reflection]
+execute:
+ enabled: false
+---
+
+My neurons feel tangled.
+
+There's something about the concept of 'mocking' in software development that
+just isn't clicking with me.
+I've had very capable people generously explain it to me.
+And I've read the docs and watched the videos.
+
+No matter.
+It just doesn't seem to make sense to my brain in the right way.
+
+But I know that it can click - and I know that I have been here before.
+
+
+
+
+A frame from Taylor Swift's Anti-Hero video with the lyric "It's me, hi,
+I'm the problem, it's me"
+
+
+
+
+### Grasping functions
+
+When I was starting out learning R, I just wrote everything in long scripts,
+copying and pasting code as I went.
+
+I read more experienced analysts talking about the virtues of writing functions,
but I didn't really know or understand what a function was; it just sounded like
+something _unnecessarily_ advanced and complicated.
+I had very humble needs, and my code worked for me, so why would I need to leap
+into the hyperspace of writing functions and even _developing a package_?
+All far too fussy and excessive.
+
+And yet.
+Eventually I got tired of copying and pasting and noticed - with a kind of
+igniting excitement in my mind - that the way to do the same thing multiple
+times with slightly different parameters was to _turn it into a function_.
+
+OK, fine.
+But now I have to understand how a function works.
+
+> "Which bits of information do I need to provide as arguments, and which things
+> can just be in the script?"
+
+> "What if I need to do something inside the function to the variable passed in
+> as `x`; should I still call it `x` or do I then need to call it something
+> else?"
+
+> "Does the argument `x` need to have the same name as the name of the variable
+> I am planning to pass in from my environment, for it to work?
+> Or is the opposite true - in order to avoid confusion, you should _never_ call
+> your function argument the same as the variable you're going to pass in?"
+
+Well, these are all very entertaining newbie questions.
+
+When you're conceptually out of your depth, you ask questions that don't make
+any sense!
+But at least you're asking questions.
+
+Once I understood what I was doing with functions, all of these conceptual
+confusions, about what I can now call the evaluation environment of the function
+and what it means to write a pure function and so on, seemed so misguided and
+the right way of thinking about things seemed so obvious.
+
***Once you know what you're doing, it can be hard to remember what it was like
not to know.***
+
But I have to remember - and I do remember, quite well! - that it doesn't feel
+like that when you're in the swamp of learning, and oscillating between
+conscious and unconscious incompetence (TODO add link to definition here?).
+
+When you're new, and overawed, and uncertain, and _learning_, you don't know if
+you're doing it right or wrong.
+
+(TODO maybe use callouts or quotes to highlight particular lines)
+
+You're often in the dark, and when things don't work as you expect, you don't
+know how to tell the difference between "I'm stupid and I am out of my depth and
+this is never going to work" and "I just need to change a little thing, I'm
+nearly there."
+
+It's so time-consuming!
+
+(And don't get me started on when I then started trying to learn how to use
+`purrr::map()` and friends!)
+
+From my position as a relatively experienced R coder and developer in 2025,
+I can look back on the most frustrating bits of my R learning journey with a
+mixture of fondness, empathy and horror.
+
+It all feels like so much water that I am glad is under the bridge.
+
+But, now, here I am again, trying to learn something new, and experiencing
+almost exactly the same kinds of brainaches as I did back in 2020.
+
Alongside my determination, it's easy to feel anger born of frustration.
+
+> "**Why** does it do that?"
+> "**What** does that word mean?"
> "**WHY** do people who write tutorials ever think it's OK to use the words
+ 'just' or 'simply'?"
+> "**Slow down!** I'm lost _already_"
+
+### Mocks
+
+Given all the above, I _know_ I can get my head around mocking.
+It is just going to take some time.
+And, at some point, a breakthrough moment when it finally clicks.
+
+There's something in my mind - my own conceptual model - of what we are doing
+with mocking that is constantly _wrong_.
+When I read a tutorial, the next line of code or the next sentence is never what
I expect it to be, in the way that it would be if I were reading about a
+technique that better fits my mental model(s).
+
+It's nice when you read a line of code and can sense what its output is going to
+be.
+
When I read about mocking, the output or the effect of the code is always
different from what I thought it was doing or expected to see.
+That jolt of surprise comes with its own little emotional punch of confusion and
+inadequacy.
+
+There's an _opacity_ to it, for me currently.
+
+
+I wonder what learning techniques I can use to help me.
+
+* Writing things down?
+* Making notes on tutorials?
+* Looking for analogies?
+* Trying to reframe the language and core concepts of mocking into a different
+ metaphor that better conforms to the shape of my brain and the assumptions I
+ am bringing?
+* Just doing it myself repeatedly until it clicks?
+
+
+
### Here are some resources
+
+
+### Here's what I currently know about mocking
+
+Off the top of my head:
+
+OK, so sometimes when you are testing a function that receives input from an
+external source, like data from an API, or a great big (or sensitive) dataset,
+it's not practical or ethical to run your tests for that function against the
+actual data.
It's slow or unreliable, or it involves revealing aspects of the dataset that
are confidential, or it involves pinging an API endpoint over the internet that
might not always be available.
+Or even if you can run those tests locally on your own machine, you can't
+share that data, or the access key/token for it, so anyone else that needs to
+test your package can't run your tests.
+And you can't then run automated tests via GitHub Actions, either.
+
OK, so that's at least one set of scenarios that explains the need to mock up the
+data or the response.
+
+_I get it:_ you just need to test what your functions do, not whether the
+external data source is present and functioning.
+That makes perfect sense.
+
+OK, so how do I replace the external data in my tests?
+
+Here's where I'm confused.
+
+If I have a function:
+
+```r
+double <- \(x) x * 2
+```
+
+and I do something like
+
+```r
+x <- 2
+
+local_mocked_bindings(
+ double = function(...) 4
+)
+```
+
+then I haven't tested if `double()` actually works, I've just stipulated that it
+does by providing what I expect its answer to be.
+
+Or at the other end of the spectrum, if I need to mock up a large data frame as
+a function input, do I need to just create a synthetic replacement out of my own
+imagination, or take the real data and kind of run some functions over it to
+mangle and obscure the values?
+
+Let's say I have a function:
+
+```r
process_data <- function(dat) {
  dat |>
    dplyr::filter(.data$year == 2025) |>
    dplyr::mutate(mean = (.data$value_x + .data$value_y) / 2)
}
+```
+
+and then I want to do:
+
+```r
test_that("process_data works as expected", {
  actual <- process_data(my_secret_df)
  expect_identical(nrow(actual), 1500L)
  expect_identical(ncol(actual), 8L)
  expect_false(anyNA(actual$mean))
})
+
+```
+
+I don't understand how to replace that with a mocked value.
+
I think I have to do something like this (maybe?):
+
+```r
test_that("process_data works as expected", {
  # having created a snapshot (?) called my_fake_data that has the correct
  # dimensions and characteristics that the "real" actual would have???
  with_mocked_bindings(process_data(my_secret_df), actual = my_fake_data)
  expect_identical(nrow(actual), 1500L)
  expect_identical(ncol(actual), 8L)
  expect_false(anyNA(actual$mean))
})
+```
+
+Creating `my_fake_data` sounds like a right pain, and also a massive data
+protection risk if I don't adequately obscure the values in `my_secret_df`.
+
+I must be missing something.
+
+The other thing that is on my mind is: these aren't real tests.
+I'm just stipulating what the mocked value should be.
+I'm creating my own output and then saying I've tested the function.
+But I haven't really.
+So what's the point of having these tests?
+
+Sometimes an API might be inaccessible, or we might not be able to properly test
+our data pipeline, and that's just the way the world is.
+So it feels like people are using mocks just to pretend they are testing things
+in order to try to get to 100% test coverage, but actually it is all a facade.
+
+I must be missing something.
+
+### Here's what I've learnt since I started recording things in this document
+
+
+### Conclusions
+
+I expect that at some future time I will be able to look back at this blog post
+with that funny feeling of wincing and puzzlement - "How could I ever have not
+grasped this very simple concept?"
+
That's the path from unknowing to knowing: from being all at sea to grasping how
to use a set of tools.
diff --git a/blogs/posts/2025-12-31_getting-my-head-round-mocking/taylor_problem.jpg b/blogs/posts/2025-12-31_getting-my-head-round-mocking/taylor_problem.jpg
new file mode 100644
index 00000000..751594d5
Binary files /dev/null and b/blogs/posts/2025-12-31_getting-my-head-round-mocking/taylor_problem.jpg differ
diff --git a/blogs/posts/2026-04-16_lazy-data-reading-with-duckdb-or-polars/index.qmd b/blogs/posts/2026-04-16_lazy-data-reading-with-duckdb-or-polars/index.qmd
new file mode 100644
index 00000000..a9c39501
--- /dev/null
+++ b/blogs/posts/2026-04-16_lazy-data-reading-with-duckdb-or-polars/index.qmd
@@ -0,0 +1,38 @@
+---
+title: "Lazily working with large data on Azure storage, with DuckDB or polars"
+author:
+ - name: Fran Barton
+ orcid: 0000-0002-5650-1176
+ email: francis.barton@nhs.net
+ affiliation:
+ - name: The Strategy Unit
+ url: https://strategyunitwm.nhs.uk/
+date: today
+categories: [Python, learning, Azure, polars, howto]
+execute:
+ enabled: false
+---
+
+(#lede)
+(use callout?)
+
+I experimented with various Python tools to work with a large amount of data on
+Azure storage.
+Here's what I learned.
+
+
+### Intro
+
+So you've got a load of data on a storage server.
+And you need to do something with it.
+
+But you don't need - or want - to just download it all and do your thing
+on your laptop.
There are GBs of it, for one thing, and for another it just feels _wrong_.
+
+And anyway, there are tools to help you work with data on the server, and those
+have been built for a reason, right?
+And it's just not good practice to be downloading data - which might contain
+or constitute sensitive information - onto your machine.
+
+We use the server for a reason.
diff --git a/index.qmd b/index.qmd
index f5bbf098..27026757 100644
--- a/index.qmd
+++ b/index.qmd
@@ -5,7 +5,8 @@ sidebar: false
This is the home of Data Science activities at [The Strategy Unit][su_web].
-Here, we host information about how we work, links to presentations, and blogposts relating to how we utilise data science tools.
+Here, we host information about how we work, links to presentations, and
+blog posts relating to how we utilise data science tools.
All members of the Strategy Unit are welcome to contribute.
diff --git a/presentations/2025-06-19_cartograms_intro/staffs_geo_545px.png.png b/presentations/2025-06-19_cartograms_intro/staffs_geo_545px.png.png
new file mode 100644
index 00000000..fb1d09fe
Binary files /dev/null and b/presentations/2025-06-19_cartograms_intro/staffs_geo_545px.png.png differ
diff --git a/presentations/2025-09-11_coffee-and-coding/pyproject.toml b/presentations/2025-09-11_coffee-and-coding/pyproject.toml
new file mode 100644
index 00000000..32db19db
--- /dev/null
+++ b/presentations/2025-09-11_coffee-and-coding/pyproject.toml
@@ -0,0 +1,17 @@
+[build-system]
+requires = ["setuptools>=80.9.0", "wheel"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "pres"
+version = "0.1.0"
+description = "Dependencies for Efficient Coding Quarto presentation"
+requires-python = ">=3.12"
+dependencies = [
+ "jupyter>=1.0.0",
+ "numpy>=1.20.0",
+ "pandas>=1.3.0",
+ "numba>=0.56.0",
+ "jupyter-cache>=0.5.0",
+ "jupyterlab-rise",
+]
diff --git a/presentations/2025-09-11_coffee-and-coding/slides.qmd b/presentations/2025-09-11_coffee-and-coding/slides.qmd
new file mode 100644
index 00000000..46f11372
--- /dev/null
+++ b/presentations/2025-09-11_coffee-and-coding/slides.qmd
@@ -0,0 +1,439 @@
+---
+title: "Efficient Coding"
+subtitle: "Principles and Practices for Performant Code"
+author: "Eirini, Rhian & YiWen, DS @ SU"
+format:
+ revealjs:
+ theme: dark
+ code-fold: true
+ code-overflow: wrap
+ font-size: 0.7em
+jupyter: python3
+---
+
+## Agenda
+
+- **Measuring Performance**: Time and profile your code
+- **Common Performance Tweaks**: Easy wins for faster code
+- **Loops vs. Vectorisation vs. Functional**: Choose the right approach
+- **Optimising Loops**: When you must use them
+- **Beyond Basics**: Tools for serious optimisation
+
+## Measuring Performance: Timing
+
+```{python}
+#| echo: true
+from timeit import timeit  # Python's precise timing module
+
+# Function to measure
+def my_function():
+    return sum(i**2 for i in range(1_000_000))  # Sum of squares
+
+# Time the function execution
+# number=100: run it many times so one-off noise averages out
+# globals=globals(): let the timed statement see functions defined in this scope
+execution_time = timeit('my_function()',
+                        globals=globals(),
+                        number=100)
+
+# timeit returns the total for all 100 runs, so divide by 100 for the per-call average
+print(f"Average execution time: {execution_time/100:.6f}s")
+```
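
## Measuring Performance: Repeated Timing

A single `timeit` total can be skewed by whatever else the machine is doing at the time. One common refinement, sketched below, is `timeit.repeat`, which runs several independent batches. The fastest batch is usually the least noisy estimate. The `repeat=3` and `number=10` values are arbitrary choices to keep the demo quick.

```python
from timeit import repeat

def my_function():
    return sum(i**2 for i in range(1_000_000))  # Same workload as the previous slide

NUMBER = 10  # Calls per batch

# Run 3 batches of NUMBER calls each; returns a list of 3 batch totals
batch_times = repeat(my_function, repeat=3, number=NUMBER)

# Take the fastest batch and convert it to a per-call time
best_per_call = min(batch_times) / NUMBER
print(f"Best per-call time: {best_per_call:.6f}s")
```

The minimum is used rather than the mean because background activity can slow a batch down but will not usually make it faster.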
+
+## Measuring Performance: Profiling (1)
+
+```{python}
+#| echo: true
+import cProfile
+import pstats
+from io import StringIO
+
+def sum_of_squares(n):
+    return sum(i * i for i in range(n))
+
+# Profile a single call
+pr = cProfile.Profile()
+pr.enable()
+sum_of_squares(1_000_000)
+pr.disable()
+
+# Capture the report in a StringIO buffer, sorted by cumulative time
+s = StringIO()
+ps = pstats.Stats(pr, stream=s).sort_stats('cumulative')
+ps.print_stats(5)  # Show the top 5 entries
+print(s.getvalue())
+```
+
+## Measuring Performance: Profiling (2)
+
+**Understanding the output:**
+
+- `ncalls`: `sum_of_squares` is called once, but the generator expression is entered about 1M times (once per value it yields)
+- `tottime`: time spent in a function's own code; for `sum_of_squares` this rounds to 0.000 seconds
+- `cumtime`: time including nested calls; about 0.141 seconds for `sum_of_squares` in our run
+- **Last column** (`filename:lineno(function)`): the time is split between `{built-in method builtins.sum}` and the generator expression (`i * i for i in range(n)`)
+
+**Key insight**: In our run, the time for 1M items was split roughly evenly between `sum()` and the generator expression, and `sum_of_squares` itself added almost nothing.
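
## Measuring Performance: Profiling (3)

The same profile can be written more compactly, because `cProfile.Profile` has worked as a context manager since Python 3.8. This sketch also sorts by `tottime` (via `pstats.SortKey.TIME`) to show the time spent in each function's own code. Both choices are alternatives to the previous slides, not requirements.

```python
import cProfile
import io
import pstats

def sum_of_squares(n):
    return sum(i * i for i in range(n))

# Profiling is enabled on entry to the block and disabled on exit
with cProfile.Profile() as pr:
    sum_of_squares(1_000_000)

# Sort by time spent inside each function itself, excluding sub-calls
stream = io.StringIO()
stats = pstats.Stats(pr, stream=stream).sort_stats(pstats.SortKey.TIME)
stats.print_stats(5)
print(stream.getvalue())
```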
+
+## Performance Tweak: String Concatenation
+
+```{python}
+#| echo: true
+from timeit import timeit # Import timing functionality
+
+# Fast - O(n) complexity with clear intent
+def join_method():
+    return "".join(str(i) for i in range(100_000))  # Single efficient operation
+
+# Slow - O(n²) in the worst case
+def concat_method():
+    result = ""
+    for i in range(100_000):
+        result += str(i)  # May copy the whole string each time (CPython sometimes resizes in place)
+    return result
+
+t1 = timeit(join_method, number=100) # Time the fast method
+t2 = timeit(concat_method, number=100) # Time the slow method
+print(f"join: {t1:.6f}s\n+=: {t2:.6f}s")
+print(f"Speedup: {t2/t1:.1f}x faster")
+```
+
+## Performance Tweak: Appropriate Data Structures
+
+```{python}
+#| echo: true
+from timeit import timeit # For timing the operations
+import random # To select a random lookup value
+
+# Setup test data
+data = list(range(100_000))  # A list with 100,000 items
+lookup_val = random.choice(data) # Random value to find
+lookup_set = set(data) # Same data as a set for O(1) lookup
+
+# Time list lookup (O(n) - must check each element)
+t1 = timeit(lambda: lookup_val in data, number=100)
+
+# Time set lookup (O(1) - constant time hash table)
+t2 = timeit(lambda: lookup_val in lookup_set, number=100)
+
+print(f"List lookup: {t1:.6f}s\nSet lookup: {t2:.6f}s")
+print(f"Speedup: {t1/t2:.1f}x faster")
+```
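
## Performance Tweak: Queues with `deque`

Another everyday example of choosing the right data structure is using a list as a queue. `list.pop(0)` has to shift every remaining element, so each pop is O(n). `collections.deque.popleft()` is O(1). Here is a minimal sketch. The queue size of 20,000 is arbitrary.

```python
from collections import deque
from timeit import timeit

n = 20_000  # Number of items to enqueue and then drain

def drain_list():
    queue = list(range(n))
    while queue:
        queue.pop(0)  # O(n): shifts all remaining elements left

def drain_deque():
    queue = deque(range(n))
    while queue:
        queue.popleft()  # O(1): removes from the left end directly

t1 = timeit(drain_list, number=10)
t2 = timeit(drain_deque, number=10)
print(f"list.pop(0): {t1:.6f}s\ndeque.popleft(): {t2:.6f}s")
print(f"Speedup: {t1/t2:.1f}x faster")
```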
+
+## Performance Tweak: Pre-allocating Arrays
+
+```{python}
+#| echo: true
+from timeit import timeit # For timing operations
+import numpy as np # For pre-allocated arrays
+
+size = 100_000 # Number of elements to process
+
+# Growing a list dynamically (occasional resizing as it grows)
+def growing_list():
+    result = []  # Empty list that will grow
+    for i in range(size):
+        result.append(i*2)  # Occasionally triggers a resize (amortised O(1))
+    return result
+
+# Pre-allocated array (fixed size from beginning)
+def preallocated_array():
+    result = np.zeros(size, dtype=int)  # Create array of right size first
+    for i in range(size):
+        result[i] = i*2  # No resizing, but each write converts a Python int into the array
+    return result
+
+t1 = timeit(growing_list, number=100)
+t2 = timeit(preallocated_array, number=100)
+print(f"Growing list: {t1:.6f}s\nPre-allocated: {t2:.6f}s")
+# A ratio above 1 means pre-allocation was faster; element-wise NumPy writes from Python can push it below 1
+print(f"Pre-allocated vs growing: {t1/t2:.2f}x")
+```
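
## Performance Tweak: Pre-allocating Arrays (2)

For comparison, here is a sketch of two further options. The first pre-allocates a plain Python list with `[0] * size`, which avoids converting each value into a NumPy array element. The second drops the Python loop entirely and uses a vectorised NumPy expression. Which approach wins will depend on your machine and on `size`, so treat any numbers as indicative.

```python
from timeit import timeit
import numpy as np

size = 100_000  # Number of elements to process

def preallocated_list():
    result = [0] * size  # List created at full length up front
    for i in range(size):
        result[i] = i * 2  # Overwrite in place, no resizing
    return result

def vectorised_array():
    return np.arange(size) * 2  # No Python-level loop at all

t1 = timeit(preallocated_list, number=100)
t2 = timeit(vectorised_array, number=100)
print(f"Pre-allocated list: {t1:.6f}s\nVectorised NumPy: {t2:.6f}s")
```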
+
+## Loops vs. Vectorisation vs. Functional
+
+| Approach | Best For | Example Use Case |
+|---|---|---|
+| **Loops** | Complex logic, small data | Custom algorithms |
+| **Vectorisation** | Numerical operations | Data science, numpy |
+| **Functional** | Data transformations | Pipelines, map/reduce |
+
+## Loops
+
+```{python}
+#| echo: true
+from timeit import timeit # For timing code execution
+
+size = 100_000 # Number of elements to process (reduced for demo)
+
+# Standard loop approach (explicit iteration)
+def standard_loop():
+    result = []
+    for i in range(size):
+        result.append(i * 2 + 5)  # Each operation done sequentially
+    return result
+
+# Same logic as a list comprehension (no repeated result.append lookup and call)
+def list_comprehension():
+    return [i * 2 + 5 for i in range(size)]
+
+t1 = timeit(standard_loop, number=100)
+t2 = timeit(list_comprehension, number=100)
+
+print(f"Standard loop: {t1:.6f}s\nList comprehension: {t2:.6f}s")
+print(f"Speedup: {t1/t2:.1f}x faster")
+```
+
+## Vectorisation with NumPy
+
+```{python}
+#| echo: true
+from timeit import timeit # For timing code execution
+import numpy as np # NumPy for vectorised operations
+
+size = 100_000 # Number of elements to process (reduced for demo)
+
+# Pure Python approach (loops through each element)
+def python_way():
+    return [i * 2 + 5 for i in range(size)]
+
+# NumPy vectorised approach (operates on entire array at once)
+def numpy_way():
+    return np.arange(size) * 2 + 5  # Uses C implementation
+
+# Compare execution times
+t1 = timeit(python_way, number=100)
+t2 = timeit(numpy_way, number=100)
+
+print(f"Python: {t1:.6f}s\nNumPy: {t2:.6f}s")
+print(f"Speedup: {t1/t2:.1f}x faster")
+```
+
+## Vectorisation with Pandas
+
+```{python}
+#| echo: true
+import pandas as pd # For DataFrame operations
+import numpy as np # For random data generation
+from timeit import timeit # For timing operations
+
+# Create sample dataframe with 100 random values (kept small for the demo)
+df = pd.DataFrame({'value': np.random.rand(100)})
+
+# Slow: Using apply (runs a Python function on each element)
+def apply_method():
+    return df['value'].apply(lambda x: x * 2 + 5)
+
+# Fast: Vectorised operations (C implementation)
+def vector_method():
+    return df['value'] * 2 + 5
+
+# Compare execution times
+t1 = timeit(apply_method, number=100)
+t2 = timeit(vector_method, number=100)
+
+print(f"apply: {t1:.6f}s\nvectorised: {t2:.6f}s")
+print(f"Speedup: {t1/t2:.1f}x faster")
+```
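
## Vectorisation with Pandas: Conditionals

A common reason to reach for `apply` is if/else logic. In many cases `np.where` can express the same element-wise condition on a whole column at once. This is a hedged sketch: the 0.5 threshold and the two formulas are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'value': np.random.rand(100)})

# Row by row: a Python lambda evaluates the condition for every element
df['with_apply'] = df['value'].apply(lambda x: x * 2 if x > 0.5 else x + 5)

# Vectorised: the condition and both branches are computed on whole arrays,
# then np.where picks from one or the other element by element
df['with_where'] = np.where(df['value'] > 0.5, df['value'] * 2, df['value'] + 5)

print(np.allclose(df['with_apply'], df['with_where']))  # → True
```

`np.where` evaluates both branches for every element. That is fine here, but it matters if a branch is expensive or can fail (for example, division by zero).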
+
+## Functional Programming
+
+```{python}
+#| echo: true
+from timeit import timeit
+
+# Compare map vs list comprehension
+t1 = timeit(lambda: list(map(lambda x: x**2, range(100_000))), number=100)
+t2 = timeit(lambda: [x**2 for x in range(100_000)], number=100)
+
+print(f"map: {t1:.6f}s\ncomprehension: {t2:.6f}s")
+print(f"Speedup: {t1/t2:.1f}x faster")
+```
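
## Functional Programming: map, filter, reduce

The comparison table lists map/reduce pipelines as the natural home of the functional style. Below is a small sketch of that style using `filter`, `map` and `functools.reduce`, next to the equivalent generator expression with the built-in `sum`. The "sum of squares of even numbers" task is just an example.

```python
from functools import reduce
from operator import add

numbers = range(10)

# Functional pipeline: keep evens, square them, then fold with addition
evens = filter(lambda x: x % 2 == 0, numbers)
squares = map(lambda x: x * x, evens)
total_functional = reduce(add, squares, 0)

# Equivalent generator expression using the built-in sum
total_builtin = sum(x * x for x in numbers if x % 2 == 0)

print(total_functional, total_builtin)  # → 120 120
```

For simple folds like this one, `sum()` is usually clearer. `reduce` earns its place when no built-in exists for the combining step.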
+
+## Speed Comparison Summary
+
+```{python}
+#| echo: false
+import pandas as pd
+
+# Create a summary table of typical performance results
+data = {
+ 'Approach': ['Standard loop', 'List comprehension', 'Python list ops', 'NumPy vectorised',
+ 'Pandas apply', 'Pandas vectorised', 'Map function', 'List comprehension'],
+ 'Category': ['Loops', 'Loops', 'Vectorisation', 'Vectorisation',
+ 'Vectorisation', 'Vectorisation', 'Functional', 'Functional'],
+ 'Typical Time (sec)': [0.049893, 0.027239, 0.054816, 0.002517, 0.104524, 0.001358, 0.040885, 0.025535]
+}
+
+df = pd.DataFrame(data)
+print(df.to_string(index=False))
+```
+
+## When to Use Each Approach
+
+- **Vectorisation**: Large numerical datasets (NumPy/Pandas)
+- **List Comprehensions**: Simple transformations on sequences
+- **Generators**: Large datasets, memory efficiency
+- **Functional**: Complex pipelines, data transformations
+- **Loops**: Complex logic, small datasets, or when readability matters
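
## Generators and Memory

To make the "memory efficiency" point concrete, this sketch compares a list comprehension with the equivalent generator expression. Note that `sys.getsizeof` reports only the size of the container itself, not the integer objects a list refers to, so it understates the list's real footprint.

```python
import sys

n = 1_000_000

squares_list = [i * i for i in range(n)]  # All n results are built and stored up front
squares_gen = (i * i for i in range(n))   # Values are produced one at a time, on demand

print(f"List: {sys.getsizeof(squares_list):,} bytes")
print(f"Generator: {sys.getsizeof(squares_gen):,} bytes")

# Same total either way; note that a generator can only be consumed once
print(sum(squares_list) == sum(squares_gen))  # → True
```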
+
+## Loop Optimisation Techniques
+
+```{python}
+#| echo: true
+from timeit import timeit  # For timing execution
+import math  # For sqrt function
+from types import SimpleNamespace  # Lightweight objects with attributes
+
+# Create test data - objects with a 'value' attribute
+data = [SimpleNamespace(value=i) for i in range(100_000)]
+
+# Regular approach: indexes into data and looks up math.sqrt on every iteration
+def regular_loop():
+    total = 0
+    for i in range(len(data)):  # len() is evaluated once, when range() is created
+        total += math.sqrt(data[i].value)  # Global + attribute lookup each time
+    return total
+
+# Optimised approach: bind the function to a local name once
+def optimised_loop():
+    total = 0
+    n = len(data)  # Calculate length once (same cost as above, just explicit)
+    sqrt = math.sqrt  # Local reference to function
+    for i in range(n):
+        total += sqrt(data[i].value)
+    return total
+
+t1 = timeit(regular_loop, number=100)
+t2 = timeit(optimised_loop, number=100)
+print(f"Regular: {t1:.6f}s\nOptimised: {t2:.6f}s")
+print(f"Speedup: {t1/t2:.1f}x faster")
+```
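
## Loop Optimisation: Iterating Directly

Often the index is not needed at all. This sketch shows two further variants of the previous slide's loop: iterating over the items directly, and moving the loop into `sum()` with a generator expression. `types.SimpleNamespace` stands in for the objects with a `value` attribute, and timings will vary by machine.

```python
import math
from timeit import timeit
from types import SimpleNamespace

data = [SimpleNamespace(value=i) for i in range(100_000)]

def direct_iteration():
    total = 0
    sqrt = math.sqrt  # Local reference, as in the optimised loop
    for item in data:  # No indexing: iterate over the objects themselves
        total += sqrt(item.value)
    return total

def builtin_sum():
    sqrt = math.sqrt
    return sum(sqrt(item.value) for item in data)  # Loop driven by sum()

t1 = timeit(direct_iteration, number=20)
t2 = timeit(builtin_sum, number=20)
print(f"Direct iteration: {t1:.6f}s\nsum() + generator: {t2:.6f}s")
```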
+
+## Best Practices Summary
+
+1. **Measure first** - profile before optimising
+2. **Use appropriate data structures** for the task
+3. **Vectorise numerical operations** when possible
+4. **Avoid premature optimisation** - readable code first
+5. **Know when to use loops, comprehensions, or functional styles**
+
+# Appendix: Beyond Basics
+
+## Just-In-Time Compilation (1)
+
+```{python}
+#| echo: true
+from numba import jit
+import numpy as np
+from timeit import timeit
+
+def slow_func(x):
+    total = 0.0
+    for i in range(len(x)):
+        total += np.sin(x[i]) * np.cos(x[i])
+    return total
+
+@jit(nopython=True)
+def fast_func(x):
+    total = 0.0
+    for i in range(len(x)):
+        total += np.sin(x[i]) * np.cos(x[i])
+    return total
+
+x = np.random.random(10_000)
+fast_func(x)  # First call compiles the function; run it once so compile time isn't timed
+t1 = timeit(lambda: slow_func(x), number=100)
+t2 = timeit(lambda: fast_func(x), number=100)
+print(f"Python: {t1:.6f}s\nNumba: {t2:.6f}s")
+print(f"Speedup: {t1/t2:.1f}x faster")
+```
+
+## Just-In-Time Compilation (2)
+
+**What is JIT?**
+
+> [JIT (Just-In-Time) compilation](https://en.wikipedia.org/wiki/Just-in-time_compilation) translates code into machine code at runtime to improve execution speed. This approach can improve performance by optimising the execution of frequently run code segments.
+
+**Key Benefits:**
+
+- Can provide 10-100x speed-ups for loop-heavy numerical code (results vary)
+- Works especially well with NumPy arrays and scalar maths
+- Requires minimal code changes (often just a decorator)
+
+## Cython - Basic Example
+
+**Pure Python version (slow.py):**
+```python
+def calculate_sum(n):
+    """Sum the squares from 0 to n-1"""
+    total = 0
+    for i in range(n):
+        total += i * i
+    return total
+```
+
+**Cython version (fast.pyx):**
+```python
+def calculate_sum_cy(int n):
+    """Same function with static typing"""
+    cdef long long i, total = 0  # Static C types; unlike Python ints these can overflow, so use 64-bit
+    for i in range(n):
+        total += i * i
+    return total
+```
+
+**Result**: Often 20-100x faster, depending on the workload
+
+## Cython - Best Practices
+
+**Key techniques for maximum performance:**
+
+```python
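+from cython.parallel import prange
+
+# 1. Declare C types for variables used in hot code
+cdef:
+    int i, n = 10_000  # Integer variables
+    double x = 0.5  # Floating point
+    int* ptr  # C pointer
+
+# 2. Use typed memoryviews for fast, C-level indexing into arrays
+def process(double[:] arr):  # Accepts any buffer of doubles, e.g. a float64 NumPy array
+    cdef Py_ssize_t i
+    for i in range(arr.shape[0]):
+        arr[i] = arr[i] * 2  # Direct memory access, no Python objects
+
+# 3. Move Python operations outside loops
+def scaled_total(double[:] arr, factor):
+    cdef double f = factor  # Convert the Python object to a C double once
+    cdef double total = 0
+    cdef Py_ssize_t i
+    for i in range(arr.shape[0]):
+        total += arr[i] * f  # Only C operations inside the loop
+    return total
+
+# 4. Release the GIL and parallelise with OpenMP via prange
+#    (compile with OpenMP flags, e.g. -fopenmp, for the threads to be used)
+def double_parallel(double[:] data):
+    cdef Py_ssize_t i
+    for i in prange(data.shape[0], nogil=True):  # Loop body runs without the GIL
+        data[i] = data[i] * 2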
+```
+
+## Compiling Cython Code
+
+**Option 1: Using setuptools (recommended for projects)**
+```python
+# Create setup.py in your project directory:
+from setuptools import setup, Extension
+from Cython.Build import cythonize
+
+setup(
+    ext_modules=cythonize([
+        Extension("fast", ["fast.pyx"]),
+    ])
+)
+
+# Then compile: python setup.py build_ext --inplace
+```
+
+**Option 2: Quick development with pyximport**
+```python
+import pyximport
+pyximport.install() # Automatically compiles .pyx files
+import fast # Will compile fast.pyx on first import
+```
+
+**Option 3: Direct compilation**
+```bash
+cython -a fast.pyx # Generates fast.c and HTML report
+gcc -shared -fPIC -o fast.so fast.c \
+ $(python3-config --includes) $(python3-config --ldflags)
+```
diff --git a/presentations/2025-11-13_error_messages_with_cli/data_science.code-workspace b/presentations/2025-11-13_error_messages_with_cli/data_science.code-workspace
new file mode 100644
index 00000000..32a5083a
--- /dev/null
+++ b/presentations/2025-11-13_error_messages_with_cli/data_science.code-workspace
@@ -0,0 +1,42 @@
+{
+ "folders": [
+ {
+ "path": "../.."
+ }
+ ],
+ "settings": {
+ "projectColors.mainColor": "#b7b927",
+ "window.title": "SU data_science repo",
+ "workbench.colorCustomizations": {
+ "editorRuler.foreground": "#ff4081",
+ "statusBarItem.warningBackground": "#b7b927",
+ "statusBarItem.warningForeground": "#000000",
+ "statusBarItem.warningHoverBackground": "#b7b927",
+ "statusBarItem.warningHoverForeground": "#00000090",
+ "statusBarItem.remoteBackground": "#c4c634",
+ "statusBarItem.remoteForeground": "#000000",
+ "statusBarItem.remoteHoverBackground": "#d1d341",
+ "statusBarItem.remoteHoverForeground": "#00000090",
+ "statusBar.background": "#b7b927",
+ "statusBar.foreground": "#000000",
+ "statusBar.border": "#b7b927",
+ "statusBar.debuggingBackground": "#b7b927",
+ "statusBar.debuggingForeground": "#000000",
+ "statusBar.debuggingBorder": "#b7b927",
+ "statusBar.noFolderBackground": "#b7b927",
+ "statusBar.noFolderForeground": "#000000",
+ "statusBar.noFolderBorder": "#b7b927",
+ "statusBar.prominentBackground": "#b7b927",
+ "statusBar.prominentForeground": "#000000",
+ "statusBar.prominentHoverBackground": "#b7b927",
+ "statusBar.prominentHoverForeground": "#00000090"
+ },
+ "projectColors.name": "SU data_science repo",
+ "projectColors.isActivityBarColored": false,
+ "projectColors.isTitleBarColored": false,
+ "projectColors.isStatusBarColored": true,
+ "projectColors.isProjectNameColored": true,
+ "projectColors.isActiveItemsColored": false,
+ "projectColors.setWindowTitle": true
+ }
+}
diff --git a/presentations/2025-11-13_error_messages_with_cli/index.qmd b/presentations/2025-11-13_error_messages_with_cli/index.qmd
index 8ca0d5f7..d64570fe 100644
--- a/presentations/2025-11-13_error_messages_with_cli/index.qmd
+++ b/presentations/2025-11-13_error_messages_with_cli/index.qmd
@@ -16,7 +16,7 @@ format:
view slides at [the-strategy-unit.github.io/data_science/presentations][ds_presentations]
---
-[ds_presentations]: https://tinyurl.com/zhuzhing-custom-error-messages/
+[ds_presentations]: https://tinyurl.com/zhuzhing-error-messages
## Error messages can be cool 😎🤔
@@ -192,8 +192,8 @@ check_string(NA)
::: {.notes}
You can use cli for more than just error messages!
-Can be helpful for providing semantically-formatted and nicely-presented
-info messages to package users.
+Can be helpful for providing semantically-formatted and nicely-presented info
+messages to package users.
:::
@@ -265,12 +265,16 @@ than (or as well as, if you insist) the specific text of the error message.
## Summary
+::: {.notes}
+
+
+:::
+
* Take control of the error messages your users might see
-* Let your users know you've thought about them?
-* Help yourself have a tiny bit more fun writing input checking code??
+* Let your users know you've thought about them (?)
* Help yourself with writing tests and debugging your code
## Thank you
-and have fun!
+### ...and have fun! {.r-fit-text .center}
diff --git a/presentations/2026-03-18_digital_exclusion/data_sshot.png b/presentations/2026-03-18_digital_exclusion/data_sshot.png
new file mode 100644
index 00000000..1ce95f32
Binary files /dev/null and b/presentations/2026-03-18_digital_exclusion/data_sshot.png differ
diff --git a/presentations/2026-03-18_digital_exclusion/map-1.png b/presentations/2026-03-18_digital_exclusion/map-1.png
new file mode 100644
index 00000000..b9aba4e2
Binary files /dev/null and b/presentations/2026-03-18_digital_exclusion/map-1.png differ