Introduction to the Course

This course teaches machine learning concepts and techniques through practical tutorials and plain-English explanation, culminating in projects focused on cyber security, such as email malware collection, facial recognition for OSINT, threat prediction and others. Much of the course content is freely accessible. Learners can support content development and gain access to screencast walkthroughs, PDF workbooks and references, Q&A and more through Patreon.

Beta diagram of the email malware collection system we will eventually build (coming soon).

Designed to Reduce the Learning Curve

The course approaches machine learning in a reader-friendly manner, designed to reduce the learning curve - you don’t need to be good at math to learn here. We consider the stages of a machine learning project: designing a system, gathering data, choosing an algorithm, training a model, and evaluating and tuning the model to obtain the best performance. The aim is to help you into a project-oriented state of mind as you learn, improving your chances of success beyond the course. Additionally, the course considers how people learn and implements multimodal learning through a mixture of text, graphics, video, code and Q&A, designed to help students learn efficiently. As time goes by, quizzes, live walkthroughs, more videos and other additions will be available.
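
To make the project stages above concrete, here is a minimal sketch in plain Python. It is only an illustration, not code from the course: the link-count data, the labels and the function names (predict, accuracy, train) are all made up for this example, and the "algorithm" is a deliberately simple threshold rule rather than a real model.

```python
# Toy walk-through of the project stages. All data and names are hypothetical.

# Data gathering: (number of links in an email, label) pairs.
# Label 1 = malicious, 0 = benign. These values are invented for illustration.
training_data = [(0, 0), (1, 0), (2, 0), (6, 1), (8, 1), (9, 1)]
test_data = [(1, 0), (3, 0), (7, 1), (10, 1)]


# Choosing an algorithm: a single-threshold rule.
def predict(link_count, threshold):
    """Flag an email as malicious (1) if it has more links than the threshold."""
    return 1 if link_count > threshold else 0


# Evaluating: the fraction of examples the rule labels correctly.
def accuracy(data, threshold):
    correct = sum(1 for link_count, label in data
                  if predict(link_count, threshold) == label)
    return correct / len(data)


# Training: try each link count seen in the data as a threshold
# and keep the one that scores best on the training data.
def train(data):
    candidates = sorted({link_count for link_count, _ in data})
    return max(candidates, key=lambda t: accuracy(data, t))


threshold = train(training_data)
print("learned threshold:", threshold)
print("accuracy on unseen test data:", accuracy(test_data, threshold))
```

Running the sketch shows why evaluation uses data the model has not seen: a rule that fits the training examples well can still make mistakes on new ones, which is where tuning comes in.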

If you are new to programming, Python or security, don’t let that stop you. I encourage you to try and experiment. The course provides working code and practical workflows - you don’t need to be able to program to copy them. However, the greater your understanding of code, the greater your ability to move beyond the examples and apply the ideas to your projects. A page in this introduction section is dedicated to a short Python Refresher if you need it.

Why “course” and not “book” or “site”? I want the learner to actively engage in learning, as you would in a course taught in a classroom. Here, unlike a book, you have access to a tutor (for lack of a better word) - I encourage you to send me any questions you have. Learning materials such as video walkthroughs, downloadable workshops and Q&A are available to Patreon supporters, in addition to votes on content and access to the community. Once Security Kiwi becomes sustainable, I plan to add live walkthroughs, animation and more (keep reading).

Let Me Introduce Myself

Hi, I’m Kris. I was fortunate to study computer science and cyber security at undergraduate and graduate level respectively. Both of my dissertations explored machine learning: one sought to use an academic natural language processing (NLP) algorithm in the then-new TensorFlow, and the other examined the feasibility of using indicators of compromise (IoC) to project attacker behaviour within networks. I am not an expert on machine learning; however, I can share what I have learned over my years of education and tinkering.

I taught myself machine learning, starting with my undergraduate project. I found the information in the public domain was limited; the subject had a steep learning curve and much of the information was confined to academic journals only accessible to students and researchers. While some of this has changed - high-quality books now exist - much of the innovation in machine learning relating to cyber security is still confined to academic journals (and private companies), and barriers around ease of understanding still exist (i.e. lots of math). This course is derived from academic research, books, my education and my own learning, with the aim of being as easy to understand as possible for those with limited math skill. The result is a course anyone can follow to self-teach machine learning for security from scratch.

What’s in the Course?

The structure of the course follows three conceptual areas:

  • Foundations: we learn foundational theory and implement tutorials to understand it.
  • Practical projects: we create interesting security-related projects, such as email malware collection systems, intrusion detection systems and more to come.
  • Resources: information to help you move beyond the course content, with detailed information on specific algorithms, datasets and subjects such as how to conduct research.

As time goes by, the content will be further refined and more intermediate and advanced material will be added. New content is added twice per month, on the first and third Thursday of the month. For the month of January (2021), new content will be added weekly.

The course is freely accessible; I don’t like the idea of putting hefty financial barriers in front of knowledge. The course is an ongoing effort, made financially viable through Patreon. In return, readers gain access to material that makes learning more efficient: screencast walkthroughs, downloadable workshops, Q&A and, eventually, live walkthroughs and more. Typically, programming books cost $40 USD and video courses cost from $90 USD onwards. All of the materials for this course cost $10/mo on Patreon. Downloadable content is DRM-free, so you can keep it if you decide to stop supporting. For supporters who don’t want video or downloads and just want to say thanks for the free content, gain access to the Q&A and a few Patreon-only perks, the first tier is $7/mo.

Course Content

Below is an overview of the course content by page/section title. New content is added once every two weeks: major updates on the first Thursday of the month and minor updates on the third. See the Road Map for detail on things to come.

Current content:

  • Introduction
    • Introduction to the Course
    • What is Machine Learning?
    • Types of Machine Learning
    • Machine Learning Project Stages
    • Machine Learning in Security
    • How to Approach Learning
    • Python Refresher
    • Environment Setup
  • Datasets & Data Collection
    • Considering Data Collection
    • Collecting Data
    • Existing Datasets & Data Sources
    • Challenges with Datasets
    • Exploring Datasets
    • Dataset Preparation
  • Training Models
    • Introduction to Training
    • Training Regression Models
    • Training Neural Networks
  • Algorithms & Techniques
    • Introduction to Algorithms and Techniques
  • Open Datasets & Analysis
    • Introduction to Open Datasets and Analysis
  • Resources
    • Glossary

Content coming at the next update on 28th January:

  • Training Models (Continued)
    • Neural Network Activation Functions
  • Algorithms & Techniques
    • Convolutional Neural Networks

The Future of the Course

As time goes by, more types of content will be available, and previous content will be upgraded where necessary (e.g. better video, animation).

Everyone:

  • Animation - commission animation to explain concepts visually.
  • Interactive visuals - interactive JavaScript visuals to explain and show how changing different values affects the aspects under discussion.
  • Topic quizzes - sections will end with quizzes so learners can see which areas they need to work on.
  • Live streamed tutorials - Live streamed walkthroughs of tutorials.
  • Cool swag development - not random merch. Unique, well-done AI/ML designs on things which further support content development. Plus AI/ML themed stickers. Hackers like stickers.

Patreon members only:

  • More downloadable material - More PDFs expanding on content to help learners actively engage and references on specific issues providing easy access to solutions.
  • More screencast walkthroughs - video walkthroughs for code examples and technical concepts.
  • Live walkthroughs - Live streamed walkthroughs of concepts, projects, tutorials etc.
  • Higher production quality videos - Outsource video editing.
  • Monthly book giveaways - books on machine learning, artificial intelligence, Python and programming (e.g. Superintelligence, Learn Python 3 the Hard Way).

Design, Conventions and Elements

A great deal of effort has gone into the creation of the course from a design point of view, both in terms of the architecture of the course and the aesthetic of the website. Both of these elements alter how effective the learning experience is.

The architecture of the course draws on research into how humans learn, to ensure learners gain as much as possible. We start with plain English: I attempt to use simple English, without ‘dumbing down’, to reduce fatigue and make the content friendly to non-native English speakers. The concept of spaced retention is used to refer back to foundational knowledge as learners progress.

Good design aids multimodal learning - that is, the combination of a number of learning ‘styles’. Text is augmented by graphics, videos and code. I intend to add live video walkthroughs, quizzes, animation and other helpful elements over time. Colour-blind users benefit from a research-derived colour palette. I intend to further improve accessibility, for example by using CSS and text rather than images where possible and by making all pages available as audio recordings. Please get in touch with accessibility suggestions and feedback.

  • Light blue underlined words are hyperlinks.
  • White italics are used for emphasis, to show new words, or the description of an image.
  • Green-coloured italics are used for emphasis.
  • Yellow dotted underlined words are hyperlinks to entries in the glossary (coming soon).
  • Dark grey block backgrounds contain code (shown below).
code = ["show_an_example"]  # placeholder lines of code to display
for line in code:
    print(line)

Email Updates

Sign up for email updates on new content. No spam, 1 email per month. You can find me @krisbolton.

Feedback

If you have feedback or questions, or find a typo, please reach out at securitykiwi [ at ] protonmail [dot] com. Seriously, if you think a section isn’t clear or have constructive feedback, drop me an email.

