how AI sees images and reads sequences — two tools, one post

Author: Virtue of Vague
AI Series · post 9 of 12

how AI sees images and reads sequences — two tools, one post
#

not all data looks the same. images are grids. logs are sequences. deep learning has a different tool for each.

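feed an image into a standard fully connected network and the grid gets flattened into one long vector. the network no longer knows which pixels were neighbours. feed a log sequence into one and each event is treated on its own, so the order is lost. wrong tool, wrong result.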

two architectures were built to solve this. CNNs for grids. RNNs for sequences.


CNNs — convolutional neural networks
#

built for grid-like data. images, screenshots, binary visualisations.

three layer types working together:

  • convolutional layers — slide a small filter across the input. detect local patterns. edges, textures, shapes. each filter produces a feature map highlighting where that pattern appears.
  • pooling layers — shrink the feature maps. keep the important signal, reduce noise and computation.
  • fully connected layers — take the extracted features, make the final classification decision.
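
the three layer types above can be sketched in a few lines of plain python. this is a toy illustration, not how a real framework implements it: the 5x5 image, the hand-picked edge filter, and the dense weights are all made-up values, and a real CNN would learn the filter and weights during training.

```python
def conv2d(image, kernel):
    """Slide the kernel across the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(fmap, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]


def dense(features, weights, bias):
    """Fully connected layer with a single output score."""
    return sum(f * w for f, w in zip(features, weights)) + bias


# toy 5x5 image: dark (0) on the left, bright (1) on the right
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# hand-picked vertical edge filter (assumption: a trained CNN learns these)
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d(image, kernel)          # 4x4 map, high where the edge is
pooled = max_pool(feature_map)               # 2x2 map after 2x2 max pooling
flat = [v for row in pooled for v in row]    # flatten for the dense layer
score = dense(flat, weights=[1.0, 1.0, 1.0, 1.0], bias=0.0)
```

here the filter only responds (value 2) in the column where dark meets bright. pooling keeps that response while shrinking the map, and the dense layer turns the surviving features into one score.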

hierarchical learning:

  • early layers detect edges and textures
  • middle layers detect shapes and patterns
  • deeper layers detect complex structures and objects

security application: malware visualisation. convert a binary file to a grayscale image, one byte per pixel. different malware families often produce visually distinct patterns, so a CNN can classify the family from the image. fast, and no code execution required.
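
the conversion step can be sketched like this. a minimal version, assuming one byte maps to one pixel intensity (0 to 255) and a fixed row width. the function name `bytes_to_grayscale` and the width of 16 are hypothetical choices for illustration, and the sample bytes stand in for real file contents.

```python
def bytes_to_grayscale(data: bytes, width: int = 16):
    """Lay raw bytes out row by row; each byte (0-255) is one pixel intensity."""
    if width <= 0:
        raise ValueError("width must be positive")
    padded_len = -(-len(data) // width) * width      # round up to a full row
    padded = data + bytes(padded_len - len(data))    # zero-pad the last row
    return [list(padded[i:i + width]) for i in range(0, padded_len, width)]


# stand-in for file contents; in practice: open(path, "rb").read()
sample = bytes(range(40))
image = bytes_to_grayscale(sample, width=16)

print(len(image), "rows of", len(image[0]), "pixels")
```

the file is only read, never executed. the resulting grid of numbers is what would be fed to the CNN.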


RNNs — recurrent neural networks
#

built for sequential data. text, logs, network traffic, time series.

standard networks process inputs independently. RNNs maintain a hidden state — memory of previous inputs. each step considers current input and what came before.
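
a rough sketch of the hidden state idea, using a single number as the state. the weights `w_x`, `w_h` and the inputs are made-up values for illustration, not trained parameters.

```python
import math


def rnn_step(x, h_prev, w_x=1.0, w_h=0.5, bias=0.0):
    """One recurrent step: mix the current input with the previous hidden state."""
    return math.tanh(w_x * x + w_h * h_prev + bias)


def run_rnn(inputs, h0=0.0):
    """Feed the inputs one at a time, carrying the hidden state forward."""
    h = h0
    for x in inputs:
        h = rnn_step(x, h)
    return h


# same three events, opposite order
forward = run_rnn([1.0, 0.0, -1.0])
reverse = run_rnn([-1.0, 0.0, 1.0])

print(forward, reverse)
```

the two runs see the same events but end in different states (roughly -0.67 and +0.67), because each step depends on what came before. a model that only summed the inputs would see no difference.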

security application — log sequence analysis. user behaviour over time. network connection patterns. the sequence matters as much as the individual event.


the vanishing gradient problem
#

standard RNNs struggle with long sequences. during training, the gradient is multiplied by a factor at every time step on its way back, so it tends to shrink exponentially. early inputs stop influencing what the model learns. long term dependencies get lost.
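
the shrinking can be shown numerically. a minimal sketch, assuming a scalar tanh RNN where the derivative of each step with respect to the previous state is `w_h * (1 - h**2)`. the weights and inputs are made-up values. with `w_h = 0.5` every factor is at most 0.5, so the product collapses quickly.

```python
import math


def gradient_through_time(inputs, w_x=1.0, w_h=0.5, h0=0.0):
    """Return d h_T / d h_0 for a scalar tanh RNN, built up with the chain rule."""
    h, grad = h0, 1.0
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h)
        grad *= w_h * (1.0 - h * h)   # local derivative of this step
    return grad


grad_short = gradient_through_time([0.1] * 5)    # 5 time steps
grad_long = gradient_through_time([0.1] * 50)    # 50 time steps

print(grad_short, grad_long)
```

after 5 steps the gradient is already below 0.5 ** 5. after 50 steps it is below 0.5 ** 50, effectively zero, so the first input has almost no say in the weight update.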

two gated architectures were built to reduce this problem:

LSTM — long short term memory
#

three gates control information flow:

  • input gate — what new information to store
  • forget gate — what old information to discard
  • output gate — what to pass to the next step

memory cell persists important context across long sequences. good for long documents, extended user sessions, prolonged attack sequences.
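
the three gates and the memory cell can be sketched on single numbers. this follows the common textbook LSTM equations. the parameter dictionary layout and all values in it are hypothetical, picked so the forget gate stays near 1 and the input gate near 0.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step on scalars. p maps a name to (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, bias = p[name]
        return w_x * x + w_h * h_prev + bias

    i = sigmoid(pre("input"))          # input gate: how much new info to store
    f = sigmoid(pre("forget"))         # forget gate: how much old cell to keep
    o = sigmoid(pre("output"))         # output gate: how much cell to expose
    g = math.tanh(pre("candidate"))    # proposed new content
    c = f * c_prev + i * g             # memory cell update
    h = o * math.tanh(c)               # hidden state passed to the next step
    return h, c


# made-up parameters: keep almost everything, write almost nothing
params = {
    "input": (0.0, 0.0, -10.0),
    "forget": (0.0, 0.0, 10.0),
    "output": (0.0, 0.0, 10.0),
    "candidate": (1.0, 0.0, 0.0),
}

h, c = 0.0, 1.0            # start with something stored in the cell
for _ in range(100):       # 100 steps of unrelated input
    h, c = lstm_step(0.5, h, c, params)

print(c)
```

after 100 steps the cell still holds close to its starting value. the update `c = f * c_prev + i * g` is additive, which is what lets context survive where a plain RNN would have squashed it away.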

GRU — gated recurrent unit
#

simpler version of LSTM. two gates instead of three, and no separate memory cell. fewer parameters, so usually faster to train. often comparable performance.

  • update gate — how much previous state to keep
  • reset gate — how much previous state to combine with current input
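
the two gates can be sketched the same way, on single numbers. this uses the convention where an update gate near 1 keeps the previous state (some write-ups flip it). the parameter layout and values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gru_step(x, h_prev, p):
    """One GRU step on scalars. p maps a name to (w_x, w_h, bias)."""
    w_x, w_h, bias = p["update"]
    z = sigmoid(w_x * x + w_h * h_prev + bias)   # update gate: share of old state kept
    w_x, w_h, bias = p["reset"]
    r = sigmoid(w_x * x + w_h * h_prev + bias)   # reset gate: old state used in candidate
    w_x, w_h, bias = p["candidate"]
    h_cand = math.tanh(w_x * x + w_h * (r * h_prev) + bias)
    return z * h_prev + (1.0 - z) * h_cand       # blend old state and candidate


# made-up parameters: only the update gate bias differs between the two runs
base = {"reset": (0.0, 0.0, 10.0), "candidate": (1.0, 1.0, 0.0)}
keep = dict(base, update=(0.0, 0.0, 10.0))        # z close to 1
overwrite = dict(base, update=(0.0, 0.0, -10.0))  # z close to 0

kept = gru_step(-1.0, 0.8, keep)
overwritten = gru_step(-1.0, 0.8, overwrite)

print(kept, overwritten)
```

with the update gate near 1 the state stays at about 0.8. with it near 0 the state is replaced by the candidate, about -0.2. there is no separate cell, the hidden state does both jobs.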

[figure: two tools for two types of data. CNN, for grids and images: conv → pool → classify; malware binary → image, CNN classifies the family, no code execution required. RNN / LSTM, for sequences and logs: t=1 login, t=2 access, t=3 exfil?, with a hidden state (memory) carried between steps.]

why this matters in security
#

CNNs are used in malware image classification and screenshot analysis tools. RNNs and LSTMs are used in user behaviour analytics, log anomaly detection, and sequence based threat detection.

when your UBA tool flags a behavioural sequence, there may well be a sequence model such as an LSTM underneath, tracking that session over time.


next up — generative AI and LLMs. the AI everyone is talking about. let’s open the hood.

which feels more relevant to your current SOC environment — image based analysis or sequence based detection?

took ai help to clean up typos. my brain works faster than my fingers. xd


next up: AI Series #10 — “the AI everyone is talking about — how does it actually work” back to series index