Featured image of post CS 146

CS 146

Algorithm Design and Data Abstraction

This course is the advanced follow-up to CS 145, taught by Brad Lushman. Similar to his CS 246E style: dense material, challenging assignments, but incredibly rewarding. The course takes you on a journey from high-level functional programming all the way down to machine code, teaching you how computers actually execute programs.

Course Overview

The central theme is imperative programming and side effects — understanding how programs “do things” that change the state of the world. The course bridges functional programming (from CS 145) with systems-level concepts, covering everything from Racket mutation to assembly language.

Part I: Impurity in Racket

The course begins by introducing side effects to the pure functional world of Racket:

  • Mutation fundamentals: set!, set-box!, mutable structs
  • I/O modeling: Output as growing character sequences, input as consumption from a stream
  • Boxes and pointers: Understanding indirect references and aliasing
  • Vectors: Fixed-size, O(1) access data structures

The parallel track introduces C programming:

  • Variables, expressions, statements, functions
  • Pointers (*, &), pointer arithmetic
  • Arrays vs pointers (the confusing rule: array name = pointer to first element)
  • Memory layout: code, static, heap, stack
  • malloc/free and manual memory management
  • Linked lists, hash tables implementation
  • Common pitfalls: dangling pointers, memory leaks, buffer overflows

Key insight: Racket structs automatically box their fields (pointers), while C requires explicit pointer management.

Part II: SIMPL

Build your own imperative language! SIMPL (Simple Imperative Language) has:

  • Variable declarations, assignment, sequencing
  • Conditionals (iif), loops (while), print statements
  • Arithmetic and boolean expressions

Interpreter in Haskell:

  • AST data types for expressions and statements
  • State as a map from variables to values
  • Adding output via the Mio monad pattern
  • Introduction to >>= (bind) and the IO monad

Hoare Logic for program correctness:

  • Triples {P} statement {Q} — preconditions and postconditions
  • Rules for assignment, sequencing, conditionals, loops
  • Loop invariants for proving while correctness

Part III: PRIMPL

“SIMPL is all lies” — now we see what programs really look like at a lower level.

PRIMPL (Primitive Imperative Language):

  • Programs as vectors of instructions, not trees
  • Program Counter (PC) tracks execution position
  • No named variables — only memory locations
  • Single operations: add, sub, mul, move, jump, branch
  • Operands: immediate values 5 vs memory references (5)

A-PRIMPL (Assembly PRIMPL):

  • Labels, constants, data declarations
  • Pseudo-instructions for readability
  • Two-pass assembler: build symbol table, then generate code

Compiling SIMPL to A-PRIMPL:

  • Stack-based temporary storage with stack pointer (SP)
  • Translating iif and while into jumps and branches
  • Adding arrays with bounds checking
  • Function calls: jsr instruction, stack frames, frame pointer (FP)
  • Tail call optimization

Part IV: MMIX

Donald Knuth’s educational 64-bit RISC architecture:

  • Registers: 256 general-purpose ($0-$255), 32 special-purpose
  • Memory: 64-bit words, big-endian, byte-addressable
  • 2’s complement for signed integers
  • Instruction encoding: 32-bit instructions (opcode + operands)

Key instructions:

  • Arithmetic: ADD, SUB, MUL, DIV (remainder in rR)
  • Comparison: CMP (returns -1, 0, or 1)
  • Branches: BN, BZ, BP, BNN, BNZ, BNP
  • Memory: LDO/STO (load/store octabyte)
  • Control: JMP, GO (for subroutine calls)

Practical concerns:

  • Register allocation and spilling to RAM
  • Memory access is slow — registers are fast
  • Overflow handling strategies

Part V: Implementing Functional Languages

How does Racket actually work on a real machine?

Data representation:

  • Runtime environment maps identifiers to heap locations
  • All data tagged with type information
  • Cons cells: 3 words (type, first pointer, rest pointer)
  • Boxes: 2 words (type, pointer to value)

Shared structure:

  • Immutable data allows safe sharing (e.g., append shares the second list)
  • BST insertion shares unchanged branches
  • Only observable through mutation

Closures: Functions capture their environment — can’t use a simple stack model

Garbage collection: Enables safe sharing without manual deallocation

Part VI: Control

Making interpreters efficient and tail-recursive:

Zippers:

  • Data structure for efficient traversal: (current focus, context)
  • Move forward/backward in O(1)
  • For trees: context records descent path and siblings

Continuations:

  • Zipper for AST = continuation (“what to do next”)
  • Refactoring interpreter to be tail-recursive
  • interp descends, applyCont ascends
  • Trampolining: bounce function pointers back to a loop

Part VII: Continuations

First-class continuations as a programming construct:

Continuation Passing Style (CPS):

  • Pass “the rest of the computation” as an extra parameter
  • Every function call becomes tail-recursive

call/cc (call-with-current-continuation):

  • Capture current continuation as a first-class value
  • (let/cc k exp) — Racket’s friendlier form
  • Invoking k discards remaining computation (non-local exit)

C equivalent: setjmp/longjmp for saving/restoring execution context

Additional C Topics

  • Strings: Null-terminated char arrays, strlen, strcpy, strcmp
  • Buffer safety: strncpy, avoiding scanf("%s", ...)
  • Heterogeneous data: void *, unions, enums for type tags
  • 2D arrays: Row-major order, spatial locality matters
  • Variadic functions: va_list, va_arg for variable arguments

Key Takeaways

After this course, you understand:

  1. Why C has pointers and manual memory management
  2. How garbage collection works
  3. What compilers actually do (high-level → assembly → machine code)
  4. The fundamental difference between functional and imperative paradigms
  5. How to reason about program correctness with Hoare logic
  6. The elegance of continuations for control flow

Recommendations

  • Take CS 241E instead of CS 241 — this course covers similar compiler content
  • The assignments are essential for understanding — don’t skip them
  • Draw memory diagrams to understand pointers and aliasing
  • Hand-write the interpreters to truly grasp the concepts
Licensed under CC BY-NC-SA 4.0
Notes taking with heart
Built with Hugo
Theme Stack designed by Jimmy