
C Programming Language

1. Overview

C is arguably the lingua franca of programming.

The "C Abstract Machine" is the hypothetical computer described by the C standard. C17 5.1.2.3 ("Program execution") states:

The semantic descriptions in this International Standard describe the behavior of an abstract machine in which issues of optimization are irrelevant.

This abstract machine need not bear any relation to actual hardware.
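One consequence of specifying an abstract machine is that a compiler may transform code freely, as long as the observable behaviour matches what the abstract machine would produce. A small sketch (the function names sum_loop and sum_closed are made up for illustration): an optimizer is allowed, though not required, to compile the loop into something like the closed form, since both return the same value for every n small enough that n * (n + 1) does not wrap around.

```c
/* Sums 1..n step by step, as the abstract machine describes it. */
unsigned sum_loop(unsigned n) {
    unsigned total = 0;
    for (unsigned i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}

/* Closed form of the same sum; equal to sum_loop(n) for small n. */
unsigned sum_closed(unsigned n) {
    return n * (n + 1) / 2;
}
```

Whether a given compiler actually performs this rewrite depends on the compiler and the optimization level; the standard only constrains the result.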

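In practice, most implementations realise this model with two memory regions (the standard itself speaks only of storage durations, not of a stack or a heap):

  • The stack stores local variables (automatic storage duration), which are reclaimed when the function call returns
  • The heap stores objects allocated by malloc() (allocated storage duration); these are accessed through pointers and persist until they are passed to free(). On typical hardware both regions ordinarily live in RAM.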
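The difference between the two regions can be sketched as follows. This is a minimal illustration, not code from any particular program; the function names sum_on_stack and make_on_heap are hypothetical.

```c
#include <stdlib.h>

/* Automatic storage: `local` exists only for the duration of this call. */
int sum_on_stack(int a, int b) {
    int local = a + b;
    return local; /* the value is copied out; `local` itself is gone afterwards */
}

/* Allocated storage: the object outlives the call that created it.
 * Returns NULL if malloc() fails; otherwise the caller must free() the result. */
int *make_on_heap(int value) {
    int *p = malloc(sizeof *p);
    if (p != NULL) {
        *p = value;
    }
    return p;
}
```

A caller would use make_on_heap(42), check the result against NULL, read the value through the pointer, and finally pass the pointer to free(). Returning the address of `local` from sum_on_stack instead would hand out a pointer to an object whose lifetime has ended.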

2. Dereferencing NULL Pointer

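Dereferencing a NULL pointer is "undefined behaviour" according to the C Standard: the standard imposes no requirements on what the program does. In practice, the outcome depends on the compiler, the operating system, the CPU, and other sordid details.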

2.1. On x86 Family Architecture

Undefined behaviour doesn't stop us from writing some code like:

#include <stdlib.h>

int main() {
    int *x = NULL;
    *x = 5;
    return 0;
}

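Compiling this at -O1 with GCC 8.3.0 produces the following assembly (the .seh_ directives and the call to __main suggest a Windows target such as MinGW):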

; gcc -fverbose-asm -S null.c -o null.s -O1
main:
    subq    $40, %rsp    ;,
    .seh_stackalloc 40
    .seh_endprologue
 ; null.c:3: int main() {
    call    __main   ;
 ; null.c:5:     *x = 5;
    movl    $5, 0    ;, MEM[(int *)0B]
 ; null.c:7: }
    movl    $0, %eax     ;,
    addq    $40, %rsp    ;,
    ret     

The key instruction is movl $5, 0, which tries to store the literal value 5 at address 0. What happens next is up to the hardware and the operating system:

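  • Pointers hold "virtual addresses", which the CPU's memory management unit translates into "physical addresses" using page tables maintained by the operating system (in most modern operating systems); this mechanism is called paging
  • When the translation fails, as it typically does for address 0 because the page containing it is left unmapped, the CPU raises a page fault exception
    • This triggers a transition from "user mode" to "kernel mode", with execution continuing at the handler in the OS kernel's code named by the interrupt descriptor table
  • The operating system kernel regains control and must determine what to do based on the information from the exception and the process's page table. For an unmapped address such as 0, it typically terminates the process (a segmentation fault on Unix-like systems, an access violation on Windows).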
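The end result of this sequence can be observed from user space. The following is a minimal sketch, assuming a POSIX system such as Linux on x86-64 (so it does not apply directly to a Windows toolchain); the function name signal_from_null_write is hypothetical. A child process performs the faulting store and the parent reports which signal, if any, the kernel used to terminate it. Since the store is undefined behaviour, the result is what such systems typically do, not something the C standard promises.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the number of the signal that killed the child,
 * 0 if the child exited normally, or -1 if fork/waitpid failed. */
int signal_from_null_write(void) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Child: volatile keeps the compiler from discarding the store. */
        volatile int *x = NULL;
        *x = 5;
        _exit(0); /* reached only if the store did not fault */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On a typical Linux system the returned value is expected to equal SIGSEGV. Running the faulting store in a child process keeps the parent alive to inspect the outcome, which avoids having to resume execution from inside a signal handler.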

Last Updated 2023-09-02 Sat 07:34.