date: 2024-11-11

Understanding the Lisp Syntax

TL;DR Lisp syntax is the simplified format of XML, JSON, representing tree-like data in textual form.

List of Things

In set notation,

{1, 2, 3}

In English,

one two three

In JSON,

[1, 2, 3]

Approximately in XML

<set>
    <item> 1 </item>
    <item> 2 </item>
    <item> 3 </item>
</set>

Powerset: Lists of list

In set notation,

{{}, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}

In JSON,

[[], [1], [2], [3], [1, 2], [1, 3], [2, 3], [1, 2, 3]]

Approximately in XML

<set>
    <set></set>
    <set>
        <item> 1 </item>
    </set>
    <set>
        <item> 2 </item>
    </set>
    <set>
        <item> 3 </item>
    </set>
    <set>
        <item> 1 </item>
        <item> 2 </item>
    </set>
    <set>
        <item> 1 </item>
        <item> 3 </item>
    </set>
    <set>
        <item> 2 </item>
        <item> 3 </item>
    </set>
    <set>
        <item> 1 </item>
        <item> 2 </item>
        <item> 3 </item>
    </set>
</set>

A Simplification

Say we decide that use of comma is too verbose and use space to seperate things, then we get

[[] [1] [2] [3] [1 2] [1 3] [2 3] [1 2 3]]

Since English words are also separated by space

[Since English words are also seperated by space]

I shall refer the list space separated as symbolic expression or s-expression.

Parse Tree

Adapted from https://en.wikipedia.org/wiki/Parse_tree

Given the English sentence "Reimu hit the ball" and suppose that it can have the parse tree as follows

<Sentence> Reimu hit the ball </Sentence>
<Sentence>
    <Noun> Reimu </Noun>
    <Verb_Phase>
    <Verb> hit </Verb>
    <Noun_Phase>
        <Determiner> the </Determiner>
        <Noun> ball </Noun>
    </Noun_Phase>
    </Verb_Phase>
</Sentence>

In JSON

{
    "original sentence": "Reimu hit the ball",
    "sentence": {
        "Noun": "Reimu",
        "Verb Phase": {
            "Verb": "hit",
            "Noun Phase": {
                "Determiner": "the",
                "Noun": "ball"
            }
        }
    }
}

In YAML

original sentence:  Reimu hit the ball
sentence:
    Noun: Reimu
    Verb Phase:
        Verb: hit
        Noun Phase:
            Determiner: the
            Noun: ball

Equivalent s-expression

(Reimu hit the ball)
(Sentence
    (Noun Reimu)
    (Verb-Phase
        (Verb hit)
        (Noun-Phase
            (Determiner the)
            (Noun ball))))

Program in YAML

Below is a python program calculating fibonacci number

def fib(n):
    if n <= 1:
        return n
    else:
        return fib(n-1) + fib(n-2)

Equivalent scheme

(define (fib n)
  (if (<= n 1)
      n
      (+ (fib (- n 1)) (fib (- n 2)))))

Say we reprogram them in YAML and JSON for fun

In YAML

define:
    name: fib
    params:
        - n
body:
    if:
        le:
            left: n
            right: 1
    consequent: n
    alternative:
        add:
            left:
                call:
                    operator: fib
                    operand:
                        sub:
                            left: n
                            right: 1
            right:
                call:
                    operator: fib
                    operand:
                        sub:
                            left: n
                            right: 2

Equivalently in JSON

{
    "define": {
        "name": "fib",
        "params": [
            "n"
        ]
    },
    "body": {
        "if": {
            "le": {
                "left": "n",
                "right": 1
            }
        },
        "consequent": "n",
        "alternative": {
            "add": {
                "left": {
                    "call": {
                        "operator": "fib",
                        "operand": {
                            "sub": {
                                "left": "n",
                                "right": 1
                            }
                        }
                    }
                },
                "right": {
                    "call": {
                        "operator": "fib",
                        "operand": {
                            "sub": {
                                "left": "n",
                                "right": 2
                            }
                        }
                    }
                }
            }
        }
    }
}

Or YAML without using associative array

- define
- - fib
  - n
- - if
  - - "<="
    - n
    - 1
  - n
  - - "+"
    - - fib
      - "-"
      - n
      - 1
    - - fib
      - "-"
      - n
      - 2

Observation about YAML is that it uses identation to represent nesting structure. JSON and s-experssion use brackets to represent nesting whereas their identation is merely for visual clue. The consequence of using identation is that we cannot freely reorder the program.

For example,

(define (fib n)
  (if (<= n 1)
      n
      (+ (fib (- n 1))
         (fib (- n 2)))))
{ "define": { "name": "fib", "params": ["n"]},
  "body": {
      "if": { "le": { "left": "n", "right": 1}},
      "consequent": "n",
      "alternative": {
          "add": { "left": { "call": { "operator": "fib",
                                       "operand": { "sub": { "left": "n", "right": 1 }}}},
                   "right": { "call": { "operator": "fib",
                                        "operand": { "sub": { "left": "n", "right": 2}}}}}}}}

These are same as previous when parsed but YAML cannot do that. Though we may write JSON in YAML to reorder the things.

Unfortunately, expressions or programs themselves are inherently nested, tree-like and recursive. Using identation to represent nesting would require a lot of columns.

One Problem

Lisp has its root from artifical intelligence to analyze natural languages. The problem is that the sentence length is arbitrary.

There are at least 2 solution: linked list and dynamic array. However, the solution at the time is to use pair represent sentence. Though I could not figure why pair is chosen rather than dynamic array. Even in the paper "The History of Lisp", John Mc Carthy say list structure is apporiate representing sentence but without explaining further why

Here are few reasons I thought of

Pair is a data structure stored only two things. We can represent pair in textual form using dot notation as follows

(first . second)

To represent list, we use pair and nil

(since . (english . (words . (are . (also . (seperate . (by . (space . NIL))))))))

And we decide below is a sugar syntax of above

(since english words are also seperate by space)

Extend from that, this is how we represent empty list

()

But we do not have this in dot notation. Or is it? We may reuse the empty list to represent nil rather than using additional symbol nil.

(since . (english . (words . (are . (also . (seperate . (by . (space . ()))))))))

In reality, John McCarthy can't reallly remember why adopt parenthesized list notation as external form of LISP data.

Another question arises is why designing external or textual form for list structure at the first place?

Imagine that we are using a stack machine to construct the list structure (0 1 2 3) without having textual form

push 0
push 1
push 2
push 3
push NIL
pair
pair
pair
pair

It is hard to visualize the list structure by looking at the program alone. Secondly, it is not convenient when have to write the construction sequence for every list structure. Implement a parser that take external form and construct list interanlly is better, hence the design of textual form.

Accidental Conveniency

So LISP system include a reader that parse list external form construct list internally. To complement it, LISP system include a printer that print internal list to external form on screen.

Rather than designing new syntax for primitive operations, just reuse existing reader

For example

(cons 3 3)       ; construct pair
(car (cons 3 3)) ; 3
(cdr (cons 3 3)) ; 3
(+ 9 16)         ; 25
(quote (+ (* 3 3) (* 4 4)))  ; not to calculate (+ (* 3 3) (* 4 4)), rather just construct it

This suggests another perspective of reading Lisp program

The Lisp programs are list structure data

For example,

(+ 9 16)

It is same as

(cons + (cons 9 (cons 16 '())))

Or it is a list of 3 things which are + symbol, 9 and 6 numbers.

To make it more concrete,

(car (quote (+ 9 16)))       ; +
(cdr (quote (+ 9 16)))       ; (9 16)
(car (cdr (quote (+ 9 16)))) ; 9
xs = ['+', 9, 16]
print(xs[0]) # +
print(xs[1]) # 9

In other word, rather than reading Lisp programs as they are streams of words like other languages, we can read Lisp program like reading YAML and JSON.

Take a further step, we can even imagine Lisp system indeed operates on list structure that is same as the input program.

For example,

(define (eval x)
  (cond
   ((number? x) x)
   ((eq? (car x) (quote +))
    (+ (eval (car (cdr x))) (eval (car (cdr (cdr x))))))
   ((eq? (car x) (quote *))
    (* (eval (car (cdr x))) (eval (car (cdr (cdr x))))))))

(eval (quote (+ (* 3 3) (* 4 4))))

Reference