This document describes the design and implementation of the Rugo programming language, a Ruby-inspired language that compiles to native binaries via Go.
Rugo's compilation pipeline transforms .rugo source files into native binaries through a series of well-defined stages:
.rugo source
│
▼
Strip comments
│
▼
Preprocess (desugar, shell fallback, paren-free calls)
│
▼
Parse (LL(1) grammar → flat AST)
│
▼
Walk (flat AST → typed AST nodes)
│
▼
Resolve (imports & requires)
│
▼
Semantic checks (validate before codegen)
└─ UndefinedIdentCheck: catch undefined variables and functions
│
▼
Transform chain (immutable AST rewrites)
├─ ConcurrencyLowering: desugar spawn/parallel/try into lowered nodes
└─ ImplicitReturnLowering: convert last-expr-as-return into explicit nodes
│
▼
Type inference (fixed-point analysis)
│
▼
Code generation (AST → Go AST → Go source)
│
▼
go build → native binary
The compiler is orchestrated by compiler.Compiler, which chains these stages together. The run, build, and emit CLI subcommands each exercise different parts of this pipeline.
After resolving imports and requires, the AST passes through a chain of semantic checks (ast/check.go). Checks implement the Check interface and are composed via CheckChain, which runs them in order and stops at the first error. Unlike transforms, checks validate the AST without modifying it.
UndefinedIdentCheck (compiler/check_idents.go): Catches undefined variable and function references before code generation. It uses a two-pass approach: first collecting all globally visible names (top-level assignments, function definitions, use/import/require namespaces, builtins), then walking the AST with a scope stack to verify that every IdentExpr resolves to a known binding. For namespaced calls (ns.func()), it validates that the function exists in the require namespace, stdlib module, or Go bridge package. Local variables shadow namespaces, matching codegen behavior.
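The fail-fast composition described above can be sketched in a few lines of Go. This is a minimal illustration only: the `Program` struct, the method set of `Check`, and the toy `undefinedIdentCheck` are assumptions made for the example, not the actual declarations in ast/check.go or compiler/check_idents.go.

```go
package main

import (
	"errors"
	"fmt"
)

// Program is a hypothetical stand-in for the resolved AST.
type Program struct {
	Idents  []string        // identifiers referenced by the program
	Defined map[string]bool // names visible to the program
}

// Check is an assumed shape for the Check interface: validate, never mutate.
type Check interface {
	Name() string
	Run(p *Program) error
}

// CheckChain runs checks in order and stops at the first error.
type CheckChain []Check

func (c CheckChain) Run(p *Program) error {
	for _, chk := range c {
		if err := chk.Run(p); err != nil {
			return fmt.Errorf("%s: %w", chk.Name(), err)
		}
	}
	return nil
}

// undefinedIdentCheck is a toy version of UndefinedIdentCheck without scopes.
type undefinedIdentCheck struct{}

func (undefinedIdentCheck) Name() string { return "UndefinedIdentCheck" }

func (undefinedIdentCheck) Run(p *Program) error {
	for _, id := range p.Idents {
		if !p.Defined[id] {
			return errors.New("undefined: " + id)
		}
	}
	return nil
}

func main() {
	p := &Program{Idents: []string{"x", "y"}, Defined: map[string]bool{"x": true}}
	fmt.Println(CheckChain{undefinedIdentCheck{}}.Run(p))
	// prints: UndefinedIdentCheck: undefined: y
}
```

Because a check only returns an error and never a new tree, adding another check to the chain cannot change what later stages see.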
After semantic checks, the AST passes through a chain of immutable transforms (ast/transform.go). Transforms implement the Transform interface and are composed via Chain(), which runs them left-to-right. Each transform receives the output of the previous one and must not mutate its input; a copy-on-write helper (mapSlice) only allocates new slices when children actually change.
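To make the copy-on-write idea concrete, here is a small generic sketch. The name `mapSliceSketch` and its signature are assumptions for illustration; the real mapSlice in ast/transform.go may look different. The point is that the input slice is returned as-is when nothing changes, and a copy is made only at the first element that differs.

```go
package main

import "fmt"

// mapSliceSketch applies f to every element. It returns the input slice
// itself when f changes nothing, and allocates a new slice only when the
// first differing element is found. The input is never written to.
func mapSliceSketch[T comparable](in []T, f func(T) T) []T {
	var out []T // stays nil until a change is seen
	for i, v := range in {
		nv := f(v)
		if out == nil && nv != v {
			out = make([]T, len(in))
			copy(out, in[:i]) // carry over the unchanged prefix
		}
		if out != nil {
			out[i] = nv
		}
	}
	if out == nil {
		return in
	}
	return out
}

func main() {
	nodes := []string{"a", "spawn", "b"}
	lower := func(s string) string {
		if s == "spawn" {
			return "lowered_spawn"
		}
		return s
	}
	same := mapSliceSketch(nodes, func(s string) string { return s })
	changed := mapSliceSketch(nodes, lower)
	fmt.Println(&same[0] == &nodes[0]) // true: input slice reused
	fmt.Println(changed, nodes)        // [a lowered_spawn b] [a spawn b]
}
```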
ConcurrencyLowering (ast/lower.go): Replaces high-level concurrency constructs (SpawnExpr, ParallelExpr, TryExpr) with lowered equivalents (LoweredSpawnExpr, LoweredParallelExpr, LoweredTryExpr) that carry pre-processed information: for example, extracting the last expression in a spawn body into a dedicated ResultExpr field, or pre-categorizing parallel branches as expression vs. statement. This pass also rewrites return statements inside spawn blocks and try handlers into SpawnReturnStmt and TryHandlerReturnStmt respectively.
ImplicitReturnLowering (ast/implicit_return.go): Converts last-expression-as-return-value patterns into explicit AST nodes. A trailing ExprStmt in a FuncDef or FnExpr body becomes an ImplicitReturnStmt; in a try handler it becomes a TryResultStmt. When the last statement is an IfStmt or CaseStmt, the transform recurses into each branch.
The Factory (ast/factory.go) centralizes AST node construction for transform passes, ensuring consistent creation and providing a hook point for future enhancements.
After transforms, compiler.Infer() (compiler/infer.go) runs a fixed-point type inference pass (up to 10 rounds). It walks all expressions and statements to resolve variable and function return types. Anything that can't be proven typed remains TypeDynamic (interface{}). The resulting TypeInfo feeds codegen, allowing it to emit unboxed Go types where possible instead of wrapping everything in interface{}.
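The fixed-point idea can be illustrated with a toy model. Everything below is hypothetical (the `assign` struct, `inferSketch`, and the rule that the first resolved type wins); it only shows why several rounds are needed when a variable's type depends on one that is resolved later, and how the round cap bounds the work.

```go
package main

import "fmt"

type Type int

const (
	TypeDynamic Type = iota // unresolved; corresponds to interface{}
	TypeInt
	TypeString
)

// assign models "lhs = rhs", where rhs is another variable or a typed literal.
type assign struct {
	lhs    string
	rhsVar string // empty when the right-hand side is a literal
	lit    Type
}

const maxRounds = 10 // mirrors the "up to 10 rounds" limit

// inferSketch repeats a pass over all assignments until nothing changes.
// Toy rule: the first resolved type wins; conflicts are not modelled.
func inferSketch(prog []assign) (map[string]Type, int) {
	types := map[string]Type{}
	rounds := 0
	for rounds < maxRounds {
		rounds++
		changed := false
		for _, a := range prog {
			t := a.lit
			if a.rhsVar != "" {
				t = types[a.rhsVar] // missing key yields TypeDynamic
			}
			if t != TypeDynamic && types[a.lhs] == TypeDynamic {
				types[a.lhs] = t
				changed = true
			}
		}
		if !changed {
			break
		}
	}
	return types, rounds
}

func main() {
	// c depends on b, b depends on a, and a is only resolved last.
	prog := []assign{{lhs: "c", rhsVar: "b"}, {lhs: "b", rhsVar: "a"}, {lhs: "a", lit: TypeInt}}
	types, rounds := inferSketch(prog)
	fmt.Println(types["c"] == TypeInt, rounds) // true 4
}
```

Three rounds each resolve one more variable, and the fourth confirms that nothing changed.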
During compilation, Rugo creates a temporary directory under ~/.cache/rugo/build/ to hold the generated Go source and go.mod before invoking go build. Each build gets its own uniquely-named subdirectory (rugo-*), which is automatically removed after compilation completes.
Rugo is dynamically typed. All values at runtime are Go interface{}. The generated Go code uses a small set of runtime helper functions (rugo_to_bool, rugo_to_int, rugo_to_float, rugo_to_string) to coerce values at the boundaries where Go requires concrete types.
Supported value types:
| Rugo type | Go representation |
|---|---|
| Integer | int |
| Float | float64 |
| String | string |
| Bool | bool |
| Nil | nil |
| Array | []interface{} |
| Hash | map[interface{}]interface{} |
Rugo follows Ruby-like truthiness rules: nil and false are falsy, everything else (including 0 and "") is truthy. This is enforced by the rugo_to_bool runtime function, which is used in all conditional contexts (if, while, &&, ||).
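As an illustration, a truthiness helper with these rules could look like the following. The Go name `rugoToBool` and its body are a sketch under the stated rules, not the actual runtime source.

```go
package main

import "fmt"

// rugoToBool sketches Ruby-like truthiness: only nil and false are falsy.
func rugoToBool(v interface{}) bool {
	switch b := v.(type) {
	case nil:
		return false
	case bool:
		return b
	default:
		return true // 0, "", empty arrays and hashes are all truthy
	}
}

func main() {
	fmt.Println(rugoToBool(nil), rugoToBool(false), rugoToBool(0), rugoToBool(""))
	// prints: false false true true
}
```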
Arithmetic and comparison operators are dispatched dynamically through runtime helpers:
- Arithmetic: + (rugo_add), - (rugo_sub), * (rugo_mul), / (rugo_div), % (rugo_mod)
- Comparison: ==, !=, <, >, <=, >= (all via rugo_compare)
- Logical: &&, || (short-circuit, return values like Ruby, not booleans)
- Unary: - (rugo_negate), ! (rugo_not)
The + operator supports string concatenation: when the left operand is a string, the right operand is automatically coerced to string. The one exception is nil, which raises an error rather than coercing to "nil" (use string interpolation #{x} if you want a possibly-nil value to render as "nil").
Logical operator semantics (Ruby-like):
- a || b: returns a if a is truthy, otherwise returns b
- a && b: returns a if a is falsy, otherwise returns b
This enables the common default-value idiom:
name = input || "default"
config = load_config() || {}

Gotcha: Don't use &&/|| for flow control with void-returning functions like puts. Since puts returns nil, the || branch always fires:
# BAD: prints both "yes" AND "no" when x is truthy
x && puts("yes") || puts("no")
# GOOD: use if/else instead
if x
puts "yes"
else
puts "no"
end

Comparison semantics:
- Equality (==, !=): Numeric coercion applies, so 1 == 1.0 is true. Non-numeric types use strict equality.
- Ordering (<, >, <=, >=): Supports both numeric and string operands. Strings are compared lexicographically. Comparing incompatible types (e.g., string vs int) panics.
Variables are implicitly declared on first assignment. The codegen tracks declared variables per scope and emits := for first assignment and = for subsequent ones. There are no explicit type annotations.
x = 42 # declares x
x = x + 1 # reassigns x

Compound assignment operators (+=, -=, *=, /=, %=) are preprocessor sugar:
x += 1 # desugared to: x = x + 1
arr[0] += 5 # desugared to: arr[0] = arr[0] + 5

Bare append is also preprocessor sugar; the assignment is implicit:
append fruits, "date" # desugared to: fruits = append(fruits, "date")

Array destructuring unpacks an array into multiple variables:
a, b, c = [10, 20, 30] # desugared to: __destr__ = [10, 20, 30]; a = __destr__[0]; ...

This is preprocessor sugar. The right-hand side must be a single expression returning an array. Works with Go bridge multi-return functions:
import "strings"
before, after, found = strings.cut("key=value", "=")

Identifiers starting with an uppercase letter are constants (Ruby convention). They can be assigned once but never reassigned; attempting to do so is a compile-time error.
PI = 3.14 # constant (uppercase)
MAX_RETRIES = 5 # constant
name = "mutable" # variable (lowercase), can be reassigned
PI = 99 # compile error: cannot reassign constant PI

Constants are scoped: a constant defined inside a function is independent from one with the same name in another function or at the top level.
MAX = 100 # top-level constant
def limit()
MAX = 50 # separate constant, local to this function
return MAX
end

Hash and array bindings declared as constants protect the binding (you can't point the name at a different value) but their contents can still be mutated:
Config = {"host" => "localhost"}
Config["port"] = 8080 # OK: mutates contents, not the binding
Config = {} # compile error: reassigns the binding

Different blocks create different scoping boundaries:
| Block | Own scope? | Sees outer vars? | Vars leak out? |
|---|---|---|---|
| Top-level | Yes (root) | n/a | n/a |
| def function | Yes | Yes (read-only) | No |
| fn lambda | Yes | Yes (captures outer) | No |
| if/elsif/else | No (transparent) | Yes | Yes |
| case/of (statement) | No (transparent) | Yes | Yes |
| case/of (expression) | Yes (IIFE) | Yes | No |
| while loop | Yes | Yes (read + modify) | No |
| for..in loop | Yes | Yes (read + modify) | No |
| spawn block | Yes | Yes (shared) | No |
| rats block | Yes | No (isolated) | No |
Functions can read top-level variables but assigning inside a function creates a local shadow; the top-level value is not modified. Top-level variables referenced by def functions are promoted to package-level declarations so they are accessible. This is a key difference from lambdas, which capture the surrounding scope by reference.
rats blocks are fully isolated: they cannot see any top-level variables or constants. Use environment variables to share state between setup hooks and test blocks.
if blocks are transparent: they share the parent scope. Variables created inside an if block are accessible after the block ends. Statement-form case/of blocks have the same transparent scoping. However, when case is used as an expression (assigned to a variable), it compiles to an IIFE with its own scope, so variables assigned inside branches do not leak out.
Loops create their own scope: while and for loops can read and modify outer variables, but variables first assigned inside the loop body are local to that iteration scope. The for loop variable is also local.
Lambdas capture outer scope: they can read and modify variables from the enclosing scope. Variables assigned inside the lambda don't leak out.
Control flow uses Ruby-style end-delimited blocks:
if condition
# body
elsif other_condition
# body
else
# body
end
while condition
# body
end
for item in collection
# body: item is value for arrays, key for hashes
end
for key, value in hash
# body
end
for index, value in array
# body
end
# Integer ranges
for i in 10 # i = 0, 1, ..., 9
end
for i in range(5, 10) # i = 5, 6, ..., 9
end

break and next are supported inside loops, compiling directly to Go break and continue.
A statement can be conditionally executed using postfix if (Ruby-style statement modifier):
puts "big" if x > 10
x = 42 if ready
greet "world" if name != nil

This is preprocessor sugar: STMT if COND is rewritten to if COND\n STMT\nend. It only applies when if appears mid-line (not at the start), outside strings and brackets.
The case/of/elsif/else/end construct provides multi-branch matching against a subject expression (similar to switch in other languages, or case in Ruby and Nim):
case status
of "ok"
puts "all good"
of "error", "fail"
puts "something went wrong"
else
puts "unknown"
end

Semantics:
- The subject expression is evaluated once into a temporary variable.
- Each of branch compares the temp using ==. Multiple comma-separated values are OR'd together.
- Optional elsif branches provide boolean conditions (not compared to the subject).
- else is a catch-all default.
- No match and no else evaluates to nil.
- No fallthrough: the first matching branch wins.
- of branches must come before elsif; else must be last.

Arrow form: for single-expression branches, use ->:
case status
of "ok" -> "all good"
of "error", "fail" -> "something went wrong"
else -> "unknown"
end

Arrow form takes a single expression (not an assignment). Both forms can be mixed:
case code
of 200 -> "success"
of 404
log("not found")
"not found"
else -> "other"
end

Case as expression: case can be used anywhere an expression is expected, including assignment position and function arguments. Each branch's last expression becomes the result:
# Assignment position
label = case status
of "ok" -> "success"
of "error" -> "failure"
else -> "unknown"
end
# Multi-line branches work too
message = case code
of 200
puts("ok")
"all good"
of 404
puts("missing")
"not found"
else -> "other"
end

When used as an expression (e.g., assigned to a variable), case compiles to a Go IIFE (immediately-invoked function expression) with a named return variable. Variables assigned inside expression branches are local to the IIFE and do not leak to the parent scope, unlike statement-form case, which has transparent scoping.
Implicit return: inside functions, a trailing case expression is implicitly returned:
def grade(letter)
case letter
of "A" -> "excellent"
of "B" -> "good"
of "C" -> "average"
else -> "unknown"
end
end

Elsif integration: boolean conditions can follow of branches for Nim-style flexibility:
case score
of 100 -> "perfect"
of 0 -> "zero"
elsif score >= 90
"A"
elsif score >= 80
"B"
else
"C"
end

Scoping: statement-form case blocks are transparent, like if. Variables assigned inside branches leak to the parent scope. Expression-form case (assigned to a variable) uses an IIFE, so branch variables are local.
Codegen note: Statement-form case compiles to a Go if/else chain (not a Go switch). The subject is stored in a temp variable (__case_N). Each of becomes rugo_to_bool(rugo_eq(__case_N, value)) conditions OR'd together. Expression-form case compiles to a Go IIFE with a named return (r interface{}); each branch assigns its result to r.
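For orientation, the expression-form lowering might produce Go shaped roughly like the sketch below. The helper bodies (`rugoEq`, `rugoToBool`) are simplified stand-ins for the runtime functions, and `labelFor` is a hypothetical wrapper added so the example can be run; the actual emitted code may differ in detail.

```go
package main

import "fmt"

// Simplified stand-ins for the runtime helpers (no numeric coercion here).
func rugoEq(a, b interface{}) interface{} { return a == b }

func rugoToBool(v interface{}) bool {
	b, isBool := v.(bool)
	return v != nil && (!isBool || b)
}

// labelFor shows a possible lowering of:
//
//	label = case status
//	of "ok" -> "success"
//	of "error", "fail" -> "failure"
//	else -> "unknown"
//	end
func labelFor(status interface{}) interface{} {
	label := func() (r interface{}) { // IIFE with a named return
		__case_1 := status // subject evaluated once
		if rugoToBool(rugoEq(__case_1, "ok")) {
			r = "success"
		} else if rugoToBool(rugoEq(__case_1, "error")) || rugoToBool(rugoEq(__case_1, "fail")) {
			r = "failure"
		} else {
			r = "unknown"
		}
		return
	}()
	return label
}

func main() {
	fmt.Println(labelFor("ok"), labelFor("fail"), labelFor(42))
	// prints: success failure unknown
}
```

Any variable assigned inside a branch would live inside the anonymous function, which is why expression-form branches do not leak bindings.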
Functions are defined with def/end and always return interface{} in the generated Go. The last expression in a function body is implicitly returned (like lambdas). Use explicit return for early exits:
def greet(name)
puts "Hello, #{name}!"
end
def add(a, b)
a + b
end
def classify(x)
if x > 10
return "big"
end
"small"
end

For functions with no parameters, the parentheses are optional:
def say_hello
puts "Hello!"
end

Parameters can have default values using = expr syntax. Parameters with defaults must come after all required parameters. When a caller omits trailing arguments, the defaults are evaluated at call time:
def greet(name, greeting = "Hello")
puts "#{greeting}, #{name}!"
end
greet("Alice") # Hello, Alice!
greet("Alice", "Hey") # Hey, Alice!

Multiple defaults are allowed, and any expression (including nil, booleans, arithmetic) can be used as a default:
def connect(host, port = 8080, tls = true)
# port defaults to 8080, tls defaults to true
end
connect("example.com") # port=8080, tls=true
connect("example.com", 443) # port=443, tls=true
connect("example.com", 443, false) # port=443, tls=false

A function with all-optional parameters can be called with zero arguments:
def label(text = "default", color = nil)
# ...
end
label() # both default
label("hello") # color defaults to nil
label("hello", "red") # no defaults used

Codegen note: Functions with default parameters compile to a variadic Go signature (_args ...interface{}). A preamble unpacks arguments and fills defaults for any omitted parameters. Functions without defaults are unchanged. Arity is checked as a range: min_required..max_total. A required parameter placed after a default parameter is a compile error.
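A possible shape for that preamble, using def greet(name, greeting = "Hello") as the subject, is sketched below. The function name `rugofnGreet`, the runtime arity panic, and returning the string instead of printing it are all assumptions made to keep the example self-contained; they are not taken from the generated code.

```go
package main

import "fmt"

// rugofnGreet sketches a variadic lowering of:
//
//	def greet(name, greeting = "Hello")
func rugofnGreet(_args ...interface{}) interface{} {
	// Arity range: 1 required parameter, 2 parameters in total.
	if len(_args) < 1 || len(_args) > 2 {
		panic(fmt.Sprintf("greet: expected 1..2 arguments, got %d", len(_args)))
	}
	name := _args[0]
	var greeting interface{}
	if len(_args) > 1 {
		greeting = _args[1]
	} else {
		greeting = "Hello" // default expression evaluated at call time
	}
	return fmt.Sprintf("%v, %v!", greeting, name)
}

func main() {
	fmt.Println(rugofnGreet("Alice"))        // Hello, Alice!
	fmt.Println(rugofnGreet("Alice", "Hey")) // Hey, Alice!
}
```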
Functions are hoisted to the Go package level during codegen. Inside function bodies, all function names are visible (forward references work). At the top level, function names are only recognized after their def line (positional resolution).
Function parameters and return types can carry optional type annotations using the form name : type (note the space before :, which is required so the preprocessor's hash-colon sugar doesn't rewrite the line) and : type after the parameter list:
def add(a : Integer, b : Integer) : Integer
return a + b
end
def greet(name : String) : String
return "hello, " + name
end
# Mix annotated and unannotated freely
def scale(factor : Float, x)
return factor * x
end
# Return-only annotation
def label(x) : String
return "value: " + x
end

Annotations are optional everywhere; adding them is purely additive and never required. Lambdas use the same syntax:
square = fn(n : Integer) : Integer
return n * n
end

The recognised type names mirror exactly what type_of() returns at runtime; annotations and runtime types share one vocabulary:
| Annotation | Meaning |
|---|---|
| Integer | 64-bit integer |
| Float | 64-bit float |
| String | Go string |
| Bool | Go bool |
| Array | Rugo array ([]interface{}) |
| Hash | Rugo hash (map[interface{}]interface{}) |
| Nil | Always nil |
| Any | Explicit dynamic (interface{}) |
Annotations are case-sensitive. Unknown names produce a compile-time error pointing at the offending position. The v0.29.0/v0.29.1 lowercase forms (int, float, …) are no longer accepted; the compiler suggests the canonical capitalised name with a "did you mean Integer?" hint to make migration painless.
Annotations have five effects:
- Compile-time validation. The annotation name must be recognised; misspellings (integer, Boolean) and the legacy lowercase forms (int, bool, …) fail at compile time with a targeted hint.
- Seeded type inference. Infer() plants the annotated types into FuncTypeInfo.ParamTypes / ReturnType before walking the body. The inferrer treats annotated params as ground truth and will not widen them to interface{} if a later assignment is dynamic. Annotated returns are not overwritten by the inferred return type.
- Typed Go signatures. When the annotated type is a primitive (Integer, Float, String, Bool), codegen emits a typed Go signature (func rugofn_add(a int, b int) int) instead of the default func(... interface{}) interface{}. The return path inserts a rugo_to_* coercion if the body produced a dynamic value, so calls into runtime helpers (e.g. math.sqrt) still work without manual casts.
- Body/annotation mismatch detection. After inference, the compiler walks every annotated function body and flags two patterns the inferrer can prove are wrong: reassigning an annotated parameter to a value of a concretely conflicting type (e.g. a = "hello" inside def f(a : Integer)) and returning a value whose inferred type conflicts with the annotated return type. Errors point at the rugo source line with a structured message instead of a Go-level compiler error. Assignment is strict (the generated Go has a concrete variable, no coercion at the reassignment site); returns are permissive in the numeric family and for String/Bool/Any (the codegen inserts coercion on the return path). Pass --no-infer to skip the check.
- Call-site and return-site flow validation. Beyond literal arguments, the compiler also flags variable arguments and variable return values when the inferrer can prove a concrete type conflict at that program point.
  - Literal arguments. When a literal (number, string, bool, nil, array, hash, or -N / !b over a literal) is passed to an annotated parameter with a concretely-conflicting type, the call is rejected (e.g. f("oops") where f is def f(a : Integer)).
  - Variable arguments (Tier 3 flow-sensitive). Each identifier read site carries a flow-sensitive per-use type (TypeInfo.VarUseTypes) that reflects the variable's type at exactly that program point, not the conservative storage union of every value it ever held. Sequential reassignments narrow the per-use type (y = "h"; y = 42; f(y) passes because at the call site y is provably Integer); union outcomes from if without else, loops that may not run, or case without else keep the union and produce a precise error message.
  - Variable returns (Tier 3 flow-sensitive). Symmetrically, return x is checked against the flow-sensitive type of x at the return site, not the storage union. Returning a variable that was reassigned back to a compatible type is permitted even when its history includes incompatible values.
  - fn lambda call sites (Tier 4 flow-sensitive). When an annotated fn lambda is bound to a variable (f = fn(n : Integer) ... end), every call through that variable (f(...)) is checked against the lambda's annotated parameters using the same compatibility rules as a def call. The binding is tracked flow-sensitively: aliasing (g = f) propagates the signature, reassigning to a different annotated lambda uses the new signature, reassigning to a non-fn value clears the binding, and any merge across branches with different bindings drops the binding. Higher-order use (passing a lambda to another function, storing it in an array/hash, returning it from a function) is silent; Tier 4 only fires for direct identifier-named calls of variables that hold an annotated lambda in the current scope.
  - The compatibility rule at call sites and for parameter defaults is strict-with-numeric-carve-out: same-type matches, plus the numeric family (Integer, Float, Bool) flows freely between numeric annotations because codegen inserts rugo_to_int / rugo_to_float wrappers at the call boundary. String, Bool, Array, Hash, and Nil annotations only accept their own type (this matches the strict variable-annotation rule: x : String = 42 and f(x : String); f(42) both error). Any accepts anything. Module-style ns.f(...) calls are skipped. Dynamic or unresolved expressions are silent; annotations stay user assertions where inference cannot decide. Pass --no-infer to disable.
  - Returns remain permissive (numeric family mutually compatible, String/Bool/Any accept anything) because codegen inserts coercion on the return path.
Local variables can also carry type annotations on their first assignment using the same name : type = expr shape:
x : Integer = 42
name : String = "world"
items : Array = [1, 2, 3]

Variable annotations are sticky: once x is bound as Integer, every later assignment to x in the same scope is checked against that annotation. Reassigning to a concretely-conflicting type fails at compile time:
x : Integer = 42
x = "oops" # compile error: cannot assign String value to variable 'x' declared as Integer

Re-annotating the same name in the same scope (x : Integer = ...; x : Integer = ...) is also rejected. The annotation lives until the enclosing function or block returns, so two functions can each declare their own x : T independently.
Use : Any to opt out of the check while keeping the annotation as documentation:
x : Any = 0
x = "h" # allowed
x = [1, 2] # allowed

Coverage is reported by rugo emit --stats:
Params: 13 typed: 10 (76.9%) dynamic: 3 annotated: 9 (69.2%)
Returns: 7 typed: 4 (57.1%) dynamic: 3 annotated: 5 (71.4%)
The typed column reflects what inference resolved (with or without annotations); annotated counts only positions where the user wrote an explicit annotation.
Limitations and caveats:
- Local variable annotations apply only to first assignment (x : T = expr). There is no syntax for re-annotating, nor for annotating index assignments (a[i] : T = ...) or dot assignments (o.f : T = ...).
- Functions with default parameter values compile to a variadic shape; on such functions the annotations act as documentation only, since the runtime signature is dynamic.
- Array, Hash, Nil, Any are accepted but do not produce typed Go signatures (the corresponding runtime shapes are already interface{}-typed).
- Mismatch detection only fires when the inferrer can prove a conflict (the value's type is concrete and not in the compatibility set). Dynamic / unknown values are silent; annotations stay user assertions where inference cannot decide.
- A space is required before : (x : Integer, not x:Integer) because the preprocessor would otherwise interpret x: as the start of a hash literal.
Rugo supports anonymous functions (lambdas) using fn(params) body end syntax. Lambdas are first-class values: they can be stored in variables, passed as arguments, returned from functions, and stored in data structures.
# Basic lambda
double = fn(x) x * 2 end
puts double(5) # 10
# Multi-line lambda
classify = fn(x)
if x > 0
return "positive"
end
"non-positive"
end
# Pass lambda to function
def my_map(f, arr)
result = []
for item in arr
result = append(result, f(item))
end
return result
end
my_map(fn(x) x * 2 end, [1, 2, 3])
# Return lambda from function (closure)
def make_adder(n)
return fn(x) x + n end
end
add5 = make_adder(5)
puts add5(10) # 15
# Lambdas in data structures
ops = {"add" => fn(a, b) a + b end}
puts ops["add"](2, 3) # 5

Lambdas compile to Go variadic anonymous functions: func(_args ...interface{}) interface{} { ... }. Parameters are unpacked from the variadic args. The last expression in a lambda body is implicitly returned. Closures capture variables by reference, so mutations to captured variables are visible outside the lambda.
Lambdas also support default parameter values, with the same semantics as def functions:
transform = fn(x, factor = 2) x * factor end
puts transform(5) # 10
puts transform(5, 3) # 15

When a variable holding a lambda is called, the codegen emits a runtime type assertion: variable.(func(...interface{}) interface{})(args...). Calling a non-function variable produces a friendly compile error: cannot call x - not a function.
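The assert-then-call pattern is easy to show in plain Go. The sketch below uses the comma-ok form of the assertion so the failure case can be reported as an error; the helper `callValue` and the alias `rugoFn` are hypothetical names, and the generated code uses the direct assertion shown above rather than this helper.

```go
package main

import "fmt"

// rugoFn is the Go shape the document gives for compiled lambdas.
type rugoFn = func(...interface{}) interface{}

// callValue asserts that v holds a lambda and invokes it with args.
func callValue(v interface{}, args ...interface{}) (interface{}, error) {
	fn, ok := v.(rugoFn)
	if !ok {
		return nil, fmt.Errorf("cannot call value of type %T: not a function", v)
	}
	return fn(args...), nil
}

func main() {
	// double = fn(x) x * 2 end, with the parameter unpacked from _args.
	var double interface{} = func(_args ...interface{}) interface{} {
		x := _args[0].(int)
		return x * 2
	}
	fmt.Println(callValue(double, 5)) // 10 <nil>
	_, err := callValue(42)
	fmt.Println(err) // cannot call value of type int: not a function
}
```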
Lambdas stored as hash values can be called via dot access, just like index access:
ops = {
add: fn(a, b) a + b end,
mul: fn(a, b) a * b end
}
puts ops["add"](2, 3) # 5 (index access)
puts ops.add(2, 3) # 5 (dot access)

At runtime, rugo_dot_call looks up the key in the hash, type-asserts the value to a callable lambda, and invokes it. If the key doesn't exist or the value isn't a function, a friendly error is produced.
The do...end syntax provides a concise way to pass a no-argument lambda as the last argument to a function call. It is preprocessor sugar:
# These are equivalent:
vbox(fn()
label("Hello")
end)
vbox do
label("Hello")
end

The preprocessor rewrites CALL do BODY end to CALL(fn() BODY end) (or appends fn() as the last argument if the call already has arguments):
# Bare call
vbox do ... end # → vbox(fn() ... end)
# Call with existing args
button("Click") do ... end # → button("Click", fn() ... end)
# Paren-free with args
styled "bold" do ... end # → styled("bold", fn() ... end)
# Assignment
result = make do ... end # → result = make(fn() ... end)

Nesting works naturally: each end matches its closest do:
outer do
inner("hello") do
puts "deep"
end
end

Key rules:
- do must appear at the end of a line, separated from the preceding expression by whitespace.
- do inside strings (e.g., "I do this") is not affected.
- do...end blocks always create a parameterless fn(). For lambdas that need parameters, use fn(params) ... end directly.
- do is a reserved keyword: it cannot be used as a variable or function name.
Rugo provides three levels of error handling via try/or:
# Level 1: Silent recovery (returns nil on failure)
result = try some_expression
# Level 2: Default value on failure
result = try some_expression or "default"
# Level 3: Handler block with error variable
result = try some_expression or err
puts "caught: " + err
"fallback"
end

Under the hood, try compiles to a Go IIFE (immediately invoked function expression) with defer/recover. The error is caught by Go's panic/recover mechanism, and the error message is made available as a string in the handler block.
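The defer/recover mechanism can be sketched as follows. For readability the example factors the pattern into a helper, `tryOr`, instead of an inline IIFE; that helper, `mustAtoi`, and the handler signature are illustrative assumptions rather than the emitted code.

```go
package main

import (
	"fmt"
	"strconv"
)

// mustAtoi panics on bad input, standing in for a failing Rugo expression.
func mustAtoi(s string) interface{} {
	n, err := strconv.Atoi(s)
	if err != nil {
		panic(err)
	}
	return n
}

// tryOr runs body; if it panics, the panic value is rendered as a string
// and passed to handler, whose result becomes the value of the try.
func tryOr(body func() interface{}, handler func(err string) interface{}) (r interface{}) {
	defer func() {
		if e := recover(); e != nil {
			r = handler(fmt.Sprint(e))
		}
	}()
	return body()
}

func main() {
	ok := tryOr(
		func() interface{} { return mustAtoi("42") },
		func(string) interface{} { return "fallback" },
	)
	bad := tryOr(
		func() interface{} { return mustAtoi("abc") },
		func(err string) interface{} { return "caught: " + err },
	)
	fmt.Println(ok)  // 42
	fmt.Println(bad) // caught: strconv.Atoi: parsing "abc": invalid syntax
}
```

The named return value is what lets the deferred function replace the result after a panic has unwound the body.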
One of Rugo's distinctive features is shell fallback: unknown identifiers at the top level are treated as shell commands rather than producing compile errors.
ls -la # runs as: sh -c "ls -la"
echo "hello" # runs as: sh -c "echo hello"
uname -a # runs as: sh -c "uname -a"

The preprocessor rewrites these to __shell__("...") calls, which the codegen translates to exec.Command("sh", "-c", ...). Shell commands inherit stdin/stdout/stderr from the parent process. Non-zero exit codes cause a panic with rugoShellError.
Backtick expressions capture command output instead of printing it:
name = `whoami` # captures output, strips trailing newline

These are rewritten to __capture__("...") calls. String interpolation works inside backticks:
name = "world"
greeting = `echo hello #{name}` # captures "hello world"

The pipe operator | connects expressions left-to-right, passing the output of the left side to the right side:
- Shell command on left → stdout is captured (like backticks)
- Function/expression on left → return value is used
- Function on right → piped value becomes the first argument
- Shell command on right → piped value is fed to stdin

# Shell output → function
echo "hello world" | puts # puts receives "hello world"
# Chaining: shell → module → builtin
echo "hello" | str.upper | puts # prints "HELLO"
# Expression → function
len("hello") | puts # prints 5
# Value → shell stdin → function
"hello" | tr a-z A-Z | puts # prints "HELLO"
# Assignment with pipe
name = echo "rugo" | str.upper # name = "RUGO"
# Piped value prepended before existing args
echo "world" | puts "hello" # prints "world hello"

Key rules:
- When all segments are shell commands (e.g. ls | grep foo), the line is left as a native shell pipe, which keeps it backward compatible.
- Only when at least one segment is a Rugo construct (builtin, user function, module function, or expression) does pipe expansion activate.
- The || logical OR operator is never confused with the pipe |.
- Pipes inside strings ("a | b") are not expanded.
- The pipe passes return values, not stdout output. puts and print return nil, so using them as a non-final segment in a pipe chain is a compile-time error:
ls | puts | head # ✗ compile error: puts returns nil, breaks the chain
ls | head | puts # ✓ puts at the end, receives head's captured output

The preprocessor rewrites pipe expressions before parsing. For example, echo "hello" | str.upper | puts becomes puts(str.upper(__capture__("echo \"hello\""))).
String interpolation uses #{expr} syntax inside double-quoted strings:
name = "World"
puts "Hello, #{name}!"
puts "1 + 2 = #{1 + 2}"

The preprocessor handles the #{...} extraction, and the codegen compiles interpolated strings to fmt.Sprintf calls. Interpolated expressions are fully parsed through the Rugo parser to support arbitrary expressions.
Limitation: Nested double quotes inside interpolation are not supported. Use a variable instead:
# This will NOT work:
# puts "#{h["foo"]}"
# Use a variable instead:
x = h["foo"]
puts "#{x}"

Single-quoted strings are raw literals where no escape processing or interpolation happens (like Ruby's single-quoted strings):
puts 'hello\nworld' # prints: hello\nworld (literal backslash-n)
puts '\x1b[32mgreen' # prints: \x1b[32mgreen (no ANSI processing)
puts 'no #{interpolation}' # prints: no #{interpolation} (no interpolation)

Only two escape sequences are recognized in raw strings: \\ (literal backslash) and \' (literal single quote). All other backslash sequences are kept as-is.
Raw strings are parsed by a separate raw_str_lit lexer rule in the grammar and produce StringLiteral nodes with Raw: true. The codegen emits these strings directly to Go string literals with appropriate escaping, bypassing the interpolation pipeline.
The preprocessor (ast/preprocess.go) runs before parsing and performs line-level source transformations. It operates in multiple passes:
Desugars +=, -=, *=, /=, %= for both simple variables and index targets:
x += 1 → x = x + 1
arr[0] -= 3 → arr[0] = arr[0] - 3
Desugars bare append statements into explicit assignments. Only applies when
append( starts the line and the first argument is a valid assignment target:
append(arr, val) → arr = append(arr, val)
This pass runs after paren-free call expansion, so append arr, val is first
converted to append(arr, val), then desugared to arr = append(arr, val).
Converts backtick expressions to capture calls:
`hostname` → __capture__("hostname")
Expands single-line try forms into multi-line block form that the parser understands:
# try EXPR or DEFAULT expands to:
try
EXPR
or _err
DEFAULT
end
# try EXPR (no or) expands to:
try
EXPR
or _err
nil
end
This expansion also tracks a line map so error messages reference the original source line.
Each line is classified and transformed:
- Pipe expansion: lines with top-level | (not ||) are split into segments. If at least one segment is a Rugo construct (function/builtin/dotted ident/expression), the pipe is expanded into nested calls. All-shell pipes are left for the shell to handle natively.
- Keywords (if, def, while, etc.): left untouched.
- Assignments (x = ...): left untouched.
- Parenthesized calls (func(...)): left untouched.
- Known function, paren-free (puts "hi"): rewritten to puts("hi").
- Unknown identifier: rewritten to shell fallback: __shell__("...").
Function name resolution is positional at the top level: a def must appear before its paren-free usage. Inside function bodies, all function names are visible (allowing forward references).
The preprocessor produces a line map that tracks the correspondence between preprocessed line numbers and original source line numbers. This is threaded through the walker and codegen so that //line directives and error messages reference the correct .rugo source location.
The parser is generated from an LL(1) grammar defined in parser/rugo.ebnf using the egg parser generator tool:
egg -o parser.go -package parser -start Program -type Parser -constprefix Rugo rugo.ebnf
Important: parser/parser.go is generated code and must never be hand-edited. All grammar changes go through rugo.ebnf.
The grammar defines a standard expression language with precedence levels:
Program = { Statement }
Statement = UseStmt | ImportStmt | RequireStmt | SandboxStmt | FuncDef | TestDef
| IfStmt | WhileStmt | ForStmt
| BreakStmt | NextStmt | ReturnStmt
| AssignOrExpr
Expr = OrExpr
OrExpr = AndExpr { "||" AndExpr }
AndExpr = CompExpr { "&&" CompExpr }
CompExpr = AddExpr [ comp_op AddExpr ]
AddExpr = MulExpr { ('+' | '-') MulExpr }
MulExpr = UnaryExpr { ('*' | '/' | '%') UnaryExpr }
UnaryExpr = '!' Postfix | '-' Postfix | Postfix
Postfix = Primary { Suffix }
Suffix = '(' [ ArgList ] ')' | '[' Expr [ ',' Expr ] ']' | '.' ident
Primary = ... | CaseExpr | ...
CaseExpr lives in Primary rather than Statement to avoid an LL(1) conflict: both assignment and standalone case start with the "case" token. Standalone case (not assigned to a variable) flows through AssignOrExpr → Expr → Primary → CaseExpr, and the walker converts it to a CaseStmt for efficient codegen (no IIFE overhead).
Operator precedence (lowest to highest):
| Level | Operators |
|---|---|
| 1 | \|\| |
| 2 | && |
| 3 | == != < > <= >= |
| 4 | + - |
| 5 | * / % |
| 6 | ! (unary) - (unary) |
| 7 | () [] . (postfix) |
The parser produces a flat []int32 array encoding the parse tree. Non-terminal nodes are encoded as (-symbol, childCount, children...) and terminal tokens as positive indices into the token stream. This compact representation is then walked by the AST walker.
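To make the encoding concrete, here is a small decoder for an array of that shape. It is a sketch under stated assumptions: `childCount` is taken to mean the number of direct children, symbol IDs are assumed to start at 1 (so every non-terminal header is negative), and the `node` type and `decode` function are hypothetical rather than the walker's real API.

```go
package main

import "fmt"

// node is a hypothetical tree node rebuilt from the flat encoding.
type node struct {
	symbol   int32   // non-terminal symbol ID; 0 for terminals
	token    int32   // token-stream index; meaningful only for terminals
	children []*node // direct children of a non-terminal
}

// decode reads one node starting at pos and returns it together with the
// position just past it.
func decode(flat []int32, pos int) (*node, int) {
	v := flat[pos]
	if v >= 0 {
		// Terminal: the value is an index into the token stream.
		return &node{token: v}, pos + 1
	}
	// Non-terminal: (-symbol, childCount, children...).
	n := &node{symbol: -v}
	count := int(flat[pos+1])
	pos += 2
	for i := 0; i < count; i++ {
		var child *node
		child, pos = decode(flat, pos)
		n.children = append(n.children, child)
	}
	return n, pos
}

func main() {
	// Symbol 1 with two children: token 5 and symbol 2, which wraps token 6.
	flat := []int32{-1, 2, 5, -2, 1, 6}
	root, next := decode(flat, 0)
	fmt.Println(root.symbol, len(root.children), next)
}
```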
The typed AST is defined in ast/nodes.go. It uses Go interfaces with marker methods for type safety:
Node (interface)
├── Statement (interface)
│   ├── Program - root node, contains []Statement
│   ├── UseStmt - use "module" (Rugo stdlib)
│   ├── ImportStmt - import "go/pkg" [as alias] (Go bridge)
│   ├── RequireStmt - require "path" [as alias | with mod1, mod2, ...]
│   ├── SandboxStmt - sandbox [ro: [...], rw: [...], env: [...], ...] (Landlock + env)
│   ├── FuncDef - def name(params) body end
│   ├── TestDef - rats "name" body end
│   ├── IfStmt - if/elsif/else/end
│   ├── CaseStmt - case/of/elsif/else/end (contains []OfClause)
│   ├── WhileStmt - while cond body end
│   ├── ForStmt - for var [, var2] in expr body end
│   ├── BreakStmt - break
│   ├── NextStmt - next
│   ├── ReturnStmt - return [expr]
│   ├── ExprStmt - expression as statement
│   ├── AssignStmt - target = value
│   ├── IndexAssignStmt - obj[index] = value
│   │
│   │   (produced by transforms, not in the parse tree)
│   ├── ImplicitReturnStmt - last expr converted to return (from ImplicitReturnLowering)
│   ├── TryResultStmt - last expr in try handler (from ImplicitReturnLowering)
│   ├── SpawnReturnStmt - return inside spawn body (from ConcurrencyLowering)
│   └── TryHandlerReturnStmt - return inside try handler (from ConcurrencyLowering)
│
└── Expr (interface)
    ├── BinaryExpr - left op right
    ├── UnaryExpr - op operand
    ├── CallExpr - func(args...)
    ├── IndexExpr - obj[index]
    ├── SliceExpr - obj[start, length]
    ├── DotExpr - obj.field
    ├── IdentExpr - variable/function reference
    ├── IntLiteral - integer
    ├── FloatLiteral - float
    ├── StringLiteral - string (Raw: true for single-quoted)
    ├── BoolLiteral - true/false
    ├── NilLiteral - nil
    ├── ArrayLiteral - [elem, ...]
    ├── HashLiteral - {key: value, ...} or {expr => value, ...}
    ├── TryExpr - try expr or err handler end
    ├── SpawnExpr - spawn body end
    ├── ParallelExpr - parallel body end
    ├── FnExpr - fn(params) body end (lambda)
    ├── CaseExpr - case/of/elsif/else/end as expression (IIFE codegen)
    │
    │   (produced by ConcurrencyLowering; these replace their non-lowered counterparts)
    ├── LoweredTryExpr - try with extracted result expr and handler body
    ├── LoweredSpawnExpr - spawn with extracted result expr
    └── LoweredParallelExpr - parallel with pre-categorized branches (ParallelBranch)
Every statement node embeds BaseStmt, which carries a SourceLine field mapping back to the original .rugo source. This is populated by the walker using the line map from the preprocessor.
The Factory (ast/factory.go) centralizes AST node creation for transform passes, providing copy-on-write helpers like ProgramFrom, FuncDefWithBody, and IfStmtWithBranches to ensure consistent construction without mutating the original tree.
The walker (ast/walker.go) transforms the parser's flat []int32 encoding into the typed AST. It reads the flat array sequentially, matching non-terminal symbols to construct the appropriate node types. The walker also applies the preprocessor's line map to set accurate source line numbers on each statement.
The code generator (compiler/codegen.go) traverses the typed AST and produces Go source via a two-stage process: first building a Go AST (compiler/goast.go), then serializing it to source (compiler/goprint.go).
Before codegen begins, the Compile() function runs semantic checks (UndefinedIdentCheck), and the generate() function runs the transform chain (ConcurrencyLowering + ImplicitReturnLowering) and type inference. The codegen is split across several files:
| File | Responsibility |
|---|---|
| codegen.go | Orchestration, codeGen struct, generate() entry point |
| check_idents.go | Semantic check: undefined variable and function detection |
| codegen_expr.go | Expression compilation: exprString() converts Rugo expressions to Go source strings |
| codegen_stmt.go | Statement compilation: buildStmt() converts statements to GoStmt nodes |
| codegen_func.go | Function and lambda codegen, including closure variable capture |
| codegen_scope.go | Variable scope tracking and management |
| codegen_runtime.go | Runtime helper injection: sandbox, spawn/parallel templates, Go bridge stubs |
| codegen_build.go | Test and benchmark harness generation |
Rather than emitting raw strings, codegen builds a GoFile tree (compiler/goast.go) composed of GoDecl, GoStmt, and GoExpr nodes. The GoFile contains the package name, imports, top-level declarations (functions, variables, runtime code), and the main() body. The printer (compiler/goprint.go, PrintGoFile()) then serializes this tree to properly formatted Go source with correct indentation. A GoRawDecl escape hatch allows injecting pre-formatted code for runtime templates and complex generated blocks.
The generated file includes:
- Imports: standard library imports plus any module-specific Go imports.
- Runtime helpers: type conversion, arithmetic, comparison, shell execution, iteration, and panic handling functions.
- Module runtimes: Go struct and method implementations for imported stdlib modules, plus auto-generated wrapper functions.
- User functions: each def compiles to a Go function with signature func rugofn_NAME(params ...interface{}) interface{}.
- Main function: top-level statements wrapped in func main() with a defer/recover for panic handling.
Variable scoping: The codegen maintains a scope stack. First assignment in a scope uses :=, subsequent assignments use =. Every assigned variable gets a _ = varname line to suppress Go's "declared but not used" errors.
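The `:=` versus `=` decision can be sketched with a stack of name sets. The `scopeStack` type below is a hypothetical stand-in for the tracking in codegen_scope.go, and it assumes that a name declared in an enclosing scope is visible to inner scopes, which the text above does not spell out.

```go
package main

import "fmt"

// scopeStack is a hypothetical scope tracker: one set of declared names
// per nested scope, innermost last.
type scopeStack struct {
	frames []map[string]bool
}

func (s *scopeStack) push() { s.frames = append(s.frames, map[string]bool{}) }
func (s *scopeStack) pop()  { s.frames = s.frames[:len(s.frames)-1] }

// declared reports whether name is known in any scope on the stack.
func (s *scopeStack) declared(name string) bool {
	for i := len(s.frames) - 1; i >= 0; i-- {
		if s.frames[i][name] {
			return true
		}
	}
	return false
}

// assign returns Go source for assigning value to name. A first assignment
// declares the variable with := and adds a blank use to silence Go's
// "declared but not used" error; later assignments use plain =.
func (s *scopeStack) assign(name, value string) string {
	if s.declared(name) {
		return fmt.Sprintf("%s = %s", name, value)
	}
	s.frames[len(s.frames)-1][name] = true
	return fmt.Sprintf("%s := %s\n_ = %s", name, value, name)
}

func main() {
	s := &scopeStack{}
	s.push()
	fmt.Println(s.assign("x", "1"))
	fmt.Println(s.assign("x", "2"))
}
```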
for..in loops: The single-variable form (for x in coll) uses rugo_iterable_default() which returns values for arrays and keys for hashes (Python-style). The two-variable form (for k, v in coll) uses rugo_iterable() which returns []rugo_kv (key-value pairs) for uniform array/hash iteration. Arrays produce {index, value} pairs; hashes produce {key, value} pairs. Integer collections iterate from 0 to N-1. The range(start, end) builtin generates efficient Go for loops when used in for-loop collections (no slice allocation); outside for-loops it returns an array.
Index assignment: arr[0] = x and hash["key"] = y compile to rugo_index_set(obj, idx, val), which type-switches on the target. Negative indices are supported for arrays (e.g., arr[-1] = x sets the last element).
Negative array indexing: Array access supports negative indices (Ruby behavior). arr[-1] returns the last element, arr[-2] the second-to-last, etc. This is handled by the rugo_array_index runtime helper, which normalizes negative indices by adding len(arr).
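A minimal sketch of that normalization follows. The `arrayIndex` function is illustrative rather than the actual rugo_array_index helper, and the panic on an out-of-range index is an assumption about error handling.

```go
package main

import "fmt"

// arrayIndex returns arr[idx], treating a negative idx as an offset from
// the end of the array (-1 is the last element).
func arrayIndex(arr []interface{}, idx int) interface{} {
	if idx < 0 {
		idx += len(arr) // normalize: -1 becomes len(arr)-1
	}
	if idx < 0 || idx >= len(arr) {
		// Assumed behavior: report the problem instead of wrapping around.
		panic(fmt.Sprintf("index %d out of bounds (length %d)", idx, len(arr)))
	}
	return arr[idx]
}

func main() {
	arr := []interface{}{"a", "b", "c"}
	fmt.Println(arrayIndex(arr, -1), arrayIndex(arr, -2), arrayIndex(arr, 0))
}
```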
Slicing: obj[start, length] compiles to rugo_slice(obj, start, length), which supports both arrays and strings. For arrays it returns a new array; for strings it returns a substring. Out-of-bounds indices are clamped silently (Ruby behavior) rather than panicking. Slicing unsupported types (int, bool, hash, etc.) produces a developer-friendly error like cannot slice hash (expected string or array).
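The clamping behavior might look like the sketch below. `sliceValue` and `clampRange` are hypothetical names; treating a negative start as 0 and slicing strings by character (rune) are assumptions, and the error message uses Go's type name rather than Rugo's.

```go
package main

import "fmt"

// clampRange converts (start, length) into a half-open [lo, hi) range that
// is guaranteed to lie within 0..n.
func clampRange(start, length, n int) (int, int) {
	if start < 0 {
		start = 0
	}
	if start > n {
		start = n
	}
	if length < 0 {
		length = 0
	}
	end := start + length
	if end > n {
		end = n
	}
	return start, end
}

// sliceValue slices a string or an array; other types are rejected.
func sliceValue(obj interface{}, start, length int) interface{} {
	switch v := obj.(type) {
	case string:
		runes := []rune(v)
		lo, hi := clampRange(start, length, len(runes))
		return string(runes[lo:hi])
	case []interface{}:
		lo, hi := clampRange(start, length, len(v))
		out := make([]interface{}, hi-lo) // new array, not a view
		copy(out, v[lo:hi])
		return out
	default:
		panic(fmt.Sprintf("cannot slice %T (expected string or array)", obj))
	}
}

func main() {
	fmt.Println(sliceValue("hello", 1, 3))
	fmt.Println(sliceValue("hello", 3, 10))
	fmt.Println(sliceValue([]interface{}{1, 2, 3}, 1, 5))
}
```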
Argument count validation: User-defined function calls are validated during code generation. If the number of arguments doesn't match the function's parameter count, a Rugo-specific error is emitted (e.g., wrong number of arguments for greet (2 for 1)) instead of exposing internal Go compiler errors.
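A check of this kind is small enough to sketch directly. The `checkArity` function is hypothetical, and reading "(2 for 1)" as "2 given, 1 expected" follows the Ruby convention, which is assumed here.

```go
package main

import "fmt"

// checkArity compares the number of arguments at a call site (got) with
// the parameter count of the function definition (want).
func checkArity(name string, got, want int) error {
	if got != want {
		return fmt.Errorf("wrong number of arguments for %s (%d for %d)", name, got, want)
	}
	return nil
}

func main() {
	// Hypothetical table of user-defined functions and their parameter counts.
	params := map[string]int{"greet": 1}
	fmt.Println(checkArity("greet", 2, params["greet"]))
	fmt.Println(checkArity("greet", 1, params["greet"]))
}
```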
try/or expressions: Compile to a Go IIFE with defer/recover. The tried expression is the return value; if it panics, the recovery handler runs and produces the fallback value.
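The shape of such generated code can be illustrated in plain Go. This is a sketch of the IIFE pattern, not the compiler's actual output; `mustAtoi` is a hypothetical panicking helper standing in for the tried expression, and `0` stands in for the fallback.

```go
package main

import (
	"fmt"
	"strconv"
)

// mustAtoi is a hypothetical helper that panics on invalid input, standing
// in for any Rugo expression that can raise.
func mustAtoi(s string) interface{} {
	n, err := strconv.Atoi(s)
	if err != nil {
		panic(err)
	}
	return n
}

// parseOrZero shows the pattern for something like: try must_atoi(s) or 0
func parseOrZero(s string) interface{} {
	return func() (result interface{}) {
		defer func() {
			if e := recover(); e != nil {
				result = 0 // fallback value replaces the result after a panic
			}
		}()
		return mustAtoi(s) // the tried expression is the normal return value
	}()
}

func main() {
	fmt.Println(parseOrZero("8080"))
	fmt.Println(parseOrZero("not a number"))
}
```

The named result is what lets the deferred function overwrite the return value after the panic has been recovered.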
//line directives: The codegen emits //line file.rugo:N directives before each statement so that Go runtime panics show .rugo source locations instead of generated Go line numbers.
Test harness: When rats blocks are present, the codegen generates a TAP-compliant test runner instead of a regular main(). Each test block becomes a separate function, with optional setup/teardown (per-test) and setup_file/teardown_file (per-file) hooks.
| Rugo construct | Go function name |
|---|---|
| def greet(...) | rugofn_greet(...) |
| ns.func(...) (user module) | rugons_ns_func(...) |
| mod.func(...) (stdlib module) | rugo_mod_func(...) |
| puts(...) | rugo_puts(...) |
| __shell__(...) | rugo_shell(...) |
| __capture__(...) | rugo_capture(...) |
Rugo has three ways to bring in external functionality:
| Keyword | Purpose | Example |
|---|---|---|
| use | Load Rugo stdlib modules | use "http" |
| import | Bridge to Go stdlib packages | import "strings" |
| require | Load user .rugo files or Go modules | require "helpers" |
Modules provide namespaced standard library functionality. Each module self-registers via Go init() using modules.Register().
Prefer use modules for standard operations. They provide a curated, Ruby-inspired API covering math, file paths, encoding, crypto, time, and more. The import keyword gives direct access to Go's stdlib for advanced needs, but use modules are the idiomatic approach.
A module consists of:
- runtime.go: A Go source file with a struct type and methods, tagged with //go:build ignore so it's not compiled directly. It's embedded as a string and emitted into the generated program.
- Registration file: Declares the module name, type, function signatures with typed args, required Go imports, and embeds the runtime source.

At compile time, a module is wired in as follows:
- User writes use "http" in their .rugo script.
- The codegen looks up the module in the registry and collects its Go imports.
- The module's FullRuntime() method generates:
  - The cleaned runtime source (struct + methods)
  - A module instance variable (var _http = &HTTP{})
  - Wrapper functions for each declared function that convert interface{} args to typed parameters
| ArgType | Go type | Runtime converter |
|---|---|---|
| String | string | rugo_to_string |
| Int | int | rugo_to_int |
| Float | float64 | rugo_to_float |
| Bool | bool | rugo_to_bool |
| Any | interface{} | none (passed through) |
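To show how the pieces fit together, here is a sketch of what the emitted code for a module could look like. The `Demo` module and its `Repeat` method are invented for illustration, and the converter bodies are minimal stand-ins: only the names rugo_to_string and rugo_to_int come from the table above.

```go
package main

import (
	"fmt"
	"strings"
)

// Demo is a hypothetical module runtime type (the struct + methods part).
type Demo struct{}

// Repeat is a typed method as a module author might write it.
func (*Demo) Repeat(s string, n int) string { return strings.Repeat(s, n) }

// Module instance variable, following the var _http = &HTTP{} pattern.
var _demo = &Demo{}

// Minimal stand-ins for the runtime converters; the real helpers are
// likely more thorough.
func rugo_to_string(v interface{}) string { return fmt.Sprint(v) }

func rugo_to_int(v interface{}) int {
	switch n := v.(type) {
	case int:
		return n
	case float64:
		return int(n)
	default:
		panic(fmt.Sprintf("cannot convert %T to int", v))
	}
}

// rugo_demo_repeat is the kind of wrapper that could be generated for a
// function declared with (String, Int) arguments: it converts untyped
// arguments and forwards them to the typed method.
func rugo_demo_repeat(args ...interface{}) interface{} {
	return _demo.Repeat(rugo_to_string(args[0]), rugo_to_int(args[1]))
}

func main() {
	fmt.Println(rugo_demo_repeat("ab", 3))
}
```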
The import keyword provides direct access to whitelisted Go standard library packages. The compiler maintains a static registry of bridgeable Go functions and auto-generates type conversions between Rugo's interface{} values and Go's typed parameters.
import "strings"
import "math"
puts strings.contains("hello world", "world") # true
puts math.sqrt(144.0) # 12
Function names use snake_case in Rugo and are auto-converted to Go's PascalCase. Go functions returning (T, error) auto-panic on error, integrating with try/or. The as keyword provides aliasing: import "os" as go_os.
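The name conversion can be sketched in a few lines. `snakeToPascal` is a hypothetical helper that assumes ASCII identifiers; the real bridge may special-case names whose Go spelling is irregular.

```go
package main

import (
	"fmt"
	"strings"
)

// snakeToPascal converts a snake_case identifier to PascalCase by
// capitalizing the first letter of each underscore-separated part.
func snakeToPascal(name string) string {
	var b strings.Builder
	for _, part := range strings.Split(name, "_") {
		if part == "" {
			continue // skip empty parts from leading or doubled underscores
		}
		b.WriteString(strings.ToUpper(part[:1]))
		b.WriteString(part[1:])
	}
	return b.String()
}

func main() {
	for _, name := range []string{"contains", "has_prefix", "to_upper"} {
		fmt.Printf("%s -> %s\n", name, snakeToPascal(name))
	}
}
```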
User modules use require:
require "helpers" # loads helpers.rugo, namespace: helpers
require "lib/utils" as u # loads lib/utils.rugo, namespace: u
require "lib/utils" as "u" # quoted form also accepted
helpers.greet("World")
u.compute(42)
Paths are resolved relative to the calling file. The .rugo extension is added automatically if missing. Requires are resolved recursively and deduplicated. If the path points to a directory, Rugo resolves an entry point: <dirname>.rugo → main.rugo → sole .rugo file (file takes precedence over directory when both exist).
The with clause selectively loads specific .rugo files from a directory (local or remote):
# Local directory
require "mylib" with client, helpers
client.connect()
# Remote repository
require "github.com/user/rugo-utils@v1.0.0" with client, helpers
Each name loads <name>.rugo from the directory or repository root (falling back to lib/<name>.rugo), using the filename as the namespace.
Remote git repositories can also be required as a single module:
require "github.com/user/rugo-utils@v1.0.0" as "utils"
utils.slugify("Hello World")
Remote modules are shallow-cloned and cached in ~/.rugo/modules/. Tagged versions (@v1.0.0) and commit SHAs are cached forever; branch refs (@main) are locked to their resolved SHA on first fetch. Use @latest to automatically resolve to the highest stable semver tag.
Use rugo mod tidy to generate a rugo.lock file that records the exact commit SHA for every remote module, making builds reproducible. Use rugo mod update to re-resolve mutable dependencies, or rugo build --frozen to fail if the lock file is stale.
require also supports Go packages with exported functions. When a required path resolves to a directory containing go.mod and .go files (instead of .rugo files), the compiler introspects the Go source, classifies exported functions, and bridges them automatically, with no manifest or registration needed:
require "path/to/my_go_module"
my_go_module.greet("world")
require "github.com/user/rugo-slug@v1.0.0" as slug
slug.make("Hello World!")
The Go module author writes a standard Go package with exported functions using bridgeable types (string, int, float64, bool, error, []string, []byte). Functions with non-bridgeable signatures (interfaces, channels, generics) are automatically excluded with clear compile-time warnings.
Exported structs with bridgeable field types are also supported: the compiler generates wrapper types so struct values can be created, have fields read/set via dot syntax, and be passed to Go functions:
require "mymod"
c = mymod.config() # zero-value constructor
c.name = "app" # field set
c.port = 8080
c2 = mymod.new_config("x", 3) # Go constructor returning *Config
puts(mymod.describe(c2)) # pass struct to Go function
See Go Modules for full details on struct support.
See External Modules for details on creating Go modules.
There is no implicit search path: the require string tells you exactly where the code comes from. A relative path is local, a URL-shaped path is remote.
The embed keyword embeds file contents into the compiled binary at build time. The file is read during compilation and baked into the executable, so no external files are needed at runtime.
embed "config.yaml" as config
embed "assets/template.html" as template
puts config
puts len(template)
Syntax: embed "path" as name
- path: file path relative to the source file
- name: variable name that holds the file content as a string
Path restriction: Embedded file paths must resolve to the same directory or a subdirectory of the .rugo source file that declares them. This mirrors Go's embed restriction and prevents libraries from accessing files outside their own tree:
embed "data/config.txt" as cfg # OK: subdirectory
embed "sibling.txt" as sib # OK: same directory
embed "../secret.txt" as secret # ERROR: escapes source directory
How it works: The compiler uses Go's //go:embed under the hood. Files are copied into the build directory and linked directly into the binary's data section, which is efficient even for large files.
Note: embed cannot be used with eval.run() because eval.run() compiles from an ephemeral temp directory with no files to embed. Use eval.file() instead when embedding is needed.
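The path restriction on embed described above could be enforced with a check along these lines. `checkEmbedPath` is a hypothetical function, not the compiler's own; it works purely on path strings and does not resolve symlinks.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// checkEmbedPath verifies that embedPath, taken relative to the directory
// of sourceFile, stays inside that directory.
func checkEmbedPath(sourceFile, embedPath string) error {
	if filepath.IsAbs(embedPath) {
		return fmt.Errorf("embed path %q must be relative", embedPath)
	}
	base := filepath.Dir(sourceFile)
	target := filepath.Join(base, embedPath) // Join also cleans ".." segments
	rel, err := filepath.Rel(base, target)
	if err != nil {
		return err
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return fmt.Errorf("embed path %q escapes source directory", embedPath)
	}
	return nil
}

func main() {
	for _, p := range []string{"data/config.txt", "sibling.txt", "../secret.txt"} {
		fmt.Printf("%s: %v\n", p, checkEmbedPath("/proj/main.rugo", p))
	}
}
```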
Functions prefixed with _ are private to their module. The compiler rejects any attempt to call them from outside:
# mylib.rugo
def _helper() # private: only callable within mylib
  return "internal"
end
def greet() # public: callable from anywhere
  return _helper() # OK: same module
end
# main.rugo
require "mylib"
puts mylib.greet() # OK
puts mylib._helper() # compile error: '_helper' is private to module 'mylib'
Functions without the _ prefix are public. This applies to all require forms: plain, as, and with.
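The visibility rule reduces to a prefix test plus a module comparison. The sketch below is illustrative: `checkCall` and its parameters are hypothetical, and only the error text mirrors the message shown in the example.

```go
package main

import (
	"fmt"
	"strings"
)

// checkCall rejects calls to underscore-prefixed functions made from a
// different module than the one that defines them.
func checkCall(callerModule, targetModule, fn string) error {
	if strings.HasPrefix(fn, "_") && callerModule != targetModule {
		return fmt.Errorf("'%s' is private to module '%s'", fn, targetModule)
	}
	return nil
}

func main() {
	fmt.Println(checkCall("main", "mylib", "greet"))
	fmt.Println(checkCall("mylib", "mylib", "_helper"))
	fmt.Println(checkCall("main", "mylib", "_helper"))
}
```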
These functions are always available without any use or import:
| Function | Description |
|---|---|
| puts(args...) | Print args separated by spaces, followed by newline |
| print(args...) | Print args separated by spaces, no trailing newline |
| len(v) | Length of string (character count), array, or hash |
| append(arr, val) | Append value to array, returns new array. Can be used as a bare statement: append arr, val |
| raise(msg) | Raise a runtime error with the given message |
| type_of(v) | Returns the type name of a value as a string |
| exit(code?) | Terminate the program with optional exit code (default: 0) |
Arrays and hashes have built-in methods dispatched via rugo_dot_call. These are always available without imports. Built-in methods take priority over hash key lookup; use hash["key"] for key access when a key name collides with a method.
| Method | Returns | Description |
|---|---|---|
| .map(fn) | Array | Transform each element |
| .filter(fn) | Array | Keep elements where fn returns truthy |
| .reject(fn) | Array | Remove elements where fn returns truthy |
| .each(fn) | nil | Iterate with side effects |
| .reduce(init, fn) | Any | Accumulate: fn(acc, val) |
| .find(fn) | Any/nil | First matching element |
| .any(fn) | Bool | True if any element matches |
| .all(fn) | Bool | True if all elements match |
| .count(fn) | Int | Count matching elements |
| .join(sep) | String | Join elements with separator |
| .first() | Any/nil | First element |
| .last() | Any/nil | Last element |
| .min() | Any/nil | Minimum value (numeric or string) |
| .max() | Any/nil | Maximum value (numeric or string) |
| .sum() | Number | Sum of numeric elements |
| .flatten() | Array | Flatten one level of nesting |
| .uniq() | Array | Remove duplicates (preserving order) |
| .sort_by(fn) | Array | Sort by lambda result (non-mutating) |
| .flat_map(fn) | Array | Map then flatten |
| .take(n) | Array | First n elements |
| .drop(n) | Array | All but first n elements |
| .zip(other) | Array | Pair elements from two arrays |
| .chunk(n) | Array | Split into groups of n |
Hash method lambdas receive (key, value):
| Method | Returns | Description |
|---|---|---|
| .map(fn) | Array | Transform each pair: fn(k, v) |
| .filter(fn) | Hash | Keep pairs where fn(k, v) returns truthy |
| .reject(fn) | Hash | Remove pairs where fn(k, v) returns truthy |
| .each(fn) | nil | Iterate pairs: fn(k, v) |
| .reduce(init, fn) | Any | Accumulate: fn(acc, k, v) |
| .find(fn) | Array/nil | First matching [key, value] pair |
| .any(fn) | Bool | True if any pair matches |
| .all(fn) | Bool | True if all pairs match |
| .count(fn) | Int | Count matching pairs |
| .keys() | Array | All keys |
| .values() | Array | All values |
| .merge(other) | Hash | Combine hashes (other wins conflicts) |
Rugo includes a built-in test framework using rats/end blocks:
use "test"
rats "arithmetic works"
  test.assert_eq(1 + 1, 2)
end
rats "string interpolation"
  name = "World"
  test.assert_eq("Hello, #{name}!", "Hello, World!")
end
Test files use the _test.rugo extension and produce TAP (Test Anything Protocol) output. The test harness supports:
- setup/teardown functions called before/after each test
- setup_file/teardown_file functions called once before/after all tests
- test.assert_eq, test.assert, test.skip from the test module
- Exit code 1 on any test failure
Rugo uses position-based # comment attachment for documentation:
# File-level documentation goes here.

# Calculates the factorial of n.
# Returns 1 when n <= 1.
def factorial(n)
  # This is a regular comment, not shown by rugo doc
  if n <= 1
    return 1
  end
  return n * factorial(n - 1)
end

# A Dog with a name and breed.
struct Dog
  name
  breed
end
Rules:
- Consecutive # lines immediately before def/struct (no blank line gap) = doc comment
- First # block at top of file before any code = file-level doc
- # inside function bodies, after a blank line gap, or inline = regular comment
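The attachment rule for def/struct can be sketched as a backward scan from the declaration line. `docComment` is a hypothetical helper that covers only the first rule; file-level docs and inline comments are left out of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// docComment collects the consecutive # lines directly above the
// declaration at declIdx. A blank line or any non-comment line ends the
// block, so comments separated by a gap are not attached.
func docComment(lines []string, declIdx int) string {
	var doc []string
	for i := declIdx - 1; i >= 0; i-- {
		t := strings.TrimSpace(lines[i])
		if !strings.HasPrefix(t, "#") {
			break
		}
		text := strings.TrimSpace(strings.TrimPrefix(t, "#"))
		doc = append([]string{text}, doc...) // prepend to keep source order
	}
	return strings.Join(doc, "\n")
}

func main() {
	src := []string{
		"# File-level documentation goes here.",
		"",
		"# Calculates the factorial of n.",
		"# Returns 1 when n <= 1.",
		"def factorial(n)",
	}
	fmt.Println(docComment(src, 4))
}
```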
Use rugo doc to view documentation for files, modules, and bridge packages:
rugo doc file.rugo # all docs in a file
rugo doc file.rugo factorial # specific symbol
rugo doc http # stdlib module
rugo doc strings # bridge package
rugo doc use:os # force stdlib module (when name is ambiguous)
rugo doc import:os # force bridge package (when name is ambiguous)
rugo doc --all # list everything