qbasic.run - Help

If you are reading this, you must truly need help. I'll do my best.

Quick Start
Language Reference
Implementation Notes

These books, written by actual technical writers, are really good, though somewhat dated. They presume less general technical knowledge and perhaps more nicotine and hairspray use than are now the norm.

Quick Start

Here's a simple program so you can see what we're dealing with.

' This is a procedure (a subroutine) named PrintKey.
' It takes one string parameter k$ and shows its value on
' the screen.
SUB PrintKey (k$)
  PRINT "You pressed "; k$
END SUB

DO
  ' Call the builtin function INKEY$ to read whatever key
  ' is pending from the keyboard.
  k$ = INKEY$
  IF k$ <> "" THEN
    ' If key is not empty, call our subroutine to display it.
    PrintKey k$
  END IF
' Loop until a key is pressed.
LOOP WHILE k$ = ""

QBasic is a statically typed procedural language with block structure and two levels of lexical scope. Global variables exist the entire time a program is running, and by default are only visible outside procedures. Local variables are allocated on a stack each time a procedure is called, and only exist during that call.

Variables are implicitly defined the first time they are used, but can also be declared before use. Variable declarations can change default scope behavior. Globals can be made visible to all procedures or to specific procedures by declaring them SHARED. Locals declared STATIC keep their values between procedure calls. Globals declared COMMON can persist after a program finishes, by being passed to another program (no, really).
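
Here's a minimal sketch of SHARED and STATIC in action. The names total%, calls%, AddToTotal, and CountCalls are made up for illustration.

' total% is a global made visible to all procedures.
DIM SHARED total%

SUB AddToTotal (n%)
  total% = total% + n%
END SUB

SUB CountCalls
  ' calls% is local, but keeps its value between calls.
  STATIC calls%
  calls% = calls% + 1
  PRINT "Call number"; calls%
END SUB

AddToTotal 40
AddToTotal 2
PRINT total%  ' Prints 42.
CountCalls    ' Prints Call number 1
CountCalls    ' Prints Call number 2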

By default, procedure parameters are passed by reference.

' Increment adds 1 to its integer parameter.
SUB Increment (i%)
  ' This i% refers to the global j%, so modifying it
  ' will also modify j%
  i% = i% + 1
END SUB

j% = 41
Increment j%
PRINT j%  ' Prints 42.
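
"By default" implies there is another option. In QBasic proper, wrapping an argument in parentheses turns it into an expression, so the procedure receives a temporary copy instead of the variable itself. A small sketch, assuming qbasic.run follows the same rule:

SUB Increment (i%)
  i% = i% + 1
END SUB

j% = 41
' (j%) is an expression, so only a temporary copy is modified.
Increment (j%)
PRINT j%  ' Prints 41.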

Procedures defined with FUNCTION return a value instead of just modifying their parameters.

' Increment% returns its integer parameter's value + 1.
FUNCTION Increment% (i%)
  ' Return by assigning to the function name.
  Increment% = i% + 1
END FUNCTION

PRINT Increment(41)  ' Prints 42.

QBasic also supports two older procedure constructs, DEF FN and GOSUB.
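
For the curious, here is roughly what those look like. FNDouble% and SayHello are hypothetical names.

' DEF FN defines a one-line function. Its name must start with FN.
DEF FNDouble% (n%) = n% * 2
PRINT FNDouble%(21)  ' Prints 42.

' GOSUB jumps to a label, and RETURN jumps back.
GOSUB SayHello
END

SayHello:
  PRINT "Hello"
  RETURN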

Older BASICs use line numbers and GOTO statements heavily, but QBasic prefers block structured programming constructs like IF...END IF, DO...LOOP, FOR...NEXT.

FOR i = 1 TO 10
  IF foo THEN
    ' Do this if foo is true ...
  ELSEIF bar THEN
    ' Otherwise, do this if bar is true ...
  ELSE
    ' Otherwise, do this ...
  END IF
NEXT i

Language Reference

Builtin Types

All values in QBasic programs have an associated type which is determined at compile time.

Type name	Sigil	Range	Size
`INTEGER`	%	16-bit signed integer -32768 to 32767	2 bytes
`LONG`	&	32-bit signed integer -2147483648 to 2147483647	4 bytes
`SINGLE`	!	Single-precision float |x| 2.802597e-45 to 3.402823e+38	4 bytes
`DOUBLE`	#	Double-precision float |x| 4.940656458412465d-324 to 1.79769313486231d+308	8 bytes
`STRING`	$	Variable length string up to 32767 bytes	?
`STRING * n%`	$	Fixed length string	n% bytes

A value's type can be specified by putting one of the characters %, &, !, #, or $ after it. In qbasic.run we refer to these characters as "sigils", borrowing terminology from Perl 5.

QBasic also supports user defined types.

Literal values

QBasic operates on signed integers, floating point numbers (floats), and strings.

A base ten number with no decimal point and no sigil is read as the smallest type that fits it: an INTEGER, a LONG, or a DOUBLE. But note -32768 is read as a LONG, and -2147483648 as a DOUBLE. Numbers can be entered in hex by prefixing them with &H and octal by prefixing them with &O. Hex and octal numbers are implicitly signed, so &HFFFF = -1%. To get 65535, you have to write &HFFFF&.
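
A few hex and octal literals, to show the signedness rule:

PRINT &HFF     ' Prints 255.
PRINT &O17     ' Prints 15.
PRINT &HFFFF   ' Prints -1, a 16-bit signed INTEGER.
PRINT &HFFFF&  ' Prints 65535, a LONG.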

Numbers with a decimal point like 3.14 or numbers in scientific notation like 6.02E23 are floats. Decimals are read as SINGLE if they have 7 or fewer digits, else DOUBLE. Scientific notation numbers are SINGLE if they use E to mark the exponent, and DOUBLE if they use D. If you try to write an E-number with more than 7 digits, QBasic changes E to D when you press enter.

PRINT 42        ' integer
PRINT 32768     ' long
PRINT 3.142857  ' single
PRINT 6.02e23   ' single
PRINT 6.02d23   ' double

STRINGs are lists of bytes which may either be binary data or text. Fixed length strings are initially filled with NUL characters (ASCII 0). Variable-length strings are stored along with a 16-bit integer length, so unlike C strings they can also include NUL characters.

s$ = "I am a string."
' To put quotes in a string, use CHR$(34).
t$ = CHR$(34) + "I say," + CHR$(34) + " he said."
' Strings can contain arbitrary bytes.
u$ = CHR$(0) + CHR$(26)

The + operator concatenates two strings into a new string.

See this note for more details about how qbasic.run handles character encodings for strings. See this note for more details about the memory model for string storage.

Arithmetic Operators

Operator	Description	Result Type
`a + b`	Addition	Most precise of a or b
`a - b`	Subtraction	Most precise of a or b
`a * b`	Multiplication	Most precise of a or b
`a / b`	Floating point division	a or b DOUBLE -> DOUBLE otherwise SINGLE
`a \ b`	Integer division	a and b INTEGER -> INTEGER otherwise LONG
`a MOD b`	Integer remainder	a and b INTEGER -> INTEGER otherwise LONG
`a ^ b`	Floating point exponent	a or b DOUBLE -> DOUBLE otherwise SINGLE
`-a`	Negation	a

(For "most precise", DOUBLE > SINGLE > LONG > INTEGER.)

Arithmetic precision is determined by operand types, which can be confusing.

PRINT 320 * 200  ' Overflow error!

A runtime error occurs if arithmetic overflows. In the example above, 320 and 200 are both 16-bit signed integers, but their product 64000 > 32767. Similarly, there are runtime errors for division by zero and 0 ^ -k. One way to avoid overflow is to specify explicit types.

PRINT 320& * 200&  ' Impossibly huge 32-bit numbers.

If operands do not have the correct types, QBasic automatically converts them first. For example, this means DOUBLE operands for \ must fit in a LONG.

PRINT 2147483648# \ 2#  ' Overflow error!

There are a few more details about qbasic.run arithmetic precision. See this note for more about overflow checking.

Comparison Operators

Operator	Description
`a = b`	Equals
`a < b`	Less than
`a <= b`	Less than or equal
`a <> b`	Not equals
`a >= b`	Greater than or equal
`a > b`	Greater than

a and b must both be numeric types or both be strings. String comparison is lexicographic: a < b if the first differing character code is lower in a than b or a is a prefix of b.

The result is an INTEGER, 0 for false and -1 for true.
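
A quick illustration of the rules above:

PRINT 1 < 2          ' Prints -1 (true).
PRINT 1 > 2          ' Prints 0 (false).
PRINT "abc" < "abd"  ' Prints -1, since "c" comes before "d".
PRINT "ab" < "abc"   ' Prints -1, since "ab" is a prefix of "abc".
PRINT "B" < "a"      ' Prints -1, since code 66 is lower than code 97.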

Logical Operators

Operator	Description
`a AND b`	a & b
`a OR b`	a | b
`a XOR b`	a ^ b
`a EQV b`	~(a ^ b)
`a IMP b`	~a | b
`NOT a`	~a

These are really just arithmetic operators. The operands are first converted to LONG if necessary, and the result is INTEGER or LONG the same as for integer division.
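
Because they work bit by bit, the results can surprise you if you expect boolean logic:

PRINT 6 AND 3  ' Prints 2.
PRINT 6 OR 3   ' Prints 7.
PRINT 6 XOR 3  ' Prints 5.
PRINT 6 EQV 3  ' Prints -6.
PRINT 6 IMP 3  ' Prints -5.
PRINT NOT 0    ' Prints -1.
PRINT NOT 5    ' Prints -6, which still counts as true.
' Two nonzero ("true") values can AND to 0.
PRINT 1 AND 2  ' Prints 0.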

Type Conversions

Arithmetic operators automatically convert operand types as necessary.

Builtin functions and procedures expect specific parameter types. QBasic tries to convert numeric parameters to the expected numeric type, and tries to convert results to the expected type for assignments. A runtime error occurs if there is an overflow during type conversion.

' 65.2 is automatically converted to an integer.
PRINT CHR$(65.2)  ' Prints "A"
' Type mismatch error: CHR$() expects a numeric argument.
PRINT CHR$("A")

Some builtin functions are polymorphic. For example ABS accepts any numeric type and returns a value of the same type. And ATN returns a DOUBLE if its argument is a DOUBLE, otherwise SINGLE. User defined procedures are not polymorphic.

The conversion functions CINT, CLNG, CSNG, and CDBL explicitly convert a value to a desired type.
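
A short sketch of explicit conversions, including one way to sidestep the 320 * 200 overflow:

PRINT CINT(2.5)        ' Prints 2 (halfway cases round to even).
PRINT CINT(3.5)        ' Prints 4.
PRINT CLNG(320) * 200  ' Prints 64000, since LONG * INTEGER is LONG.
PRINT CINT(40000)      ' Overflow error: 40000 doesn't fit in INTEGER.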

User Defined Types

The TYPE statement lets you define your own types as compositions of builtin types and previously defined user defined types. Types consist of named elements. All elements must have a known size at compile time.

' A Pet has a name up to 20 characters long.
TYPE Pet
  Name AS STRING * 20
END TYPE

' A Person has a name up to 40 characters long,
' an age which is presumed to be less than 32768,
' a height as a single precision float in cm,
' and a dog.
TYPE Person
  Name   AS STRING * 40
  Age    AS INTEGER
  Height AS SINGLE
  Dog    AS Pet
END TYPE

' The variable Joe has type Person.
DIM Joe AS Person

' Use variable dot element to refer to type elements.
Joe.Name = "Joe"
Joe.Age  = 22
Joe.Height = 182.88
Joe.Dog.Name = "Milo"

A variable can be defined with user defined type using DIM. DIM Joe AS Person defines Joe to have the Person type. Elements of a user defined type variable can be accessed using period syntax as variable.element. Note that element names cannot include periods or sigils, and elements cannot be arrays.

See this note for more quirks about how element lookup works.

Named Constants

A constant is a name for a value defined at compile time. Constants can be used in arithmetic expressions anywhere a value would be used. A constant's type is automatically inferred from its value, or given by an explicit sigil.

CONST False = 0
CONST True = NOT False
CONST Pi! = 3.14159

Constants can be defined using arithmetic expressions involving values and other constants, but not function calls.

' You can use arithmetic to define constants.
CONST W = 80, H = 25, Area = W * H
' This fails at compile time with "Invalid constant".
CONST Angle = ATN(1)

A name used for a constant cannot also be used for a variable or procedure, and each constant name can only have one type.

CONST W% = 80
' This is a "Duplicate definition" error, because W is
' already reserved for the value 80.
CONST W$ = "Hello"
' Ditto.
W% = 40
' Ditto.
SUB W: END SUB

Constants defined in a procedure body are local to that procedure.

CONST k% = 42, k2% = 1
SUB foo
  CONST k% = 50, k3% = 5
  ' Prints 50, 1.
  ' The local k% takes precedence over global k%,
  ' but k2% is looked up from global scope.
  PRINT k%, k2%
END SUB
' Prints 42, 0 because CONST k3% is not in scope here.
' So this k3% is an implicit variable definition.
PRINT k%, k3%
foo

Variables

A variable is a name for a value or an array of values in memory. Variable names must start with a letter and may include letters, periods, and digits.

x = 2
Thing42% = 1
i.am.a.variable! = 42

Every variable has a type defined at compile time. A variable's type can be specified by putting a sigil after its name. For example, p& means a 32-bit signed LONG named p. p& and p$ are different variables, and it is valid to use both in the same program.

Variables can also be declared to have some type using the declaration statements DIM, COMMON, STATIC, and SHARED with AS syntax. For example, DIM x AS INTEGER declares x to be a 16-bit signed integer. After this AS declaration, the name x can only be used for INTEGER variables.

DIM x AS INTEGER
PRINT x   ' Ok, prints the integer named x.
PRINT x%  ' Ditto.
' Since the name x has been declared as an integer, can't have
' other variables named x with a different type anymore.
PRINT x$  ' Error: Duplicate definition.

If a variable name has no sigil and no AS-declared type, its type is determined by the current DEFtype.
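
A small sketch of how that plays out. With no DEFtype statement in effect, QBasic's default is SINGLE. The variable names are arbitrary.

' No DEFtype yet, so a is a SINGLE.
a = 3.7
PRINT a   ' Prints 3.7

' From here on, names starting with I through N default to INTEGER.
DEFINT I-N
n = 3.7
PRINT n   ' Prints 4, since n is an INTEGER.
' An explicit sigil still wins over the DEFtype.
n! = 3.7
PRINT n!  ' Prints 3.7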

Arrays

Arrays are lists of values of the same type that can be accessed by index. A variable followed by indices in parentheses refers to an array of that variable's type.

FOR i = 0 TO 2
  ' Define the ith value in the array x().
  ' The array is created implicitly when first used, and has
  ' values of whatever type x has.
  x(i) = i
NEXT i
' Sum the first, second, and third values in x().
PRINT x(0) + x(1) + x(2)

Note that the same name can refer to both an array variable and a normal variable with one value (a "scalar" variable) depending on context. Most of the time array variables must have parentheses after their name to distinguish them from scalar variables.

' q and q() are different variables with the same name "q".
q = 42
q(2) = 10
PRINT q, q(2)

Arrays can have many dimensions with an index per dimension.

m(1, 2) = 55
n(0, 3, 6) = 6

You cannot change the number of dimensions an array has after it is defined.

m(1, 2) = 55
' Fails with "Wrong number of dimensions".
m(1, 2, 3) = 42

Array indices for a dimension usually start at 0, although this can be changed with OPTION BASE. Implicitly defined arrays have a maximum index of 10 for each dimension. You can explicitly specify bounds for each dimension with DIM, though the bounds must fit in INTEGER range.

' Define a 2d array x% of integer values.
' First dimension has indices ranging from -5 to 5
' Second dimension has indices ranging from 0 to 32
DIM x%(-5 TO 5, 32)
x%(-5, 0) = 30
x%(4, 32) = 30
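
And a sketch of OPTION BASE, using LBOUND and UBOUND to inspect the resulting bounds. The array names are made up.

' OPTION BASE must come before any array is defined.
OPTION BASE 1
DIM names$(3)
PRINT LBOUND(names$); UBOUND(names$)  ' Prints 1 and 3.
' An explicit lower bound still overrides OPTION BASE.
DIM grid%(0 TO 9)
PRINT LBOUND(grid%); UBOUND(grid%)    ' Prints 0 and 9.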

AS type declarations for a name apply to both array and scalar variables with that name.

DIM x AS INTEGER
x = 42
' x(2) implicitly defines an integer array.
x(2) = 10
' Illegal because x$ is not an integer.
x$(2) = "nope"

DIM y(10) AS STRING
' This is ok, because y has string type.
y$ = "ok"
' Illegal because y% is not a string.
y% = 42

Array Allocation

QBasic distinguishes between "static" and "dynamic" arrays. Static arrays are allocated at compile time, and dynamic arrays can be allocated and resized at runtime using DIM, ERASE, and REDIM.

How an array is allocated depends on how it is defined. Arrays are static if they are defined implicitly or DIM'd with constant bounds. Arrays are dynamic if they are DIM'd with variable bounds, declared first by COMMON declarations, or DIM'd in non-STATIC procedures.

QBasic does not support allocating arrays on a call stack, so only dynamic arrays are supported in non-STATIC procedures. This means that implicitly defining arrays in procedures is usually illegal.

COMMON r()  ' r is dynamic, size unknown.
DIM s(k%)   ' s is dynamic, k% + 1 values.
DIM t(10)   ' t is static, 11 values.
u(5) = 1    ' u is static, 11 values.

SUB foo
  DIM v(2)  ' v is dynamic, 3 values.
  w(4) = 1  ' Compile time error, "Array not defined".
END SUB

Keen readers will notice this implies that QBasic has a simple garbage collector... dynamic arrays created in procedures are deallocated when their stack frame is torn down.

The metacommands $STATIC and $DYNAMIC can change default allocation behavior, except that arrays in non-STATIC procedures are always dynamic.
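
Metacommands are written inside comments. Here's a rough sketch of a dynamic array's life cycle, with buf% as a made-up name:

' $DYNAMIC
DIM buf%(10)    ' Dynamic because of the metacommand above.
buf%(10) = 7
REDIM buf%(20)  ' Resize. The old contents are discarded.
PRINT buf%(10)  ' Prints 0.
ERASE buf%      ' Free the array's memory.
REDIM buf%(5)   ' Allocate it again with new bounds.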

See this note for more details about the memory model for array storage.

Names and Scope Rules

Procedures, parameters, variables, and constants share the same namespace - in general, a name can only refer to one of these things at any point in a program's text and this is fixed at compile time (lexical scope). Trying to reuse the same name for two different things causes a "Duplicate definition" error. For example, trying to define a variable with the same name as a procedure is a duplicate definition.

Variables are distinguished from each other not only by name, but also by type and whether they are a scalar or array. So the name x can correspond to ten distinct variables: the scalars x!, x#, x$, x%, and x&, and the arrays x!(), x#(), x$(), x%(), and x&(). The same rules apply for parameters within a procedure.

The "AS" declaration syntax such as DIM AS limits what variable types may be associated with a name. If you write DIM x AS INTEGER, then only x% and x%() are valid variables with the name x. Similarly, DIM x(10) AS INTEGER means only x% and x%() are valid variables with the name x.

Constants and procedures are identified only by name and not by type. So CONST k% and CONST k$, and FUNCTION foo% and FUNCTION foo$, are duplicate definitions.

Note: QBasic permits constants to have the same name as array variables and procedure parameters, but not procedures or most scalar variables (see local scope).

Global Scope

A program has one default, global scope.

Procedures, constants, and variables defined at the top level of a program are global. Procedures and constants are visible everywhere including inside of procedure bodies, but by default, global variables are only visible outside of procedures.

To make a global variable visible inside a procedure body, it must be declared as SHARED.
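
For example, a SHARED statement inside a procedure shares a global with just that procedure. The names here are hypothetical.

lives% = 3

SUB ShowLives
  ' SHARED makes the global lives% visible in this procedure only.
  SHARED lives%
  PRINT lives%  ' Prints 3.
END SUB

SUB ShowNothing
  ' No SHARED here, so this lives% is a new local variable.
  PRINT lives%  ' Prints 0.
END SUB

ShowLives
ShowNothing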

Local Scope

Each procedure has its own local scope.

Procedure parameters and variables defined in a procedure are local to that procedure. They can usually have the same name as a global variable, and "shadow" it inside the procedure body. For instance if there is some normal unshared global variable x, and there is a SUB foo with a local variable x, code inside foo that references x sees the value of the local variable x.

DIM x
SUB foo
  PRINT x  ' Prints 0 (local shadows global x).
END SUB
x = 42
foo

If a global variable is shared with a procedure by being defined as DIM SHARED or COMMON SHARED or in a SHARED declaration, then local variables cannot shadow it. SHARED variables participate in the local scope the same as other variables, so this would be a duplicate definition.

DIM SHARED x
SUB foo
  DIM x  ' Error!  Duplicate definition.
END SUB

Unlike local variables, procedure parameters can shadow SHARED globals.

DIM SHARED x
SUB foo(x)
  PRINT x  ' Prints 42 (parameter shadows shared global x).
END SUB
x = 10
foo 42

Surprisingly, STATIC variables quietly shadow global variables.

DIM SHARED x
SUB foo
  STATIC x
  PRINT x  ' Prints 0...
END SUB
x = 10
foo

Strangely, constants defined in a procedure can also shadow shared global variables. And parameters can shadow functions - but not subs - if the functions are first declared using DECLARE FUNCTION. These are probably bugs, but real programs depend on them.

Common Variables

Variables declared with COMMON are global variables that can be shared by several different programs. Each program must declare the same set of common variables in the same order. When a program cedes control to a new program with CHAIN, the memory backing its common variables is copied to the new program's memory (in COMMON declaration order) so that the new program retains the values.
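
As a sketch, here are two programs handing values across a CHAIN. The file names MENU.BAS and GAME.BAS are made up, and this shows the classic QBasic usage rather than anything specific to qbasic.run.

' --- MENU.BAS ---
COMMON score%, player$
score% = 42
player$ = "Joe"
CHAIN "GAME.BAS"

' --- GAME.BAS ---
' Same variables, in the same order, as in MENU.BAS.
COMMON score%, player$
PRINT player$; " has"; score%; "points"  ' Prints Joe has 42 points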

The intended use of COMMON was to split up big programs to work within memory constraints using CHAIN. However, COMMON declarations also have useful semantics for array declarations, and so many programs that don't use CHAIN still declare variables as COMMON. An array declared first in a COMMON statement implicitly uses dynamic allocation, which is often desired. And COMMON SHARED can declare a global array as shared without dimensioning it, while DIM SHARED cannot.
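
The array idiom looks something like this. grid% and FillGrid are hypothetical names.

' Declared first in COMMON, so grid% is a dynamic array,
' and SHARED makes it visible inside procedures.
COMMON SHARED grid%()
DIM grid%(39, 24)

SUB FillGrid (v%)
  FOR x% = 0 TO 39
    FOR y% = 0 TO 24
      grid%(x%, y%) = v%
    NEXT y%
  NEXT x%
END SUB

FillGrid 7
PRINT grid%(39, 24)  ' Prints 7.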

Implementation Notes

Character Encodings

MS-DOS files like QBasic programs mostly use the CP437 encoding, an 8-bit extension of ASCII.

qbasic.run represents strings using Unicode internally so that debuggers and other tools see intelligible strings, while QBasic code still sees the original CP437 code points. For example, CP437 character 1 (☺︎) is stored in memory as U+263A WHITE SMILING FACE. Since JavaScript strings use UTF-16, this doesn't cost extra memory.

When you open a program, the shell usually guesses it is CP437 encoded and translates it to Unicode to deal with funky characters like ☺ or ░▒▓. But if the program is already valid UTF-8, the shell first tries to load it without doing any translation - so you can edit programs with modern tools, save them as UTF-8, and reload them later. This heuristic mostly works fine since extended ASCII isn't likely to be valid UTF-8, and nonprintable characters 1-31 aren't recognized code points.

DOS used &H0D &H0A for new lines in text files, while Unix uses just &H0A. qbasic.run will parse either &H0D &H0A or just &H0A as a newline, but writes &H0A.

Precision

Semantically JavaScript does all math with doubles, so qbasic.run uses doubles. It tries to match single precision results by internally rounding to the nearest single-precision values using Math.fround(). But this is not exact.

When floats must be converted to integers for arithmetic, halfway cases are rounded to the nearest even number. This differs from JavaScript's Math.round(), which rounds halfway cases up. qbasic.run rounds the way QBasic does.

PRINT 3.5 \ 1  ' Prints 4.
PRINT 2.5 \ 1  ' Prints 2.

Overflow Checking

QBasic was ahead of its time in doing overflow checking for all its arithmetic. The computers it ran on were very slow, and this added significant overhead - so much so that when the same programs were compiled as standalone executables by QBasic's commercial big brother, QuickBasic, it left off overflow checking. So you got safe arithmetic for free, and if you were a pro, you could pay $100 for the good stuff.

As a result, in collections of old programs, you'll sometimes find programs that depend on unchecked overflow [1]. These programs only work in compiled form, not when run from within QBasic or the QuickBasic interpreter.

qbasic.run behaves like the interpreted QBasic environment and always checks overflow, so some old programs need to be adapted to work.

[1] It's not that these programs are sloppy. They most often want to use unsigned 16-bit arithmetic for address calculations in graphics inner loops since the mode 13 VGA framebuffer is 64k. QBasic just doesn't have unsigned integers, so programs had to depend on two's complement wrapping to write "efficient" code with 16-bit math. Why 16-bit math was more efficient 10-15 years after Intel processors were natively 32-bit is another question.
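
One way to adapt such a program is to do the address math in LONG and wrap by hand only where a 16-bit value is really needed. This is just a sketch with made-up variable names, not a recipe that fits every program.

x% = 10: y% = 150
' offset% = y% * 320 + x% would fail: 48010 doesn't fit in INTEGER.
offset& = y% * 320& + x%  ' Ok, computed as LONG.
PRINT offset&             ' Prints 48010.
' Emulate two's complement wrapping to a signed 16-bit value.
IF offset& > 32767 THEN wrapped% = offset& - 65536 ELSE wrapped% = offset&
PRINT wrapped%            ' Prints -17526.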

Element Lookup

When you define a variable of a user defined type, qbasic.run implicitly defines variables for each type element. For example, if TYPE Point has elements x and y, we define variables P.x and P.y whenever a Point P is defined. QBasic probably hacked this the same way, which is why period is allowed in names.

The tricky part is that QBasic allows you to use periods in names for any other variables, constants, and functions, and not just for implicitly defined type elements. These names could be ambiguous with elements.

QBasic has a bunch of bolted on rules to avoid this ambiguity. It does not allow you to name user defined type variables or elements with periods. Also, after you define a user defined type variable P, QBasic seems to prohibit you from naming anything else P.whatever, regardless of scope or visibility rules.

qbasic.run implements these same weird rules.

Memory Model

QBasic stored variable length strings as a 16-bit length and 16-bit offset into a table of string data. This means a given string can store at most 32K characters, and all the strings in a program can total at most 32K. Colocating string data in memory meant string operations could be more efficient.

QBasic allocated arrays using separate heap allocations, and so arrays could hold more data: a single array can hold up to 32K items, can span up to 64K of memory, and several large arrays can be in memory at once. So most programs use arrays to store data. TODO