# Creating a Terminal Shell

August 13, 2017

In this series of posts we’ll create a Linux terminal shell from the ground up using the C programming language and some basic Linux system calls. This shell is not designed to replace your current shell (Bash, Zsh, etc.), which has had years of development; the goal is to learn how these programs work internally.

Modern shells support a myriad of features, ranging from advanced text completion to fancy prompts. It would be quite difficult to cover everything in this series, so we’ll restrict ourselves to a very basic shell: something that works and that you can build upon later. At the very least, we’ll try to support the following features, which nearly every shell supports:

- Process execution
- Builtins like cd (cd is not a program, but rather a builtin command nearly every shell provides)
- Redirecting streams like stdin, stdout, and stderr
- Pipes: ls | cowsay
- Environment variables: $PATH, $HOME
- Job control: Ctrl + C (process termination) and Ctrl + Z (process suspension)

To follow along you’ll need gcc installed on your system. If you’re more comfortable with clang or any other compiler toolchain, just follow along, but be sure to substitute anything that is exclusive to gcc with what you’re using. Since we’ll be making system calls, I recommend following this tutorial on a Linux operating system; this is only because I’m totally clueless about Windows and macOS when it comes to system calls. If you’re averse to Linux for any reason (I don’t judge), most parts of this program should work well on any operating system; just be sure to substitute the parts that don’t with the variants particular to your operating system (I’ll be sure to remind you of this).

A shell is, in essence, a REPL program:

R — Read the current command

E — Evaluate it

P — Print the results

L — Loop and continue with step 1 (Read)

In this post we’ll start with the basics of constructing a REPL system and process execution.

Since we’re in the land of C programming, we’ll declare our functions in header files and define them in source files. We’ll also create a simple Makefile to build our shell. If you have little experience with Makefiles, then check out these tutorials here and here.

We’ll organize our source files into two directories: src (containing the actual code) and include (containing header files). Our Makefile will be responsible for creating two more folders when the build is complete: lib (containing object files) and bin (containing the actual binary).

	CC=gcc

	BIN_DIR=bin
	LIB_DIR=lib
	SRC_DIR=src
	INC_DIR=include

	CFLAGS=-I$(INC_DIR) -g
	DEPS=$(wildcard $(INC_DIR)/*.h)

	OBJS=$(LIB_DIR)/shell.o

	.PHONY: clean

	# Note: recipe lines must start with a tab character
	$(BIN_DIR)/msh: $(OBJS) $(SRC_DIR)/main.c
		@mkdir -p $(BIN_DIR)
		$(CC) -o $@ $^ $(CFLAGS)

	$(LIB_DIR)/%.o: $(SRC_DIR)/%.c $(DEPS)
		@mkdir -p $(LIB_DIR)
		$(CC) -c -o $@ $< $(CFLAGS)

	clean:
		@rm -rf $(BIN_DIR) $(LIB_DIR)

First we’ll create the REPL part of our shell. This will be a do-while loop which keeps running until it encounters an error or reaches the end of input.

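	// Create this in src/main.c
	#include <stdlib.h>
	#include "shell.h"

	int main () {
	    int err = 0;
	    do {
	        print_prompt();

	        // read
	        char *input = 0;
	        err = read_line(&input);
	        if (err < 0) {
	            // getline may allocate a buffer even when it fails
	            free(input);
	            break;
	        }

	        // eval
	        err = eval(input);

	        // the line buffer is ours to release once it has been evaluated
	        free(input);

	        // loop
	    } while (!err);

	    return 0;
	}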

Now we’ll create the print_prompt, read_line, and eval functions.

The print_prompt function is the easiest of the set: it merely prints the name of our shell and the >> symbol. For now we’ll leave it like this, but you can always add things to it, like the name of the user, the machine, and/or the current working directory.

	void print_prompt () {
	    printf("msh >> ");
	}
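
As a rough sketch of the richer prompt mentioned above, the following builds a user@host:cwd style prompt into a buffer. The helper name build_prompt, the buffer sizes, and the fallback strings are assumptions made for illustration; the series itself keeps the plain msh >> prompt.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

// Hypothetical helper (not part of msh): writes "user@host:cwd msh >> "
// into buf and returns what snprintf returns, i.e. the length the full
// prompt needs (which is >= size if the prompt was truncated).
int build_prompt(char *buf, size_t size) {
    char host[256];
    char cwd[4096];

    // USER is usually set by the login environment; fall back if it is not
    const char *user = getenv("USER");
    if (user == NULL) user = "unknown";

    // gethostname may leave the buffer unterminated if the name is truncated
    if (gethostname(host, sizeof(host)) != 0) {
        snprintf(host, sizeof(host), "localhost");
    }
    host[sizeof(host) - 1] = '\0';

    // getcwd returns NULL if the path does not fit or cannot be read
    if (getcwd(cwd, sizeof(cwd)) == NULL) {
        snprintf(cwd, sizeof(cwd), "?");
    }

    return snprintf(buf, size, "%s@%s:%s msh >> ", user, host, cwd);
}
```

A print_prompt built on this could fill a local buffer with build_prompt and pass it to printf("%s", ...), followed by fflush(stdout) so the prompt shows up even though it does not end in a newline.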

The next function, read_line, is just a wrapper over the library functions for reading an input string. In this tutorial I’m using the getline function, but feel free to use any other function that gets the job done. The getline function returns the number of characters read, or -1 on end of input or failure; we’ll use this return value for our error handling mechanism.

A note: There are better methods for error handling, for example setting error numbers. Returning error codes is a simple method that works and is easy to implement for small projects like this one, but for large, production-level projects look for more robust methods of handling errors.

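	int read_line (char **input) {
	    // getline allocates the buffer for us when *input is NULL and len is 0
	    size_t len = 0;

	    ssize_t char_read = getline(input, &len, stdin);

	    // For ctrl-D (end of input) or a read error we'll get a char_read of -1
	    return (int) char_read;
	}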

Before working out the eval function, we need two more functions: strip_line and tokenize_line.

The first, strip_line, is easy: it’s responsible for removing the trailing \n that getline leaves at the end of the line.

	static void strip_line(char *line) {
	    size_t len = strlen(line);

	    // 'len' is unsigned, so check it is greater than 0 before
	    // subtracting; otherwise an empty line would wrap around
	    while (len > 0 && line[len - 1] == '\n') {
	        line[--len] = '\0';
	    }
	}
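
If you prefer to lean on the standard library, one alternative sketch uses strcspn to find the newline. The name strip_newline is hypothetical, and unlike strip_line this variant cuts the string at the first newline rather than only trimming the end.

```c
#include <string.h>

// Hypothetical alternative to strip_line: cut the string at the first '\n'.
// strcspn returns the length of the leading part that contains no '\n',
// which is the index of the newline, or of the terminating '\0' if the
// string has no newline at all (so the write is always in bounds).
void strip_newline(char *line) {
    line[strcspn(line, "\n")] = '\0';
}
```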

The next function, tokenize_line, is a bit more involved. We’re using the strtok library function to split our line into individual tokens. For now we’ll rely on the <space> character as the delimiter. We’ll also create an array of char*, which will store pointers to the individual tokens. By default, we’ll only handle up to 1024 tokens; we’ll create a macro to define this limit.

	#define TOKEN_LIMIT 1024
	static int tokenize_line(char *line, char ***tokens) {
	    // clear new lines
	    strip_line(line);

	    // by default we support 1024 tokens. Plus one for final NULL pointer
	    (*tokens) = malloc(sizeof(char *) * (TOKEN_LIMIT + 1));
	    if (*tokens == NULL) return -1;

	    const char *delim = " ";
	    char *token = strtok(line, delim);

	    int num_tokens = 0;
	    while (token) {
	        // token limit reached
	        if (num_tokens == TOKEN_LIMIT) return -1;

	        (*tokens)[num_tokens++] = token;
	        token = strtok(NULL, delim);
	    }

	    // execvp expects the argument array to end with a NULL pointer
	    (*tokens)[num_tokens] = NULL;
	    return num_tokens;
	}

Let me explain the gory char ***tokens. The tokens array holds strings, and in C a string is a char*. Since tokens is itself an array, its declaration looks like char **tokens. The function above has to allocate and fill that array for the caller, so we pass it a reference to the array, as in tokenize_line(input_dup, &tokens), and inside the function the parameter becomes char ***tokens.

Now let’s finish up by creating our eval function. Before that, let’s learn the basics of starting processes with the help of the standard library. What I’m describing below is specific to Linux systems, so be sure to substitute the variants specific to your operating system.

Linux provides a whole family of exec functions (C library front-ends to the execve system call) which can be used for running programs. They differ in whether the arguments are passed as an array or as a variable number of arguments, and in whether the program name is searched for in the PATH environment variable. You can find more information about them here. The following is the list: execl, execlp, execle, execv, execvp, and execvpe.

You can view the exact use of each exec function in this StackOverflow answer. For us, execvp will do the job. The v is there because we’ll be passing the arguments as an array, and the p because we want exec to use the PATH environment variable for calling processes (which is what you’d expect from any shell)
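
To see the difference between the l and v flavours side by side, here is a small sketch that runs the same command both ways. The helper run_exit_3 and the sh -c "exit 3" command are made up for the demonstration; it wraps each exec call in fork and waitpid only so that the calling program survives to report the result.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks, lets the child exec `sh -c "exit 3"` using
// either the list form or the array form, and returns the child's exit
// code (or -1 if fork or waitpid failed or the child was killed).
int run_exit_3(int use_list) {
    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {
        if (use_list) {
            // 'l' variants: arguments are listed inline, ending with NULL
            execlp("sh", "sh", "-c", "exit 3", (char *)NULL);
        } else {
            // 'v' variants: arguments come from a NULL-terminated array
            char *argv[] = { "sh", "-c", "exit 3", NULL };
            execvp(argv[0], argv);
        }
        _exit(127);  // reached only if exec failed
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Both calls end in p, so sh is looked up through PATH in either case. A shell only learns its arguments at run time, which is why the array form is the natural fit.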

Apart from execvp we need to be aware of three more things: the fork and waitpid functions and the WIFEXITED macro. The fork function creates a copy of the calling process, and control then literally “forks” into two branches (hence the name). In the parent process (the one that called fork), fork returns a value greater than 0, which is the process ID of the child. In the child process it returns 0, and if no child could be created it returns -1 in the parent. In the child we’ll call execvp to replace the child with the program we want to run, and in the parent we’ll wait for the child to complete using waitpid, passing it the child’s process ID. waitpid fills in a status value, and we apply WIFEXITED to that status to check whether the child terminated normally (by calling exit or returning from main) rather than being killed by a signal, and take the corresponding action.

Here’s how it’s done:

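	int eval (const char *input) {
	    char **tokens = NULL;

	    // duplicate the input since strtok would modify it
	    char *input_dup = strdup(input);
	    if (input_dup == NULL) return -1;

	    int num_tokens = tokenize_line(input_dup, &tokens);
	    if (num_tokens <= 0) {
	        // empty line (0) or tokenizing failed (-1): nothing to run
	        if (num_tokens < 0) printf("Huge number of tokens\n");
	        free(input_dup);
	        free(tokens);
	        return num_tokens;
	    }

	    pid_t pid = fork();

	    if (pid < 0) {
	        // fork failed, so there is no child to wait for
	        perror("fork");
	        free(input_dup);
	        free(tokens);
	        return -1;
	    }

	    if (pid == 0) {
	        // child: replace this process with the requested program
	        execvp(tokens[0], tokens);

	        // execvp only returns on failure (e.g. command not found)
	        perror(tokens[0]);
	        _exit(127);
	    }

	    // parent: wait for the child to finish
	    int status;
	    waitpid(pid, &status, 0);
	    free(input_dup);
	    free(tokens);

	    // WIFEXITED is true when the child terminated normally
	    return WIFEXITED(status) ? 0 : 1;
	}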
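
The status filled in by waitpid carries more than a yes or no answer. As a sketch of how it could be decoded further, the hypothetical helpers below (run_and_wait and status_to_code are not part of msh) run a command and convert its status into the kind of number many shells report: the exit code for a normal exit, or 128 plus the signal number when the child was killed by a signal.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs argv[0] with execvp in a child and returns the
// raw status filled in by waitpid, or -1 if fork or waitpid failed.
int run_and_wait(char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);  // execvp returns only on failure
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return status;
}

// Turns a wait status into a shell-style code: the exit code for a normal
// exit, or 128 + signal number when the child was killed by a signal.
int status_to_code(int status) {
    if (WIFEXITED(status)) return WEXITSTATUS(status);
    if (WIFSIGNALED(status)) return 128 + WTERMSIG(status);
    return -1;
}
```

With these, a command that cannot be found comes back as 127, because the child falls through the failed execvp and calls _exit(127).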

Now to sum it all up, create a file named shell.c in the src directory. It should look like the following:

	#include <errno.h>
	#include <stdio.h>
	#include <string.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <sys/types.h>
	#include <sys/wait.h>
	#include "shell.h"


	static void strip_line(char *line) {
	    size_t len = strlen(line);

	    // 'len' is unsigned, so check it is greater than 0 before
	    // subtracting; otherwise an empty line would wrap around
	    while (len > 0 && line[len - 1] == '\n') {
	        line[--len] = '\0';
	    }
	}


	static int tokenize_line(char *line, char ***tokens) {
	    // clear new lines
	    strip_line(line);

	    // by default we support 1024 tokens. Plus one for final NULL pointer
	    (*tokens) = malloc(sizeof(char *) * (TOKEN_LIMIT + 1));
	    if (*tokens == NULL) return -1;

	    const char *delim = " ";
	    char *token = strtok(line, delim);

	    int num_tokens = 0;
	    while (token) {
	        // token limit reached
	        if (num_tokens == TOKEN_LIMIT) return -1;

	        (*tokens)[num_tokens++] = token;
	        token = strtok(NULL, delim);
	    }

	    // execvp expects the argument array to end with a NULL pointer
	    (*tokens)[num_tokens] = NULL;
	    return num_tokens;
	}


	void print_prompt () {
	    printf("msh >> ");
	}


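	int read_line (char **input) {
	    // getline allocates the buffer for us when *input is NULL and len is 0
	    size_t len = 0;

	    ssize_t char_read = getline(input, &len, stdin);

	    // For ctrl-D (end of input) or a read error we'll get a char_read of -1
	    return (int) char_read;
	}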


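	int eval (const char *input) {
	    char **tokens = NULL;

	    // duplicate the input since strtok would modify it
	    char *input_dup = strdup(input);
	    if (input_dup == NULL) return -1;

	    int num_tokens = tokenize_line(input_dup, &tokens);
	    if (num_tokens <= 0) {
	        // empty line (0) or tokenizing failed (-1): nothing to run
	        if (num_tokens < 0) printf("Huge number of tokens\n");
	        free(input_dup);
	        free(tokens);
	        return num_tokens;
	    }

	    pid_t pid = fork();

	    if (pid < 0) {
	        // fork failed, so there is no child to wait for
	        perror("fork");
	        free(input_dup);
	        free(tokens);
	        return -1;
	    }

	    if (pid == 0) {
	        // child: replace this process with the requested program
	        execvp(tokens[0], tokens);

	        // execvp only returns on failure (e.g. command not found)
	        perror(tokens[0]);
	        _exit(127);
	    }

	    // parent: wait for the child to finish
	    int status;
	    waitpid(pid, &status, 0);
	    free(input_dup);
	    free(tokens);

	    // WIFEXITED is true when the child terminated normally
	    return WIFEXITED(status) ? 0 : 1;
	}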

In the include directory, create the following file, shell.h:

	#ifndef SHELL_H
	#define SHELL_H

	#define TOKEN_LIMIT 1024

	void print_prompt ();
	int read_line (char **input);
	int eval (const char *input);

	#endif

In your working directory, you should have the following directory structure:

.
├── include
│   └── shell.h
├── Makefile
└── src
    ├── main.c
    └── shell.c

Now just run make on the command line, and voilà, you’ll have bin/msh as your own shell, ready to run programs:

kartik@kt:~/projects/blog$ make
kartik@kt:~/projects/blog$ ./bin/msh
msh >> ls
bin  include  lib  Makefile  src

In the next part, we’ll add builtins to our shell, especially the cd builtin!