Have you ever wondered what really happens under the hood, from the CLI all the way down to the kernel, when you type the ‘ls’ command? Let’s figure it out!
The very first thing you see before you can start typing in the shell is the prompt. In most Linux distros, its default appearance is defined by the PS1 variable.
The moment you start typing, each keypress triggers a keyboard interrupt, which in turn calls a key handler that makes the characters show up in your terminal.
Let’s suppose you typed ls (for the sake of this post) and pressed Enter. The shell reads the command from the STDIN data stream using a function such as getline() and stores the input in a buffer as a string. The buffer is filled from STDIN in blocks of up to a given size until the whole line has been read.
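Here is a minimal Python sketch of that block-wise read. The helper name read_command() and the block size of 16 characters are made up for illustration and do not come from any real shell; the sketch only shows the idea of filling a buffer chunk by chunk until the newline arrives.

```python
import io


def read_command(stream, block_size=16):
    """Read one line from `stream`, at most `block_size` characters at a time.

    Hypothetical helper: real shells use their own line-reading code.
    """
    buffer = []
    while True:
        # readline(n) returns at most n characters and stops after a newline.
        chunk = stream.readline(block_size)
        if not chunk:  # end of input
            break
        buffer.append(chunk)
        if chunk.endswith("\n"):  # the whole line has been read
            break
    return "".join(buffer).rstrip("\n")


# A StringIO object stands in for the keyboard here.
demo = io.StringIO("ls -l /usr/share/doc\n")
print(read_command(demo))  # prints: ls -l /usr/share/doc
```

In an interactive program you would pass sys.stdin instead of the StringIO stand-in.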
Now the string is broken into tokens by splitting it on whitespace (for example, ls *.c becomes the two tokens ls and *.c). The tokens are stored in an array of strings.
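As a rough Python sketch, whitespace tokenization can be as small as the function below. The name tokenize() is hypothetical, and this version ignores quoting and escaping, which real shells have to handle.

```python
def tokenize(line):
    """Split a command line into tokens on runs of whitespace.

    Hypothetical helper: quotes and backslashes are NOT handled here.
    """
    # str.split() with no arguments splits on any run of whitespace
    # and drops leading and trailing whitespace.
    return line.split()


print(tokenize("ls   *.c"))  # prints: ['ls', '*.c']
```

If you want shell-like handling of quotes, the standard library’s shlex.split() is a closer fit.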
Now the shell checks whether the command name has an alias defined. If it does, the token is replaced with the alias’s value. The next step is to check whether the command is a built-in, since the shell runs built-ins itself instead of launching a separate program. For example, cd, echo, and help are all built-in commands.
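The two checks can be sketched in Python as simple table lookups. Everything named here is an assumption for illustration: the ALIASES table (including the ll alias), the BUILTINS set, and the helper names expand_alias() and is_builtin().

```python
ALIASES = {"ll": "ls -l"}          # hypothetical alias table
BUILTINS = {"cd", "echo", "help"}  # the built-ins mentioned in the text


def expand_alias(tokens, aliases=ALIASES):
    """Replace the command name with its alias value, if one is defined."""
    if tokens and tokens[0] in aliases:
        return aliases[tokens[0]].split() + tokens[1:]
    return tokens


def is_builtin(tokens, builtins=BUILTINS):
    """Return True if the command name is one the shell handles itself."""
    return bool(tokens) and tokens[0] in builtins


print(expand_alias(["ll", "*.c"]))  # prints: ['ls', '-l', '*.c']
print(is_builtin(["cd", "/tmp"]))   # prints: True
print(is_builtin(["ls"]))           # prints: False
```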
If it’s not a built-in, the shell looks the command up using the PATH environment variable, which holds a list of absolute paths to the directories that contain executable files. The directories in PATH are separated by the delimiter : and the shell tries them one by one, in order, appending the command name to each directory.
For example, for the entry /usr/bin the shell checks whether /usr/bin/ls exists and is executable. The search is not recursive: subdirectories are not descended into, and the current working directory is only checked if it is listed in PATH. The first match wins, and every other external command is looked up the same way.
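A PATH lookup tries each listed directory in order and stops at the first executable match. Below is a minimal Python sketch of that loop; find_in_path() is a hypothetical name, and skipping empty PATH entries is a simplification made for this example.

```python
import os


def find_in_path(command, path=None):
    """Return the first executable `command` found in the PATH directories.

    Hypothetical helper: returns None when nothing matches.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(":"):
        if not directory:
            continue  # simplification for this sketch: ignore empty entries
        candidate = os.path.join(directory, command)
        # The candidate must be a regular file that we are allowed to execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# The result depends on the system, e.g. /usr/bin/ls or /bin/ls.
print(find_in_path("ls"))
```

The standard library already offers shutil.which() for the same job; the loop is spelled out here only to make the steps visible.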
Once it finds the binary for ls, the shell makes the fork() system call. This creates a child process, which will become ls, with the shell as its parent process. At this point the child is still a copy of the shell. fork() returns 0 to the child, so it knows it has to act as the child, and returns the PID of the child to the parent process (i.e. the shell).
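To see the two return values of fork() side by side, here is a small Python sketch that uses os.fork(), which wraps the system call on Unix-like systems. The function name fork_roles() and the pipe used to report back from the child are choices made for this example only.

```python
import os


def fork_roles():
    """Fork once and return (value fork() gave the child, value it gave the parent)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: fork() returned 0 here.
        try:
            os.close(read_fd)
            os.write(write_fd, str(pid).encode())
        finally:
            os._exit(0)  # leave immediately without running parent cleanup code
    # Parent process: fork() returned the child's PID here.
    os.close(write_fd)
    seen_by_child = int(os.read(read_fd, 32).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return seen_by_child, pid


child_value, parent_value = fork_roles()
print(child_value)       # prints: 0
print(parent_value > 0)  # prints: True
```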
Next, the child process executes the execve() system call, which replaces its address space with a brand-new one containing the program it has to run. Now ls can start running its program. The ls utility uses a function such as readdir() to read the directory’s entries from the disk, consulting the underlying filesystem’s inode entries for the details of each file.
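To get a feel for what reading a directory looks like from a program, here is a minimal Python sketch built on os.scandir(), which on Unix-like systems sits on top of the C library’s directory-reading functions. This is not how ls itself is written; list_directory() is a hypothetical helper that only mimics the default behaviour of skipping hidden files and sorting by name.

```python
import os


def list_directory(path="."):
    """Return sorted (name, inode number) pairs for the visible entries in `path`."""
    entries = []
    with os.scandir(path) as it:
        for entry in it:
            if entry.name.startswith("."):
                continue  # plain ls hides dotfiles too
            # entry.inode() is the inode number of the directory entry.
            entries.append((entry.name, entry.inode()))
    return sorted(entries)


for name, inode in list_directory("."):
    print(inode, name)
```

Running ls -i in the same directory prints inode numbers next to the names, which is handy for comparison.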
You can run ls under strace to dig deeper and see which system calls are being executed (its companion tool ltrace does the same for library calls).
adeel@pycen:~/foo$ strace ls
execve("/bin/ls", ["ls"], [/* 30 vars */]) = 0
Searching the bash source code for execve shows where the shell makes this call:
adeel@pycen:/usr/src/bash-4.0/bash-4.0$ find . | xargs grep -n "execve ("
./builtins/exec.def:201: shell_execve (command, args, env);
./execute_cmd.c:4323: 5) execve ()
./execute_cmd.c:4466: exit (shell_execve (command, args, export_env));
./execute_cmd.c:4577: return (shell_execve (execname, args, env));
./execute_cmd.c:4653:/* Call execve (), handling interpreting shell scripts, and handling
./execute_cmd.c:4656:shell_execve (command, args, env)
./execute_cmd.c:4665: execve (command, args, env);
Once the ls process is done executing, it calls the _exit() system call with the integer 0, which denotes a normal execution, and the kernel frees up its resources.
The shell, which has been waiting for the child to finish, collects its exit status, frees the memory it used for the command, and re-prompts the user for input.
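Putting the last few steps together, here is a hedged Python sketch of the fork, execve, and wait cycle. run_command() is a hypothetical helper, it expects an absolute path in argv[0], and returning -1 for a child killed by a signal is a convention invented for this example. The 127 status for a command that cannot be executed follows the usual shell convention.

```python
import os
import sys


def run_command(argv, env=None):
    """Fork, execve() argv[0] in the child, wait for it, and return its exit status."""
    if env is None:
        env = dict(os.environ)
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execve(argv[0], argv, env)
        finally:
            # Only reached if execve() failed; on success it never returns.
            os._exit(127)
    # Parent (the "shell"): block until the child terminates.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -1  # hypothetical marker: the child was terminated by a signal


# The Python interpreter stands in for an external command here.
print(run_command([sys.executable, "-c", "pass"]))  # prints: 0
```

On a system where the binary lives at /bin/ls, as in the strace output above, run_command(["/bin/ls"]) would print the listing and return 0 on success.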