About File Systems Design

Motto:

If you have no old people
Then you need to buy some!

-Old Romanian saying

Chapter 1. The Requirements

Requirements Introduction

What are the requirements for a file system?

In order to respect Einstein's first rule: "Use the most simple solution that is functional but not simpler."

We must find the point where a filesystem becomes functional.

Starting from that exact point ...

IF we ADD anything more then we are going into the realm of bloatware, "mind aberations" and useless waste of resources.
IF we REMOVE anything then we are going into the realm of "sub-optimal solutions".

Requirements analysis

We must first consider the devices for those filesystems. Below is a short list of devices taken into consideration:

By default we will consider Hard Disks as our main target.
We must also consider the HDD ancestor: the floppy disk. Because many file systems actually originated from the floppy.
CD-ROMs and DVDs.
RAM disks and caches and ROM/FLASH devices.
USB memory stick "drives" (FLASH memory)
And to a historical reason we might even mention the old mainframe "pile" hard disks.

All of those device have some things in common:

They are block devices
They have mechanical parts in movement
Reading sequencialy is 10x or 100x times faster than randomly

There are some notable exceptions from the above principles:

RAM and caches
USB memory stick/card/SSD devices

Aparently those devices have no moving parts and the access to random data is not far slower than sequenial reading.

And to some extent it is true... but not exactly ;) we will analyse the differences when appropiate. As we will analyse the huge mistakes, miss understandings and havoc that those "exceptions" have generated into the filesystems implementations and use.

So what must a filesytem do?

This is the minimal set of requirements:

REQUIREMENT NO.1 - "To Organize"

It must organize the blocks of data on the physical device in such a way as to be grouped and access as "files".

A "file" is simply a set of such blocks with a conveient name and set of properties attached / labeled to it.

REQUIREMENT NO.2 - "Automate and Interface"

It must do so automatically

Work through a minimal set of interface functions.

REQUIREMENT NO.3 - "Performance and Safety"

It should be done in such a way as:

as simple as possible
as fast as possible
using the specifics of the physical device to its advantage
offer some ways to recover in case of physical damage

What is a "FILE"?

The most important atributes of a file are:

The file NAME
The file BLOCKS it occupies on the device
The file SIZE
The file TYPE

What are the basic minimal interfaces?

The basic operations a filesystem must support are:

Create a file
Open a file
Read from a file
Write to a file position
Close a file
Seek to a certain file position
Delete a file
Get information about a file (size, atributes, rights)

Hands down "on paper" filesystem...

In fact we could just do this:

Keep a list on paper with the names of the files on the device.
Under each name keep a list of the blocks it occupies.
Keep a number with the file SIZE and write down the TYPE of contents in a simple word.

Ironically this is a functional filesystem. And by our definition it is "sub-optimal" because it must be updated manually "on paper"

However it serves a purpose: It helps us understand the minimal operations that will have to be done by a specific filesystem in a real life implementation.

Chapter 2. A Historical re-view of filesystems "evolution"

Question: What concepts we can extract from checking the history of the old ones...

The CP/M filesystem

Uggh... what ...when?

Too old? is that what you are asking? Think again. This was the base for one of the most succesfull/used filesystem of our time

One of the first modern filesystem ever designed

Incredibly enough its creator was right from the start. I am sorry for myself not knowing the name of its creator(s) or team because they deserve a place in history... All I know is that the company was Digital Research (back then).

It was designed for the mass block device of personal computers of it's time: the floppy disk. However its design was so good that it transceded time up to nowadays HDD's

A minimal CP/M description

The general on disk layout

OS Boot code	Track:00,01
FS Root	Start on: Track=02, Head=01, Sector=01 End on: established at design time
FS File data	Start on: End of Root End on: End of Disk

The first 2 tracks are reserved for the OS Boot.

There ia no boot sector or you can consider that there is: a big 2 tracks boot sector.

In this regard it is a huge improvement compared to today's 512bytes one sector boot sector... but since there was no standardised header this in only partially exact.

When the PC was booted it usually simply read the track_0 and track_1 in memory and jumped into it.

It could have been the whole OS or just a smaller loader that would have loaded a seccond stage file from the filesystem itself
... remember Smiddy and his FAT12 SolarOS loader?

Observation:

Basically it was a custom that the filesystem sould start at track_2 but this could have been very easy shifted upwards by a simple define in the specific OS implementation, the rest of the FS would have been the same.

The only problem would have been that other machines using the same FS assumed the start to be at track_2 so it would breack some compatibility ;)

Rant:

What a strange world back then, multiple machines made by very diferent manufacturers; some even made "in house" ... but all shared the same filesystem and mostly the same OS.

And you only have had to impplement 17 standard BIOS hardware functions to have you custom made hardware use the same CP/M OS... deh the world has "evolved" a lot since them... or was "involution" the right word?

So lets me stop this rant for now and return to track_2 of the disk... what was there?

The Root directory

The root directory was starting at a fixed location: Sector:01, Head:01, Track:2.

The root directory contained an vector of what you could name file "etiquetes" or "labels" or "entrys".

The file entry

The numner of those entrys was fixed at design time so let us say there where a maximum of 128 files or 512 files (does this ring a bell?) or sometimes 2048 files entrys available in the Root folder.

It is important to notice that one file could eventually use up more than a single entry in this vector.

Each entry was fixed in size usually 32 or 64 bytes and contained:

the first byte was and indication or the user owning that file (usually 1, or zero if the entry was free or 0xE5h if the entry was deleted)
The file name : 8 bytes for the name and 3 bytes for the extension (now does this really ring a bell?)
the files size in 128bytes increments: 2-3 bytes (for this entry)
some file flags/attributes plus other info
the order number of this file entry: 1 or 2bytes
the allocation "map of blocks" contained into this file extent: 16 blocks usually Each byte, word or dword in this allocation map represented a cluster that belonged to this file

More on allocation blocks and file Creation

When a file was created an empty allocation map was created in the first file entry freely available in the root folder. It contained the name of the file and no allocatio blocks.

As data was written to the file more blocks where alocated to this file and they where written in the allocation map inside the current file entry.

Once 16 clusters have been all fileld up --> a new file entry was created and it contained the exact same name as the entry_1 but was this time marked as entry_2 and new blocks started to be allocated and recorded into its file map area.

This process continued until the whole file was written etc.

File Delete

When a file was deleted an 0xE5h bytes was written to the first byte on the entry efectively marking it as erased but not doing anything else.

The importance of this CP/M FS

It was the first Small,fast, simple and reliable filesystem for personal or micro computers.

All the elements of the moder computin as we know it have been developen based on this FS and its later incarnations.

Microsoft developed: Assembler, Linker, C compiler, first Excel and the first Basic based on this filesystem.

Borland deleloped the first IDE : Turbo Pascal.

One of the best test editors ever: Wordstar started on this also. Not to mention hundreds of other tools and firms that developed applications for it.

The CP/M good things

Simple. Also fast, easy to understand and compatible.
Easy to undelete and safe
Deleted data was no overwritten too soon so you could undelete in case of errros or disaster the FS scattered the allocations in multiple entrys so you cound still locate parts of files from partial root folder content alone.
Hard to lose structural integrity
It was very hard to loose the structural integrity of the FS since there where no duplicated things that needed re-syncronization.
Fast Find of files
The simple fixed structure of file entrys alowed fast parsing or locating of files. Even more this was confined in a single area on the disk. This area was one of the most relable on disk also. On track_2: startup with low data density but not exactly at the margins avaoiding mechanical problems.
Fast sequential and random access to files data
The cluster organization and good localization of files on the disk helped on this.

The CP/M bad things

It had no subfolders structure.

Of course for that time this was not an issue but nowdays it is. It did had "a try" with the USER byte in front of the name. But that only alowed 16 subfolders and there was no name for the folders :D just numbers. Folder 01,02..up to 15 :P

Additional CP/M explanantions

Question: Where was the whole disk allocation actually hold?

Answer: It was interlaced in the root folder. The OS read the root folder once at startup and created a bitmap from all the occupied blocks marked. This way it knowed the free alocation maps and this was keep in syncro in memory with further operations.

Notice: At power down there was no need to save this information, since it could have been re-read again at the next startup from the root folder.

A problem appeared when you did switched the floppy disks in drives and hence you had to press Ctrl+C to let the OS know about this and re-read allocation map in such occasions.

The Microsoft's FATxx Filesystem

I presume that you all know this filesystem a little so I will only highlight the design decisions made by its creators.

First of all let me say that it is based on CP/M. It has the 8.3 filename and ".txt" or ".com" file extensions.