Back to www.oby.ro

About File Systems Design

Motto:

If you have no old people
Then you need to buy some!

-Old Romanian saying


Copyright (c) 2001,2016 Bogdan Valentin Ontanu. All rights reserved

Chapter 1. The Requirements

Requirements Introduction

What are the requirements for a file system?

In order to respect Einstein's first rule: "Use the most simple solution that is functional but not simpler."

We must find the point where a filesystem becomes functional.

Starting from that exact point ...

Requirements analysis

We must first consider the devices for those filesystems. Below is a short list of devices taken into consideration:

All of those device have some things in common:

There are some notable exceptions from the above principles:

Aparently those devices have no moving parts and the access to random data is not far slower than sequenial reading.

And to some extent it is true... but not exactly ;) we will analyse the differences when appropiate. As we will analyse the huge mistakes, miss understandings and havoc that those "exceptions" have generated into the filesystems implementations and use.

So what must a filesytem do?

This is the minimal set of requirements:

REQUIREMENT NO.1 - "To Organize"
  • It must organize the blocks of data on the physical device in such a way as to be grouped and access as "files".
  • A "file" is simply a set of such blocks with a conveient name and set of properties attached / labeled to it.
    REQUIREMENT NO.2 - "Automate and Interface"
  • It must do so automatically
  • Work through a minimal set of interface functions.
    REQUIREMENT NO.3 - "Performance and Safety"

    It should be done in such a way as:

    What is a "FILE"?

    The most important atributes of a file are:

    What are the basic minimal interfaces?

    The basic operations a filesystem must support are:

    Hands down "on paper" filesystem...

    In fact we could just do this:

    Ironically this is a functional filesystem. And by our definition it is "sub-optimal" because it must be updated manually "on paper"

    However it serves a purpose: It helps us understand the minimal operations that will have to be done by a specific filesystem in a real life implementation.

    Chapter 2. A Historical re-view of filesystems "evolution"

    Question: What concepts we can extract from checking the history of the old ones...

    The CP/M filesystem

    Uggh... what ...when?

    Too old? is that what you are asking? Think again. This was the base for one of the most succesfull/used filesystem of our time

    One of the first modern filesystem ever designed

    Incredibly enough its creator was right from the start. I am sorry for myself not knowing the name of its creator(s) or team because they deserve a place in history... All I know is that the company was Digital Research (back then).

    It was designed for the mass block device of personal computers of it's time: the floppy disk. However its design was so good that it transceded time up to nowadays HDD's

    A minimal CP/M description

    The general on disk layout

    OS Boot code Track:00,01
    FS Root Start on: Track=02, Head=01, Sector=01
    End on: established at design time
    FS File data Start on: End of Root
    End on: End of Disk

    The first 2 tracks are reserved for the OS Boot.

    There ia no boot sector or you can consider that there is: a big 2 tracks boot sector.

    In this regard it is a huge improvement compared to today's 512bytes one sector boot sector... but since there was no standardised header this in only partially exact.

    When the PC was booted it usually simply read the track_0 and track_1 in memory and jumped into it.

    It could have been the whole OS or just a smaller loader that would have loaded a seccond stage file from the filesystem itself
    ... remember Smiddy and his FAT12 SolarOS loader?

    Observation:

    Basically it was a custom that the filesystem sould start at track_2 but this could have been very easy shifted upwards by a simple define in the specific OS implementation, the rest of the FS would have been the same.

    The only problem would have been that other machines using the same FS assumed the start to be at track_2 so it would breack some compatibility ;)

    Rant:


    What a strange world back then, multiple machines made by very diferent manufacturers; some even made "in house" ... but all shared the same filesystem and mostly the same OS.

    And you only have had to impplement 17 standard BIOS hardware functions to have you custom made hardware use the same CP/M OS... deh the world has "evolved" a lot since them... or was "involution" the right word?

    So lets me stop this rant for now and return to track_2 of the disk... what was there?


    The Root directory

    The root directory was starting at a fixed location: Sector:01, Head:01, Track:2.

    The root directory contained an vector of what you could name file "etiquetes" or "labels" or "entrys".

    The file entry

    The numner of those entrys was fixed at design time so let us say there where a maximum of 128 files or 512 files (does this ring a bell?) or sometimes 2048 files entrys available in the Root folder.

    It is important to notice that one file could eventually use up more than a single entry in this vector.

    Each entry was fixed in size usually 32 or 64 bytes and contained:

    More on allocation blocks and file Creation

    When a file was created an empty allocation map was created in the first file entry freely available in the root folder. It contained the name of the file and no allocatio blocks.

    As data was written to the file more blocks where alocated to this file and they where written in the allocation map inside the current file entry.

    Once 16 clusters have been all fileld up --> a new file entry was created and it contained the exact same name as the entry_1 but was this time marked as entry_2 and new blocks started to be allocated and recorded into its file map area.

    This process continued until the whole file was written etc.

    File Delete

    When a file was deleted an 0xE5h bytes was written to the first byte on the entry efectively marking it as erased but not doing anything else.

    The importance of this CP/M FS

    It was the first Small,fast, simple and reliable filesystem for personal or micro computers.

    All the elements of the moder computin as we know it have been developen based on this FS and its later incarnations.

    Microsoft developed: Assembler, Linker, C compiler, first Excel and the first Basic based on this filesystem.

    Borland deleloped the first IDE : Turbo Pascal.

    One of the best test editors ever: Wordstar started on this also. Not to mention hundreds of other tools and firms that developed applications for it.

    The CP/M good things

    The CP/M bad things

    Of course for that time this was not an issue but nowdays it is. It did had "a try" with the USER byte in front of the name. But that only alowed 16 subfolders and there was no name for the folders :D just numbers. Folder 01,02..up to 15 :P

    Additional CP/M explanantions

    Question: Where was the whole disk allocation actually hold?

    Answer: It was interlaced in the root folder. The OS read the root folder once at startup and created a bitmap from all the occupied blocks marked. This way it knowed the free alocation maps and this was keep in syncro in memory with further operations.

    Notice: At power down there was no need to save this information, since it could have been re-read again at the next startup from the root folder.

    A problem appeared when you did switched the floppy disks in drives and hence you had to press Ctrl+C to let the OS know about this and re-read allocation map in such occasions.

    The Microsoft's FATxx Filesystem

    I presume that you all know this filesystem a little so I will only highlight the design decisions made by its creators.

    First of all let me say that it is based on CP/M. It has the 8.3 filename and ".txt" or ".com" file extensions.


    To be continued....