File systems
CS 273 (OS), Fall 2020
Typical system calls
See Selected Linux System Calls. Some of these (e.g., open, read, write) are typical in operating systems; others (e.g., mknod, umask, link) may be specific to Unix/Linux.
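As a concrete illustration, here is a minimal sketch (not from the reading) that copies one file to another using only open, read, write, and close; the file names and buffer size are made up.

```c
/* Minimal sketch: copy "input.txt" to "copy.txt" with the classic
 * Unix I/O calls.  File names are hypothetical; error handling is
 * abbreviated. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];
    ssize_t n;

    int in  = open("input.txt", O_RDONLY);
    int out = open("copy.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in < 0 || out < 0) { perror("open"); exit(1); }

    /* read() returns the byte count actually read; 0 means end of file. */
    while ((n = read(in, buf, sizeof buf)) > 0)
        write(out, buf, n);      /* a real program would check this too */

    close(in);
    close(out);
    return 0;
}
```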
Features apparent to user
- Logical internal organization
- Linear sequence of bytes
- Hierarchical tree. E.g., CDC Cyber (partitions, sections, records)
- Device independence
- File naming.
- Extensions (recognized by OS? by programs?)
- E.g., MS-DOS .EXE, .COM; Linux file program
- File types.
- Internal structure of executable (p. 385). (Relocation bits tell loader which words are addresses that must be changed if physical memory location for code changes.)
- RCS file (man 5 rcsfile)
- Random access. E.g., Linux lseek; hierarchical access
- Memory mapped files (see the mmap sketch after this list)
- Idea: associate memory with file; use memory addresses instead of system calls
- Segmented memory is convenient
- File consistency?
- Huge files?
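A minimal sketch of the memory-mapped idea above, assuming a pre-existing file named data.bin: after mmap, the file is read and written through ordinary pointers, with no read/write system calls.

```c
/* Sketch: map a file and modify it through memory.  The file name is
 * hypothetical; error handling is abbreviated. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.bin", O_RDWR);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) < 0) { perror("open/fstat"); return 1; }

    char *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    p[0] = 'X';                      /* changes the file: no write() call */

    msync(p, st.st_size, MS_SYNC);   /* consistency: force pages to disk */
    munmap(p, st.st_size);
    close(fd);
    return 0;
}
```

The consistency and huge-file questions above show up here: modified pages reach the disk only at msync/munmap (or page-out) time, and the whole mapping must fit in the address space.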
Issues and practices in file system design
- Implementation using disk blocks
- Size of a disk block? Internal fragmentation vs I/O efficiency
- Free block management? Linked lists, bitmaps (see the bitmap sketch below)
- Contiguous disk blocks?
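A sketch of the bitmap approach, with one bit per disk block; NBLOCKS and the in-memory array are assumptions for illustration (a real file system keeps the bitmap on disk).

```c
#include <stdint.h>

#define NBLOCKS 8192
static uint8_t bitmap[NBLOCKS / 8];   /* 1 bit per block; 1 = allocated */

/* Find a free block, mark it allocated, return its number (-1 if full). */
int alloc_block(void)
{
    for (int b = 0; b < NBLOCKS; b++)
        if (!(bitmap[b / 8] & (1 << (b % 8)))) {
            bitmap[b / 8] |= 1 << (b % 8);
            return b;
        }
    return -1;
}

void free_block(int b)
{
    bitmap[b / 8] &= ~(1 << (b % 8));
}
```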
- File storage strategies (see the i-node mapping sketch below)
- E.g., MS-DOS FAT; Linux i-node (include/uapi/linux/minix_fs.h, include/linux/fs.h); NTFS Master File Table (MFT)
- Implementation of directories (unique dir; MS-DOS; Linux; NTFS). Linux: include/uapi/linux/minix_fs.h
- Data structure for managing a file system: Linux superblock, e.g., include/uapi/linux/minix_fs.h (see also Minix book, p. 301)
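In the spirit of the Minix i-node in include/uapi/linux/minix_fs.h, here is a sketch of how an i-node maps a logical block number within a file to a disk block; the constants and the read_block() helper are hypothetical.

```c
#include <stdint.h>

#define NDIRECT 7                            /* direct pointers (Minix v1) */
#define BLKSIZE 1024
#define PTRS    (BLKSIZE / sizeof(uint32_t)) /* pointers per indirect block */

struct inode {
    uint32_t size;
    uint32_t direct[NDIRECT];   /* block numbers of the first NDIRECT blocks */
    uint32_t indirect;          /* block holding the next PTRS block numbers */
};

extern void read_block(uint32_t blkno, void *buf);  /* hypothetical device read */

/* Map logical block lbn of the file to a disk block number.
 * Assumes lbn < NDIRECT + PTRS (no double indirection in this sketch). */
uint32_t bmap(const struct inode *ip, uint32_t lbn)
{
    if (lbn < NDIRECT)
        return ip->direct[lbn];
    uint32_t ptrs[PTRS];
    read_block(ip->indirect, ptrs);   /* one extra I/O for indirect access */
    return ptrs[lbn - NDIRECT];
}
```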
- File sharing
- E.g., Linux links ("hard" and "soft"; see the sketch below)
- Backup?
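A sketch of the two kinds of links via the link() and symlink() system calls; the file names are made up. A hard link adds another directory entry for the same i-node (raising its link count), while a soft link is a separate file containing a path.

```c
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Hard link: "alias" names the same i-node as "orig". */
    if (link("orig", "alias") < 0)
        perror("link");

    /* Soft link: "softalias" holds the path "orig" and is resolved
     * at each use; it dangles if "orig" is later removed. */
    if (symlink("orig", "softalias") < 0)
        perror("symlink");
    return 0;
}
```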
- Reliability
- Backup. Incremental? Full? Archive bit?
- Bad blocks. Hardware (addresses stored in particular sector)? Software (bad block file)?
- Consistency check. Linux fsck, Windows/DOS SCANDISK. fsck algorithm (see the block-accounting sketch below):
- Record how often each block appears (a) in a file, (b) in free list
- Fix problems (missing blocks, duplicate free blocks, block in multiple files, free+allocated)
- Record how often each file appears in each directory
- Fix link counts (too high? too low?)
- DOS undelete (lazy placement of blocks onto free list)
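A sketch of the block-accounting pass described above. The two count arrays would be filled by walking every i-node and the free list (walks omitted); all names are hypothetical.

```c
#include <stdio.h>

#define NBLOCKS 8192
static int in_use[NBLOCKS];    /* times block appears in some file */
static int in_free[NBLOCKS];   /* times block appears in free list */

void check_blocks(void)
{
    for (int b = 0; b < NBLOCKS; b++) {
        if (in_use[b] == 0 && in_free[b] == 0)
            printf("block %d missing: add it to the free list\n", b);
        else if (in_free[b] > 1)
            printf("block %d duplicated in free list: rebuild the list\n", b);
        else if (in_use[b] > 1)
            printf("block %d in multiple files: give each file a copy\n", b);
        else if (in_use[b] == 1 && in_free[b] >= 1)
            printf("block %d both allocated and free: drop from free list\n", b);
    }
}
```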
- Performance
- Block cache.
- Same issues as paging, but different values, e.g., LRU now feasible
- Blocks needed for consistency, e.g., a crash while the cache holds a modified i-node?
- Likely unneeded blocks, e.g., doubly indirect blocks?
- Modified LRU (see the sketch after this list):
- Quickly expel blocks that won't be needed again soon
- Quickly write blocks needed for file system consistency
- Sync/update
- Write-through cache
- Cluster blocks together on disk?
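A sketch of the modified-LRU idea above, in the spirit of the Minix block cache: on release, a block goes to the reclaimed-first end if it is unlikely to be reused soon (e.g., a doubly indirect block), and dirty blocks critical to consistency (e.g., i-node blocks) are written through immediately. The list helpers are hypothetical.

```c
struct buf {
    int blkno;
    int dirty;
    struct buf *prev, *next;    /* position in the LRU chain = recency */
};

/* Hypothetical list helpers. */
extern void list_remove(struct buf *b);
extern void list_append_mru(struct buf *b);   /* reclaimed last  */
extern void list_append_lru(struct buf *b);   /* reclaimed first */
extern void write_block(struct buf *b);

void release_block(struct buf *b, int critical, int needed_soon)
{
    if (critical && b->dirty) {   /* e.g., block holding a modified i-node */
        write_block(b);           /* write-through, for FS consistency */
        b->dirty = 0;
    }
    list_remove(b);
    if (needed_soon)
        list_append_mru(b);       /* keep cached a while longer */
    else
        list_append_lru(b);       /* e.g., doubly indirect block */
}
```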
Some trends in file systems
LFS, Log-structured File System
Motivation: the relative difference in speed between CPU and disk access is increasing; main memories are getting large enough to hold entire files in the in-memory disk cache.
(Note: Relative differences in speeds along the memory hierarchy continue to increase, and are part of the motivation for multicore -- the "Memory Wall".)
Motivation: most writes are small, and randomly distributed over disk (e.g., writing out i-nodes and directories for consistency, or writing a portion of a file)
Strategy: write a log of what file system operations should be performed, before performing them in cache. Write only the log on disk.
Advantage: More efficient use of disk I/O; more reliable (can use log to recover from crashes in mid-operation); order of magnitude better performance for small writes, and comparable for ordinary sized operations.
Disadvantage: i-nodes are now scattered around the disk, so an i-node map must be built to find them; deletions create holes, leading to a need for compaction; most importantly, it is very incompatible with existing file systems.
Journaling file system, e.g., NTFS, Linux ReiserFS and ext3+: keep a more traditional file system structure, but add logging of operations before performing them.
Those operations must be idempotent, and for reliability should be atomic (see the write-ahead sketch below).
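A sketch of the write-ahead discipline, with hypothetical names: the record is flushed to the log before the file system itself is touched, and it stores absolute values ("set link count to 1") rather than deltas, so replaying it after a crash is idempotent.

```c
struct log_record {
    unsigned seq;        /* position in the log */
    unsigned inode;      /* object being updated */
    unsigned field;      /* which metadata field */
    unsigned new_value;  /* absolute value, not a delta -> idempotent */
};

/* Hypothetical helpers. */
extern void log_append(const struct log_record *r);  /* flush record to disk */
extern void apply(const struct log_record *r);       /* do the FS update */

void journaled_update(const struct log_record *r)
{
    log_append(r);   /* 1. durable statement of intent */
    apply(r);        /* 2. the change itself; safe to replay after a crash */
}
```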
Virtual file system.
Distributed file system.
Example: HDFS (implemented over local file systems!) - redundant copies of all data, designed for fault tolerance and petascale computations (e.g., 64 MB blocks)
LVM (Logical Volume Management) - multiple physical disk partitions presented as a single logical partition