______ File systems (CS 273 (OS), Fall 2020)
Home
>>    




File systems

CS 273 (OS), Fall 2020

Typical system calls

See Selected Linux System Calls. Some of these (e.g., open, read, write) are typical in operating systems; others (e.g., mknod, umask, link) may be specific to Linux/Linux.

Features apparent to user

  • Logical internal organization
    • Linear sequence of bytes
    • Hierarchical tree. E.g., CDC Cyber (partitions, sections, records)
  • Device independence
  • File naming.
    • Extensions (recognized by OS? by programs?)
    • E.g., MS-DOS .EXE, .COM; Linux file program
  • File types.
    • Internal structure of executable (p.385).
      (Relocation bits tell loader which words are addresses that must be changed if physical memory location for code changes.)
    • RCS file (man 5 rcsfile)
  • Random access. E.g., Linux lseek; hierarchical access
  • Memory mapped files
    • Idea: associate memory with file; use memory addresses instead of system calls
    • Segmented memory is convenient
    • File consistency?
    • Huge files?

Issues and practices in file system design

  • Implementation using disk blocks
    • Size of a disk block? Internal fragmentation vs I/O efficiency
    • Free block management? Linked lists, bitmaps
    • Contiguous disk blocks?
  • File storage strategies
  • File sharing
    • E.g., Linux links ("hard" and "soft")
    • Backup?
  • Reliability
    • Backup. Incremental? Full? Archive bit?
    • Bad blocks. Hardware (addresses stored in particular sector)? Software (bad block file)?
    • Consistency check. Linux fsck, Windows/DOS SCANDISK.
    • fsck algorithm:
      1. Record how often each block appears (a) in a file, (b) in free list
      2. Fix problems (missing blocks, duplicate free blocks, block in multiple files, free+allocated)
      3. Record how often each file appears in each directory
      4. Fix link counts (too high? too low?)
    • DOS undelete (lazy placement of blocks onto free list)
  • Performance
    • Block cache.
      • Same issues as paging, but different values, e.g., LRU now feasible
      • Blocks needed for consistency, e.g., crash holding modified inode?
      • Likely unneeded blocks, e.g., doubly indirect blocks?
      • Modified LRU:
        • Quickly expell blocks that won't be needed again soon
        • Quickly write blocks needed for file system consistency
      • Sync/update
      • Write-through cache
    • Cluster blocks together on disk?

Some trends in file systems

  • LFS, Log-structured File System

    • Motivation: relative difference in speeds between CPU and disk access is increasing; main memories are getting large enough to store entire files in the disk cache in main memory.

      (Note: Relative differences in speeds along the memory hierarchy continues to increase, and is part of motivation for multicore -- the "Memory Wall")

    • Motivation: most writes are small, and randomly distributed over disk (e.g., writing out i-nodes and directories for consistency, or writing a portion of a file)

    • Strategy: write a log of what file system operations should be performed, before performing them in cache. Write only the log on disk.

    • Advantage: More efficient use of disk I/O; more reliable (can use log to recover from crashes in mid-operation); order of magnitude better performance for small writes, and comparable for ordinary sized operations.

    • Disadvantage: I-nodes now scattered around disk, so must build an i-node map to find them; deletions create holes, leading to a need for compaction; most importantly, very incompatable with existing file systems.

  • Journaling file system, e.g., NTFS, Linux ReiserFS and ext3+: keep a more traditional file system structure, but add logging of operations before performing them.

    Those operations must be idempotent, and for reliability should be atomic

  • Virtual file system.

  • Distributed file system.
    Example: HDFS (implemented over local file systems!) - redundant copies of all data, designed for fault tolerance and petascale computations (e.g., 64MB blocks)

  • LVM (Logical Volume Management) - multiple physical disk partitions presented as a single logical partition