一、Introduction
1、GFS: Google File System, a scalable distributed file system
2、Key Observations:
First, component failures are the norm rather than the exception.
Second, files are huge by traditional standards. Multi-GB files are common.
Third, most files are mutated by appending new data rather than overwriting existing data.
Fourth, co-designing the applications and the file system API benefits the overall system by increasing flexibility.
二、Design Overview
1、Assumptions
1)the system is built from many inexpensive commodity components that often fail
2)the system stores a modest number of large files
3)the workloads primarily consist of two kinds of reads: large streaming reads and small random reads
4)the workloads have many large, sequential writes that append data to files
5)the system must efficiently implement well-defined semantics for multiple clients that concurrently append to the same file
6)high sustained bandwidth is more important than low latency
2、Interface
create, delete, open, close, read, write
snapshot, record append
3、Architecture
A GFS cluster consists of a single master and multiple chunkservers and is accessed by multiple clients.
The master periodically communicates with each chunkserver in HeartBeat messages.
4、Single Master
Having a single master simplifies the design.
We must minimize its involvement in reads and writes so that it does not become a bottleneck.
5、Chunk size
chunk size is 64 MB
a file consists of a number of chunks
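Because every chunk has the same fixed size, a byte offset within a file maps to a chunk by simple integer arithmetic. The sketch below illustrates that mapping; the function names `locate` and `chunk_count` are hypothetical and chosen only for this example, not taken from GFS itself.

```python
from __future__ import annotations

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size stated in these notes


def locate(offset: int) -> tuple[int, int]:
    """Return (chunk index, offset inside that chunk) for a file byte offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, CHUNK_SIZE)


def chunk_count(file_size: int) -> int:
    """Number of chunks needed for file_size bytes; the last one may be partial."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE
```

For instance, `locate(CHUNK_SIZE + 5)` lands in the second chunk (index 1) at offset 5, and a file of exactly one byte still occupies one chunk.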
6、Metadata
The master stores three types of metadata: the file and chunk namespaces, the mapping from files to chunks, and the locations of each chunk's replicas.
All metadata is kept in the master's memory, and the first two types are also persisted on the master's local disk.
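The three metadata types can be pictured as three in-memory tables, of which only the first two are written to disk. The following is a minimal sketch under assumptions: the class `MasterMetadata`, its field names, the use of integers as chunk handles, and the `persistent_state` helper are all hypothetical and exist only to illustrate the split between persisted and memory-only state.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MasterMetadata:
    # Persisted: file and chunk namespaces (simplified here to a set of paths).
    namespace: set[str] = field(default_factory=set)
    # Persisted: mapping from each file to its ordered list of chunk handles.
    file_chunks: dict[str, list[int]] = field(default_factory=dict)
    # Memory only: chunk handle -> addresses of chunkservers holding a replica.
    chunk_locations: dict[int, set[str]] = field(default_factory=dict)

    def persistent_state(self) -> dict:
        """Return only the part of the metadata that would go to local disk."""
        return {
            "namespace": sorted(self.namespace),
            "file_chunks": {path: list(h) for path, h in self.file_chunks.items()},
        }


meta = MasterMetadata()
meta.namespace.add("/logs/web.log")
meta.file_chunks["/logs/web.log"] = [101, 102]
meta.chunk_locations[101] = {"cs1:7000", "cs2:7000"}
snapshot = meta.persistent_state()
```

Note that `snapshot` contains no replica locations: in this sketch they are treated as soft state that lives only in memory.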
7、Consistency Model
三、System Interactions
control flows
data flows
atomic record appends
snapshot
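AFS (the Andrew File System) is a distributed file system developed at Carnegie Mellon University in the United States. Its main function is to manage files distributed across different nodes of a network. Compared with ordinary file systems, AFS is distinguished by three characteristics: it is distributed, cross-platform, and highly secure.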
四、Master Operation
namespace management and locking
replica placement
creation, re-replication, rebalancing
garbage collection
stale replica detection
五、Fault Tolerance and Diagnosis
high availability
data integrity
diagnostic tools (logs)