
hadoop: functions and roles of the classes in org.apache.hadoop.hdfs.server.namenode

[Date: 2013-06-07] Source: CSDN  Author: kntao

This overview uses Hadoop 0.21 as the example.

NameNode.java: mainly maintains the file system namespace and the file metadata. The description from the source code follows.

/**********************************************************
 * NameNode serves as both directory namespace manager and
 * "inode table" for the Hadoop DFS.  There is a single NameNode
 * running in any DFS deployment.  (Well, except when there
 * is a second backup/failover NameNode.)
 *
 * The NameNode controls two critical tables:
 *   1)  filename->blocksequence (namespace)
 *   2)  block->machinelist ("inodes")
 *
 * The first table is stored on disk and is very precious.
 * The second table is rebuilt every time the NameNode comes
 * up.
 *
 * 'NameNode' refers to both this class as well as the 'NameNode server'.
 * The 'FSNamesystem' class actually performs most of the filesystem
 * management.  The majority of the 'NameNode' class itself is concerned
 * with exposing the IPC interface and the http server to the outside world,
 * plus some configuration management.
 *
 * NameNode implements the ClientProtocol interface, which allows
 * clients to ask for DFS services.  ClientProtocol is not
 * designed for direct use by authors of DFS client code.  End-users
 * should instead use the org.apache.nutch.hadoop.fs.FileSystem class.
 *
 * NameNode also implements the DatanodeProtocol interface, used by
 * DataNode programs that actually store DFS data blocks.  These
 * methods are invoked repeatedly and automatically by all the
 * DataNodes in a DFS deployment.
 *
 * NameNode also implements the NamenodeProtocol interface, used by
 * secondary namenodes or rebalancing processes to get partial namenode's
 * state, for example partial blocksMap etc.
 **********************************************************/
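The two tables named in the comment can be pictured as two maps. What follows is a minimal sketch under stated assumptions, not Hadoop code: the class `TwoTablesSketch` and all of its methods are hypothetical, block IDs are simplified to `long` values, and machines are plain strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; in Hadoop the real bookkeeping lives in FSNamesystem.
class TwoTablesSketch {
    // Table 1: filename -> ordered block sequence (the namespace, kept on disk).
    private final Map<String, List<Long>> namespace = new HashMap<>();
    // Table 2: block -> machines holding a replica (rebuilt after every start).
    private final Map<Long, Set<String>> blockLocations = new HashMap<>();

    // Appends a block to the end of a file's block sequence.
    void addBlock(String file, long blockId) {
        namespace.computeIfAbsent(file, f -> new ArrayList<>()).add(blockId);
    }

    // Records that a machine reported holding a replica of the block.
    void reportReplica(long blockId, String machine) {
        blockLocations.computeIfAbsent(blockId, b -> new HashSet<>()).add(machine);
    }

    // Simulates a restart: table 1 survives, table 2 is empty until reports arrive.
    void restart() {
        blockLocations.clear();
    }

    List<Long> blocksOf(String file) {
        return namespace.getOrDefault(file, new ArrayList<>());
    }

    Set<String> machinesOf(long blockId) {
        return blockLocations.getOrDefault(blockId, new HashSet<>());
    }
}
```

In this sketch, losing the first map would lose the file system, whereas the second map only costs the time needed to collect fresh reports, which matches the comment's remark that the first table is the precious one.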

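FSNamesystem.java: mainly maintains several tables: the mapping from file names to block lists; the set of valid blocks; the mapping from blocks to machine lists; the mapping from machines to block lists; and an LRU cache of the machines whose heartbeats were recently updated.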

/***************************************************
 * FSNamesystem does the actual bookkeeping work for the
 * DataNode.
 *
 * It tracks several important tables.
 *
 * 1)  valid fsname --> blocklist  (kept on disk, logged)
 * 2)  Set of all valid blocks (inverted #1)
 * 3)  block --> machinelist (kept in memory, rebuilt dynamically from reports)
 * 4)  machine --> blocklist (inverted #2)
 * 5)  LRU cache of updated-heartbeat machines
 ***************************************************/
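The fifth table, an LRU cache of heartbeat machines, can be approximated with an access-ordered `LinkedHashMap`. This is only a sketch of the idea: `HeartbeatLruSketch` and its methods are hypothetical names, and how FSNamesystem actually stores heartbeats is not shown in the source above.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;

// Hypothetical illustration of an LRU ordering over heartbeat times.
class HeartbeatLruSketch {
    // accessOrder = true: iteration runs from least to most recently touched entry.
    private final LinkedHashMap<String, Long> lastHeartbeat =
            new LinkedHashMap<>(16, 0.75f, true);

    // Records a heartbeat; re-inserting a known machine moves it to the "fresh" end.
    void heartbeat(String machine, long timeMillis) {
        lastHeartbeat.put(machine, timeMillis);
    }

    // Returns the machine that has been silent the longest, or null if none is known.
    String stalest() {
        Iterator<String> it = lastHeartbeat.keySet().iterator();
        return it.hasNext() ? it.next() : null;
    }
}
```

With this ordering, a monitor that looks for dead machines only has to inspect entries from the stale end and can stop at the first one that is still fresh.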

INode.java: HDFS abstracts both files and directories as INodes.

/**
 * We keep an in-memory representation of the file/block hierarchy.
 * This is a base INode class containing common fields for file and
 * directory inodes.
 */
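The idea of one base class with common fields plus file and directory subclasses can be sketched as follows. All class and field names here (`SketchINode`, `SketchINodeFile`, `SketchINodeDirectory`) are hypothetical and heavily simplified; they are not the fields of Hadoop's own INode classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class: fields shared by files and directories.
abstract class SketchINode {
    final String name;
    SketchINodeDirectory parent; // null for the root

    SketchINode(String name) {
        this.name = name;
    }

    abstract boolean isDirectory();

    // Builds the absolute path by walking up to the root (whose name is "").
    String fullPath() {
        if (parent == null) {
            return name.isEmpty() ? "/" : name;
        }
        String prefix = parent.fullPath();
        return prefix.endsWith("/") ? prefix + name : prefix + "/" + name;
    }
}

// A file inode additionally carries its ordered list of block IDs.
class SketchINodeFile extends SketchINode {
    final List<Long> blockIds = new ArrayList<>();

    SketchINodeFile(String name) {
        super(name);
    }

    @Override
    boolean isDirectory() {
        return false;
    }
}

// A directory inode additionally carries its children.
class SketchINodeDirectory extends SketchINode {
    final List<SketchINode> children = new ArrayList<>();

    SketchINodeDirectory(String name) {
        super(name);
    }

    void addChild(SketchINode child) {
        child.parent = this;
        children.add(child);
    }

    @Override
    boolean isDirectory() {
        return true;
    }
}
```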

FSImage.java: persists the INode information to the FSImage on disk.

/**
 * FSImage handles checkpointing and logging of the namespace edits.
 *
 */

FSEditLog.java: writes the edits file.

/**
 * FSEditLog maintains a log of the namespace modifications.
 *
 */
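The division of labour between FSImage (a checkpointed snapshot) and FSEditLog (a log of modifications since that snapshot) can be illustrated with a toy model. This is a sketch under simplifying assumptions: the namespace is reduced to a set of paths, only create and delete operations exist, and the names `CheckpointSketch` and `Edit` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical toy model of "image + edits"; not Hadoop's on-disk format.
class CheckpointSketch {
    // One logged namespace modification.
    static final class Edit {
        final boolean create;
        final String path;

        Edit(boolean create, String path) {
            this.create = create;
            this.path = path;
        }
    }

    private SortedSet<String> image = new TreeSet<>(); // last checkpoint
    private final List<Edit> edits = new ArrayList<>(); // changes since then

    void create(String path) {
        edits.add(new Edit(true, path));
    }

    void delete(String path) {
        edits.add(new Edit(false, path));
    }

    // Rebuilds the current namespace by replaying the edits over the image.
    SortedSet<String> recover() {
        SortedSet<String> state = new TreeSet<>(image);
        for (Edit e : edits) {
            if (e.create) {
                state.add(e.path);
            } else {
                state.remove(e.path);
            }
        }
        return state;
    }

    // Checkpoint: fold the edits into a new image and start an empty log.
    void checkpoint() {
        image = recover();
        edits.clear();
    }

    int pendingEdits() {
        return edits.size();
    }
}
```

In this model a checkpoint does not change what `recover()` returns; it only shortens the log that has to be replayed.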

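BlockInfo.java: an INode mainly describes file and directory information, whereas the contents of a file are described by blocks. Suppose a file has length Size. Starting from offset 0, the file is divided sequentially into pieces of a fixed size, which are numbered in order; each such piece is a block.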

/**
 * Internal class for block metadata.
 */
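The fixed-size splitting described above comes down to a little integer arithmetic. The helper below is a hypothetical sketch (`BlockSplitSketch` is not a Hadoop class) that assumes every block except possibly the last is exactly `blockSize` bytes long.

```java
// Hypothetical helper showing the arithmetic of fixed-size block splitting.
class BlockSplitSketch {
    // Number of blocks needed for a file: ceil(fileSize / blockSize).
    static long blockCount(long fileSize, long blockSize) {
        if (fileSize < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        return (fileSize + blockSize - 1) / blockSize;
    }

    // Index of the block that contains the given byte offset.
    static long blockIndexOf(long offset, long blockSize) {
        if (offset < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("invalid offset or block size");
        }
        return offset / blockSize;
    }

    // Length of block number `index`; only the last block may be shorter.
    static long blockLength(long fileSize, long blockSize, long index) {
        if (index < 0 || index >= blockCount(fileSize, blockSize)) {
            throw new IllegalArgumentException("no such block: " + index);
        }
        return Math.min(blockSize, fileSize - index * blockSize);
    }
}
```

For example, under these assumptions a 200 MB file with 64 MB blocks (an example value) is split into 4 blocks, and the last block holds the remaining 8 MB.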

DatanodeDescriptor.java: represents a concrete storage object, that is, a DataNode as tracked by the NameNode.

/**************************************************
 * DatanodeDescriptor tracks stats on a given DataNode,
 * such as available storage capacity, last update time, etc.,
 * and maintains a set of blocks stored on the datanode.
 *
 * This data structure is a data structure that is internal
 * to the namenode. It is *not* sent over-the-wire to the Client
 * or the Datanodes. Neither is it stored persistently in the
 * fsImage.
 **************************************************/

FSDirectory.java: represents all the directories and structural attributes in HDFS.

/*************************************************
 * FSDirectory stores the filesystem directory state.
 * It handles writing/loading values to disk, and logging
 * changes as we go.
 *
 * It keeps the filename->blockset mapping always-current
 * and logged to disk.
 *
 *************************************************/

EditLogOutputStream.java: all log records are written through EditLogOutputStream. When instantiated, this group of EditLogOutputStream objects consists of several EditLogFileOutputStream instances and one EditLogBackupOutputStream.

/**
 * A generic abstract class to support journaling of edits logs into
 * a persistent storage.
 */

EditLogFileOutputStream.java: writes log records to edits or edits.new.

/**
 * An implementation of the abstract class {@link EditLogOutputStream}, which
 * stores edits in a local file.
 */

EditLogBackupOutputStream.java: sends the log over the network to the backup node.

/**
 * An implementation of the abstract class {@link EditLogOutputStream},
 * which streams edits to a backup node.
 *
 * @see org.apache.hadoop.hdfs.server.protocol.NamenodeProtocol#journal
 * (org.apache.hadoop.hdfs.server.protocol.NamenodeRegistration,
 *  int, int, byte[])
 */
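The relationship between the three stream classes above (one abstract type, a local-file implementation, a backup-node implementation, and every record written to all of them) can be sketched as below. All names (`SketchEditStream`, `SketchFileStream`, `SketchBackupStream`, `SketchJournalSet`) are hypothetical; the "file" is an in-memory buffer and the "network" is a list, so nothing here reflects the real Hadoop signatures.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the abstract journaling stream.
abstract class SketchEditStream {
    abstract void write(byte[] record);
}

// Stand-in for a local edits file; bytes are appended to an in-memory buffer.
class SketchFileStream extends SketchEditStream {
    final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    @Override
    void write(byte[] record) {
        buffer.write(record, 0, record.length);
    }
}

// Stand-in for streaming to a backup node; each record is one "message".
class SketchBackupStream extends SketchEditStream {
    final List<byte[]> sent = new ArrayList<>();

    @Override
    void write(byte[] record) {
        sent.add(record.clone());
    }
}

// Fans every logged record out to all registered streams.
class SketchJournalSet {
    private final List<SketchEditStream> streams = new ArrayList<>();

    void add(SketchEditStream stream) {
        streams.add(stream);
    }

    void logEdit(byte[] record) {
        for (SketchEditStream s : streams) {
            s.write(record);
        }
    }
}
```

The point of the abstract type in this sketch is that the code logging an edit does not need to know whether a given destination is a local file or a remote backup node.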

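BackupNode.java: the backup of the NameNode. The stages of its evolution are: Secondary NameNode -> Checkpoint Node (periodically saves the metadata and periodically checkpoints) -> Backup Node (keeps in memory an image fully consistent with the NameNode; when the metadata changes, it updates its own copy, so it can checkpoint from its own image without downloading anything from the NameNode) -> Standby Node (capable of hot standby).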

/**
 * BackupNode.
 * <p>
 * Backup node can play two roles.
 * <ol>
 * <li>{@link NamenodeRole#CHECKPOINT} node periodically creates checkpoints,
 * that is downloads image and edits from the active node, merges them, and
 * uploads the new image back to the active. </li>
 * <li>{@link NamenodeRole#BACKUP} node keeps its namespace in sync with the
 * active node, and periodically creates checkpoints by simply saving the
 * namespace image to local disk(s).</li>
 * </ol>
 */

BackupStorage.java: creates jspool under the Backup Node's backup directory, creates edits.new there, and points the output stream at edits.new.

/**
 * Load checkpoint from local files only if the memory state is empty.<br>
 * Set new checkpoint time received from the name-node.<br>
 * Move <code>lastcheckpoint.tmp</code> to <code>previous.checkpoint</code>.
 * @throws IOException
 */

TransferFsImage.java: responsible for fetching files from the NameNode.

/**
 * This class provides fetching a specified file from the NameNode.
 */

GetImageServlet.java: a subclass of HttpServlet that handles doGet requests.

/**
 * This class is used in Namesystem's jetty to retrieve a file.
 * Typically used by the Secondary NameNode to retrieve image and
 * edit file for periodic checkpointing.
 */

 




