Java之美[从菜鸟到高手演练]之Hadoop常用命令-Hadoop-@大数据资讯

作者：二青

邮箱：xtfggef@gmail.com 微博：http://weibo.com/xtfggef

这篇文章主要是讲一下位于bin下的hadoop命令，我们可以直接输入hadoop无任何参数看一下：

用法就是：hadoop [---config confdir] COMMAND此处COMMAND就是下面列出来的那些，fs, version,jar 等等。

用户命令

fs

目前版本的hadoop已经摒弃了fs命令，取而代之的是hdfs dfs.

Usage: hdfs dfs [GENERIC_OPTIONS] [COMMAND_OPTIONS], 熟悉linux命令的同学学这个很快，举个例子，创建目录，mkdir，在这里为：hdfs dfs -mkdir /test

解释：此处-mkdir即为GENERIC_OPTIONS，后面的/test为COMMAND_OPTIONS，默认路径为/，所以执行完上面的操作后，我们在根路径/下面创建了test目录。下面是我总结了一下hadoop中fs相关的命令，亦可见文件EXCEL。

参数	作用	示例	返回值
appendToFile	将一个或者多个本地文件追加到目的文件	hdfs dfs -appendToFile localfile /user/hadoop/hadoopfile	Returns 0 on success and 1 on error
cat	输出文件	hdfs dfs -cat file:///file3 /user/hadoop/file4	Returns 0 on success and -1 on error
chgrp	改变文件的分组	hdfs dfs -chgrp [-R] GROUP URI [URI ...]
chmod	改变文件的权限	hdfs dfs -chmod [-R] <MODE[,MODE]... \| OCTALMODE> URI [URI ...]
chown	改变文件的拥有者	hdfs dfs -chown [-R] [OWNER][:[GROUP]] URI [URI ]
copyFromLocal	从本地复制
copyToLocal	复制到本地
count	得到文件/目录等数目追加参数-q, -h有不同的意义	hdfs dfs -count -q hdfs://nn1.example.com/file1	Returns 0 on success and -1 on error
cp	复制，参数-f,-p	hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2	Returns 0 on success and -1 on error
du	得到指定文件的大小	hdfs dfs -du /test/hadoop	Returns 0 on success and -1 on error.
dus	已摒弃，和du类似
expunge	清空回收站	hdfs dfs -expunge
get	复制文件到本地路径下	hdfs dfs -get /user/hadoop/file localfile	Returns 0 on success and -1 on error
getfacl	显示文件或者目录的权限控制列表	hdfs dfs -getfacl /file hdfs dfs -getfacl -R /dir	Returns 0 on success and non-zero on error
getfattr	显示文件或者目录的扩展属性	hdfs dfs -getfattr -d /file	Returns 0 on success and non-zero on error
getmerge	合并多个文件一个目标文件里	hdfs dfs -getmerge <src> <localdst> [addnl]
ls	和linux里一样	hdfs dfs -ls /user/hadoop/file1	Returns 0 on success and -1 on error
lsr	等同于ls -R
mkdir	创建目录，-p创建多层目录	hdfs dfs -mkdir /user/hadoop/dir1 /user/hadoop/dir2	Returns 0 on success and -1 on error
moveFromLocal	类似put，区别在于put完后删除原文件
moveToLocal	目前没有实现
mv	移动文件	hdfs dfs -mv /user/hadoop/file1 /user/hadoop/file2	Returns 0 on success and -1 on error
put	像目标目录推送文件	hdfs dfs -put localfile /user/hadoop/hadoopfile	Returns 0 on success and -1 on error
rm	删除文件	hdfs dfs -rm hdfs://nn.example.com/file /user/hadoop/emptydir	Returns 0 on success and -1 on error
rmr	类似于rm -r
setfacl	设置文件或者目录的权限控制列表	hdfs dfs -setfacl -m user:hadoop:rw- /file	Returns 0 on success and non-zero on error
setfattr	设置文件或者目录的扩展属性	hdfs dfs -setfattr -n user.myAttr -v myValue /file	Returns 0 on success and non-zero on error
setrep	改变文件和目录的复制因子	hdfs dfs -setrep -w 3 /user/hadoop/dir1	Returns 0 on success and -1 on error
stat	返回路径信息	hdfs dfs -stat path	Exit Code: Returns 0 on success and -1 on error
tail	输出文件的最后1千字节	hdfs dfs -tail pathname	Returns 0 on success and -1 on error
test	检查文件	hdfs dfs -test -e filename
text	以文本方式输出文件	hdfs dfs -text <src>
touchz	创建空文件	hdfs dfs -touchz pathname	Returns 0 on success and -1 on error

version

hadoop version查看hadoop版本信息

jar

可以运行jar文件

credential

这个命令用来管理证书，密钥和一些其他的私有信息。

distcp

distcp是一个文件和目录复制命令，用于集群之间已经集群内节点之间的文件复制，详情见官方文档。

fsck

fsck是一个文件系统检查工具，用来检查各类问题，比如文件块丢失等。

用法：hdfs fsck [GENERIC_OPTIONS] <path> [-list-corruptfileblocks | [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]] [-includeSnapshots]

COMMAND_OPTION	Description
path	Start checking from this path.
-move	Move corrupted files to /lost+found
-delete	Delete corrupted files.
-files	Print out files being checked.
-openforwrite	Print out files opened for write.
-includeSnapshots	Include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it.
-list-corruptfileblocks	Print out list of missing blocks and files they belong to.
-blocks	Print out block report.
-locations	Print out locations for every block.
-racks	Print out network topology for data-node locations.

这是一个独立的命令，不属于dfs. 看个例子：检查/下面的文件目录情况。

fetchdt

从namenode上拿到授权符，暂时没有用到，后面更新。

job

与map reduce的job进行交互。

COMMAND_OPTION	Description
-submit job-file	Submits the job.
-status job-id	Prints the map and reduce completion percentage and all job counters.
-counter job-id group-name counter-name	Prints the counter value.
-kill job-id	Kills the job.
-events job-id from-event-# #-of-events	Prints the events' details received by jobtracker for the given range.
-history [all]jobOutputDir	Prints job details, failed and killed tip details. More details about the job such as successful tasks and task attempts made for each task can be viewed by specifying the [all] option.
-list [all]	Displays jobs which are yet to complete. `-list all` displays all jobs.
-kill-task task-id	Kills the task. Killed tasks are NOT counted against failed attempts.
-fail-task task-id	Fails the task. Failed tasks are counted against failed attempts.
-set-priority job-id priority	Changes the priority of the job. Allowed priority values are VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW

我们可以通过：mapred job -status <job_id> 来查看job的运行状态。

pipes

运行管道job，稍后更新。

queue

获得与job交互的队列的信息，用法：./mapred queue -info default

CLASSNAME

hadoop可以运行任何java class。

classpath

打印类路径。

管理命令

balancer

查看系统平衡情况，hdfs balancer [-threshold <threshold>] [-policy <policy>], 最直接的用法：./hdfs balancer

[html] view plaincopy

root@master:~/hadoop/bin# ./hdfs balancer
15/01/15 16:20:03 INFO balancer.Balancer: namenodes = [hdfs://master:9000]
15/01/15 16:20:03 INFO balancer.Balancer: parameters = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0, number of nodes to be excluded = 0, number of nodes to be included = 0]
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
15/01/15 16:20:06 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.116:50010
15/01/15 16:20:06 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.189:50010
15/01/15 16:20:06 INFO balancer.Balancer: 0 over-utilized: []
15/01/15 16:20:06 INFO balancer.Balancer: 0 underutilized: []
The cluster is balanced. Exiting...
Jan 15, 2015 4:20:06 PM 0 0 B 0 B -1 B
Jan 15, 2015 4:20:06 PM Balancing took 3.215 seconds
root@master:~/hadoop/bin#

datanode

运行一个data node，Usage: hdfs datanode [-regular | -rollback | -rollingupgrace rollback]

COMMAND_OPTION	Description
-regular	Normal datanode startup (default).
-rollback	Rollback the datanode to the previous version. This should be used after stopping the datanode and distributing the old hadoop version.
-rollingupgrade rollback	Rollback a rolling upgrade operation.

daemonlog

dfsadmin

dfs管理工具，功能比较强大，参考官方文档。

Mover

一个新的数据移植工具，和balancer类似，定期扫描HDFS系统里的archive文件，来检查是否满足HDFS存储策略，如果不满足则将其副本移到另一个地方。

namenode

运行namenode，也是一个比较核心、强大的工具，参考官方文档。

secondarynamenode

运行HDFS次要name node。

hsadmin

管理HDFS history server

historyserver

启动history server，用法：./mapred historyserver

原文链接：http://blog.csdn.net/zhangerqing/article/details/42718103