1. HDFS pseudo-distributed deployment

Hadoop:

Broad sense: the ecosystem built around the Apache Hadoop software (Hive, ZooKeeper, Spark, HBase)
Narrow sense: the Apache Hadoop software itself

Official sites for looking up each component:

hadoop.apache.org
hive.apache.org
spark.apache.org

Components of the Hadoop software:

HDFS: storage, a distributed file system
MapReduce: computation
YARN: resource (CPU, memory) and job scheduling

The version I use is hadoop-2.6.0-cdh5.7.0.
CDH Hadoop downloads: http://archive.cloudera.com/cdh5/cdh/5/

The tarball can be fetched with wget: http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.7.0.tar.gz

1. Create a user and upload the Hadoop software:

useradd hadoop
su - hadoop    # switch to the hadoop user
[hadoop@hadoop002 ~]$ mkdir app    # create an app directory to hold the Hadoop software
[hadoop@hadoop002 ~]$ cd app/
[hadoop@hadoop002 ~]$ wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.7.0.tar.gz
# or, if the tarball is already on your local machine, upload it instead:
[hadoop@hadoop002 ~]$ rz
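
The download step can be made repeatable with a small wrapper. This is only a sketch: `cdh_tarball_url` and `fetch_hadoop` are hypothetical helper names, and the URL pattern is simply the one used above.

```shell
# Build the CDH5 archive URL for a given Hadoop version string.
cdh_tarball_url() {
    printf 'http://archive.cloudera.com/cdh5/cdh/5/hadoop-%s.tar.gz\n' "$1"
}

# Download the tarball into a directory unless it is already there.
# fetch_hadoop is a hypothetical helper, not part of Hadoop.
fetch_hadoop() {
    version=$1
    dest_dir=$2
    tarball="$dest_dir/hadoop-$version.tar.gz"
    if [ -f "$tarball" ]; then
        echo "already present: $tarball"
    else
        mkdir -p "$dest_dir" && wget -P "$dest_dir" "$(cdh_tarball_url "$version")"
    fi
}
```

For the version used in this post the call would be `fetch_hadoop 2.6.0-cdh5.7.0 ~/app`.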

2. Deploy the JDK:

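Because of CDH requirements, the JDK must be deployed under /usr/java; otherwise you will run into pitfalls later.
Later on, the MySQL JDBC jar that CDH needs will also have to be placed under /usr/share/java.
Upload jdk-8u45-linux-x64.gz with rz.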

Extract:
[root@hadoop002 java]# tar -xzvf jdk-8u45-linux-x64.gz
[root@hadoop002 java]# ll
total 319156
drwxr-xr-x 8 uucp 143 4096 Apr 11 2015 jdk1.8.0_45
-rw-r--r-- 1 root root 173271626 Sep 19 11:49 jdk-8u45-linux-x64.gz
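Note: after extraction the owner and group of the JDK directory are wrong (uucp and 143 above) and need to be corrected.
While you are at it, look at cat /etc/passwd: accounts use one of two login shells, /bin/bash or /sbin/nologin. An account whose shell is /sbin/nologin cannot log in, so
su - zookeeper fails with: This account is currently not available.
What to do in production: change /sbin/nologin to /bin/bash for that account.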
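
The login-shell check from the note can be sketched as a one-line filter over a passwd-format file. `nologin_users` is a hypothetical helper name; the field positions follow the standard /etc/passwd layout, where the shell is the seventh colon-separated field.

```shell
# Print the names of accounts whose login shell ends in "nologin".
# The argument is a passwd-format file; it defaults to /etc/passwd.
nologin_users() {
    awk -F: '$7 ~ /nologin$/ { print $1 }' "${1:-/etc/passwd}"
}
```

If an account such as zookeeper shows up in the output and you need to `su` to it, running `usermod -s /bin/bash zookeeper` as root should switch its shell back.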

Fix the ownership:
[root@hadoop002 java]# chown -R root:root jdk1.8.0_45
[root@hadoop002 java]# ll
total 319156
drwxr-xr-x 8 root root 4096 Apr 11 2015 jdk1.8.0_45
-rw-r--r-- 1 root root 173271626 Sep 19 11:49 jdk-8u45-linux-x64.gz
Configure environment variables:
[root@hadoop002 java]# vi /etc/profile
#env
export JAVA_HOME=/usr/java/jdk1.8.0_45
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
After saving, run source /etc/profile to apply the changes.
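
A quick sanity check after sourcing the profile is to confirm that JAVA_HOME really points at a JDK. The sketch below assumes only that a JDK home contains an executable bin/java; `check_java_home` is a hypothetical helper name.

```shell
# Succeed only if the given directory looks like a JDK home,
# i.e. it contains an executable bin/java.
check_java_home() {
    if [ -x "$1/bin/java" ]; then
        echo "ok: $1"
    else
        echo "not a JDK home: $1" >&2
        return 1
    fi
}
```

Typical use would be `check_java_home "$JAVA_HOME" && java -version`.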

3. Extract Hadoop:

[hadoop@hadoop002 app]$ tar -xzvf hadoop-2.6.0-cdh5.7.0.tar.gz
[hadoop@hadoop002 app]$ cd hadoop-2.6.0-cdh5.7.0
[hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ ll
total 76
drwxr-xr-x 2 hadoop hadoop 4096 Mar 24 2016 bin    # executable scripts
drwxr-xr-x 2 hadoop hadoop 4096 Mar 24 2016 bin-mapreduce1
drwxr-xr-x 3 hadoop hadoop 4096 Mar 24 2016 cloudera
drwxr-xr-x 6 hadoop hadoop 4096 Mar 24 2016 etc    # configuration directory (conf)
drwxr-xr-x 5 hadoop hadoop 4096 Mar 24 2016 examples
drwxr-xr-x 3 hadoop hadoop 4096 Mar 24 2016 examples-mapreduce1
drwxr-xr-x 2 hadoop hadoop 4096 Mar 24 2016 include
drwxr-xr-x 3 hadoop hadoop 4096 Mar 24 2016 lib    # jar directory
drwxr-xr-x 2 hadoop hadoop 4096 Mar 24 2016 libexec
drwxr-xr-x 3 hadoop hadoop 4096 Mar 24 2016 sbin    # start/stop scripts for the Hadoop components
drwxr-xr-x 4 hadoop hadoop 4096 Mar 24 2016 share
drwxr-xr-x 17 hadoop hadoop 4096 Mar 24 2016 src
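
To confirm that the tarball unpacked completely, one option is to check for the directories annotated in the listing above. This is a rough sketch; `missing_hadoop_dirs` is a hypothetical helper and the list of four directories is my own choice, not an official requirement.

```shell
# Print every expected top-level directory that is missing from a
# Hadoop install root (bin, etc, lib, sbin).
missing_hadoop_dirs() {
    for d in bin etc lib sbin; do
        [ -d "$1/$d" ] || echo "$d"
    done
}
```

An empty result from `missing_hadoop_dirs ~/app/hadoop-2.6.0-cdh5.7.0` means all four are present.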

4. Edit the configuration files (Configuration):

Use the following:

etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>


etc/hadoop/hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
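
If you would rather script this step than edit the files by hand, the two files can be written with here-documents and read back for verification. This is a sketch under assumptions: `write_pseudo_conf` and `get_prop` are hypothetical helpers, the property names and values are the ones shown above, and `get_prop` only handles files where `<name>` and `<value>` sit on consecutive lines.

```shell
# Write the two single-node config files into a Hadoop conf directory.
write_pseudo_conf() {
    conf_dir=$1
    cat > "$conf_dir/core-site.xml" <<'EOF'
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF
    cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
EOF
}

# Read one property value back out of a *-site.xml file.
get_prop() {
    awk -v key="$2" '
        index($0, "<name>" key "</name>") { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, ""); sub(/<\/value>.*/, ""); print; exit
        }' "$1"
}
```

For example, `write_pseudo_conf etc/hadoop` followed by `get_prop etc/hadoop/core-site.xml fs.defaultFS`. On a working install, `hdfs getconf -confKey fs.defaultFS` should report the value Hadoop actually resolved.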

5. Configure passwordless SSH trust for localhost:


> Note: the method I use here differs from the official one.

[hadoop@hadoop002 ~]$ ssh-keygen    # generate the key pair; press Enter three times to accept the defaults
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
ba:48:3d:ff:af:4d:da:74:67:31:d6:98:ad:a0:b3:76 hadoop@hadoop002
The key's randomart image is:
+--[ RSA 2048]----+
| |
| |
| |
| +.|
| S . o+o|
| . . . ...o|
| . + o o o o|
| . . + .OE. o |
| . . .o=++ |
+-----------------+
[hadoop@hadoop002 ~]$ cd .ssh
[hadoop@hadoop002 .ssh]$ ll
total 8
-rw------- 1 hadoop hadoop 1675 Feb 13 22:36 id_rsa    # private key
-rw-r--r-- 1 hadoop hadoop 398 Feb 13 22:36 id_rsa.pub    # public key
[hadoop@hadoop002 .ssh]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys    # add the public key to the authorized keys file to grant access

[hadoop@hadoop002 .ssh]$ ll
total 12
-rw-rw-r-- 1 hadoop hadoop 398 Feb 13 22:37 authorized_keys
-rw------- 1 hadoop hadoop 1675 Feb 13 22:36 id_rsa
-rw-r--r-- 1 hadoop hadoop 398 Feb 13 22:36 id_rsa.pub
-rw-r--r-- 1 hadoop hadoop 0 Feb 13 22:39 known_hosts

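At this point ssh localhost date still prompts for a password, even though this user has no password set.
So how do we get passwordless trust working without setting one?

authorized_keys was created group-writable (-rw-rw-r-- above), and sshd with its default StrictModes setting ignores a key file that is writable by group or others. Change its permissions to 600:
[hadoop@hadoop002 .ssh]$ chmod 600 authorized_keys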
[hadoop@hadoop002 .ssh]$ ssh localhost date
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is b1:94:33:ec:95:89:bf:06:3b:ef:30:2f:d7:8e:d2:4c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Wed Feb 13 22:41:17 CST 2019
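
The permission fix can be wrapped so it is easy to reapply on another machine. A minimal sketch, assuming the usual 700 on the directory and 600 on the key file; `fix_ssh_perms` is a hypothetical helper name.

```shell
# Tighten permissions on an SSH directory so sshd will accept the
# authorized_keys file in it. Defaults to ~/.ssh when no path is given.
fix_ssh_perms() {
    ssh_dir=${1:-$HOME/.ssh}
    chmod 700 "$ssh_dir" &&
    chmod 600 "$ssh_dir/authorized_keys"
}
```

Afterwards, `ssh -o BatchMode=yes localhost date` is a convenient check: with BatchMode the client fails instead of prompting, so a password prompt cannot hide a broken key setup.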

6. Format the NameNode:

[hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ bin/hdfs namenode -format
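
Formatting again on a node that already holds data replaces the NameNode metadata, so it can help to guard the command. The sketch below is an assumption-laden illustration: `namenode_formatted` and `format_once` are hypothetical helpers, and the directory you pass in must be wherever dfs.namenode.name.dir points (with default settings that should be /tmp/hadoop-<user>/dfs/name).

```shell
# Succeed if a NameNode storage directory has already been formatted,
# which shows up as a current/VERSION file inside it.
namenode_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Format only when no existing metadata is found.
# Must be run from the Hadoop install root so that bin/hdfs resolves.
format_once() {
    name_dir=$1
    if namenode_formatted "$name_dir"; then
        echo "skip: $name_dir is already formatted"
    else
        bin/hdfs namenode -format
    fi
}
```

With the defaults used in this post the call would be `format_once /tmp/hadoop-hadoop/dfs/name`.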

7. Start HDFS:

[hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ sbin/start-dfs.sh    # start HDFS
19/02/13 22:57:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-namenode-hadoop002.out
localhost: starting datanode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-datanode-hadoop002.out

> The SSH trust relationship was configured for localhost.

Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-secondarynamenode-hadoop002.out
19/02/13 22:57:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$
[hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ jps
15059 Jps
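14948 SecondaryNameNode    # secondary name node (second in command)
14783 DataNode    # data node (the worker)
14655 NameNode    # name node (the boss; handles reads and writes)
(All three processes must be running.)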
[hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$
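
Checking the three processes by eye is easy to get wrong, since "NameNode" is also a substring of "SecondaryNameNode". A small filter over the jps output avoids that by comparing the second column exactly. `check_hdfs_daemons` is a hypothetical helper name.

```shell
# Read `jps` output on stdin and report any of the three HDFS daemons
# that is not listed. Exit status is non-zero if something is missing.
check_hdfs_daemons() {
    awk '
        BEGIN { bad = 0 }
        { seen[$2] = 1 }
        END {
            n = split("NameNode DataNode SecondaryNameNode", want, " ")
            for (i = 1; i <= n; i++)
                if (!(want[i] in seen)) { print "missing: " want[i]; bad = 1 }
            exit bad
        }'
}
```

Use it as `jps | check_hdfs_daemons`; no output and a zero exit status mean all three are up.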

Open http://ip:50070 in a browser (replace ip with the host's address) to see the NameNode web UI.

8. Configure environment variables:

[hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ vi ~/.bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi

# User specific environment and startup programs
export HADOOP_PREFIX=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0
export PATH=$HADOOP_PREFIX/bin:$PATH
[hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$
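
One side effect of the plain `export PATH=...:$PATH` line is that PATH grows every time the profile is sourced again. A common workaround, sketched here with a hypothetical `path_prepend` helper, is to add the directory only when it is not already present.

```shell
# Prepend a directory to PATH unless it is already on it, so that
# sourcing the profile repeatedly does not keep growing PATH.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;
        *) PATH="$1:$PATH" ;;
    esac
}
```

In ~/.bash_profile this would read `path_prepend "$HADOOP_PREFIX/bin"` followed by `export PATH`. After `source ~/.bash_profile`, `command -v hdfs` should print the path of the hdfs script.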

9. Commands:

hdfs dfs -mkdir /ruozedate    # create a ruozedate directory under the root
hdfs dfs -ls /    # list the files and directories under the root
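
To confirm that reads and writes work end to end, a file can be pushed into HDFS and read back. This is a sketch rather than a prescribed procedure: `hdfs_smoke_test` is a hypothetical helper, and `HDFS_CMD` is a hypothetical override that lets you dry-run the sequence with `echo` before touching the cluster.

```shell
# Round-trip a small file through HDFS: create a directory, upload the
# file, list the directory, then print the file back.
# HDFS_CMD defaults to the real hdfs command.
hdfs_smoke_test() {
    dir=$1
    local_file=$2
    cmd=${HDFS_CMD:-hdfs}
    $cmd dfs -mkdir -p "$dir" &&
    $cmd dfs -put "$local_file" "$dir/" &&
    $cmd dfs -ls "$dir" &&
    $cmd dfs -cat "$dir/$(basename "$local_file")"
}
```

`HDFS_CMD=echo hdfs_smoke_test /ruozedate /etc/hosts` prints the four commands without running them; dropping the override runs them for real. Note that `-put` fails if the target file already exists, so a second real run needs a different file or directory.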

10. How to view command help:

hdfs dfs    # run with no arguments to print the usage of every subcommand

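Title: 1. HDFS pseudo-distributed deployment

Author: skygzx

Published: 2019-04-07 09:02

Last updated: 2019-04-07 12:26

Original link: http://yoursite.com/2019/04/07/1.hadoop伪分布式部署-hdfs/

License: Attribution-NonCommercial-NoDerivatives 4.0 International. Please keep the original link and author when reposting.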
