Compiling Spark 2.4.2 against hadoop-2.6.0-cdh5.7.0

Spark 2.4.2 download:

Official archive: https://archive.apache.org/dist/spark/spark-2.4.2/spark-2.4.2.tgz

Documentation for building Spark from source (official):

http://spark.apache.org/docs/latest/building-spark.html

Prerequisites for building the Spark source:

Software   Version
Hadoop     2.6.0-cdh5.7.0
Scala      2.11.12
Maven      3.6.1
JDK        1.8.0_45
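Before starting, it can save time to verify the toolchain meets these minimums. A minimal sketch using `sort -V` from GNU coreutils to compare version strings (the literals below simply mirror the table above; in practice you would parse them out of `java -version`, `mvn -version`, and so on):

```shell
# True if version $1 >= minimum $2, using GNU sort's version ordering.
version_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Compare the versions from the table against typical minimums.
version_ge "1.8.0_45" "1.8"   && echo "JDK ok"
version_ge "3.6.1"    "3.3.9" && echo "Maven ok"
version_ge "2.11.12"  "2.11"  && echo "Scala ok"
```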

Build and configuration:

1. Extract the Spark source:

[hadoop@hadoop001 software]$ ll spark-2.4.2.tgz 

-rw-r--r--. 1 hadoop hadoop 16165557 Apr 28 04:41 spark-2.4.2.tgz

[hadoop@hadoop001 software]$ tar -zxvf spark-2.4.2.tgz

[hadoop@hadoop001 software]$ cd spark-2.4.2

2. Hard-code the version variables in dev/make-distribution.sh, so the build does not have to look them up with Maven each time.

The make-distribution.sh script on GitHub:

https://github.com/apache/spark/blob/master/dev/make-distribution.sh

[hadoop@hadoop001 spark-2.4.2]$ vim dev/make-distribution.sh
# Original code:
VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null\
| grep -v "INFO"\
| grep -v "WARNING"\
| tail -n 1)
SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
| grep -v "INFO"\
| grep -v "WARNING"\
| tail -n 1)
SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null\
| grep -v "INFO"\
| grep -v "WARNING"\
| tail -n 1)
SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles -pl sql/hive $@ 2>/dev/null\
| grep -v "INFO"\
| grep -v "WARNING"\
| fgrep --count "<id>hive</id>";\
# Reset exit status to 0, otherwise the script stops here if the last grep finds nothing\
# because we use "set -o pipefail"
echo -n)

# Replace with:
VERSION=2.4.2
SCALA_VERSION=2.11
SPARK_HADOOP_VERSION=2.6.0-cdh5.7.0
SPARK_HIVE=1
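The same hard-coding can also be done non-interactively with `sed`. This is a sketch demonstrated on a stand-in file; in a real source tree you would point SCRIPT at dev/make-distribution.sh instead, and note that the real assignments span multiple lines (with `\` continuations), so hand-editing as shown above is simpler there:

```shell
# Stand-in for dev/make-distribution.sh, so the sed edit can be shown safely.
SCRIPT=$(mktemp)
cat > "$SCRIPT" <<'EOF'
VERSION=$("$MVN" help:evaluate -Dexpression=project.version)
SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version)
SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version)
EOF

# Replace the mvn lookups with fixed values (GNU sed in-place edit).
sed -i \
  -e 's|^VERSION=.*|VERSION=2.4.2|' \
  -e 's|^SCALA_VERSION=.*|SCALA_VERSION=2.11|' \
  -e 's|^SPARK_HADOOP_VERSION=.*|SPARK_HADOOP_VERSION=2.6.0-cdh5.7.0|' \
  "$SCRIPT"
cat "$SCRIPT"
```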

3. Modify pom.xml

To build against CDH, you must add the Cloudera repository:

[hadoop@hadoop614 spark-2.4.2]$ vim pom.xml 

<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>

4. Build command

Looking through pom.xml, you can see that if the Hadoop and YARN versions are not specified explicitly at build time, the build falls back to the default versions declared in the POM.
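Those defaults live in the <properties> section of pom.xml. A sketch of reading one out with grep/sed, demonstrated on a stand-in fragment (the 2.6.5 value here is illustrative, not a claim about this exact POM):

```shell
# Stand-in fragment; in the source tree you would grep pom.xml directly.
POM=$(mktemp)
cat > "$POM" <<'EOF'
<properties>
  <hadoop.version>2.6.5</hadoop.version>
  <yarn.version>${hadoop.version}</yarn.version>
</properties>
EOF

# Extract the default hadoop.version; -Dhadoop.version=... overrides it.
grep -m1 '<hadoop.version>' "$POM" \
  | sed 's|.*<hadoop.version>\(.*\)</hadoop.version>.*|\1|'
```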

[hadoop@hadoop001 spark-2.4.2]$ pwd
/home/hadoop/software/spark-2.4.2
[hadoop@hadoop614 spark-2.4.2]$ ./dev/make-distribution.sh --name 2.6.0-cdh5.7.0 --tgz -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.7.0 -Phive -Phive-thriftserver -Pyarn -Pkubernetes


--name: appends 2.6.0-cdh5.7.0 to the generated package name, so it is obvious which Hadoop version the build supports
-Phadoop-2.6: activates the hadoop-2.6 build profile
-Dhadoop.version=2.6.0-cdh5.7.0: pins the exact Hadoop version; if omitted, the profile's default version is used
-Phive: builds Hive support
-Phive-thriftserver: builds the Thrift JDBC/ODBC server
-Pyarn: builds YARN support, using the same version as Hadoop; to change it, use -Dhadoop.version
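The official build guide also recommends giving Maven extra heap before a full build; recent versions of make-distribution.sh may set a default themselves, but exporting it explicitly beforehand does no harm:

```shell
# Maven memory settings recommended by the Spark build documentation.
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
echo "$MAVEN_OPTS"
```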

Extract and deploy

1. Extract

[hadoop@hadoop001 spark-2.4.2]$ ll spark-2.4.2-bin-2.6.0-cdh5.7.0.tgz 
-rw-rw-r--. 1 hadoop hadoop 231193116 Apr 28 06:32 spark-2.4.2-bin-2.6.0-cdh5.7.0.tgz
[hadoop@hadoop001 spark-2.4.2]$ pwd
/home/hadoop/software/spark-2.4.2
[hadoop@hadoop001 spark-2.4.2]$ tar -zxvf spark-2.4.2-bin-2.6.0-cdh5.7.0.tgz -C ~/app
[hadoop@hadoop001 spark-2.4.2]$ cd ~/app
[hadoop@hadoop001 app]$ ls -ld spark-2.4.2-bin-2.6.0-cdh5.7.0/
drwxrwxr-x. 11 hadoop hadoop 4096 Apr 28 06:31 spark-2.4.2-bin-2.6.0-cdh5.7.0/

2. Configure the environment variables

[hadoop@hadoop001 app]$ vim ~/.bash_profile 

export SPARK_HOME=/home/hadoop/app/spark-2.4.2-bin-2.6.0-cdh5.7.0
export PATH=${SPARK_HOME}/bin:$PATH

[hadoop@hadoop001 app]$ source ~/.bash_profile
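A quick sanity check that the new PATH entry took effect (paths taken from the exports above):

```shell
# Mirror the exports from ~/.bash_profile, then confirm the Spark bin
# directory is the first PATH entry.
export SPARK_HOME=/home/hadoop/app/spark-2.4.2-bin-2.6.0-cdh5.7.0
export PATH=${SPARK_HOME}/bin:$PATH
echo "$PATH" | tr ':' '\n' | head -n 1
```

After this, `spark-shell` should resolve to $SPARK_HOME/bin/spark-shell from any directory.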

Start Spark

[hadoop@hadoop001 spark-2.4.2]$ ./spark-shell 
19/04/28 06:44:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://hadoop614:4040
Spark context available as 'sc' (master = local[*], app id = local-1556405067469).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.2
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45)
Type in expressions to have them evaluated.
Type :help for more information.

scala>

master: the mode Spark runs in.

local[*] means run locally, with as many worker threads as there are cores.


Author: skygzx

Published: 2019-04-28 16:14

Last updated: 2019-04-30 15:39

Permalink: http://yoursite.com/2019/04/28/Spark编译hadoop-2.6.0-cdh2.7.0/

License: CC BY-NC-ND 4.0 International. Please keep the original link and attribution when reposting.
