
              Advancing with Spark: Standalone Mode Setup

                                   Author: Yin Zhengjie

Copyright notice: This is an original work. Reproduction is not permitted; violations will be pursued under the law.

 

 

 

I. Preparing the Spark cluster environment

1>. Master node information (s101)

2>. Worker node information (s102)

 

3>. Worker node information (s103)

 

4>. Worker node information (s104)

 

 

II. Setting up Spark in Standalone mode

1>. Download the Spark installation package

  Spark download address: https://archive.apache.org/dist/spark/

Install the wget package ([yinzhengjie@s101 download]$ sudo yum -y install wget):

[yinzhengjie@s101 download]$ sudo yum -y install wget
[sudo] password for yinzhengjie: 
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.aliyun.com
 * extras: mirrors.aliyun.com
 * updates: mirrors.aliyun.com
Resolving Dependencies
--> Running transaction check
---> Package wget.x86_64 0:1.14-15.el7_4.1 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package         Arch            Version                  Repository      Size
================================================================================
Installing:
 wget            x86_64          1.14-15.el7_4.1          base           547 k

Transaction Summary
================================================================================
Install  1 Package

Total download size: 547 k
Installed size: 2.0 M
Downloading packages:
wget-1.14-15.el7_4.1.x86_64.rpm                          | 547 kB  00:00:00     
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : wget-1.14-15.el7_4.1.x86_64                                  1/1 
  Verifying  : wget-1.14-15.el7_4.1.x86_64                                  1/1 

Installed:
  wget.x86_64 0:1.14-15.el7_4.1

Complete!
[yinzhengjie@s101 download]$ 
[yinzhengjie@s101 download]$ wget https://archive.apache.org/dist/spark/spark-2.1.1/spark-2.1.1-bin-hadoop2.7.tgz    # download whichever version you need

2>. Extract the installation package

[yinzhengjie@s101 download]$ ll
total 622512
-rw-r--r--. 1 yinzhengjie yinzhengjie 214092195 Aug 26  2016 hadoop-2.7.3.tar.gz
-rw-r--r--. 1 yinzhengjie yinzhengjie 185540433 May 17  2017 jdk-8u131-linux-x64.tar.gz
-rw-r--r--. 1 yinzhengjie yinzhengjie 201142612 Jul 25  2017 spark-2.1.1-bin-hadoop2.7.tgz
-rw-r--r--. 1 yinzhengjie yinzhengjie  36667596 Jun 20 09:29 zookeeper-3.4.12.tar.gz
[yinzhengjie@s101 download]$ 
[yinzhengjie@s101 download]$ tar -xf spark-2.1.1-bin-hadoop2.7.tgz -C /soft/              # extract the Spark package into the target directory
[yinzhengjie@s101 download]$ ll /soft/
total 16
lrwxrwxrwx.  1 yinzhengjie yinzhengjie   19 Aug 13 10:31 hadoop -> /soft/hadoop-2.7.3/
drwxr-xr-x. 10 yinzhengjie yinzhengjie 4096 Aug 13 12:44 hadoop-2.7.3
lrwxrwxrwx.  1 yinzhengjie yinzhengjie   19 Aug 13 10:32 jdk -> /soft/jdk1.8.0_131/
drwxr-xr-x.  8 yinzhengjie yinzhengjie 4096 Mar 15  2017 jdk1.8.0_131
drwxr-xr-x. 12 yinzhengjie yinzhengjie 4096 Apr 25  2017 spark-2.1.1-bin-hadoop2.7
lrwxrwxrwx.  1 yinzhengjie yinzhengjie   23 Aug 13 12:13 zk -> /soft/zookeeper-3.4.12/
drwxr-xr-x. 10 yinzhengjie yinzhengjie 4096 Mar 27 00:36 zookeeper-3.4.12
[yinzhengjie@s101 download]$ ll /soft/spark-2.1.1-bin-hadoop2.7/                    # inspect the directory layout
total 88
drwxr-xr-x. 2 yinzhengjie yinzhengjie  4096 Apr 25  2017 bin
drwxr-xr-x. 2 yinzhengjie yinzhengjie  4096 Apr 25  2017 conf
drwxr-xr-x. 5 yinzhengjie yinzhengjie    47 Apr 25  2017 data
drwxr-xr-x. 4 yinzhengjie yinzhengjie    27 Apr 25  2017 examples
drwxr-xr-x. 2 yinzhengjie yinzhengjie  8192 Apr 25  2017 jars
-rw-r--r--. 1 yinzhengjie yinzhengjie 17811 Apr 25  2017 LICENSE
drwxr-xr-x. 2 yinzhengjie yinzhengjie  4096 Apr 25  2017 licenses
-rw-r--r--. 1 yinzhengjie yinzhengjie 24645 Apr 25  2017 NOTICE
drwxr-xr-x. 8 yinzhengjie yinzhengjie  4096 Apr 25  2017 python
drwxr-xr-x. 3 yinzhengjie yinzhengjie    16 Apr 25  2017 R
-rw-r--r--. 1 yinzhengjie yinzhengjie  3817 Apr 25  2017 README.md
-rw-r--r--. 1 yinzhengjie yinzhengjie   128 Apr 25  2017 RELEASE
drwxr-xr-x. 2 yinzhengjie yinzhengjie  4096 Apr 25  2017 sbin
drwxr-xr-x. 2 yinzhengjie yinzhengjie    41 Apr 25  2017 yarn
[yinzhengjie@s101 download]$

3>. Edit the slaves configuration file and list the worker hostnames (the default is localhost)

[yinzhengjie@s101 download]$ cd /soft/spark-2.1.1-bin-hadoop2.7/conf/
[yinzhengjie@s101 conf]$ ll
total 32
-rw-r--r--. 1 yinzhengjie yinzhengjie  987 Apr 25  2017 docker.properties.template
-rw-r--r--. 1 yinzhengjie yinzhengjie 1105 Apr 25  2017 fairscheduler.xml.template
-rw-r--r--. 1 yinzhengjie yinzhengjie 2025 Apr 25  2017 log4j.properties.template
-rw-r--r--. 1 yinzhengjie yinzhengjie 7313 Apr 25  2017 metrics.properties.template
-rw-r--r--. 1 yinzhengjie yinzhengjie  865 Apr 25  2017 slaves.template
-rw-r--r--. 1 yinzhengjie yinzhengjie 1292 Apr 25  2017 spark-defaults.conf.template
-rwxr-xr-x. 1 yinzhengjie yinzhengjie 3960 Apr 25  2017 spark-env.sh.template
[yinzhengjie@s101 conf]$ cp slaves.template slaves
[yinzhengjie@s101 conf]$ vi slaves
[yinzhengjie@s101 conf]$ cat slaves
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# A Spark Worker will be started on each of the machines listed below.
s102
s103
s104
[yinzhengjie@s101 conf]$

4>. Edit the spark-env.sh file to specify the master host and port

[yinzhengjie@s101 ~]$ cp /soft/spark/conf/spark-env.sh.template /soft/spark/conf/spark-env.sh
[yinzhengjie@s101 ~]$ 
[yinzhengjie@s101 ~]$ echo export JAVA_HOME=/soft/jdk >> /soft/spark/conf/spark-env.sh
[yinzhengjie@s101 ~]$ echo SPARK_MASTER_HOST=s101 >> /soft/spark/conf/spark-env.sh
[yinzhengjie@s101 ~]$ echo SPARK_MASTER_PORT=7077 >> /soft/spark/conf/spark-env.sh
[yinzhengjie@s101 ~]$ 
[yinzhengjie@s101 ~]$ grep -v ^# /soft/spark/conf/spark-env.sh | grep -v ^$
export JAVA_HOME=/soft/jdk
SPARK_MASTER_HOST=s101
SPARK_MASTER_PORT=7077
[yinzhengjie@s101 ~]$
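  Only JAVA_HOME, the master host, and the master port are set here. If per-worker resources need to be capped, spark-env.sh also accepts worker sizing variables; the values below are purely illustrative and are not used in this walkthrough:

SPARK_WORKER_CORES=2      # total cores a Worker offers to executors (illustrative value)
SPARK_WORKER_MEMORY=2g    # total memory a Worker offers to executors (illustrative value)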

5>. Distribute the Spark installation on s101 to the worker nodes

[yinzhengjie@s101 ~]$ more `which xrsync.sh`
#!/bin/bash
#@author :yinzhengjie
#blog:http://www.cnblogs.com/yinzhengjie
#EMAIL:y1053419035@qq.com

# Check whether the user passed an argument
if [ $# -lt 1 ];then
        echo "Please pass in an argument";
        exit
fi

# Get the file path
file=$@

# Get the base name
filename=`basename $file`

# Get the parent directory
dirpath=`dirname $file`

# Get the absolute path
cd $dirpath
fullpath=`pwd -P`

# Sync the file to the DataNodes
for (( i=102;i<=104;i++ ))
do
        # Turn the terminal green
        tput setaf 2
        echo =========== s$i %file ===========
        # Restore the terminal color (light gray)
        tput setaf 7
        # Run the remote sync
        rsync -lr $filename `whoami`@s$i:$fullpath
        # Check whether the command succeeded
        if [ $? == 0 ];then
                echo "Command executed successfully"
        fi
done
[yinzhengjie@s101 ~]$
Passwordless SSH login must be configured first; after that, the xrsync.sh script shown above ([yinzhengjie@s101 ~]$ more `which xrsync.sh`) is used to sync the installation.

  For configuring passwordless SSH login, see my earlier notes: https://www.cnblogs.com/yinzhengjie/p/9065191.html. Once passwordless login is in place, simply run the script above to sync the data.
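  For reference, a minimal sketch of the passwordless-login setup (assuming the same yinzhengjie account exists on every node; the linked notes cover it in full):

[yinzhengjie@s101 ~]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa                        # generate a key pair without a passphrase
[yinzhengjie@s101 ~]$ for i in 102 103 104; do ssh-copy-id yinzhengjie@s$i; done      # push the public key to each worker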

[yinzhengjie@s101 ~]$ xrsync.sh /soft/spark-2.1.1-bin-hadoop2.7/
=========== s102 %file ===========
Command executed successfully
=========== s103 %file ===========
Command executed successfully
=========== s104 %file ===========
Command executed successfully
[yinzhengjie@s101 ~]$

6>. Edit the profile to add the Spark scripts to the system environment variables

[yinzhengjie@s101 ~]$ ln -s /soft/spark-2.1.1-bin-hadoop2.7/ /soft/spark      # create a symlink so the directory name can be abbreviated
[yinzhengjie@s101 ~]$ 
[yinzhengjie@s101 ~]$ sudo vi /etc/profile                      # edit the system environment variable configuration file
[sudo] password for yinzhengjie: 
[yinzhengjie@s101 ~]$ 
[yinzhengjie@s101 ~]$ tail -3 /etc/profile
#ADD SPARK_PATH by yinzhengjie
export SPARK_HOME=/soft/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
[yinzhengjie@s101 ~]$ 
[yinzhengjie@s101 ~]$ source /etc/profile                      # reload the profile so the variables take effect in the current shell
[yinzhengjie@s101 ~]$
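  With SPARK_HOME now on the PATH, the installation can be sanity-checked from any directory; spark-submit prints its version banner:

[yinzhengjie@s101 ~]$ spark-submit --version      # should report Spark 2.1.1 if the PATH is set correctly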

7>. Start the Spark cluster

The xcall.sh helper script runs a given command on every node ([yinzhengjie@s101 ~]$ more `which xcall.sh`):

[yinzhengjie@s101 ~]$ more `which xcall.sh`
#!/bin/bash
#@author :yinzhengjie
#blog:http://www.cnblogs.com/yinzhengjie
#EMAIL:y1053419035@qq.com

# Check whether the user passed an argument
if [ $# -lt 1 ];then
        echo "Please pass in an argument"
        exit
fi

# Capture the command entered by the user
cmd=$@

for (( i=101;i<=104;i++ ))
do
        # Turn the terminal green
        tput setaf 2
        echo ============= s$i $cmd ============
        # Restore the terminal color (light gray)
        tput setaf 7
        # Run the command remotely
        ssh s$i $cmd
        # Check whether the command succeeded
        if [ $? == 0 ];then
                echo "Command executed successfully"
        fi
done
[yinzhengjie@s101 ~]$
[yinzhengjie@s101 ~]$ /soft/spark/sbin/start-all.sh       # start the Spark cluster
starting org.apache.spark.deploy.master.Master, logging to /soft/spark/logs/spark-yinzhengjie-org.apache.spark.deploy.master.Master-1-s101.out
s102: starting org.apache.spark.deploy.worker.Worker, logging to /soft/spark/logs/spark-yinzhengjie-org.apache.spark.deploy.worker.Worker-1-s102.out
s103: starting org.apache.spark.deploy.worker.Worker, logging to /soft/spark/logs/spark-yinzhengjie-org.apache.spark.deploy.worker.Worker-1-s103.out
s104: starting org.apache.spark.deploy.worker.Worker, logging to /soft/spark/logs/spark-yinzhengjie-org.apache.spark.deploy.worker.Worker-1-s104.out
[yinzhengjie@s101 ~]$ 
[yinzhengjie@s101 ~]$ xcall.sh jps              # check whether the Master and Worker processes are up
============= s101 jps ============
17587 Jps
17464 Master
Command executed successfully
============= s102 jps ============
12845 Jps
12767 Worker
Command executed successfully
============= s103 jps ============
12523 Jps
12445 Worker
Command executed successfully
============= s104 jps ============
12317 Jps
12239 Worker
Command executed successfully
[yinzhengjie@s101 ~]$
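  When the cluster needs to be shut down later, Spark ships a matching stop script in the same sbin directory (the full path is used because Hadoop provides scripts with the same names):

[yinzhengjie@s101 ~]$ /soft/spark/sbin/stop-all.sh      # stops the Master and all Workers listed in conf/slaves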

8>. Check the Spark web UI
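  The original screenshot of the web UI is not reproduced here. Assuming the default port, the standalone Master serves its web UI on port 8080, so it can be opened at http://s101:8080 in a browser or probed from the shell:

[yinzhengjie@s101 ~]$ curl -sI http://s101:8080 | head -1      # expect an HTTP 200 response if the Master web UI is up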

 

9>. Start spark-shell
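  The screenshot for this step is not reproduced here. As a quick sanity check, spark-shell can first be launched without any options; it then runs in local mode, which is easy to confirm from the sc handle (the next section connects it to the standalone master):

[yinzhengjie@s101 ~]$ spark-shell
scala> sc.master      // returns "local[*]" when no --master option was given
scala> :quit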

 

III. Running WordCount on the Spark cluster

1>. Connect to the master ([yinzhengjie@s101 ~]$ spark-shell --master spark://s101:7077)
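  The run itself appears in the original post only as screenshots. Below is a minimal WordCount sketch typed into the spark-shell session above; the input path is illustrative (with a file:// URI the file must exist at the same path on every worker, which holds here because the whole Spark directory was rsynced in step 5):

scala> val lines  = sc.textFile("file:///soft/spark-2.1.1-bin-hadoop2.7/README.md")
scala> val counts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
scala> counts.sortBy(_._2, ascending = false).take(10).foreach(println)      // print the 10 most frequent words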

2>. Log in to the web UI and view the running application

 

 

3>. View the application details

4>. View the job information

5>. View the stages

 

6>. View the detailed task information

7>. Exit spark-shell
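  The screenshot is omitted; the session is closed with the REPL's :quit command (Ctrl+D works as well):

scala> :quit
[yinzhengjie@s101 ~]$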

8>. Check Spark's completed applications: the logs are gone?

  So the question becomes: how do we view the logs of completed applications? For details, see: .
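  One common answer, not covered in the original post, is Spark's history server: turn on event logging in spark-defaults.conf and start the history server so finished applications stay browsable. A minimal sketch, assuming an illustrative local directory /tmp/spark-events that must exist before applications run:

[yinzhengjie@s101 ~]$ cp /soft/spark/conf/spark-defaults.conf.template /soft/spark/conf/spark-defaults.conf
[yinzhengjie@s101 ~]$ echo "spark.eventLog.enabled          true"                      >> /soft/spark/conf/spark-defaults.conf
[yinzhengjie@s101 ~]$ echo "spark.eventLog.dir              file:///tmp/spark-events"  >> /soft/spark/conf/spark-defaults.conf
[yinzhengjie@s101 ~]$ echo "spark.history.fs.logDirectory   file:///tmp/spark-events"  >> /soft/spark/conf/spark-defaults.conf
[yinzhengjie@s101 ~]$ mkdir -p /tmp/spark-events
[yinzhengjie@s101 ~]$ /soft/spark/sbin/start-history-server.sh      # web UI at http://s101:18080 by default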

 
