
Setting up HDFS with Hadoop (single-node)

Install dependencies

ssh

sudo apt-get install ssh

java

sudo apt install default-jdk
java --version

Download Hadoop

https://dlcdn.apache.org/hadoop/common/
After downloading, extract the archive and enter the directory, e.g.

cd hadoop-3.3.4

Configure Java

nano  etc/hadoop/hadoop-env.sh
# set to the root of your Java installation
export JAVA_HOME=/usr/lib/jvm/default-java

Test that the Hadoop environment and libraries are OK

bin/hadoop
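
For a fuller smoke test of the unpacked distribution, the bundled MapReduce examples can also be run in local (standalone) mode; a sketch, assuming the hadoop-3.3.4 example jar name:

mkdir input
cp etc/hadoop/*.xml input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar grep input output 'dfs[a-z.]+'
cat output/*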

Start HDFS

Configuration

vi etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
vi etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
      <name>dfs.webhdfs.enabled</name>
      <value>true</value>
    </property>
</configuration>

Passwordless SSH login

ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys2
chmod 0600 ~/.ssh/authorized_keys2
sudo service ssh restart

If you still get user@localhost: Permission denied (publickey). after the above steps, make sure your home directory is not group/world-writable (sshd rejects keys otherwise):

chmod 750 $HOME
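
Before starting the daemons, confirm that passwordless login actually works (it should connect without asking for a password):

ssh localhost
exit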

Start

bin/hdfs namenode -format
sbin/start-dfs.sh
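
The remote-access example further down uploads to /user/data/, so it helps to create that user's HDFS home directory up front; a sketch, assuming the HDFS user is named data:

bin/hdfs dfs -mkdir -p /user/data
bin/hdfs dfs -ls /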

Verify

http://localhost:9870/

curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?[user.name=<USER>&]op=..."
curl -i 'http://10.60.2.114:9870/webhdfs/v1/?op=LISTSTATUS'
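
Writes through WebHDFS are a two-step operation: the NameNode answers the CREATE request with a 307 redirect to a datanode, and the file body is then sent to that redirect location. A sketch, assuming a local file model.json and the same host and user as above:

curl -i -X PUT "http://10.60.2.114:9870/webhdfs/v1/user/data/model.json?user.name=data&op=CREATE&overwrite=true"
# copy the Location header from the 307 response, then send the file body to it
curl -i -X PUT -T model.json "<LOCATION-FROM-REDIRECT>"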

Remote access

On the machine that will access HDFS remotely, map the server's hostname in /etc/hosts, so that the hostnames returned in WebHDFS redirects resolve:

sudo nano /etc/hosts
10.60.2.114 server-precision-3630-tower

For example, to upload a file to: http://10.60.2.114:9870/user/data/model.json

from hdfs import InsecureClient

# connect to the NameNode's WebHDFS endpoint as HDFS user "data"
client = InsecureClient('http://10.60.2.114:9870', user='data')
files = client.list('/')
print(files)

# relative paths resolve against /user/data, so this writes /user/data/model.json
with client.write('model.json', encoding='utf-8') as writer:
    writer.write('111')
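
To confirm the write landed where expected, the file can be read back on the server with the HDFS shell:

bin/hdfs dfs -cat /user/data/model.json
# should print 111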

Run the ResourceManager/NodeManager (YARN)

Configuration

vi etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>
vi etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>

Start

sbin/start-yarn.sh

Verify

http://localhost:8088
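
Besides the web UI, a quick way to check that YARN accepts jobs is to submit one of the bundled examples; a sketch, again assuming the 3.3.4 example jar name:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar pi 2 10

The finished job should then show up as an application at http://localhost:8088.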

jps

jps
1685161 NodeManager
1684825 ResourceManager
1686011 SecondaryNameNode
1685795 DataNode
1685618 NameNode
1686165 Jps

Differences between the start/stop commands

start-all.sh & stop-all.sh

Used to start and stop all Hadoop daemons at once. Issuing them on the master machine starts/stops the daemons on every node of the cluster. Both scripts are deprecated.

start-dfs.sh, stop-dfs.sh and start-yarn.sh, stop-yarn.sh

Same as above, but they start/stop the HDFS and YARN daemons separately on all nodes from the master machine. These are now preferred over start-all.sh & stop-all.sh.

hadoop-daemon.sh namenode/datanode and yarn-daemon.sh resourcemanager

Used to start an individual daemon on a single machine manually; you need to log in to that particular node and issue the command there.

Use case

Suppose you have added a new DataNode to your cluster and need to start the DataNode daemon only on that machine:

sbin/hadoop-daemon.sh start datanode

Note: you need passwordless SSH set up if you want to start all the daemons on all the nodes from one machine.
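
Also note that in Hadoop 3.x these per-daemon scripts are deprecated in favour of the --daemon flag on the hdfs and yarn commands; the equivalent of the command above would be:

bin/hdfs --daemon start datanode
bin/yarn --daemon start nodemanager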

