Hadoop Enterprise Optimization

2022-01-17


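Step outside your old frame of mind, learn new ways of thinking with a positive attitude, and many difficulties will resolve themselves. (People's Daily)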

Preface

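MapReduce is Hadoop's computation engine and one of Hadoop's several modules. It is pluggable, which means it can be replaced. Because MapReduce computes slowly, later on we will use Hive, Spark, or Flink instead of writing MapReduce directly, depending on the requirements. For now, a basic understanding is enough.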

Why MapReduce Runs Slowly

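The efficiency bottlenecks of a MapReduce program lie in two areas:
1. Machine performance
       CPU, memory, disk health, network
2. I/O operations
      (1) Data skew
      (2) Unreasonable numbers of Map and Reduce tasks
      (3) Map tasks run too long, so Reduce tasks wait too long
      (4) Too many small files
      (5) Many very large files that cannot be split
      (6) Too many spills
      (7) Too many merges, etc.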

MapReduce Optimization Methods

MapReduce optimization is mainly approached from six aspects: data input, the Map stage, the Reduce stage, I/O transfer, data skew, and commonly tuned parameters.

Data Input


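(1) Merge small files: merge small files before running an MR job. A large number of small files produces a large number of Map tasks and therefore many task launches; since launching a task is relatively expensive, the MR job runs slowly.
(2) Use CombineTextInputFormat as the input format to handle scenarios with many small input files.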
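The packing idea behind CombineTextInputFormat can be illustrated with a small plain-Java sketch. It is a simplified model, not Hadoop's implementation: files are packed greedily in input order and a split is closed as soon as it reaches the maximum size, whereas the real CombineFileInputFormat also groups blocks by node and rack. The class and method names are hypothetical. In an actual driver the format is typically enabled with job.setInputFormatClass(CombineTextInputFormat.class) and the split size capped with CombineTextInputFormat.setMaxInputSplitSize(job, 4194304).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names; a simplified model of how CombineTextInputFormat packs small files.
class SplitPackingSketch {

    /** Greedily adds files to the current split and closes the split once it reaches maxSplitSize. */
    static List<List<Long>> pack(long[] fileSizes, long maxSplitSize) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            current.add(size);
            currentBytes += size;
            if (currentBytes >= maxSplitSize) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
        }
        if (!current.isEmpty()) {
            splits.add(current); // leftover files form the last split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] sizes = {1 * mb, 2 * mb, 1 * mb, 3 * mb, 1 * mb, 2 * mb};
        // With one split per small file this input would need 6 MapTasks.
        System.out.println("files=" + sizes.length + ", splits=" + pack(sizes, 4 * mb).size());
    }
}
```

Running main prints files=6, splits=3: six small files end up in three splits, so three MapTasks instead of six.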

Map Stage


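(1) Reduce the number of spills: adjust io.sort.mb and sort.spill.percent (mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent in Hadoop 2.x and later) to raise the memory threshold that triggers a spill. Fewer spills mean less disk I/O.
(2) Reduce the number of merges: adjust io.sort.factor (mapreduce.task.io.sort.factor) to increase the number of files merged at once. Fewer merge passes shorten MR processing time.
(3) After the Map, run a Combine step first, as long as it does not affect the business logic, to reduce I/O.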
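A rough way to see why a larger buffer and a larger merge factor help is to estimate the counts directly. The sketch assumes that every spill happens exactly when the buffer reaches sortMb × spillPercent and that each merge pass combines up to factor files. Real MapTasks also spend buffer space on record metadata and may run a combiner or compression, so treat the numbers as estimates. The names are hypothetical; the real settings are mapreduce.task.io.sort.mb (default 100), mapreduce.map.sort.spill.percent (default 0.80) and mapreduce.task.io.sort.factor (default 10).

```java
// Hypothetical names; a back-of-the-envelope model, not MapTask's actual algorithm.
class SpillMergeEstimate {

    /** Spill files if the ring buffer is flushed each time it is sortMb * spillPercent full. */
    static long estimateSpills(long mapOutputBytes, int sortMb, double spillPercent) {
        long threshold = (long) (sortMb * 1024L * 1024L * spillPercent);
        return (mapOutputBytes + threshold - 1) / threshold; // ceiling division
    }

    /** Merge passes needed if at most "factor" files can be merged in one pass. */
    static int estimateMergeRounds(long spills, int factor) {
        int rounds = 0;
        long segments = spills;
        while (segments > 1) {
            segments = (segments + factor - 1) / factor;
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        long output = 1000L * 1024L * 1024L; // assumed 1000 MB of map output
        long defaults = estimateSpills(output, 100, 0.80);
        long tuned = estimateSpills(output, 200, 0.90);
        System.out.println("defaults: spills=" + defaults + ", merge rounds=" + estimateMergeRounds(defaults, 10));
        System.out.println("tuned:    spills=" + tuned + ", merge rounds=" + estimateMergeRounds(tuned, 100));
    }
}
```

With the assumed 1000 MB of map output, the defaults give 13 spills and 2 merge rounds. A 200 MB buffer at 0.90 with a factor of 100 gives 6 spills and a single merge.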

Reduce Stage

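(1) Set the numbers of Map and Reduce tasks sensibly: neither should be too small or too large. Too few tasks make tasks wait and lengthen processing time. Too many make Map and Reduce tasks compete for resources, which causes timeouts and other errors.
(2) Let Map and Reduce run concurrently: adjust slowstart.completedmaps (mapreduce.job.reduce.slowstart.completedmaps) so that Reduce starts once the Maps have progressed far enough. This reduces the time Reduce spends waiting.
(3) Avoid Reduce where possible: using Reduce to join datasets generates a large amount of network traffic.
(4) Set the Reduce-side buffer sensibly: by default, once the data reaches a threshold, the buffered data is written to disk, and Reduce then reads all of its data from disk. In other words, the buffer and Reduce are not directly connected, and there are several write-to-disk/read-from-disk round trips in between. To avoid this, set mapreduce.reduce.input.buffer.percent (default 0.0) so that part of the buffered data is delivered to Reduce directly, reducing I/O overhead. When the value is greater than 0, the specified fraction of memory is reserved for buffered data that Reduce reads directly. Because the buffer, data reading, and the Reduce computation itself all need memory, tune this value according to how the job actually runs.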
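The reduce-side settings above reduce to simple arithmetic. The sketch assumes a simplified slowstart rule: reducers may be scheduled once the completed fraction of maps reaches mapreduce.job.reduce.slowstart.completedmaps (default 0.05). It then computes the shuffle buffer, the merge trigger and the memory retained for the reduce phase as plain fractions of the reducer heap, using the documented defaults 0.70, 0.66 and 0.0. The real scheduler and merge manager apply additional rules, and the names here are hypothetical.

```java
// Hypothetical names; a simplified model of the reduce-side settings described above.
class ReduceSideSketch {

    /** Simplified slowstart rule: reducers may be scheduled once this fraction of maps is done. */
    static boolean reducersMayStart(int completedMaps, int totalMaps, double slowstart) {
        return totalMaps == 0 || (double) completedMaps / totalMaps >= slowstart;
    }

    /** Part of the reducer heap that holds fetched map outputs in memory during the shuffle. */
    static long shuffleBufferBytes(long heapBytes, double shuffleInputBufferPercent) {
        return Math.round(heapBytes * shuffleInputBufferPercent);
    }

    /** Fill level of the shuffle buffer at which an in-memory merge to disk starts. */
    static long mergeTriggerBytes(long shuffleBuffer, double shuffleMergePercent) {
        return Math.round(shuffleBuffer * shuffleMergePercent);
    }

    /** Map outputs that may stay in memory for the reduce phase instead of going to disk. */
    static long retainedForReduceBytes(long heapBytes, double reduceInputBufferPercent) {
        return Math.round(heapBytes * reduceInputBufferPercent);
    }

    public static void main(String[] args) {
        long heap = 1000L * 1024 * 1024; // assumed 1000 MB reducer heap
        long buffer = shuffleBufferBytes(heap, 0.70);
        System.out.println("shuffle buffer=" + buffer + ", merge at=" + mergeTriggerBytes(buffer, 0.66)
                + ", kept for reduce=" + retainedForReduceBytes(heap, 0.0));
        System.out.println("start reducers at 3/100 maps? " + reducersMayStart(3, 100, 0.05));
    }
}
```

With the default 0.0, nothing is kept in memory for the reduce phase. Raising it trades reducer memory for fewer disk reads, which is the trade-off described in item (4).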

I/O Transfer

(1) Use data compression to reduce network I/O time; install the Snappy and LZO codecs.
(2) Use SequenceFile binary files.
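Compression trades CPU time for fewer bytes on disk and across the network. The sketch below uses the JDK's GZIP as a stand-in for Snappy or LZO, which need native libraries, to show how much repetitive, WordCount-like map output shrinks. The sample data and names are assumptions. In a Hadoop job, map-output compression is usually switched on with mapreduce.map.output.compress=true and a codec such as org.apache.hadoop.io.compress.SnappyCodec in mapreduce.map.output.compress.codec.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// GZIP from the JDK stands in for Snappy/LZO; names and sample data are hypothetical.
class CompressionSketch {

    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray(); // read only after close() has written the GZIP trailer
    }

    static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Repetitive "word<TAB>1" lines, similar in shape to WordCount map output. */
    static byte[] sampleMapOutput(int lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            sb.append("word").append(i % 50).append("\t1\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = sampleMapOutput(10_000);
        byte[] packed = gzip(raw);
        System.out.println("raw=" + raw.length + " bytes, gzip=" + packed.length + " bytes");
    }
}
```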

Data Skew

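1. Data skew symptoms
      Frequency skew: one region holds far more data than the others.
      Size skew: some records are far larger than the average.
2. Ways to reduce data skew
			Method 1: Sampling and range partitioning. Sample the raw data and use the resulting set to preset the partition boundaries.
			Method 2: Custom partitioning. Partition based on background knowledge of the output keys. For example, if the Map output keys are words from a book and a few technical terms occur very frequently, a custom partitioner can send those terms to a fixed subset of Reduce instances and everything else to the remaining Reduce instances.
			Method 3: Combine. Using a Combiner can greatly reduce data skew; where possible, its purpose is to aggregate and condense the data.
			Method 4: Use Map Join and avoid Reduce Join where possible.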
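Method 2 can be sketched as a partition function. The default rule below uses the same formula as Hadoop's HashPartitioner, (hashCode & Integer.MAX_VALUE) % numPartitions. The custom rule is a hypothetical example that reserves the first few reducers for a known set of hot keys and hashes everything else over the remaining reducers. String.hashCode() stands in for Text.hashCode(), so actual partition numbers will differ. In a real job this logic would live in the getPartition method of a Partitioner subclass registered with job.setPartitionerClass.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical names; String.hashCode() stands in for Text.hashCode().
class SkewAwarePartitionerSketch {

    /** Default rule, same formula as Hadoop's HashPartitioner. */
    static int hashPartition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /**
     * Custom rule: known hot keys go to the first hotPartitions reducers,
     * all other keys are hashed over the remaining reducers.
     */
    static int skewAwarePartition(String key, Set<String> hotKeys, int hotPartitions, int numPartitions) {
        if (hotPartitions <= 0 || hotPartitions >= numPartitions) {
            throw new IllegalArgumentException("need 0 < hotPartitions < numPartitions");
        }
        if (hotKeys.contains(key)) {
            return hashPartition(key, hotPartitions);
        }
        return hotPartitions + hashPartition(key, numPartitions - hotPartitions);
    }

    public static void main(String[] args) {
        Set<String> hot = new HashSet<>(Arrays.asList("hadoop", "mapreduce")); // assumed frequent terms
        for (String word : Arrays.asList("hadoop", "mapreduce", "shuffle", "yarn")) {
            System.out.println(word + " -> " + skewAwarePartition(word, hot, 2, 6));
        }
    }
}
```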

Commonly Tuned Parameters

1. Resource-related parameters

(1) The following parameters take effect when configured in the user's own MR application (defaults listed in mapred-default.xml)

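Parameter	Description
mapreduce.map.memory.mb	Upper limit of memory a MapTask may use (MB), default 1024. A MapTask whose actual usage exceeds this value is forcibly killed.
mapreduce.reduce.memory.mb	Upper limit of memory a ReduceTask may use (MB), default 1024. A ReduceTask whose actual usage exceeds this value is forcibly killed.
mapreduce.map.cpu.vcores	Maximum number of CPU cores each MapTask may use, default 1
mapreduce.reduce.cpu.vcores	Maximum number of CPU cores each ReduceTask may use, default 1
mapreduce.reduce.shuffle.parallelcopies	Number of parallel copies each Reduce uses to fetch data from the Maps, default 5
mapreduce.reduce.shuffle.merge.percent	Fill ratio of the shuffle buffer at which its data starts being written to disk, default 0.66
mapreduce.reduce.shuffle.input.buffer.percent	Size of the shuffle buffer as a fraction of the Reduce's available memory, default 0.7
mapreduce.reduce.input.buffer.percent	Fraction of memory used to keep buffered data for the reduce phase, default 0.0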

(2) These must be configured in the server configuration files before YARN starts in order to take effect (defaults listed in yarn-default.xml)

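Parameter	Description
yarn.scheduler.minimum-allocation-mb	Minimum memory allocated to an application Container (MB), default 1024
yarn.scheduler.maximum-allocation-mb	Maximum memory allocated to an application Container (MB), default 8192
yarn.scheduler.minimum-allocation-vcores	Minimum number of CPU cores per Container request, default 1
yarn.scheduler.maximum-allocation-vcores	Maximum number of CPU cores per Container request, default 32
yarn.nodemanager.resource.memory-mb	Maximum physical memory (MB) that a NodeManager can allocate to Containers, default 8192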
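These memory settings interact. A task's request (mapreduce.map.memory.mb or mapreduce.reduce.memory.mb) becomes a container whose size the scheduler derives from the YARN minimum and maximum. The sketch assumes the common CapacityScheduler behavior: a request is rounded up to a multiple of yarn.scheduler.minimum-allocation-mb, and a request above yarn.scheduler.maximum-allocation-mb is rejected. Other schedulers may use a different increment, and the names below are hypothetical.

```java
// Hypothetical names; a simplified model of container sizing, not YARN's scheduler code.
class ContainerSizeSketch {

    /** Rounds a request up to a multiple of minMb; requests above maxMb are rejected. */
    static int normalizeMemoryMb(int requestedMb, int minMb, int maxMb) {
        if (requestedMb > maxMb) {
            throw new IllegalArgumentException(
                    "requested " + requestedMb + " MB exceeds maximum allocation of " + maxMb + " MB");
        }
        int units = Math.max(1, (requestedMb + minMb - 1) / minMb); // ceiling, at least one unit
        return units * minMb;
    }

    /** How many containers of the given size fit into one NodeManager's memory. */
    static int containersPerNode(int nodeMemoryMb, int containerMb) {
        return nodeMemoryMb / containerMb;
    }

    public static void main(String[] args) {
        int container = normalizeMemoryMb(1500, 1024, 8192); // e.g. mapreduce.map.memory.mb = 1500
        System.out.println("container=" + container + " MB, per node=" + containersPerNode(8192, container));
    }
}
```

Under these assumptions and the defaults from the table, a 1500 MB request becomes a 2048 MB container, and a NodeManager with 8192 MB fits four of them.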

(3) Key parameters for Shuffle performance, to be configured before YARN starts (defaults listed in mapred-default.xml)

Parameter	Description
mapreduce.task.io.sort.mb	Size of the Shuffle ring (circular) buffer, default 100 MB
mapreduce.map.sort.spill.percent	Spill threshold of the ring buffer, default 80%

2. Fault-tolerance parameters (MapReduce performance optimization)

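Parameter	Description
mapreduce.map.maxattempts	Maximum number of attempts per Map Task; if the task still fails after this many attempts, the Map Task is considered failed. Default 4.
mapreduce.reduce.maxattempts	Maximum number of attempts per Reduce Task; if the task still fails after this many attempts, the Reduce Task is considered failed. Default 4.
mapreduce.task.timeout	Task timeout in milliseconds, a parameter that often needs adjusting. If a Task makes no progress for a certain time, meaning it neither reads new data nor produces output, it is considered blocked: it may be stuck, possibly forever. To keep a user program from blocking forever without exiting, this timeout is enforced. The default is 600000 (10 minutes). If your program spends a long time on each input record (for example, it accesses a database or pulls data over the network), increase this value. When it is too small, a typical error message is "AttemptID:attempt_14267829456721_123456_m_000224_0 Timed out after 300 secsContainer killed by the ApplicationMaster.".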

HDFS Small File Optimization

Drawbacks of Small Files in HDFS

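Every file on HDFS has an index entry (metadata) in the NameNode of roughly 150 bytes. With many small files, a large number of these entries is created. On the one hand, they occupy a lot of NameNode memory; on the other hand, the index becomes so large that lookups slow down.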
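The effect can be estimated with the 150-byte rule of thumb. The sketch counts each file and each of its blocks as one NameNode object of about 150 bytes. This is a common approximation, not an exact figure, and directories, which also count, are ignored here. It compares roughly 10 TB stored as 10 million 1 MB files with the same data stored as 128 MB files. The names are hypothetical.

```java
// Hypothetical names; uses the ~150-bytes-per-object rule of thumb, not an exact NameNode figure.
class NameNodeMemorySketch {

    static final long BYTES_PER_OBJECT = 150;

    /** Each file and each of its blocks is counted as one object in NameNode memory. */
    static long estimateBytes(long files, long blocksPerFile) {
        return files * (1 + blocksPerFile) * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        // 10 million 1 MB files vs. the same data as 78,125 files of 128 MB (one block each).
        System.out.println("small files: " + estimateBytes(10_000_000L, 1) + " bytes");
        System.out.println("merged:      " + estimateBytes(78_125L, 1) + " bytes");
    }
}
```

Under these assumptions, the small-file layout needs about 3,000,000,000 bytes (roughly 3 GB) of NameNode heap, while the merged layout needs about 23 MB.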

Solutions for Small Files in HDFS

Small-file optimization generally takes one of the following forms:

(1) During data collection, merge small files or small batches of data into large files before uploading them to HDFS.

(2) Before business processing, run a MapReduce program on HDFS to merge the small files.

(3) During MapReduce processing, use CombineTextInputFormat to improve efficiency.
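Options (1) and (2) both amount to packing many small files into one large container file. On HDFS this is commonly a SequenceFile with the file name as key and the file content as value. The sketch below uses a plain DataOutputStream record layout (name, length, bytes) as a stand-in, so the format and names are assumptions rather than the SequenceFile format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical record layout (name, length, bytes); a stand-in for a SequenceFile, not its format.
class SmallFilePackSketch {

    /** Writes every (file name, content) pair back to back into one byte array. */
    static byte[] pack(Map<String, byte[]> files) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            for (Map.Entry<String, byte[]> e : files.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeInt(e.getValue().length);
                out.write(e.getValue());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bos.toByteArray();
    }

    /** Reads the records back, restoring the original file names and contents. */
    static Map<String, byte[]> unpack(byte[] packed) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(packed))) {
            while (in.available() > 0) {
                String name = in.readUTF();
                byte[] content = new byte[in.readInt()];
                in.readFully(content);
                files.put(name, content);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return files;
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        for (int i = 0; i < 100; i++) { // 100 assumed tiny log files
            files.put("log-" + i + ".txt", ("line from file " + i + "\n").getBytes(StandardCharsets.UTF_8));
        }
        byte[] packed = pack(files);
        System.out.println(files.size() + " small files -> 1 container of " + packed.length + " bytes; "
                + unpack(packed).size() + " files restored");
    }
}
```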

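Important: enabling JVM reuse has a very significant effect. (JVM reuse is a Hadoop 1.x feature, set with mapred.job.reuse.jvm.num.tasks. MRv2 on YARN does not support it; for small jobs, uber mode, mapreduce.job.ubertask.enable, is the closest option.)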

Extended MapReduce Cases

Inverted Index Case (Chaining Multiple Jobs)

Requirement

Given a large amount of text (documents, web pages), build a search index.

Requirement Analysis

Code for the first pass

https://github.com/ShangBaiShuYao/bigdata/blob/master/src/main/java/com/shangbaishuyao/hadoop/InvertedIndex/FirstTreatment/

Code for the second pass

https://github.com/ShangBaiShuYao/bigdata/blob/master/src/main/java/com/shangbaishuyao/hadoop/InvertedIndex/SecondTreatment/
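The two chained jobs can be simulated in memory to make the data flow concrete. This is a sketch under assumptions: the intermediate key format word--doc and the final doc-->count format are chosen for illustration (the linked code may differ), and plain maps stand in for the shuffle between the jobs. The names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical names and "word--doc" / "doc-->count" formats; an in-memory stand-in for two chained jobs.
class InvertedIndexSketch {

    /** Job 1 equivalent: count each word per document, keyed "word--doc". */
    static Map<String, Integer> firstPass(Map<String, String> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (String word : doc.getValue().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word + "--" + doc.getKey(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Job 2 equivalent: regroup by word and list "doc-->count" entries. */
    static Map<String, String> secondPass(Map<String, Integer> firstOutput) {
        Map<String, StringBuilder> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> e : firstOutput.entrySet()) {
            String[] parts = e.getKey().split("--", 2); // [word, doc]
            grouped.computeIfAbsent(parts[0], k -> new StringBuilder())
                    .append(parts[1]).append("-->").append(e.getValue()).append(' ');
        }
        Map<String, String> index = new TreeMap<>();
        grouped.forEach((word, entries) -> index.put(word, entries.toString().trim()));
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new TreeMap<>();
        docs.put("a.txt", "hadoop spark hadoop");
        docs.put("b.txt", "spark flink");
        secondPass(firstPass(docs)).forEach((word, postings) -> System.out.println(word + "\t" + postings));
    }
}
```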

TopN Case

Requirement

Post-process the output of the earlier requirement and output the information of the 10 users with the highest traffic usage.

Requirement Analysis

Case code

https://github.com/ShangBaiShuYao/bigdata/blob/master/src/main/java/com/shangbaishuyao/hadoop/TopN/
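The core of a TopN step is keeping only the N largest values seen so far. The sketch below uses a bounded min-heap; a TreeMap that drops its smallest entry is a common alternative in MapReduce code. The phone-number keys and totals are made-up sample data, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical names; a bounded min-heap is one common way to keep a running top N.
class TopNSketch {

    /** Returns the n entries with the largest totals, largest first. */
    static List<Map.Entry<String, Long>> topN(Map<String, Long> totalFlowByPhone, int n) {
        // The smallest total sits at the head, so it is evicted when the heap grows past n.
        PriorityQueue<Map.Entry<String, Long>> heap =
                new PriorityQueue<>(Map.Entry.<String, Long>comparingByValue());
        for (Map.Entry<String, Long> e : totalFlowByPhone.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll();
            }
        }
        List<Map.Entry<String, Long>> result = new ArrayList<>(heap);
        result.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> totals = new HashMap<>();
        for (int i = 1; i <= 12; i++) {
            totals.put("1380000" + String.format("%04d", i), i * 100L); // assumed sample data
        }
        for (Map.Entry<String, Long> e : topN(totals, 10)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```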

Finding Common Blog Friends Case

Requirement

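Below is the friend-list data of a blog. The user is before the colon, and all of that user's friends are after it (friendships in the data are one-directional).
Find which pairs of people have common friends, and who those common friends are.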

Input data:

A:B,C,D,F,E,O
B:A,C,E,K
C:F,A,D,I
D:A,E,F,L
E:B,C,D,M,L
F:A,B,C,D,E,O,M
G:A,C,D,E,F
H:A,C,D,E,O
I:A,O
J:B,O
K:A,C,D
L:D,E,F
M:E,F,G
O:A,H,I,J

Requirement Analysis

First, find out whose friend A, B, C, and so on each is.

Output of the first pass

A	I,K,C,B,G,F,H,O,D,
B	A,F,J,E,
C	A,E,B,H,F,G,K,
D	G,C,K,A,L,F,E,H,
E	G,M,L,H,A,F,B,D,
F	L,M,D,C,G,A,
G	M,
H	O,
I	O,C,
J	O,
K	B,
L	D,E,
M	E,F,
O	A,H,I,J,F,

Output of the second pass

A-B	E C
A-C	D F
A-D	E F
A-E	D B C
A-F	O B C D E
A-G	F E C D
A-H	E C D O
A-I	O
A-J	O B
A-K	D C
A-L	F E D
A-M	E F
B-C	A
B-D	A E
B-E	C
B-F	E A C
B-G	C E A
B-H	A E C
B-I	A
B-K	C A
B-L	E
B-M	E
B-O	A
C-D	A F
C-E	D
C-F	D A
C-G	D F A
C-H	D A
C-I	A
C-K	A D
C-L	D F
C-M	F
C-O	I A
D-E	L
D-F	A E
D-G	E A F
D-H	A E
D-I	A
D-K	A
D-L	E F
D-M	F E
D-O	A
E-F	D M C B
E-G	C D
E-H	C D
E-J	B
E-K	C D
E-L	D
F-G	D C A E
F-H	A D O E C
F-I	O A
F-J	B O
F-K	D C A
F-L	E D
F-M	E
F-O	A
G-H	D C E A
G-I	A
G-K	D A C
G-L	D F E
G-M	E F
G-O	A
H-I	O A
H-J	O
H-K	A C D
H-L	D E
H-M	E
H-O	A
I-J	O
I-K	A
I-O	A
K-L	D
K-O	A
L-M	E F

Case code

https://github.com/ShangBaiShuYao/bigdata/blob/master/src/main/java/com/shangbaishuyao/hadoop/FindBlogFriends/
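The two passes described above can be checked against the sample data with an in-memory sketch. invert reproduces the first output: for each person, who lists them as a friend. commonFriends emits every sorted pair of users that appear under the same friend, which yields the second output. The names are hypothetical, and plain maps stand in for the shuffle between the two jobs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical names; an in-memory stand-in for the two chained jobs of this case.
class CommonFriendsSketch {

    /** Job 1 equivalent: for each friend, collect the users whose list contains that friend. */
    static Map<String, List<String>> invert(List<String> lines) {
        Map<String, List<String>> usersByFriend = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split(":");
            for (String friend : parts[1].split(",")) {
                usersByFriend.computeIfAbsent(friend, k -> new ArrayList<>()).add(parts[0]);
            }
        }
        return usersByFriend;
    }

    /** Job 2 equivalent: every pair of users listed under the same friend shares that friend. */
    static Map<String, Set<String>> commonFriends(Map<String, List<String>> usersByFriend) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : usersByFriend.entrySet()) {
            List<String> users = new ArrayList<>(e.getValue());
            Collections.sort(users); // sorting makes "A-B" and "B-A" the same key
            for (int i = 0; i < users.size(); i++) {
                for (int j = i + 1; j < users.size(); j++) {
                    result.computeIfAbsent(users.get(i) + "-" + users.get(j), k -> new TreeSet<>())
                            .add(e.getKey());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("A:B,C,D,F,E,O", "B:A,C,E,K", "C:F,A,D,I", "D:A,E,F,L",
                "E:B,C,D,M,L", "F:A,B,C,D,E,O,M", "G:A,C,D,E,F", "H:A,C,D,E,O", "I:A,O",
                "J:B,O", "K:A,C,D", "L:D,E,F", "M:E,F,G", "O:A,H,I,J");
        Map<String, Set<String>> common = commonFriends(invert(input));
        System.out.println("A-B " + common.get("A-B"));
    }
}
```

Running main prints A-B [C, E], the same set as the A-B line (E C) of the second output.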

Common Errors and Solutions

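1) Imports are easy to get wrong, especially for Text and CombineTextInputFormat (Text should come from org.apache.hadoop.io, and with the new API CombineTextInputFormat should come from org.apache.hadoop.mapreduce.lib.input).
2) The first input type of a Mapper must be LongWritable or NullWritable, not IntWritable (with TextInputFormat the key is the LongWritable byte offset of the line). Otherwise a ClassCastException is thrown.
3) java.lang.Exception: java.io.IOException: Illegal partition for 13926435656 (4) means the partitions do not match the number of ReduceTasks: the partitioner returned partition 4, which is outside the valid range of 0 to (number of ReduceTasks - 1). Adjust the number of ReduceTasks.
4) If the number of partitions is not 1 but there is only one ReduceTask, is the partitioning step executed? No. In the MapTask source code, the partitioner is used only when the number of ReduceTasks is greater than 1; otherwise every record goes to partition 0.
5) A jar compiled on Windows is run on Linux with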
hadoop jar wc.jar com.atguigu.mapreduce.wordcount.WordCountDriver /user/atguigu/ /user/atguigu/output
and fails with the following error:
Exception in thread "main" java.lang.UnsupportedClassVersionError: com/atguigu/mapreduce/wordcount/WordCountDriver : Unsupported major.minor version 52.0
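Cause: class file version 52.0 corresponds to Java 8, so the jar was compiled with JDK 1.8 on Windows but run on the older JDK 1.7 on Linux.
Solution: use the same JDK version in both environments.
6) In the case that caches the small file pd.txt, the job reports that pd.txt cannot be found.
Cause: usually a mistyped path. Also check that the file was not accidentally saved as pd.txt.txt. On a few machines a relative path does not find pd.txt; use an absolute path instead.
7) A ClassCastException is reported.
It is usually caused by a mistake when setting the Map output types and the final output types in the driver.
It is also thrown when the Map output key type cannot be sorted, that is, when it does not implement WritableComparable.
8) Running wc.jar on the cluster fails because the input files cannot be found.
Cause: the input files of the WordCount case must not be placed in the root directory of the HDFS cluster.
9) Exceptions like the following appear: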
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
	at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
	at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:609)
	at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:977)
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
	at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:356)
	at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:371)
	at org.apache.hadoop.util.Shell.<clinit>(Shell.java:364)
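Solution: copy hadoop.dll into C:\Windows\System32. On some machines the Hadoop source code also has to be modified. The null\bin\winutils.exe path in the trace shows that HADOOP_HOME (or the hadoop.home.dir system property) is not set, so it should also point to a Hadoop directory whose bin folder contains winutils.exe.
Option 2: create the package org.apache.hadoop.io.nativeio in the project and copy NativeIO.java into it.

10) When writing a custom OutputFormat, the close method of the RecordWriter must close the stream resources; otherwise the output files will be empty.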
@Override
public void close(TaskAttemptContext context) throws IOException, InterruptedException {
	// Close both output streams; an unclosed stream can leave its output file empty
	if (atguigufos != null) {
		atguigufos.close();
	}
	if (otherfos != null) {
		otherfos.close();
	}
}
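Why an unclosed stream can leave an empty file can be reproduced with the JDK alone. In this sketch, a BufferedOutputStream (default 8 KB buffer) stands in for the output stream of a custom RecordWriter, and a ByteArrayOutputStream stands in for the output file. HDFS stream classes buffer differently, so this only illustrates the principle. The names and the sample record are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical names; BufferedOutputStream stands in for a RecordWriter's output stream.
class UnclosedStreamSketch {

    /** Returns {bytes in the "file" before close(), bytes after close()}. */
    static int[] bytesVisible(String record) {
        ByteArrayOutputStream file = new ByteArrayOutputStream(); // plays the role of the output file
        BufferedOutputStream out = new BufferedOutputStream(file); // 8 KB buffer by default
        try {
            out.write(record.getBytes(StandardCharsets.UTF_8));
            int beforeClose = file.size(); // the record is still sitting in the buffer
            out.close(); // close() flushes the buffer into the file
            int afterClose = file.size();
            return new int[] {beforeClose, afterClose};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = bytesVisible("13926435656\t1116\t954\n"); // assumed sample record
        System.out.println("before close: " + sizes[0] + " bytes, after close: " + sizes[1] + " bytes");
    }
}
```

Running main prints before close: 0 bytes, after close: 21 bytes. Without the close() call, the 21 bytes would never reach the file.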

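NativeIO.java (a copy of Hadoop's org.apache.hadoop.io.nativeio.NativeIO in which Windows.access() always returns true)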

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.io.nativeio;

import java.io.File;
import java.io.FileDescriptor;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CommonConfigurationKeys;
import org.apache.hadoop.fs.HardLink;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SecureIOUtils.AlreadyExistsException;
import org.apache.hadoop.util.NativeCodeLoader;
import org.apache.hadoop.util.Shell;
import org.apache.hadoop.util.PerformanceAdvisory;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import sun.misc.Unsafe;
import com.google.common.annotations.VisibleForTesting;

/**
 * JNI wrappers for various native IO-related calls not available in Java. These
 * functions should generally be used alongside a fallback to another more
 * portable mechanism.
 */
@InterfaceAudience.Private
@InterfaceStability.Unstable
public class NativeIO {
	public static class POSIX {
		// Flags for open() call from bits/fcntl.h
		public static final int O_RDONLY = 00;
		public static final int O_WRONLY = 01;
		public static final int O_RDWR = 02;
		public static final int O_CREAT = 0100;
		public static final int O_EXCL = 0200;
		public static final int O_NOCTTY = 0400;
		public static final int O_TRUNC = 01000;
		public static final int O_APPEND = 02000;
		public static final int O_NONBLOCK = 04000;
		public static final int O_SYNC = 010000;
		public static final int O_ASYNC = 020000;
		public static final int O_FSYNC = O_SYNC;
		public static final int O_NDELAY = O_NONBLOCK;

		// Flags for posix_fadvise() from bits/fcntl.h
		/* No further special treatment. */
		public static final int POSIX_FADV_NORMAL = 0;
		/* Expect random page references. */
		public static final int POSIX_FADV_RANDOM = 1;
		/* Expect sequential page references. */
		public static final int POSIX_FADV_SEQUENTIAL = 2;
		/* Will need these pages. */
		public static final int POSIX_FADV_WILLNEED = 3;
		/* Don't need these pages. */
		public static final int POSIX_FADV_DONTNEED = 4;
		/* Data will be accessed once. */
		public static final int POSIX_FADV_NOREUSE = 5;

		/*
		 * Wait upon writeout of all pages in the range before performing the write.
		 */
		public static final int SYNC_FILE_RANGE_WAIT_BEFORE = 1;
		/*
		 * Initiate writeout of all those dirty pages in the range which are not
		 * presently under writeback.
		 */
		public static final int SYNC_FILE_RANGE_WRITE = 2;

		/*
		 * Wait upon writeout of all pages in the range after performing the write.
		 */
		public static final int SYNC_FILE_RANGE_WAIT_AFTER = 4;

		private static final Log LOG = LogFactory.getLog(NativeIO.class);

		private static boolean nativeLoaded = false;
		private static boolean fadvisePossible = true;
		private static boolean syncFileRangePossible = true;

		static final String WORKAROUND_NON_THREADSAFE_CALLS_KEY = "hadoop.workaround.non.threadsafe.getpwuid";
		static final boolean WORKAROUND_NON_THREADSAFE_CALLS_DEFAULT = true;

		private static long cacheTimeout = -1;

		private static CacheManipulator cacheManipulator = new CacheManipulator();

		public static CacheManipulator getCacheManipulator() {
			return cacheManipulator;
		}

		public static void setCacheManipulator(CacheManipulator cacheManipulator) {
			POSIX.cacheManipulator = cacheManipulator;
		}

		/**
		 * Used to manipulate the operating system cache.
		 */
		@VisibleForTesting
		public static class CacheManipulator {
			public void mlock(String identifier, ByteBuffer buffer, long len) throws IOException {
				POSIX.mlock(buffer, len);
			}

			public long getMemlockLimit() {
				return NativeIO.getMemlockLimit();
			}

			public long getOperatingSystemPageSize() {
				return NativeIO.getOperatingSystemPageSize();
			}

			public void posixFadviseIfPossible(String identifier, FileDescriptor fd, long offset, long len, int flags)
					throws NativeIOException {
				NativeIO.POSIX.posixFadviseIfPossible(identifier, fd, offset, len, flags);
			}

			public boolean verifyCanMlock() {
				return NativeIO.isAvailable();
			}
		}

		/**
		 * A CacheManipulator used for testing which does not actually call mlock. This
		 * allows many tests to be run even when the operating system does not allow
		 * mlock, or only allows limited mlocking.
		 */
		@VisibleForTesting
		public static class NoMlockCacheManipulator extends CacheManipulator {
			public void mlock(String identifier, ByteBuffer buffer, long len) throws IOException {
				LOG.info("mlocking " + identifier);
			}

			public long getMemlockLimit() {
				return 1125899906842624L;
			}

			public long getOperatingSystemPageSize() {
				return 4096;
			}

			public boolean verifyCanMlock() {
				return true;
			}
		}

		static {
			if (NativeCodeLoader.isNativeCodeLoaded()) {
				try {
					Configuration conf = new Configuration();
					workaroundNonThreadSafePasswdCalls = conf.getBoolean(WORKAROUND_NON_THREADSAFE_CALLS_KEY,
							WORKAROUND_NON_THREADSAFE_CALLS_DEFAULT);

					initNative();
					nativeLoaded = true;

					cacheTimeout = conf.getLong(CommonConfigurationKeys.HADOOP_SECURITY_UID_NAME_CACHE_TIMEOUT_KEY,
							CommonConfigurationKeys.HADOOP_SECURITY_UID_NAME_CACHE_TIMEOUT_DEFAULT) * 1000;
					LOG.debug("Initialized cache for IDs to User/Group mapping with a " + " cache timeout of "
							+ cacheTimeout / 1000 + " seconds.");

				} catch (Throwable t) {
					// This can happen if the user has an older version of libhadoop.so
					// installed - in this case we can continue without native IO
					// after warning
					PerformanceAdvisory.LOG.debug("Unable to initialize NativeIO libraries", t);
				}
			}
		}

		/**
		 * Return true if the JNI-based native IO extensions are available.
		 */
		public static boolean isAvailable() {
			return NativeCodeLoader.isNativeCodeLoaded() && nativeLoaded;
		}

		private static void assertCodeLoaded() throws IOException {
			if (!isAvailable()) {
				throw new IOException("NativeIO was not loaded");
			}
		}

		/** Wrapper around open(2) */
		public static native FileDescriptor open(String path, int flags, int mode) throws IOException;

		/** Wrapper around fstat(2) */
		private static native Stat fstat(FileDescriptor fd) throws IOException;

		/** Native chmod implementation. On UNIX, it is a wrapper around chmod(2) */
		private static native void chmodImpl(String path, int mode) throws IOException;

		public static void chmod(String path, int mode) throws IOException {
			if (!Shell.WINDOWS) {
				chmodImpl(path, mode);
			} else {
				try {
					chmodImpl(path, mode);
				} catch (NativeIOException nioe) {
					if (nioe.getErrorCode() == 3) {
						throw new NativeIOException("No such file or directory", Errno.ENOENT);
					} else {
						LOG.warn(
								String.format("NativeIO.chmod error (%d): %s", nioe.getErrorCode(), nioe.getMessage()));
						throw new NativeIOException("Unknown error", Errno.UNKNOWN);
					}
				}
			}
		}

		/** Wrapper around posix_fadvise(2) */
		static native void posix_fadvise(FileDescriptor fd, long offset, long len, int flags) throws NativeIOException;

		/** Wrapper around sync_file_range(2) */
		static native void sync_file_range(FileDescriptor fd, long offset, long nbytes, int flags)
				throws NativeIOException;

		/**
		 * Call posix_fadvise on the given file descriptor. See the manpage for this
		 * syscall for more information. On systems where this call is not available,
		 * does nothing.
		 *
		 * @throws NativeIOException
		 *             if there is an error with the syscall
		 */
		static void posixFadviseIfPossible(String identifier, FileDescriptor fd, long offset, long len, int flags)
				throws NativeIOException {
			if (nativeLoaded && fadvisePossible) {
				try {
					posix_fadvise(fd, offset, len, flags);
				} catch (UnsupportedOperationException uoe) {
					fadvisePossible = false;
				} catch (UnsatisfiedLinkError ule) {
					fadvisePossible = false;
				}
			}
		}

		/**
		 * Call sync_file_range on the given file descriptor. See the manpage for this
		 * syscall for more information. On systems where this call is not available,
		 * does nothing.
		 *
		 * @throws NativeIOException
		 *             if there is an error with the syscall
		 */
		public static void syncFileRangeIfPossible(FileDescriptor fd, long offset, long nbytes, int flags)
				throws NativeIOException {
			if (nativeLoaded && syncFileRangePossible) {
				try {
					sync_file_range(fd, offset, nbytes, flags);
				} catch (UnsupportedOperationException uoe) {
					syncFileRangePossible = false;
				} catch (UnsatisfiedLinkError ule) {
					syncFileRangePossible = false;
				}
			}
		}

		static native void mlock_native(ByteBuffer buffer, long len) throws NativeIOException;

		/**
		 * Locks the provided direct ByteBuffer into memory, preventing it from swapping
		 * out. After a buffer is locked, future accesses will not incur a page fault.
		 * 
		 * See the mlock(2) man page for more information.
		 * 
		 * @throws NativeIOException
		 */
		static void mlock(ByteBuffer buffer, long len) throws IOException {
			assertCodeLoaded();
			if (!buffer.isDirect()) {
				throw new IOException("Cannot mlock a non-direct ByteBuffer");
			}
			mlock_native(buffer, len);
		}

		/**
		 * Unmaps the block from memory. See munmap(2).
		 *
		 * There isn't any portable way to unmap a memory region in Java. So we use the
		 * sun.nio method here. Note that unmapping a memory region could cause crashes
		 * if code continues to reference the unmapped code. However, if we don't
		 * manually unmap the memory, we are dependent on the finalizer to do it, and we
		 * have no idea when the finalizer will run.
		 *
		 * @param buffer
		 *            The buffer to unmap.
		 */
		public static void munmap(MappedByteBuffer buffer) {
			if (buffer instanceof sun.nio.ch.DirectBuffer) {
				sun.misc.Cleaner cleaner = ((sun.nio.ch.DirectBuffer) buffer).cleaner();
				cleaner.clean();
			}
		}

		/** Linux only methods used for getOwner() implementation */
		private static native long getUIDforFDOwnerforOwner(FileDescriptor fd) throws IOException;

		private static native String getUserName(long uid) throws IOException;

		/**
		 * Result type of the fstat call
		 */
		public static class Stat {
			private int ownerId, groupId;
			private String owner, group;
			private int mode;

			// Mode constants
			public static final int S_IFMT = 0170000; /* type of file */
			public static final int S_IFIFO = 0010000; /* named pipe (fifo) */
			public static final int S_IFCHR = 0020000; /* character special */
			public static final int S_IFDIR = 0040000; /* directory */
			public static final int S_IFBLK = 0060000; /* block special */
			public static final int S_IFREG = 0100000; /* regular */
			public static final int S_IFLNK = 0120000; /* symbolic link */
			public static final int S_IFSOCK = 0140000; /* socket */
			public static final int S_IFWHT = 0160000; /* whiteout */
			public static final int S_ISUID = 0004000; /* set user id on execution */
			public static final int S_ISGID = 0002000; /* set group id on execution */
			public static final int S_ISVTX = 0001000; /* save swapped text even after use */
			public static final int S_IRUSR = 0000400; /* read permission, owner */
			public static final int S_IWUSR = 0000200; /* write permission, owner */
			public static final int S_IXUSR = 0000100; /* execute/search permission, owner */

			Stat(int ownerId, int groupId, int mode) {
				this.ownerId = ownerId;
				this.groupId = groupId;
				this.mode = mode;
			}

			Stat(String owner, String group, int mode) {
				if (!Shell.WINDOWS) {
					this.owner = owner;
				} else {
					this.owner = stripDomain(owner);
				}
				if (!Shell.WINDOWS) {
					this.group = group;
				} else {
					this.group = stripDomain(group);
				}
				this.mode = mode;
			}

			@Override
			public String toString() {
				return "Stat(owner='" + owner + "', group='" + group + "'" + ", mode=" + mode + ")";
			}

			public String getOwner() {
				return owner;
			}

			public String getGroup() {
				return group;
			}

			public int getMode() {
				return mode;
			}
		}

		/**
		 * Returns the file stat for a file descriptor.
		 *
		 * @param fd
		 *            file descriptor.
		 * @return the file descriptor file stat.
		 * @throws IOException
		 *             thrown if there was an IO error while obtaining the file stat.
		 */
		public static Stat getFstat(FileDescriptor fd) throws IOException {
			Stat stat = null;
			if (!Shell.WINDOWS) {
				stat = fstat(fd);
				stat.owner = getName(IdCache.USER, stat.ownerId);
				stat.group = getName(IdCache.GROUP, stat.groupId);
			} else {
				try {
					stat = fstat(fd);
				} catch (NativeIOException nioe) {
					if (nioe.getErrorCode() == 6) {
						throw new NativeIOException("The handle is invalid.", Errno.EBADF);
					} else {
						LOG.warn(String.format("NativeIO.getFstat error (%d): %s", nioe.getErrorCode(),
								nioe.getMessage()));
						throw new NativeIOException("Unknown error", Errno.UNKNOWN);
					}
				}
			}
			return stat;
		}

		private static String getName(IdCache domain, int id) throws IOException {
			Map<Integer, CachedName> idNameCache = (domain == IdCache.USER) ? USER_ID_NAME_CACHE : GROUP_ID_NAME_CACHE;
			String name;
			CachedName cachedName = idNameCache.get(id);
			long now = System.currentTimeMillis();
			if (cachedName != null && (cachedName.timestamp + cacheTimeout) > now) {
				name = cachedName.name;
			} else {
				name = (domain == IdCache.USER) ? getUserName(id) : getGroupName(id);
				if (LOG.isDebugEnabled()) {
					String type = (domain == IdCache.USER) ? "UserName" : "GroupName";
					LOG.debug("Got " + type + " " + name + " for ID " + id + " from the native implementation");
				}
				cachedName = new CachedName(name, now);
				idNameCache.put(id, cachedName);
			}
			return name;
		}

		static native String getUserName(int uid) throws IOException;

		static native String getGroupName(int uid) throws IOException;

		private static class CachedName {
			final long timestamp;
			final String name;

			public CachedName(String name, long timestamp) {
				this.name = name;
				this.timestamp = timestamp;
			}
		}

		private static final Map<Integer, CachedName> USER_ID_NAME_CACHE = new ConcurrentHashMap<Integer, CachedName>();

		private static final Map<Integer, CachedName> GROUP_ID_NAME_CACHE = new ConcurrentHashMap<Integer, CachedName>();

		private enum IdCache {
			USER, GROUP
		}

		public final static int MMAP_PROT_READ = 0x1;
		public final static int MMAP_PROT_WRITE = 0x2;
		public final static int MMAP_PROT_EXEC = 0x4;

		public static native long mmap(FileDescriptor fd, int prot, boolean shared, long length) throws IOException;

		public static native void munmap(long addr, long length) throws IOException;
	}

	private static boolean workaroundNonThreadSafePasswdCalls = false;

	public static class Windows {
		// Flags for CreateFile() call on Windows
		public static final long GENERIC_READ = 0x80000000L;
		public static final long GENERIC_WRITE = 0x40000000L;

		public static final long FILE_SHARE_READ = 0x00000001L;
		public static final long FILE_SHARE_WRITE = 0x00000002L;
		public static final long FILE_SHARE_DELETE = 0x00000004L;

		public static final long CREATE_NEW = 1;
		public static final long CREATE_ALWAYS = 2;
		public static final long OPEN_EXISTING = 3;
		public static final long OPEN_ALWAYS = 4;
		public static final long TRUNCATE_EXISTING = 5;

		public static final long FILE_BEGIN = 0;
		public static final long FILE_CURRENT = 1;
		public static final long FILE_END = 2;

		public static final long FILE_ATTRIBUTE_NORMAL = 0x00000080L;

		/**
		 * Create a directory with permissions set to the specified mode. By setting
		 * permissions at creation time, we avoid issues related to the user lacking
		 * WRITE_DAC rights on subsequent chmod calls. One example where this can occur
		 * is writing to an SMB share where the user does not have Full Control rights,
		 * and therefore WRITE_DAC is denied.
		 *
		 * @param path
		 *            directory to create
		 * @param mode
		 *            permissions of new directory
		 * @throws IOException
		 *             if there is an I/O error
		 */
		public static void createDirectoryWithMode(File path, int mode) throws IOException {
			createDirectoryWithMode0(path.getAbsolutePath(), mode);
		}

		/** Wrapper around CreateDirectory() on Windows */
		private static native void createDirectoryWithMode0(String path, int mode) throws NativeIOException;

		/** Wrapper around CreateFile() on Windows */
		public static native FileDescriptor createFile(String path, long desiredAccess, long shareMode,
				long creationDisposition) throws IOException;

		/**
		 * Create a file for write with permissions set to the specified mode. By
		 * setting permissions at creation time, we avoid issues related to the user
		 * lacking WRITE_DAC rights on subsequent chmod calls. One example where this
		 * can occur is writing to an SMB share where the user does not have Full
		 * Control rights, and therefore WRITE_DAC is denied.
		 *
		 * This method mimics the semantics implemented by the JDK in
		 * {@link java.io.FileOutputStream}. The file is opened for truncate or append,
		 * the sharing mode allows other readers and writers, and paths longer than
		 * MAX_PATH are supported. (See io_util_md.c in the JDK.)
		 *
		 * @param path
		 *            file to create
		 * @param append
		 *            if true, then open file for append
		 * @param mode
		 *            permissions of new directory
		 * @return FileOutputStream of opened file
		 * @throws IOException
		 *             if there is an I/O error
		 */
		public static FileOutputStream createFileOutputStreamWithMode(File path, boolean append, int mode)
				throws IOException {
			long desiredAccess = GENERIC_WRITE;
			long shareMode = FILE_SHARE_READ | FILE_SHARE_WRITE;
			long creationDisposition = append ? OPEN_ALWAYS : CREATE_ALWAYS;
			return new FileOutputStream(
					createFileWithMode0(path.getAbsolutePath(), desiredAccess, shareMode, creationDisposition, mode));
		}

		/** Wrapper around CreateFile() with security descriptor on Windows */
		private static native FileDescriptor createFileWithMode0(String path, long desiredAccess, long shareMode,
				long creationDisposition, int mode) throws NativeIOException;

		/** Wrapper around SetFilePointer() on Windows */
		public static native long setFilePointer(FileDescriptor fd, long distanceToMove, long moveMethod)
				throws IOException;

		/** Windows only methods used for getOwner() implementation */
		private static native String getOwner(FileDescriptor fd) throws IOException;

		/** Supported list of Windows access right flags */
		public static enum AccessRight {
			ACCESS_READ(0x0001), // FILE_READ_DATA
			ACCESS_WRITE(0x0002), // FILE_WRITE_DATA
			ACCESS_EXECUTE(0x0020); // FILE_EXECUTE

			private final int accessRight;

			AccessRight(int access) {
				accessRight = access;
			}

			public int accessRight() {
				return accessRight;
			}
		};

		/**
		 * Windows only method used to check if the current process has requested access
		 * rights on the given path.
		 */
		private static native boolean access0(String path, int requestedAccess);

		/**
		 * Checks whether the current process has desired access rights on the given
		 * path.
		 * 
		 * Longer term this native function can be substituted with JDK7 function
		 * Files#isReadable, isWritable, isExecutable.
		 *
		 * @param path
		 *            input path
		 * @param desiredAccess
		 *            ACCESS_READ, ACCESS_WRITE or ACCESS_EXECUTE
		 * @return true if access is allowed
		 * @throws IOException
		 *             I/O exception on error
		 */
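		public static boolean access(String path, AccessRight desiredAccess) throws IOException {
			// Modified for local Windows development: skip the native access check,
			// which fails with UnsatisfiedLinkError when hadoop.dll is missing, and
			// always report that access is allowed.
			return true;
			// return access0(path, desiredAccess.accessRight());
		}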

		/**
		 * Extends both the minimum and maximum working set size of the current process.
		 * This method gets the current minimum and maximum working set size, adds the
		 * requested amount to each and then sets the minimum and maximum working set
		 * size to the new values. Controlling the working set size of the process also
		 * controls the amount of memory it can lock.
		 *
		 * @param delta
		 *            amount to increment minimum and maximum working set size
		 * @throws IOException
		 *             for any error
		 * @see POSIX#mlock(ByteBuffer, long)
		 */
		public static native void extendWorkingSetSize(long delta) throws IOException;

		static {
			if (NativeCodeLoader.isNativeCodeLoaded()) {
				try {
					initNative();
					nativeLoaded = true;
				} catch (Throwable t) {
					// This can happen if the user has an older version of libhadoop.so
					// installed - in this case we can continue without native IO
					// after warning
					PerformanceAdvisory.LOG.debug("Unable to initialize NativeIO libraries", t);
				}
			}
		}
	}

	private static final Log LOG = LogFactory.getLog(NativeIO.class);

	private static boolean nativeLoaded = false;

	static {
		if (NativeCodeLoader.isNativeCodeLoaded()) {
			try {
				initNative();
				nativeLoaded = true;
			} catch (Throwable t) {
				// This can happen if the user has an older version of libhadoop.so
				// installed - in this case we can continue without native IO
				// after warning
				PerformanceAdvisory.LOG.debug("Unable to initialize NativeIO libraries", t);
			}
		}
	}

	/**
	 * Return true if the JNI-based native IO extensions are available.
	 */
	public static boolean isAvailable() {
		return NativeCodeLoader.isNativeCodeLoaded() && nativeLoaded;
	}

	/** Initialize the JNI method ID and class ID cache */
	private static native void initNative();

	/**
	 * Get the maximum number of bytes that can be locked into memory at any given
	 * point.
	 *
	 * @return 0 if no bytes can be locked into memory; Long.MAX_VALUE if there is
	 *         no limit; The number of bytes that can be locked into memory
	 *         otherwise.
	 */
	static long getMemlockLimit() {
		return isAvailable() ? getMemlockLimit0() : 0;
	}

	private static native long getMemlockLimit0();

	/**
	 * @return the operating system's page size.
	 */
	static long getOperatingSystemPageSize() {
		try {
			Field f = Unsafe.class.getDeclaredField("theUnsafe");
			f.setAccessible(true);
			Unsafe unsafe = (Unsafe) f.get(null);
			return unsafe.pageSize();
		} catch (Throwable e) {
			LOG.warn("Unable to get operating system page size.  Guessing 4096.", e);
			return 4096;
		}
	}

	private static class CachedUid {
		final long timestamp;
		final String username;

		public CachedUid(String username, long timestamp) {
			this.timestamp = timestamp;
			this.username = username;
		}
	}

	private static final Map<Long, CachedUid> uidCache = new ConcurrentHashMap<Long, CachedUid>();
	private static long cacheTimeout;
	private static boolean initialized = false;

	/**
	 * The Windows logon name has two parts, NetBIOS domain name and user account
	 * name, of the format DOMAIN\UserName. This method will remove the domain part
	 * of the full logon name.
	 *
	 * @param name
	 *            the full principal name containing the domain
	 * @return name with domain removed
	 */
	private static String stripDomain(String name) {
		int i = name.indexOf('\\');
		if (i != -1)
			name = name.substring(i + 1);
		return name;
	}

	public static String getOwner(FileDescriptor fd) throws IOException {
		ensureInitialized();
		if (Shell.WINDOWS) {
			String owner = Windows.getOwner(fd);
			owner = stripDomain(owner);
			return owner;
		} else {
			long uid = POSIX.getUIDforFDOwnerforOwner(fd);
			CachedUid cUid = uidCache.get(uid);
			long now = System.currentTimeMillis();
			if (cUid != null && (cUid.timestamp + cacheTimeout) > now) {
				return cUid.username;
			}
			String user = POSIX.getUserName(uid);
			LOG.info("Got UserName " + user + " for UID " + uid + " from the native implementation");
			cUid = new CachedUid(user, now);
			uidCache.put(uid, cUid);
			return user;
		}
	}

	/**
	 * Create a FileInputStream that shares delete permission on the file opened,
	 * i.e. other processes can delete the file the FileInputStream is reading. Only
	 * Windows implementation uses the native interface.
	 */
	public static FileInputStream getShareDeleteFileInputStream(File f) throws IOException {
		if (!Shell.WINDOWS) {
			// On Linux the default FileInputStream shares delete permission
			// on the file opened.
			//
			return new FileInputStream(f);
		} else {
			// Use Windows native interface to create a FileInputStream that
			// shares delete permission on the file opened.
			//
			FileDescriptor fd = Windows.createFile(f.getAbsolutePath(), Windows.GENERIC_READ,
					Windows.FILE_SHARE_READ | Windows.FILE_SHARE_WRITE | Windows.FILE_SHARE_DELETE,
					Windows.OPEN_EXISTING);
			return new FileInputStream(fd);
		}
	}

	/**
	 * Create a FileInputStream that shares delete permission on the file opened at
	 * a given offset, i.e. other processes can delete the file the FileInputStream is
	 * reading. Only Windows implementation uses the native interface.
	 */
	public static FileInputStream getShareDeleteFileInputStream(File f, long seekOffset) throws IOException {
		if (!Shell.WINDOWS) {
			RandomAccessFile rf = new RandomAccessFile(f, "r");
			if (seekOffset > 0) {
				rf.seek(seekOffset);
			}
			return new FileInputStream(rf.getFD());
		} else {
			// Use Windows native interface to create a FileInputStream that
			// shares delete permission on the file opened, and set it to the
			// given offset.
			//
			FileDescriptor fd = NativeIO.Windows.createFile(
					f.getAbsolutePath(), NativeIO.Windows.GENERIC_READ, NativeIO.Windows.FILE_SHARE_READ
							| NativeIO.Windows.FILE_SHARE_WRITE | NativeIO.Windows.FILE_SHARE_DELETE,
					NativeIO.Windows.OPEN_EXISTING);
			if (seekOffset > 0)
				NativeIO.Windows.setFilePointer(fd, seekOffset, NativeIO.Windows.FILE_BEGIN);
			return new FileInputStream(fd);
		}
	}

	/**
	 * Create the specified File for write access, ensuring that it does not exist.
	 * 
	 * @param f
	 *            the file that we want to create
	 * @param permissions
	 *            we want to have on the file (if security is enabled)
	 *
	 * @throws AlreadyExistsException
	 *             if the file already exists
	 * @throws IOException
	 *             if any other error occurred
	 */
	public static FileOutputStream getCreateForWriteFileOutputStream(File f, int permissions) throws IOException {
		if (!Shell.WINDOWS) {
			// Use the native wrapper around open(2)
			try {
				FileDescriptor fd = NativeIO.POSIX.open(f.getAbsolutePath(),
						NativeIO.POSIX.O_WRONLY | NativeIO.POSIX.O_CREAT | NativeIO.POSIX.O_EXCL, permissions);
				return new FileOutputStream(fd);
			} catch (NativeIOException nioe) {
				if (nioe.getErrno() == Errno.EEXIST) {
					throw new AlreadyExistsException(nioe);
				}
				throw nioe;
			}
		} else {
			// Use the Windows native APIs to create equivalent FileOutputStream
			try {
				FileDescriptor fd = NativeIO.Windows.createFile(
						f.getCanonicalPath(), NativeIO.Windows.GENERIC_WRITE, NativeIO.Windows.FILE_SHARE_DELETE
								| NativeIO.Windows.FILE_SHARE_READ | NativeIO.Windows.FILE_SHARE_WRITE,
						NativeIO.Windows.CREATE_NEW);
				NativeIO.POSIX.chmod(f.getCanonicalPath(), permissions);
				return new FileOutputStream(fd);
			} catch (NativeIOException nioe) {
				if (nioe.getErrorCode() == 80) {
					// ERROR_FILE_EXISTS
					// 80 (0x50)
					// The file exists
					throw new AlreadyExistsException(nioe);
				}
				throw nioe;
			}
		}
	}

	private synchronized static void ensureInitialized() {
		if (!initialized) {
			cacheTimeout = new Configuration().getLong("hadoop.security.uid.cache.secs", 4 * 60 * 60) * 1000;
			LOG.info("Initialized cache for UID to User mapping with a cache" + " timeout of " + cacheTimeout / 1000
					+ " seconds.");
			initialized = true;
		}
	}

	/**
	 * A version of renameTo that throws a descriptive exception when it fails.
	 *
	 * @param src
	 *            The source path
	 * @param dst
	 *            The destination path
	 * 
	 * @throws NativeIOException
	 *             On failure.
	 */
	public static void renameTo(File src, File dst) throws IOException {
		if (!nativeLoaded) {
			if (!src.renameTo(dst)) {
				throw new IOException("renameTo(src=" + src + ", dst=" + dst + ") failed.");
			}
		} else {
			renameTo0(src.getAbsolutePath(), dst.getAbsolutePath());
		}
	}

	public static void link(File src, File dst) throws IOException {
		if (!nativeLoaded) {
			HardLink.createHardLink(src, dst);
		} else {
			link0(src.getAbsolutePath(), dst.getAbsolutePath());
		}
	}

	/**
	 * A version of renameTo that throws a descriptive exception when it fails.
	 *
	 * @param src
	 *            The source path
	 * @param dst
	 *            The destination path
	 * 
	 * @throws NativeIOException
	 *             On failure.
	 */
	private static native void renameTo0(String src, String dst) throws NativeIOException;

	private static native void link0(String src, String dst) throws NativeIOException;

	/**
	 * Unbuffered file copy from src to dst without tainting OS buffer cache
	 *
	 * In POSIX platform: It uses FileChannel#transferTo() which internally attempts
	 * unbuffered IO on OS with native sendfile64() support and falls back to
	 * buffered IO otherwise.
	 *
	 * It minimizes the number of FileChannel#transferTo calls by passing the src
	 * file size directly instead of a smaller size as the 3rd parameter. This reduces
	 * the number of sendfile64() system calls when native sendfile64() is supported.
	 * In the two fallback cases where sendfile is not supported,
	 * FileChannel#transferTo already has its own batching of size 8 MB and 8 KB,
	 * respectively.
	 *
	 * In Windows Platform: It uses its own native wrapper of CopyFileEx with
	 * COPY_FILE_NO_BUFFERING flag, which is supported on Windows Server 2008 and
	 * above.
	 *
	 * Ideally, we should use FileChannel#transferTo() across both POSIX and Windows
	 * platform. Unfortunately, the
	 * wrapper(Java_sun_nio_ch_FileChannelImpl_transferTo0) used by
	 * FileChannel#transferTo for unbuffered IO is not implemented on Windows. Based
	 * on OpenJDK 6/7/8 source code, Java_sun_nio_ch_FileChannelImpl_transferTo0 on
	 * Windows simply returns IOS_UNSUPPORTED.
	 *
	 * Note: This simple native wrapper does minimal parameter checking before copy
	 * and consistency check (e.g., size) after copy. It is recommended to use
	 * wrapper function like the Storage#nativeCopyFileUnbuffered() function in
	 * hadoop-hdfs with pre/post copy checks.
	 *
	 * @param src
	 *            The source path
	 * @param dst
	 *            The destination path
	 * @throws IOException
	 */
	public static void copyFileUnbuffered(File src, File dst) throws IOException {
		if (nativeLoaded && Shell.WINDOWS) {
			copyFileUnbuffered0(src.getAbsolutePath(), dst.getAbsolutePath());
		} else {
			FileInputStream fis = null;
			FileOutputStream fos = null;
			FileChannel input = null;
			FileChannel output = null;
			try {
				fis = new FileInputStream(src);
				fos = new FileOutputStream(dst);
				input = fis.getChannel();
				output = fos.getChannel();
				long remaining = input.size();
				long position = 0;
				long transferred = 0;
				while (remaining > 0) {
					transferred = input.transferTo(position, remaining, output);
					remaining -= transferred;
					position += transferred;
				}
			} finally {
				IOUtils.cleanup(LOG, output);
				IOUtils.cleanup(LOG, fos);
				IOUtils.cleanup(LOG, input);
				IOUtils.cleanup(LOG, fis);
			}
		}
	}

	private static native void copyFileUnbuffered0(String src, String dst) throws NativeIOException;
}
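When libhadoop (`hadoop.dll` on Windows) cannot be loaded, `nativeLoaded` stays false and some of the methods above fall back to plain JDK calls. The following is a minimal, hypothetical pure-JDK sketch of the same ideas. It uses the `transferTo` loop from `copyFileUnbuffered`. It imitates the create-only-if-absent behavior of `getCreateForWriteFileOutputStream`, but through `StandardOpenOption.CREATE_NEW` instead of `open(2)` with `O_CREAT | O_EXCL`, and it does not apply permissions. It wraps `File.renameTo` in a descriptive exception, as `renameTo` does. It also models the timestamped UID-to-username cache in `getOwner`. The class `NativeIoFallbackSketch`, its methods, and `TtlNameCache` are illustrative names only and are not Hadoop APIs.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongFunction;

// Hypothetical illustration only: not part of Hadoop's NativeIO API.
public class NativeIoFallbackSketch {

    // Like the non-native branch of copyFileUnbuffered: transferTo may move
    // fewer bytes than requested, so keep calling it until everything is copied.
    public static void copyWithTransferTo(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE,
                     StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long position = 0;
            long remaining = in.size();
            while (remaining > 0) {
                long n = in.transferTo(position, remaining, out);
                if (n <= 0) {
                    break; // source shrank while copying; stop instead of spinning
                }
                position += n;
                remaining -= n;
            }
        }
    }

    // Create-only-if-absent, similar in spirit to open(2) with O_CREAT | O_EXCL:
    // throws FileAlreadyExistsException if f exists. No permissions are set here.
    public static OutputStream createNewForWrite(Path f) throws IOException {
        return Files.newOutputStream(f, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
    }

    // Like renameTo's non-native branch: File.renameTo only returns false on
    // failure, so turn that into an exception naming both paths.
    public static void renameOrThrow(File src, File dst) throws IOException {
        if (!src.renameTo(dst)) {
            throw new IOException("renameTo(src=" + src + ", dst=" + dst + ") failed.");
        }
    }

    // Like uidCache in getOwner: an entry is reused while
    // timestamp + timeoutMillis > now; otherwise the name is looked up again.
    public static final class TtlNameCache {
        private static final class Entry {
            final String name;
            final long timestamp;

            Entry(String name, long timestamp) {
                this.name = name;
                this.timestamp = timestamp;
            }
        }

        private final Map<Long, Entry> cache = new ConcurrentHashMap<>();
        private final long timeoutMillis;
        private final LongFunction<String> lookup; // hypothetical stand-in for the native user-name lookup

        public TtlNameCache(long timeoutMillis, LongFunction<String> lookup) {
            this.timeoutMillis = timeoutMillis;
            this.lookup = lookup;
        }

        // nowMillis is passed in instead of calling System.currentTimeMillis() so expiry is testable.
        public String get(long uid, long nowMillis) {
            Entry e = cache.get(uid);
            if (e != null && e.timestamp + timeoutMillis > nowMillis) {
                return e.name;
            }
            String name = lookup.apply(uid);
            cache.put(uid, new Entry(name, nowMillis));
            return name;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("nativeio-sketch");
        Path src = dir.resolve("src.txt");
        Files.write(src, "hello hadoop".getBytes(StandardCharsets.UTF_8));
        Path copy = dir.resolve("copy.txt");
        copyWithTransferTo(src, copy);
        System.out.println(new String(Files.readAllBytes(copy), StandardCharsets.UTF_8));
        try (OutputStream os = createNewForWrite(copy)) {
            os.write('x');
        } catch (FileAlreadyExistsException e) {
            System.out.println("refused to overwrite: " + e.getFile());
        }
    }
}
```

In this sketch the current time is passed explicitly into `TtlNameCache.get`, which makes expiry deterministic to check. The class above instead reads `System.currentTimeMillis()` inside `getOwner` and uses `hadoop.security.uid.cache.secs` (default 4 hours) as the timeout.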

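Author: xubatian
Link: http://xubatian.cn/Hadoop企业优化/
Copyright notice: Unless otherwise stated, all articles on this blog are original and licensed under CC BY 4.0 CN. Please credit the source when reposting: https://www.xubatian.cn/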