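In YARN, status updates for yarn.nodemanager.local-dirs are handled by LocalDirsHandlerService (org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService). When the NodeManager (NM) starts, it starts a LocalDirsHandlerService that periodically checks whether the directories listed in yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs are usable. Under the hood it is a service thread built on java.util.Timer and java.util.TimerTask.
MonitoringTimerTask, an inner class of LocalDirsHandlerService, extends TimerTask.
The MonitoringTimerTask constructor does the initialization: it reads the configured yarn.nodemanager.log-dirs and yarn.nodemanager.local-dirs and sets up the valid local paths.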
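To make the Timer/TimerTask pattern concrete, here is a minimal sketch of a periodic directory check. It is not the Hadoop implementation: the class name `DirMonitorSketch`, the helper `usableDirs`, the one-second period, and the read/write/execute test are all assumptions chosen for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical stand-in for a monitoring task; not the Hadoop class.
class DirMonitorSketch {

    // Returns the configured dirs that exist and are readable, writable and executable.
    // (Assumed criteria; the real service delegates the check to DirectoryCollection.)
    static List<String> usableDirs(List<String> dirs) {
        List<String> good = new ArrayList<>();
        for (String d : dirs) {
            File f = new File(d);
            if (f.isDirectory() && f.canRead() && f.canWrite() && f.canExecute()) {
                good.add(d);
            }
        }
        return good;
    }

    public static void main(String[] args) throws InterruptedException {
        final List<String> configured =
            Arrays.asList(System.getProperty("java.io.tmpdir"), "/no/such/dir");

        // A daemon Timer runs the task on its own background thread.
        Timer timer = new Timer("dir-monitor-sketch", true);
        TimerTask task = new TimerTask() {
            @Override
            public void run() {
                System.out.println("usable dirs: " + usableDirs(configured));
            }
        };

        // First run immediately, then every second (the NM default interval is 2 minutes).
        timer.schedule(task, 0L, 1000L);
        Thread.sleep(2500L);
        timer.cancel();
    }
}
```

The task itself holds no scheduling logic; the Timer decides when `run()` is called, which is why the check interval can be a plain configuration value.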
Commonly used parameters for this thread:
    YarnConfiguration.NM_DISK_HEALTH_CHECK_INTERVAL_MS
    // yarn.nodemanager.disk-health-checker.interval-ms, default 2 minutes
    YarnConfiguration.NM_DISK_HEALTH_CHECK_ENABLE
    // yarn.nodemanager.disk-health-checker.enable, enabled by default
    YarnConfiguration.NM_MIN_HEALTHY_DISKS_FRACTION
    // yarn.nodemanager.disk-health-checker.min-healthy-disks, default 0.25, i.e. at least 1/4 of the configured paths must be healthy
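The following sketch shows how these three settings could be resolved with the defaults listed above. To keep it runnable without Hadoop on the classpath, `java.util.Properties` stands in for Hadoop's `Configuration`; the class `DiskCheckerSettingsSketch` and its field names are hypothetical.

```java
import java.util.Properties;

// Hypothetical holder; java.util.Properties is a stand-in for Hadoop's Configuration.
class DiskCheckerSettingsSketch {
    final long intervalMs;
    final boolean enabled;
    final float minHealthyDisksFraction;

    DiskCheckerSettingsSketch(Properties conf) {
        // Defaults mirror the values quoted in the text: 2 minutes, enabled, 0.25.
        intervalMs = Long.parseLong(conf.getProperty(
            "yarn.nodemanager.disk-health-checker.interval-ms", "120000"));
        enabled = Boolean.parseBoolean(conf.getProperty(
            "yarn.nodemanager.disk-health-checker.enable", "true"));
        minHealthyDisksFraction = Float.parseFloat(conf.getProperty(
            "yarn.nodemanager.disk-health-checker.min-healthy-disks", "0.25"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Override one setting; the other two fall back to their defaults.
        conf.setProperty("yarn.nodemanager.disk-health-checker.min-healthy-disks", "0.5");
        DiskCheckerSettingsSketch s = new DiskCheckerSettingsSketch(conf);
        System.out.println(s.intervalMs + " " + s.enabled + " " + s.minHealthyDisksFraction);
    }
}
```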
In CDH 4.6.0, the MonitoringTimerTask constructor looks like this:
    public MonitoringTimerTask(Configuration conf) throws YarnException {
      localDirs = new DirectoryCollection(
          validatePaths(conf.getTrimmedStrings(YarnConfiguration.NM_LOCAL_DIRS)));
      logDirs = new DirectoryCollection(
          validatePaths(conf.getTrimmedStrings(YarnConfiguration.NM_LOG_DIRS)));
      localDirsAllocator = new LocalDirAllocator(
          YarnConfiguration.NM_LOCAL_DIRS);
      logDirsAllocator = new LocalDirAllocator(YarnConfiguration.NM_LOG_DIRS);
    }
In CDH 5.2.0, the constructor reads two additional configuration options:
    YarnConfiguration.NM_MAX_PER_DISK_UTILIZATION_PERCENTAGE
    // percentage of disk that can be used before the dir is taken out of the good dirs list
    // yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage, default 100 (lower this below 100, e.g. to 80; otherwise disks fill up easily)
    YarnConfiguration.NM_MIN_PER_DISK_FREE_SPACE_MB
    // minimum space, in MB, that must be available on the disk for the dir to be marked as good
    // yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb, default 0 MB
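The initial availability check of the local dirs takes both settings into account: they are passed to DirectoryCollection together with the paths returned by validatePaths.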
    public MonitoringTimerTask(Configuration conf) throws YarnRuntimeException {
      float maxUsableSpacePercentagePerDisk =
          conf.getFloat(
            YarnConfiguration.NM_MAX_PER_DISK_UTILIZATION_PERCENTAGE,
            YarnConfiguration.DEFAULT_NM_MAX_PER_DISK_UTILIZATION_PERCENTAGE);
      long minFreeSpacePerDiskMB =
          conf.getLong(YarnConfiguration.NM_MIN_PER_DISK_FREE_SPACE_MB,
            YarnConfiguration.DEFAULT_NM_MIN_PER_DISK_FREE_SPACE_MB);
      localDirs =
          new DirectoryCollection(
            validatePaths(conf
              .getTrimmedStrings(YarnConfiguration.NM_LOCAL_DIRS)),
            maxUsableSpacePercentagePerDisk, minFreeSpacePerDiskMB);
      logDirs =
          new DirectoryCollection(
            validatePaths(conf.getTrimmedStrings(YarnConfiguration.NM_LOG_DIRS)),
            maxUsableSpacePercentagePerDisk, minFreeSpacePerDiskMB);
      localDirsAllocator = new LocalDirAllocator(
          YarnConfiguration.NM_LOCAL_DIRS);
      logDirsAllocator = new LocalDirAllocator(YarnConfiguration.NM_LOG_DIRS);
    }
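The effect of the two per-disk thresholds can be sketched as follows. This is an illustration under stated assumptions, not the DirectoryCollection code: the class `PerDiskCheckSketch`, the method `isGood`, and the exact comparison operators are hypothetical. It does show why the default of 100 is risky: a completely full disk still passes.

```java
import java.io.File;

// Hypothetical per-disk check; names and comparison operators are assumptions.
class PerDiskCheckSketch {

    // A dir counts as good when its disk is not over the utilization cutoff
    // and still has at least minFreeSpaceMB available.
    static boolean isGood(long totalBytes, long usableBytes,
                          float maxUtilizationPercent, long minFreeSpaceMB) {
        if (totalBytes <= 0) {
            return false; // no capacity information, treat as unusable
        }
        float usedPercent = 100.0f * (totalBytes - usableBytes) / totalBytes;
        long freeMB = usableBytes / (1024L * 1024L);
        return usedPercent <= maxUtilizationPercent && freeMB >= minFreeSpaceMB;
    }

    public static void main(String[] args) {
        File dir = new File(System.getProperty("java.io.tmpdir"));
        long total = dir.getTotalSpace();
        long usable = dir.getUsableSpace();

        // Defaults from the text: 100 percent and 0 MB.
        System.out.println("with defaults: " + isGood(total, usable, 100.0f, 0L));
        // The stricter setting suggested in the text: 80 percent.
        System.out.println("with 80% cutoff: " + isGood(total, usable, 80.0f, 0L));
        // A full disk passes under the defaults.
        System.out.println("full disk, defaults: " + isGood(1000L, 0L, 100.0f, 0L));
    }
}
```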
The local-dirs monitoring thread tests the availability of the directories at a fixed interval. The call chain is:
    checkDirs -> updateDirsAfterFailure -> areDisksHealthy
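The health decision is based on the ratio of good directories to configured directories. When the share of healthy directories in yarn.nodemanager.local-dirs or yarn.nodemanager.log-dirs drops below yarn.nodemanager.disk-health-checker.min-healthy-disks, the disk check fails and the NM throws an exception.
The actual logic is in the areDisksHealthy method: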
    public boolean areDisksHealthy() {
      if (!isDiskHealthCheckerEnabled) { // skip the check when the disk health checker is disabled
        return true;
      }
      int goodDirs = getLocalDirs().size();
      int failedDirs = localDirs.getFailedDirs().size();
      int totalConfiguredDirs = goodDirs + failedDirs;
      if (goodDirs/(float)totalConfiguredDirs < minNeededHealthyDisksFactor) { // ratio check for yarn.nodemanager.local-dirs
        return false; // Not enough healthy local-dirs
      }
      goodDirs = getLogDirs().size();
      failedDirs = logDirs.getFailedDirs().size();
      totalConfiguredDirs = goodDirs + failedDirs;
      if (goodDirs/(float)totalConfiguredDirs < minNeededHealthyDisksFactor) { // ratio check for yarn.nodemanager.log-dirs
        return false; // Not enough healthy log-dirs
      }
      return true;
    }
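As a worked example of the ratio test, the sketch below isolates the comparison into a standalone helper (`HealthyRatioSketch.enoughHealthy`, a hypothetical name) and evaluates it for four configured directories with the default factor of 0.25. Because the comparison is a strict `<`, a node with exactly one good directory out of four still counts as healthy; only zero good directories fails.

```java
// Hypothetical helper that mirrors the ratio comparison used in areDisksHealthy.
class HealthyRatioSketch {

    // Same expression as in the method above: unhealthy when good/total < minFraction.
    static boolean enoughHealthy(int goodDirs, int failedDirs, float minFraction) {
        int totalConfiguredDirs = goodDirs + failedDirs;
        return !(goodDirs / (float) totalConfiguredDirs < minFraction);
    }

    public static void main(String[] args) {
        int configured = 4;
        float minFraction = 0.25f; // default of min-healthy-disks per the text
        for (int good = configured; good >= 0; good--) {
            int failed = configured - good;
            System.out.println(good + " good / " + failed + " failed -> healthy="
                + enoughHealthy(good, failed, minFraction));
        }
    }
}
```

Raising the factor to 0.5 in this setup would mark the node unhealthy as soon as three of the four directories have failed.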