A small bug in Spark 1.1 when downloading jar files

When running Spark on YARN, I kept hitting an error like this:

    java.io.FileNotFoundException: D:\data\yarnnm\local\usercache\chasun\appcache\application_1434537262045_0033\container_1434537262045_0033_01_000054\netlib-native_ref-win-x86_64-1.1-natives.jar (Access is denied)
      at java.io.FileOutputStream.open(Native Method)
      at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
      at org.spark-project.guava.common.io.Files$FileByteSink.openStream(Files.java:223)
      at org.spark-project.guava.common.io.Files$FileByteSink.openStream(Files.java:211)
      at org.spark-project.guava.common.io.ByteSource.copyTo(ByteSource.java:203)
      at org.spark-project.guava.common.io.Files.copy(Files.java:436)
      at org.spark-project.guava.common.io.Files.move(Files.java:651)
      at org.apache.spark.util.Utils$.fetchFile(Utils.scala:437)
      at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:372)
      at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:370)
      at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
      at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
      at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
      at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
      at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
      at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
      at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
      at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:370)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:166)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:724)

It took me two or three hours to track down the cause.

The original fetch-file code (in Utils.fetchFile) looked like this:

    if (targetFile.exists && !Files.equal(cachedFile, targetFile)) {
      if (conf.getBoolean("spark.files.overwrite", false)) {
        targetFile.delete()
        logInfo((s"File $targetFile exists and does not match contents of $url, " +
          s"replacing it with $url"))
      } else {
        throw new SparkException(s"File $targetFile exists and does not match contents of $url")
      }
    }
    Files.copy(cachedFile, targetFile)

Files.equal here compares the two files byte by byte and returns true if they are identical. But consider this case: the target file exists and its contents really do match cachedFile, yet it is only a symbolic link pointing to a system-wide, read-only file. Since the files are equal, the if block is skipped, and the unconditional Files.copy onto targetFile then fails with "Access is denied". Oddly, the exception raised is a FileNotFoundException.
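
To make the failure mode concrete, here is a minimal, hypothetical reproduction (plain Scala with stock Guava rather than the org.spark-project shaded copy, on a POSIX file system; all names are made up): the target is a symlink to a read-only file standing in for the public cache, Files.equal says the contents match, and the unconditional copy still tries to open the link's target for writing.

    // Hypothetical reproduction of the failure mode; not Spark code.
    // Assumes a POSIX file system so the permissions and the symlink behave as on a YARN node.
    import java.nio.file.{Files => NioFiles}
    import java.nio.file.attribute.PosixFilePermissions
    import com.google.common.io.{Files => GuavaFiles}

    object SymlinkCopyDemo {
      def main(args: Array[String]): Unit = {
        val dir = NioFiles.createTempDirectory("demo")

        // Stand-in for a jar in YARN's public file cache: readable by everyone, writable by no one.
        val cached = dir.resolve("cached.jar")
        NioFiles.write(cached, "jar bytes".getBytes("UTF-8"))
        NioFiles.setPosixFilePermissions(cached, PosixFilePermissions.fromString("r--r--r--"))

        // The container's working directory only gets a symlink to the cached copy.
        val target = dir.resolve("app.jar")
        NioFiles.createSymbolicLink(target, cached)

        // Same check the old code did: byte-for-byte equal, so nothing gets deleted...
        println(GuavaFiles.equal(cached.toFile, target.toFile)) // true

        // ...but the unconditional copy opens the (read-only) link target for writing and throws
        // FileNotFoundException with "Permission denied" / "Access is denied" in the message.
        GuavaFiles.copy(cached.toFile, target.toFile)
      }
    }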

This situation arises because YARN keeps a public file cache (the Public Localizer described in the article linked below): when a jar is already in that cache, it is symlinked into the container's directory rather than copied.

For the details, see: http://hortonworks.com/blog/resource-localization-in-yarn-deep-dive/

This bug was fixed in commit https://github.com/apache/spark/commit/7981f969762e77f1752ef8f86c546d4fc32a1a4f.
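
The essence of the fix, as far as I can tell: if the target file already exists with identical contents, reuse it and skip the copy entirely, so a read-only symlink from the public cache is never opened for writing. A rough sketch of that flow (not the actual code from the commit; SparkException replaced with a plain exception to keep it self-contained):

    // Rough sketch of the corrected flow; not the actual code from the commit.
    import java.io.File
    import com.google.common.io.{Files => GuavaFiles}

    def installFile(cachedFile: File, targetFile: File, url: String, overwrite: Boolean): Unit = {
      if (targetFile.exists) {
        if (GuavaFiles.equal(cachedFile, targetFile)) {
          // Already present with identical bytes (e.g. a symlink into YARN's public cache):
          // leave it alone instead of trying to overwrite a possibly read-only file.
          return
        } else if (!overwrite) {
          throw new IllegalStateException(
            s"File $targetFile exists and does not match contents of $url")
        } else {
          targetFile.delete()
        }
      }
      GuavaFiles.copy(cachedFile, targetFile)
    }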
