If your training data lives on Hadoop, Java is probably the best language for your data preprocessing. To generate TFRecord files on Hadoop, you need three things:
protoc, at the same version as the protobuf library used on Hadoop
the TensorFlow source code
the TFRecordFileWriter class below
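For orientation, here is roughly what a TFRecord writer has to produce for each record: an 8-byte little-endian length, a masked CRC32-C of those length bytes, the payload, and a masked CRC32-C of the payload. The following is only a minimal sketch of that framing, not the TFRecordFileWriter class itself. It assumes Java 9 or later so that `java.util.zip.CRC32C` is available (an older Hadoop JVM would need a bundled CRC32-C implementation instead), and the class and method names `TFRecordFramingSketch`, `maskedCrc32c`, and `frame` are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32C;

// Hypothetical illustration class; not part of TensorFlow or of the class below.
public class TFRecordFramingSketch {
    // Constant added after rotating the checksum, as in the TFRecord format.
    private static final int MASK_DELTA = 0xa282ead8;

    // Masked CRC32-C: rotate the 32-bit checksum right by 15 bits, then add MASK_DELTA.
    static int maskedCrc32c(byte[] data) {
        CRC32C crc = new CRC32C(); // assumption: Java 9+ standard library
        crc.update(data, 0, data.length);
        int value = (int) crc.getValue();
        return ((value >>> 15) | (value << 17)) + MASK_DELTA;
    }

    // Frames one payload as: length (uint64 LE), masked CRC of length,
    // payload bytes, masked CRC of payload.
    static byte[] frame(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(8 + 4 + payload.length + 4)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(payload.length);
        byte[] lengthBytes = Arrays.copyOfRange(buf.array(), 0, 8);
        buf.putInt(maskedCrc32c(lengthBytes));
        buf.put(payload);
        buf.putInt(maskedCrc32c(payload));
        return buf.array();
    }

    public static void main(String[] args) {
        // In real use the payload would be a serialized Example message.
        byte[] record = frame("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(record.length); // 8 + 4 + 5 + 4 = 21
    }
}
```

In practice the payload passed to `frame` would be the bytes of a serialized `org.tensorflow.example.Example` message produced by the generated protobuf classes.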
First, run the following command to generate the Java files for the protobuf messages:
protoc --proto_path=C:\tensorflow --java_out=. C:\tensorflow\tensorflow\core\example\example.proto
Then add the following class to your project:
package com.bing.imagetool;
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import org.tensorflow.example.BytesList;
import org.tensorflow.example.Feature;
import org.tensorflow.example.FloatList;
import org.tensorflow.example.Int64List;
import com.google.protobuf.ByteString;
public class TFRecordFileWriter implements Closeable {
/**
* Implements CRC32-C as defined in: "Optimization of Cyclic Redundancy-Check Codes with 24 and 32
* Parity Bits", IEEE Transactions on Communications 41(6): 883-892 (1993).
*
* The implementation of this class has been sourced from the Appe…