Reading a single HDFS file with the Java API

Posted by an anonymous user, 2021-5-29 09:59

The single file on HDFS:

-bash-3.2$ hadoop fs -ls /user/pms/ouyangyewei/data/input/combineorder/repeat_rec_category
Found 1 items
-rw-r--r--   2 deploy supergroup        520 2014-08-14 17:03 /user/pms/ouyangyewei/data/input/combineorder/repeat_rec_category/repeatRecCategory.txt
File contents:

-bash-3.2$ hadoop fs -cat /user/pms/ouyangyewei/data/input/combineorder/repeat_rec_category/repeatRecCategory.txt | more
8104
960985
5472
971917
5320
971895
971902
971922
958261
972047
972050

Reading a single HDFS file through the Java API's FileSystem class

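import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Returns the categories that may be recommended repeatedly, separated by commas.
 * Only the first tab-separated column of each line is used, and every entry
 * (including the last one) is followed by a comma.
 *
 * @param filePath path of the file on HDFS
 * @return the comma-separated categories; after an I/O error, whatever was read before it
 */
public String getRepeatRecCategoryStr(String filePath) {
 final String DELIMITER = "\t";
 final String INNER_DELIMITER = ",";

 // StringBuilder avoids creating a new String on every loop iteration.
 StringBuilder categoryFilterStrs = new StringBuilder();
 try {
  // By default FileSystem.get returns a cached, shared instance, so it is not closed here.
  FileSystem fs = FileSystem.get(new Configuration());
  // try-with-resources closes the reader, which also closes the HDFS input stream.
  // The file is assumed to be UTF-8 encoded.
  try (BufferedReader br = new BufferedReader(
    new InputStreamReader(fs.open(new Path(filePath)), StandardCharsets.UTF_8))) {
   String line;
   while ((line = br.readLine()) != null) {
    String[] strs = line.split(DELIMITER);
    categoryFilterStrs.append(strs[0]).append(INNER_DELIMITER);
   }
  }
 } catch (IOException e) {
  e.printStackTrace();
 }

 return categoryFilterStrs.toString();
}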
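
The line-handling part of the method above does not depend on HDFS, so it can be tried out locally without a cluster. The following is only a minimal sketch under a few assumptions: the class name CategoryJoiner and the method name joinFirstColumns are hypothetical and not part of the original post, the input is passed in as a plain java.io.Reader instead of being opened through FileSystem, blank lines are skipped, and a StringJoiner is used so that the result has no trailing comma (the method above does append one after the last entry).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.StringJoiner;

// Hypothetical helper, not from the original post: joins the first
// tab-separated column of every non-empty line with commas.
class CategoryJoiner {
    private static final String DELIMITER = "\t";
    private static final String INNER_DELIMITER = ",";

    static String joinFirstColumns(Reader source) throws IOException {
        StringJoiner joined = new StringJoiner(INNER_DELIMITER);
        // try-with-resources closes the reader (and the wrapped source) on exit.
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                // A limit of 2 keeps the first column even when the line is only a tab.
                String first = line.split(DELIMITER, 2)[0];
                if (first.isEmpty()) {
                    continue; // assumption: blank entries are not wanted in the result
                }
                joined.add(first);
            }
        }
        return joined.toString();
    }
}
```

To use it against HDFS, the Reader could be an InputStreamReader wrapped around the stream returned by fs.open(new Path(filePath)), as in the method above; for a local check, a java.io.StringReader holding a few lines of the file is enough.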
