Learning Spark without a teacher is really rough going... a single problem can block me for two days before I crack it. Here's what I got out of these two days.
////////////////////////// Packaging the jar in IDEA
1. Project Structure
2. Artifacts --- + --- JAR --- From modules with dependencies... --- pick the main class --- OK
3. Add these jars to the class path:
/usr/local/scala-2.10.4/lib/scala-swing.jar
/usr/local/scala-2.10.4/lib/scala-library.jar
/usr/local/scala-2.10.4/lib/scala-actors.jar
/usr/local/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar
4. Build
5. Rebuild
///////////////////////////////////////////
java -jar lfspark--.jar lfspark--.jar hdfs://localhost:9000/datatnt/text1.txt hdfs://localhost:9000/outputtnt/
/////////////////////////////////////////////
Inside the jar, the input and output paths are set as:
sc.textFile("hdfs://localhost:9000/datatnt/text1.txt")
saveAsTextFile("hdfs://localhost:9000/outputtnt/")
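The run log below references paixu1.scala (a textFile at line 26, a count at line 33, a saveAsTextFile at line 35). A minimal sketch of what such a program might look like follows; the object name, the argument layout (jar path, input path, output path), the `local[2]` master, and the sorting transformation are all assumptions for illustration, not the author's actual code:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical reconstruction of paixu1.scala ("paixu" = sorting).
// Assumed argument layout, matching the java -jar invocation above:
//   args(0): path of this jar, args(1): HDFS input, args(2): HDFS output.
object paixu1 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("paixu1").setMaster("local[2]")
    val sc = new SparkContext(conf)
    sc.addJar(args(0))               // explains why the jar name appears again
                                     // ("Added JAR lfspark--.jar" in the log)

    val lines = sc.textFile(args(1)) // textFile at paixu1.scala:26
    println(lines.count())           // count at paixu1.scala:33 (the "5" in the log)

    // The actual transformation is a guess -- here: sort numeric lines.
    lines.map(_.trim.toInt)
         .sortBy(identity)
         .map(_.toString)
         .saveAsTextFile(args(2))    // saveAsTextFile at paixu1.scala:35
    sc.stop()
  }
}
```

Running this needs a Spark installation on the classpath (here the spark-assembly jar bundled into the artifact), which is why step 3 above matters.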
///////////////////////////////////////
If the HDFS output directory already exists, saveAsTextFile refuses to write. Fix: delete it first:
bin/hadoop fs -rmr /output
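On Hadoop 2.x, `-rmr` still works but is deprecated in favor of `-rm -r`. The equivalent modern form, using this post's output directory (requires a running HDFS):

```shell
# Recursively delete the existing output directory (modern syntax).
bin/hadoop fs -rm -r /outputtnt
# Add -skipTrash to free the space immediately instead of moving to .Trash.
bin/hadoop fs -rm -r -skipTrash /outputtnt
```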
///////////////////////////////////////
java -jar lfspark--.jar lfspark--.jar hdfs://localhost:9000/datatnt/text1.txt hdfs://localhost:9000/outputtnt/
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/03/30 23:27:32 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1;
using 192.168.30.129 instead (on interface ens33)
15/03/30 23:27:32 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/03/30 23:27:38 INFO SecurityManager: Changing view acls to: root
15/03/30 23:27:38 INFO SecurityManager: Changing modify acls to: root
15/03/30 23:27:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with
view permissions: Set(root); users with modify permissions: Set(root)
15/03/30 23:27:40 INFO Slf4jLogger: Slf4jLogger started
15/03/30 23:27:41 INFO Remoting: Starting remoting
15/03/30 23:27:42 INFO Remoting: Remoting started; listening on addresses :
[akka.tcp://sparkDriver@192.168.30.129:51013]
15/03/30 23:27:42 INFO Utils: Successfully started service 'sparkDriver' on port 51013.
15/03/30 23:27:42 INFO SparkEnv: Registering MapOutputTracker
15/03/30 23:27:42 INFO SparkEnv: Registering BlockManagerMaster
15/03/30 23:27:42 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150330232742-e238
15/03/30 23:27:42 INFO MemoryStore: MemoryStore started with capacity 129.5 MB
15/03/30 23:27:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using
builtin-java classes where applicable
15/03/30 23:27:45 INFO HttpFileServer: HTTP File server directory is /tmp/spark-e1e41ea5-ffa7-42a0-937f-
8edb39cc7cf0
15/03/30 23:27:45 INFO HttpServer: Starting HTTP Server
15/03/30 23:27:45 INFO Utils: Successfully started service 'HTTP file server' on port 58859.
15/03/30 23:27:46 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/03/30 23:27:46 INFO SparkUI: Started SparkUI at http://192.168.30.129:4040
15/03/30 23:27:46 INFO SparkContext: Added JAR lfspark--.jar at http://192.168.30.129:58859/jars/lfspark--.jar
with timestamp 1427729266877
15/03/30 23:27:47 INFO AkkaUtils: Connecting to HeartbeatReceiver:
akka.tcp://sparkDriver@192.168.30.129:51013/user/HeartbeatReceiver
15/03/30 23:27:49 INFO NettyBlockTransferService: Server created on 42172
15/03/30 23:27:49 INFO BlockManagerMaster: Trying to register BlockManager
15/03/30 23:27:49 INFO BlockManagerMasterActor: Registering block manager localhost:42172 with 129.5 MB RAM,
BlockManagerId(<driver>, localhost, 42172)
15/03/30 23:27:49 INFO BlockManagerMaster: Registered BlockManager
15/03/30 23:27:51 INFO MemoryStore: ensureFreeSpace(163705) called with curMem=0, maxMem=135753891
15/03/30 23:27:51 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 159.9 KB, free
129.3 MB)
15/03/30 23:27:51 INFO MemoryStore: ensureFreeSpace(22692) called with curMem=163705, maxMem=135753891
15/03/30 23:27:51 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.2 KB,
free 129.3 MB)
15/03/30 23:27:51 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:42172 (size: 22.2 KB,
free: 129.4 MB)
15/03/30 23:27:51 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/03/30 23:27:51 INFO SparkContext: Created broadcast 0 from textFile at paixu1.scala:26
15/03/30 23:27:54 INFO FileInputFormat: Total input paths to process : 1
15/03/30 23:27:55 INFO SparkContext: Starting job: count at paixu1.scala:33
15/03/30 23:27:55 INFO DAGScheduler: Got job 0 (count at paixu1.scala:33) with 2 output partitions
(allowLocal=false)
15/03/30 23:27:55 INFO DAGScheduler: Final stage: Stage 0(count at paixu1.scala:33)
15/03/30 23:27:55 INFO DAGScheduler: Parents of final stage: List()
15/03/30 23:27:55 INFO DAGScheduler: Missing parents: List()
15/03/30 23:27:55 INFO DAGScheduler: Submitting Stage 0 (hdfs://localhost:9000/datatnt/text1.txt MappedRDD[1]
at textFile at paixu1.scala:26), which has no missing parents
15/03/30 23:27:55 INFO MemoryStore: ensureFreeSpace(2544) called with curMem=186397, maxMem=135753891
15/03/30 23:27:55 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.5 KB, free
129.3 MB)
15/03/30 23:27:55 INFO MemoryStore: ensureFreeSpace(1892) called with curMem=188941, maxMem=135753891
15/03/30 23:27:55 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1892.0
B, free 129.3 MB)
15/03/30 23:27:55 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:42172 (size: 1892.0 B,
free: 129.4 MB)
15/03/30 23:27:55 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/03/30 23:27:55 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:838
15/03/30 23:27:55 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0
(hdfs://localhost:9000/datatnt/text1.txt MappedRDD[1] at textFile at paixu1.scala:26)
15/03/30 23:27:55 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/03/30 23:27:55 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1359
bytes)
15/03/30 23:27:55 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1359
bytes)
15/03/30 23:27:55 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/03/30 23:27:55 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
15/03/30 23:27:55 INFO Executor: Fetching http://192.168.30.129:58859/jars/lfspark--.jar with timestamp
1427729266877
15/03/30 23:27:56 INFO Utils: Fetching http://192.168.30.129:58859/jars/lfspark--.jar to
/tmp/fetchFileTemp1295935324425756912.tmp
15/03/30 23:27:57 INFO Executor: Adding file:/tmp/spark-c3800e4e-dd19-417a-83bc-674b999ccaee/lfspark--.jar to
class loader
15/03/30 23:27:57 INFO HadoopRDD: Input split: hdfs://localhost:9000/datatnt/text1.txt:0+56
15/03/30 23:27:57 INFO HadoopRDD: Input split: hdfs://localhost:9000/datatnt/text1.txt:56+56
15/03/30 23:27:57 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
15/03/30 23:27:57 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
15/03/30 23:27:57 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
15/03/30 23:27:57 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/03/30 23:27:57 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/03/30 23:28:02 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1920 bytes result sent to driver
15/03/30 23:28:02 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 1920 bytes result sent to driver
15/03/30 23:28:02 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 6725 ms on localhost (1/2)
15/03/30 23:28:02 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 6720 ms on localhost (2/2)
15/03/30 23:28:02 INFO DAGScheduler: Stage 0 (count at paixu1.scala:33) finished in 6.959 s
15/03/30 23:28:02 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/03/30 23:28:02 INFO DAGScheduler: Job 0 finished: count at paixu1.scala:33, took 7.615616 s
5
15/03/30 23:28:05 INFO SparkContext: Starting job: saveAsTextFile at paixu1.scala:35
15/03/30 23:28:05 INFO DAGScheduler: Got job 1 (saveAsTextFile at paixu1.scala:35) with 2 output partitions
(allowLocal=false)
15/03/30 23:28:05 INFO DAGScheduler: Final stage: Stage 1(saveAsTextFile at paixu1.scala:35)
15/03/30 23:28:05 INFO DAGScheduler: Parents of final stage: List()
15/03/30 23:28:05 INFO DAGScheduler: Missing parents: List()
15/03/30 23:28:05 INFO DAGScheduler: Submitting Stage 1 (MappedRDD[5] at saveAsTextFile at paixu1.scala:35),
which has no missing parents
15/03/30 23:28:05 INFO MemoryStore: ensureFreeSpace(112944) called with curMem=190833, maxMem=135753891
15/03/30 23:28:05 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 110.3 KB, free
129.2 MB)
15/03/30 23:28:06 INFO MemoryStore: ensureFreeSpace(67292) called with curMem=303777, maxMem=135753891
15/03/30 23:28:06 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 65.7 KB,
free 129.1 MB)
15/03/30 23:28:06 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:42172 (size: 65.7 KB,
free: 129.4 MB)
15/03/30 23:28:06 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
15/03/30 23:28:06 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:838
15/03/30 23:28:06 INFO DAGScheduler: Submitting 2 missing tasks from Stage 1 (MappedRDD[5] at saveAsTextFile at
paixu1.scala:35)
15/03/30 23:28:06 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
15/03/30 23:28:06 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, PROCESS_LOCAL, 1359
bytes)
15/03/30 23:28:06 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, PROCESS_LOCAL, 1359
bytes)
15/03/30 23:28:06 INFO Executor: Running task 0.0 in stage 1.0 (TID 2)
15/03/30 23:28:06 INFO Executor: Running task 1.0 in stage 1.0 (TID 3)
15/03/30 23:28:07 INFO BlockManager: Removing broadcast 1
15/03/30 23:28:07 INFO BlockManager: Removing block broadcast_1_piece0
15/03/30 23:28:07 INFO MemoryStore: Block broadcast_1_piece0 of size 1892 dropped from memory (free 135384714)
15/03/30 23:28:07 INFO BlockManagerInfo: Removed broadcast_1_piece0 on localhost:42172 in memory (size: 1892.0
B, free: 129.4 MB)
15/03/30 23:28:07 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/03/30 23:28:07 INFO BlockManager: Removing block broadcast_1
15/03/30 23:28:07 INFO MemoryStore: Block broadcast_1 of size 2544 dropped from memory (free 135387258)
15/03/30 23:28:07 INFO ContextCleaner: Cleaned broadcast 1
15/03/30 23:28:07 INFO HadoopRDD: Input split: hdfs://localhost:9000/datatnt/text1.txt:56+56
15/03/30 23:28:07 INFO HadoopRDD: Input split: hdfs://localhost:9000/datatnt/text1.txt:0+56
15/03/30 23:28:09 INFO FileOutputCommitter: Saved output of task 'attempt_201503302328_0001_m_000001_3' to
hdfs://localhost:9000/outputtnt/_temporary/0/task_201503302328_0001_m_000001
15/03/30 23:28:09 INFO SparkHadoopWriter: attempt_201503302328_0001_m_000001_3: Committed
15/03/30 23:28:09 INFO Executor: Finished task 1.0 in stage 1.0 (TID 3). 1719 bytes result sent to driver
15/03/30 23:28:09 INFO FileOutputCommitter: Saved output of task 'attempt_201503302328_0001_m_000000_2' to
hdfs://localhost:9000/outputtnt/_temporary/0/task_201503302328_0001_m_000000
15/03/30 23:28:09 INFO SparkHadoopWriter: attempt_201503302328_0001_m_000000_2: Committed
15/03/30 23:28:09 INFO Executor: Finished task 0.0 in stage 1.0 (TID 2). 1719 bytes result sent to driver
15/03/30 23:28:10 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 3874 ms on localhost (1/2)
15/03/30 23:28:10 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 3889 ms on localhost (2/2)
15/03/30 23:28:10 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
15/03/30 23:28:10 INFO DAGScheduler: Stage 1 (saveAsTextFile at paixu1.scala:35) finished in 3.892 s
15/03/30 23:28:10 INFO DAGScheduler: Job 1 finished: saveAsTextFile at paixu1.scala:35, took 4.521909 s
/////////////////////////////////////////
Inspect the generated output (note: Spark's saveAsTextFile uses the old mapred API, so the files are named part-00000 style, not part-r-00000):
hadoop fs -text /outputtnt/part-00000
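Before picking a single part file to read, it can help to list everything saveAsTextFile produced, or to dump all partitions at once (paths assume this post's setup and a running HDFS):

```shell
# List the output: a _SUCCESS marker plus one part file per partition
# (two here, since the job ran with 2 output partitions).
hadoop fs -ls /outputtnt/
# Concatenate all partitions in one go.
hadoop fs -cat /outputtnt/part-*
```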
For the places where things went wrong, I tested every possible cause I could think of... and finally arrived at the fixes above:
how to write a simple example in IDEA and correctly generate a jar, which key class paths to add, how to run the jar against the Spark setup, and how to view the generated files.
These two days I feel like I crawled my way through... the problems were simple, but I took far too many detours. With these solved I can finally try writing some small programs (hopefully).