spark 中rdd与dataframe的合并(join)

以下是我写的代码:

 /*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// scalastyle:off println
package com.shine.ncc

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.mllib.classification.NaiveBayesModel
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.Time
import org.apache.spark.sql.SQLContext
import org.apache.spark.SparkContext
import org.apache.spark.ml.feature.Tokenizer
import org.ansj.splitWord.analysis.ToAnalysis
import org.ansj.util.FilterModifWord
import java.util.Arrays
import org.apache.spark.mllib.feature.HashingTF
import scala.collection.JavaConversions._
import org.apache.spark.mllib.feature.IDF
import org.apache.spark.mllib.feature.IDFModel
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.util.Bytes

object NetworkNewsClassify1 {
  var sameModel = null 

  /** Case class for converting RDD to DataFrame */
  case class Record(content: String,time:String,title:String)


  /** Lazily instantiated singleton instance of SQLContext */
  object SQLContextSingleton {

    @transient  private var instance: SQLContext = _

    def getInstance(sparkContext: SparkContext): SQLContext = {
      if (instance == null) {
        instance = new SQLContext(sparkContext)
      }
      instance
    }
  }

  def main(args: Array[String]) {
//    if (args.length < 2) {
//      System.err.println("Usage: NetworkWordCount <hostname> <port>")
//      System.exit(1)
//    }

    StreamingExamples.setStreamingLogLevels()

    // Create the context with a 1 second batch size
    val sparkConf = new SparkConf().setAppName("NetworkNewsClassify")
    sparkConf.setMaster("local[2]");
    val ssc = new StreamingContext(sparkConf, Seconds(1))

    // Create a socket stream on target ip:port and count the   获取json信息
    val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)
    val myNaiveBayesModel = NaiveBayesModel.load(ssc.sparkContext, "D:/myNaiveBayesModel")
    //将接送转换成rdd
    lines.foreachRDD((rdd: RDD[String], time: Time) => {
      // Get the singleton instance of SQLContext
      val sqlContext = SQLContextSingleton.getInstance(rdd.sparkContext)
      import sqlContext.implicits._

      val newsDF = sqlContext.read.json(rdd)
      newsDF.count();
      val featurizedData = newsDF.map{
          line => 
            val temp = ToAnalysis.parse(line.getAs("title"))
            //加入停用词 
            FilterModifWord.insertStopWords(Arrays.asList("r","n"))
            //加入停用词性???? 
            FilterModifWord.insertStopNatures("w",null,"ns","r","u","e")
            val filter = FilterModifWord.modifResult(temp)
            //此步骤将会只取分词,不附带词性
            val words = for(i<-Range(0,filter.size())) yield filter.get(i).getName
            //println(words.mkString("  ;  "));
            //计算每个词在文档中的词频
            new HashingTF(500000).transform(words)
      }.cache()
      if(featurizedData.count()>0){
        //计算每个词的TF-IDF
        val idf = new IDF()
        val idfModel = idf.fit(featurizedData)
        val tfidfData = idfModel.transform(featurizedData);
        //分类预测
        val resultData = myNaiveBayesModel.predict(tfidfData)
        println(resultData)

        //将result结果与newsDF信息join在一起
        //**??? 不会实现了。。。**
        //保存新闻到hbase中

      }

    })


    ssc.start()
    ssc.awaitTermination()
  }
}

其中newsDF是新闻信息,包含字段(title,body,date),resultData 是通过贝叶斯模型预测的新闻类型,我现在希望把result结果作为一个type字段与newsDF合并(join),保存到hbase中,这个合并的操作怎么做呢

Csdn user default icon
上传中...
上传图片
插入图片
抄袭、复制答案,以达到刷声望分或其他目的的行为,在CSDN问答是严格禁止的,一经发现立刻封号。是时候展现真正的技术了!
其他相关推荐
如何给rdd/dataframe增加一个自增列?
如题 假设目前有一个dataframe转化过来的rdd a,b,c d,e,f g,h,i 现在我想增加一个自增列 1,a,b,c 2,d,e,f 3,g,h,i dataframe或者rdd形式的都可以 请问大佬们怎么实现?
pyspark sc.textFile 转为的RDD 再生成 dataframe 问题
首先 我用的是spark 1.5.1 所以 网上的 spark.read.csv 等方法对我来说并不适用 这是我的配置 ![图片说明](https://img-ask.csdn.net/upload/202002/05/1580873768_746424.png) 然后 我不能解决的问题是 在pyspark 中 由 sc.textFile转换的RDD 再生成 dataframe 接下来是我的代码: (数据有54列) df=sc.textFile("file:///home/hu/births_train.csv") # 删除了第一行 import pyspark.sql.types as typ lable=[("INFANT_ALIVE_AT_REPORT",typ.StringType()),("BIRTH_YEAR",typ.IntegerType()),\ ("BIRTH_MONTH",typ.IntegerType()),( "BIRTH_PLACE",typ.StringType()),\ ("MOTHER_AGE_YEARS",typ.IntegerType()),( "MOTHER_RACE_6CODE",typ.StringType()),\ ("MOTHER_EDUCATION",typ.StringType()),( "FATHER_COMBINED_AGE",typ.IntegerType()),\ ("FATHER_EDUCATION",typ.StringType()),( "MONTH_PRECARE_RECODE",typ.StringType()),\ ("CIG_BEFORE",typ.StringType()),( "CIG_1_TRI",typ.StringType()),\ ("CIG_2_TRI",typ.StringType()),( "CIG_3_TRI",typ.StringType()),\ ("MOTHER_HEIGHT_IN",typ.StringType()),( "MOTHER_BMI_RECODE",typ.StringType()),\ ("MOTHER_PRE_WEIGHT",typ.IntegerType()),( "MOTHER_DELIVERY_WEIGHT",typ.IntegerType()),\ ("MOTHER_WEIGHT_GAIN",typ.IntegerType()),( "DIABETES_PRE",typ.StringType()),\ ("DIABETES_GEST",typ.StringType()),( "HYP_TENS_PRE",typ.StringType()),\ ("HYP_TENS_GEST",typ.StringType()),( "PREV_BIRTH_PRETERM",typ.StringType()),\ ("NO_RISK",typ.StringType()),( "NO_INFECTIONS_REPORTED",typ.StringType()),\ ("LABOR_IND",typ.StringType()),( "LABOR_AUGM",typ.StringType()),\ ( "STEROIDS",typ.StringType()),( "ANTIBIOTICS",typ.StringType()),\ ( "ANESTHESIA",typ.StringType()),( "DELIV_METHOD_RECODE_COMB",typ.StringType()),\ ( "ATTENDANT_BIRTH",typ.StringType()),( "APGAR_5",typ.StringType()),\ ( "APGAR_5_RECODE",typ.StringType()),( "APGAR_10",typ.StringType()),\ ( "APGAR_10_RECODE",typ.StringType()),( "INFANT_SEX",typ.StringType()),\ ( "OBSTETRIC_GESTATION_WEEKS",typ.StringType()),( "INFANT_WEIGHT_GRAMS",typ.StringType()),\ ( "INFANT_ASSIST_VENTI",typ.StringType()),( "INFANT_ASSIST_VENTI_6HRS",typ.StringType()),\ ( "INFANT_NICU_ADMISSION",typ.StringType()),( "INFANT_SURFACANT",typ.StringType()),\ ( "INFANT_ANTIBIOTICS",typ.StringType()),( "INFANT_SEIZURES",typ.StringType()),\ ( "INFANT_NO_ABNORMALITIES",typ.StringType()),( "INFANT_ANCEPHALY",typ.StringType()),\ ( "INFANT_MENINGOMYELOCELE",typ.StringType()),( "INFANT_LIMB_REDUCTION",typ.StringType()),\ ( "INFANT_DOWN_SYNDROME",typ.StringType()),( "INFANT_SUSPECTED_CHROMOSOMAL_DISORDER",typ.StringType()),\ ( "INFANT_NO_CONGENITAL_ANOMALIES_CHECKED",typ.StringType()),( "INFANT_BREASTFED",typ.StringType())] schema=typ.StructType([typ.StructField(e[0],e[1],False ) for e in lable]) from pyspark.sql import Row df1= df.map(lambda p : Row(p(0), p(1), p(2), p(3), p(4), p(5), p(6), p(7), p(8), p(9), p(10), p(11), p(12), p(13), p(14), p(15), p(16), p(17), p(18), p(19), p(20), p(21), p(22), p(23), p(24), p(25), p(26), p(27), p(28), p(29), p(30), p(31),p(32), p(33), p(34), p(35), p(36), p(37), p(38), p(39), p(40), p(41),p(42), p(43), p(44), p(45), p(46), p(47), p(48), p(49), p(50), p(51),p(52), p(53))) births=sqlContext.createDataFrame(df1,schema) 但是报错了 ,报错的图片如下 ![图片说明](https://img-ask.csdn.net/upload/202002/05/1580874372_282432.png) 所以这该怎么解决呢 ?? 谢谢
spark读取avro序列化的parquet时报错:Illegal Parquet type: FIXED_LEN_BYTE_ARRAY
avro格式定义如下图:![图片说明](https://img-ask.csdn.net/upload/202002/14/1581611055_583617.png) 然后spark正常读取生成的parquet则报错:Illegal Parquet type: FIXED_LEN_BYTE_ARRAY。问怎么读取parquet(不一定要用spark)?详细错误如下: org.apache.spark.sql.AnalysisException: Illegal Parquet type: FIXED_LEN_BYTE_ARRAY; at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.illegalType$1(ParquetSchemaConverter.scala:107) at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertPrimitiveField(ParquetSchemaConverter.scala:175) at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertField(ParquetSchemaConverter.scala:89) at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convert$1(ParquetSchemaConverter.scala:71) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) at scala.collection.Iterator.foreach(Iterator.scala:941) at scala.collection.Iterator.foreach$(Iterator.scala:941) at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) at scala.collection.IterableLike.foreach(IterableLike.scala:74) at scala.collection.IterableLike.foreach$(IterableLike.scala:73) at scala.collection.AbstractIterable.foreach(Iterable.scala:56) at scala.collection.TraversableLike.map(TraversableLike.scala:237) at scala.collection.TraversableLike.map$(TraversableLike.scala:230) at scala.collection.AbstractTraversable.map(Traversable.scala:108) at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convert(ParquetSchemaConverter.scala:65) at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convert(ParquetSchemaConverter.scala:62) at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$readSchemaFromFooter$2(ParquetFileFormat.scala:664) at scala.Option.getOrElse(Option.scala:138) at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.readSchemaFromFooter(ParquetFileFormat.scala:664) at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$mergeSchemasInParallel$2(ParquetFileFormat.scala:621) at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:801) at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:801) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
spark RDD中的元组如何按照指定格式保存到HDFS上?
请教一个问题:spark数据清洗的结果为RDD[(String, String)]类型的rdd,在这个RDD中,每一个元素都是一个元组。元组的key值是文件名,value值是文件内容,我现在想把整个RDD保存在HDFS上,让RDD中的每一个元素保存为一个文件,其中key值作为文件名,而value值作为文件内容。 应该如何实现呢? RDD好像不支持遍历,只能通过collect()方法保存为一个数组,再进行遍历,但是这样可能会把内存撑爆,目前的做法是先把RDD通过saveAsTextFile方法保存在HDFS上,然后再使用FSDataInputStream输入流对保存后的part文件进行遍历读取,使用输出流写到HDFS上,这样很耗时。 请问有没有好一点的方法,可以直接把RDD的内容写到HDFS上呢?
关于spark RDD求平均的问题
hi, 假设我有一个spark RDD里面记录的是(时段,分数,次数) 我现在想求:每个时段的平均分数,即:同一个时段下,总分数 / 总次数 不知有什么好方法没有,因为我发现无论是action操作也好,转换成其他Rdd也好, 总没有满意方法,只能分成两个rdd然后关联处理 求大侠帮忙,谢谢
spark运行scala的jar包
![图片说明](https://img-ask.csdn.net/upload/202002/05/1580887545_330719.png) ![图片说明](https://img-ask.csdn.net/upload/202002/05/1580887568_992291.png) ![图片说明](https://img-ask.csdn.net/upload/202002/05/1580887616_449280.png) 有人遇到过类似的问题吗? 我的尝试: 当没Master节点的Worker进程,运行会报错,当开启了Master节点的Worker进程有时不会报错,但是会说内存不够,但是我觉得不是这个问题,也能得出一定的结果,但并不是预期的结果。 执行的命令:bin/spark-submit --master spark://node1:7077 --class cn.itcast.WordCount_Online --executor-memory 1g --total-executor-cores 1 ~/data/spark_chapter02-1.0-SNAPSHOT.jar /spark/test/words.txt /spark/test/out jar包是在idea中打包的,用的是scala语言,主要作用是词频统计 scala代码: ``` package cn.itcast import org.apache.spark.rdd.RDD import org.apache.spark.{SparkConf, SparkContext} object WordCount_Online { def main(args: Array[String]):Unit={ val sparkConf = new SparkConf().setAppName("WordCount_Online") val sparkContext = new SparkContext(sparkConf) val data : RDD[String] = sparkContext.textFile(args(0)) val words :RDD[String] = data.flatMap(_.split(" ")) val wordAndOne :RDD[(String,Int)] = words.map(x => (x,1)) val result :RDD[(String,Int)] = wordAndOne.reduceByKey(_+_) result.saveAsTextFile(args(1)) sparkContext.stop() } } ``` 我也做了很多尝试,希望懂的人可以交流一下
spark pair RDD创建操作
对于一个文件,每一行如下: ID\t value:value(value的数量不固定) 如何创建RDD使得每一个value对应于一个ID? 希望是python的spark解答
spark的rdd 可以看做数组吗?那么 可以随机取里面的数据吗?
``` Welcome to ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/ /__ / .__/\_,_/_/ /_/\_\ version 2.3.1 /_/ Using Python version 2.7.9 (default, Sep 25 2018 20:42:16) SparkSession available as 'spark'. >>> sc=spark.read.text('/tmp/temp_file_5.part.gz') >>> sc.count() 19839 >>> 我想将这个文件分成4分, 0-5000,5000-10000,15000-19839 怎么将这个rrd分成4份了? 我想取 第h行的数据,能有好的办法吗? ```
对Spark RDD中的数据进行处理
Spark新手。 现在在程序中生成了一个VertexRDD[(String,String)]. 其中的值是如下这种形式的: (3477,267 6106 7716 8221 18603 19717 28189) (2631,18589 18595 25725 26023 26026 27866) (10969,18591 25949 25956 26041) (10218,9320 19950 20493 26031) (5860,18583 18595 25725 26233) (11501,1551 26187 27170) (5717,2596 5187 5720 18583 25725) (950,19667 20493 25725 26024 26033 26192 27279 27281) (13397,19943 26377) (2899,4720 8411 19081 20100 20184 20270 20480 20493 20573 20574 25891) (11424,19816 19819 19841 20244 27098) (8951,5914 18609 26057) (1909,8797 18608 19785 19786 27531) (12807,20040 20608 27159)(后面用到的数据) (17953,1718 6112 18603 18608) 前面的值是key,后面的一串字符是value(由空格隔开) 现在我想对于这个RDD,将每一条数据value中的空格隔开的每个值取出并两两组合,形成一个新的key-value的数据,然后形成一个新的RDD,比如 对(12807,20040 20608 27159)这一条数据,处理后得到的是 (20040,20608) (20040,27159) (20608,27159) 怎么才能实现?求问
spark中我需要判断一个rdd中的元素在另一个rdd中的位置
现在我遇到了个问题,我有两个rdd,我希望判断第一个rdd中的元素在第二个rdd中的第几个位置,如果没有就默认为0,请问这能做到吗?
spark创建dataframe导入phoenix如何禁止自动创建字段编号
请教:从HDFS里读一个文件,map开拿出数据,转换成dataframe类型,再放入phoenix里面。转换成dataframe后,为什么给数据自动加一个前缀"_1","_2"。这样导致数据放入phoenix的时候,列簇对应不上,phoenix表已经创建好,定义过列簇名,下面是代码,和报错 ![图片说明](https://img-ask.csdn.net/upload/201602/23/1456214548_907859.png) 我创建phoenix表的行键列簇名字已经定义好了:HANGJIAN , LIECU ,LIECU2 ,LEICU5 ,HANGJIAN5 spark转换rdd的时候自动添加了"_1", "_2","_3"' "_4", "_5" ![图片说明](https://img-ask.csdn.net/upload/201602/23/1456214560_967255.png) 能不能转换数据的时候 ,不自动 加: _1 _2 等等前缀,直接让数据存入phoenix表中。请问大神们是怎么做的?
spark读取不了本地文件是怎么回事
``` textFile=sc.textFile("file:///home/hduser/pythonwork/ipynotebook/data/test.txt") stringRDD=textFile.flatMap(lambda line:line.split(' ')) stringRDD.collect() ``` 我此路径下是有test文件的: ![图片说明](https://img-ask.csdn.net/upload/201805/18/1526634813_44673.png) 错误是: ``` Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 8.0 failed 4 times, most recent failure: Lost task 1.3 in stage 8.0 (TID 58, 192.168.56.103, executor 1): java.io.FileNotFoundException: File file:/home/hduser/pythonwork/ipynotebook/data/test.txt does not exist 。 。 。 Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1599) 。 。 。 Caused by: java.io.FileNotFoundException: File file:/home/hduser/pythonwork/ipynotebook/data/test.txt does not exist ``` 而且发现若我把代码中test.txt随便改一个名字,比如ttest.txt(肯定是没有的文件) 错误竟然发生了变化: ``` Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. : org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/hduser/pythonwork/ipynotebook/data/tesst.txt at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:251) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:251) at org.apache.spark.api.python.PythonRDD.getPartitions(PythonRDD.scala:53) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:251) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2092) at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:939) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:363) at org.apache.spark.rdd.RDD.collect(RDD.scala:938) at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:153) at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:214) at java.lang.Thread.run(Thread.java:745) ``` 注意: 此时我是以spark集群跑的:'spark://emaster:7077' 若是以本地跑就可以找到本地的那个test.txt文件 找hdfs文件系统的文件可以找到(在spark集群跑情况下) 。。。处由于字数显示省略了些不重要的错误提示,若想知道的话可以回复我 跪求大神帮助~感激不尽!!!
Spark读取错误PrematureEOFfrominputStream
:主要问题java.io.EOFException: Premature EOF from inputStream 使用textFile或者newAPIHadoopFile都出现这个错误 写spark读取数据的时候一直报这个错误。 连count,repartition都过不去。数据读的比平常慢的多。 看数据文件,应该是很均匀的,应该不是数据倾斜的问题了吧。 下面是报错信息: ``` 16/09/15 23:27:57 ERROR yarn.ApplicationMaster: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 41 in stage 0.0 failed 4 times, most recent failure: Lost task 41.3 in stage 0.0 (TID 5736, dn076179.heracles.sohuno.com): java.io.EOFException: Premature EOF from inputStream at com.hadoop.compression.lzo.LzopInputStream.readFully(LzopInputStream.java:75) at com.hadoop.compression.lzo.LzopInputStream.readHeader(LzopInputStream.java:114) at com.hadoop.compression.lzo.LzopInputStream.<init>(LzopInputStream.java:54) at com.hadoop.compression.lzo.LzopCodec.createInputStream(LzopCodec.java:83) at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:102) at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:133) at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104) at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:70) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Driver stacktrace: org.apache.spark.SparkException: Job aborted due to stage failure: Task 41 in stage 0.0 failed 4 times, most recent failure: Lost task 41.3 in stage 0.0 (TID 5736, dn076179.heracles.sohuno.com): java.io.EOFException: Premature EOF from inputStream at com.hadoop.compression.lzo.LzopInputStream.readFully(LzopInputStream.java:75) at com.hadoop.compression.lzo.LzopInputStream.readHeader(LzopInputStream.java:114) at com.hadoop.compression.lzo.LzopInputStream.<init>(LzopInputStream.java:54) at com.hadoop.compression.lzo.LzopCodec.createInputStream(LzopCodec.java:83) at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:102) at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:133) at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104) at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:70) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) ```
spark shell在存运算结果到hdfs时报java.io.IOException: Not a file: hdfs://mini1:9000/spark/res
scala> sc.textFile("hdfs://mini1:9000/spark").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).saveAsTextFile("hdfs://mini1:9000/spark/res2") 执行上面的代码出错,这个目录在hdfs下是有的,而且就算没有也会创建。还有就是我运行的代码中是保存到res2目录 ,这里为什么报没有res目录 18/11/05 19:06:44 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes java.io.IOException: Not a file: hdfs://mini1:9000/spark/res at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:320) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:65) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:331) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:331) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:330) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35) at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:37) at $iwC$$iwC$$iwC$$iwC.<init>(<console>:39) at $iwC$$iwC$$iwC.<init>(<console>:41) at $iwC$$iwC.<init>(<console>:43) at $iwC.<init>(<console>:45) at <init>(<console>:47) at .<init>(<console>:51) at .<clinit>(<console>) at .<init>(<console>:7) at .<clinit>(<console>) at $print(<console>) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) at org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.repl.Main.main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
测试spark集群入门级wordcount出错,求大神们帮忙解决啊
* Created by jyq on 10/14/15. */ 就这么点源代码 import org.apache.spark.{SparkConf,SparkContext,SparkFiles} object WordCount { def main(args: Array[String]):Unit= { val conf =new SparkConf().setAppName("WordCount").setMaster("spark://master:7077") val sc = new SparkContext(conf) sc.addFile("file:///home/jyq/Desktop/1.txt") val textRDD=sc.textFile(SparkFiles.get("file:///home/jyq/Desktop/1.txt")) val result = textRDD.flatMap(line =>line.split("\\s+") ).map(word=> (word, 1)).reduceByKey(_ + _) result.saveAsTextFile("/home/jyq/Desktop/2.txt") println("hello world") } } 在IDEA编译运行下输出的日志: Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Expected scheme-specific part at index 5: file: at org.apache.hadoop.fs.Path.initialize(Path.java:206) at org.apache.hadoop.fs.Path.<init>(Path.java:172) at org.apache.hadoop.fs.Path.<init>(Path.java:94) at org.apache.hadoop.fs.Globber.glob(Globber.java:211) at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1644) at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:257) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:65) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:290) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:290) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) at org.apache.spark.rdd.RDD.withScope(RDD.scala:306) at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:289) at WordCount$.main(WordCount.scala:16) at WordCount.main(WordCount.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140) Caused by: java.net.URISyntaxException: Expected scheme-specific part at index 5: file: at java.net.URI$Parser.fail(URI.java:2848) at java.net.URI$Parser.failExpecting(URI.java:2854) at java.net.URI$Parser.parse(URI.java:3057) at java.net.URI.<init>(URI.java:746) at org.apache.hadoop.fs.Path.initialize(Path.java:203) ... 41 more 15/10/15 20:08:36 INFO SparkContext: Invoking stop() from shutdown hook 15/10/15 20:08:36 INFO SparkUI: Stopped Spark web UI at http://192.168.179.111:4040 15/10/15 20:08:36 INFO DAGScheduler: Stopping DAGScheduler 15/10/15 20:08:36 INFO SparkDeploySchedulerBackend: Shutting down all executors 15/10/15 20:08:36 INFO SparkDeploySchedulerBackend: Asking each executor to shut down 15/10/15 20:08:36 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 15/10/15 20:08:36 INFO MemoryStore: MemoryStore cleared 15/10/15 20:08:36 INFO BlockManager: BlockManager stopped 15/10/15 20:08:36 INFO BlockManagerMaster: BlockManagerMaster stopped 15/10/15 20:08:36 INFO SparkContext: Successfully stopped SparkContext 15/10/15 20:08:36 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 15/10/15 20:08:36 INFO ShutdownHookManager: Shutdown hook called 15/10/15 20:08:36 INFO ShutdownHookManager: Deleting directory /tmp/spark-d7ca48d5-4e31-4a07-9264-8d7f5e8e1032 15/10/15 20:08:36 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon. Process finished with exit code 1
Spark RDD和HDFS数据一致性问题
这里想问个问题。 我用Spark SQL从HDFS load上来了一张表。 然后我现在有如下两种情况: 1. 新增数据都是通过Spark SQL load进去的 - 这时候我HDFS和RDD上面的数据是否一致 2. 我数据是直接load到了HDFS上面(例如是个分区表,增加了一个分区) - 这时候我HDFS和RDD上面的数据是否一致 麻烦给出详细的原理过程或者参考链接
spark streaming运行一段时间报以下异常,怎么解决
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 1568735.0 failed 4 times, most recent failure: Lost task 2.3 in stage 1568735.0 (TID 11808399, iZ94pshi327Z): java.lang.Exception: Could not compute split, block input-0-1438413230200 not found at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) 15/08/01 08:53:09 WARN AkkaUtils: Error sending message [message = Heartbeat(0,[Lscala.Tuple2;@544fc1ff,BlockManagerId(0, iZ94w2tczvjZ, 41595))] in 2 attempts java.util.concurrent.TimeoutException: Futures timed out after [30 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:195) at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:427) 15/08/01 08:53:28 WARN AkkaUtils: Error sending message [message = UpdateBlockInfo(BlockManagerId(0, iZ94w2tczvjZ, 41595),input-0-1438385673800,StorageLevel(false, false, false, false, 1),0,0,0)] in 1 attempts java.util.concurrent.TimeoutException: Futures timed out after [30 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:195) at org.apache.spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:221) at org.apache.spark.storage.BlockManagerMaster.updateBlockInfo(BlockManagerMaster.scala:62) at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$tryToReportBlockStatus(BlockManager.scala:384) at org.apache.spark.storage.BlockManager.reportBlockStatus(BlockManager.scala:360) at org.apache.spark.storage.BlockManager.dropOldBlocks(BlockManager.scala:1138) at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$dropOldNonBroadcastBlocks(BlockManager.scala:1115) at org.apache.spark.storage.BlockManager$$anonfun$1.apply$mcVJ$sp(BlockManager.scala:149) at org.apache.spark.util.MetadataCleaner$$anon$1.run(MetadataCleaner.scala:43) at java.util.TimerThread.mainLoop(Timer.java:555) at java.util.TimerThread.run(Timer.java:505) 15/08/01 08:53:42 WARN AkkaUtils: Error sending message [message = Heartbeat(0,[Lscala.Tuple2;@544fc1ff,BlockManagerId(0, iZ94w2tczvjZ, 41595))] in 3 attempts java.util.concurrent.TimeoutException: Futures timed out after [30 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:195) at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:427) 15/08/01 08:53:45 WARN Executor: Issue communicating with driver in heartbeater org.apache.spark.SparkException: Error sending message [message = Heartbeat(0,[Lscala.Tuple2;@544fc1ff,BlockManagerId(0, iZ94w2tczvjZ, 41595))] at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:209) at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:427) Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:195) ... 1 more
spark jdbc连接impala报错Method not supported
各位好 我的spark是2.1.0,用的hive-jdbc 2.1.0,现在写入impala的时候报以下错: java.sql.SQLException: Method not supported at org.apache.hive.jdbc.HivePreparedStatement.addBatch(HivePreparedStatement.java:75) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:589) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:670) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:670) at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925) at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:99) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802) at scala.Option.foreach(Option.scala:257) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958) at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:925) at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:923) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:362) at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:923) at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply$mcV$sp(Dataset.scala:2305) at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply(Dataset.scala:2305) at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply(Dataset.scala:2305) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57) at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2765) at org.apache.spark.sql.Dataset.foreachPartition(Dataset.scala:2304) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.saveTable(JdbcUtils.scala:670) at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:77) at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:518) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215) at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:446) at com.aoyou.data.CustomerVisitProduct$.saveToHive(CustomerVisitProduct.scala:281) at com.aoyou.data.CustomerVisitProduct$.main(CustomerVisitProduct.scala:221) at com.aoyou.data.CustomerVisitProduct.main(CustomerVisitProduct.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.sql.SQLException: Method not supported at org.apache.hive.jdbc.HivePreparedStatement.addBatch(HivePreparedStatement.java:75) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:589) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:670) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:670) at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925) at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:99) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 以下是代码实现 val sparkConf = new SparkConf().setAppName("save").set("spark.sql.crossJoin.enabled", "true"); val sparkSession = SparkSession .builder() .enableHiveSupport() .getOrCreate(); val dataframe = sparkSession.createDataFrame(rddSchema, new Row().getClass()) val property = new Properties(); property.put("user", "xxxxx") property.put("password", "xxxxx") dataframe.write.mode(SaveMode.Append).option("driver", "org.apache.hive.jdbc.HiveDriver").jdbc("jdbc:hive2://xxxx:21050/rawdata;auth=noSasl", "tablename", property) 请问这是怎么回事啊?感觉是驱动版本问题
spark 写入elasticsearch报错Could not write all entries
我在使用Spark将Rdd写入到elasticsearch集群的时候报出异常 ``` Could not write all entries [199/161664] (maybe ES was overloaded?). Bailing out... at org.elasticsearch.hadoop.rest.RestRepository.flush(RestRepository.java:250) at org.elasticsearch.hadoop.rest.RestRepository.doWriteToIndex(RestRepository.java:201) at org.elasticsearch.hadoop.rest.RestRepository.writeToIndex(RestRepository.java:163) at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:49) at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:84) at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:84) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) ``` RDD大概是5000W行数据,es集群有两个节点 ``` EsSpark.saveToEs(result, "userindex/users", Map("es.mapping.id" -> "uid")) ```
终于明白阿里百度这样的大公司,为什么面试经常拿ThreadLocal考验求职者了
点击上面↑「爱开发」关注我们每晚10点,捕获技术思考和创业资源洞察什么是ThreadLocalThreadLocal是一个本地线程副本变量工具类,各个线程都拥有一份线程私有的数
程序员必须掌握的核心算法有哪些?
由于我之前一直强调数据结构以及算法学习的重要性,所以就有一些读者经常问我,数据结构与算法应该要学习到哪个程度呢?,说实话,这个问题我不知道要怎么回答你,主要取决于你想学习到哪些程度,不过针对这个问题,我稍微总结一下我学过的算法知识点,以及我觉得值得学习的算法。这些算法与数据结构的学习大多数是零散的,并没有一本把他们全部覆盖的书籍。下面是我觉得值得学习的一些算法以及数据结构,当然,我也会整理一些看过...
《奇巧淫技》系列-python!!每天早上八点自动发送天气预报邮件到QQ邮箱
此博客仅为我业余记录文章所用,发布到此,仅供网友阅读参考,如有侵权,请通知我,我会删掉。 补充 有不少读者留言说本文章没有用,因为天气预报直接打开手机就可以收到了,为何要多此一举发送到邮箱呢!!!那我在这里只能说:因为你没用,所以你没用!!! 这里主要介绍的是思路,不是天气预报!不是天气预报!!不是天气预报!!!天气预报只是用于举例。请各位不要再刚了!!! 下面是我会用到的两个场景: 每日下
死磕YOLO系列,YOLOv1 的大脑、躯干和手脚
YOLO 是我非常喜欢的目标检测算法,堪称工业级的目标检测,能够达到实时的要求,它帮我解决了许多实际问题。 这就是 YOLO 的目标检测效果。它定位了图像中物体的位置,当然,也能预测物体的类别。 之前我有写博文介绍过它,但是每次重新读它的论文,我都有新的收获,为此我准备写一个系列的文章来详尽分析它。这是第一篇,从它的起始 YOLOv1 讲起。 YOLOv1 的论文地址:https://www.c...
知乎高赞:中国有什么拿得出手的开源软件产品?(整理自本人原创回答)
知乎高赞:中国有什么拿得出手的开源软件产品? 在知乎上,有个问题问“中国有什么拿得出手的开源软件产品(在 GitHub 等社区受欢迎度较好的)?” 事实上,还不少呢~ 本人于2019.7.6进行了较为全面的回答,对这些受欢迎的 Github 开源项目分类整理如下: 分布式计算、云平台相关工具类 1.SkyWalking,作者吴晟、刘浩杨 等等 仓库地址: apache/skywalking 更...
20行Python代码爬取王者荣耀全英雄皮肤
引言 王者荣耀大家都玩过吧,没玩过的也应该听说过,作为时下最火的手机MOBA游戏,咳咳,好像跑题了。我们今天的重点是爬取王者荣耀所有英雄的所有皮肤,而且仅仅使用20行Python代码即可完成。 准备工作 爬取皮肤本身并不难,难点在于分析,我们首先得得到皮肤图片的url地址,话不多说,我们马上来到王者荣耀的官网: 我们点击英雄资料,然后随意地选择一位英雄,接着F12打开调试台,找到英雄原皮肤的图片...
简明易理解的@SpringBootApplication注解源码解析(包含面试提问)
欢迎关注文章系列 ,关注我 《提升能力,涨薪可待》 《面试知识,工作可待》 《实战演练,拒绝996》 欢迎关注我博客,原创技术文章第一时间推出 也欢迎关注公 众 号【Ccww笔记】,同时推出 如果此文对你有帮助、喜欢的话,那就点个赞呗,点个关注呗! 《提升能力,涨薪可待篇》- @SpringBootApplication注解源码解析 一、@SpringBootApplication 的作用是什...
西游记团队中如果需要裁掉一个人,会先裁掉谁?
2019年互联网寒冬,大批企业开始裁员,下图是网上流传的一张截图: 裁员不可避免,那如何才能做到不管大环境如何变化,自身不受影响呢? 我们先来看一个有意思的故事,如果西游记取经团队需要裁员一名,会裁掉谁呢,为什么? 西游记团队组成: 1.唐僧 作为团队teamleader,有很坚韧的品性和极高的原则性,不达目的不罢休,遇到任何问题,都没有退缩过,又很得上司支持和赏识(直接得到唐太宗的任命,既给袈...
Python语言高频重点汇总
Python语言高频重点汇总 GitHub面试宝典仓库 回到首页 目录: Python语言高频重点汇总 目录: 1. 函数-传参 2. 元类 3. @staticmethod和@classmethod两个装饰器 4. 类属性和实例属性 5. Python的自省 6. 列表、集合、字典推导式 7. Python中单下划线和双下划线 8. 格式化字符串中的%和format 9. 迭代器和生成器 10...
究竟你适不适合买Mac?
我清晰的记得,刚买的macbook pro回到家,开机后第一件事情,就是上了淘宝网,花了500元钱,找了一个上门维修电脑的师傅,上门给我装了一个windows系统。。。。。。 表砍我。。。 当时买mac的初衷,只是想要个固态硬盘的笔记本,用来运行一些复杂的扑克软件。而看了当时所有的SSD笔记本后,最终决定,还是买个好(xiong)看(da)的。 已经有好几个朋友问我mba怎么样了,所以今天尽量客观
程序员一般通过什么途径接私活?
二哥,你好,我想知道一般程序猿都如何接私活,我也想接,能告诉我一些方法吗? 上面是一个读者“烦不烦”问我的一个问题。其实不止是“烦不烦”,还有很多读者问过我类似这样的问题。 我接的私活不算多,挣到的钱也没有多少,加起来不到 20W。说实话,这个数目说出来我是有点心虚的,毕竟太少了,大家轻喷。但我想,恰好配得上“一般程序员”这个称号啊。毕竟苍蝇再小也是肉,我也算是有经验的人了。 唾弃接私活、做外
ES6基础-ES6的扩展
进行对字符串扩展,正则扩展,数值扩展,函数扩展,对象扩展,数组扩展。 开发环境准备: 编辑器(VS Code, Atom,Sublime)或者IDE(Webstorm) 浏览器最新的Chrome 字符串的扩展: 模板字符串,部分新的方法,新的unicode表示和遍历方法: 部分新的字符串方法 padStart,padEnd,repeat,startsWith,endsWith,includes 字...
Python爬虫爬取淘宝,京东商品信息
小编是一个理科生,不善长说一些废话。简单介绍下原理然后直接上代码。 使用的工具(Python+pycharm2019.3+selenium+xpath+chromedriver)其中要使用pycharm也可以私聊我selenium是一个框架可以通过pip下载 pip install selenium -i https://pypi.tuna.tsinghua.edu.cn/simple/ 
阿里程序员写了一个新手都写不出的低级bug,被骂惨了。
你知道的越多,你不知道的越多 点赞再看,养成习惯 本文 GitHub https://github.com/JavaFamily 已收录,有一线大厂面试点思维导图,也整理了很多我的文档,欢迎Star和完善,大家面试可以参照考点复习,希望我们一起有点东西。 前前言 为啥今天有个前前言呢? 因为你们的丙丙啊,昨天有牌面了哟,直接被微信官方推荐,知乎推荐,也就仅仅是还行吧(心里乐开花)
Java工作4年来应聘要16K最后没要,细节如下。。。
前奏: 今天2B哥和大家分享一位前几天面试的一位应聘者,工作4年26岁,统招本科。 以下就是他的简历和面试情况。 基本情况: 专业技能: 1、&nbsp;熟悉Sping了解SpringMVC、SpringBoot、Mybatis等框架、了解SpringCloud微服务 2、&nbsp;熟悉常用项目管理工具:SVN、GIT、MAVEN、Jenkins 3、&nbsp;熟悉Nginx、tomca
Python爬虫精简步骤1 获取数据
爬虫的工作分为四步: 1.获取数据。爬虫程序会根据我们提供的网址,向服务器发起请求,然后返回数据。 2.解析数据。爬虫程序会把服务器返回的数据解析成我们能读懂的格式。 3.提取数据。爬虫程序再从中提取出我们需要的数据。 4.储存数据。爬虫程序把这些有用的数据保存起来,便于你日后的使用和分析。 这一篇的内容就是:获取数据。 首先,我们将会利用一个强大的库——requests来获取数据。 在电脑上安装
作为一个程序员,CPU的这些硬核知识你必须会!
CPU对每个程序员来说,是个既熟悉又陌生的东西? 如果你只知道CPU是中央处理器的话,那可能对你并没有什么用,那么作为程序员的我们,必须要搞懂的就是CPU这家伙是如何运行的,尤其要搞懂它里面的寄存器是怎么一回事,因为这将让你从底层明白程序的运行机制。 随我一起,来好好认识下CPU这货吧 把CPU掰开来看 对于CPU来说,我们首先就要搞明白它是怎么回事,也就是它的内部构造,当然,CPU那么牛的一个东
破14亿,Python分析我国存在哪些人口危机!
2020年1月17日,国家统计局发布了2019年国民经济报告,报告中指出我国人口突破14亿。 猪哥的朋友圈被14亿人口刷屏,但是很多人并没有看到我国复杂的人口问题:老龄化、男女比例失衡、生育率下降、人口红利下降等。 今天我们就来分析一下我们国家的人口数据吧! 更多有趣分析教程,扫描下方二维码关注vx公号「裸睡的猪」 即可查看! 一、背景 1.人口突破14亿 2020年1月17日,国家统计局发布
web前端javascript+jquery知识点总结
Javascript javascript 在前端网页中占有非常重要的地位,可以用于验证表单,制作特效等功能,它是一种描述语言,也是一种基于对象(Object)和事件驱动并具有安全性的脚本语言 ,语法同java类似,是一种解释性语言,边执行边解释。 JavaScript的组成: ECMAScipt 用于描述: 语法,变量和数据类型,运算符,逻辑控制语句,关键字保留字,对象。 浏览器对象模型(Br
Qt实践录:开篇
本系列文章介绍笔者的Qt实践之路。
在家远程办公效率低?那你一定要收好这个「在家办公」神器!
相信大家都已经收到国务院延长春节假期的消息,接下来,在家远程办公可能将会持续一段时间。 但是问题来了。远程办公不是人在电脑前就当坐班了,相反,对于沟通效率,文件协作,以及信息安全都有着极高的要求。有着非常多的挑战,比如: 1在异地互相不见面的会议上,如何提高沟通效率? 2文件之间的来往反馈如何做到及时性?如何保证信息安全? 3如何规划安排每天工作,以及如何进行成果验收? ......
作为一个程序员,内存和磁盘的这些事情,你不得不知道啊!!!
截止目前,我已经分享了如下几篇文章: 一个程序在计算机中是如何运行的?超级干货!!! 作为一个程序员,CPU的这些硬核知识你必须会! 作为一个程序员,内存的这些硬核知识你必须懂! 这些知识可以说是我们之前都不太重视的基础知识,可能大家在上大学的时候都学习过了,但是嘞,当时由于老师讲解的没那么有趣,又加上这些知识本身就比较枯燥,所以嘞,大家当初几乎等于没学。 再说啦,学习这些,也看不出来有什么用啊!
这个世界上人真的分三六九等,你信吗?
偶然间,在知乎上看到一个问题 一时间,勾起了我深深的回忆。 以前在厂里打过两次工,做过家教,干过辅导班,做过中介。零下几度的晚上,贴过广告,满脸、满手地长冻疮。   再回首那段岁月,虽然苦,但让我学会了坚持和忍耐。让我明白了,在这个世界上,无论环境多么的恶劣,只要心存希望,星星之火,亦可燎原。   下文是原回答,希望能对你能有所启发。   如果我说,这个世界上人真的分三六九等,
为什么听过很多道理,依然过不好这一生?
记录学习笔记是一个重要的习惯,不希望学习过的东西成为过眼云烟。做总结的同时也是一次复盘思考的过程。 本文是根据阅读得到 App上《万维钢·精英日课》部分文章后所做的一点笔记和思考。学习是一个系统的过程,思维模型的建立需要相对完整的学习和思考过程。以下观点是在碎片化阅读后总结的一点心得总结。
B 站上有哪些很好的学习资源?
哇说起B站,在小九眼里就是宝藏般的存在,放年假宅在家时一天刷6、7个小时不在话下,更别提今年的跨年晚会,我简直是跪着看完的!! 最早大家聚在在B站是为了追番,再后来我在上面刷欧美新歌和漂亮小姐姐的舞蹈视频,最近两年我和周围的朋友们已经把B站当作学习教室了,而且学习成本还免费,真是个励志的好平台ヽ(.◕ฺˇд ˇ◕ฺ;)ノ 下面我们就来盘点一下B站上优质的学习资源: 综合类 Oeasy: 综合
雷火神山直播超两亿,Web播放器事件监听是怎么实现的?
Web播放器解决了在手机浏览器和PC浏览器上播放音视频数据的问题,让视音频内容可以不依赖用户安装App,就能进行播放以及在社交平台进行传播。在视频业务大数据平台中,播放数据的统计分析非常重要,所以Web播放器在使用过程中,需要对其内部的数据进行收集并上报至服务端,此时,就需要对发生在其内部的一些播放行为进行事件监听。 那么Web播放器事件监听是怎么实现的呢? 01 监听事件明细表 名
3万字总结,Mysql优化之精髓
本文知识点较多,篇幅较长,请耐心学习 MySQL已经成为时下关系型数据库产品的中坚力量,备受互联网大厂的青睐,出门面试想进BAT,想拿高工资,不会点MySQL优化知识,拿offer的成功率会大大下降。 为什么要优化 系统的吞吐量瓶颈往往出现在数据库的访问速度上 随着应用程序的运行,数据库的中的数据会越来越多,处理时间会相应变慢 数据是存放在磁盘上的,读写速度无法和内存相比 如何优化 设计
一条链接即可让黑客跟踪你的位置! | Seeker工具使用
搬运自:冰崖的部落阁(icecliffsnet) 严正声明:本文仅限于技术讨论,严禁用于其他用途。 请遵守相对应法律规则,禁止用作违法途径,出事后果自负! 上次写的防社工文章里边提到的gps定位信息(如何防止自己被社工或人肉) 除了主动收集他人位置信息以外,我们还可以进行被动收集 (没有技术含量) Seeker作为一款高精度地理位置跟踪工具,同时也是社交工程学(社会工程学)爱好者...
作为程序员的我,大学四年一直自学,全靠这些实用工具和学习网站!
我本人因为高中沉迷于爱情,导致学业荒废,后来高考,毫无疑问进入了一所普普通通的大学,实在惭愧...... 我又是那么好强,现在学历不行,没办法改变的事情了,所以,进入大学开始,我就下定决心,一定要让自己掌握更多的技能,尤其选择了计算机这个行业,一定要多学习技术。 在进入大学学习不久后,我就认清了一个现实:我这个大学的整体教学质量和学习风气,真的一言难尽,懂的人自然知道怎么回事? 怎么办?我该如何更好的提升
前端JS初级面试题二 (。•ˇ‸ˇ•。)老铁们!快来瞧瞧自己都会了么
1. 传统事件绑定和符合W3C标准的事件绑定有什么区别? 传统事件绑定 &lt;div onclick=""&gt;123&lt;/div&gt; div1.onclick = function(){}; &lt;button onmouseover=""&gt;&lt;/button&gt; 注意: 如果给同一个元素绑定了两次或多次相同类型的事件,那么后面的绑定会覆盖前面的绑定 (不支持DOM事...
相关热词 c# 识别回车 c#生成条形码ean13 c#子控制器调用父控制器 c# 写大文件 c# 浏览pdf c#获取桌面图标的句柄 c# list反射 c# 句柄 进程 c# 倒计时 线程 c# 窗体背景色
立即提问