问题遇到的现象和发生背景
尝试用spark访问数据库时,出现 java.sql.SQLException
问题相关代码,请勿粘贴截图
sql语句如下
create table person (first varchar(30) DEFAULT NULL, last varchar(30) DEFAULT NULL, gender char(1) DEFAULT NULL, age int(4) DEFAULT NULL);
insert into person values('Barack','Obama','M',54);
insert into person values('Hillary','Clinton','f',34);
java代码如下
public void test3() {
SparkSession spark =
SparkSession.builder().appName("JavaALS").master("local[1]").getOrCreate();
SQLContext context = spark.sqlContext();
Map<String, String> options = new HashMap<>();
options.put("url", "jdbc:mariadb://localhost:3306/test");
options.put("user", "root");
options.put("password", "lyhao1999");
options.put("dbtable", "(select * from person where gender = 'M') as someone");
Dataset<Row> jdbcDF = context.read().format("jdbc").options(options).load();
jdbcDF.show();
}
运行结果及报错内容
Caused by: java.sql.SQLException: Out of range value for column 'age' : value age
at org.mariadb.jdbc.internal.com.read.resultset.rowprotocol.TextRowProtocol.getInternalLong(TextRowProtocol.java:348)
at org.mariadb.jdbc.internal.com.read.resultset.rowprotocol.TextRowProtocol.getInternalInt(TextRowProtocol.java:254)
at org.mariadb.jdbc.internal.com.read.resultset.SelectResultSet.getInt(SelectResultSet.java:985)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$7(JdbcUtils.scala:431)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$7$adapted(JdbcUtils.scala:430)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:367)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:349)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:349)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
我的解答思路和尝试过的方法
当表为空时程序正常运行
debug过程中我发现
java.lang.NumberFormatException: For input string: "age"
于是我尝试把字段暂时改为varchar
2022-02-09 14:35:16.763 INFO 1704 --- [ main] o.a.s.s.c.e.codegen.CodeGenerator : Code generated in 14.7514 ms
+-----+----+------+---+
|first|last|gender|age|
+-----+----+------+---+
|first|last|gender|age|
+-----+----+------+---+
2022-02-09 14:35:16.790 INFO 1704 --- [shutdown-hook-0] o.apache.spark.storage.DiskBlockManager : Shutdown hook called
这样虽然跑通了代码但查询结果变成了字段名
我想要达到的结果
spark访问数据库