How to split GPS points into trips with Spark

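Hello, I need to aggregate a time series with Spark. I have a DataFrame of car GPS data that is already sorted by vehicle ID and time, and I need to split it into separate trips: each consecutive run of rows where the [col_sev] column is 1 should become one trip. Before processing, the data looks like this: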

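    val taxiRaw = spark.sparkContext.textFile("E:/demo_20180821.dat")
    import spark.implicits._
    val safeParse = Parse.safe(Parse.parseRecords)
    val taxiParsed = taxiRaw.map(safeParse)
    // assumes safeParse returns an Either with good records on the Left, as .left.get implies;
    // filter first so that a failed record (Right) does not throw
    val taxiGood = taxiParsed.filter(_.isLeft).map(_.left.get).toDS
    // ..... various data-cleaning steps
    val taxiClean = ....toDF()
    // demo is the cleaned DataFrame shown below
    demo.show(100)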
+-------+-------------------+---------+----------+-------+
|col_car|          col_dttim|  col_lat|   col_lon|col_sev|
+-------+-------------------+---------+----------+-------+
|    ID1|2018-08-21 09:29:57| 39.88922|   116.347|      0|
|    ID1|2018-08-21 09:30:59| 39.88968|116.346998|      0|
|    ID1|2018-08-21 09:31:30| 39.89037|116.346978|      1|
|    ID1|2018-08-21 09:31:39|39.890758|116.346947|      1|
|    ID1|2018-08-21 09:33:05|39.895908| 116.34676|      1|
|    ID1|2018-08-21 09:33:45|39.896063|116.346745|      1|
|    ID1|2018-08-21 09:34:05| 39.89609|116.346735|      1|
|    ID1|2018-08-21 09:34:31|39.896453| 116.34672|      1|
|    ID1|2018-08-21 09:35:41|39.897587|116.346692|      1|
|    ID1|2018-08-21 09:37:15|39.898068|116.346638|      1|
|    ID1|2018-08-21 09:37:35|39.898123|116.346603|      1|
|    ID1|2018-08-21 09:38:35|39.898462|116.346615|      1|
|    ID1|2018-08-21 09:38:56|39.898408|116.346615|      1|
|    ID1|2018-08-21 09:39:05|39.898382|116.346622|      1|
|    ID1|2018-08-21 09:39:48|39.898593| 116.34664|      1|
|    ID1|2018-08-21 09:40:18|39.899062|116.346658|      1|
|    ID1|2018-08-21 09:40:28|39.899055|116.346662|      1|
|    ID1|2018-08-21 09:40:48|39.899097|116.346635|      1|
|    ID1|2018-08-21 09:41:20|39.899847|116.346443|      1|
|    ID1|2018-08-21 09:44:40| 39.89988|116.345462|      1|
|    ID1|2018-08-21 09:49:02|39.901228|116.343818|      1|
|    ID1|2018-08-21 09:52:07| 39.90414|116.337148|      1|
|    ID1|2018-08-21 09:52:59|39.905652|116.337548|      1|
|    ID1|2018-08-21 09:56:58| 39.91248|116.339273|      1|
|    ID1|2018-08-21 09:58:11|39.912655|116.342495|      0|
|    ID1|2018-08-21 09:58:38|39.912698|116.343038|      0|
|    ID1|2018-08-21 09:59:12|39.914267|116.343198|      0|
|    ID1|2018-08-21 10:00:40|39.917063|116.342582|      0|
|    ID1|2018-08-21 10:01:12| 39.91744|116.341958|      0|

Many more vehicles and other service states follow.

It needs to be reshaped into:

+-------+------+-------------------+---------+----------+-------------------+---------+----------+
|col_car|tripID|          Starttime|start_lat| start_lon|            Endtime|  end_lat|   end_lon|
+-------+------+-------------------+---------+----------+-------------------+---------+----------+
|    ID1|     2|2018-08-21 09:31:30| 39.89037|116.346978|2018-08-21 09:56:58| 39.91248|116.339273|

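Spark does not seem to offer an operator for this kind of variable-length merge, and groupBy cannot do it either because there is no key to group on.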

2 answers

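Following your original approach: you already have the demo DataFrame ready.
1: val demoStart = demo(group by the required columns, take the minimum of col_dttim).selectExpr(rename the columns here as needed); this is the Start half of the result
2: val demoEnd = demo(group by the required columns, take the maximum of col_dttim).selectExpr(rename the columns here as needed); this is the End half of the result
3: val resultDF = demoStart.join(demoEnd, Seq(the join columns))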
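
Steps 1 to 3 could be sketched as follows. This is only an illustration under assumptions, not the answerer's actual code: it assumes demo already carries a per-trip key column, here given the placeholder name tripID, and that col_dttim is either a timestamp or a "yyyy-MM-dd HH:mm:ss" string so that ordering matches time. The object name and the use of min/max over a struct (to carry the latitude and longitude along with the earliest or latest time) are choices made for this sketch.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, max, min, struct}

object StartEndJoinSketch {
  // Assumption: `demo` has a per-trip key column; "tripID" is a placeholder name.
  def startEnd(demo: DataFrame): DataFrame = {
    val keys = Seq("col_car", "tripID")
    // Structs compare field by field, so min/max picks the row with the
    // earliest/latest col_dttim and keeps its coordinates.
    val point = struct(col("col_dttim"), col("col_lat"), col("col_lon"))

    // Step 1: Start half of the result
    val demoStart = demo.groupBy(keys.map(col): _*)
      .agg(min(point).as("p"))
      .selectExpr("col_car", "tripID",
        "p.col_dttim as Starttime", "p.col_lat as start_lat", "p.col_lon as start_lon")

    // Step 2: End half of the result
    val demoEnd = demo.groupBy(keys.map(col): _*)
      .agg(max(point).as("p"))
      .selectExpr("col_car", "tripID",
        "p.col_dttim as Endtime", "p.col_lat as end_lat", "p.col_lon as end_lon")

    // Step 3: join the two halves on the grouping columns
    demoStart.join(demoEnd, keys)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("start-end-join").getOrCreate()
    import spark.implicits._
    // A few rows from the question, with a hand-assigned tripID for illustration
    val demo = Seq(
      ("ID1", 2, "2018-08-21 09:31:30", 39.89037, 116.346978),
      ("ID1", 2, "2018-08-21 09:31:39", 39.890758, 116.346947),
      ("ID1", 2, "2018-08-21 09:56:58", 39.91248, 116.339273)
    ).toDF("col_car", "tripID", "col_dttim", "col_lat", "col_lon")
    startEnd(demo).show(false)
    spark.stop()
  }
}
```

Both halves come from the same grouping, so the join is one-to-one; the two aggregations could equally be folded into a single agg call, which avoids the join altogether.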

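The demo(group by the required columns, take the minimum of col_dttim) and demo(group by the required columns, take the maximum of col_dttim) steps can be done in two ways.
One is groupBy(grouping columns).agg(min(col_dttim)) and groupBy(grouping columns).agg(max(col_dttim)).
The other is a window function (over ... partition by).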
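
Either way needs a per-trip grouping key, which the question says is missing. One possible way to derive it with window functions is the run-numbering ("gaps and islands") pattern: flag every row where col_sev differs from the previous row of the same vehicle, then take a running sum of the flags. The answer does not spell this technique out; the sketch below is a minimal illustration under assumptions: col_sev is an integer column, col_dttim orders chronologically, and the object name and helper columns prev_sev and changed are made up for the example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, lag, max, min, struct, sum, when}

object TripWindowSketch {
  def trips(demo: DataFrame): DataFrame = {
    // Per-vehicle window ordered by time
    val byCar = Window.partitionBy("col_car").orderBy("col_dttim")
    // Running frame: from the vehicle's first row up to the current row
    val running = byCar.rowsBetween(Window.unboundedPreceding, Window.currentRow)

    val withTrip = demo
      .withColumn("prev_sev", lag(col("col_sev"), 1).over(byCar))
      // 1 on a vehicle's first row and wherever col_sev changes, else 0
      .withColumn("changed",
        when(col("prev_sev").isNull || col("prev_sev") =!= col("col_sev"), 1).otherwise(0))
      // Running sum of the flags numbers each run of constant col_sev: 1, 2, 3, ...
      .withColumn("tripID", sum(col("changed")).over(running))

    // min/max over a struct keeps the coordinates of the earliest/latest point
    val point = struct(col("col_dttim"), col("col_lat"), col("col_lon"))
    withTrip
      .filter(col("col_sev") === 1) // keep only the runs that are trips
      .groupBy("col_car", "tripID")
      .agg(min(point).as("s"), max(point).as("e"))
      .select(col("col_car"), col("tripID"),
        col("s.col_dttim").as("Starttime"), col("s.col_lat").as("start_lat"),
        col("s.col_lon").as("start_lon"),
        col("e.col_dttim").as("Endtime"), col("e.col_lat").as("end_lat"),
        col("e.col_lon").as("end_lon"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("trip-window").getOrCreate()
    import spark.implicits._
    // A handful of rows taken from the table in the question
    val demo = Seq(
      ("ID1", "2018-08-21 09:30:59", 39.88968, 116.346998, 0),
      ("ID1", "2018-08-21 09:31:30", 39.89037, 116.346978, 1),
      ("ID1", "2018-08-21 09:31:39", 39.890758, 116.346947, 1),
      ("ID1", "2018-08-21 09:56:58", 39.91248, 116.339273, 1),
      ("ID1", "2018-08-21 09:58:11", 39.912655, 116.342495, 0)
    ).toDF("col_car", "col_dttim", "col_lat", "col_lon", "col_sev")
    trips(demo).show(false)
    spark.stop()
  }
}
```

Because the numbering counts every run of constant col_sev, the leading col_sev = 0 run is number 1 and the first trip gets tripID 2, which matches the expected row in the question. The window sorts inside each col_car partition itself, so the sketch does not rely on the DataFrame's incoming row order.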
