Intermittent problems with Kryo serialisation for crawls resumed from checkpoints

I'm hitting problems when re-using crawl state (checkpoints). I get a lot of errors like:


WARNING: com.google.common.cache.LocalCache processPendingNotifications Exception thrown by removal listener [Tue Mar 19 12:07:00 GMT 2019]
java.lang.IllegalArgumentException: Can not set org.archive.modules.fetcher.FetchStats field org.archive.crawler.frontier.WorkQueue.substats to java.lang.Byte
        at sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:167)
        at sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:171)
        at sun.reflect.UnsafeObjectFieldAccessorImpl.set(UnsafeObjectFieldAccessorImpl.java:81)
        at java.lang.reflect.Field.set(Field.java:764)
        at com.esotericsoftware.kryo.serialize.FieldSerializer$CachedField.set(FieldSerializer.java:290)
        at com.esotericsoftware.kryo.serialize.FieldSerializer.readObjectData(FieldSerializer.java:209)
        at com.esotericsoftware.kryo.serialize.FieldSerializer.readObjectData(FieldSerializer.java:178)
        at com.esotericsoftware.kryo.Kryo.readObjectData(Kryo.java:512)
        at com.esotericsoftware.kryo.ObjectBuffer.readObjectData(ObjectBuffer.java:212)
        at org.archive.bdb.KryoBinding.entryToObject(KryoBinding.java:84)
        at com.sleepycat.collections.DataView.makeValue(DataView.java:595)
        at com.sleepycat.collections.DataCursor.getCurrentValue(DataCursor.java:349)
        at com.sleepycat.collections.DataCursor.initForPut(DataCursor.java:813)
        at com.sleepycat.collections.DataCursor.put(DataCursor.java:751)
        at com.sleepycat.collections.StoredContainer.putKeyValue(StoredContainer.java:321)
        at com.sleepycat.collections.StoredMap.put(StoredMap.java:279)
        at org.archive.util.ObjectIdentityBdbManualCache$1.onRemoval(ObjectIdentityBdbManualCache.java:119)
        at com.google.common.cache.LocalCache.processPendingNotifications(LocalCache.java:1954)
        at com.google.common.cache.LocalCache$Segment.runUnlockedCleanup(LocalCache.java:3457)
        at com.google.common.cache.LocalCache$Segment.postWriteCleanup(LocalCache.java:3433)
        at com.google.common.cache.LocalCache$Segment.put(LocalCache.java:2888)
        at com.google.common.cache.LocalCache.put(LocalCache.java:4146)
        at org.archive.util.ObjectIdentityBdbManualCache.dirtyKey(ObjectIdentityBdbManualCache.java:374)
        at org.archive.crawler.frontier.WorkQueue.makeDirty(WorkQueue.java:688)
        at org.archive.crawler.frontier.WorkQueueFrontier.processFinish(WorkQueueFrontier.java:1016)
        at org.archive.crawler.frontier.AbstractFrontier.finished(AbstractFrontier.java:569)
        at org.archive.crawler.framework.ToeThread.run(ToeThread.java:187)

One possible cause is that the Kryo serialisers are not being set up consistently between the original crawl and the resumed one.

As I understand it, the reflection-based auto-registration magic attempts to register the classes needed. According to the documentation, registering classes saves storage space but relies on the classes being registered in a consistent order, so that the same classes get the same IDs.
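To make the ordering concern concrete, here is a minimal sketch of how order-dependent IDs could produce exactly this kind of type mix-up. It is a toy model, not Kryo's or AutoKryo's real API: `ToyRegistry`, `FakeFetchStats` and `resolveAcross` are hypothetical names, and the only assumption is that each newly registered class receives the next free integer ID.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: this is NOT the Kryo or AutoKryo API.
final class RegistrationOrderSketch {

    /** Hands out the next free integer ID to each newly registered class, in call order. */
    static final class ToyRegistry {
        private final Map<Class<?>, Integer> ids = new LinkedHashMap<>();
        private final List<Class<?>> byId = new ArrayList<>();

        void register(Class<?> type) {
            if (!ids.containsKey(type)) {
                ids.put(type, byId.size());
                byId.add(type);
            }
        }

        int idOf(Class<?> type) { return ids.get(type); }

        Class<?> typeOf(int id) { return byId.get(id); }
    }

    /** Hypothetical stand-in for org.archive.modules.fetcher.FetchStats. */
    static final class FakeFetchStats {}

    /** Returns the class the reader resolves for the ID the writer assigned to 'written'. */
    static Class<?> resolveAcross(ToyRegistry writer, ToyRegistry reader, Class<?> written) {
        return reader.typeOf(writer.idOf(written));
    }

    public static void main(String[] args) {
        // Registration order while the checkpoint was written.
        ToyRegistry writer = new ToyRegistry();
        writer.register(String.class);
        writer.register(FakeFetchStats.class);
        writer.register(Byte.class);

        // The same classes, registered in a different order on resume.
        ToyRegistry reader = new ToyRegistry();
        reader.register(String.class);
        reader.register(Byte.class);
        reader.register(FakeFetchStats.class);

        // The ID written for FakeFetchStats is resolved to a different class on read.
        System.out.println(resolveAcross(writer, reader, FakeFetchStats.class).getName());
    }
}
```

In this toy model the ID written for the stats class resolves to `java.lang.Byte` on the reading side, which mirrors the shape of the exception above. Whether the real registries can actually diverge like this is the open question.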

However, this registration appears to happen on the Spring Lifecycle.start() event (e.g. in org.archive.modules.net.BdbServerCache.start() or org.archive.crawler.frontier.WorkQueueFrontier.start()), and AFAICT nothing explicitly enforces the order of these events.

It looks like the latter leads to

https://github.com/internetarchive/heritrix3/blob/05811705ed996122bea1f4e034c1ed5f7a07240f/modules/src/main/java/org/archive/modules/CrawlURI.java#L1808-L1811

(i.e. there we see Byte getting registered) and the former leads to

https://github.com/internetarchive/heritrix3/blob/05811705ed996122bea1f4e034c1ed5f7a07240f/modules/src/main/java/org/archive/modules/net/CrawlServer.java#L319-L321

(i.e. there's FetchStats), which seems suspicious. However, in both cases the auto-registered class is the second class to get registered, not the first, so it's not clear how this would produce the error above.

I'm having trouble understanding exactly what goes on with Kryo 1 and thread context, and therefore whether the reference IDs are global, ThreadLocal, or scoped to the AutoKryo instance.

I'm left to assume I must have missed something, otherwise this would never have worked reliably at all!

This question comes from the open-source project: internetarchive/heritrix3

weixin_39810441 2020-11-29 16:45
Hm, well, it seems each store gets its own ObjectIdentityBdbManualCache, and each of those has its own AutoKryo instance, so this seems reasonably safe. That is, the code should be able to reload the checkpoint as long as the code that registers the classes for that part of the system has not been changed?
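A toy sketch of that scoping argument, under the assumption that IDs are handed out per registry instance in registration order. `ToyRegistry`, `FakeFetchStats` and `idsAfterStart` are hypothetical names for illustration, not Heritrix or Kryo code. It compares one registry per store against a single shared registry when the two start() calls run in either order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: this is NOT the Kryo, AutoKryo or Heritrix API.
final class PerStoreRegistrySketch {

    /** Assigns each newly registered class the next free ID within this instance. */
    static final class ToyRegistry {
        private final List<Class<?>> byId = new ArrayList<>();

        int register(Class<?> type) {
            int existing = byId.indexOf(type);
            if (existing >= 0) {
                return existing;
            }
            byId.add(type);
            return byId.size() - 1;
        }
    }

    /** Hypothetical stand-in for org.archive.modules.fetcher.FetchStats. */
    static final class FakeFetchStats {}

    /**
     * Simulates two components starting in either order.
     * Returns {id of FakeFetchStats in the server store, id of Byte in the queue store}.
     */
    static int[] idsAfterStart(boolean sharedRegistry, boolean serverCacheFirst) {
        ToyRegistry serverStore = new ToyRegistry();
        ToyRegistry queueStore = sharedRegistry ? serverStore : new ToyRegistry();
        int[] ids = new int[2];
        Runnable startServerCache = () -> ids[0] = serverStore.register(FakeFetchStats.class);
        Runnable startFrontier = () -> ids[1] = queueStore.register(Byte.class);
        if (serverCacheFirst) {
            startServerCache.run();
            startFrontier.run();
        } else {
            startFrontier.run();
            startServerCache.run();
        }
        return ids;
    }

    public static void main(String[] args) {
        // One registry per store: start order between components has no effect.
        System.out.println(Arrays.toString(idsAfterStart(false, true)));
        System.out.println(Arrays.toString(idsAfterStart(false, false)));
        // One shared registry: the IDs swap when the start order changes.
        System.out.println(Arrays.toString(idsAfterStart(true, true)));
        System.out.println(Arrays.toString(idsAfterStart(true, false)));
    }
}
```

With per-store registries the IDs are identical in both start orders, so only the registration order inside each store matters. With a shared registry the two IDs swap, which is the failure mode the question worries about.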
