weixin_39921689
2021-01-12 14:03

Titan Infer Schema does not work correctly in Titan/Hadoop.

Do this:

titan.hadoop.output.infer-schema=true

... Then run it and watch: it looks like it's doing a bulk load, and it does run SchemaInfer, but then no data shows up. Something is odd.

Titan/Hadoop2 into Cassandra.
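
A minimal sketch of the run, assuming the stock conf/hadoop/titan-cassandra-output.properties shipped with the 0.5.x distribution, with the line above added:

    // gremlin shell from the Titan-Hadoop2 distribution
    // the properties file also contains: titan.hadoop.output.infer-schema=true
    g = HadoopFactory.open('conf/hadoop/titan-cassandra-output.properties')
    g._()   // identity pipeline: schema-inference job first, then the bulk-load job into Cassandra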

This question originates from the open source project: thinkaurelius/titan


5 replies

  • weixin_39621185 4 months ago

    can you spell out the steps to reproduce and exactly what's broken? I'm having trouble understanding the problem or reproducing it. I tried the following on titan05 HEAD and on 0.5.1-hadoop2's zipfile:

    • Started Cassandra (2.0.9) with no data, then bin/gremlin.sh
    • gremlin> g = HadoopFactory.open("conf/hadoop/titan-cassandra-output.properties"); g._

    First job counters:

    
          com.thinkaurelius.titan.hadoop.formats.util.SchemaInferencerMapReduce$Counters
                  EDGE_LABELS_CREATED=6
                  PROPERTY_KEYS_CREATED=3
          com.thinkaurelius.titan.hadoop.mapreduce.IdentityMap$Counters
                  IN_EDGE_COUNT=17
                  IN_EDGE_PROPERTY_COUNT=3
                  OUT_EDGE_COUNT=17
                  OUT_EDGE_PROPERTY_COUNT=3
                  VERTEX_COUNT=12
                  VERTEX_PROPERTY_COUNT=24
      

    Second job counters:

    
          com.thinkaurelius.titan.hadoop.formats.util.TitanGraphOutputMapReduce$Counters
                  EDGES_ADDED=17
                  EDGE_PROPERTIES_ADDED=3
                  SUCCESSFUL_TRANSACTIONS=2
                  VERTEX_PROPERTIES_ADDED=24
                  VERTICES_ADDED=12
      
    • gremlin> t = TitanFactory.open('conf/titan-cassandra.properties')
    • gremlin> t.V.map()
    
      ==>{name=alcmene, type=human}
      ==>{name=pluto, type=god}
      ==>{name=hercules, type=demigod}
      ==>{name=nemean, type=monster}
      ==>{name=jupiter, type=god}
      ==>{name=cerberus, type=monster}
      ==>{name=sea, type=location}
      ==>{name=tartarus, type=location}
      ==>{name=hydra, type=monster}
      ==>{name=sky, type=location}
      ==>{name=saturn, type=titan}
      ==>{name=neptune, type=god}
    
      
    • gremlin> t.E
    
      ==>e[31r0g-sg-4r9-2dc][1024-pet->3072]
      ==>e[31qo0-sg-6c5-e8][1024-brother->512]
      ==>e[31qqo-sg-6c5-lc][1024-brother->768]
      ==>e[31qww-sg-cnp-1ds][1024-lives->1792]
      ==>e[31qps-1kw-1lh-e8][2048-father->512]
      ==>e[31qyo-1kw-7x1-1z4][2048-battled->2560]
      ==>e[31qzk-1kw-7x1-268][2048-battled->2816]
      ==>e[31r1c-1kw-7x1-2dc][2048-battled->3072]
      ==>e[31qxs-1kw-9hx-1s0][2048-mother->2304]
      ==>e[31qn4-e8-1lh-74][512-father->256]
      ==>e[31qrk-e8-6c5-lc][512-brother->768]
      ==>e[31qsg-e8-6c5-sg][512-brother->1024]
      ==>e[31qu8-e8-cnp-zk][512-lives->1280]
      ==>e[31qw0-2dc-cnp-1ds][3072-lives->1792]
      ==>e[31qow-lc-6c5-e8][768-brother->512]
      ==>e[31qtc-lc-6c5-sg][768-brother->1024]
      ==>e[31qv4-lc-cnp-16o][768-lives->1536]
    
      
  • weixin_39921689 4 months ago

    Hm. On the client's Hadoop2 cluster, once the SchemaInferencing happens, it throws an exception about not finding an enum. Then you have to update your properties file to NOT do schema inferencing and run it again to get the bulk load ... ?? If you don't see it, then it's probably the client's cluster. They were having other problems with it, so I believe they might have installed it incorrectly. Please close this if what I'm saying is unreproducible.
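
    Roughly the sequence on that cluster, as a sketch (same properties file as the reproduction above; the exception details are paraphrased from memory):

        // run 1: titan.hadoop.output.infer-schema=true in the properties file
        g = HadoopFactory.open('conf/hadoop/titan-cassandra-output.properties')
        g._()   // SchemaInferencerMapReduce throws an enum-lookup exception; no data is written

        // run 2: set titan.hadoop.output.infer-schema=false, then re-run the same pipeline
        g = HadoopFactory.open('conf/hadoop/titan-cassandra-output.properties')
        g._()   // the bulk load goes through this time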

  • weixin_39621185 4 months ago

    What was the client config's input format? The only enum-related issue on Hadoop that I've seen recently was an outdated setting in conf/hadoop/rdf-input.properties. It contained this line:

    
    titan.hadoop.input.conf.format=n-triples
    

    But should have contained this line instead:

    
    titan.hadoop.input.conf.format=N_TRIPLES
    

    (fixed in 502528c9843c2fa31ce1ed142dd3f4e2a4cfe58a)

    Kind of a shot in the dark, but it's the only thing coming to mind.
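
    For illustration only, since enum lookups are exact-match on the constant name, a value like n-triples can never resolve to N_TRIPLES. A tiny Groovy sketch with a hypothetical enum standing in for whatever enum Titan parses that setting into:

        // hypothetical enum; the real class in Titan-Hadoop may be named and located differently
        enum RdfFormat { N_TRIPLES, N_QUADS, RDF_XML }

        println RdfFormat.valueOf('N_TRIPLES')   // ==> N_TRIPLES
        println RdfFormat.valueOf('n-triples')   // throws IllegalArgumentException: no such constant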

  • weixin_39921689 4 months ago

    It was GraphSON.


  • weixin_39621185 4 months ago

    That's what I used. I don't know how to reproduce this. I'll gladly take another look if we get more info.

