weixin_39767322
weixin_39767322
2020-12-30 14:55

Can't construct GIAB SV graph with alt paths

Using -a when constructing a graph with this VCF crashes:


wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz
gzip -d human_g1k_v37.fasta.gz
wget ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/NIST_SVs_Integration_v0.6/HG002_SVs_Tier1_v0.6.vcf.gz
wget ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/NIST_SVs_Integration_v0.6/HG002_SVs_Tier1_v0.6.vcf.gz.tbi
vg construct -r human_g1k_v37.fasta -v HG002_SVs_Tier1_v0.6.vcf.gz -R 22 -f -a > GIAB_chr22.vg

terminate called after throwing an instance of 'std::runtime_error'
  what():  No node 726636 in graph

It runs through when not using -a. (with suspicious though seemingly unrelated warning for a lower-case alt.)

Stack:


Crash report for vg v1.15.0-209-g46c72be67 "Tufo"
Stack trace (most recent call last):
#19   Object "", at 0xffffffffffffffff, in 
#18   Object "/home/hickey/dev/vg/bin/vg", at 0x55599872d699, in _start
#17   Object "/lib/x86_64-linux-gnu/libc-2.27.so", at 0x7ffbdb89cb96, in __libc_start_main
      Source "../csu/libc-start.c", line 310, in __libc_start_main [0x7ffbdb89cb96]
#16   Object "/home/hickey/dev/vg/bin/vg", at 0x555998664f4a, in main
      Source "src/main.cpp", line 76, in main [0x555998664f4a]
#15   Object "/home/hickey/dev/vg/bin/vg", at 0x555998bf7507, in vg::subcommand::Subcommand::operator()(int, char**) const
    | Source "src/subcommand/subcommand.cpp", line 72, in operator()
      Source "/usr/include/c++/7/bits/std_function.h", line 706, in operator() [0x555998bf7507]
        703:     {
        704:       if (_M_empty())
        705:    __throw_bad_function_call();
      > 706:       return _M_invoker(_M_functor, std::forward<_argtypes>(__args)...);
        707:     }
        708: 
        709: #if __cpp_rtti
#14   Object "/home/hickey/dev/vg/bin/vg", at 0x555998c2d0cd, in main_construct(int, char**)
      Source "src/subcommand/construct_main.cpp", line 347, in main_construct [0x555998c2d0cd]
#13   Object "/home/hickey/dev/vg/bin/vg", at 0x555998ef2dcc, in vg::Constructor::construct_graph(std::vector<fastareference std::allocator> > const&, std::vector<:variantcallfile std::allocator> > const&, std::vector<fastareference std::allocator> > const&, std::function<void>)
      Source "src/constructor.cpp", line 2047, in construct_graph [0x555998ef2dcc]
#12   Object "/home/hickey/dev/vg/bin/vg", at 0x555998ef1b6d, in vg::Constructor::construct_graph(std::__cxx11::basic_string<char std::char_traits>, std::allocator<char> >, FastaReference&, vg::VcfBuffer&, std::vector<fastareference std::allocator> > const&, std::function<void>)
      Source "src/constructor.cpp", line 1900, in construct_graph [0x555998ef1b6d]
#11   Object "/home/hickey/dev/vg/bin/vg", at 0x555998ef07a7, in vg::Constructor::construct_graph(std::__cxx11::basic_string<char std::char_traits>, std::allocator<char> >, FastaReference&, vg::VcfBuffer&, std::vector<fastareference std::allocator> > const&, std::function<void>)::{lambda(vg::ConstructedChunk&)#2}::operator()(vg::ConstructedChunk&) const
    | Source "src/constructor.cpp", line 1724, in operator()
      Source "/usr/include/c++/7/bits/std_function.h", line 706, in operator() [0x555998ef07a7]
        703:     {
        704:       if (_M_empty())
        705:    __throw_bad_function_call();
      > 706:       return _M_invoker(_M_functor, std::forward<_argtypes>(__args)...);
        707:     }
        708: 
        709: #if __cpp_rtti
#10   Object "/home/hickey/dev/vg/bin/vg", at 0x555998c2ea0f, in std::_Function_handler<void main_construct char>::_M_invoke(std::_Any_data const&, vg::Graph&)
    | Source "/usr/include/c++/7/bits/std_function.h", line 316, in operator()
    |   314:       _M_invoke(const _Any_data& __functor, _ArgTypes&&... __args)
    |   315:       {
    | > 316:    (*_Base::_M_get_pointer(__functor))(
    |   317:        std::forward<_argtypes>(__args)...);
    |   318:       }
      Source "src/subcommand/construct_main.cpp", line 237, in _M_invoke [0x555998c2ea0f]
#9    Object "/home/hickey/dev/vg/bin/vg", at 0x555998de1855, in vg::VG::is_valid(bool, bool, bool, bool)
      Source "src/vg.cpp", line 6045, in is_valid [0x555998de1855]
#8    Object "/home/hickey/dev/vg/bin/vg", at 0x555998ca1616, in vg::Paths::for_each(std::function<void const> const&)
    | Source "src/path.cpp", line 153, in operator()
      Source "/usr/include/c++/7/bits/std_function.h", line 706, in __cxa_end_catch [0x555998ca1616]
        703:     {
        704:       if (_M_empty())
        705:    __throw_bad_function_call();
      > 706:       return _M_invoker(_M_functor, std::forward<_argtypes>(__args)...);
        707:     }
        708: 
        709: #if __cpp_rtti
#7    Object "/home/hickey/dev/vg/bin/vg", at 0x555998ddda0e, in std::_Function_handler<void const vg::vg::is_valid bool>::_M_invoke(std::_Any_data const&, vg::Path const&)
    | Source "/usr/include/c++/7/bits/std_function.h", line 316, in operator()
    |   314:       _M_invoke(const _Any_data& __functor, _ArgTypes&&... __args)
    |   315:       {
    | > 316:    (*_Base::_M_get_pointer(__functor))(
    |   317:        std::forward<_argtypes>(__args)...);
    |   318:       }
      Source "src/vg.cpp", line 5991, in _M_invoke [0x555998ddda0e]
#6    Object "/home/hickey/dev/vg/bin/vg", at 0x555998ddd29c, in vg::VG::get_node(long)
      Source "src/vg.cpp", line 4168, in get_node [0x555998ddd29c]
#5    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25", at 0x7ffbdc4e3d23, in __cxa_throw
#4    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25", at 0x7ffbdc4e3af0, in std::terminate()
#3    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25", at 0x7ffbdc4e3ab5, in 
#2    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25", at 0x7ffbdc4dd956, in 
#1    Object "/lib/x86_64-linux-gnu/libc-2.27.so", at 0x7ffbdb8bb800, in abort
      Source "/build/glibc-OTsEL5/glibc-2.27/stdlib/abort.c", line 79, in abort [0x7ffbdb8bb800]
#0    Object "/lib/x86_64-linux-gnu/libc-2.27.so", at 0x7ffbdb8b9e97, in raise
      Source "../sysdeps/unix/sysv/linux/raise.c", line 51, in raise [0x7ffbdb8b9e97]
</_argtypes></void></_argtypes></void></_argtypes></void></_argtypes></void></fastareference></char></char></void></fastareference></char></char></void></fastareference></:variantcallfile></fastareference></_argtypes>

该提问来源于开源项目:vgteam/vg

  • 点赞
  • 写回答
  • 关注问题
  • 收藏
  • 复制链接分享
  • 邀请回答

13条回答

  • weixin_39983383 weixin_39983383 4月前

    My first guess is that's a vcflib error, probably in the canonicalize function.

    This used to work, but I had to make a lot of tweaks to get the GIAB SV graph made. Stripping the sequences and placing tags in the ALT fields helped.

    点赞 评论 复制链接分享
  • weixin_39767322 weixin_39767322 4月前

    I'm not too sure I'm following. Are you suggesting stripping the explicit SVs out of the VCF and replacing them symbolic alleles? This is the opposite of what I've normally had to do to build SV graphs in the past.

    点赞 评论 复制链接分享
  • weixin_39767322 weixin_39767322 4月前

    Note, I'm not using -S here, so I'm pretty sure canonicalize() doesn't get called.

    点赞 评论 复制链接分享
  • weixin_39687881 weixin_39687881 4月前

    OK, the underlying problem here is that we are trying to validate the chunk as a complete graph, but it isn't a complete graph. In particular, the last node has been held back to potentially have more reference sequence added to it. However, that last node also happens to participate in an alt path, so we're left with path mappings that reference a node that doesn't appear until the next chunk. This leaves the chunk invalid on its own, and the validator actually crashes instead of handling this case.

    This might be a bit of a problem, since we run the chunk through a VG object, and while it is fine with edges to nonexistent nodes I don't think the VG object wants to contain path mappings that belong to nodes not in the graph.

    If the alt paths visit the first and last nodes of chunks, they may change in meaning as we splice those nodes' sequences. But I don't think that the semantics of those anchoring non-variable reference nodes are really important, so we can continue to do it.

    I think the easiest solution here is removing the VG-object-based re-chunking (since we upped the maximum chunk size a long time ago to the point where we really shouldn't hit it here), and also moving the validation of individual constructed graphs to happen before the start/end node surgery.

    点赞 评论 复制链接分享
  • weixin_39687881 weixin_39687881 4月前

    Actually, if we never create a VG object, we don't have the validation code available to run. So I guess I will cut it altogether.

    点赞 评论 复制链接分享
  • weixin_39687881 weixin_39687881 4月前

    Also, it looks like giant message read support got lost in the libvgio refactor. So if I change it to not re-chunk the graph I need to fix libvgio and bring back giant protobuf decoding support.

    点赞 评论 复制链接分享
  • weixin_39767322 weixin_39767322 4月前

    Works perfectly. Thanks!

    On Tue, May 7, 2019 at 1:34 PM Adam Novak wrote:

    Closed #2251 https://github.com/vgteam/vg/issues/2251 via #2253 https://github.com/vgteam/vg/pull/2253.

    — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/2251#event-2324957756, or mute the thread https://github.com/notifications/unsubscribe-auth/AAG373WTE6E53LVYSLENK3DPUG4S5ANCNFSM4HLAULJA .

    点赞 评论 复制链接分享
  • weixin_39983383 weixin_39983383 4月前

    Thanks Adam!

    点赞 评论 复制链接分享
  • weixin_39767322 weixin_39767322 4月前

    Actually, I don't think this fixed it: There seem to be many empty alt paths that shouldn't be empty. The VCFSnarlTraversal finder spits out lots of warnings, ex:

    
    vg snarls -r travs GIAB_chr22.vg -v HG002_SVs_Tier1_v0.6.vcf.gz > /dev/null
    [VCFTraversalFinder] Warning: No alt path (prefix=_alt_e50449bacbb11f76c0cab1e7724e4dd7b706fb2c_) found in graph for variant.  It will be ignore
    d:
    22  16060522    HG3_Ill_Spiral_19316    T   TTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTC    10  NoConsensusGT   BREAKSIMLENGTH=37;CGcall
    s=0;CGexactcalls=0;ClusterIDs=HG2_Ill_GATKHC_30210:HG3_Ill_Spiral_19316:HG3_Ill_GATKHC_30135:HG3_Ill_250bpfermikitraw_24368;ClusterMaxEditDist=0
    ;ClusterMaxShiftDist=0;ClusterMaxSizeDiff=0;DistBack=-32059374;DistForward=32886;DistMin=-32059374;DistMinlt1000=TRUE;DistPASSHG2gt49Minlt1000=.
    ;DistPASSMinlt1000=.;END=16060522;ExactMatchIDs=HG2_Ill_GATKHC_30210:HG3_Ill_Spiral_19316:HG3_Ill_GATKHC_30135:HG3_Ill_250bpfermikitraw_24368;HG
    003_GT=./.;HG004_GT=./.;HG2count=1;HG3count=3;HG4count=0;Illcalls=4;Illexactcalls=4;MendelianError=.;MultiTech=FALSE;MultiTechExact=FALSE;NumClu
    sterSVs=4;NumExactMatchSVs=4;NumTechs=1;NumTechsExact=1;PBcalls=0;PBexactcalls=0;REFWIDENED=22:16060521-16060560;REPTYPE=DUP;SVLEN=35;SVTYPE=INS
    ;TRall=TRUE;TRgt100=TRUE;TRgt10k=FALSE;TenXcalls=0;TenXexactcalls=0;segdup=TRUE;sizecat=20to49
    

    Looking at the alt paths for this insertion variant, we can see they are both empty, which is incorrect:

    
    vg paths -v GIAB_chr22.vg  -Q _alt_e50449bacbb11f76c0cab1e7724e4dd7b706fb2c_ -E  
    _alt_e50449bacbb11f76c0cab1e7724e4dd7b706fb2c_0 0
    _alt_e50449bacbb11f76c0cab1e7724e4dd7b706fb2c_1 0
    

    I'm not sure if this is a regression from the fix above, or something that's being revealed now that it runs without crashing.

    点赞 评论 复制链接分享
  • weixin_39767322 weixin_39767322 4月前

    There's something weird going on. It's not just the alt paths, the variant itself isn't winding up in the graph (with or without -a)

    
    vg construct -r human_g1k_v37.fasta -v HG002_SVs_Tier1_v0.6_norm.vcf.gz -R 22:16060510-16060540 -fa | vg view -
    H   VN:Z:1.0
    S   1   TTTTTCTTTTTCT
    S   2   TTCTTTCTTTCTTTCTTT
    P   22  1+,2+   13M,18M 
    P   _alt_e50449bacbb11f76c0cab1e7724e4dd7b706fb2c_0         
    P   _alt_e50449bacbb11f76c0cab1e7724e4dd7b706fb2c_1         
    L   1   +   2   +   0M
    
    点赞 评论 复制链接分享
  • weixin_39767322 weixin_39767322 4月前

    Looks like this is INFO field related. Stripping it all out with vcfkeepinfo is sufficient to fix

    
    vcfkeepinfo HG002_SVs_Tier1_v0.6.vcf.gz NA | bgzip > HG002_SVs_Tier1_v0.6.clean.vcf.gz
    tabix -f -p vcf !$
    vg construct -r human_g1k_v37.fasta -v HG002_SVs_Tier1_v0.6.clean.vcf.gz -R 22:16060510-16060540 -fa | vg view -
    H   VN:Z:1.0
    S   1   TTTTTCTTTTTCT
    S   2   TTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCT
    S   3   TTC
    S   4   TTCTTTCTTTCTTTCTTT
    P   22  1+,4+   13M,18M 
    P   _alt_e50449bacbb11f76c0cab1e7724e4dd7b706fb2c_0         
    P   _alt_e50449bacbb11f76c0cab1e7724e4dd7b706fb2c_1 2+,3+   32M,3M  
    L   1   +   2   +   0M
    L   1   +   4   +   0M
    L   2   +   3   +   0M
    L   3   +   4   +   0M
    

    Would guess the END tag is to blame.

    点赞 评论 复制链接分享
  • weixin_39767322 weixin_39767322 4月前

    Luckily, the script used to make the graph for the paper stripped out the info fields too.

    There's still a bug here. INFO fields shouln't cause variants to be silently skipped.

    点赞 评论 复制链接分享
  • weixin_39687881 weixin_39687881 4月前

    I can replicate this by extracting just that variant from the VCF, and then building 22:16060500-16060600 with construct. I get a graph with 4 nodes, 3 edges, and 2 empty alt paths.

    This shouldn't have anything to do with the chunking, because such a little graph should all be being built on one chunk.

    点赞 评论 复制链接分享

相关推荐