weixin_39940957
2020-11-30 10:24

Converting rasm file to netCDF3 using xarray

This would help new users like https://github.com/pydata/xarray/issues/1113 and simplify the RTD build process (https://github.com/pydata/xarray/issues/1106).

The problem is that it is not as trivial as expected. On the latest master:

python
import xarray as xr
ds = xr.tutorial.load_dataset('rasm')
ds.to_netcdf('rasm.nc', format='NETCDF3_CLASSIC', engine='scipy')

Throws an error:

python
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/home/mowglie/Documents/git/xarray/xarray/backends/api.py in to_netcdf(dataset, path, mode, format, group, engine, writer, encoding)
    516     try:
--> 517         dataset.dump_to_store(store, sync=sync, encoding=encoding)
    518         if isinstance(path, BytesIO):

/home/mowglie/Documents/git/xarray/xarray/core/dataset.py in dump_to_store(self, store, encoder, sync, encoding)
    754         if sync:
--> 755             store.sync()
    756 

/home/mowglie/Documents/git/xarray/xarray/backends/scipy_.py in sync(self)
    149         super(ScipyDataStore, self).sync()
--> 150         self.ds.flush()
    151 

/home/mowglie/.pyvirtualenvs/py3/lib/python3.4/site-packages/scipy/io/netcdf.py in flush(self)
    388         if hasattr(self, 'mode') and self.mode in 'wa':
--> 389             self._write()
    390     sync = flush

/home/mowglie/.pyvirtualenvs/py3/lib/python3.4/site-packages/scipy/io/netcdf.py in _write(self)
    400         self._write_gatt_array()
--> 401         self._write_var_array()
    402 

/home/mowglie/.pyvirtualenvs/py3/lib/python3.4/site-packages/scipy/io/netcdf.py in _write_var_array(self)
    448             for name in variables:
--> 449                 self._write_var_metadata(name)
    450             # Now that we have the metadata, we know the vsize of

/home/mowglie/.pyvirtualenvs/py3/lib/python3.4/site-packages/scipy/io/netcdf.py in _write_var_metadata(self, name)
    466         for dimname in var.dimensions:
--> 467             dimid = self._dims.index(dimname)
    468             self._pack_int(dimid)

ValueError: '2' is not in list

This question originates from the open-source project pydata/xarray


6 answers

  • weixin_39940957, 5 months ago

    Should we try to get to the source of this, or should I simply use ncks to do the conversion?

  • weixin_39610721, 5 months ago

    Can you write it as netcdf3 using engine="netcdf4"? This might be a scipy bug.

  • weixin_39940957, 5 months ago

    Yes, it works. But opening it with scipy throws a new error (in xarray's defense, the file I created with ncks can't be opened with scipy either, although ncview opens it).

    I think all this gets way too complicated for a Sunday evening and a simple demo file ;-)

  • weixin_39923262, 5 months ago

    I also think this is a scipy bug. After converting the file to netCDF3 CLASSIC mode, I get an error in the scipy backend...

    shell
    $ ncks -3 rasm.nc rasm.nc
    
    $ ncdump -k rasm.nc 
    classic
    $ ncdump -h rasm.nc 
    netcdf rasm {
    dimensions:
        time = 36 ;
        y = 205 ;
        x = 275 ;
    variables:
        double Tair(time, y, x) ;
            Tair:_FillValue = 9.96920996838687e+36 ;
            Tair:units = "C" ;
            Tair:long_name = "Surface air temperature" ;
            Tair:dimensions = "2" ;
            Tair:type_preferred = "double" ;
            Tair:time_rep = "instantaneous" ;
            Tair:coordinates = "yc xc" ;
        double time(time) ;
            time:dimensions = "1" ;
            time:long_name = "time" ;
            time:type_preferred = "int" ;
            time:units = "days since 0001-01-01" ;
            time:calendar = "noleap" ;
        double xc(y, x) ;
            xc:long_name = "longitude of grid cell center" ;
            xc:units = "degrees_east" ;
            xc:bounds = "xv" ;
        double yc(y, x) ;
            yc:long_name = "latitude of grid cell center" ;
            yc:units = "degrees_north" ;
            yc:bounds = "yv" ;
    
    // global attributes:
            :title = "/workspace/jhamman/processed/R1002RBRxaaa01a/lnd/temp/R1002RBRxaaa01a.vic.ha.1979-09-01.nc" ;
            :institution = "U.W." ;
            :source = "RACM R1002RBRxaaa01a" ;
            :output_frequency = "daily" ;
            :output_mode = "averaged" ;
            :convention = "CF-1.4" ;
            :references = "Based on the initial model of Liang et al., 1994, JGR, 99, 14,415- 14,429." ;
            :comment = "Output from the Variable Infiltration Capacity (VIC) model." ;
            :nco_openmp_thread_number = 1 ;
            :NCO = "\"4.6.0\"" ;
            :history = "Tue Dec 27 13:38:40 2016: ncks -3 rasm.nc rasm.nc\n",
                "history deleted for brevity" ;
    }
    Python
    
    In [1]: from scipy.io import netcdf
       ...: f = netcdf.netcdf_file('rasm.nc', 'r')
       ...: for k, v in f.variables.items():
       ...:     print(k, v.dimensions)
       ...:     
    yc ('y', 'x')
    xc ('y', 'x')
    time b'1'
    Tair b'2'
    
    In [2]: import xarray as xr
       ...: xr.open_dataset('rasm.nc', engine='netcdf4')
    <xarray.Dataset>
    Dimensions:  (time: 36, x: 275, y: 205)
    Coordinates:
      * time     (time) datetime64[ns] 1980-09-16T12:00:00 1980-10-17 ...
        xc       (y, x) float64 189.2 189.4 189.6 189.7 189.9 190.1 190.2 190.4 ...
        yc       (y, x) float64 16.53 16.78 17.02 17.27 17.51 17.76 18.0 18.25 ...
      * y        (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
      * x        (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
    Data variables:
        Tair     (time, y, x) float64 nan nan nan nan nan nan nan nan nan nan ...

    I don't know where scipy is getting the b'1' and b'2' dimensions. I can push the converted dataset to xarray-data but that doesn't really solve the problem of using scipy.

  • weixin_39610721, 5 months ago

    It looks like scipy gets confused between the dimensions = '1' netCDF attribute and the variable's actual dimensions, so this is another scipy bug. When it sets the dimensions netCDF attribute, it overwrites the Python attribute of the same name: https://github.com/scipy/scipy/blob/c48dfa43eae3474f06353ed3664caed945e9aee1/scipy/io/netcdf.py#L837-L849

    The simple workaround is to remove the dimensions attribute from each of these variables.
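
To make the name clash described above concrete, here is a toy reproduction. `ToyVariable` is a hypothetical stand-in written for this illustration, not scipy's actual class; it only assumes what the reply describes, namely that every attribute assignment is recorded as a netCDF attribute and also stored as a plain Python attribute.

```python
class ToyVariable:
    """Hypothetical stand-in for a netCDF variable object (not scipy's code)."""

    def __init__(self, dimensions):
        # _attributes does not exist yet, so this only lands in __dict__.
        self.dimensions = dimensions
        self._attributes = {}

    def __setattr__(self, name, value):
        # Record the value as a netCDF attribute once the dict exists ...
        try:
            self._attributes[name] = value
        except AttributeError:
            pass
        # ... and always store it as a normal Python attribute as well.
        self.__dict__[name] = value


var = ToyVariable(("time", "y", "x"))
var.units = "C"            # harmless: no Python attribute is called "units"
before = var.dimensions    # still the real dimension names
var.dimensions = "2"       # netCDF attribute with the same name as the Python one
after = var.dimensions     # the dimension tuple has been overwritten
```

With the tuple replaced by the string, a later lookup such as `["time", "y", "x"].index(var.dimensions)` fails with a `ValueError`, which matches the shape of the error in the original traceback.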

  • weixin_39923262, 5 months ago

    I see. I should have looked at the attributes. https://github.com/pydata/xarray-data/issues/7 fixes this issue and the dataset can now be read with scipy.
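
For files that still carry the offending attribute, the workaround suggested earlier in the thread (removing `dimensions` from each variable) might look roughly like the sketch below. The helper names `drop_variable_attr` and `convert_rasm` are made up for this example. The first assumes only an object with a `.variables` mapping whose values have a dict-like `.attrs`, which an `xarray.Dataset` provides. The second reuses the calls from the original question and has not been verified against every xarray/scipy version.

```python
def drop_variable_attr(dataset, name="dimensions"):
    """Remove attribute `name` from every variable of `dataset`, in place.

    Returns the names of the variables that were changed.
    """
    changed = []
    for var_name, var in dataset.variables.items():
        if name in var.attrs:
            del var.attrs[name]
            changed.append(var_name)
    return changed


def convert_rasm(path="rasm.nc"):
    """Hypothetical helper: strip the attribute, then write netCDF3 via scipy."""
    import xarray as xr  # imported lazily so the helper above has no dependencies

    ds = xr.tutorial.load_dataset("rasm")  # downloads the tutorial file
    drop_variable_attr(ds)                 # no-op if the attribute is already gone
    ds.to_netcdf(path, format="NETCDF3_CLASSIC", engine="scipy")
    return path
```

Note that `drop_variable_attr` mutates the dataset it is given; pass a copy if the original attributes are still needed.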
