weixin_39907526
2021-01-11 21:29

[docdb] Track "too many open files" cluster wide and display in UI


% bin/ysqlsh
ysqlsh (11.2-YB-2.2.0.0-b0)

| Node Count: 1 | Replication Factor: 1                                                            |
----------------------------------------------------------------------------------------------------
| JDBC                : jdbc:postgresql://127.0.0.1:5433/yugabyte                                  |
| YSQL Shell          : bin/ysqlsh                                                                 |
| YCQL Shell          : bin/ycqlsh                                                                 |
| YEDIS Shell         : bin/redis-cli                                                              |
| Web UI              : http://127.0.0.1:7000/                                                     |
| Cluster Data        : /Users/zhihongyu/yugabyte-data                                             |

I adjusted the maximum open files limit:


% launchctl limit maxfiles
    maxfiles    1048576        1048576
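
The value reported by `launchctl limit maxfiles` is not necessarily the limit a given process runs under, since a process started from a shell inherits that shell's limits. As a minimal sketch (the helper names `nofile_limits` and `describe` are made up for illustration and are not part of YugabyteDB), the per-process limit can be read with Python's standard `resource` module:

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(value):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    # The soft limit is the one whose exhaustion yields "Too many open files".
    print(f"open files: soft={describe(soft)} hard={describe(hard)}")
```

Running this from the same shell that starts the database shows the limits a child process of that shell would inherit.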

I was following '1. LOAD SAMPLE DATASET' from https://download.yugabyte.com/ on a Mac:


yugabyte=# \c yb_demo;
You are now connected to database "yb_demo" as user "yugabyte".
yb_demo=# \i share/schema.sql
CREATE TABLE
CREATE TABLE
CREATE TABLE
ysqlsh:share/schema.sql:48: FATAL:  terminating connection due to unexpected postmaster exit
ysqlsh:share/schema.sql:48: ERROR:  Invalid argument: Invalid table definition: Timed out waiting for Table Creation

It seems the operation stopped at the third table:


CREATE TABLE orders(

Trying to run bin/ysqlsh again resulted in:


ysqlsh: could not connect to server: Connection refused
    Is the server running on host "localhost" (::1) and accepting
    TCP/IP connections on port 5433?
could not connect to server: Connection refused
    Is the server running on host "localhost" (127.0.0.1) and accepting
    TCP/IP connections on port 5433?

This question originates from the open-source project: yugabyte/yugabyte-db

10 answers

  • weixin_39907526, 4 months ago

    Darwin Zhihongs-Air.attlocal.net 19.5.0 Darwin Kernel Version 19.5.0: Tue May 26 20:41:44 PDT 2020; root:xnu-6153.121.2~2/RELEASE_X86_64 x86_64

  • weixin_39907526, 4 months ago

    After killing yugabyte-2.2.0.0/bin/yb-master and running './bin/yugabyted start', I didn't see the yb_demo database from the previous run.

    This time, loading the demo data succeeded.

  • weixin_39849254, 4 months ago

    Is it possible to attach the yb-tserver and postgres logs? (https://docs.yugabyte.com/latest/troubleshoot/nodes/check-logs/)

  • weixin_39907526, 4 months ago

    From yugabyte-data/node-1/disk-1/tserver.err:

    
    libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: random_device failed to open /dev/urandom: Too many open files
    *** Aborted at 1596671160 (unix time) try "date -d @1596671160" if you are using GNU date ***
    PC: @     0x7fff6bd3633a __pthread_kill
    *** SIGABRT () received by PID 1725 (TID 0x700009c92000) stack trace: ***
        @     0x7fff6bdeb5fd _sigtramp
        @        0x100000400 (unknown)
        @     0x7fff6bcbd808 abort
        @     0x7fff68f25458 abort_message
        @     0x7fff68f168a7 demangling_terminate_handler()
        @     0x7fff6aa51a5f _objc_terminate()
        @     0x7fff68f24886  std::__terminate()
        @     0x7fff68f271a1  __cxxabiv1::failed_throw()
        @     0x7fff68f27169 __cxa_throw
        @     0x7fff68f0155a  std::__1::__throw_system_error()
        @     0x7fff68ef8dd2  std::__1::random_device::random_device()
        @        0x11111bcd1  yb::GetRandomSeed32()
        @        0x1110a48d3  yb::MemTracker::MemTracker()
        @        0x1110ac378  std::__1::shared_ptr<>::make_shared<>()
        @        0x1110a3ca7  yb::MemTracker::CreateChild()
        @        0x1110a6671  yb::MemTracker::FindOrCreateTracker()
        @        0x10e0ae78f  yb::tablet::Tablet::OpenKeyValueTablet()
        @        0x10e0ad540  yb::tablet::Tablet::Open()
        @        0x10e0ea97b  yb::tablet::TabletBootstrap::OpenTablet()
        @        0x10e0e4908  yb::tablet::TabletBootstrap::Bootstrap()
        @        0x10e0f7826  yb::tablet::BootstrapTablet()
        @        0x10dd6ce1a  yb::tserver::TSTabletManager::OpenTablet()
        @        0x1111425c0  yb::ThreadPool::DispatchThread()
        @        0x111136651  yb::Thread::SuperviseThread()
        @     0x7fff6bdf7109 _pthread_start
    

    From yugabyte-data/node-1/disk-1/master.err:

    
    *** Aborted at 1596674426 (unix time) try "date -d @1596674426" if you are using GNU date ***
    PC: @     0x7fff6bd32756 __semwait_signal
    *** SIGTERM () received by PID 1722 (TID 0x11a4badc0) stack trace: ***
        @     0x7fff6bdeb5fd _sigtramp
        @                0x0 (unknown)
        @     0x7fff68f016d9  std::__1::this_thread::sleep_for()
        @        0x10c73514f  yb::server::TotalMemWatcher::MemoryMonitoringLoop()
        @        0x10a6ec6df main
        @     0x7fff6bbeecc9 start
    

    As shown in the description, launchctl reported 1048576 (I changed the setting before loading the sample).

  • weixin_39849254, 4 months ago

    Can you check the start of the .INFO log? It should print the ulimit.

  • weixin_39907526, 4 months ago

    After posting the previous comment, I ran 'bin/yb-ctl wipe_restart' and loaded the sample again (so I no longer have the previous logs). There was no crash this time.

    The ulimit printed in yb-master.INFO and yb-tserver.INFO is 1048576.

    I guess the crash was due to a lower ulimit value at the time of the first attempt.

  • weixin_39849254, 4 months ago

    Yes, that was in the log:

    
    libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: random_device failed to open /dev/urandom: Too many open files
    
  • weixin_39907526, 4 months ago

    I wonder whether the user could be given a hint that the ulimit needs to be increased when TSTabletManager encounters a similar exception, e.g. by showing a message on the command line or in the Web UI.

    Thanks

  • weixin_39849254, 4 months ago

    "e.g. show message on command line or Web UI."

    Can you open an issue for this?

    We have recommendations in the getting started guide: https://docs.yugabyte.com/latest/quick-start/install/macos/

    and in the deployment checklist: https://docs.yugabyte.com/latest/deploy/checklist/

  • weixin_39907526, 4 months ago

    Can this issue be repurposed to cover showing a message on the command line or in the Web UI?

    There is no other actionable item in this issue.

    Thanks

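
The kind of hint requested in this thread could look roughly like the sketch below. It is an illustration only, not YugabyteDB code: the message text and the names `ULIMIT_HINT`, `explain_open_failure`, and `read_random_bytes` are hypothetical. The idea is to recognize the `EMFILE` error ("Too many open files") when opening a file such as /dev/urandom and to attach an actionable message instead of aborting with a bare exception.

```python
import errno

# Hypothetical wording; not an actual YugabyteDB message.
ULIMIT_HINT = (
    "Too many open files: the per-process open-file limit was reached. "
    "Consider raising it (for example with `ulimit -n`) and restarting."
)


def explain_open_failure(exc):
    """Return a user-facing hint for EMFILE errors, or None for anything else."""
    if isinstance(exc, OSError) and exc.errno == errno.EMFILE:
        return ULIMIT_HINT
    return None


def read_random_bytes(count, path="/dev/urandom"):
    """Read `count` bytes from `path`, attaching the hint if the open fails with EMFILE."""
    try:
        with open(path, "rb") as source:
            return source.read(count)
    except OSError as exc:
        hint = explain_open_failure(exc)
        if hint is not None:
            # Surface the actionable message while keeping the original error chained.
            raise RuntimeError(hint) from exc
        raise
```

A server process could log such a hint or expose it through its UI; how that would be wired into the tablet server is outside the scope of this sketch.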
