How do I write tokens in this code snippet?

// a skeleton implementation of a tokeniser

#include "tokeniser.h"
#include <iostream>
#include <cstdio>       // EOF and getchar()
#include <ctype.h>

// to shorten the code
using namespace std ;

////////////////////////////////////////////////////////////////////////

namespace Assignment_Tokeniser
{

    // is the token of the given kind or does it belong to the given grouping?
    bool token_is_in(Token token,TokenKind kind_or_grouping)
    {
        TokenKind kind = token_kind(token) ;

        // check identity first
        if ( kind == kind_or_grouping ) return true ;

        switch(kind_or_grouping)
        {
        default:
            return false ;
        }
    }

    // the current input character, initialised to ' ' which we ignore
    // it is an int so that the EOF marker is not confused with a legal character
    static int ch = ' ' ;

    // the current line number and column, initialised to line 1 column 0
    static int line_num = 1 ;
    static int column_num = 0 ;

    // the line number and column for the first character in the current token
    static int start_line = 0 ;
    static int start_column = 0 ;

    // generate a context string for the given token
    // it shows the line before the token,
    // the line containing the token, and
    // a line with a ^ marking the token's position
    // tab stops are every 8 characters
    // in the context string, tabs are replaced by spaces (1 to 8)
    // so that the next character starts on an 8 character boundary
    string token_context(Token token)
    {
        return "" ;
    }

    // read next character if not at the end of input
    // and update the line and column numbers
    static void nextch()
    {
        extern int read_char() ;

        if ( ch == EOF ) return ;

        if ( ch == '\n' )           // if last ch was newline ...
        {
            line_num++ ;            // increment line number
            column_num = 0 ;        // reset column number
        }

        ch = getchar() ;            // read the next character from stdin
        column_num++ ;              // increment the column number
    }

    ////////////////////////////////////////////////////////////////////////

    // called when we find end of input or we have a bad token
    Token parse_eoi()
    {
        // simulate end of input in case this is handling a bad token rather than a real end of input
        ch = EOF ;

        // return an eoi token
        return new_token(tk_eoi,"",start_line,start_column) ;
    }

    // return the next token object by reading more of the input
    Token next_token()
    {
        // you must read input using the nextch() function
        // the last character read is in the static variable ch
        // always read one character past the end of the token being returned

        // this loop reads one character at a time until it reaches end of input
        while ( ch != EOF )
        {
            start_line = line_num ;                 // remember current position in case we find a token
            start_column = column_num ;

            switch(ch)                              // ch is always the next char to read
            {
            case ' ':                               // ignore space, tab, CR and LF
            case '\t':
            case '\r':
            case '\n':
                nextch() ;                          // read one more character and try again
                break ;
                                                    // add additional case labels here for characters that can start tokens
                                                    // call a parse_* function to complete and return each kind of token
            default:
                return parse_eoi() ;                // the next character cannot start a token, return an EOI token
            }
        }

        start_line = line_num ;                     // remember current position so EOI token is correct
        start_column = column_num ;

        return parse_eoi() ;                         // return an EOI token
    }
}
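
The tab-stop rule described in the comments above token_context() (tab stops every 8 characters, each tab replaced by 1 to 8 spaces) can be sketched roughly as follows. This is only an illustration: expand_tabs is a hypothetical helper name that does not appear in the skeleton, and the real token_context() would still need to locate the right lines and place the ^ marker.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the skeleton): replace each tab with
// 1 to 8 spaces so the next character starts on an 8 character boundary.
std::string expand_tabs(const std::string &line)
{
    const std::size_t tab_width = 8 ;
    std::string out ;

    for ( char c : line )
    {
        if ( c == '\t' )
        {
            // distance from the current output column to the next multiple of 8
            std::size_t pad = tab_width - out.size() % tab_width ;
            out.append(pad,' ') ;
        }
        else
        {
            out += c ;
        }
    }
    return out ;
}
```

The padding is computed from the length of the expanded output rather than the raw input, so earlier tabs on the same line are accounted for.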

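In this code snippet, add tokens for identifier and integer.
Requirements:
1. All input must be read using the nextch() function.
2. If the end of input is reached, return a tk_eoi token.

3. If a character is found that cannot be part of a token and is not a space " ", tab "\t", carriage return "\r" or newline "\n", return a tk_eoi token.

4. Every token must consist of consecutive characters in the input.

5. When searching for the start of the next token, all spaces, tabs, carriage returns and newlines are ignored.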
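
One way the five requirements could fit together is sketched below. It is a self-contained approximation, not the assignment's solution: tokeniser.h is not shown in the question, so Token, TokenKind, tk_identifier, tk_integer and set_input are stand-ins invented for the sketch (only tk_eoi, nextch, parse_eoi and next_token come from the skeleton), the input comes from a string instead of stdin, and the token rules (identifiers start with a letter or underscore, integers are runs of decimal digits) are assumptions because the question does not give the grammar.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <string>

// Stand-ins for tokeniser.h, which is not shown in the question.
// tk_identifier and tk_integer are assumed names.
enum TokenKind { tk_identifier, tk_integer, tk_eoi } ;
struct Token { TokenKind kind ; std::string spelling ; } ;

// Stand-in input source so the sketch does not depend on stdin.
static std::string input ;
static std::size_t input_pos = 0 ;
static int ch = ' ' ;               // current character, ' ' is skipped

void set_input(const std::string &text)
{
    input = text ; input_pos = 0 ; ch = ' ' ;
}

// requirement 1: every read goes through nextch()
void nextch()
{
    if ( ch == EOF ) return ;
    ch = input_pos < input.size() ? static_cast<unsigned char>(input[input_pos++]) : EOF ;
}

// requirements 2 and 3: end of input and bad characters both yield tk_eoi
Token parse_eoi()
{
    ch = EOF ;
    return Token{tk_eoi,""} ;
}

// requirement 4: collect consecutive characters, stop one past the token
Token parse_identifier()
{
    std::string spelling ;
    while ( ch != EOF && (std::isalnum(ch) || ch == '_') )
    {
        spelling += static_cast<char>(ch) ;
        nextch() ;
    }
    return Token{tk_identifier,spelling} ;
}

Token parse_integer()
{
    std::string spelling ;
    while ( ch != EOF && std::isdigit(ch) )
    {
        spelling += static_cast<char>(ch) ;
        nextch() ;
    }
    return Token{tk_integer,spelling} ;
}

Token next_token()
{
    while ( ch != EOF )
    {
        // requirement 5: skip space, tab, CR and LF between tokens
        if ( ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n' ) { nextch() ; continue ; }

        if ( std::isalpha(ch) || ch == '_' ) return parse_identifier() ;
        if ( std::isdigit(ch) ) return parse_integer() ;

        return parse_eoi() ;        // character cannot start a token
    }
    return parse_eoi() ;            // real end of input
}
```

In the real skeleton the same shape would presumably be expressed as extra case labels inside the existing switch, with each parse_* function building its result through new_token() and the recorded start_line and start_column.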

threenewbee 2019-04-29 21:47
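What you have here is a lexical analyser (tokeniser).
I don't know which language's grammar it is supposed to follow. Assuming C:
The rule for an identifier is:
it starts with an underscore or a letter, followed by letters, digits or underscores.
The rule for an integer is:
it starts with any digit. If it starts with 0, the next character may be x or X; if it starts with 0x or 0X, the following characters may be 0-9a-fA-F, otherwise 0-9.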
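
The C-style rules above could be checked with two small matchers like the ones below. This is a hedged sketch: match_identifier and match_integer are hypothetical names, they work on a std::string rather than on the skeleton's nextch() stream, and requiring at least one hex digit after 0x or 0X is a design choice of the sketch that the answer does not spell out.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helpers, not part of tokeniser.h.
// Each returns the length of the match starting at pos, or 0 if there is none.

static bool is_ident_start(unsigned char c) { return std::isalpha(c) || c == '_' ; }
static bool is_ident_part(unsigned char c)  { return std::isalnum(c) || c == '_' ; }
static bool is_digit(unsigned char c)       { return std::isdigit(c) != 0 ; }
static bool is_hex_digit(unsigned char c)   { return std::isxdigit(c) != 0 ; }

// identifier: underscore or letter, then letters, digits or underscores
std::size_t match_identifier(const std::string &text, std::size_t pos)
{
    std::size_t i = pos ;
    if ( i >= text.size() || !is_ident_start(text[i]) ) return 0 ;
    while ( i < text.size() && is_ident_part(text[i]) ) i++ ;
    return i - pos ;
}

// integer: decimal digits, or 0x / 0X followed by hex digits
std::size_t match_integer(const std::string &text, std::size_t pos)
{
    std::size_t i = pos ;
    if ( i >= text.size() || !is_digit(text[i]) ) return 0 ;

    // hex form, assumed to need at least one hex digit after the prefix
    if ( text[i] == '0' && i + 2 < text.size()
         && (text[i+1] == 'x' || text[i+1] == 'X')
         && is_hex_digit(text[i+2]) )
    {
        i += 2 ;
        while ( i < text.size() && is_hex_digit(text[i]) ) i++ ;
        return i - pos ;
    }

    // decimal form
    while ( i < text.size() && is_digit(text[i]) ) i++ ;
    return i - pos ;
}
```

With these definitions, an input such as "0x" with nothing after it matches only the leading "0", so the x would be left for the next token.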

This answer was selected by the asker as the best answer.