[SQL] 基于Calcite的解析器

Posted by Akilis on 26 Jun, 2024

Contents

原理

SQL Parser Compiler

步骤

1 pom.xml

项目自身仅修改附加模板、数据文件,不修改 Parser.jj 文件;Calcite版本升级时只需要修改项目pom依赖的Calcite版本号,而不用去反复拷贝Calcite文件,合并Parser.jj。

    ```
    <project>
      <build>
        <plugins>
                                <plugin>
                    <!-- Extract parser grammar template from calcite-core.jar and put
                         it under ${project.build.directory} where all freemarker templates are. -->
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-dependency-plugin</artifactId>
                    <version>2.8</version>
                    <executions>
                        <execution>
                            <id>unpack-parser-template</id>
                            <phase>initialize</phase>
                            <goals>
                                <goal>unpack</goal>
                            </goals>
                            <configuration>
                                <artifactItems>
                                    <artifactItem>
                                        <groupId>org.apache.calcite</groupId>
                                        <artifactId>calcite-core</artifactId>
                                        <type>jar</type>
                                        <overWrite>true</overWrite>
                                        <outputDirectory>${project.build.directory}/</outputDirectory>
                                        <includes>**/Parser.jj</includes>
                                    </artifactItem>
                                </artifactItems>
                            </configuration>
                        </execution>
                    </executions>
                </plugin>

    <!-- adding fmpp code gen -->
                <plugin>
                    <artifactId>maven-resources-plugin</artifactId>
                    <executions>
                        <execution>
                            <id>copy-fmpp-resources</id>
                            <phase>initialize</phase>
                            <goals>
                                <goal>copy-resources</goal>
                            </goals>
                            <configuration>
                                <outputDirectory>${project.build.directory}/codegen</outputDirectory>
                                <resources>
                                    <resource>
                                        <directory>src/main/codegen</directory>
                                        <filtering>false</filtering>
                                    </resource>
                                </resources>
                            </configuration>
                        </execution>
                    </executions>
                </plugin>
          <plugin>

          <!-- javacc + fmpp -->
                      <groupId>org.codehaus.mojo</groupId>
                      <artifactId>javacc-maven-plugin</artifactId>
                      <executions>
                          <execution>
                              <id>javacc</id>
                              <goals>
                                  <goal>javacc</goal>
                              </goals>
                              <configuration>
                                  <sourceDirectory>${project.build.directory}/generated-sources/fmpp</sourceDirectory>
                                  <includes>
                                      <include>**/Parser.jj</include>
                                  </includes>
                                  <lookAhead>2</lookAhead>
                                  <isStatic>false</isStatic>
                              </configuration>
                          </execution>
                          <execution>
                              <id>javacc-test</id>
                              <phase>generate-test-sources</phase>
                              <goals>
                                  <goal>javacc</goal>
                              </goals>
                              <configuration>
                                  <sourceDirectory>${project.build.directory}/generated-test-sources/fmpp</sourceDirectory>
                                  <outputDirectory>${project.build.directory}/generated-test-sources/javacc</outputDirectory>
                                  <includes>
                                      <include>**/Parser.jj</include>
                                  </includes>
                                  <lookAhead>2</lookAhead>
                                  <isStatic>false</isStatic>
                              </configuration>
                          </execution>
                      </executions>
                  </plugin>
                  <plugin>
                      <groupId>org.apache.drill.tools</groupId>
                      <artifactId>drill-fmpp-maven-plugin</artifactId>
                      <executions>
                          <execution>
                              <configuration>
                                  <config>src/main/codegen/config.fmpp</config>
                                  <output>${project.build.directory}/generated-sources/fmpp</output>
                                  <templates>src/main/codegen/templates</templates>
                              </configuration>
                              <id>generate-fmpp-sources</id>
                              <phase>validate</phase>
                              <goals>
                                  <goal>generate</goal>
                              </goals>
                          </execution>
                      </executions>
                  </plugin>
        </plugins>
      </build>
    </project>
    ```

2 修改config.fmpp,重命名新Parser

    ```
    找到

    package: "org.apache.calcite.sql.parser.impl",
    将下方的class,替换成一个你自己的类名,后面会用到。例如

    class: "JackySqlParserImpl",
    ```

3 修改ftl,自定义新语法

    ```
    import org.apache.calcite.sql.SqlJacky;
    ...

    SqlNode SqlJacky() :
    {
         SqlNode stringNode;
    }
    {
        <JACKY> <JOB>
        stringNode = StringLiteral()
        {
            return new SqlJacky(getPos(), token.image);
        }
    }
    ...

    SqlNode SqlStmt():
          ...
           stmt = SqlJacky()


    <DEFAULT, DQID, BTID> TOKEN :
    |   < JACKY: "JACKY">
    |   < JOB: "JOB">
    ```

4 实现新语法解析类

    ```
    package org.apache.calcite.sql;
    import org.apache.calcite.sql.parser.SqlParserPos;
    import org.apache.calcite.sql.util.SqlVisitor;
    import org.apache.calcite.sql.validate.SqlValidator;
    import org.apache.calcite.sql.validate.SqlValidatorScope;
    import org.apache.calcite.util.Litmus;
    public class SqlJacky extends SqlNode {
        private String jackyString;
        private SqlParserPos pos;
        public  SqlJacky(SqlParserPos pos, String jackyString){
            super(pos);
            this.pos = pos;
            this.jackyString = jackyString;
        }

        public String getJackyString(){
            System.out.println("getJackyString");
            return this.jackyString;
        }

        @Override
        public SqlNode clone(SqlParserPos sqlParserPos) {
            System.out.println("clone");
            return null;
        }

        @Override
        public void unparse(SqlWriter sqlWriter, int i, int i1) {
            sqlWriter.keyword("jacky");
            sqlWriter.keyword("job");
            sqlWriter.print("\n");
            sqlWriter.keyword("" + jackyString + "");
        }

        @Override
        public void validate(SqlValidator sqlValidator, SqlValidatorScope sqlValidatorScope) {
            System.out.println("validate");
        }

        @Override
        public <R> R accept(SqlVisitor<R> sqlVisitor) {
            System.out.println("accept");
            return null;
        }

        @Override
        public boolean equalsDeep(SqlNode sqlNode, Litmus litmus) {
            System.out.println("equalsDeep");
            return false;
        }
    }
    ```

5 编译生成新Parser及测试

    ```
    package cn.flinkhub;
    import org.apache.calcite.avatica.util.Casing;
    import org.apache.calcite.avatica.util.Quoting;
    import org.apache.calcite.schema.SchemaPlus;
    import org.apache.calcite.sql.SqlNode;
    import org.apache.calcite.sql.parser.SqlParser;
    import org.apache.calcite.tools.FrameworkConfig;
    import org.apache.calcite.tools.Frameworks;
    import org.apache.calcite.sql.parser.impl.JackySqlParserImpl;
    public class CustomParser {
        public static void main(String[] args) {
            SchemaPlus rootSchema = Frameworks.createRootSchema(true);
            final FrameworkConfig config = Frameworks.newConfigBuilder()
                    .parserConfig(SqlParser.configBuilder()
                            //.setLex(Lex.ORACLE)
                            .setParserFactory(JackySqlParserImpl.FACTORY)
                            .setCaseSensitive(false)
                            .setQuoting(Quoting.BACK_TICK)
                            .setQuotedCasing(Casing.TO_UPPER)
                            .setUnquotedCasing(Casing.TO_UPPER)
                            //.setConformance(SqlConformanceEnum.ORACLE_12)
                            .build())
                    .build();
    //        "jacky 'select ids, name from test where id < 5'";
            String sql = "jacky job  'select ids, name from test where id < 5'";
            SqlParser parser = SqlParser.create(sql, config.getParserConfig());
            try {
                SqlNode sqlNode = parser.parseStmt();
                System.out.println(sqlNode.toString());
            } catch (Exception e) {
                e.printStackTrace();
            }

        }
    }
    ```

实践

1 开发

2 测试

可借助 Calcite built-in SqlParserTest 测试工具,高效、严谨测试。

    ```
    <dependency>
        <groupId>org.apache.calcite</groupId>
        <artifactId>calcite-testkit</artifactId>
        <version>${calcite.version}</version>
    </dependency>
    ```

3 SqlNode

4 print SQL string

5 UDF

5.1 function type signature

5.2 resolve

设计案例

SqlCodec

  1. 设计一套编解码器 Codec,实现解析指定dialect 的 sql,然后翻译成另一种 dialect 的 sql
    Codec on Parser

  2. Codec 转换 sourceSql 为 targetSql 的流程
    Flow


   SQL    OLAP    BI    PARSER    CALCITE   

 Share on: Email