比开源快30倍的自研SQL Parser设计与实践


代码逻辑清晰,方便开发人员调试和排错;
性能更好:有更多代码优化的空间交给开发人员,可以使用更优秀的算法和数据结构提升性能;
自主可控:无licence约束,可读性和可维护性更高;
不需要额外依赖第三方词法语法代码生成工具。
对开发人员的技术要求较高,需了解编译原理技术;
开发工作量较大,实现MySQL常用语法的各类分支,需要投入很多时间和精力;
需要长时间、大规模测试才会趋于稳定。

selECT c1 From T1;public interface Replaceable {boolean replace(Node expr, Node target);}public class BetweenNode implements Replaceable {public Node beginExpr;public Node endExpr;@Overridepublic int hashCode(){...}@Overridepublic boolean equals(Object obj) {...}@Overridepublic boolean replace(SQLExpr expr, SQLExpr target) {if (expr == beginExpr) {setBeginExpr(target);return true;}if (expr == endExpr) {setEndExpr(target);return true;}return false;}}
支持AST Clone:如果保持原AST结构不变,克隆出一个新的AST,在新的AST修改节点结构,比如:增加Hint,删减where条件,增加limit 限制等。
维护AST 父子关系:自动生成的解析器维护了父到子节点的关系,是单向的引用关系。手写代码可以增加子节点对父节点的引用,构建AST节点的双向引用关系,实现节点的快速“回跳”,使得AST的遍历效率更高。
public abstract class Node { public abstract List<Node> getChildren()}
public class BetweenNode extends Node { public Node beginExpr; public Node endExpr; @Override public List<Node> getChildren() { return Arrays.<Node>asList(beginExpr, this.endExpr); } @Override public BetweenNode clone() { BetweenNode x = new BetweenNode(); if (beginExpr != null) { x.setBeginExpr(beginExpr.clone()); } if (endExpr != null) { x.setEndExpr(endExpr.clone()); } return x; } public void setBeginExpr(Node beginExpr) { if (beginExpr != null) { beginExpr.setParent(this); } this.beginExpr = beginExpr; } public void setEndExpr(Node endExpr) { if (endExpr != null) { endExpr.setParent(this); } this.endExpr = endExpr; }}public interface InsertValueHandler {Object newRow() throws SQLException;void processInteger(Object row, int index, Number value);void processString(Object row, int index, String value);void processDate(Object row, int index, String value);void processDate(Object row, int index, java.util.Date value);void processTimestamp(Object row, int index, String value);void processTimestamp(Object row, int index, java.util.Date value);void processTime(Object row, int index, String value);void processDecimal(Object row, int index, BigDecimal value);void processBoolean(Object row, int index, boolean value);void processNull(Object row, int index);void processFunction(Object row, int index, String funcName, Object... values);void processRow(Object row);void processComplete();}public class BatchInsertHandler implements InsertValueHandler {...}public class Application {BatchInsertHandler handler = new BatchInsertHandler();parser.parseInsertHeader(); // 头部:解析 insert into xxx values 部分parser.parseValues(handler); // 批量值:values (xxx), (xxx), (xxx) 部分}
-- 常量折叠示例SELECT * FROM T1WHERE c_week BETWEEN CAST(date_format(date_add('day', -day_of_week('20180605'), date('20180605')), '%Y%m&d') as bigint) AND CAST(date_format(date_add('day', -day_of_week('20180606'), date('20180606')), '%Y%m&d') as bigint) ------------折叠后-----------SELECT * from T1WHERE c_week BETWEEN 20180602 and 20180603-- 函数转换示例SELECT * FROM T1WHERE DATE_FORMAT(t1.'pay_time', '%Y%m%d') >= '20180529'AND DATE_FORMAT(t1.'pay_time', '%Y%m%d') <= '20180529'-----------转化后, 更好利用索引------------SELECT * FROM T1WHERE t1.'pay_time' >= '2018-05-29 00:00:00'AND t1.'pay_time' < '2018-05-30 00:00:00'
参考文献
[1] Pattis, Richard E.'EBNF: A Notation to Describe Syntax'(PDF).ICS.UCI.edu.University of California, Irvine. p. 1. Retrieved 2021-02-26.
[2] Parr, Terence and Fisher, Kathleen (2011). 'LL (*) the foundation of the ANTLR parser generator'.ACM SIGPLAN Notices.46(6): 425–436.doi:10.1145/1993316.1993548.
[3] Rosenkrantz, D. J.; Stearns, R. E. (1970).'Properties of Deterministic Top Down Grammars'.Information and Control.17(3): 226–256.doi:10.1016/s0019-9958(70)90446-8.
[4] Gurari, Eitan (1999).'CIS 680: DATA STRUCTURES: Chapter 19: Backtracking Algorithms'. Archived fromthe originalon 17 March 2007.
[5] Pirahesh, Hamid; Hellerstein, Joseph M.'Extensible/Rule Based Query Rewrite Optimization in Starburst'.citeseerx.ist.psu.edu. Retrieved 2020-04-06.
SQL Server on Linux入门教程
SQL Server数据库一直只提供Windows下的版本,2016年微软宣布推出可运行在Linux系统下的SQL Server数据库,该版本目前还是早期预览版本。本课程主要介绍SQL Server On Linux的基本知识。
