mirror of
https://github.com/alex-s168/website.git
synced 2025-09-10 01:05:07 +02:00
init
224
pages/article-make-regex-engine-1.typ
Normal file
@@ -0,0 +1,224 @@
|
||||
#import "../common.typ": *
|
||||
#import "../simple-page-layout.typ": *
|
||||
#import "../core-page-style.typ": *
|
||||
|
||||
#simple-page(
|
||||
gen-table-of-contents: true
|
||||
)[
|
||||
|
||||
#section[
|
||||
#title[Making a simple RegEx engine]
|
||||
|
||||
#title[Part 1: Introduction to RegEx]
|
||||
|
||||
#sized-p(small-font-size)[
|
||||
Written by alex_s168
|
||||
]
|
||||
]
|
||||
|
||||
#if is-web {section[
|
||||
Note that the #min-pdf-link[PDF Version] of this page might look a bit better styling-wise.
|
||||
]}
|
||||
|
||||
#section[
|
||||
= Introduction
|
||||
If you are any kind of programmer,
you've probably heard of #flink("https://en.wikipedia.org/wiki/Regular_expression", "RegEx").

RegEx (regular expressions) is kind of like a small programming language
used to define string search and replace patterns.

\
RegEx might seem overwhelming at first, but you can learn its most important features very quickly.

\
It is important to mention that there is no single standard for RegEx syntax.
Instead, each implementation has its own quirks and additional features.
The most common features, however, behave identically on most RegEx engines.
|
||||
]
|
||||
|
||||
#section[
|
||||
= Syntax
|
||||
The behavior of a RegEx pattern depends on the match options passed to the RegEx engine.

Common match options: <match-options>
- Anchored at the start and end of the line
- Case-insensitive
- Multi-line mode, or whole-string mode instead
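
As a rough illustration of these options, here is a minimal sketch using Python's built-in `re` module. Python is an assumption of this example and not the engine built in this series; the flag names `re.IGNORECASE` and `re.MULTILINE` are specific to it, and other engines spell their options differently.

```python
import re

text = "Hello\nworld"

# Case-insensitive: the pattern "hello" also matches "Hello".
case_insensitive = re.search(r"hello", text, re.IGNORECASE)

# Whole-string mode (Python's default): ^ only matches at the very start.
whole_string = re.findall(r"^\w+", text)

# Multi-line mode: ^ matches at the start of every line.
multi_line = re.findall(r"^\w+", text, re.MULTILINE)

# Anchored at both ends: the pattern has to cover the entire input,
# so this is None because the input continues after "Hello".
anchored = re.fullmatch(r"Hello", text)
```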
|
||||
]
|
||||
|
||||
#section[
|
||||
== "Atoms"
|
||||
In this article, we will refer to single expression parts as "atoms".
|
||||
]
|
||||
|
||||
#section[
|
||||
=== Characters
|
||||
To match a character, just write it. For example, ```re a``` matches an `a`.
This does not work for every character, however, because many are part of the special RegEx syntax.
|
||||
]
|
||||
|
||||
#section[
|
||||
=== Escaped Characters <escaped-chars>
|
||||
The previously mentioned special characters, like `[`, can be matched by putting a backslash in front of them: ```re \[```
|
||||
|
||||
#context html-frame[
|
||||
#table(
|
||||
columns: (auto,auto),
|
||||
table.header(
|
||||
[Pattern],
|
||||
[Description]
|
||||
),
|
||||
|
||||
[```re \\```], [match a literal backslash],
|
||||
[```re \n```], [match a new-line],
|
||||
)
|
||||
]
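
The escapes from the table can be tried out with a short sketch. It assumes Python's `re` module as the engine; the sample strings are made up for illustration.

```python
import re

# An unescaped "[" would start a character set, so it has to be escaped.
bracket = re.search(r"\[", "a[0]")

# Two backslashes in the pattern match one literal backslash in the text.
backslash = re.search(r"\\", "C:\\temp")

# \n in the pattern matches a new-line character.
newline = re.search(r"\n", "first\nsecond")

positions = (bracket.start(), backslash.start(), newline.start())
```

Each search returns a match object whose `start()` is the index of the escaped character in the input.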
|
||||
]
|
||||
|
||||
#section[
|
||||
=== Character Groups <char-groups>
|
||||
RegEx engines predefine some groups of characters that make RegEx patterns quicker to write.
|
||||
|
||||
#context html-frame[
|
||||
#table(
|
||||
columns: (auto,auto),
|
||||
table.header(
|
||||
[Pattern],
|
||||
[Description],
|
||||
),
|
||||
|
||||
[```re .```], [any character except for line breaks],
|
||||
[```re \s```], [any whitespace or line break],
|
||||
[```re \S```], [any character except whitespaces or line breaks],
|
||||
[```re \d```], [any digit from 0 to 9],
|
||||
[```re \D```], [any character except digits from 0 to 9],
|
||||
[```re \w```], [a letter, digit, or underscore],
|
||||
[```re \W```], [any character except for letters, digits, and underscores],
|
||||
)
|
||||
]
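
A small sketch of the character groups above, assuming Python's `re` module. Note that Python 3 also lets `\d` and `\w` match non-ASCII digits and letters unless `re.ASCII` is passed; the sample text here only contains ASCII, so the results agree with the table.

```python
import re

text = "id_7 = 42\n"

# \d: single digits
digits = re.findall(r"\d", text)

# \w+: runs of letters, digits, and underscores
words = re.findall(r"\w+", text)

# \s: whitespace, including the line break at the end
spaces = re.findall(r"\s", text)

# . matches everything except the line break
dots = re.findall(r".", "a\nb")
```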
|
||||
]
|
||||
|
||||
#section[
|
||||
=== Anchors
|
||||
```re ^``` is used to assert the beginning of a line in multi-line mode,
|
||||
or the beginning of the string in whole-string mode.
|
||||
|
||||
```re $``` is used to assert the end of a line in multi-line mode,
|
||||
or the end of the string in whole-string mode.
|
||||
|
||||
The behavior of these depends on the #slink(<match-options>)[match options].
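
The difference between the two modes can be sketched as follows. This assumes Python's `re` module, where whole-string mode is the default and `re.MULTILINE` switches to multi-line mode; the log text is invented for the example.

```python
import re

log = "error: disk\nwarning: fan\nerror: cpu"

# Whole-string mode: the end anchor only matches at the end of the input.
last_word = re.findall(r"\w+$", log)

# Multi-line mode: the end anchor also matches before each line break.
line_ends = re.findall(r"\w+$", log, re.MULTILINE)

# Both anchors together pin a pattern to one full line.
error_lines = re.findall(r"^error: \w+$", log, re.MULTILINE)
```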
|
||||
]
|
||||
|
||||
#section[
|
||||
== Greedy VS Lazy <greedy>
|
||||
Some combinators match either "lazily" or "greedily".

Lazy matching is when the engine only matches as many characters as required to get to the next step.
This should almost always be used.

Greedy matching is when the engine tries to match as many characters as possible.
The problem with this is that it might cause "backtracking",
which happens when the engine goes back in the pattern multiple times to ensure that as many characters
as possible were matched. This can cause big performance issues.
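
To make the difference concrete, here is a minimal sketch that runs the same pattern in both modes. It assumes Python's `re` module; the input string is made up.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* grabs as much as it can, so the match runs to the last ">".
greedy = re.search(r"<.*>", html).group()

# Lazy: .*? stops at the first ">" that lets the pattern finish.
lazy = re.search(r"<.*?>", html).group()
```

Here `greedy` covers the whole input, while `lazy` is only the first tag.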
|
||||
]
|
||||
|
||||
#section[
|
||||
== Combinators
|
||||
Multiple atoms can be combined together to form more complex patterns.
|
||||
]
|
||||
|
||||
#section[
|
||||
=== Chain
|
||||
When two expressions are next to each other, they are chained together,
which means that both are evaluated in order.

Example: ```re x\d``` matches an `x` followed by a digit, for example `x9`.
|
||||
]
|
||||
|
||||
#section[
|
||||
=== Or
|
||||
Two expressions separated by a `|` cause the RegEx engine to first try to match the left side,
and only if that fails does it try the right side instead.

Note that "or" has a long left and right scope,
which means that ```re ab|cd``` will match either ```re ab``` or ```re cd```.
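
Both properties, the long scope and the left-to-right order, can be checked with a short sketch. It assumes Python's `re` module, which tries alternatives from left to right like the engine described here.

```python
import re

pattern = re.compile(r"ab|cd")

# The alternatives are "ab" and "cd", not "a", then "b or c", then "d".
matches_ab = pattern.fullmatch("ab") is not None
matches_cd = pattern.fullmatch("cd") is not None
matches_abd = pattern.fullmatch("abd") is not None

# The left side is tried first, so "a" wins even though "ab" would fit too.
first_choice = re.match(r"a|ab", "ab").group()
```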
|
||||
]
|
||||
|
||||
#section[
|
||||
=== Or-Not
|
||||
An expression followed by a ```re ?``` is optional: the engine tries to match it, but the overall match does not fail if it is absent.

Note that "or-not" has a short left scope,
which means that ```re ab?``` will always match ```re a```, and then try to match ```re b```.
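
A quick sketch of the short left scope, assuming Python's `re` module; the word list is only an illustration.

```python
import re

# Only the "u" directly in front of the question mark is optional.
optional_u = re.compile(r"colou?r")
accepted = [w for w in ["color", "colour", "colr"] if optional_u.fullmatch(w)]

# In "ab?" the "a" is still required; only "b" may be missing.
a_alone = re.fullmatch(r"ab?", "a") is not None
b_alone = re.fullmatch(r"ab?", "b") is not None
```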
|
||||
]
|
||||
|
||||
#section[
|
||||
=== Repeated
|
||||
An expression is repeated by following it with either a ```re *``` for a #slink(<greedy>)[greedy] repeat,
or a ```re *?``` for a #slink(<greedy>)[lazy] repeat.

The greedy form matches as many times as possible and the lazy form as few times as necessary.
Both can also match the pattern zero times.

Note that this has a short left scope.
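
The following sketch shows the greedy and lazy star side by side, assuming Python's `re` module.

```python
import re

# Greedy star: takes every "a" it can get.
star_greedy = re.match(r"a*", "aaab").group()

# Lazy star: zero repetitions already satisfy the pattern.
star_lazy = re.match(r"a*?", "aaab").group()

# Zero repetitions are fine, so this succeeds with an empty match.
zero_times = re.match(r"a*", "bbb").group()

# Short left scope: in "ab*" only the "b" is repeated.
only_b_repeats = re.fullmatch(r"ab*", "abbb") is not None
not_the_pair = re.fullmatch(r"ab*", "abab") is not None
```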
|
||||
]
|
||||
|
||||
#section[
|
||||
=== Repeated At Least Once
|
||||
An expression is repeated at least once by following it with either a ```re +``` for a #slink(<greedy>)[greedy] repeat,
or a ```re +?``` for a #slink(<greedy>)[lazy] repeat.

The greedy form matches as many times as possible and the lazy form as few times as necessary,
but both have to match at least one time.

Note that this has a short left scope.
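
A matching sketch for the plus variants, again assuming Python's `re` module.

```python
import re

# Greedy plus: takes every "a" it can get.
plus_greedy = re.match(r"a+", "aaab").group()

# Lazy plus: one repetition is the minimum, so it stops there.
plus_lazy = re.match(r"a+?", "aaab").group()

# Unlike the star, zero repetitions are not allowed: no match at all.
no_match = re.match(r"a+", "bbb")
```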
|
||||
]
|
||||
|
||||
#section[
|
||||
=== (Non-Capture) Group <non-capture-group>
|
||||
Groups multiple expressions together for scoping.
|
||||
|
||||
Example: ```re (?:abc)``` will just match `abc`.
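
Grouping matters most together with combinators that have a short left scope. The sketch below assumes Python's `re` module and uses a repeat to show the effect.

```python
import re

# Without a group, the plus only repeats the "b".
ungrouped = re.fullmatch(r"ab+", "abab")

# With a group, the whole "ab" is repeated.
grouped = re.fullmatch(r"(?:ab)+", "abab")

# Nothing is captured by a non-capture group.
captured = grouped.groups()
```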
|
||||
]
|
||||
|
||||
#section[
|
||||
=== Capture Group
|
||||
Similar to #slink(<non-capture-group>)[Non-Capture Groups] except that they capture the matched text.
|
||||
This allows the matched text of the inner expression to be extracted later.
|
||||
|
||||
Capture group IDs are enumerated from left to right, starting with 1.
|
||||
|
||||
Example: ```re (abc)de``` will match `abcde`,
|
||||
and store `abc` in group 1.
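
Extracting the captured text can look like this. The sketch assumes Python's `re` module, where group 0 is the whole match and the numbered groups start at 1; the second pattern is an invented example.

```python
import re

m = re.fullmatch(r"(abc)de", "abcde")
whole = m.group(0)   # the full match
first = m.group(1)   # the text captured by group 1

# Groups are numbered from left to right.
year_month = re.fullmatch(r"(\d+)-(\d+)", "2025-09").groups()
```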
|
||||
]
|
||||
|
||||
#section[
|
||||
=== Character Set
|
||||
When multiple characters are surrounded by square brackets,
the engine will match any one of them.
Most special characters and expressions are not parsed inside the brackets,
which means that this can also be used to escape characters.

For example, ```re [abc]``` will match either `a`, `b`, or `c`.

Similarly, ```re [ab(?:c)]``` will match either `a`, `b`, `(`, `?`, `:`, `c`, or `)`.

#slink(<char-groups>)[Character groups] and #slink(<escaped-chars>)[escaped characters]
still work inside character sets.

Character sets can also contain ranges.
For example, ```re [0-9a-z]``` will match any digit or any lowercase letter.
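
The examples above can be verified with a short sketch, assuming Python's `re` module; the input strings are made up.

```python
import re

# Any one of the listed characters.
simple_set = re.findall(r"[abc]", "cabbage")

# Group syntax is not parsed inside a set; each symbol is a plain character.
literal_symbols = re.findall(r"[ab(?:c)]", "(a?)")

# Ranges: digits and lowercase letters, so the uppercase "A" is skipped.
ranges = re.findall(r"[0-9a-z]", "A1b2")

# Character groups still work inside a set; the dot is literal here.
digits_and_dots = re.findall(r"[\d.]", "v1.2")
```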
|
||||
]
|
||||
|
||||
#section[
|
||||
= Conclusion
|
||||
RegEx is perfect for when you just want to match some patterns,
|
||||
but the syntax can make patterns very hard to read or modify.
|
||||
|
||||
In the next article, we will start to dive into implementing RegEx.
|
||||
|
||||
Stay tuned!
|
||||
]
|
||||
|
||||
|
||||
|
||||
]
|
81
pages/index.typ
Normal file
@@ -0,0 +1,81 @@
|
||||
#import "../common.typ": *
|
||||
#import "../simple-page-layout.typ": *
|
||||
|
||||
#let gen-page(content) = {
|
||||
core-page-style[
|
||||
#if is-web {
|
||||
table(
|
||||
stroke: none,
|
||||
columns: (25%, 50%, 25%),
|
||||
[],
|
||||
[
|
||||
#html-style("position: absolute; left: 28%; width: 100%")[
|
||||
#box(width: 50%, content)
|
||||
]
|
||||
],
|
||||
)
|
||||
} else {
|
||||
content
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
#let tree-list(..elements) = {
|
||||
gen-tree-from-headings(elemfn: (content, x) => [
|
||||
#html-opt-elem("p", (style:"line-height:1.1"))[
|
||||
#html-style("display:flex; text-indent:0pt;")[
|
||||
#html-style("margin-right: 11pt;", content)
|
||||
#html-style("flex:1;", x.body)
|
||||
]
|
||||
]
|
||||
], elements.pos())
|
||||
}
|
||||
|
||||
#gen-page[
|
||||
|
||||
#br()
|
||||
#title[alex_s168's]
|
||||
#br()
|
||||
|
||||
Articles
|
||||
#br()
|
||||
#tree-list(
|
||||
(level:1, body: [ Making a simple RegEx engine ]),
|
||||
(level:2, body: html-href("article-make-regex-engine-1.typ.desktop.html")[ Part 1: Introduction to RegEx ]),
|
||||
)
|
||||
#br()
|
||||
|
||||
Socials
|
||||
#br()
|
||||
#tree-list(
|
||||
(level:1, body: html-href("https://github.com/alex-s168")[ GitHub ]),
|
||||
(level:1, body: [Discord: alex_s168]),
|
||||
(level:1, body: html-href("mailto:alexandernutz68@gmail.com")[ E-Mail ]),
|
||||
(level:1, body: html-href("https://codeberg.org/alex-s168")[ Codeberg ]),
|
||||
)
|
||||
#br()
|
||||
|
||||
Working on
|
||||
#br()
|
||||
#tree-list(
|
||||
(level:1, body: [ Programming languages and compilers ]),
|
||||
(level:2, body: [ #link("https://github.com/vxcc-backend/vxcc")[ vxcc-old ]: (discontinued) Simple optimizing compiler backend ]),
|
||||
(level:2, body: [ #link("https://github.com/alex-s168/uiuac")[ uiuac ]: (discontinued) Optimizing compiler for the #link("https://uiua.org")[Uiua programming language] ]),
|
||||
(level:2, body: [ #link("https://github.com/Lambda-Mountain-Compiler-Backend/lambda-mountain")[ LSTS's standard library ] ]),
|
||||
(level:2, body: [ #link("https://github.com/h6-lang/h6")[ h6 ]: Minimal stack-based programming language ]),
|
||||
(level:2, body: [ #link("https://github.com/alex-s168/lil-rs")[ lil-rs ]: Rust implementation of #link("http://beyondloom.com/decker/lil.html")[lil] ]),
|
||||
|
||||
(level:1, body: [ Misc. ]),
|
||||
(level:2, body: [ #link("https://github.com/alex-s168/tpre")[ tpre ]: Fast and minimal RegEx engine ]),
|
||||
|
||||
(level:1, body: [ PCBs ]),
|
||||
(level:2, body: [ #link("project-etc-nand.typ.desktop.html")[ etc-nand ]: #link("https://github.com/ETC-A/etca-spec/")[ ETC.A ] CPU from NAND gates ]),
|
||||
|
||||
(level:1, body: [ FPGA designs ]),
|
||||
(level:2, body: [ RMII MAC in #link("https://www.chisel-lang.org/")[ Chisel ] ]),
|
||||
)
|
||||
|
||||
#br()#br()
|
||||
This website is written almost entirely in Typst.
|
||||
|
||||
]
|
84
pages/project-etc-nand.typ
Normal file
@@ -0,0 +1,84 @@
|
||||
#import "../common.typ": *
|
||||
#import "../simple-page-layout.typ": *
|
||||
#import "../components/pcb-view.typ": *
|
||||
|
||||
#let pcb-size-percent = 80
|
||||
#let qpcb(file) = {
|
||||
let p = res-path()+"etc-nand/"+file
|
||||
pcb(p+"_front.png", p+"_back.png", size-percent: pcb-size-percent)
|
||||
}
|
||||
|
||||
#simple-page(
|
||||
gen-table-of-contents: true
|
||||
)[
|
||||
|
||||
|
||||
#section[
|
||||
#title[ etc-nand ]
|
||||
]
|
||||
|
||||
#if is-web {section[
|
||||
Note that the #min-pdf-link[PDF Version] of this page might look a bit better styling-wise.
|
||||
|
||||
You can click the PCB images to switch to the other side.
|
||||
]}
|
||||
|
||||
#section[
|
||||
= Overview
|
||||
|
||||
etc-nand is a real-world #link("https://github.com/ETC-A/etca-spec/")[ ETC.A ] CPU built almost entirely from quad NAND gate ICs (74HC00).
|
||||
|
||||
It will probably be finished in a few months.
|
||||
]
|
||||
|
||||
#section[
|
||||
== Estimates
|
||||
|
||||
Estimated gate count:
|
||||
- 2800 NAND gates
|
||||
- 320 tristate buffers
|
||||
|
||||
#br()
|
||||
Estimated component counts:
|
||||
- 700x 74HC00 quad NAND gates
|
||||
- 40x 74HC54 octal tristate buffers
|
||||
- a few simple resistors
|
||||
]
|
||||
|
||||
#section[
|
||||
== Planned Specifications
|
||||
ETC.A base instruction set + byte operations + S&F + Von Neumann
|
||||
|
||||
The CPU will communicate with peripherals over a memory bus with 16 data bits and 15 address bits.
|
||||
]
|
||||
|
||||
#section[
|
||||
= Purchase
|
||||
You will be able to purchase one in the future.
|
||||
|
||||
Stay tuned!
|
||||
]
|
||||
|
||||
#section[
|
||||
= Images
|
||||
Images of PCBs that are either already manufactured or currently being manufactured by JLCPCB.
|
||||
]
|
||||
|
||||
#section[
|
||||
== 16 bit register
|
||||
#context qpcb("reg16")
|
||||
]
|
||||
|
||||
#section[
|
||||
== 8 bit ALU slice
|
||||
An #link(<add8>)[8 bit adder module] will be placed in the middle.
|
||||
#context qpcb("alu8")
|
||||
]
|
||||
|
||||
#section[
|
||||
== 8 bit adder <add8>
|
||||
#context qpcb("add8")
|
||||
]
|
||||
|
||||
|
||||
]
|