feat: implement sb, sh, lb, lh support via masking
test: add simple cpu test
test: add basic testbenches