Page 1 of 2

Wanted: A Find Subroutine

Posted: Sun 14 Oct, 2007 7:43 pm
by Demon
I am working on a sentence generator that may possibly become the engine behind a new version of the Nakamuru AI program. As of now it indexes a large of words according to what part of speech they are into lists. These lists become extremely large and the indexing process takes as long as seven minutes for a well-populated word database.

To avoid indexing all together and save a substantial amount of time and RAM, I need a program that can:
- Find the Xth occurrence of a string in another (which inString can't do unless you put it in a for loop and that takes a very long time to have to do this every single word).
- Return the number of occurrences of the string that was found in the "haystack."


How it would work:
X = -1: Return the number of occurrences of Ans in Str1.
X >= 0: Would return the location of the Xth occurrence of Ans in Str1.

If the Ans was not found in Str1, then zero would be returned.


Examples:

Code: Select all

"WILL YOU FIND ME? CAN YOU FIND ME? WHEN WILL YOU FIND ME?->Str1
-1->X
"FIND ME
Asm(prgmFIND
Ans = 3

Code: Select all

"WILL YOU FIND ME? CAN YOU FIND ME? WHEN WILL YOU FIND ME?->Str1
0->X
"FIND ME
Asm(prgmFIND
Ans = 10

Code: Select all

"WILL YOU FIND ME? CAN YOU FIND ME? WHEN WILL YOU FIND ME?->Str1
1->X
"FIND ME
Asm(prgmFIND
Ans = 27

Posted: Mon 15 Oct, 2007 12:56 am
by KermMartian
If you look at inString, you'll see you can specify its start index. :P

Posted: Mon 15 Oct, 2007 4:25 am
by Demon
I know about that, but that would still be too slow.

The word database can be as long as several hundred words and using inString is slow when scanning a string that contains even just fifty average-sized words.

The generator will be looking for specific types of words when building a sentence. If I used inString, then I'd have to make the generator scan the word database for all instances of the type of word it's looking for, take note of where all the words of that type were found, and then choose whatever word fits. That could take probably up to ten seconds, and that's okay for just one word, but that's too slow for building a sentence in a reasonable amount of time. It'd still have to index the positions for words of whatever type the generator is looking for, and would be worse than pre-indexing.

And I want to completely avoid indexing, anyway.

Posted: Mon 15 Oct, 2007 6:45 am
by driesguldolf
The reason why TI takes so long is because some characters are 2 bytes big, and thus you can't simply 'calculate' the position of the nth character(you'd have to scan through the entire string)
Ofcourse when you only use UPPERCASE characters I could make one for you, otherwise I'd end up scanning everything anyway (unless you don't mind the possibility of finding a 'wrong' substring)

Posted: Mon 15 Oct, 2007 11:15 am
by Demon
There are only uppercase letters, numbers and the 'EE' token used as a delimiter, so that'd be fine.

Posted: Mon 15 Oct, 2007 11:44 pm
by driesguldolf
Done.

BIG FAT WARNING: Make a backup of everything before using this program or you risk losing everything because of a stupid mistake I made

EDIT: link updated: v0.9b
http://www.mediafire.com/?anheticyxtn

What it does:
  • X is negative: returns number of instances of Ans found in Str1
  • X is positive: returns the position of the xth instance of Ans found in Str1 (1 based, 0 means that Ans isn't a substring of Str1)
  • If you don't provide a string in Ans then I'm 99% sure that you will not like the outcome (unless someone knows how to find the type of Ans)
Because we all like screenshots:
Image
Find all instances

Image
Find nth position


Just to be safe I'll post the source: (fully TASM compatible)
EDIT: Scroll down for updated source (all links are updated)

Posted: Tue 16 Oct, 2007 12:15 am
by Demon
Thaaaaaaanks!!!


Oh, and is this kind of (or exactly if I'm really lucky) what you're talking about finding the type of 'Ans'?

(From Johan Rydh's "AnsType' subroutine):

Code: Select all

.nolist

	#include	"ti83plus.inc"

.list

.org	$9D93

.db	t2ByteTok, tAsmCmp





; AnsType



; Syntax: Asm(prgmVARTEST

; Return: Theta, type:

;	1 Real

;	2 List (Real)

;	3 Matrix

;	4 String

;	5 Complex number 

;	6 Complex list 

;	0 if error (Ans doesn't exist))





; Internal/OS

;	00    Real

;	01    List (Real)

;	02    Matrix

;	04    String

;	12 0C Complex number 

;	13 0D Complex list 





	B_CALL(_AnsName)

	RST	10h		; Find Ans

	JP	C,ERR		; If Ans does not exist



	AND	$1F		; Get only type

	CP	4

	JR	C,SET_INC	; Real, RList, Matrix



	DEC	A		; Str = 3



	CP	3

	JR	Z,SET_INC	; Str



	SUB	7		; cplx=5, clist=6



SET_INC:

	INC	A		; Add one to returning number

SET:	B_CALL(_SetXXOP1)	; OP1 = (float) A

END:	B_CALL(_StoTheta)	; Saves result into Real Theta

	RET			; Exit program



ERR:	XOR	A		; A=0

	JR	SET



.end

.end

Posted: Tue 16 Oct, 2007 6:56 am
by driesguldolf
It is always a pleasure to help people!

Hold on, I'm 99%sure there is a flaw, I'll fix it when I can (though you will probabely not notice the error :P)

Posted: Tue 16 Oct, 2007 7:52 am
by Timendus
driesguldolf wrote:

Code: Select all

;; ==== searchstring ====
;;
;; Searches for the first next occurence of the string pointed at by de (= child string)
;; in the string pointed at by hl (= parent string)
;;
;; Author
;;  Dries Guldolf (dguldolf@vub.ac.be)
;;
;; Pre
;;  de = points at child string
;;  hl = points at parent string
;;
;; Post
;;  c-flag = set if not found, reset otherwise
;;  hl = points at character after occurence (if found)
;;  Interrupts are disabled
;;  Consider af, bc, de, hl, af', bc', de' and hl' destroyed
;;
;; Warning
;;  Don't include the length prefix in the pointers
;;  This routine is designed to be able to continue searching where it left of last time
;;  This is also a bit specified for this situation
;;
Author tag, nice one. We hadn't thought of that yet :) Not that we should really need it for the Vera project, but for others it could certainly be useful.

Oh, and usually people refer to the needle and the haystack in these kind of search routines, not the parent and the child :)

Edit: Oh, one more thing: you should use colons after the tags. I added the colons and ran it through my parser as a test.

http://timendus.student.utwente.nl/~ver ... string.xml

Posted: Tue 16 Oct, 2007 1:11 pm
by driesguldolf
BIG FAT WARNING: Make a backup of all your data before using this utility!
  • Fixed an ugly bug: the calc crashes when X was to large
  • Proper error checking
  • Enhanced source code layout
  • If someone can tell me how to recognize a 2 byte token I can support that too
EDIT: link updated: v0.9b
http://www.mediafire.com/?anheticyxtn

Screenshots:
Image Image Image

Image Image

Readme:

Code: Select all

 >>>> README <<<<

Created by Dries Guldolf v0.9
Steal whatever you want, though I would appreciate it if you gave me credit
Keep in mind that I'm not responsible for any damage this program could cause to your calculator or anything else

Explanation:
 Does a binary search (for the string stored in Ans) on Str1
 This program WILL fail when the string contains 2 byte tokens! The offset returned assumes that every token is one byte, it should still return the correct amount of instances.

Errors:
 ERR:DATA TYPE - Occurs when Ans isn't a string.
 ERR:UNDEFINED - Occurs when X, Ans or Str1 wasn't found.
 ERR:INVALID - Occurs when Ans or Str1 was archived.

Behaviour:
 if X is negative, it returns the number if instances of Ans in Str1
 if X is positive, it returns the position of the Xth instance of Ans in Str1
 ! X is zero based
 ! position is one based (returns zero if Ans isn't a substring of Str1)

Examples:
 \\ -1->X:"ABABABA"->Str1:"ABA":Asm(prgmFIND \\
 Results in 3 because when a string is found it continues after the first char of the found string.

 \\ 1->X:"ABCABCDEFABC"->Str1:"ABC":Asm(prgmFIND \\
 Results in 4 (the 2nd instance of "ABC").

 \\ 3->X:"WHAT ARE YOU LOOKING FOR?"->Str1:"NOTHING":Asm(prgmFIND \\
 Results in 0, it's pretty obvious why this is so.

 \\ 0->X:"Disp aaa:XXX"->Str1:"XXX":Asm(prgmFIND \\
 Results in 9 because I said not to use 2 byte tokens.

 \\ -1->X:"Disp aaa:XXX"->Str1:"a":Asm(prgmFIND \\
 Should still result in 3
Source:
Look at page 2 (first post) for updated (v0.9b) source

Posted: Tue 16 Oct, 2007 8:48 pm
by Demon
I can't download it.
It either downloads a zero byte file or the browser waits for about five minutes and times out.

Posted: Tue 16 Oct, 2007 8:58 pm
by driesguldolf
hmmm... weird...

Do any of these work: If these don't work I can always email it

Posted: Tue 16 Oct, 2007 11:19 pm
by Demon
Got it.
Thanks.

Posted: Wed 17 Oct, 2007 9:55 am
by Timendus
Changed "Author" to "Authors", parsed it again:
http://timendus.student.utwente.nl/~ver ... string.xml

Posted: Wed 17 Oct, 2007 1:22 pm
by calc84maniac
driesguldolf wrote:
  • If someone can tell me how to recognize a 2 byte token I can support that too
He doesn't want two byte token support, because that's the major slowdown factor in the TI-OS routines.