Wanted: A Find Subroutine

A General Discussion forum for TI calculators

Moderator: MaxCoderz Staff

User avatar
Demon
Regular Member
Posts: 85
Joined: Wed 31 Jan, 2007 12:11 am
Location: (806), Texas
Contact:

Wanted: A Find Subroutine

Post by Demon »

I am working on a sentence generator that may possibly become the engine behind a new version of the Nakamuru AI program. As of now it indexes a large of words according to what part of speech they are into lists. These lists become extremely large and the indexing process takes as long as seven minutes for a well-populated word database.

To avoid indexing all together and save a substantial amount of time and RAM, I need a program that can:
- Find the Xth occurrence of a string in another (which inString can't do unless you put it in a for loop and that takes a very long time to have to do this every single word).
- Return the number of occurrences of the string that was found in the "haystack."


How it would work:
X = -1: Return the number of occurrences of Ans in Str1.
X >= 0: Would return the location of the Xth occurrence of Ans in Str1.

If the Ans was not found in Str1, then zero would be returned.


Examples:

Code: Select all

"WILL YOU FIND ME? CAN YOU FIND ME? WHEN WILL YOU FIND ME?->Str1
-1->X
"FIND ME
Asm(prgmFIND
Ans = 3

Code: Select all

"WILL YOU FIND ME? CAN YOU FIND ME? WHEN WILL YOU FIND ME?->Str1
0->X
"FIND ME
Asm(prgmFIND
Ans = 10

Code: Select all

"WILL YOU FIND ME? CAN YOU FIND ME? WHEN WILL YOU FIND ME?->Str1
1->X
"FIND ME
Asm(prgmFIND
Ans = 27
"Python has dynamic typing, and dynamic binding, which means that not only does it make a great secretary, it is also pretty damn kinky." --Henry the Adequate

<My Artwork>
User avatar
KermMartian
Calc Wizard
Posts: 549
Joined: Tue 05 Jul, 2005 11:28 pm
Contact:

Post by KermMartian »

If you look at inString, you'll see you can specify its start index. :P
Image Image Image
User avatar
Demon
Regular Member
Posts: 85
Joined: Wed 31 Jan, 2007 12:11 am
Location: (806), Texas
Contact:

Post by Demon »

I know about that, but that would still be too slow.

The word database can be as long as several hundred words and using inString is slow when scanning a string that contains even just fifty average-sized words.

The generator will be looking for specific types of words when building a sentence. If I used inString, then I'd have to make the generator scan the word database for all instances of the type of word it's looking for, take note of where all the words of that type were found, and then choose whatever word fits. That could take probably up to ten seconds, and that's okay for just one word, but that's too slow for building a sentence in a reasonable amount of time. It'd still have to index the positions for words of whatever type the generator is looking for, and would be worse than pre-indexing.

And I want to completely avoid indexing, anyway.
"Python has dynamic typing, and dynamic binding, which means that not only does it make a great secretary, it is also pretty damn kinky." --Henry the Adequate

<My Artwork>
User avatar
driesguldolf
Extreme Poster
Posts: 395
Joined: Thu 17 May, 2007 4:49 pm
Location: $4080
Contact:

Post by driesguldolf »

The reason why TI takes so long is because some characters are 2 bytes big, and thus you can't simply 'calculate' the position of the nth character(you'd have to scan through the entire string)
Ofcourse when you only use UPPERCASE characters I could make one for you, otherwise I'd end up scanning everything anyway (unless you don't mind the possibility of finding a 'wrong' substring)
User avatar
Demon
Regular Member
Posts: 85
Joined: Wed 31 Jan, 2007 12:11 am
Location: (806), Texas
Contact:

Post by Demon »

There are only uppercase letters, numbers and the 'EE' token used as a delimiter, so that'd be fine.
"Python has dynamic typing, and dynamic binding, which means that not only does it make a great secretary, it is also pretty damn kinky." --Henry the Adequate

<My Artwork>
User avatar
driesguldolf
Extreme Poster
Posts: 395
Joined: Thu 17 May, 2007 4:49 pm
Location: $4080
Contact:

Post by driesguldolf »

Done.

BIG FAT WARNING: Make a backup of everything before using this program or you risk losing everything because of a stupid mistake I made

EDIT: link updated: v0.9b
http://www.mediafire.com/?anheticyxtn

What it does:
  • X is negative: returns number of instances of Ans found in Str1
  • X is positive: returns the position of the xth instance of Ans found in Str1 (1 based, 0 means that Ans isn't a substring of Str1)
  • If you don't provide a string in Ans then I'm 99% sure that you will not like the outcome (unless someone knows how to find the type of Ans)
Because we all like screenshots:
Image
Find all instances

Image
Find nth position


Just to be safe I'll post the source: (fully TASM compatible)
EDIT: Scroll down for updated source (all links are updated)
Last edited by driesguldolf on Wed 17 Oct, 2007 4:20 pm, edited 3 times in total.
User avatar
Demon
Regular Member
Posts: 85
Joined: Wed 31 Jan, 2007 12:11 am
Location: (806), Texas
Contact:

Post by Demon »

Thaaaaaaanks!!!


Oh, and is this kind of (or exactly if I'm really lucky) what you're talking about finding the type of 'Ans'?

(From Johan Rydh's "AnsType' subroutine):

Code: Select all

.nolist

	#include	"ti83plus.inc"

.list

.org	$9D93

.db	t2ByteTok, tAsmCmp





; AnsType



; Syntax: Asm(prgmVARTEST

; Return: Theta, type:

;	1 Real

;	2 List (Real)

;	3 Matrix

;	4 String

;	5 Complex number 

;	6 Complex list 

;	0 if error (Ans doesn't exist))





; Internal/OS

;	00    Real

;	01    List (Real)

;	02    Matrix

;	04    String

;	12 0C Complex number 

;	13 0D Complex list 





	B_CALL(_AnsName)

	RST	10h		; Find Ans

	JP	C,ERR		; If Ans does not exist



	AND	$1F		; Get only type

	CP	4

	JR	C,SET_INC	; Real, RList, Matrix



	DEC	A		; Str = 3



	CP	3

	JR	Z,SET_INC	; Str



	SUB	7		; cplx=5, clist=6



SET_INC:

	INC	A		; Add one to returning number

SET:	B_CALL(_SetXXOP1)	; OP1 = (float) A

END:	B_CALL(_StoTheta)	; Saves result into Real Theta

	RET			; Exit program



ERR:	XOR	A		; A=0

	JR	SET



.end

.end
"Python has dynamic typing, and dynamic binding, which means that not only does it make a great secretary, it is also pretty damn kinky." --Henry the Adequate

<My Artwork>
User avatar
driesguldolf
Extreme Poster
Posts: 395
Joined: Thu 17 May, 2007 4:49 pm
Location: $4080
Contact:

Post by driesguldolf »

It is always a pleasure to help people!

Hold on, I'm 99%sure there is a flaw, I'll fix it when I can (though you will probabely not notice the error :P)
Last edited by driesguldolf on Tue 16 Oct, 2007 2:18 pm, edited 1 time in total.
User avatar
Timendus
Calc King
Posts: 1729
Joined: Sun 23 Jan, 2005 12:37 am
Location: Netherlands
Contact:

Post by Timendus »

driesguldolf wrote:

Code: Select all

;; ==== searchstring ====
;;
;; Searches for the first next occurence of the string pointed at by de (= child string)
;; in the string pointed at by hl (= parent string)
;;
;; Author
;;  Dries Guldolf (dguldolf@vub.ac.be)
;;
;; Pre
;;  de = points at child string
;;  hl = points at parent string
;;
;; Post
;;  c-flag = set if not found, reset otherwise
;;  hl = points at character after occurence (if found)
;;  Interrupts are disabled
;;  Consider af, bc, de, hl, af', bc', de' and hl' destroyed
;;
;; Warning
;;  Don't include the length prefix in the pointers
;;  This routine is designed to be able to continue searching where it left of last time
;;  This is also a bit specified for this situation
;;
Author tag, nice one. We hadn't thought of that yet :) Not that we should really need it for the Vera project, but for others it could certainly be useful.

Oh, and usually people refer to the needle and the haystack in these kind of search routines, not the parent and the child :)

Edit: Oh, one more thing: you should use colons after the tags. I added the colons and ran it through my parser as a test.

http://timendus.student.utwente.nl/~ver ... string.xml
http://clap.timendus.com/ - The Calculator Link Alternative Protocol
http://api.timendus.com/ - Make your life easier, leave the coding to the API
http://vera.timendus.com/ - The calc lover's OS
User avatar
driesguldolf
Extreme Poster
Posts: 395
Joined: Thu 17 May, 2007 4:49 pm
Location: $4080
Contact:

Post by driesguldolf »

BIG FAT WARNING: Make a backup of all your data before using this utility!
  • Fixed an ugly bug: the calc crashes when X was to large
  • Proper error checking
  • Enhanced source code layout
  • If someone can tell me how to recognize a 2 byte token I can support that too
EDIT: link updated: v0.9b
http://www.mediafire.com/?anheticyxtn

Screenshots:
Image Image Image

Image Image

Readme:

Code: Select all

 >>>> README <<<<

Created by Dries Guldolf v0.9
Steal whatever you want, though I would appreciate it if you gave me credit
Keep in mind that I'm not responsible for any damage this program could cause to your calculator or anything else

Explanation:
 Does a binary search (for the string stored in Ans) on Str1
 This program WILL fail when the string contains 2 byte tokens! The offset returned assumes that every token is one byte, it should still return the correct amount of instances.

Errors:
 ERR:DATA TYPE - Occurs when Ans isn't a string.
 ERR:UNDEFINED - Occurs when X, Ans or Str1 wasn't found.
 ERR:INVALID - Occurs when Ans or Str1 was archived.

Behaviour:
 if X is negative, it returns the number if instances of Ans in Str1
 if X is positive, it returns the position of the Xth instance of Ans in Str1
 ! X is zero based
 ! position is one based (returns zero if Ans isn't a substring of Str1)

Examples:
 \\ -1->X:"ABABABA"->Str1:"ABA":Asm(prgmFIND \\
 Results in 3 because when a string is found it continues after the first char of the found string.

 \\ 1->X:"ABCABCDEFABC"->Str1:"ABC":Asm(prgmFIND \\
 Results in 4 (the 2nd instance of "ABC").

 \\ 3->X:"WHAT ARE YOU LOOKING FOR?"->Str1:"NOTHING":Asm(prgmFIND \\
 Results in 0, it's pretty obvious why this is so.

 \\ 0->X:"Disp aaa:XXX"->Str1:"XXX":Asm(prgmFIND \\
 Results in 9 because I said not to use 2 byte tokens.

 \\ -1->X:"Disp aaa:XXX"->Str1:"a":Asm(prgmFIND \\
 Should still result in 3
Source:
Look at page 2 (first post) for updated (v0.9b) source
Last edited by driesguldolf on Wed 17 Oct, 2007 4:21 pm, edited 2 times in total.
User avatar
Demon
Regular Member
Posts: 85
Joined: Wed 31 Jan, 2007 12:11 am
Location: (806), Texas
Contact:

Post by Demon »

I can't download it.
It either downloads a zero byte file or the browser waits for about five minutes and times out.
"Python has dynamic typing, and dynamic binding, which means that not only does it make a great secretary, it is also pretty damn kinky." --Henry the Adequate

<My Artwork>
User avatar
driesguldolf
Extreme Poster
Posts: 395
Joined: Thu 17 May, 2007 4:49 pm
Location: $4080
Contact:

Post by driesguldolf »

hmmm... weird...

Do any of these work: If these don't work I can always email it
User avatar
Demon
Regular Member
Posts: 85
Joined: Wed 31 Jan, 2007 12:11 am
Location: (806), Texas
Contact:

Post by Demon »

Got it.
Thanks.
"Python has dynamic typing, and dynamic binding, which means that not only does it make a great secretary, it is also pretty damn kinky." --Henry the Adequate

<My Artwork>
User avatar
Timendus
Calc King
Posts: 1729
Joined: Sun 23 Jan, 2005 12:37 am
Location: Netherlands
Contact:

Post by Timendus »

Changed "Author" to "Authors", parsed it again:
http://timendus.student.utwente.nl/~ver ... string.xml
http://clap.timendus.com/ - The Calculator Link Alternative Protocol
http://api.timendus.com/ - Make your life easier, leave the coding to the API
http://vera.timendus.com/ - The calc lover's OS
User avatar
calc84maniac
Regular Member
Posts: 112
Joined: Wed 18 Oct, 2006 7:34 pm
Location: The ex-planet Pluto
Contact:

Post by calc84maniac »

driesguldolf wrote:
  • If someone can tell me how to recognize a 2 byte token I can support that too
He doesn't want two byte token support, because that's the major slowdown factor in the TI-OS routines.
~calc84maniac has spoken.

Projects:
F-Zero 83+
Project M (Super Mario for 83+)
Post Reply