Xuefu Zhang brought up this point:
Reading the document, I found one thing that seems to be debatable:
1. Creating a function w/o database name means "in the current database of the session".
2. Creating a temp function w/o database name means global in the system as built-in functions.
I understand the consideration of backward compatibility, but the discrepancy can confuse the user a great deal. Why can't we change #1 to behave the same way as temp functions?
1'. Creating a function w/o database name means global in the system as built-in functions.
It would make sense for there to be some consistency with the default database name for temporary/permanent functions. I've also been playing with qualifying built-in functions with schema names and I don't really like the effect on "describe functions":
hive> show functions;
Many of these functions will not work if they are called as they are labelled: for example, the operators (sys.+ wouldn't work) and any keywords that are implemented as functions. So I'm wondering whether the built-in functions can be kept in the registry under non-qualified function names. Permanent functions, if we do have them, should be qualified, but we also want the default db name to be consistent between temp and permanent functions.
So how about this behavior:
- Built-in functions are not qualified
- User-defined functions (temp/permanent) that are created without a database name will be created using the database name "default".
- Function resolution of non-qualified functions will be in the following order:
1. Look up using the non-qualified name. This will catch built-ins.
2. Look up by qualifying the function name with "default". This will catch non-qualified user-defined functions.
3. Look up by qualifying the function name with the user's current database.
I suppose a future enhancement could be to allow users to specify a custom set of db names when resolving function names, but I think the above would be a suitable default. Does this approach make sense?
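To make the proposed lookup order concrete, here is a rough sketch in Python. It is not Hive code: the registry is modelled as a plain dict, and the names `resolve_function`, `candidate_names` and `DEFAULT_DB` are made up for illustration. It assumes built-ins are stored under their bare name and user-defined functions under "db.name".

```python
from typing import Dict, List, Optional

# Assumption: user-defined functions created without a database name
# are registered under this database.
DEFAULT_DB = "default"


def candidate_names(name: str, current_db: str) -> List[str]:
    """Return the registry keys to try for a non-qualified name, in order."""
    candidates = [
        name,                     # 1. bare name, catches built-ins
        f"{DEFAULT_DB}.{name}",   # 2. qualified with "default"
    ]
    qualified_current = f"{current_db}.{name}"
    # 3. qualified with the session's current database
    #    (skipped when it would duplicate step 2)
    if qualified_current not in candidates:
        candidates.append(qualified_current)
    return candidates


def resolve_function(
    name: str, registry: Dict[str, object], current_db: str
) -> Optional[str]:
    """Return the first registry key that matches, or None if nothing does."""
    for key in candidate_names(name, current_db):
        if key in registry:
            return key
    return None


# Hypothetical registry contents, for illustration only.
registry = {
    "concat": "built-in",
    "default.my_udf": "user-defined, created without a db name",
    "sales.my_udf": "user-defined, created in db 'sales'",
    "sales.only_here": "user-defined, created in db 'sales'",
}
```

With the session in database "sales", `my_udf` would resolve to `default.my_udf` rather than `sales.my_udf`, because step 2 runs before step 3. That shadowing is a consequence of the proposed order and may be worth discussing.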