The Forum for Discussion about The Third Manifesto and Related Matters


Type checking/inference, practical application?


I'm building a programming language (https://tablam.org) with the hope that it will be truly practical for all-around programming (like Python) and not just part of an RDBMS. I landed on the relational stuff because, as a user, I think it is pretty simple to understand. Also, I was a FoxPro developer looking to replicate that user/developer experience in modern times.

Now I'm deciding whether to add a type checker plus inference to the language for the first version. What makes it a YES/NO for me is how hard it would be and how practical.

The main trouble is that having to write the headings for complex JOINs/queries becomes impractical in certain cases, so I think I also need type inference, so the language computes the types for me in those scenarios. A classic example is a CROSS JOIN between several tables. This is not obvious to me, and there isn't much information on how to actually implement it.

I stumbled upon these promising resources:

The first claims to solve subtyping for records, and the second has a foundation for type inference through the lens of a relational model.

The lang I'm aiming for has (or will have):

  • A scripting feel, to be used like Python, so you can do terminal apps, web apps, etc. That means loops, user-defined types, functions, etc.
  • 0-2 dimensional vectors (a scalar is a vector of 1 row/column; an empty value is a vector of 0/0 with the type attached). This is like a built-in APL/kdb+/pandas interface.
  • BTrees/HashMaps
  • Sequences/streams (iterator/generator builders, so it can load data from external sources)
  • All of the above are relations, with all the relational operators (read-only) defined and potentially CRUD operators if the source is read-write.
  • Algebraic types (so, no nulls)

I'm deciding whether to go with multimethods or type classes for polymorphism/reuse; I'm not sure which model would be friendlier.


So, getting to the point, for the people with more experience using this model:

  • Is a static type checker practical in use, even in the presence of complex queries?
  • Is type inference like in the papers above (or any others) already implemented somewhere?
  • Can algebraic types be added without complications?
  • Which way to go to support abstractions: generics, type classes, multimethods, or others that are known to combine well with the relational model?

P.S.: I have read about TTM and a lot of related stuff (years ago!), but if I'm missing or overlooking things, I'd appreciate pointers. Hopefully not just theoretical ones!

Quote from mamcx on July 3, 2021, 9:17 pm

I'm building a programming language (https://tablam.org) with the hope that it will be truly practical for all-around programming (like Python) and not just part of an RDBMS. I landed on the relational stuff because, as a user, I think it is pretty simple to understand. Also, I was a FoxPro developer looking to replicate that user/developer experience in modern times.

Now I'm deciding whether to add a type checker plus inference to the language for the first version. What makes it a YES/NO for me is how hard it would be and how practical.

The main trouble is that having to write the headings for complex JOINs/queries becomes impractical in certain cases, so I think I also need type inference, so the language computes the types for me in those scenarios. A classic example is a CROSS JOIN between several tables. This is not obvious to me, and there isn't much information on how to actually implement it.

I stumbled upon these promising resources:

The first claims to solve subtyping for records, and the second has a foundation for type inference through the lens of a relational model.

The lang I'm aiming for has (or will have):

  • A scripting feel, to be used like Python, so you can do terminal apps, web apps, etc. That means loops, user-defined types, functions, etc.
  • 0-2 dimensional vectors (a scalar is a vector of 1 row/column; an empty value is a vector of 0/0 with the type attached). This is like a built-in APL/kdb+/pandas interface.
  • BTrees/HashMaps
  • Sequences/streams (iterator/generator builders, so it can load data from external sources)
  • All of the above are relations, with all the relational operators (read-only) defined and potentially CRUD operators if the source is read-write.
  • Algebraic types (so, no nulls)

I'm deciding whether to go with multimethods or type classes for polymorphism/reuse; I'm not sure which model would be friendlier.


So, getting to the point, for the people with more experience using this model:

  • Is a static type checker practical in use, even in the presence of complex queries?

Yes. Tutorial D (implemented in Rel -- see https://reldb.org) is statically typed. There are mechanisms in the TTM Inheritance Model that Tutorial D resolves at runtime -- e.g., TREAT_AS_xxx(...) -- that could be completely static but aren't. Making them completely static would be straightforward.

  • Is type inference like in the papers above (or any others) already implemented somewhere?
  • Can algebraic types be added without complications?
  • Which way to go to support abstractions: generics, type classes, multimethods, or others that are known to combine well with the relational model?

Tutorial D supports multimethods, though there has been some debate over the years as to what degree it should or shouldn't do so.

P.S.: I have read about TTM and a lot of related stuff (years ago!), but if I'm missing or overlooking things, I'd appreciate pointers. Hopefully not just theoretical ones!

There are implementations. See the Projects section of http://thethirdmanifesto.com

Type systems and their relationship with the relational model have been a subject of considerable debate here over the years.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

Thanks for the reply. More implemented projects than I remembered; that's good!

Type systems and their relationship with the relational model have been a subject of considerable debate here over the years.

Does some level of consensus exist, or at least on which ones are easier or more practical to implement?

Quote from mamcx on July 3, 2021, 9:17 pm

I'm building a programming language (https://tablam.org) with the hope that it will be truly practical for all-around programming (like Python) and not just part of an RDBMS. I landed on the relational stuff because, as a user, I think it is pretty simple to understand. Also, I was a FoxPro developer looking to replicate that user/developer experience in modern times.

Now I'm deciding whether to add a type checker plus inference to the language for the first version. What makes it a YES/NO for me is how hard it would be and how practical.

Between what you've said and my brief look at your website, it appears that your goals/efforts and my goals/efforts (Muldis Data Language / Muldis Object Notation / etc.) have a lot in common.  Compare with https://github.com/muldis for reference.

In my opinion, for a FIRST version, it is considerably easier to implement a language whose types and type checks are implemented at runtime in the general case, with compile-time checks done just for a subset that is relatively easy, on the assumption that it is impossible to do things 100% at compile time without either severely limiting language functionality or requiring orders of magnitude more time or intelligence to figure out how to do it.

For my efforts with Muldis Data Language etc., the single most fundamental feature of the language, from which all the power comes, is that the language is homoiconic: its native source code format is data in the language, for example structure types representing functions or types or user-defined values, and so on.  This means you can bootstrap a more complicated language over a simpler one, you can generate code without string manipulation, and any compile-time checks are just functions that analyze code-as-data like normal data analysis.

In Muldis Data Language, all types in the general case are represented to the language in terms of ordinary predicate functions that return true or false depending on whether their argument is a member of the type, so in that sense all types are algebraic, and subtyping, unions, and so on are supported.
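For concreteness, here is a minimal Rust sketch of the predicate-types idea (invented names; not the actual Muldis Data Language API):

// A "type" is nothing more than a membership predicate over values.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Text(String),
}

type TypePredicate = Box<dyn Fn(&Value) -> bool>;

fn int_type() -> TypePredicate {
    Box::new(|v| matches!(v, Value::Int(_)))
}

fn positive() -> TypePredicate {
    Box::new(|v| matches!(v, Value::Int(n) if *n > 0))
}

// Unions and intersections fall out as boolean combinators over predicates.
fn union(a: TypePredicate, b: TypePredicate) -> TypePredicate {
    Box::new(move |v| a(v) || b(v))
}

fn intersect(a: TypePredicate, b: TypePredicate) -> TypePredicate {
    Box::new(move |v| a(v) && b(v))
}

fn main() {
    let pos_int = intersect(int_type(), positive());
    assert!(pos_int(&Value::Int(3)));
    assert!(!pos_int(&Value::Int(-3)));
    assert!(!pos_int(&Value::Text("x".into())));
}

One consequence of this representation is that membership is decided by running the predicate, so checks of this kind naturally happen at runtime unless the compiler can prove them earlier.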


Quote from mamcx on July 4, 2021, 2:41 am

Thanks for the reply. More implemented projects than I remember, that's good!.

Type systems and their relationship with the relational model have been a subject of considerable debate here over the years.

Exist some level of consensus, or at least which ones are easier/practical to implement?

TTM's type system and Inheritance Model (IM) are implementable. They're almost entirely (except for a few relatively minor details) implemented in Rel, and I use them to do real (what I call "desktop data crunching") work, the sort of thing for which others typically use Python and/or MS Excel or MS Access. So it's certainly practical. Whether that type system is good, bad, indifferent, or abominable is a matter of debate.

Outside of mostly general agreement that SQL is flawed and TTM is at least a step in the right direction (else we wouldn't be here on the TTM forum), everything here is a matter of debate.

My general advice -- here and elsewhere -- with pet languages is to implement the language you would like to use. If anyone else ever wants to use it, that's a bonus, but not a necessity.

Otherwise, attempting to get consensus or design advice from other developers before implementation is likely to fall into the usual pit of seeking opinions from three developers and getting five different and incompatible views.

Implementing static typing with type inference is notionally straightforward and can be explained in an ad-hoc fashion in a few lines of explanation or pseudocode. The complexity is in the details, the desired features, and the acceptable performance.
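As a sketch of that "few lines of pseudocode" claim, here is a toy bottom-up inferencer in Rust (invented AST; an illustration, not Rel's implementation):

#[derive(Debug, Clone, PartialEq)]
enum Type { Int, Bool }

enum Expr {
    IntLit(i64),
    BoolLit(bool),
    Add(Box<Expr>, Box<Expr>),
    Eq(Box<Expr>, Box<Expr>),
    If(Box<Expr>, Box<Expr>, Box<Expr>),
}

// Infer the type of an expression bottom-up, or report the first mismatch.
fn infer(e: &Expr) -> Result<Type, String> {
    match e {
        Expr::IntLit(_) => Ok(Type::Int),
        Expr::BoolLit(_) => Ok(Type::Bool),
        Expr::Add(a, b) => match (infer(a)?, infer(b)?) {
            (Type::Int, Type::Int) => Ok(Type::Int),
            (ta, tb) => Err(format!("+ expects Int, Int; got {:?}, {:?}", ta, tb)),
        },
        Expr::Eq(a, b) => {
            if infer(a)? == infer(b)? { Ok(Type::Bool) }
            else { Err("= expects operands of the same type".into()) }
        }
        Expr::If(c, t, f) => {
            if infer(c)? != Type::Bool { return Err("condition must be Bool".into()); }
            let (tt, tf) = (infer(t)?, infer(f)?);
            if tt == tf { Ok(tt) } else { Err("branches must agree".into()) }
        }
    }
}

fn main() {
    // if true then 1 + 2 else 0  ==>  Int
    let e = Expr::If(
        Box::new(Expr::BoolLit(true)),
        Box::new(Expr::Add(Box::new(Expr::IntLit(1)), Box::new(Expr::IntLit(2)))),
        Box::new(Expr::IntLit(0)),
    );
    assert_eq!(infer(&e), Ok(Type::Int));
}

Everything beyond this toy -- generics, records, subtyping, inference across statements rather than expressions -- is where the details accumulate.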

Regarding the latter, I'm reminded of a colleague who devised a type system whose resolution was uncommonly rigorous and had some marvellous features, but was notably computationally intensive. So his idea was that the compiler would quickly emit a runnable -- though not necessarily perfect in a type-safety sense -- executable that you could deploy, whilst it crunched away at checking rigour in the background on a server dedicated to that purpose. Then hours, days, months, or years later it could pop up (or email, I guess) a type safety error, e.g., "Error in line 34 of SplorkFizzPop.lang in project MyTestExample compiled on 03/04/2009 or 4475 days ago: x may not be type-compatible with y when x is a ComplexNumber." Etc.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from mamcx on July 3, 2021, 9:17 pm

I'm building a programming language (https://tablam.org) with the hope that it will be truly practical for all-around programming (like Python) and not just part of an RDBMS. I landed on the relational stuff because, as a user, I think it is pretty simple to understand. Also, I was a FoxPro developer looking to replicate that user/developer experience in modern times.

Why would you do that? Apart from the usual headbanging argument (the relief when you finally stop), do you have any real prospect of creating a language with more than zero users?

Zero is usual. One is good: that's you; at least it's good enough that you use it yourself. Two is insane: someone who is not you uses it.

But again, why? What specific language problem did you solve to finally get more than zero users? If your answer is 'to replicate the FoxPro experience', be warned: it's a big job. I should know: I've already done it, and it took years.

Now I'm deciding whether to add a type checker plus inference to the language for the first version. What makes it a YES/NO for me is how hard it would be and how practical.

That's not the question. The very first question Q1 is: what kind of type system do you want? Python has dynamic types and objects. C++/Java/C# have a class-based OO type system. TTM describes something closer to functional programming. What will you choose?

The main trouble is that having to write the headings for complex JOINs/queries becomes impractical in certain cases, so I think I also need type inference, so the language computes the types for me in those scenarios. A classic example is a CROSS JOIN between several tables. This is not obvious to me, and there isn't much information on how to actually implement it.

I stumbled upon these promising resources:

The first claims to solve subtyping for records, and the second has a foundation for type inference through the lens of a relational model.

It's not that hard. Headings are just sets of attribute names, and heading inference across operators of the Relational Algebra is boringly simple. The complexity comes from the earlier decision regarding the type system, but even then, that really isn't the hard part.
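To make "boringly simple" concrete, a Rust sketch (illustrative names; a heading here is just a set of attribute names, with attribute types omitted):

use std::collections::BTreeSet;

type Heading = BTreeSet<String>;

fn h(names: &[&str]) -> Heading {
    names.iter().map(|s| s.to_string()).collect()
}

// project keeps a subset of the input's attributes.
fn project(input: &Heading, keep: &Heading) -> Heading {
    input.intersection(keep).cloned().collect()
}

// Natural join (and cross join, where the inputs must be disjoint)
// yields the union of the two headings.
fn join(left: &Heading, right: &Heading) -> Heading {
    left.union(right).cloned().collect()
}

// rename substitutes one attribute name for another.
fn rename(input: &Heading, from: &str, to: &str) -> Heading {
    input.iter()
        .map(|a| if a == from { to.to_string() } else { a.clone() })
        .collect()
}

fn main() {
    let s = h(&["s_id", "name", "city"]);
    let sp = h(&["s_id", "p_id", "qty"]);
    // The heading of S JOIN SP is inferred, never written by hand:
    assert_eq!(join(&s, &sp), h(&["s_id", "name", "city", "p_id", "qty"]));
    assert_eq!(project(&s, &h(&["city"])), h(&["city"]));
    assert_eq!(rename(&s, "city", "s_city"), h(&["s_id", "name", "s_city"]));
}

A CROSS JOIN across several tables is then just repeated set union over the operands' headings.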

The lang I'm aiming for has (or will have):

  • A scripting feel, to be used like Python, so you can do terminal apps, web apps, etc. That means loops, user-defined types, functions, etc.

The scripting feel suggests a dynamic type system, so see Q1. The rest is easy.

  • 0-2 dimensional vectors (a scalar is a vector of 1 row/column; an empty value is a vector of 0/0 with the type attached). This is like a built-in APL/kdb+/pandas interface.

Why? And why the empty value?

  • BTrees/HashMaps

Why? Isn't that just implementation?

  • Sequences/streams (iterator/generator builders, so it can load data from external sources)

Yes, but heaps of work to do.

  • All of the above are relations, with all the relational operators (read-only) defined and potentially CRUD operators if the source is read-write.

No. Read TTM on relational assignment.

  • Algebraic types (so, no nulls)

I'm deciding whether to go with multimethods or type classes for polymorphism/reuse; I'm not sure which model would be friendlier.


So, getting to the point, for the people with more experience using this model:

  • Is a static type checker practical in use, even in the presence of complex queries?

In my opinion, I would use no other. If it compiles, it runs.

  • Is type inference like in the papers above (or any others) already implemented somewhere?
  • Can algebraic types be added without complications?

Yes. TTM nearly has them.

  • Which way to go to support abstractions: generics, type classes, multimethods, or others that are known to combine well with the relational model?

P.S.: I have read about TTM and a lot of related stuff (years ago!), but if I'm missing or overlooking things, I'd appreciate pointers. Hopefully not just theoretical ones!

Generics seem to be the way to go, type classes not so much. But multiple dispatch is kind of nice.

Andl - A New Database Language - andl.org
Quote from Darren Duncan on July 4, 2021, 7:10 am

In my opinion, for a FIRST version, it is considerably easier to implement a language whose types and type checks are implemented at runtime in the general case, with compile-time checks done just for a subset that is relatively easy, on the assumption that it is impossible to do things 100% at compile time without either severely limiting language functionality or requiring orders of magnitude more time or intelligence to figure out how to do it.

Yeah, this is what I have done so far. "Compile-time checks done just for a subset" is a nice goal, but figuring out WHICH subset that is, is not so obvious to me. One major reason for doing this is actually to get nice error messages, and type checking already visits the code to get the info for them (be it at compile time or runtime).

Quote from Dave Voorhis on July 4, 2021, 8:55 am

Otherwise, attempting to get consensus or design advice from other developers before implementation is likely to fall into the usual pit of seeking opinions from three developers and getting five different and incompatible views.

Of course. Also, building something like this comes in part from being stubborn and thinking you may have some merit ;) so I understand everyone has their own way. But I also like to learn from the experience of others and see whether certain directions have more potential than others.

Quote from dandl on July 4, 2021, 10:39 am

But again, why? What specific language problem did you solve to finally get more than zero users? If your answer is 'to replicate the FoxPro experience', be warned: it's a big job. I should know: I've already done it, and it took years.

Yes, I understand that this kind of project is more for fun and its chances of getting real traction are very, very small. I started this years ago (with crude attempts) and have only been kind of serious about it for the past two.

That's not the question. The very first question Q1 is: what kind of type system do you want? Python has dynamic types and objects. C++/Java/C# have a class-based OO type system. TTM describes something closer to functional programming. What will you choose?

I like static type checking. Purely because of the actual implementation as an interpreter, this looks like a dynamic type system, but I plan to do the checks in the "compiler" phase so that at runtime I can elide a lot of branches and decisions. This leverages a staging/Futamura projection, where the first pass takes the decisions and the second pass compiles to lambdas and does just the computations. This is how it's done now, so the act of type checking becomes more natural here.
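Roughly like this in Rust (a simplified sketch of the staging idea, not the actual TablaM code):

enum Expr {
    Lit(f64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

type Compiled = Box<dyn Fn() -> f64>;

fn compile(e: &Expr) -> Compiled {
    match e {
        Expr::Lit(n) => {
            let n = *n;
            Box::new(move || n)
        }
        Expr::Add(a, b) => {
            let (a, b) = (compile(a), compile(b)); // decisions (and type checks) happen here, once
            Box::new(move || a() + b())            // runtime is pure computation
        }
        Expr::Mul(a, b) => {
            let (a, b) = (compile(a), compile(b));
            Box::new(move || a() * b())
        }
    }
}

fn main() {
    // (2 + 3) * 4
    let e = Expr::Mul(
        Box::new(Expr::Add(Box::new(Expr::Lit(2.0)), Box::new(Expr::Lit(3.0)))),
        Box::new(Expr::Lit(4.0)),
    );
    let f = compile(&e);   // first pass: decide
    assert_eq!(f(), 20.0); // second pass: just run the closures
}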

From the POV of the user, the system is similar to Rust/Go, where you have plain structs and enums, with traits/generics/multimethods finishing it off, so not OO but more functional + relational operators.

  • 0-2 dimensional vectors (a scalar is a vector of 1 row/column; an empty value is a vector of 0/0 with the type attached). This is like a built-in APL/kdb+/pandas interface.
  • BTrees/HashMaps

Why? And why the empty value? / Why? Isn't that just implementation?

This is to make it easier to see all of the above as rows/columns. A 1d/2d array maps neatly onto this, and it lets me do the 3 main containers with a single syntax/interface:

[col1:int, col2:int, col3:int; //header
 1, 2, 3; //row1
 1, 2, 3; //row2
]

//Same representation, vary internal storage by:
Vec[..]
Tree[..]
Map[..]
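In Rust terms, the storage side is roughly one trait over the three containers (again a simplified sketch, not the actual TablaM code; rows reduced to key/value pairs):

use std::collections::{BTreeMap, HashMap};

// One interface over the three storages.
trait Rows {
    fn rows(&self) -> Vec<(i64, i64)>;
}

impl Rows for Vec<(i64, i64)> {
    fn rows(&self) -> Vec<(i64, i64)> { self.clone() }
}

impl Rows for BTreeMap<i64, i64> {
    fn rows(&self) -> Vec<(i64, i64)> { self.iter().map(|(k, v)| (*k, *v)).collect() }
}

impl Rows for HashMap<i64, i64> {
    fn rows(&self) -> Vec<(i64, i64)> { self.iter().map(|(k, v)| (*k, *v)).collect() }
}

// A relational operator written once against the trait works for all three.
fn select(r: &dyn Rows, pred: impl Fn(&(i64, i64)) -> bool) -> Vec<(i64, i64)> {
    r.rows().into_iter().filter(|row| pred(row)).collect()
}

fn main() {
    let v: Vec<(i64, i64)> = vec![(1, 2), (3, 4)];
    let t: BTreeMap<i64, i64> = v.iter().cloned().collect();
    assert_eq!(select(&v, |r| r.0 > 1), vec![(3, 4)]);
    assert_eq!(select(&t, |r| r.0 > 1), vec![(3, 4)]);
}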
  • Sequences/streams (iterator/generator builders, so it can load data from external sources)

Yes, but heaps of work to do.

This was my first instinct too, so I dedicated a lot of effort to dealing with this (mostly fighting Rust, because in other languages it is far easier to implement), but I think I have this solved and can surface any iterator/generator from Rust easily now, including doing JOINs and combining them with the structures above. I haven't done any performance work, but the path to it is clear to me; I only need to finish the rest of the lang.
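For example, the simplest case is a nested-loop JOIN written once over plain Rust slices (a simplified sketch, not the actual TablaM internals):

fn nested_loop_join<L: Clone, R: Clone, K: PartialEq>(
    left: &[L],
    right: &[R],
    key_l: impl Fn(&L) -> K,
    key_r: impl Fn(&R) -> K,
) -> Vec<(L, R)> {
    let mut out = Vec::new();
    for l in left {
        for r in right {
            if key_l(l) == key_r(r) {
                out.push((l.clone(), r.clone())); // emit the matched pair
            }
        }
    }
    out
}

fn main() {
    let users = [(1, "ana"), (2, "bob")];
    let orders = [(10, 1), (11, 1), (12, 2)]; // (order_id, user_id)
    let joined = nested_loop_join(&users, &orders, |u| u.0, |o| o.1);
    assert_eq!(joined.len(), 3); // ana matches twice, bob once
}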

  • All of the above are relations, with all the relational operators (read-only) defined and potentially CRUD operators if the source is read-write.

No. Read TTM on relational assignment.

Maybe I don't understand the issue here? I think I need to do this because not all external sources can be easily updated with CRUD operations (or it is better to make a hand-coded implementation).

Generics seem to be the way to go, type classes not so much. But multiple dispatch is kind of nice.

Thanks!

Quote from mamcx on July 4, 2021, 6:31 pm
Quote from Darren Duncan on July 4, 2021, 7:10 am

In my opinion, for a FIRST version, it is considerably easier to implement a language whose types and type checks are implemented at runtime in the general case, with compile-time checks done just for a subset that is relatively easy, on the assumption that it is impossible to do things 100% at compile time without either severely limiting language functionality or requiring orders of magnitude more time or intelligence to figure out how to do it.

Yeah, this is what I have done so far. "Compile-time checks done just for a subset" is a nice goal, but figuring out WHICH subset that is, is not so obvious to me. One major reason for doing this is actually to get nice error messages, and type checking already visits the code to get the info for them (be it at compile time or runtime).

My general approach is incremental, based on locality of reference.  So, for example, start with the errors one can detect at compile time considering only the source of an individual statement or routine definition.  With this you can catch things like whether parameter or lexical variable references point to something that exists.  A level out is considering an entire package definition, such as whether references to types or routines from within the same package as the invoker are valid.  As these tend to be compilation units, that level of check should be relatively easy to do early at compile time.  A level out from that is considering the wider context of other packages: whether references to and uses of things declared in them seem correct.
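A minimal Rust sketch of that first level (invented structures; just the routine-local name check):

use std::collections::HashSet;

enum Stmt {
    Declare(String), // let x
    Use(String),     // any reference to x
}

struct Routine {
    params: Vec<String>,
    body: Vec<Stmt>,
}

// Check, from the routine definition alone, that every reference resolves
// to a parameter or an earlier local declaration.
fn check_routine(r: &Routine) -> Vec<String> {
    let mut in_scope: HashSet<String> = r.params.iter().cloned().collect();
    let mut errors = Vec::new();
    for stmt in &r.body {
        match stmt {
            Stmt::Declare(name) => { in_scope.insert(name.clone()); }
            Stmt::Use(name) => {
                if !in_scope.contains(name) {
                    errors.push(format!("unknown name `{}`", name));
                }
            }
        }
    }
    errors
}

fn main() {
    let r = Routine {
        params: vec!["x".into()],
        body: vec![Stmt::Use("x".into()), Stmt::Use("y".into())],
    };
    assert_eq!(check_routine(&r), vec!["unknown name `y`".to_string()]);
}

The package-level and cross-package levels are the same shape of check, just run over progressively wider sets of declarations.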

Quote from mamcx on July 4, 2021, 6:31 pm
Quote from Darren Duncan on July 4, 2021, 7:10 am

In my opinion, for a FIRST version, it is considerably easier to implement a language whose types and type checks are implemented at runtime in the general case, with compile-time checks done just for a subset that is relatively easy, on the assumption that it is impossible to do things 100% at compile time without either severely limiting language functionality or requiring orders of magnitude more time or intelligence to figure out how to do it.

Yeah, this is what I have done so far. "Compile-time checks done just for a subset" is a nice goal, but figuring out WHICH subset that is, is not so obvious to me. One major reason for doing this is actually to get nice error messages, and type checking already visits the code to get the info for them (be it at compile time or runtime).

I would disagree. Strong type checking is a deeply pervasive decision. I much prefer it for big durable programs (like application suites) and find it a nuisance for ad hoc scripts. Your choice, but I think you have to choose.

Quote from dandl on July 4, 2021, 10:39 am

But again, why? What specific language problem did you solve to finally get more than zero users? If your answer is 'to replicate the FoxPro experience', be warned: it's a big job. I should know: I've already done it, and it took years.

Yes, I understand that this kind of project is more for fun and its chances of getting real traction are very, very small. I started this years ago (with crude attempts) and have only been kind of serious about it for the past two.

That's not the question. The very first question Q1 is: what kind of type system do you want? Python has dynamic types and objects. C++/Java/C# have a class-based OO type system. TTM describes something closer to functional programming. What will you choose?

I like static type checking. Purely because of the actual implementation as an interpreter, this looks like a dynamic type system, but I plan to do the checks in the "compiler" phase so that at runtime I can elide a lot of branches and decisions. This leverages a staging/Futamura projection, where the first pass takes the decisions and the second pass compiles to lambdas and does just the computations. This is how it's done now, so the act of type checking becomes more natural here.

From the POV of the user, the system is similar to Rust/Go, where you have plain structs and enums, with traits/generics/multimethods finishing it off, so not OO but more functional + relational operators.

That works, but those languages are quite low level. You really need excellent support for business data types. My list is: bool, int, float, decimal, time, text, blob, enum (string), record and collections thereof. Records/tuples are a serious problem if you want to go the TTM route.
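Sketched as a Rust value enum (stand-in representations; a real implementation would take decimal and time types from libraries, which is an assumption here):

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Bool(bool),
    Int(i64),
    Float(f64),
    Decimal(i128, u32), // unscaled digits + scale; stand-in for a real decimal type
    Time(i64),          // seconds since epoch; stand-in for a real time type
    Text(String),
    Blob(Vec<u8>),
    Enum(String),       // enum (string) tag
    Record(Vec<(String, Value)>),
    List(Vec<Value>),
}

fn main() {
    let row = Value::Record(vec![
        ("id".to_string(), Value::Int(1)),
        ("name".to_string(), Value::Text("ana".to_string())),
    ]);
    println!("{:?}", row);
}

The Record case is where the TTM route gets hard: it needs a static heading, not just a runtime list of fields.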

In Powerflex (the language product my company built) the runtime was at least 10x the compiler, but we had to write it all ourselves. Whose language libraries do you propose to use?

  • 0-2 dimensional vectors (a scalar is a vector of 1 row/column; an empty value is a vector of 0/0 with the type attached). This is like a built-in APL/kdb+/pandas interface.
  • BTrees/HashMaps

Why? And why the empty value? / Why? Isn't that just implementation?

This is to make it easier to see all of the above as rows/columns. A 1d/2d array maps neatly onto this, and it lets me do the 3 main containers with a single syntax/interface:

I'll pass on this for now. I really don't get it.

  • All of the above are relations, with all the relational operators (read-only) defined and potentially CRUD operators if the source is read-write.

No. Read TTM on relational assignment.

Maybe I don't understand the issue here? I think I need to do this because not all external sources can be easily updated with CRUD operations (or it is better to make a hand-coded implementation).

  • The basic (Codd) Relational Algebra defines 6 operators: select, project, join, rename, union, not.
  • The Extended RA adds 3: extend, aggregate, while.
  • Update is just assign: replace the old value of some variable by a new value.

That's all there is. It does everything in SQL.
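In code terms (tuples simplified to integers; a sketch of the idea, not Andl):

use std::collections::BTreeSet;

fn main() {
    // A relvar is a variable holding a relation value.
    let mut relvar: BTreeSet<i64> = [1, 2, 3].into_iter().collect();

    // INSERT is shorthand for: relvar := relvar UNION {4}
    relvar = relvar.union(&BTreeSet::from([4])).cloned().collect();

    // DELETE is shorthand for: relvar := relvar MINUS {2}
    relvar = relvar.difference(&BTreeSet::from([2])).cloned().collect();

    assert_eq!(relvar, BTreeSet::from([1, 3, 4]));
}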


Andl - A New Database Language - andl.org
Quote from dandl on July 5, 2021, 5:31 am
Quote from mamcx on July 4, 2021, 6:31 pm
Quote from Darren Duncan on July 4, 2021, 7:10 am

In my opinion, for a FIRST version, it is considerably easier to implement a language whose types and type checks are implemented at runtime in the general case, with compile-time checks done just for a subset that is relatively easy, on the assumption that it is impossible to do things 100% at compile time without either severely limiting language functionality or requiring orders of magnitude more time or intelligence to figure out how to do it.

Yeah, this is what I have done so far. "Compile-time checks done just for a subset" is a nice goal, but figuring out WHICH subset that is, is not so obvious to me. One major reason for doing this is actually to get nice error messages, and type checking already visits the code to get the info for them (be it at compile time or runtime).

I would disagree. Strong type checking is a deeply pervasive decision. I much prefer it for big durable programs (like application suites) and find it a nuisance for ad hoc scripts. Your choice, but I think you have to choose.

David, you seem to be talking about a different thing than mamcx and I were.

We were talking about when type checking happens: compile time vs. runtime.  You're talking about strong type checking, which is orthogonal to this and completely different.

Strong type checking says, for example, that an attempt to use a text where a number is expected would fail, while weak type checking would not fail and would instead try to extract a number from the text.

Whereas I think mamcx and I both agreed (or at least I think so) that we want strong typing regardless, and compile-time versus runtime checking is just a matter of when the failure is seen, rather than whether the failure happens.
