Monday, March 13, 2006

Somebody woke up on the wrong side of the bed. Fortunately, it wasn't me this time :)

This is a nice little rant about the limitations of OO-as-a-religion. I think you could substitute any of the following for OO and the core of the rant would remain quite valid:

  • procedural programming/design
  • modular programming/design
  • SOA/SO/service-orientation
  • Java
  • .NET

Every concept computer science has developed over the past many decades came into being to solve a problem. And for the most part each concept did address some or all of that problem. Procedures provided reuse. Modules provided encapsulation. Client/server provided scalability. And so on...

The thing is that very few of these new concepts actually obsolete any previous concepts. For instance, OO doesn't eliminate the value of procedural reuse. In fact, using the two in concert typically gives you the best of both worlds.

Similarly, SO is a case where a couple of ideas ran into each other while riding bicycles. "You got messaging in my client/server!" "No! You got client/server in my messaging!" "Hey! This tastes pretty good!" SO is good stuff, but it doesn't replace client/server, messaging, OO, procedural design or any of the previous concepts. It merely provides an interesting lens through which we can view these pre-existing concepts, and perhaps some new ways to apply them.

Anyway, I enjoyed the rant, even though I remain a staunch believer in the power of OO.

Monday, March 13, 2006 11:44:39 AM (Central Standard Time, UTC-06:00)

 Wednesday, March 01, 2006

This recent MSDN article talks about SPOIL: Stored Procedure Object Interface Layer.

This is an interesting, and generally good idea as I see it. Unfortunately this team, like most of Microsoft, apparently just doesn't understand the concept of data hiding in OO. SPOIL allows you to use your object's properties as data elements for a stored procedure call, which is great as long as you only have public read/write properties. But data hiding requires that you will have some private fields that simply aren't exposed as public read/write properties. If SPOIL supported using fields as data elements for a stored procedure call it would be totally awesome!

The same is true for LINQ. It works against public read/write properties, which means it is totally useless if you want to use it to load "real" objects that employ basic concepts like encapsulation and data hiding. Oh sure, you can use LINQ (well, DLinq really) to load a DTO (data transfer object - an object with only public read/write properties and no business logic) and then copy the data from the DTO into your real object. Or you could try to use the DTO as the "data container" inside your real object rather than using private fields. But frankly those options introduce complexity that should simply be unnecessary...

While it is true that loading private fields requires reflection - Microsoft could solve this. They do own the CLR after all... It is surely within their power to provide a truly good solution to the problem, that supports data mapping and also allows for key OO concepts like encapsulation and data hiding.
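
To make the complaint concrete, here is a rough sketch of the difference between a mapper that writes through public read/write properties and one that writes directly to private fields. It is written in Python purely for illustration and is not SPOIL, LINQ, or any Microsoft API. The `Customer` class, its fields, and both loader functions are made up for the example, and Python's leading-underscore "private" fields are only a convention, so treat this as an analogy for what a field-based mapper on the CLR might do.

```python
class Customer:
    """A 'real' business object: state is hidden, only some of it is readable."""

    def __init__(self):
        self._id = None
        self._name = ""
        self._credit_limit = 0.0   # never exposed directly

    @property
    def name(self):
        # Read-only: there is deliberately no setter.
        return self._name

    def can_purchase(self, amount):
        # Business logic that depends on hidden state.
        return amount <= self._credit_limit


def load_via_properties(obj, row):
    """Mapper that only knows how to assign public attributes/properties."""
    for column, value in row.items():
        setattr(obj, column, value)


def load_into_fields(obj, row):
    """Hypothetical mapper that writes straight to the private fields."""
    for column, value in row.items():
        field = "_" + column
        if field not in vars(obj):
            raise KeyError(f"no field for column {column!r}")
        vars(obj)[field] = value


row = {"id": 42, "name": "Acme", "credit_limit": 500.0}

customer = Customer()
load_into_fields(customer, row)
print(customer.name, customer.can_purchase(100))

try:
    load_via_properties(Customer(), row)
except AttributeError as exc:
    # The read-only 'name' property rejects the assignment.
    print("property-based load failed:", exc)
```

The field-based loader fills the object without forcing it to grow public setters, which is the capability the post is asking for.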

Wednesday, March 01, 2006 10:00:01 AM (Central Standard Time, UTC-06:00)

 Friday, August 05, 2005

Ted Neward believes that “distributed objects” are, and always have been, a bad idea, and John Cavnar-Johnson tends to agree with him.

 

I also agree. "Distributed objects" are bad.

 

Shocked? You shouldn’t be. The term “distributed objects” is most commonly used to refer to one particular type of n-tier implementation: the thin client model.

 

I discussed this model in a previous post, and you’ll note that I didn’t paint it in an overly favorable light. That’s because the model is a very poor one.

 

The idea of building a true object-oriented model on a server, where the objects never leave that server is absurd. The Presentation layer still needs all the data so it can be shown to the user and so the user can interact with it in some manner. This means that the “objects” in the middle must convert themselves into raw data for use by the Presentation layer.

 

And of course the Presentation layer needs to do something with the data. The ideal is that the Presentation layer has no logic at all, that it is just a pass-through between the user and the business objects. But the reality is that the Presentation layer ends up with some logic as well – if only to give the user a half-way decent experience. So the Presentation layer often needs to convert the raw data into some useful data structures or objects.

 

The end result with “distributed objects” is that there’s typically duplicated business logic (at least validation) between the Presentation and Business layers. The Presentation layer is also unnecessarily complicated by the need to put the data into some useful structure.

 

And the Business layer is complicated as well. Think about it. Your typical OO model includes a set of objects designed using OOD sitting on top of an ORM (object-relational mapping) layer. I typically call this the Data Access layer. That Data Access layer then interacts with the real Data layer.

 

But in a “distributed object” model, there’s the need to convert the objects’ data back into raw data – often quasi-relational or hierarchical – so it can be transferred efficiently to the Presentation layer. This is really a whole new logical layer very akin to the ORM layer, except that it maps between the Presentation layer’s data structures and the objects rather than between the Data layer’s structures and the objects.

 

What a mess!

 

Ted is absolutely right when he suggests that “distributed objects” should be discarded. If you are really stuck on having your business logic “centralized” on a server then service-orientation is a better approach. Using formalized message-based communication between the client application and your service-oriented (hence procedural, not object-oriented) server application is a better answer.

 

Note that the terminology changed radically! Now you are no longer building one application, but rather you are building at least two applications that happen to interact via messages. Your server doesn't pretend to be object-oriented, but rather is service-oriented - which is a code phrase for procedural programming. This is a totally different mindset from “distributed objects”, but it is far better.

 

Of course another model is to use mobile objects or mobile agents. This is the model promoted in my Business Objects books and enabled by CSLA .NET. In the mobile object model your Business layer exists on both the client machine (or web server) and application server. The objects physically move between the two machines – running on the client when user interaction is required and running on the application server to interact with the database.

 

The mobile object model allows you to continue to build a single application (rather than 2+ applications with SO), but overcomes the nasty limitations of the “distributed object” model.

Friday, August 05, 2005 9:12:29 AM (Central Standard Time, UTC-06:00)

 Tuesday, July 26, 2005
Here's an article that appears to provide a really good overview of the whole mobile agent/object concept. I've only skimmed through it, but the author appears to have a good grasp on the concepts and portrays them with some good diagrams.
Monday, July 25, 2005 11:16:49 PM (Central Standard Time, UTC-06:00)

 Sunday, July 24, 2005

Mike has requested my thoughts on 3-tier and the web – a topic I avoided in my previous couple entries (1 and 2) because I don’t find it as interesting as a smart/intelligent client model. But he’s right, the web is widely used and a lot of poor people are stuck building business software in that environment, so here’s the extension of the previous couple entries into the web environment.

 

In my view the web server is an application server, pure and simple. Also, in the web world it is impractical to run the Business layer on the client because the client is a very limited environment. This is largely why the web is far less interesting.

 

To discuss things in the web world I break the “Presentation” layer into two parts – Presentation and UI. This new Presentation layer is purely responsible for display and input to/from the user – it is the stuff that runs on the browser terminal. The UI layer is responsible for all actual user interaction – navigation, etc. It is the stuff that runs on the web server: your aspx pages in .NET.

 

In web applications most people consciously (or unconsciously) duplicate most validation code into the Presentation layer so they can get it to run in the browser. This is expensive to create and maintain, but it is an unfortunate evil required to have a half-way decent user experience in the web environment. You must still have that logic in your actual Business layer of course, because you can never trust the browser - it is too easily bypassed (Greasemonkey anyone?). This is just the way it is on the web, and will be until we get browsers that can run complete code solutions in .NET and/or Java (that's sarcasm btw).
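
As a small illustration of "never trust the browser", here is a Python sketch of a server-side handler that always re-runs the Business layer's rules, even when the posted form claims the browser already validated it. The field names, the `client_validated` flag, and the status codes are assumptions invented for the example.

```python
def validate_order(order):
    """Authoritative rules; this function lives in the Business layer."""
    errors = []
    if not order.get("customer_id"):
        errors.append("customer_id is required")
    quantity = order.get("quantity", 0)
    if not isinstance(quantity, int) or quantity <= 0:
        errors.append("quantity must be a positive whole number")
    return errors


def handle_post(form):
    """Runs on the web server for every submission."""
    # The browser may claim it already validated the input; ignore that claim.
    form = {k: v for k, v in form.items() if k != "client_validated"}
    errors = validate_order(form)
    if errors:
        return {"status": 400, "errors": errors}
    return {"status": 200, "errors": []}


honest = {"customer_id": "C1", "quantity": 3, "client_validated": True}
tampered = {"customer_id": "", "quantity": -5, "client_validated": True}
print(handle_post(honest))
print(handle_post(tampered))
```

Whatever script runs in the browser is a convenience for the user; the copy of the rules on the server is the one that counts.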

 

On the server side, the web server IS an application server. It fills the exact same role that the mainframe or minicomputer has filled over the past 30 years of computing. For "interactive" applications, it is preferable to run the UI layer, Business layer and Data Access layer all on the web server. This is the simplest (and thus cheapest) model, and provides the best performance[1]. It can also provide very good scalability because it is relatively trivial to create a web farm to scale out to many servers. By creating a web farm you also get very good fault tolerance at a low price-point. Using ISA as a reverse proxy above the web farm you can get good security.

 

In many organizations the reverse proxy idea isn’t acceptable (not being a security expert I can’t say why…) and so they have a policy saying that the web server is never allowed to interact directly with the database server – thus forcing the existence of an application server that at a minimum runs the Data Access layer. Typically this application server is behind a second firewall. While this security approach hurts performance (often by as much as 50%), it is relatively easily achieved with CSLA .NET or similar architecture/frameworks.

 

In other situations people prefer to put the Business layer and Data Access layer on the application server behind the second firewall. This means that the web server only runs the UI layer. Any business processing, validation, etc. must be deferred across the network to the application server. This has a much higher impact on performance (in a bad way).

 

However, this latter approach can have a positive scalability impact in certain applications. Specifically applications where there’s not much interactive content, but instead there’s a lot of read-only content. Most read-only content (by definition) has no business logic and can often be served directly from the UI layer. In such applications the IO load for the read-only content can be quite enough to keep the web server very busy. By offloading all business processing to an application server overall scalability may be improved.

 

Of course this only really works if the interactive (OLTP) portions of the application are quite limited in comparison to the read-only portions.

 

Also note that this latter approach suffers from the same drawbacks as the thin client model discussed in my previous post. The most notable problem is that you must come up with a way to do non-chatty communication between the UI layer and the Business layer, without compromising either layer. This is historically very difficult to pull off. What usually happens is that the “business objects” in the Business layer require code to externalize their state (data) into a tabular format such as a DataSet so the UI layer can easily use the data. Of course externalizing object state breaks encapsulation unless it is done with great care, so this is an area requiring extra attention. The typical end result is not objects in a real OO sense, but rather “objects” comprised of a set of atomic, stateless methods. At this point you don’t have objects at all – you have an API.
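
Here is a rough Python sketch of what externalizing state looks like and why it needs care. The `Invoice` class and its `to_rows` method are invented for the example; a list of dictionaries stands in for a DataSet.

```python
class Invoice:
    def __init__(self, number, lines):
        self._number = number
        self._lines = list(lines)   # (description, quantity, unit_price)

    def total(self):
        return sum(qty * price for _, qty, price in self._lines)

    def to_rows(self):
        """Externalize state as flat rows the UI layer can bind to."""
        return [
            {"invoice": self._number, "description": d, "quantity": q, "unit_price": p}
            for d, q, p in self._lines
        ]


invoice = Invoice("INV-1", [("Widget", 2, 10.0), ("Gadget", 1, 5.0)])
rows = invoice.to_rows()

# The UI layer now holds plain data. Nothing stops it from changing a value,
# and the total() rule did not travel with the rows.
rows[0]["quantity"] = -100
ui_total = sum(r["quantity"] * r["unit_price"] for r in rows)  # rule re-implemented in the UI
print(invoice.total(), ui_total)
```

Once the rows leave the object, the behavior that gave the data meaning stays behind, and the UI ends up re-implementing it.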

 

In the case of CSLA .NET, I apply the mobile object model to this environment. I personally believe it makes things better since it gives you the flexibility to run some of your business logic on the web application server and some on the pure application server as appropriate. Since the Business layer is installed on both the web and application servers, your objects can run in either place as needed.

 

In short, to make a good web app you almost have to compromise the integrity of your layers and duplicate some business logic into the Presentation layer. It sucks, but it's life in the wild world of the web. If you can put your UI, Business and Data Access layers on the web application server that’s best. If you can’t (typically due to security) then move only the Data Access layer and keep both UI and Business layers on the web application server. Finally, if you must put the Business layer on a separate application server I prefer to use a mobile object model for flexibility, but recognize that a pure API model on the application server will scale higher and is often required for applications with truly large numbers of concurrent users (like 2000+).

 

 

[1] As someone in a previous post indirectly noted, there’s a relationship between performance and scalability. Performance is the response time of the system for a user. Scalability is what happens to performance as the number of users and/or transactions is increased.

Sunday, July 24, 2005 8:47:26 AM (Central Standard Time, UTC-06:00)

 Saturday, July 23, 2005

In my last post I talked about logical layers as compared to physical tiers. It may be the case that that post (and this one) are too obvious or basic. But I gotta say that I consistently am asked about these topics at conferences, user groups and via email. The reality is that none of this is all that obvious or clear to the vast majority of people in our industry. Even for those that truly grok the ideas, there’s far from universal agreement on how an application should be layered or how those layers should be deployed onto tiers.

 

In one comment on my previous post Magnus points out that my portrayal of the application server merely as a place for the Data Access layer flies in the face of Magnus’ understanding of n-tier models. Rather, Magnus (and many other people) is used to putting both the Business and Data Access layers on the application server, with only the Presentation layer on the client workstation (or presumably the web server in the case of a web application, though the comment didn’t make that clear).

 

The reality is that there are three primary models to consider in the smart/rich/intelligent client space. There are loose analogies in the web world as well, but personally I don’t find that nearly as interesting, so I’m going to focus on the intelligent client scenarios here.

 

Also one quick reminder – tiers should be avoided. This whole post assumes you’ve justified using a 3-tier model to obtain scalability or security as per my previous post. If you don’t need 3-tiers, don’t use them – and then you can safely ignore this whole entry :)

 

There’s the thick client model, where the Presentation and Business layers are on the client, and the Data Access is on the application server. Then there’s the thin client model where only the Presentation layer is on the client, with the Business and Data Access layers on the application server. Finally there’s the mobile object model, where the Presentation and Business layers are on the client, and the Business and Data Access layers are on the application server. (Yes, the Business layer is in two places) This last model is the one I discuss in my Expert VB.NET and C# Business Objects books and which is supported by my CSLA .NET framework.
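
The three models are easy to mix up, so here they are written out as data in a short Python sketch. The dictionary layout and the `tiers_running` helper are just one illustrative way to express the mapping of layers to tiers described above.

```python
MODELS = {
    "thick client": {
        "client": {"Presentation", "Business"},
        "application server": {"Data Access"},
    },
    "thin client": {
        "client": {"Presentation"},
        "application server": {"Business", "Data Access"},
    },
    "mobile object": {
        "client": {"Presentation", "Business"},
        "application server": {"Business", "Data Access"},
    },
}


def tiers_running(model, layer):
    """Return the tiers on which a layer is deployed in the given model."""
    return sorted(tier for tier, layers in MODELS[model].items() if layer in layers)


for name in MODELS:
    print(name, "-> Business layer runs on:", tiers_running(name, "Business"))
```

The only model where the Business layer shows up on both tiers is the mobile object model.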

 

The benefit to the thick client model is that we are able to provide the user with a highly interactive and responsive user experience, and we are able to fully exploit the resources of the client workstation (specifically memory and CPU). At the same time, having the Data Access layer on the application server gives us database connection pooling. This is a very high scaling solution, with a comparatively low cost, because we are able to exploit the strengths of the client, application and database servers very effectively. Moreover, the user experience is very good and development costs are relatively low (because we can use the highly productive Windows Forms technology).

 

The drawback to the thick client model is a loss of flexibility – specifically when it comes to process-oriented tasks. Most applications have large segments of OLTP (online transaction processing) functionality where a highly responsive and interactive user experience is of great value. However, most applications also have some important segments of process-oriented tasks that don’t require any user interaction. In most cases these tasks are best performed on the application server, or perhaps even directly in the database server itself. This is because process-oriented tasks tend to be very data intensive and non-interactive, so the closer we can do them to the database the better. In a thick client model there’s no natural home for process-oriented code near the database – the Business layer is way out on the client after all…

 

Another perceived drawback to the thick client is deployment. I dismiss this however, given .NET’s no-touch deployment options today, and ClickOnce coming in 2005. Additionally, any intelligent client application requires deployment of our code – the Presentation layer at least. Once you solve deployment of one layer you can deploy other layers as easily, so this whole deployment thing is a non-issue in my mind.

 

In short, the thick client model is really nice for interactive applications, but quite poor for process-oriented applications.

 

The benefit to the thin client model is that we have greater control over the environment into which the Business and Data Access layers are deployed. We can deploy them onto large servers, multiple servers, across disparate geographic locations, etc. Another benefit to this model is that it has a natural home for process-oriented code, since the Business layer is already on the application server and thus is close to the database.

 

Unfortunately history has shown that the thin client model is severely disadvantaged compared to the other two models. The first disadvantage is scalability in relationship to cost. With either of the other two models as you add more users you intrinsically add more memory and CPU to your overall system, because you are leveraging the power of the client workstation. With a thin client model all the processing is on the servers, and so client workstations add virtually no value at all – their memory and CPU are wasted. Any scalability comes from adding larger or more numerous servers rather than by adding cheaper (and already present) client workstations.

 

The other key drawback to the thin client model is the user experience. Unless you are willing to make “chatty” calls from the thin Presentation layer to the Business layer across the network on a continual basis (which is obviously absurd), the user experience will not be interactive or responsive. By definition the Business layer is on a remote server, so the user’s input can’t be validated or processed without first sending it across the network. The end result is roughly equivalent to the mainframe user experiences users had with 3270 terminals, or the experience they get on the web in many cases. Really not what we should expect from an “intelligent” client…
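
The trade-off can be shown by counting round trips. In this Python sketch `RemoteBusinessLayer` is a made-up stand-in for a Business layer on another machine; it does no real networking and simply counts how many times it would have been called across the wire.

```python
class RemoteBusinessLayer:
    """Stand-in for a Business layer on another machine; counts round trips."""

    def __init__(self):
        self.round_trips = 0

    def validate_field(self, name, value):
        self.round_trips += 1          # one network call per field
        return bool(str(value).strip())

    def validate_form(self, form):
        self.round_trips += 1          # one network call for the whole form
        return {name: bool(str(value).strip()) for name, value in form.items()}


form = {"name": "Ada", "street": "1 Main St", "city": "", "zip": "55401"}

# Chatty: feedback after every field, at the price of a round trip each time.
chatty = RemoteBusinessLayer()
chatty_result = {name: chatty.validate_field(name, value) for name, value in form.items()}

# Chunky: a single round trip, but the user hears nothing until they submit.
chunky = RemoteBusinessLayer()
chunky_result = chunky.validate_form(form)

print(chatty.round_trips, chunky.round_trips)
```

Both styles reach the same answers. The chatty version pays one network call per field, while the chunky version only gives feedback after the whole form is sent, which is the terminal-style experience described above.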

 

Of course deployment remains a potential concern in this model, because the Presentation layer must still be deployed to the client. Again, I no longer consider this a major issue, thanks to no-touch deployment and ClickOnce.

 

In summary, the thin client model is really nice for process-oriented (non-interactive) applications, but is quite inferior for interactive applications.

 

This brings us to the mobile object model. You’ll note that neither the thick client nor thin client model is optimal, because almost all applications have some interactive and some non-interactive (process-oriented) functionality. Neither of the two “purist” models really addresses both requirements effectively. This is why I am such a fan of the mobile object (or mobile agent, or distributed OO) model, as it provides a compromise solution. I find this idea so compelling that it is the basis for my books.

 

The mobile object model literally has us deploy the Business layer to both the client and application server. Given no-touch deployment and/or ClickOnce this is quite practical to achieve in .NET (and in Java interestingly enough). Coupled with .NET’s ability to pass objects across the network by value (another ability shared with Java), all the heavy lifting to make this concept work is actually handled by .NET itself, leaving us to merely enjoy the benefits.
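
For readers who want to see the shape of the idea, here is a minimal Python sketch of an object moving by value between a client and an application server. It is not CSLA .NET code. The `Order` class, the `to_wire`/`from_wire` methods, and the `application_server` function are invented for the example, and the state is serialized by hand with JSON, whereas .NET handles that step for you.

```python
import json


class Order:
    """Business object whose code is installed on both client and server."""

    def __init__(self, quantity=1, saved=False):
        self._quantity = quantity
        self._saved = saved

    def set_quantity(self, quantity):
        # The rule travels with the class, so it runs wherever the object is.
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self._quantity = quantity

    def save(self):
        # Stand-in for database work; only meaningful on the application server.
        self._saved = True

    @property
    def is_saved(self):
        return self._saved

    def to_wire(self):
        return json.dumps({"quantity": self._quantity, "saved": self._saved})

    @classmethod
    def from_wire(cls, payload):
        state = json.loads(payload)
        return cls(state["quantity"], state["saved"])


def application_server(payload):
    """Rebuilds the object from its state, runs it near the database, returns it."""
    order = Order.from_wire(payload)
    order.save()
    return order.to_wire()


client_order = Order()
client_order.set_quantity(5)          # interactive work, no network involved

# A copy goes to the server and a new copy comes back to the client.
returned = Order.from_wire(application_server(client_order.to_wire()))
print(returned.is_saved, client_order.is_saved)
```

Because the object travels by value, the client ends up with a new copy that reflects the server's work, while the original instance it sent is unchanged.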

 

The end result is that the client has the Presentation and Business layers, meaning we get all the benefits of the thick client model. The user experience is responsive and highly interactive. Also we are able to exploit the power of the client workstation, offering optimal scalability at a low cost point.

 

But where this gets really exciting is the flexibility offered. Since the Business layer also runs on the application server, we have all the benefits of the thin client model. Any process-oriented tasks can be performed by objects running on the application server, meaning that all the power of the thin client model is at our disposal as well.

 

The drawback to the mobile object approach is complexity. Unless you have a framework to handle the details of moving an object to the client and application server as needed this model can be hard to implement. However, given a framework that supports the concept the mobile object approach is no more complex than either the thick or thin client models.

 

In summary, the mobile object model is great for both interactive and non-interactive applications. I consider it a “best of both worlds” model and CSLA .NET is specifically designed to make this model comparatively easy to implement in a business application.

 

At the risk of being a bit esoteric, consider the broader possibilities of a mobile object environment. Really a client application or an application server (Enterprise Services or IIS) are merely hosts for our objects. Hosts provide resources that our objects can use. The client “host” provides access to the user resource, while a typical application server “host” provides access to the database resource. In some applications you can easily envision other hosts such as a batch processing server that provides access to a high powered CPU resource or a large memory resource.

 

Given a true mobile object environment, objects would be free to move to a host that offers the resources an object requires at any point in time. This is very akin to grid computing. In the mobile object world objects maintain both data and behavior and merely move to physical locations in order to access resources. Raw data never moves across the network (except between the Data Access and Data Storage layers), because data without context (behavior) is meaningless.

 

Of course some very large systems have been built following both the thick client and thin client models. It would be foolish to say that either is fatally flawed. But it is my opinion that neither is optimal, and that a mobile object approach is the way to go.

Saturday, July 23, 2005 11:12:25 AM (Central Standard Time, UTC-06:00)

 Thursday, July 21, 2005

I am often asked whether n-tier (where n>=3) is always the best way to go when building software.

 

Of course the answer is no. In fact, it is more likely that n-tier is not the way to go!

 

By the way, this isn’t the first time I’ve discussed this topic – you’ll find previous blog entries on this blog and an article at www.devx.com where I’ve covered much of the same material. Of course I also cover it rather a lot in my Expert VB.NET and C# Business Objects books.

 

Before proceeding further however, I need to get some terminology out of the way. There’s a huge difference between logical tiers and physical tiers. Personally I typically refer to logical tiers as layers and physical tiers as tiers to avoid confusion.

 

Logical layers are merely a way of organizing your code. Typical layers include Presentation, Business and Data – the same as the traditional 3-tier model. But when we’re talking about layers, we’re only talking about logical organization of code. In no way is it implied that these layers might run on different computers or in different processes on a single computer or even in a single process on a single computer. All we are doing is discussing a way of organizing code into a set of layers defined by specific function.

 

Physical tiers however, are only about where the code runs. Specifically, tiers are places where layers are deployed and where layers run. In other words, tiers are the physical deployment of layers.

 

Why do we layer software? Primarily to gain the benefits of logical organization and grouping of like functionality. Translated to tangible outcomes, logical layers offer reuse, easier maintenance and shorter development cycles. In the final analysis, proper layering of software reduces the cost to develop and maintain an application. Layering is almost always a wonderful thing!

 

Why do we deploy layers onto multiple tiers? Primarily to obtain a balance between performance, scalability, fault tolerance and security. While there are various other reasons for tiers, these four are the most common. The funny thing is that it is almost impossible to get optimum levels of all four attributes – which is why it is always a trade-off between them.

 

Tiers imply process and/or network boundaries. A 1-tier model has all the layers running in a single memory space (process) on a single machine. A 2-tier model has some layers running in one memory space and other layers in a different memory space. At the very least these memory spaces exist in different processes on the same computer, but more often they are on different computers. Likewise, a 3-tier model has two boundaries. In general terms, an n-tier model has n-1 boundaries.

 

Crossing a boundary is expensive. It is on the order of 1000 times slower to make a call across a process boundary on the same machine than to make the same call within the same process. If the call is made across a network it is even slower. It is very obvious then, that the more boundaries you have the slower your application will run, because each boundary has a geometric impact on performance.
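
A back-of-the-envelope calculation shows how quickly this adds up. In the Python sketch below only the factor of 1000 comes from the paragraph above. The 100 nanosecond cost of an in-process call, the network factor, and the figure of 200 calls are assumptions picked purely to make the arithmetic visible.

```python
IN_PROCESS_CALL_NS = 100        # assumed cost of one in-process call, in nanoseconds
CROSS_PROCESS_FACTOR = 1_000    # the rough figure quoted above
NETWORK_FACTOR = 10_000         # assumed; the post only says "even slower"


def total_time_ns(calls, factor=1):
    """Total time for a number of calls, each scaled by a boundary factor."""
    return calls * IN_PROCESS_CALL_NS * factor


calls = 200   # e.g. reading 200 property values while painting a screen
for label, factor in [("in-process", 1),
                      ("cross-process", CROSS_PROCESS_FACTOR),
                      ("cross-network", NETWORK_FACTOR)]:
    print(f"{label:14s} {total_time_ns(calls, factor) / 1_000_000:10.2f} ms")
```

Under these assumptions, work that is invisible in-process becomes a delay the user can feel once every call crosses a boundary, which is why fine-grained calls across tiers are so costly.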

 

Worse, boundaries add raw complexity to software design, network infrastructure, manageability and overall maintainability of a system. In short, the more tiers in an application, the more complexity there is to deal with – which directly increases the cost to build and maintain the application.

 

This is why, in general terms, tiers should be minimized. Tiers are not a good thing; they are a necessary evil required to obtain certain levels of scalability, fault tolerance or security.

 

As a good architect you should be dragged kicking and screaming into adding tiers to your system. But there really are good arguments and reasons for adding tiers, and it is important to accommodate them as appropriate.

 

The reality is that almost all systems today are at least 2-tier. Unless you are using an Access or dBase style database your Data layer is running on its own tier – typically inside of SQL Server, Oracle or DB2. So for the remainder of my discussion I’ll primarily focus on whether you should use a 2-tier or 3-tier model.

 

If you look at the CSLA .NET architecture from my Expert VB.NET and C# Business Objects books, you’ll immediately note that it has a construct called the DataPortal which is used to abstract the Data Access layer from the Presentation and Business layers. One key feature of the DataPortal is that it allows the Data Access layer to run in-process with the business layer, or in a separate process (or machine) all based on a configuration switch. It was specifically designed to allow an application to switch between a 2-tier or 3-tier model as a configuration option – with no changes required to the actual application code.
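
The idea of a configuration switch can be sketched in a few lines of Python. To be clear, this is not the CSLA .NET DataPortal or its API. The `LocalProxy` and `RemoteProxy` classes, the `make_portal` function, and the `data_portal` setting are hypothetical names chosen for the example, and the "remote" path is simulated with an ordinary function call.

```python
class LocalProxy:
    """Runs the data access code in the caller's own process (2-tier)."""

    def fetch(self, data_access, key):
        return data_access(key)


class RemoteProxy:
    """Stands in for a call to an application server (3-tier)."""

    def __init__(self, server):
        self._server = server

    def fetch(self, data_access, key):
        # A real implementation would serialize the request and send it over
        # the network; here the "server" is just another callable.
        return self._server(data_access, key)


def make_portal(config):
    """Pick the proxy from configuration; application code never changes."""
    if config.get("data_portal") == "remote":
        return RemoteProxy(config["server"])
    return LocalProxy()


def load_customer(key):            # Data Access layer code
    return {"id": key, "name": "Acme"}


def fake_app_server(data_access, key):
    result = data_access(key)
    result["served_by"] = "app-server"
    return result


for config in ({"data_portal": "local"},
               {"data_portal": "remote", "server": fake_app_server}):
    portal = make_portal(config)
    print(portal.fetch(load_customer, 7))   # identical application code
```

The calling code is the same in both passes through the loop. Only the configuration decides whether the Data Access layer runs in-process or behind a boundary.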

 

But even so, the question remains whether to configure an application for 2 or 3 tiers.

 

Ultimately this question can only be answered by doing a cost-benefit analysis for your particular environment. You need to weigh the additional complexity and cost of a 3-tier deployment against the benefits it might bring in terms of scalability, fault tolerance or security.

 

Scalability flows primarily from the ability to get database connection pooling. In CSLA .NET the Data Access layer is entirely responsible for all interaction with the database. This means it opens and closes all database connections. If the Data Access layer for all users is running on a single machine, then all database connections for all users can be pooled. (This does assume, of course, that all users employ the same database connection string, including the same database user id – that’s a prerequisite for connection pooling in the first place.)
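
The prerequisite about identical connection strings is easy to see with a toy pool. The Python sketch below is not how ADO.NET implements pooling. `PoolManager` and the connection string format are invented for the example, and the only behavior it models is that an idle connection is reused when the connection string matches exactly.

```python
from collections import defaultdict


class PoolManager:
    """Toy pool: idle connections are reused only for an identical connection string."""

    def __init__(self):
        self._idle = defaultdict(list)
        self.opened = 0

    def acquire(self, connection_string):
        idle = self._idle[connection_string]
        if idle:
            return idle.pop()
        self.opened += 1                      # had to open a new physical connection
        return (connection_string, self.opened)

    def release(self, connection):
        self._idle[connection[0]].append(connection)


def run_requests(pool, connection_strings):
    """Serve one request per connection string and report connections opened."""
    for cs in connection_strings:
        conn = pool.acquire(cs)
        # ... run a query ...
        pool.release(conn)
    return pool.opened


shared = run_requests(PoolManager(), ["db=sales;uid=app"] * 50)
per_user = run_requests(PoolManager(), [f"db=sales;uid=user{i}" for i in range(50)])
print(shared, per_user)
```

With one shared user id, fifty sequential requests reuse a single physical connection. With fifty distinct user ids the pool cannot help, and every request opens its own.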

 

The scalability proposition is quite different for web and Windows presentation layers.

 

In a web presentation the Presentation and Business layers are already running on a shared server (or server farm). So if the Data Access layer also runs on the same machine database connection pooling is automatic. In other words, the web server is an implicit application server, so there’s really no need to have a separate application server just to get scalability in a web setting.

 

In a Windows presentation the Presentation and Business layers (at least with CSLA .NET) run on the client workstation, taking full advantage of the memory and CPU power available on those machines. If the Data Access layer is also deployed to the client workstations then there’s no real database connection pooling, since each workstation connects to the database directly. By employing an application server to run the Data Access layer all workstations offload that behavior to a central machine where database connection pooling is possible.

 

The big question with Windows applications is at what point to use an application server to gain scalability. Obviously there’s no objective answer, since it depends on the IO load of the application, pre-existing load on the database server and so forth. In other words it is very dependent on your particular environment and application. This is why the DataPortal concept is so powerful, because it allows you to deploy your application using a 2-tier model at first, and then switch to a 3-tier model later if needed.

 

There’s also the possibility that your Windows application will be deployed to a Terminal Services or Citrix server rather than to actual workstations. Obviously this approach totally eliminates the massive scalability benefits of utilizing the memory and CPU of each user’s workstation, but does have the upside of reducing deployment cost and complexity. I am not an expert on either server environment, but it is my understanding that each user session has its own database connection pool on the server, thus acting the same as if each user has their own separate workstation. If this is actually the case, then an application server would provide a benefit through database connection pooling. However, if I’m wrong and all user sessions share database connections across the entire Terminal Services or Citrix server then having an application server would offer no more scalability benefit here than it does in a web application (which is to say virtually none).

 

Fault tolerance is a bit more complex than scalability. Achieving real fault tolerance requires examination of all failure points that exist between the user and the database – and of course the database itself. And if you want to be complete, you must also consider the user to be a failure point, especially when dealing with workflow, process-oriented or service-oriented systems.

 

In most cases adding an application server to either a web or Windows environment doesn’t improve fault tolerance. Rather it merely makes it more expensive because you have to make the application server fault tolerant along with the database server, the intervening network infrastructure and any client hardware. In other words, fault tolerance is often less expensive in a 2-tier model than in a 3-tier model.

 

Security is also a complex topic. For many organizations however, security often comes down to protecting access to the database. From a software perspective this means restricting the code that interacts with the database and providing strict controls over the database connection strings or other database authentication mechanisms.

 

Security is a case where 3-tier can be beneficial. By putting the Data Access layer onto its own application server tier we isolate all code that interacts with the database onto a central machine (or server farm). More importantly, only that application server needs to have the database connection string or the authentication token needed to access the database server. No web server or Windows workstation needs the keys to the database, which can help improve the overall security of your application.

 

Of course we must always remember that switching from 2-tier to 3-tier decreases performance and increases complexity (cost). So any benefits from scalability or security must be sufficient to outweigh these costs. It all comes down to a cost-benefit analysis.

 

Thursday, July 21, 2005 3:51:17 PM (Central Standard Time, UTC-06:00)

 Monday, April 18, 2005

Indigo is Microsoft’s code name for the technology that will bring together the functionality in today’s .NET Remoting, Enterprise Services, Web services (including WSE) and MSMQ. Of course knowing what it is doesn’t necessarily tell us whether it is cool, compelling and exciting … or rather boring.

 

Ultimately beauty is in the eye of the beholder. Certainly the Indigo team feels a great deal of pride in their work and they paint this as a very big and compelling technology.

 

Many technology experts I’ve talked to outside of Microsoft are less convinced that it is worth getting all excited.

 

Personally, I must confess that I find Indigo to be a bit frustrating. While it should provide some absolutely critical benefits, in my view it is merely laying the groundwork for the potential of something actually exciting to follow a few years later.

 

Why do I say this?

 

Well, consider what Indigo is again. It is a technology that brings together a set of existing technologies. It provides a unified API model on top of a bunch of concepts and tools we already have. To put it another way, it lets us do what we can already do, but in a slightly more standardized manner.

 

If you are a WSE user, Indigo will save you tons of code. But that’s because WSE is experimental stuff and isn’t refined to the degree Remoting, Enterprise Services or Web services are. If you are using any of those technologies, Indigo won’t save you much (if any) code – it will just subtly alter the way you do the things you already do.

 

Looking at it this way, it doesn’t sound all that compelling, does it?

 

But consider this. Today’s technologies are a mess. We have at least five different technologies for distributed communication (Remoting, ES, Web services, MSMQ and WSE). Each technology shines in different ways, so each is appropriate in different scenarios. This means that to be a competent .NET architect/designer you must know all five reasonably well. You need to know the strengths and weaknesses of each, and you must know how easy or hard they are to use and to potentially extend.

 

Worse, you can’t expect to easily switch between them. Several of these options are mutually exclusive.

 

But the final straw (in my mind) is this: the technology you pick locks you into a single architectural world-view. If you pick Web services or WSE you are accepting the SOA world view. Sure you can hack around that to do n-tier or client/server, but it is ugly and dangerous. Similarly, if you pick Enterprise Services you get a nice set of client/server functionality, but you lose a lot of flexibility. And so forth.

 

Since the architectural decisions are so directly and irrevocably tied to the technology, we can’t actually discuss architecture. We are limited to discussing our systems in terms of the technology itself, rather than the architectural concepts and goals we’re trying to achieve. And that is very sad.

 

By merging these technologies into a single API, Indigo may allow us to elevate the level of dialog. Rather than having inane debates between Web services and Remoting, we can have intelligent discussions about the pros and cons of n-tier vs SOA. We can apply rational thought as to how each distributed architecture concept applies to the various parts of our application.

 

We might even find that some parts of our application are n-tier, while others require SOA concepts. Due to the unified API, Indigo should allow us to actually do both where appropriate, without irrational debates over protocol, since Indigo natively supports concepts for both n-tier and SOA.

 

Now this is compelling!

 

As compelling as it is to think that we can start having more intelligent and productive architectural discussions, that isn’t the whole of it. I am hopeful that Indigo represents the groundwork for greater things.

 

There are a lot of very hard problems to solve in distributed computing. Unfortunately our underlying communications protocols never seem to stay in place long enough for anyone to really address the more interesting problems. Instead, for many years now we’ve just watched as vendors reinvent the concept of remote procedure calls over and over again: RPC, IIOP, DCOM, RMI, Remoting, Web services, Indigo.

 

That is frustrating. It is frustrating because we never really move beyond RPC. While there’s no doubt that Indigo is much easier to use and more clear than any previous RPC scheme, it is also quite true that Indigo merely lets us do what we could already do.

 

What I’m hoping (perhaps foolishly) is that Indigo will be the end. That we’ll finally have an RPC technology that is stable and flexible enough that it won’t need to be replaced so rapidly. And being stable and flexible, it will allow the pursuit of solutions to the harder problems.

 

What are those problems? They are many, and they include semantic meaning of messages and data. They include distributed synchronization primitives and concepts. They include standardization and simplification of background processing – making it as easy and natural as synchronous processing is today. They include identity and security issues, management of long-running processes, simplification of compensating transactions and many other issues.

 

Maybe Indigo represents the platform on which solutions to these and other problems can finally be built. Perhaps in another 5 years we can look back and say that Indigo was the turning point that finally allowed us to really make distributed computing a first-class concept.

Monday, April 18, 2005 10:13:06 PM (Central Standard Time, UTC-06:00)

 Thursday, March 31, 2005

I just spent three days getting inundated with the details of Indigo as it stands today (in its pre-beta state). For those that don’t know, Indigo is the combined next generation of .NET Remoting, Enterprise Services, Web services (asmx and WSE) and MSMQ all rolled into one big ball.

 

I also had some conversations about Avalon, though not nearly in so much detail. For those that don’t know, Avalon is the next generation display technology that uses full 3D rendering and should allow us to (finally) escape the clutches of GDI. Avalon is the display technology related to XAML. XAML is an XML markup language to describe Avalon interfaces.

 

My interest in these technologies spans quite a range of concerns, but top on my list is how they might impact my Business Objects books and the related CSLA .NET framework.

 

It turns out that the news is pretty good. Seriously good actually. In fact, it looks like people using CSLA .NET today are going to be very happy over the next few years.

 

Within the context of CSLA .NET, Indigo is essentially a drop-in replacement for Remoting. I will have to change the DataPortal to use Indigo, but that change should have no impact on an application’s UI, business logic or data access code. In other words (cross your fingers), a business application based on CSLA .NET should move to Indigo with essentially no code changes.

 

[Disclaimer: Indigo isn’t even in beta, so anything and everything could change. My statements here are based on what I’ve seen and heard, and thus may change over time as well.]

 

One of my primary goals for CSLA .NET 2.0 is to alter the DataPortal to make it easier to adapt it to various transports. In the short term this means Remoting, asmx, WSE and Enterprise Services (DCOM). But it also means I’ll be able to add Indigo support with relative ease once Indigo actually exists.

 

Avalon is a different story. Avalon is a new UI technology, which means that moving to Avalon means tossing out your existing UI and building a new one. But if you are using Windows Forms today, with CSLA .NET business objects for your logic and databinding to connect the two together, your life will be better than most. It appears that Avalon will also support databinding against objects just like (hopefully better than) Windows Forms.

 

Since a well-written CSLA-based Windows Forms application doesn’t have any business logic (not even validation) in the UI itself, switching to Avalon should merely be a rip-and-replace of the UI, with little to no impact on the underlying business or data access layers. I keep telling people that the “UI is expendable”, and here’s the proof :-)

 

I just thought I’d share these observations. Indigo and Avalon (together under the label of WinFX) won’t show up for quite a long time, so none of this is of immediate interest. Still, it is nice to know that when they do show up, CSLA .NET will have helped people move their applications more easily to the new technologies.

Thursday, March 31, 2005 11:10:08 PM (Central Standard Time, UTC-06:00)

 Friday, February 25, 2005

In this article, Bill Vaughn voices his view that DataAdapter.Fill should typically be a developer's first choice for getting data rather than using a DataReader directly.

 

In my VB.NET and C# Business Objects books I primarily use the DataReader to populate objects in the DataPortal_Fetch methods. You might infer then, that Bill and I disagree.

 

While it is true that Bill and I often get into some really fun debates, I don't think we disagree here.

 

Bill's article seems to be focused on scenarios where UI developers or non-OO business developers use a DataReader to get data. In such cases I agree that the DataAdapter/DataTable approach is typically far preferable. Certainly there will be times when a DataReader makes more sense there, but usually the DataAdapter is the way to go.

 

In the OO scenarios like CSLA .NET the discussion gets a bit more complex. In my books I discussed why the DataReader is a good option - primarily because it avoids loading the data into memory just to copy that data into our object variables. For object persistence the DataReader is the fastest option.

 

Does that mean it is the best option in some absolute sense?

 

Not necessarily. Most applications aren't performance-bound. In other words, if we lost a few milliseconds of performance it is likely that our users would never notice. For most of us, we could trade a little performance to gain maintainability and be better off.

 

As a book author I am often stuck between two difficult choices. If I show the more maintainable approach the performance Nazis will jump on me, while if I show the more performant option the maintainability Nazis shout loudly.

 

So in the existing books I opted for performance at the cost of maintainability. Which is nice if performance is your primary requirement, and I don’t regret the choice I made.

 

(It is worth noting that subsequent to publication of the books, CSLA .NET has been enhanced, including enhancing the SafeDataReader to accept column names rather than ordinal positions. This is far superior for maintenance, with a very modest cost to performance.)
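As a rough illustration of the ordinal versus column-name trade-off, the sketch below reads the same value both ways. GetStringByName is a hypothetical helper written for this example, not the actual SafeDataReader code, and the in-memory DataTable only stands in for a real query result so the example needs no database.

```csharp
using System;
using System.Data;

public static class DataRecordNameExtensions
{
    // Hypothetical helper: look the column up by name, then read it by ordinal.
    // Returning an empty string for a database null is an assumption made here.
    public static string GetStringByName(this IDataRecord record, string columnName)
    {
        int ordinal = record.GetOrdinal(columnName);
        return record.IsDBNull(ordinal) ? string.Empty : record.GetString(ordinal);
    }
}

public static class Program
{
    public static void Main()
    {
        // In-memory stand-in for a query result.
        var table = new DataTable("Customers");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add(1, "Ada");

        using (IDataReader reader = table.CreateDataReader())
        {
            while (reader.Read())
            {
                // By ordinal: fastest, but breaks silently if the column order changes.
                string byOrdinal = reader.GetString(1);

                // By name: one extra lookup per call, but survives reordered columns.
                string byName = reader.GetStringByName("Name");

                Console.WriteLine("{0} / {1}", byOrdinal, byName);
            }
        }
    }
}
```

The name lookup is the modest performance cost, and code that no longer depends on column order is the maintenance gain.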

 

Given some of the enhancements to the strongly typed DataAdapter concept (the TableAdapter) that we’ll see in .NET 2.0, it is very likely that I’ll switch and start using TableAdapter objects in the DataPortal_xyz methods.

 

While there’s a performance hit, the code savings look to be very substantial. Besides, it is my opportunity to make the maintenance-focused people happy for a while and let the performance nuts send me nasty emails. Of course nothing I do will prevent the continued use of the DataReader for those who really need the performance.

Friday, February 25, 2005 2:28:57 PM (Central Standard Time, UTC-06:00)

 Wednesday, February 23, 2005
Rich put together this valuable list of tools - thank you!!
Wednesday, February 23, 2005 2:51:21 PM (Central Standard Time, UTC-06:00)

Sahil has started a discussion on a change to the BinaryFormatter behavior in .NET 2.0.

 

Serialization is just the process of taking complex data and converting it into a single byte stream. Motivation is a separate issue :)

 

There are certainly different reasons to "serialize" an object.

 

One is what I do in CSLA .NET, which is to truly clone an object graph (not just an object, but a whole graph), either in memory or across the network. This is a valid reason, and has a set of constraints that a serializer must meet to be useful. This is what the BinaryFormatter is all about today.

 

Another is to easily convert some or all of an object’s data into actual data. To externalize the object’s state. This is what the XmlSerializer is all about today. The purpose isn’t to replicate a .NET type, it is to convert that type into a simple data representation.

 

Note that in both cases the formatter/serializer attempts to serialize the entire object graph, not just a single object. This is because the “state” of an object is really the state of the object graph. That implies that all references from your object to any other objects are followed, because they collectively constitute the object graph. If you want to “prune” the graph, you mark references such that the formatter/serializer doesn’t follow them.

 

In the case of the BinaryFormatter, it does this by working with each object’s fields, and it follows references to other objects. An event inside an object is just another field (though the actual backing field is typically hidden). This backing field is just a delegate reference, which is just a type of object reference – and thus that object reference is followed like any other.

 

In the case of the XmlSerializer, only the public fields and read/write properties are examined. But still, if one of them is a reference the serializer attempts to follow that reference. The event/delegate issue doesn’t exist here only because the backing field for events isn’t a public field. Make it a public field and you’ll have issues.
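A small sketch of that XmlSerializer behavior follows. The Invoice class and its members are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type used only for this sketch.
public class Invoice
{
    // Private field: not examined by the XmlSerializer.
    private decimal _cost = 12.5m;

    // Public read/write property: serialized.
    public string Number { get; set; }

    // Public read-only property: skipped, because there is no setter.
    public decimal Cost { get { return _cost; } }
}

public static class Program
{
    public static void Main()
    {
        var invoice = new Invoice { Number = "INV-1" };
        var serializer = new XmlSerializer(typeof(Invoice));

        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, invoice);
            // The XML should contain a Number element and no Cost element.
            Console.WriteLine(writer.ToString());
        }
    }
}
```

This is the "externalize the state" view of serialization: only what the type exposes publicly as read/write data ends up in the output.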

 

In .NET 1.x the BinaryFormatter attempts to follow all references unless the field is marked as [NonSerialized]. In C# the field: target provides (what I consider to be) a hack to apply this attribute to the backing field. C# also has a better approach, which is to use a block structure to declare the event so you get to manually declare the backing field and can apply the attribute yourself.
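The two C# approaches can be sketched roughly as follows. The class and member names are invented for this example; the attribute itself is spelled NonSerialized in the framework. The sketch only shows the declarations and raises the events. It does not run a formatter.

```csharp
using System;

// Approach 1: the field: target applies the attribute to the hidden backing field.
[Serializable]
public class CustomerWithFieldTarget
{
    private string _name;

    [field: NonSerialized]
    public event EventHandler NameChanged;

    public void Rename(string name)
    {
        _name = name;
        EventHandler handler = NameChanged;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}

// Approach 2: block structure, with an explicit backing field we control.
[Serializable]
public class CustomerWithEventBlock
{
    private string _name;

    [NonSerialized]
    private EventHandler _nameChanged;

    public event EventHandler NameChanged
    {
        add { _nameChanged = (EventHandler)Delegate.Combine(_nameChanged, value); }
        remove { _nameChanged = (EventHandler)Delegate.Remove(_nameChanged, value); }
    }

    public void Rename(string name)
    {
        _name = name;
        EventHandler handler = _nameChanged;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}

public static class Program
{
    public static void Main()
    {
        var first = new CustomerWithFieldTarget();
        first.NameChanged += (sender, e) => Console.WriteLine("field: target handler ran");
        first.Rename("Ada");

        var second = new CustomerWithEventBlock();
        second.NameChanged += (sender, e) => Console.WriteLine("event block handler ran");
        second.Rename("Grace");
    }
}
```

In both cases the delegate field is excluded from serialization, so a formatter that walks the object's fields has no reference to follow into the subscribers.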

 

The trick is that the default behavior is for the backing field to be serializable, and things like Windows Forms or other nonserializable objects might subscribe to the object’s events. When the BinaryFormatter follows the references to those objects it throws an exception (as it should).

 

I made a big stink about this when I wrote my VB Business Objects book, because I discovered this issue and there was no obvious solution: VB.NET has neither the field: target hack nor the block structure declaration for events. I was forced to implement a C# base class to safely declare my events.

 

I spent a lot of time talking to both the VB and CLR (later Indigo) teams about this issue.

 

The VB team provided a solution by supporting the block structure for declaring events, thus allowing us to have control over how the backing field is serialized. This is a very nice solution as I see it.

 

I am not familiar with the change Sahil is seeing in .NET 2.0. But that is probably because I’ve been using the block structure event declarations as shown in my C# and VB 2005 links above, so I’m manually ensuring that only serializable event handlers are being traced by the BinaryFormatter.

 

But I should point out that the C# code in the example above is for .NET 1.1 and works today just as it does in .NET 2.0.

Wednesday, February 23, 2005 12:06:27 PM (Central Standard Time, UTC-06:00)

 Friday, February 18, 2005

In a recent online discussion the question came up “If ‘the middle tier’ has no knowledge about the actual data it's transporting, then what value is it adding?”

 

The answer: database connection pooling.

 

Pure and simple, in most rich client scenarios the only reason for a "middle tier" is to pool database connections. And the benefit can be tremendous.

 

Consider 200 concurrent clients all connecting to your database using conventional coding. They'll have at least 200 connections open, probably more. This is especially true since each client does its own "connection pooling", so typically a client will never close its connection once established for the day.

 

Then consider 200 clients going through a middle tier "app server" that does nothing but ferry the data between the clients and database. But it has the code to open the connections. Now those 200 clients might use just 3-4 connections to the database rather than 200, because they are all pooled on the server.
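The effect can be simulated with a toy pool. Everything in this sketch is hypothetical: FakeConnection stands in for a real database connection, the pool size of 4 and the 5 ms "query" are arbitrary, and real ADO.NET pooling is handled by the data provider rather than by code like this.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a real database connection.
public sealed class FakeConnection
{
    public int Id { get; }
    public FakeConnection(int id) { Id = id; }
}

// Toy pool: at most maxSize connections are ever opened and they are reused.
public sealed class FakeConnectionPool
{
    private readonly ConcurrentBag<FakeConnection> _idle = new ConcurrentBag<FakeConnection>();
    private readonly SemaphoreSlim _slots;
    private int _created;

    public FakeConnectionPool(int maxSize)
    {
        _slots = new SemaphoreSlim(maxSize, maxSize);
    }

    public int ConnectionsCreated { get { return Volatile.Read(ref _created); } }

    public async Task<T> UseAsync<T>(Func<FakeConnection, Task<T>> work)
    {
        await _slots.WaitAsync();              // wait for a free slot
        FakeConnection conn;
        if (!_idle.TryTake(out conn))          // reuse an idle connection if one exists
            conn = new FakeConnection(Interlocked.Increment(ref _created));
        try
        {
            return await work(conn);
        }
        finally
        {
            _idle.Add(conn);                   // return the connection before freeing the slot
            _slots.Release();
        }
    }
}

public static class Program
{
    public static async Task Main()
    {
        var pool = new FakeConnectionPool(4);
        var requests = new Task<int>[200];

        for (int i = 0; i < requests.Length; i++)
        {
            int clientId = i;
            requests[i] = pool.UseAsync(async conn =>
            {
                await Task.Delay(5);           // pretend to run a short query
                return clientId;
            });
        }

        await Task.WhenAll(requests);
        Console.WriteLine("Clients served: {0}, connections opened: {1}",
            requests.Length, pool.ConnectionsCreated);
    }
}
```

All 200 simulated clients are served, while the number of connections opened never exceeds the pool size. The price is that requests queue for a slot, which is the performance hit traded for scalability.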

 

Was there a performance hit? Absolutely. Was there a scalability gain? Absolutely. Is it more expensive and harder to build/maintain? Absolutely.

 

This middle tier stuff is not a panacea. In fact its cost is typically higher than the benefit, because most applications don't actually have enough concurrent users to make it worth the complexity. But people are enamored of the idea of "n-tier", thinking it requires an actual physical tier...

 

I blame it on Windows DNA and those stupid graphic representations. They totally muddied the waters in people's understanding of the difference between n-layer (logical separation) and n-tier (physical separation).

 

People DO want n-layer, because that provides reuse, maintainability and overall lower costs. Logical separation of UI, business logic, data access and data storage is almost always of tremendous benefit.

 

People MIGHT want n-tier if they need the scalability or security it can offer, and if those benefits outweigh the high cost of building a physically distributed system. But the cost/benefit isn’t there as often as people think, so a lot of people build physical n-tier systems for no good reason. They waste time and money for no real gain. This is sad, and is something we should all fight against.

 

I make my living writing books and articles and speaking about building distributed systems. And my primary message is: just say NO!

 

You should be forced into implementing physical tiers kicking and screaming. There should be substantial justification for using tiers, and those justifications should be questioned at every step along the way.

 

At the same time, you should encourage the use of logical layers at all times. There should be substantial justification for not using layers, and any argument against layering of software should be viewed with extreme skepticism.

 

Layering is almost always good, tiers are usually bad.

Friday, February 18, 2005 12:11:26 PM (Central Standard Time, UTC-06:00)