Monday, February 21, 2005
awk
A few days ago I posted a list of programming languages in which I've been competent over the years. A few other people chimed in with their own lists, which I found very interesting.
 
Thomas Williams pointed out that XSLT was important in his history, which got me thinking about an oversight in my own list.
 
I neglected to mention awk (more specifically gawk on the VAX). awk is a Unix text processing language, and gawk is the GNU Project version, available for many platforms. I first learned about awk when taking a graduate-level data structures class at the University of Minnesota, where we used Unix boxes of some flavor or other. It was so useful that I found the VAX gawk implementation and installed it on our VAX at work (this was around 1990 or so).
 
People rave about things like Perl or XSLT, but I gotta say that for pure text processing it is hard to beat awk. If XSLT had the power of awk it would have swept the web development world in ways we can't even imagine. I know that XSLT is widely used in the web world, but if you've used XSLT and haven't used awk you just don't know how crippled XSLT really is.
 
The thing is, XSLT has the same mindset as awk. An awk program is divided into blocks, and each block is triggered by a regular expression match. Each input line is evaluated against the regular expression of every block, and every block whose regular expression matches the input line is executed. There is no linear, event-driven or OO concept involved. It is the same as XSLT in this regard.
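To make the pattern-action model concrete, here is a minimal sketch in Python. Python is my choice for illustration; in awk itself this would be two blocks, `/ERROR/ { ... }` and `/disk/ { ... }`. The `run_rules` helper and the sample lines are hypothetical. Each input line is tested against every rule, and every rule whose pattern matches fires, in program order.

```python
import re
from typing import Callable, Iterable

# A rule pairs a regular expression with an action, like an awk
# "pattern { action }" block.
Rule = tuple[re.Pattern, Callable[[str], None]]


def run_rules(lines: Iterable[str], rules: list[Rule]) -> None:
    """Test every input line against every rule; run each rule that matches."""
    for line in lines:
        for pattern, action in rules:
            if pattern.search(line):
                action(line)


# Usage: collect lines containing "ERROR" and count lines mentioning "disk".
errors: list[str] = []
counts = {"disk": 0}


def count_disk(line: str) -> None:
    counts["disk"] += 1


rules: list[Rule] = [
    (re.compile(r"ERROR"), errors.append),
    (re.compile(r"disk"), count_disk),
]
run_rules(["ERROR disk full", "all ok", "disk at 80%"], rules)
print(errors, counts["disk"])  # ['ERROR disk full'] 2
```

Note that the first line fires both rules. Unlike a switch statement, nothing stops evaluation after the first match.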
 
What makes awk amazing is that it is a complete language. It has variables, arrays, looping structures, conditionals and so forth. The input text is automatically parsed into easy-to-manipulate chunks based on your parsing choices. This means that inside one of these blocks, you can do virtually anything you desire. So within a block, triggered due to a regular expression match, you can use a complete programming language that is entirely geared toward text manipulation to act on a pre-parsed line of input.
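The pre-parsed fields plus awk's associative arrays are what make one-liners like `awk '{ total[$1] += $3 } END { for (k in total) print k, total[k] }'` so handy. Here is a rough Python sketch of what that program does. The `sum_by_key` name, the sample records and the default field positions are my own, for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional


def sum_by_key(lines: Iterable[str], key_field: int = 1, value_field: int = 3,
               sep: Optional[str] = None) -> Dict[str, float]:
    """Rough equivalent of awk's '{ total[$1] += $3 }'.

    Fields are 1-based like awk's $1, $2, ...; sep=None splits on runs of
    whitespace, similar to awk's default field separator. Unlike awk, which
    treats a missing field as an empty string (0 in arithmetic), this sketch
    skips lines that are too short.
    """
    totals: Dict[str, float] = defaultdict(float)
    for line in lines:
        fields = line.split(sep)
        if len(fields) >= max(key_field, value_field):
            totals[fields[key_field - 1]] += float(fields[value_field - 1])
    return dict(totals)


records = ["alice 2005 10", "bob 2005 5", "alice 2004 2.5"]
print(sum_by_key(records))  # {'alice': 12.5, 'bob': 5.0}
```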
 
To this day I wonder why there isn't either a variant of XSLT that can do what awk does, or a variant of awk that parses XML documents like XSLT. Perhaps the world just isn't ready for that kind of power?
Monday, February 21, 2005 9:29:49 AM (Central Standard Time, UTC-06:00)
 Friday, February 18, 2005

In a recent online discussion the question came up “If ‘the middle tier’ has no knowledge about the actual data it's transporting, then what value is it adding?”

 

The answer: database connection pooling.

 

Pure and simple, in most rich client scenarios the only reason for a "middle tier" is to pool database connections. And the benefit can be tremendous.

 

Consider 200 concurrent clients all connecting to your database using conventional coding. They'll have at least 200 connections open, probably more. This is especially true since each client does its own "connection pooling", so typically a client will never close its connection once established for the day.

 

Then consider 200 clients going through a middle tier "app server" that does nothing but ferry the data between the clients and the database. The app server, rather than each client, holds the code that opens the connections. Now those 200 clients might use just 3-4 connections to the database rather than 200, because the connections are all pooled on the server.
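Here is a minimal Python sketch of the pooling effect. It assumes a hypothetical `FakeConnection` in place of a real database driver and a pool capped at four connections. Two hundred simulated client requests run concurrently through the "app server" function, and the pool still never opens more than four connections.

```python
import queue
import threading


class FakeConnection:
    """Hypothetical stand-in for a real database connection."""

    def query(self, text: str) -> str:
        return f"rows for {text}"


class ConnectionPool:
    """Hands out at most max_size connections, shared by all callers."""

    def __init__(self, max_size: int) -> None:
        self._idle = queue.Queue()
        self._max_size = max_size
        self._lock = threading.Lock()
        self.created = 0

    def acquire(self) -> FakeConnection:
        try:
            return self._idle.get_nowait()  # reuse an idle connection
        except queue.Empty:
            pass
        with self._lock:
            if self.created < self._max_size:
                self.created += 1
                return FakeConnection()  # open a new one, up to the cap
        return self._idle.get()  # cap reached: block until one is released

    def release(self, conn: FakeConnection) -> None:
        self._idle.put(conn)


pool = ConnectionPool(max_size=4)


def handle_client_request(client_id: int) -> str:
    # The app server, not the client, owns and opens the connections.
    conn = pool.acquire()
    try:
        return conn.query(f"client {client_id}")
    finally:
        pool.release(conn)


threads = [threading.Thread(target=handle_client_request, args=(i,))
           for i in range(200)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(pool.created)  # at most 4, not 200
```

The trade-off from the post shows up here too. Once the cap is reached, a request may have to wait for a connection. You give up some latency to protect the database.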

 

Was there a performance hit? Absolutely. Was there a scalability gain? Absolutely. Is it more expensive and harder to build/maintain? Absolutely.

 

This middle tier stuff is not a panacea. In fact its cost is typically higher than the benefit, because most applications don't actually have enough concurrent users to make it worth the complexity. But people are enamored of the idea of "n-tier", thinking it requires an actual physical tier...

 

I blame it on Windows DNA and those stupid graphic representations. They totally muddied the waters in people's understanding of the difference between n-layer (logical separation) and n-tier (physical separation).

 

People DO want n-layer, because that provides reuse, maintainability and overall lower costs. Logical separation of UI, business logic, data access and data storage is almost always of tremendous benefit.

 

People MIGHT want n-tier if they need the scalability or security it can offer, and if those benefits outweigh the high cost of building a physically distributed system. But the cost/benefit isn’t there as often as people think, so a lot of people build physical n-tier systems for no good reason. They waste time and money for no real gain. This is sad, and is something we should all fight against.

 

I make my living writing books and articles and speaking about building distributed systems. And my primary message is just say NO!

 

You should be forced into implementing physical tiers kicking and screaming. There should be substantial justification for using tiers, and those justifications should be questioned at every step along the way.

 

At the same time, you should encourage the use of logical layers at all times. There should be substantial justification for not using layers, and any argument against layering of software should be viewed with extreme skepticism.

 

Layering is almost always good, tiers are usually bad.

Friday, February 18, 2005 12:11:26 PM (Central Standard Time, UTC-06:00)
I was just IMing with a friend. He's working with a client that has an interesting IT staff. The person he's working with, for instance, recently shut down a main server for maintenance - in the middle of the day, without warning the active users.
 
This is why we need fault-tolerant, stateless server clusters. Not to stop downtime from accidents or hardware failure, but rather to overcome the limitations of IT staffing.
Friday, February 18, 2005 9:11:51 AM (Central Standard Time, UTC-06:00)
 Tuesday, February 15, 2005

So there’s some news around Internet Explorer. Yeah, that browser that everyone uses, but which hasn’t changed for years.

 

First and most important, there was a vulnerability – a nasty one – in IE that got fixed in the most recent round of patches. If you haven’t installed them you’d better do it quickly. This vulnerability is very easy to exploit! To see if you are vulnerable you can go here.

 

Second, a fellow RD put me onto this IE-based browser called Avant Browser. It adds a ton of Firefox-like features to IE, including tabbed browsing, integrated searching and more. And it is freeware – no ads, no spyware, no catch that we can find. I’ve been using it as my primary browser for a couple days now and no longer yearn for Firefox at all.

 

Finally, Microsoft has decided that they really need to do something about or with IE, so they are coming out with IE 7.0 sometime in the future. Here’s the Microsoft press announcement, and here’s an article on the topic already.

 

While I do think that Microsoft needs to do an IE upgrade, this is a double-edged sword for them – and for those of us who prefer rich clients.

 

Back around 2000, before the dot-bomb, there was an emerging debate about whether the browser should continue to be a glorified terminal or should become a programming platform. The discussion was rendered largely moot by the dot-bomb and the Bush-era recession, but Firefox and a new IE will likely rekindle the debate.

 

I personally don’t see the “browser as a programming platform” being a good thing. Browsers were designed for document viewing. They’ve already been hacked nearly to death to enable the kinds of web apps we have today. Just think how deeply they’ll need to be hacked to enable real programming capabilities comparable to Windows or KDE. Such a backwards way of getting a programming platform is very unlikely to result in anything good.

 

That said, suppose we were to start from scratch and design a real programming platform that supported rich GUI interactions, client-side logic, meaningful state management and access to client-side devices like printers, scanners, etc. Well, then we’d have Windows, or at least something quite close to it.

 

Sure, it would offer a way to break from the past. It would mean all that legacy code could go away. But it would also mean that all our existing software would be stuck. The odds of such an idea going anywhere are comparable to BeOS taking over the planet.

 

So the browser will never become a new platform. At best it will become the ultimate in chewing gum and baling twine platforms. What a nightmare!

 

The only way out I can see is a browser that directly embeds .NET or the JDK, and provides programmers in those virtual machines access to a decent document object model akin to what Microsoft is creating in Avalon or XAML. But there too, we’re just recreating Avalon itself inside a browser rather than in Windows itself. Why would we want to be restricted to some arbitrary browser window when we can have the whole OS experience?

 

So in the end I see little hope for the browser-as-a-platform concept – but I am sure there’ll be people who do see it as a good thing and who see Firefox and an IE upgrade as a way to rid themselves of traditional rich clients… Such is life.

Tuesday, February 15, 2005 1:09:55 PM (Central Standard Time, UTC-06:00)
 Monday, February 14, 2005

There’s a thread on the CSLA .NET discussion forum about possible differences between the VB and C# versions of CSLA .NET. I started to answer the thread, then got on a roll, so it became a blog entry :-)

 

I strive to keep the two versions of CSLA .NET in sync within a reasonable time window. Everything after version 1.0 (the version in my books) is essentially my donation to the community. What I get out of it is not wealth, but rather is a lot of very interesting and useful feedback from the vibrant CSLA .NET community. I'm able to try out some of the most interesting (to me) ideas by releasing new and updated versions of the code. It is a learning opportunity.

 

The fact that I have to do every mod twice is a serious pain and does reduce the fun, but I think it is worth the pain because it makes the end result more useful for everyone.

 

I do most of my first-run coding in VB, because I prefer it. Simple personal preference. I've done some first-run coding in C# too, I just don't find it as enjoyable. Some people have the reverse experience and that's cool too. That doesn't bother me one way or the other. I fully understand feeling an affinity toward a specific language. It took me years to get over Pascal. Ahh VAX Pascal, I still harbor such fond memories.

 

But what I am more concerned about in terms of CSLA .NET is VS 2005. In .NET 2.0 we start to see some feature divergence between VB and C#. Most notably the My namespace in VB. Fortunately by playing in the middle-tier, CSLA is less subject to the differences than some code will be. However, there'll still be some differences that will make my dual life harder.

 

The biggest one that will impact me is My.Resources, which makes the use of resources somewhat simpler than in C#. This isn't a huge thing, but it does mean there'll be extra code differences to reconcile between the two versions in CSLA .NET 2.0.

 

There's also My.Settings, though I don't know if that will impact me quite as much. I anticipate dropping the DB() function from BusinessBase in 2.0, since most people (rightly) avoid putting db connection strings in their config files.

 

The two primary new C# features (yield and anonymous delegates) don't appear to have a home in CSLA, so I don't expect any differences from them. Not that they aren’t seriously cool features, but they just don’t have a place in CSLA .NET itself.

 

The new strongly typed TableAdapter classes are very cool. They are useful in both languages. And I hope to use strongly typed TableAdapter objects to simplify the code in the DataPortal_xyz methods.

 

There are some features that are more accessible to VB than C# in the new strongly typed DataTable (due to C#'s lack of WithEvents functionality - a major oversight imo). However, I don't expect to use any of those features in CSLA to start with, so there's no impact there.

 

When I write the book I'll create Windows and Web UI chapters. Those are what I dread most, because that's where the differences due to My become much more serious. There are numerous examples of UI development where My will be a serious code-saver - thus causing direct differences between the VB and C# code. Not that I can't do the same stuff in C#, just that it will take more and different code, which increases the effort on my part as an author.

 

Fortunately most of the book is about the framework and creating business objects, and the language divergence will have relatively minimal impact in those areas.

 

It is hard to speculate on what comes after VS 2005, but personally I expect more divergence, not less. Earlier in the thread someone noted that things like the Mac, Linux and Java still exist even though you can technically do everything they do with Windows and .NET.

 

The fact is that they all serve a purpose, as does .NET to them. People deep in C# often think differently from those deep in VB. People in Java think differently from those in .NET. This means they have different perspectives and different priorities on the same problems and issues. This is only good. This means there are competing ideas that we can all evaluate and use to the best of our abilities, regardless of the language or platform we choose to use.

 

Loving distributed computing as I do, I am constantly taking ideas from the C++ and Java worlds. I closely watch the SOA world, even though I think it is misguided in many ways, because there are interesting ideas and perspectives there that can apply to distributed object-oriented systems as well.

 

I’ve said it before and I’ll say it again, if you only know one programming language family (such as the C family or the Basic family) then you really, really need to get out more. Your horizons and thus your career are simply too limited and you can’t be considered credible in most of these discussions.

 

That’s an interesting meme. Which programming languages have you been competent in during your career? I’ll start (in rough order of usage):

 

1.      Apple BASIC

2.      VAX Pascal

3.      Turbo Pascal

4.      DCL

5.      FORTRAN 90

6.      VAX Basic

7.      ARexx

8.      Modula-II

9.      Visual Basic (1-6)

10.  Visual Basic .NET

11.  C#

 

While I did write a VT terminal emulator in C once, I don’t think I was ever really competent in C, so I’m not counting that. My memories of that experience are not inspirational in the slightest… I’ve also dabbled in various Unix shell languages and bat files, but was never competent in them.

 

Converting the list to language families is harder, because things like ARexx aren’t obvious, but here’s my attempt:

 

1.      Pascal (Pascals and Modula-II)

2.      Basic (various)

3.      FORTRAN

4.      Scripting (ARexx, DCL)

5.      C (C# and C if you are generous)

 

So, having wandered from the topic of CSLA .NET parity between VB and C# we arrive at what could be a cool meme. Go ahead, comment or blog – what languages and language families have you been competent in during your career?

Monday, February 14, 2005 6:50:22 PM (Central Standard Time, UTC-06:00)
 Tuesday, February 08, 2005

I was looking for info on a design pattern, and on a whim I thought I'd see if patternshare.org was online yet. And it is!

This site promises to be an awesome resource for all of us, since it provides a centralized index/resource for patterns of many types. The fact that it is online is wonderful news!

Tuesday, February 08, 2005 7:35:39 PM (Central Standard Time, UTC-06:00)

I just finished watching Eric Rudder’s keynote on Indigo at VS Live in San Francisco. As with all keynotes, it had glitz and glamour and gave a high-level view of what Microsoft is thinking.

 

(for those who don’t know, Eric is the Microsoft VP in charge of developer-related stuff including Indigo)

 

Among the various things discussed was the migration roadmap from today’s communication technologies to Indigo. I thought it was instructive.

 

From asmx web services the proposed changes are minor. Just a couple lines of code change and away you go. Very nice.

 

From WSE to Indigo is harder, since you end up removing lots of WSE code and replacing it with an attribute or two. The end result is nice because your code is much shorter, but it is more work to migrate.

 

From Enterprise Services (COM+, ServicedComponent) the changes are minor – just a couple lines of changed code. But the semantic differences are substantial because you can now mark methods as transactional rather than the whole class. Very nice!

 

From System.Messaging (MSMQ) to Indigo the changes are comparable in scope to the WSE change. You remove lots of code and replace it with an attribute or two. Again the results are very nice because you save lots of code, but the migration involves some work.

 

From .NET Remoting to Indigo the changes are comparable to the asmx migration. Only a couple lines of code need to change and away you go. This does assume you listened to advice from people like Ingo Rammer, Richard Turner and myself and avoided creating custom sinks, custom formatters or custom channels. If you ignored all this good advice then you’ll get what you deserve I guess :-)

 

As Eric pointed out however, Indigo is designed for the loosely coupled web service/SOA mindset, not necessarily for the more tightly coupled n-tier client/server mindset. He suggested that many users of Remoting may not migrate to Indigo – directly implying that Remoting may remain the better n-tier client/server technology.

 

I doubt he is right. Regardless of what Indigo is designed for, it is clear to me that it offers substantial benefits to the n-tier client/server world. These benefits include security, reliable messaging, simplified 2-phase transactions and so forth. The fact that Indigo can be used for n-tier client/server even if it is a bit awkward or not really its target usage won’t stop people. And from today’s keynote I must say that it looks totally realistic to (mis)use Indigo for n-tier client/server work.

Tuesday, February 08, 2005 12:35:52 PM (Central Standard Time, UTC-06:00)
 Thursday, February 03, 2005

One last post on SOA from my coffee-buzzed, Chicago-traffic-addled mind.

 

Can a Service have Tiers?

 

Certainly a Service can have layers. Any good software will have layers. In the case of a Service these layers will likely be:

 

1.      Interface

2.      Business

3.      Data access

4.      Data management

 

This only makes sense. You’ll organize your message-parsing and XML handling code into the interface layer, which will invoke the business layer to do actual work. The business layer may invoke the Data access layer to get/save data into the Data management (database) layer.
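Here is a minimal Python sketch of those four layers inside a single service. It assumes a hypothetical credit-check message format and uses a dict in place of the database. Only the interface layer knows about XML, and the business layer never sees an angle bracket.

```python
import xml.etree.ElementTree as ET

# Data management layer: a dict standing in for the database (hypothetical).
_CUSTOMERS = {"42": {"name": "Contoso", "credit_limit": 5000}}


# Data access layer: the only code that touches the data store.
def load_customer(customer_id: str):
    return _CUSTOMERS.get(customer_id)


# Business layer: does the actual work, knows nothing about XML.
def can_extend_credit(customer_id: str, amount: float) -> bool:
    customer = load_customer(customer_id)
    if customer is None:
        raise ValueError(f"unknown customer {customer_id}")
    return amount <= customer["credit_limit"]


# Interface layer: parse the inbound XML message, call the business layer,
# and build the XML reply.
def handle_message(xml_text: str) -> str:
    request = ET.fromstring(xml_text)
    customer_id = request.findtext("CustomerId")
    amount = float(request.findtext("Amount"))
    approved = can_extend_credit(customer_id, amount)
    reply = ET.Element("CreditCheckResult")
    ET.SubElement(reply, "Approved").text = "true" if approved else "false"
    return ET.tostring(reply, encoding="unicode")


print(handle_message(
    "<CreditCheck><CustomerId>42</CustomerId><Amount>1200</Amount></CreditCheck>"))
# <CreditCheckResult><Approved>true</Approved></CreditCheckResult>
```

All four layers run in one process here. Deploying some of them onto different machines would be a tier decision, and the layering itself would not need to change.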

 

But layers are logical constructs. They are just a way of organizing code so it is maintainable, readable and reusable. Layers say nothing about how the code is deployed – that is the realm of tiers.

 

So the question remains, can a Service be divided into tiers?

 

I’ll argue yes.

 

You deploy layers onto different tiers in an effort to get a good trade-off between performance, scalability, fault-tolerance and security. More tiers mean worse performance, but may result in better scalability or security.

 

If I create a service, I may very well need to deploy it such that I can provide high levels of scalability or security. To do this, I may need to deploy some of my service’s layers onto different tiers.

 

This is no different – absolutely no different – than what we do with web applications. This shouldn’t be a surprise, since a web service is nothing more than a web application that spits out XML instead of HTML. It seems pretty obvious that the rules are the same.

 

And there are cases where a web application needs to have tiers to scale or to be secure. It follows then that the same is true for web services.

 

Thus, services can be deployed into multiple tiers.

 

Yet the SOA purists would argue that any tier boundary should really be a service boundary. And this is where things get nuts. Because a service boundary implies lack of trust, while a layer boundary implies complete trust. Tiers are merely deployments of layers, so tiers imply complete trust too.

 

(By trust here I am not talking about security – I’m talking about data trust. A service must treat any caller – even another service – as an untrusted entity. It must assume that any inbound data breaks rules. If a service does extend trust then it instantly becomes unmaintainable in the long run and you just gave up the primary benefit of SOA.)

 

So if a tier is really a service boundary, then we’re saying that we have multiple services, one calling the next. But services pretty much always have those four layers I mentioned earlier, so now each “was-tier-now-is-service” will have those layers.

 

Obviously this is a lot more code to write, and a lot of overhead, since the lower-level service (that would have been a tier) can’t trust the higher level one and must replicate much of its validation and possibly other business logic.
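Here is a hedged Python sketch of that overhead, using hypothetical `OrderService` and `FulfillmentService` classes. Neither service may trust its caller, so the same rules run once at each boundary. The `validation_runs` counter just makes that visible. For brevity the two services share one `validate_order` helper. Independently built services would more likely each carry their own copy of those rules, and that is the code-duplication half of the cost.

```python
validation_runs = {"count": 0}


def validate_order(order: dict) -> list:
    """Rules a service must apply to any inbound order, whoever sent it."""
    validation_runs["count"] += 1
    problems = []
    if not order.get("customer_id"):
        problems.append("missing customer_id")
    quantity = order.get("quantity")
    if not isinstance(quantity, int) or quantity <= 0:
        problems.append("quantity must be a positive integer")
    return problems


class FulfillmentService:
    """Lower-level service (formerly a tier): cannot trust OrderService."""

    def submit(self, order: dict) -> dict:
        problems = validate_order(order)
        if problems:
            return {"accepted": False, "problems": problems}
        return {"accepted": True}


class OrderService:
    """Front-end service: validates, then sends a message to fulfillment."""

    def __init__(self, fulfillment: FulfillmentService) -> None:
        self._fulfillment = fulfillment

    def place(self, order: dict) -> dict:
        problems = validate_order(order)
        if problems:
            return {"accepted": False, "problems": problems}
        return self._fulfillment.submit(dict(order))  # validated again there


service = OrderService(FulfillmentService())
print(service.place({"customer_id": "42", "quantity": 3}))  # {'accepted': True}
print(validation_runs["count"])  # 2: one check per service boundary
```

If FulfillmentService were a trusted tier of the same service instead, its `submit` could skip the second check entirely.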

 

To me, at this point, it is patently obvious that the idea of entirely discarding tiers in favor of services is absurd. Rather, a far better view is to suggest that services can have tiers – private, trusting communications between layers, even across the network between the web server hosting the service interface and the application server hosting the data access code.

 

And of course this ties right back into my previous post for today on remoting. Because it is quite realistic to expect that you’ll use DCOM/ES/COM+ or remoting to do the communication between the web server and application server for this private communication.

 

While DCOM might appear very attractive (and is in many cases), it is often not ideal if there’s a firewall between the web server and application server. While it is technically possible to get DCOM to go through a firewall, I gotta say that this one issue is a major driver for people to move to remoting or web services.

 

And while web services might be very attractive (and are in many cases), they are not ideal if you want to use distributed OO concepts in the implementation of your service.

 

And there we are back at remoting once again as being a perfectly viable option.

 

Of course there’s a whole other discussion we could have about whether there’s any value to using any OO design concepts when implementing a service – but that can be a topic for another time.

Thursday, February 03, 2005 11:32:50 PM (Central Standard Time, UTC-06:00)

I am afraid that I'm rapidly becoming more convinced than even Ted that SOA == web services == RPC with angle brackets.

The more people I talk to, the more I realize that virtually no one is actually talking about service-oriented analysis, architecture or design. They are using SOA as a synonym for web services, and they are using web services as a replacement for DCOM, RMI, remoting or whatever RPC protocol they used before.

I think the battle is lost, if battle there was. The idea of a loosely-coupled, message-based architecture where autonomous entities interact with each other over policy-based connections is a really cool idea, but it doesn’t resonate with typical development teams.

The typical development team is building line-of-business systems and just needs a high performance, reliable and feature-rich RPC protocol. Sometimes web services fits that bill, and even if it doesn’t it is the current fad, so it tends to win by default.

People are running around creating web services that do not follow a message-based design. What would a message-based design look like you ask? Like this:

result = procedure(request)

Where ‘procedure’ is the method/procedure name, ‘request’ is the idempotent message containing the request from the caller and ‘result’ is the idempotent message containing the result of the procedure.
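Here is a minimal Python sketch of the `result = procedure(request)` shape, using a hypothetical customer search. I model the request and result messages as immutable (frozen) dataclasses, which is my reading of what "idempotent message" is getting at here. The names and sample data are invented for illustration. Compare this with a parameter-style signature like `GetCustomerData(firstName, lastName)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerSearchRequest:
    """Request message: everything the procedure needs, in one unit."""
    first_name: str = ""
    last_name: str = ""
    max_results: int = 50


@dataclass(frozen=True)
class CustomerSearchResult:
    """Result message returned to the caller."""
    customer_names: tuple
    truncated: bool


# Hypothetical data the service searches.
_DIRECTORY = [("Ann", "Smith"), ("Bob", "Smith"), ("Ann", "Jones")]


def find_customers(request: CustomerSearchRequest) -> CustomerSearchResult:
    """result = procedure(request): one message in, one message out."""
    matches = [f"{first} {last}" for first, last in _DIRECTORY
               if (not request.first_name or first == request.first_name)
               and (not request.last_name or last == request.last_name)]
    return CustomerSearchResult(
        customer_names=tuple(matches[:request.max_results]),
        truncated=len(matches) > request.max_results,
    )


result = find_customers(CustomerSearchRequest(first_name="Ann"))
print(result.customer_names)  # ('Ann Smith', 'Ann Jones')
```

One practical payoff is that you can add a new optional field, like `max_results` here, to the request message without changing the procedure's signature.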

Then if you want to be a real purist, you’d make this asynchronous, so the design would actually be:

procedure(request)

And any result message would be returned as a service call from ‘procedure’. But that really goes out of bounds for almost everyone, because then you are truly doing distributed parallel processing and that’s just plain hard to grok.
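Here is a hedged Python sketch of that asynchronous form. `procedure(request)` returns nothing, and the answer comes back later as a call into the requester. A thread and a plain callable stand in for real messaging infrastructure. The `reply_to` and `correlation_id` fields are my own assumptions. The correlation ID is how a requester matches replies to requests when several are in flight.

```python
import threading
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PriceQuoteReply:
    correlation_id: str
    total: float


@dataclass(frozen=True)
class PriceQuoteRequest:
    sku: str
    quantity: int
    correlation_id: str
    reply_to: Callable[[PriceQuoteReply], None]  # requester's "service" endpoint


_PRICES = {"widget": 2.5}  # hypothetical price list


def quote_price(request: PriceQuoteRequest) -> None:
    """procedure(request): returns immediately; the result arrives later
    as a call into the requester's reply endpoint."""
    def work() -> None:
        total = _PRICES[request.sku] * request.quantity
        request.reply_to(PriceQuoteReply(request.correlation_id, total))
    threading.Thread(target=work).start()


# Requester side: a reply endpoint that records results as they arrive.
received: dict = {}
done = threading.Event()


def on_quote(reply: PriceQuoteReply) -> None:
    received[reply.correlation_id] = reply.total
    done.set()


quote_price(PriceQuoteRequest("widget", 4, "req-1", on_quote))
done.wait(timeout=5)
print(received)  # {'req-1': 10.0}
```

Even in this toy version, the requester has to manage pending state and wait for the reply. That bookkeeping is where the "hard to grok" part begins.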

So in our pragmatic universe, we’re talking about the

result = procedure(request)

form and that’s enough. But that isn’t what most people are doing. Most people are creating services as though they were components. Creating methods/procedures that accept parameters rather than messages. Stuff like this:

customerList = GetCustomerData(firstName As String, lastName As String)

Where ‘customerList’ is a DataSet containing the results of any matches.

There’s not a message, idempotent or not, to be found here. This is components-on-the-web. This is COM-on-the-web or CORBA-on-the-web. This is not SOA, this is just RPC redux.

And that’s OK. I have no problem with that necessarily. But since this is the norm, I am pretty much ready to concede that the “Battle of SOA” is lost. SOA has already become just another acronym in the long list of RPC acronyms we’ve left behind over the decades.

Too bad really, because I found the distributed parallel, loosely coupled, message-based concepts to be extremely interesting and challenging. Hard, and impractical for normal business development, but really ranking high on the geek-cool chart.

Thursday, February 03, 2005 10:59:31 PM (Central Standard Time, UTC-06:00)

I’ve hashed and rehashed this topic numerous times. In particular, read this and this. But the debate rages on, paralyzing otherwise perfectly normal development teams in a frenzy of analysis paralysis over something that is basically not all that critical in a good architecture.

 

I mean really. If you do a decent job of architecting, it really doesn’t matter a whole hell of a lot which RPC protocol you use, because you can always switch away to another one when required.

 

And while web services are a pretty obvious choice for SOA, the reality is that if you are debating between remoting, web services and DCOM/ES/COM+ then you are looking for an RPC protocol – end of story.

 

If you were looking to do service-oriented stuff you’d reject remoting and DCOM/ES/COM+ out of hand because they are closed, proprietary, limited technologies that just plain don’t fit the SO mindset.

 

In an effort to summarize to the finest point possible, I’ll try once again to clarify my view on this topic. In particular, I am going to focus on why you might use remoting and how to use it if you decide to. Specifically I am focusing on the scenario where remoting works better than either of its competitors.

 

First, the scenario that remoting handles and that web services and ES/COM+/DCOM don’t (without hacking them):

 

1.      You want to pass rich .NET types between AppDomains, Processes or Machines (we’re talking Hashtables, true business objects, etc.)

2.      You are communicating between layers of an application that are deployed in different tiers (We’re not talking services here, we’re talking layers of a single application. We’re talking client-server, n-tier, etc.)

3.      You want no-touch deployment on the client

 

If you meet the above criteria then remoting is the best option going today. If you only care about 1 and 2 then ES/COM+/DCOM is fine – all you lose is no-touch deployment (well, and a few hours/days in configuring DCOM, but that’s just life :-) ).

 

If you don’t care about 1 then web services is fine. In this case you are willing to live within the confines of the XmlSerializer and should have no pretense of being object-oriented or anything silly like that. Welcome to the world of data-centric programming. Perfectly acceptable, but not my personal cup of tea.  To be fair, it is possible to hack web services to handle number 1, and it isn't hard. So if you feel that you must avoid remoting but need 1, then you aren't totally out of luck.

 

But in general, assuming you want to do 1, 2 and 3 then you should use remoting. If so, how should you use remoting?

 

1.      Host in IIS

2.      Use the BinaryFormatter

3.      Don’t create custom sinks or formatters, just use what Microsoft gave you

4.      Feel free to use SSL if you need a secure line

5.      Wrap your use of the RPC protocol in abstraction objects (like my DataPortal or Fowler’s Gateway pattern)

 

Hosting in IIS gives you a well-tested and robust process model in which your code can run. If it is good enough for www.microsoft.com and www.msn.com it sure should be good enough for you.

 

Using the BinaryFormatter gives you optimum performance and avoids the to-be-deprecated SoapFormatter.

 

By not creating custom sinks or formatters you are helping ensure that you’ll have a relatively smooth upgrade path to Indigo. Indigo, after all, incorporates the core functionality of remoting. They aren’t guaranteeing that internal stuff like custom sinks will upgrade smoothly, but they have been very clear that Indigo will support distributed OO scenarios like remoting does today. And that is what we’re talking about here.

 

If you need a secure link, use SSL. Sure WSE 2.0 gives you an alternative in the web service space, but there’s no guarantee it will be compatible with WSE 3.0, much less Indigo. SSL is pretty darn stable comparatively speaking, and we’ve already covered the fact that web services doesn’t do distributed OO without some hacking.

 

Finally, regardless of whether you use remoting, web services or ES/COM+/DCOM make sure – absolutely sure – that you wrap your RPC code in an abstraction layer. This is simple good architecture. Defensive design against a fluid area of technology.
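Here is a minimal Python sketch of that kind of abstraction, in the spirit of Fowler's Gateway pattern. It is not CSLA .NET's actual DataPortal. `CustomerGateway`, both implementations and the HTTP route are hypothetical. `CustomerEditor` depends only on the abstract gateway, so switching RPC technologies means writing one new gateway class rather than touching business or UI code.

```python
import json
import urllib.request
from abc import ABC, abstractmethod


class CustomerGateway(ABC):
    """The only type business/UI code uses to reach the server side.
    Concrete subclasses hide which RPC technology is in play."""

    @abstractmethod
    def fetch_customer(self, customer_id: str) -> dict: ...


class InProcessGateway(CustomerGateway):
    """Calls the data layer directly (e.g. a 2-tier deployment or tests)."""

    def __init__(self, store: dict) -> None:
        self._store = store

    def fetch_customer(self, customer_id: str) -> dict:
        return dict(self._store[customer_id])


class HttpJsonGateway(CustomerGateway):
    """Hypothetical remote transport: base_url and the /customers/<id>
    route are assumptions, not a real service."""

    def __init__(self, base_url: str) -> None:
        self._base_url = base_url.rstrip("/")

    def fetch_customer(self, customer_id: str) -> dict:
        url = f"{self._base_url}/customers/{customer_id}"
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)


class CustomerEditor:
    """Business/UI code: depends on the abstraction, never on the protocol."""

    def __init__(self, gateway: CustomerGateway) -> None:
        self._gateway = gateway

    def display_name(self, customer_id: str) -> str:
        customer = self._gateway.fetch_customer(customer_id)
        return f"{customer['name']} ({customer_id})"


editor = CustomerEditor(InProcessGateway({"42": {"name": "Contoso"}}))
print(editor.display_name("42"))  # Contoso (42)
```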

 

I can’t stress this enough. If your business or UI code is calling web services, remoting or DCOM directly you are vulnerable and I would consider your design to be flawed.

 

This is why this whole debate is relatively silly. If you are looking for an n-tier RPC solution, just pick one. Wrap it in an abstraction layer and be happy. Then when Indigo comes out you can easily switch. Then when Indigo++ comes out you can easily upgrade. Then when Indigo++# comes out you are still happy.

Thursday, February 03, 2005 10:11:20 PM (Central Standard Time, UTC-06:00)
 Saturday, January 29, 2005

I just upgraded this blog to the newest version of dasBlog. I upgraded my personal blog a few days ago and it has been trouble-free, so I thought it safe to upgrade this one. The only serious change you may see is when posting comments, as this new version requires that you type in a code from a graphic to help defeat posting bots. Kind of a pain, but worth it I suppose.

Of course I'm doing this just before leaving town for two weeks - first to Chicago and then to San Francisco for VS Live. Always a good time to upgrade ;)

The Chicago trip is primarily focused around working with a Magenic client, but I'll be having some evening dinner meetings with various groups as well, including my first real foray into this fun new trend: a Nerd Dinner.

Saturday, January 29, 2005 10:58:59 PM (Central Standard Time, UTC-06:00)
 Monday, January 24, 2005

Sahil has some interesting thoughts on the web service/DataSet question as well.

He spends some time discussing whether “business objects” should be sent via a web service. His definition of “business object” doesn’t match mine; I think it is closer to Fowler’s data transfer object (DTO).

It is important to remember that web services only move boring data. No semantic meaning is included. At best (assuming avoidance of xsd:any) you get some limited syntactic meaning along with the data.

When we talk about moving anything via a web service, we’re really just talking about data.

When talking about moving a “business object”, most people think of something that can be serialized by web services – meaning by the XmlSerializer. Because the XmlSerializer only serializes public fields and public read-write properties, the objects must expose all their data as public fields or read-write properties.

In short, this means that the “business objects” cannot follow good OO design principles. Basically, they are not “business objects” at all; they are a way of defining the message schema for the web service. They are, at best, data transfer objects.
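
A rough Python analogy of that limitation, not the real XmlSerializer (whose rule is public fields plus public read-write properties): the hypothetical `public_only_serialize` keeps only public attributes, so the encapsulated credit limit, and the behavior that depends on it, never make it into the message.

```python
class Customer:
    """A business object that keeps its credit limit encapsulated."""

    def __init__(self, name: str, credit_limit: int) -> None:
        self.name = name                   # public attribute: visible to the serializer
        self._credit_limit = credit_limit  # private state: filtered out below

    @property
    def credit_limit(self) -> int:
        # Read-only property; vars() does not report properties, so it is skipped
        return self._credit_limit

    def can_order(self, amount: int) -> bool:
        # Business logic: behavior that never travels as data
        return amount <= self._credit_limit


def public_only_serialize(obj) -> dict:
    """Hypothetical stand-in for the XmlSerializer's rule: public state only."""
    return {key: value for key, value in vars(obj).items() if not key.startswith("_")}


print(public_only_serialize(Customer("Acme", 5000)))  # {'name': 'Acme'}
```

Getting the credit limit into the message would mean exposing it as public, writable state, which is exactly how a business object degrades into a DTO.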

In my business objects books and framework I talk about moving actual business objects across the wire using remoting. Of course the reality here is that only the data moves – but the code must exist on both ends. The effective result is that the object is cloned across the network, and retains both its data and the semantic meaning (the business logic in the object).

You can do this with web services too, but not in a “web service friendly” way. Cloning an object implies that you get all the data in the object. And to do this while still allowing for encapsulation means that the serialization must get private, friend/internal and protected fields as well as public ones. This is accomplished via the BinaryFormatter. The BinaryFormatter generates and consumes streams, which can be thought of as byte arrays. Thus, you end up creating a web service that moves byte arrays around. Totally practical, but the data is not human-readable XML – it is Base64 encoded binary data. I discuss how to do this in CSLA .NET on my web site.
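
Python has no BinaryFormatter, but the idea can be approximated with the standard `pickle` and `base64` modules. In this sketch (all names hypothetical), every field, private ones included, is serialized to bytes and Base64-encoded as text for the wire; the receiver rebuilds a full clone, but only because it already has the `Customer` class, mirroring the requirement that the code exist on both ends.

```python
import base64
import pickle


class Customer:
    """Business object: private fields plus business logic."""

    def __init__(self, name: str, credit_limit: int) -> None:
        self._name = name
        self._credit_limit = credit_limit

    def can_order(self, amount: int) -> bool:
        return amount <= self._credit_limit


def to_wire(obj) -> str:
    """Capture every field (private ones included) as bytes, then Base64 them."""
    raw = pickle.dumps(vars(obj))
    return base64.b64encode(raw).decode("ascii")


def from_wire(text: str, cls):
    """Rebuild a clone; the receiver must already have the class (the code)."""
    state = pickle.loads(base64.b64decode(text))  # only unpickle trusted data
    clone = cls.__new__(cls)  # skip __init__: the state comes from the wire
    clone.__dict__.update(state)
    return clone


payload = to_wire(Customer("Acme", 5000))  # opaque Base64 text, not readable XML
clone = from_wire(payload, Customer)
print(clone.can_order(6000))  # False: the business rule still applies to the clone
```

The Base64 text is meaningless to anyone without the class definitions, which is precisely why this is an n-tier technique rather than an interop one.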

Now we are talking about moving business objects. Real, live, OO-designed business objects.

Of course this approach is purely for n-tier scenarios. It is totally antithetical to any service-oriented model!

For an SO model you need to have clearly defined schemas for your web service messages, and those should be independent from your internal implementation (business objects, DataSets or whatever). I discuss this in Chapter 10 of my business objects books.

Monday, January 24, 2005 10:10:21 AM (Central Standard Time, UTC-06:00)

OK, so Shawn has some good points about the use of a DataSet for the purpose of establishing a formal contract for your web service messages. This is in response to my previous entry about not using DataSets across web services.

The really big distinction between using web services for SO vs n-tier remains.

If you are doing SO, you need to clearly define your message schema, and that schema must be independent from your internal data structures or representations. This independence is critical for encapsulation and is a core part of the SO philosophy. Your service interface must be independent from your service implementation.

What technology you use to define your schema is really up to you. Shawn’s entry points out that you can use strongly typed DataSet objects to generate the schema – something I suppose I knew subconsciously but didn’t really connect until he brought it up. Thank you Shawn!

Any tool that generates an XSD will work to define your schema, and from there you can define proxy classes (or apparently use DataSet objects as proxies). Alternatively, you can create your “schema” in VB or C# code and get the XSD from the code using the xsd.exe tool from the .NET SDK (though you have less control over the XSD this way).
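
To make the code-first route concrete, here is a toy Python analogy (it is not xsd.exe, and the type mapping is an assumption for illustration): the hypothetical `to_xsd` function reads a class's field declarations and emits a matching `xs:complexType`.

```python
from dataclasses import dataclass, fields

# Hypothetical mapping from Python types to XML Schema built-in types
XSD_TYPES = {str: "xs:string", int: "xs:int", float: "xs:double", bool: "xs:boolean"}


@dataclass
class OrderMessage:
    """The message "schema" written as code: one field per element."""
    order_id: int
    customer: str
    total: float


def to_xsd(cls) -> str:
    """Derive a minimal xs:complexType from a dataclass's field declarations."""
    elements = "\n".join(
        f'    <xs:element name="{f.name}" type="{XSD_TYPES[f.type]}"/>'
        for f in fields(cls)
    )
    return (
        f'<xs:complexType name="{cls.__name__}">\n'
        "  <xs:sequence>\n"
        f"{elements}\n"
        "  </xs:sequence>\n"
        "</xs:complexType>"
    )


print(to_xsd(OrderMessage))
```

The generator alone decides how each field is declared (every `int` becomes `xs:int`, nothing is optional, no facets restrict a value), which is the loss of control over the XSD mentioned above.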

But my core point remains: this is all about defining your interface, and should not be confused with defining your implementation. Using a DataSet as a proxy is (according to Shawn) a great thing, and I’ll roll with that.

But to use that same DataSet all the way down into your business and data code is dangerous and fragile. That directly breaks encapsulation and makes you subject to horrific versioning issues.

And versioning is the big bugaboo hiding in web services. Web services have no decent versioning story. At the service API level the story is the same as it was for COM and DCOM – which was really bad and I believe ultimately helped drive the success of Java and .NET.

At the schema level the versioning story is better. It is possible to have your service accept different variations on your XML schema, and you can adapt to the various versions. But this implies that your service (or a pre-filter/adapter) isn’t tightly coupled to some specific implementation. I fear that DataSets offer a poor answer in this case.
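
A small Python sketch of such an adapter, using the standard `xml.etree.ElementTree` module. The two message versions, their element names and the `version` attribute are invented for illustration: each accepted version is mapped onto one internal shape, so the code behind the adapter isn't coupled to either schema.

```python
import xml.etree.ElementTree as ET


def read_customer(xml_text: str) -> dict:
    """Accept two hypothetical schema versions and normalize to one internal shape.

    v1: <customer><name>Ada Lovelace</name></customer>
    v2: <customer version="2"><first>Ada</first><last>Lovelace</last></customer>
    """
    root = ET.fromstring(xml_text)
    version = root.get("version", "1")  # v1 messages carry no version attribute
    if version == "1":
        name = root.findtext("name", default="")
    elif version == "2":
        first = root.findtext("first", default="")
        last = root.findtext("last", default="")
        name = f"{first} {last}".strip()
    else:
        raise ValueError(f"unsupported customer schema version: {version}")
    # Internal representation: free to change without touching either schema
    return {"full_name": name}


print(read_customer("<customer><name>Ada Lovelace</name></customer>"))
print(read_customer('<customer version="2"><first>Ada</first><last>Lovelace</last></customer>'))
```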

And in any case, if you want to create maintainable code, you must be able to alter your internal implementation and data representation independently from your external contract. The XSD that all your clients rely on can’t change easily, and you must be able to change your internal structures more rapidly to accommodate changing requirements over time.

Again, I’m talking about SO here. If you are using web services for n-tier then the rules are all different.

In the n-tier world, your tiers are tightly coupled to start with. They are merely physical deployments of layers, and layers trust each other implicitly because they are just a logical organization of the code inside your single application. While there are still some real-world versioning issues involved, the fact is that they aren’t remotely comparable to the issues faced in the SO world.

Monday, January 24, 2005 9:13:49 AM (Central Standard Time, UTC-06:00)
 Sunday, January 23, 2005

Every now and then the question comes up about whether to pass DataSet or DataTable objects through a web service.

I agree with Ted Neward that the short answer is NO!!

However, nothing is ever black and white…

For the remainder of this discussion remember that a DataSet is just a collection of DataTable objects. There’s no real difference between a DataTable and DataSet in the context of this discussion, so I’m just going to use the term DataSet to mean 1..n DataTables.

There are two “types” of DataSet – default and strongly typed.

Default (untyped) DataSet objects can be converted to relatively generic XML, though of course they don’t do this by default. So you must choose either to pass a DataSet in a form that is pretty much only useful to .NET code, or to force it into more generic XML that anyone can use.

To make this decision you need to ask yourself why you are using web services to start with. They are designed, after all, for the purpose of interop/integration. If you are using them for that intended purpose then you want the generic XML solution.
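
A Python sketch of the generic-XML option, assuming a table modeled as a list of dictionaries (a stand-in for a DataTable, not the ADO.NET API): each row becomes plain element-per-column XML that any platform's parser can read.

```python
import xml.etree.ElementTree as ET

# A stand-in for one DataTable: rows of column values
rows = [
    {"id": 1, "name": "Acme", "city": "Chicago"},
    {"id": 2, "name": "Initech", "city": "San Francisco"},
]


def table_to_generic_xml(table_name: str, table_rows: list) -> str:
    """Emit plain element-per-column XML that any platform can parse.

    No diffgram, no inline schema, no platform-specific attributes: just data.
    """
    root = ET.Element(table_name)
    for row in table_rows:
        row_el = ET.SubElement(root, "row")
        for column, value in row.items():
            ET.SubElement(row_el, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


print(table_to_generic_xml("Customers", rows))
```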

On the other hand, if you are misusing web services for something other than interop/integration then you are already on thin ice and can do any darn thing you want. Seriously, you are on your own anyway, so go for broke.

Strongly typed DataSet objects are a different animal. To use them, both ends of the connection need the .NET assembly that contains the strongly typed code for the object. Obviously interop/integration isn’t your goal here, which means you are implicitly misusing web services for something else already. So again, you are on your own and might as well go for broke.

Personally, my recommendation is to avoid passing DataSet objects of any sort via web services. Create an explicit schema for your web service messages, then generate proxy classes in VB, C# or whatever language based on that schema. Then use the proxy objects in your web service code.

Your web service (asmx) code should be considered the same as any other UI code. It should translate between the “user” presentation (XML based on your schema/proxy classes) and your internal implementation (DataSet, business objects or whatever).
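
Sketched in Python with invented names (`GetOrderRequest`, `GetOrderResponse`, `OrderRepository`, `get_order`), the pattern might look like this: the dataclasses stand in for proxy classes generated from your message schema, the repository stands in for whatever internal implementation you use, and the service method does nothing but translate between the two, just as UI code would.

```python
from dataclasses import dataclass


# Proxy classes that mirror the published message schema (the contract)
@dataclass
class GetOrderRequest:
    order_id: int


@dataclass
class GetOrderResponse:
    order_id: int
    customer_name: str
    total: str  # formatted as text in this hypothetical contract


# Internal implementation: free to change shape without touching the contract
class OrderRepository:
    def __init__(self) -> None:
        self._orders = {17: {"cust": {"first": "Ada", "last": "Lovelace"}, "cents": 12999}}

    def load(self, order_id: int) -> dict:
        return self._orders[order_id]


def get_order(request: GetOrderRequest, repo: OrderRepository) -> GetOrderResponse:
    """The service method acts like UI code: translate contract <-> internals."""
    order = repo.load(request.order_id)
    name = f'{order["cust"]["first"]} {order["cust"]["last"]}'
    return GetOrderResponse(request.order_id, name, f'{order["cents"] / 100:.2f}')


print(get_order(GetOrderRequest(17), OrderRepository()))
```

If the repository later stores names differently or switches away from cents, only the body of `get_order` changes; the contract your clients depend on does not.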

I discuss this in Chapter 10 of my Business Objects books, but the concepts apply directly to data-centric programming just as they do to OO programming.

Sunday, January 23, 2005 9:06:20 PM (Central Standard Time, UTC-06:00)