Channel: Mike Stall's .NET Debugging Blog

WebAPI Parameter binding under the hood


I wrote about WebAPI’s parameter binding at a high level before. Here’s what’s happening under the hood. The most fundamental object for binding parameters from a request in WebAPI is an HttpParameterBinding. This binds a single parameter. The binding is created upfront and then invoked across requests. This means the binding must be determined from static information such as the parameter’s name, type, or global config. A parameter binding has a reference to the HttpParameterDescriptor, which provides static information about the parameter from the action’s signature.

Here’s the key method on HttpParameterBinding: 

public abstract Task ExecuteBindingAsync(
    ModelMetadataProvider metadataProvider,
    HttpActionContext actionContext,
    CancellationToken cancellationToken);

This is invoked on each request to perform the actual binding. It takes in the action context (which has the incoming request) and then does the binding and populates the result in the argument dictionary hanging off action context. This method returns a Task in case the binding needs to do an IO operation like read the content stream. 

Examples of bindings

WebAPI has two major parameter bindings: ModelBinderParameterBinding and FormatterParameterBinding. The first uses model binding and generally assembles the parameter from the URI. The second uses the MediaTypeFormatters to read the parameter from the content stream.

Ultimately, these are both just derived classes from HttpParameterBinding. Once WebAPI gets the binding, it just invokes the ExecuteBindingAsync method and doesn’t care about the parameter’s type, its name, whether it had a default value, whether it was model binding vs. formatters, etc.

However, you can always add your own. For example, suppose you want to bind action parameters of type IPrincipal to automatically go against the thread’s current principal. Clearly, this doesn’t touch the content stream or need the facilities from model binding. You could create a custom binding like so:

    // Example of a binder
    public class PrincipalParameterBinding : HttpParameterBinding
    {
        public PrincipalParameterBinding(HttpParameterDescriptor p) : base(p) { }
 
        public override Task ExecuteBindingAsync(ModelMetadataProvider metadataProvider,
            HttpActionContext actionContext, CancellationToken cancellationToken)
        {
            IPrincipal p = Thread.CurrentPrincipal;
            SetValue(actionContext, p);

            var tsc = new TaskCompletionSource<object>();
            tsc.SetResult(null);
            return tsc.Task;
        }
    }

The binding really could do anything. You could have custom bindings that go and pull values from a database.

Normally, you wouldn’t need to plug in your own HttpParameterBinding. Most scenarios can be solved by plugging in a simpler extensibility point, like adding a formatter or model binder.

Who determines the binding?

This is ultimately determined by the IActionValueBinder, which is a pluggable service. Here’s the order that the DefaultActionValueBinder looks in to get a binding. (I described an alternative binder here which has MVC like semantics.)

Look for a ParameterBindingAttribute

The highest precedence is to use a ParameterBindingAttribute, which can be placed on a parameter site or on a parameter type’s declaration. This lets you explicitly set the binding for a parameter.

    [AttributeUsage(AttributeTargets.Class | AttributeTargets.Parameter, Inherited = true, AllowMultiple = false)]
    public abstract class ParameterBindingAttribute : Attribute
    {
        public abstract HttpParameterBinding GetBinding(HttpParameterDescriptor parameter);
    }

The virtual function here hints that this is really the base class of a hierarchy. [FromBody] and [ModelBinder] attributes both derive from [ParameterBinding]. [FromUri] derives from [ModelBinder] and just invokes model binding and constrains the inputs to be from the URI.

In our example, we could create our own custom attribute to provide PrincipalParameterBindings.
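
For instance, here’s a minimal sketch of what such an attribute could look like. The attribute name is made up; it simply hands back the PrincipalParameterBinding defined earlier.

    // Hypothetical attribute: hands back our PrincipalParameterBinding for any parameter it decorates.
    [AttributeUsage(AttributeTargets.Parameter, Inherited = true, AllowMultiple = false)]
    public sealed class FromPrincipalAttribute : ParameterBindingAttribute
    {
        public override HttpParameterBinding GetBinding(HttpParameterDescriptor parameter)
        {
            return new PrincipalParameterBinding(parameter);
        }
    }

An action could then declare a parameter as “[FromPrincipal] IPrincipal user” and the custom binding would be used automatically.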

Look at the ParameterBinding Rules in the Configuration

The HttpConfiguration has a collection of binding rules. This is checked if there is no ParameterBinding attribute.  Here are some examples of setting some binding rules for certain types.

            HttpConfiguration config = new HttpConfiguration();
 
            ParameterBindingRulesCollection pb = config.ParameterBindingRules;
            pb.Insert(typeof(IPrincipal), param => new PrincipalParameterBinding(param)); // custom binder against request
            pb.Insert(typeof(Location), param => param.BindWithModelBinding(new LocationModelBinder())); 
            pb.Insert(typeof(string), param => param.BindWithFormatter(new CustomMediaFormatter()));

The first rule says that all IPrincipal types should be bound using our IPrincipal binder above.

The second rule says that all Location types should be bound using Model Binding, and specifically use the LocationModelBinder  (which would implement IModelBinder).

The third rule says that all strings should be bound with the formatters.

Rules are executed in order and work against exact type matches.

The binding rules actually operate on a ParameterDescriptor. The Insert() methods above are just using a helper that filters based on the parameter’s type. So you could add a rule that binds on a parameter’s name, or on any other information exposed by the descriptor.

Setting rules lets your config describe how types should be bound, and alleviates needing to decorate every callsite with an attribute.
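
Because the rules see the whole descriptor, you can key off more than just the type. Here’s a rough sketch of a rule that matches on the parameter’s name; it assumes the rule collection holds delegates from HttpParameterDescriptor to HttpParameterBinding that return null when they don’t claim the parameter (which is what the type-based Insert() helpers above build on).

            config.ParameterBindingRules.Add(param =>
            {
                // Claim only parameters named "currentUser" of type IPrincipal.
                // Returning null lets later rules (or the default policy) handle everything else.
                if (param.ParameterName == "currentUser" && param.ParameterType == typeof(IPrincipal))
                {
                    return new PrincipalParameterBinding(param);
                }
                return null;
            });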

The configuration has some default entries in the parameter binding rules collection:

  • bind the cancellation token
  • bind the HttpRequestMessage  (without this rule, HttpRequestMessage would be seen as a complex object and so we’d naturally try to read it from the body using a formatter)
  • prevent accidentally binding any class derived from HttpContent. (This is trying to protect users from accidentally having a formatter try to bind)

Since these are just regular entries in the rule collection, you can supersede them by inserting a broader rule in front of them. Or you can clear the collection completely. 

The rules here are extremely flexible and can solve several scenarios:

  1. Allow you to override WebAPI’s behavior for special types like cancellation token, HttpContent, or HttpRequestMessage. For example, if you did want the HttpContent to bind against the Request.Content, you could add a rule for that. Or if you had multiple cancellation tokens floating around and wanted to bind them by name, you could add a rule for that too.
  2. Specify whether a type should use model binding or formatter by default.  Maybe you have a complex type that should always use model binding (eg, Location in the above example). Just adding a formatter doesn’t mean that a type automatically uses it. Afterall, you could add a formatter and model binder for the same type. And some formatters and model binders eagerly claim to handle all types (eg, JSON.Net thinks it can handle anything, even a wacky type like a delegate or COM object). Same for model binding. So WebAPI needs a hint, and parameter binding rules can provide that hint.
  3. Create a binding rule once that applies globally, without having to touch up every single action signature.
  4. Create binding rules that require rich type information. For example, you could create a “TryParse” rule that looks if a parameter type has a “bool TryParse(string s, out T)” method, and if so, binds that parameter by invoking that method.
  5. Instead of binding by type, bind by name, and coerce the value to the given parameter type.

Fallback to a default policy

If there is no attribute, and there is no rule that claims the parameter descriptor, then the default binder falls back to its default policy: simple types are model-bound against the URI, and complex types are read from the body using formatters.
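
To make that policy concrete, here’s a small illustrative action (the controller and types are made up): the simple id parameter comes from the URI, while the complex order parameter is read from the body by a formatter.

public class OrdersController : ApiController
{
    // 'id' is a simple type, so it is model-bound from the URI (route data or query string).
    // 'order' is a complex type, so it is read from the request body via a MediaTypeFormatter.
    public void Post(int id, Order order) { }
}

public class Order
{
    public string Item { get; set; }
}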


Per-controller configuration in WebAPI


We’ve just added support for WebAPI to provide per-controller-type configuration. WebAPI has a HttpConfiguration object that provides configuration such as:

  • route table
  • Dependency resolver for specifying services
  • list of Formatters, ModelBinders, and other parameter binding settings.
  • list of message handlers,

However, a specific controller may need its own specific services. And so we’ve added per-controller-type configuration. In essence, a controller type can have its own “shadow copy” of the global config object, and then override specific settings.  This is automatically applied to all controller instances of the given controller-type. (This supersedes the HttpControllerConfiguration attribute that we had in Beta)

Some of the scenarios we wanted to enable here:

  1. A controller may have its own specific list of Formatters, for both reading and writing objects.
  2. A controller may have special dynamic actions that aren’t based on reflecting over C# methods, and so may need its own private action selector.
  3. A controller may need its own IActionValueBinder. For example, you might have an HtmlController base class that has an MVC-style parameter binder that handles FormUrl data.

In all these cases, the controller is coupled to a specific service for basic correct operation, and these services really are private implementation details of the controller that shouldn’t conflict with settings from other controllers. Per-controller config allows multiple controllers to override their own services and coexist peacefully in an app together.

 

How to set up per-controller config?

We’ve introduced the IControllerConfiguration interface:

public interface IControllerConfiguration
{
    void Initialize(HttpControllerSettings controllerSettings,
        HttpControllerDescriptor controllerDescriptor);
}

WebAPI will look for attributes on the controller that implement that interface, and then invoke them when initializing the controller-type. This follows the same inheritance order as constructors, so attributes on the base type will be invoked first.

The controllerSettings object specifies which parts of the configuration can be overridden for a controller. This provides static knowledge of what can and can’t be customized per controller. Obviously, things like message handlers and routes can’t be specified on a per-controller basis.

public sealed class HttpControllerSettings
{
    public MediaTypeFormatterCollection Formatters { get; }
    public ParameterBindingRulesCollection ParameterBindingRules { get; }
    public ServicesContainer Services { get; }        
}

So an initialization function can change the services, formatters, or binding rules. Then WebAPI will create a new shadow HttpConfiguration object and apply those changes. Things that are not changed will still fall through to the global configuration.

 

Example

Here’s an example. Suppose we have our own controller type, and we want it to only use a specific formatter and IActionValueBinder.

First, we add a config attribute:

[AwesomeConfig]
public class AwesomeController : ApiController
{
    [HttpGet]
    public string Action(string s)
    {
        return "abc";
    }
}

That attribute implements IControllerConfiguration:

class AwesomeConfigAttribute : Attribute, IControllerConfiguration
{
    public void Initialize(HttpControllerSettings controllerSettings,
        HttpControllerDescriptor controllerDescriptor)
    {
        controllerSettings.Services.Replace(typeof(IActionValueBinder), new AwesomeActionValueBinder());

        controllerSettings.Formatters.Clear();
        controllerSettings.Formatters.Add(new AwesomeCustomFormatter());
    }
}

This will clear all the default formatters and add our own AwesomeCustomFormatter. It will also set the IActionValueBinder to our own AwesomeActionValueBinder. And it will not affect any other controllers in the system.

Setting a service on the controller here has higher precedence than setting services in the dependency resolver or in the global configuration.

The initialization function can also inspect incoming configuration and modify it. For example, it can append a formatter or binding rule to an existing list.
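
For instance, here’s a sketch of an attribute that appends rather than replaces, reusing AwesomeCustomFormatter from the example above and the PrincipalParameterBinding from the previous post, and assuming the binding-rule collection accepts raw descriptor-to-binding delegates as described there.

class AppendingConfigAttribute : Attribute, IControllerConfiguration
{
    public void Initialize(HttpControllerSettings controllerSettings,
        HttpControllerDescriptor controllerDescriptor)
    {
        // Keep the globally configured formatters and add one more just for this controller.
        controllerSettings.Formatters.Add(new AwesomeCustomFormatter());

        // Put a controller-specific binding rule ahead of the inherited ones.
        controllerSettings.ParameterBindingRules.Insert(0, param =>
        {
            if (param.ParameterType == typeof(IPrincipal))
            {
                return new PrincipalParameterBinding(param);
            }
            return null;
        });
    }
}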

What happens under the hood?

This initialization function is invoked when WebAPI is first creating the HttpControllerDescriptor for this controller type. It’s only invoked once per controller type. WebAPI will then apply the controllerSettings and create a new HttpConfiguration object. There are some optimizations in place to make this efficient:

  • If there’s no change, it shares the same config object and doesn’t create a new one.
  • The new config object reuses much of the original one. There are several copy-on-write optimizations in place. For example, if you don’t touch the formatters, we avoid allocating a new formatter collection.

Then the resulting configuration is used for future instances of the controller. Calling code still just gets an HttpConfiguration instance and doesn’t need to care whether that instance was the global configuration or a per-controller configuration. So when the controller asks for formatters or an IActionValueBinder here, it will automatically pull from the controller’s config instead of the global one.

Strong binding for CSV reader


 

I updated my open source CSV reader to provide parsing rows back into strongly typed objects. You can get it from Nuget as CsvTools 1.0.6.

For example, suppose we have a CSV file “test.csv” like so:

name, species, favorite fruit, score
Kermit, Frog, apples, 18%
Ms. Piggy, Pig, pears, 22%
Fozzy, Bear, bananas, 19.4%

You can open the CSV and read the rows with loose typing (as strings):

var dt = DataTable.New.Read(@"c:\temp\test.csv");
IEnumerable<string> rows = from row in dt.Rows select row["Favorite Fruit"];

But it’s very convenient to use strongly-typed classes. We can define a strongly-typed class for the CSV:

enum Fruit
{
    apples,
    pears,
    bananas,
}
class Entry
{
    public string Name { get; set; }
    public string Species { get; set; }
    public Fruit FavoriteFruit { get; set; }
    public double Score { get; set; }
}

We can then read via the strongly-typed class as:

var dt = DataTable.New.Read(@"c:\temp\test.csv"); 
Entry[] entries = dt.RowsAs<Entry>().ToArray(); // read all entries

We can also use linq expressions like:

IEnumerable<Fruit> x = from row in dt.RowsAs<Entry>() select row.FavoriteFruit;

 

What are the parsing rules?

Parsing can get arbitrarily complex. This uses some simple rules that solved the scenarios I had.

The parser looks at each property on the strong type and matches it to a column from the CSV. Since property names are restricted to C# identifiers, whereas column names can have arbitrary characters (and thus be invalid C# identifiers), the matching here is flexible. It matches properties to columns by looking at just the alphanumeric characters. So the “FavoriteFruit” property matches the “Favorite Fruit” column name.
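
A sketch of that matching idea (this is not the library’s exact code): compare only the alphanumeric characters, ignoring case and everything else.

static bool ColumnMatches(string propertyName, string columnName)
{
    return Normalize(propertyName) == Normalize(columnName);
}

static string Normalize(string s)
{
    // Keep only letters and digits, lower-cased: "Favorite Fruit" -> "favoritefruit".
    var sb = new System.Text.StringBuilder();
    foreach (char c in s)
    {
        if (char.IsLetterOrDigit(c))
        {
            sb.Append(char.ToLowerInvariant(c));
        }
    }
    return sb.ToString();
}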

To actually parse the row value from a string to the target type, T, it uses the following rules:

  1. if T is already a string, just return the value
  2. special case doubles parsing to allow the percentage sign. (Parse 50% as .50).
  3. if T has a TryParse(string, out T) method, then invoke that.  I found TryParse to be significantly faster than invoking a TypeConverter.
  4. Else use a TypeConverter. This is a general and extensible hook.

Errors are ignored. The rationale here is that if I have 3 million rows of CSV data, I don’t want to throw an exception on reading just because one row has bad data.
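
Putting rules 1-4 together, here’s an illustrative sketch of the per-value conversion (again, a sketch of the rules above, not the library’s actual code):

static object ParseValue(string value, Type targetType)
{
    // Rule 1: strings pass through untouched.
    if (targetType == typeof(string))
        return value;

    // Rule 2: special-case doubles so "18%" parses as 0.18.
    if (targetType == typeof(double) && value.EndsWith("%"))
        return double.Parse(value.TrimEnd('%')) / 100;

    // Rule 3: prefer a static bool TryParse(string, out T) if the type has one.
    var tryParse = targetType.GetMethod("TryParse",
        new[] { typeof(string), targetType.MakeByRefType() });
    if (tryParse != null)
    {
        var args = new object[] { value, null };
        if ((bool)tryParse.Invoke(null, args))
            return args[1];
    }

    // Rule 4: fall back to a TypeConverter. This is the general, extensible hook.
    return System.ComponentModel.TypeDescriptor.GetConverter(targetType).ConvertFromInvariantString(value);
}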

Under the hood

DataTable.RowsAs<T>() uses expression trees to build a strongly typed dynamic method of Func<Row, T>. I originally used reflection to enumerate the properties, find the appropriate parsing technique, and set the value on the strong type. Switching to pre-compiled methods was about a 10x perf win.

In this case, the generated method looks something like this:

class EnumParser
{
    const int columnIndex_Name = 0;
    const int columnIndex_species = 1;

    TypeConverter _convertFavoriteFruit = TypeDescriptor.GetConverter(typeof(Fruit));
    const int columnIndex_Fruit = 2;

    const int columnIndex_Score = 3;

    public Entry Parse(Row r)
    {
        Entry newObj = new Entry();
        newObj.FavoriteFruit = (Fruit) _convertFavoriteFruit.ConvertFrom(r.Values[columnIndex_Fruit]);
        newObj.Name = r.Values[columnIndex_Name];
        newObj.Species = r.Values[columnIndex_species];
        newObj.Score = ToDouble(r.Values[columnIndex_Score]);
        return newObj;
    }    
}

The parse method is a Func<Row, Entry> which can be invoked on each row. It’s actually a closure so that it can capture the TypeConverters and only do the lookup once. The mapping between property names and column names can also be done upfront and captured in the columnIndex_* constants.
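
Here’s a rough sketch of that idea, greatly simplified, and assuming Row.Values exposes a default indexer over strings as the generated code above suggests: the property assignments are built once as an expression tree, compiled, and the resulting delegate is reused for every row.

static Func<Row, Entry> BuildParser(int nameIndex, int speciesIndex)
{
    var row = Expression.Parameter(typeof(Row), "row");

    // row.Values[i] -- the column lookup is baked in as a constant index.
    Func<int, Expression> cell = i =>
        Expression.Property(Expression.Property(row, "Values"), "Item", Expression.Constant(i));

    // Equivalent to: new Entry { Name = row.Values[nameIndex], Species = row.Values[speciesIndex] }
    var body = Expression.MemberInit(
        Expression.New(typeof(Entry)),
        Expression.Bind(typeof(Entry).GetProperty("Name"), cell(nameIndex)),
        Expression.Bind(typeof(Entry).GetProperty("Species"), cell(speciesIndex)));

    return Expression.Lambda<Func<Row, Entry>>(body, row).Compile();
}

(The real generated code also wires in the TypeConverter and TryParse conversions for the non-string properties.)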

Converting between Azure Tables and CSV


I published a nuget package (CsvTools.Azure) to easily read/write CSVs to azure blobs and tables.  It builds on the CSV reader, also on Nuget (see CsvTools) and GitHub (https://github.com/MikeStall/DataTable ).

Azure Tables are very powerful, but can be tricky to use. I wanted something that:

  1. handled basic scenarios, such as uploading a CSV file to an Azure table with strongly typed columns, and downloading an Azure table as a CSV that I could then open in Excel. 
  2. Was easy to use and could accomplish most operations in a single line.
  3. Could still be type-safe.
  4. Had intelligent defaults. If you didn’t specify a partition key, it would infer one. If the defaults weren’t great, you could go back and improve them.

The CsvTools.Azure nuget package adds extension methods for DataTable, contained in the core CsvTools package.  These extension methods save a DataTable to an Azure blob or table, and can read a DataTable from an azure blob or table.

Examples with Azure Blobs

Writing to and from blobs is easy, since blobs resemble the file system. Here’s an example of writing data and reading it back from blob storage.

        var dt = DataTable.New.Read(@"c:\temp\test.csv");

        // Write and Read from blobs
        dt.SaveToAzureBlob(Account(), "testcontainer", "test.csv");
        var dataFromBlob = DataTable.New.ReadAzureBlob(Account(), "testcontainer", "test.csv"); // read it back

These code snippets assume a sample CSV at c:\temp\test.csv:

name, species, score
Kermit, Frog , 10
Ms. Piggy, Pig , 50
Fozzy, Bear , 23

Examples with Azure Tables

The scenarios I find interesting with Csv and Azure Tables are:

  1. Ingress: Uploading a CSV as an azure table. I successfully uploaded a 3 million row CSV into Azure using this package. While CSVs don’t support indexing, once in Azure, you can use the standard table query operators (such as lookup by row and partition key)
  2. Egress: download an Azure table to a CSV. I find this can be useful for pulling down a local copy of things like logs that are stored in azure tables.

 

Azure Tables have some key differences from a CSV file:

  • Special columns and indexing: every row in an Azure Table has a PartitionKey and RowKey. These keys combine to form a unique index and have several other key properties documented on MSDN. A CSV has no unique keys for indexing and no mandated columns.
  • Schema: in an Azure Table, each row can have its own schema. In a CSV, all rows have the same schema; a CSV is conceptually a 2d array of strings.
  • Typing: the “columns” in an Azure Table are strongly typed. CSV values are all strings.
  • Naming: Azure table and column names are restricted (see the naming rules on MSDN). A CSV has no naming restrictions on columns.

Practically, this means when “uploading” a CSV to an Azure Table, we need to provide the types of the columns (or just default to everything being strings). When “downloading” an Azure Table to a CSV, we assume all rows in the table have the same schema.

 

Uploading a CSV to an Azure Table

Here’s an example of uploading a datatable as an Azure table:

// will fabricate partition and row keys, all types are strings
dt.SaveToAzureTable(Account(), "animals"); 

 

And here’s the resulting Azure table, as viewed via Azure Storage Explorer. That single line only supplied a) an incoming data table and b) a target name for the Azure table to be created, so it picked intelligent defaults for the partition and row keys, and all columns are typed as strings.

[screenshot: the “animals” table in Azure Storage Explorer]

 

We can pass in a Type[] to provide stronger typing for the columns. In this case, we’re saving the “score” column as an int.

// provide stronger typing
var columnTypes = new Type[] { typeof(string), typeof(string), typeof(int) };
dt.SaveToAzureTable(Account(), "animals2", columnTypes);

 

[screenshot: the resulting “animals2” table, with the score column typed as an int]

How is the partition and row key determined when uploading?

Every entity in an azure table needs a Partition and Row Key.

  1. If the CSV does not have columns named PartitionKey or RowKey, then the library will fabricate values. The partition key will be a constant (eg, everything gets put on the same partition), and the RowKey is just a row counter.
  2. If the csv has a column for PartitionKey or RowKey, then those will be used.
  3. One of the overloads to SaveToAzureTable takes a function that can compute a partition and row key per row.

Here’s an example of the 3rd case, where a user-provided function computes the partition and row key on the fly for each row.

dt.SaveToAzureTable(Account(), "animals3", columnTypes,
    (index, row) => new ParitionRowKey { PartitionKey = "x", RowKey = row["name"] });

 

Downloading an Azure Table as a CSV

Here we can download an Azure table to a CSV in a single line.

            var dataFromTable = DataTable.New.ReadAzureTableLazy(Account(), "animals2");

The convention in the CsvTools packages is that methods ending in “Lazy” are streaming, so this can handle larger-than-memory tables.

We can then print it out to the console (or any stream) or do anything else with the table. For example, to just dump the table to the console, do this:

            dataFromTable.SaveToStream(Console.Out); // print to console

And it prints:

PartitionKey,RowKey,Timestamp,name,species,score
1,00000000,2012-08-03T21:04:08.1Z,Kermit,Frog,10
1,00000001,2012-08-03T21:04:08.1Z,Ms. Piggy,Pig,50
1,00000002,2012-08-03T21:04:08.103Z,Fozzy,Bear,23

Notice that the partition key, row key, and timestamp are included as columns in the CSV.

Of course, once we have a DataTable instance, it doesn’t matter that it came from an Azure Table. We can use any of the normal facilities in CsvTools to operate on the table. For example, we could use the strong binding to convert each row to a class and then operate on that:

// Read back from table as strong typing 
var dataFromTable = DataTable.New.ReadAzureTableLazy(Account(), "animals2");
IEnumerable<Animal> animals = dataFromTable.RowsAs<Animal>();
foreach (var row in animals)
{
    Console.WriteLine("{0},{1},{2}%", row.name, row.species, row.score / 100.0);
}
  
// Class doesn't need to derive from TableContext
class Animal
{
    public string name { get; set; }
    public string species { get; set; }
    public int score { get; set; }
}

 

Full sample

Here’s the full sample.

This is a C# 4.0 console application (Client Profile), with a Nuget package reference to CsvTools.Azure, and it uses a dummy csv file at c:\temp\test.csv (see above).

When you add the nuget reference to CsvTools.Azure, Nuget’s dependency management will automatically bring down references to CsvTools (the core CSV reader that implements DataTable) and even the azure storage libraries. I love Nuget.

 

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

using DataAccess;
using Microsoft.WindowsAzure;

class Program
{
    static CloudStorageAccount Account()
    {
        return CloudStorageAccount.DevelopmentStorageAccount;
    }

    static void Main(string[] args)
    {
        var dt = DataTable.New.Read(@"c:\temp\test.csv");

        // Write and Read from blobs
        dt.SaveToAzureBlob(Account(), "testcontainer", "test.csv");
        var dataFromBlob = DataTable.New.ReadAzureBlob(Account(), "testcontainer", "test.csv"); // read it back
        dataFromBlob.SaveToStream(Console.Out); // print to console

        // Write and read from Tables

        // will fabricate partition and row keys, all types are strings
        dt.SaveToAzureTable(Account(), "animals");

        // provide stronger typing
        var columnTypes = new Type[] { typeof(string), typeof(string), typeof(int) };
        dt.SaveToAzureTable(Account(), "animals2", columnTypes);

        {
            Console.WriteLine("get an Azure table and print to console:");
            var dataFromTable = DataTable.New.ReadAzureTableLazy(Account(), "animals2");
            dataFromTable.SaveToStream(Console.Out); // print to console
            Console.WriteLine();
        }

        {
            Console.WriteLine("Demonstrate strong typing");
            // Read back from table as strong typing 
            var dataFromTable = DataTable.New.ReadAzureTableLazy(Account(), "animals2");
            IEnumerable<Animal> animals = dataFromTable.RowsAs<Animal>();
            foreach (var row in animals)
            {
                Console.WriteLine("{0},{1},{2}%", row.name, row.species, row.score / 100.0);
            }
        }

        // Write using a row and partition key

    }

    // Class doesn't need to derive from TableContext
    class Animal
    {
        public string name { get; set; }
        public string species { get; set; }
        public int score { get; set; }
    }
}


Reflection vs. Metadata


Here are some old notes I had about Reflection vs. the raw IMetadata Import interfaces. They’re from a while ago (before CLR 4.0 was shipped!), but still relevant. Better to share late than never!

Quick reminder on the two APIs I’m comparing here:

  • Reflection is the managed API (System.Type and friends) for reading metadata. The CLR’s implementation is built on top of the CLR loader, and so it’s geared towards a “live” view of the metadata.  This is what everybody uses because it’s just so easy. The C# typeof() keyword gives you a System.Type and you’re already into the world of reflection.
  • IMetaDataImport is an unmanaged COM-classic API which is much lower level. ILDasm is built on IMetadataImport.

The executive summary is that the IMetadata APIs are a file format decoder and return raw information. The Reflection APIs sit at a much higher abstraction level: they combine the metadata with other information from the PE file, fusion, and the CLR loader, and present a high-level type-system object model.

This difference means that while Reflection and Metadata are conceptually similar, there are things in Reflection that aren’t in the metadata and things in the metadata that aren’t exposed in reflection.

This is not an exhaustive list.

Differences because Reflection can access the loader

Reflection explicitly does eager assembly loading

The only input to IMetaDataImport is the actual bits in a file. It is a purely static API.

In contrast, reflection is a veneer over the CLR loader and thus can use the CLR Loader, fusion, assembly resolution, and policy from current CLR bindings as input.

In my opinion, that’s the most fundamental difference between them, and the root cause of many other differences.

This means that reflection can use assembly resolution, and that causes many differences with raw IMetadataImport:

  1. auto resolving a TypeRef and TypeSpecs to a TypeDef. You can't even retrieve the original typeRef/TypeSpec tokens from the reflection APIs (the metadata tokens it gives back are for the TypeDefs). Same for MethodDef vs. MemberRef.
  2. Type.GetInterfaces() - it returns interfaces from the base type.
  3. Type.GetGenericArguments() - as noted here already.
  4. random APIs like: Type.IsClass, IsEnum, IsValueType - these check the base class, and thus force resolution.
  5. determining if a MemberRef token is a constructor or a method because they have different derived classes (see below).
  6. representing illegal types ("Foo&&"). Reflection is built on the CLR Loader, which eagerly fails when loading an illegal type. (See below.)
  7. Assembly.GetType(string name) will automatically follow TypeForwarders, which will cause assembly loading.

The practical consequence is that a tool like ILDasm can use IMetaDataImport to inspect a single assembly (eg, Winforms.dll) without needing to do assembly resolution. Whereas a reflection-based tool would need to resolve the assembly references.

Different inputs

While Reflection is mostly pure and has significant overlap with the metadata, there is no barrier to prevent runtime input sources from leaking through the system and popping up in the API.  Reflection exposes things not in the PE-file, such as:

  1. additional interfaces injected onto arrays by the CLR loader  (see here).
  2. Type.get_Guid - if the guid is not represented in the metadata via a Guid attribute, reflection gets the guid via a private algorithm buried within the CLR. 

Generics + Type variables + GetGenericArguments()

In Reflection, calling GetGenericArguments() on an open generic type returns the System.Type objects for the type variables. Whereas in metadata, this would be illegal. You could at best get the generic argument definitions from the type definition.

In reflection, if you pass in Type variables to Type.MakeGenericType(), you can get back a type-def. Whereas in metadata, you'd still have a generic type. Consider the following snippet:

var t = typeof(Q2<>); // some generic type Q2<T>
var a1 = t.GetGenericArguments(); // {"T"}
var t2 = t.MakeGenericType(a1); 
Debug.Assert(t.Equals(t2)); 
Debug.Assert(t2.Equals(t)); 

In other words, metadata has 2 distinct concepts:

  1. The generic type arguments in the type definition (see IMDI2::EnumGenericParams)
  2. The generic type arguments from a type instantiation (as retrieved from a signature blob, see CorElementType.GenericInstantiation=0x15).

In reflection, these 2 concepts are unified together under the single Type.GetGenericArguments() API. Answering #1 requires type resolution whereas #2 can be done on a type-ref. This means that in reflection, you can't check for generic arguments without potentially doing resolution.

PE-files

Reflection exposes interesting data in the PE-file, regardless of whether it’s stored in the metadata or in the rest of the PE-file.

Metadata is just the metadata blob within the PE file. It's somewhat arbitrary what's in metadata vs. not. Metadata can have RVAs to point to auxiliary information, but the metadata importer itself can't resolve those RVAs.

Interesting data in the PE but outside of the metadata:

  1. the entry point token is in CorHeaders outside of the metadata.
  2. Method bodies (including their exception information) are outside the metadata. 
  3. RVA-based fields (used for initializing constant arrays)
  4. embedded resources

File management

It is possible to set a policy on the AppDomain that requires Assembly.Load to make a shadow copy of an assembly before it’s loaded, and to open that copy instead of the assembly in the original location. This policy allows the user to specify where shadow copies should be created.

Failure points

IMetadataImport only depends on the bits in the file, so it has few failure points after opening. In contrast, Reflection has many dependencies, each of which can fail. Furthermore, there is no clear mapping between a reflection API and the services it depends on, so many reflection APIs can fail at unexpected points.

Also, IMetadataImport allows representing invalid types, whereas Reflection will eagerly fail. For example, it is illegal to have a by-ref to a by-ref, (eg, "Foo&&"). Such a type can still be encoded in the metadata file format via ildasm, and IMetaDataImport will load it and provide the signature bytes. However, Reflection will eagerly fail importing because the CLR Loader won't load the type.

Detecting failures requires eagerly resolving types, so there is a tension between making Reflection a deferred API vs. keeping the eager-failure semantics.

COM-Interop

Reflection can represent types as they exist at runtime, like COM-interop objects, whereas metadata only provides a static view of the type system.

Loader-added interfaces on arrays

In .Net 2.0, generic interfaces are added for arrays at runtime. So this expression "typeof(int[]).GetInterfaces()" returns a different result on .NET 2.0 vs. .NET 1.1; even if it's an identical binary. I mentioned this example in more detail here.
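
A quick way to see this at runtime:

// On .NET 2.0+ the loader injects generic interfaces onto arrays, so this prints entries
// like System.Collections.Generic.IList`1[System.Int32] that the assembly's metadata never mentions.
foreach (var itf in typeof(int[]).GetInterfaces())
{
    Console.WriteLine(itf);
}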

Differences in Object model

Reflection deliberately tries to be a higher-level, friendly managed API, and that leads to a number of differences.

MemberInfo.ReflectedType property

Reflection has a ReflectedType property which is set based on how an item is queried. So the same item, queried from different sources, will have a different ReflectedType property, and thus compare differently. This property is entirely a fabrication of the reflection object model and does not correspond to the PE file or metadata.

Different object models for Type

The raw metadata format is very precise and represents types in a variety of distinct ways:

  1. TypeDef
  2. TypeRef
  3. TypeSpec, Signature blobs
  4. builtins ("I4")
  5. arrays,
  6. modifiers (pointer, byref)
  7. Type variables (!0, !!0)

These are all unique separate entities with distinct properties which in the metadata model, conceptually do not share a base class. In contrast, Reflection unifies these all into a single common System.Type object. So in reflection, you can't find out if your class’s basetype is a TypeDef or a TypeRef.

Pseudo-custom attributes

Reflection exposes certain random pieces of metadata as faked-up custom attributes instead of giving them a dedicated API the way IMetaDataImport does.

These "pseudo custom attributes" (PCAs) show up in the list of regular custom attributes with no special distinction. This means that requesting the custom attributes in reflection may return custom attributes not specified in the metadata. Since different CLR implementations add different PCAs, this list could change depending on the runtime you bind against.

Some examples are the Serialization and TypeForwardedTo attributes.

Custom Attributes

To get an attribute name in reflection, you must do CustomAttributeData.Constructor.DeclaringType.FullName.

This is a cumbersome route to get to the custom attribute name because it requires creating several intermediate objects (a ConstructorInfo and a Type), which may also require additional resolution. The raw IMetadataImport interfaces are much more streamlined.
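
For example, enumerating attribute names via the reflection object model looks like this:

// Each attribute name requires walking through the constructor and its declaring type.
foreach (CustomAttributeData cad in CustomAttributeData.GetCustomAttributes(typeof(string)))
{
    string attributeName = cad.Constructor.DeclaringType.FullName;
    Console.WriteLine(attributeName);
}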

ConstructorInfo vs. MethodInfo

Metadata exposes both Constructors and MethodInfos as tokens of the same type (mdMethodDef). Reflection exposes them as separate classes which both derive from MethodBase. This means that in order to create a reflection object over a MethodDef token, you must do some additional metadata resolution to determine whether to allocate a MethodInfo or a ConstructorInfo derived class. This has to be determined when you first allocate the reflection object and can’t be deferred, so it forces eager resolution again.

Type Equivalence and Assignability

Reflection exposes a specific policy for type equivalence, which it inherits from the CLR loader. Metadata just exposes the raw properties on a type and requires the caller to determine if types are equivalent.

For example, Reflection has the Type.IsAssignableFrom API, which may invoke the CLR Loader and Fusion, as well as CLR-host version-specific Type Unification policies (such as no-PIA support), to determine if types are considered equal. The CLR does not fully specify the behavior of Type.IsAssignableFrom.

Case sensitivity matching

Metadata string APIs are case sensitive. Reflection string APIs often take a "ignoreCase" flag to facilitate usage with case insensitive languages, like VB.

AttributeUsage.Inherited flag


I found the documentation for AttributeUsageAttribute to be very ambiguous, particularly regarding the Inherited property. Here’s a quick test on the behavior of the AttributeUsage.Inherited flag. This affects how the attribute is queried via GetCustomAttributes(). Here’s how…

There are 4 pivots:

1. In the usage case, is there an attribute on the derived class of the same type as the base class?

    [Test(Value="Base")]
    class Base
    {
    }

    [Test(Value = "Derived")]
    class Derived : Base
    {
    }

 

2. On AttributeUsage, setting AllowMultiple

3. On AttributeUsage, setting Inherited

    [AttributeUsage(AttributeTargets.Class, AllowMultiple=false, Inherited=false)]
    class TestAttribute : Attribute
    {
        public string Value { get; set; }

        public override string ToString()
        {
            return Value;
        }
    }

 

4. In the query case, is inherit true or false?

    var attrs = typeof(Derived).GetCustomAttributes(inherit: true);

 

Outputs for each combination

That’s 2^4 = 16 cases. Here are the outputs for each combination, printed by the following code snippet. Blank cells mean false.

foreach (var attr in attrs)
{
    Console.WriteLine(attr);
}

| Case | [1] Attr on Derived class? | [2] AttributeUsage AllowMultiple | [3] AttributeUsage Inherited | [4] GetCustomAttributes inherit: | Console output |
|------|----------------------------|----------------------------------|------------------------------|----------------------------------|----------------|
| 1  | Yes | True | True | True | Derived, Base |
| 2  | Yes | True | True |      | Derived |
| 3  | Yes | True |      | True | Derived |
| 4  | Yes | True |      |      | Derived |
| 5  | Yes |      | True | True | Derived |
| 6  | Yes |      | True |      | Derived |
| 7  | Yes |      |      | True | Derived |
| 8  | Yes |      |      |      | Derived |
| 9  | No  | True | True | True | Base |
| 10 | No  | True | True |      | (none) |
| 11 | No  | True |      | True | (none) |
| 12 | No  | True |      |      | (none) |
| 13 | No  |      | True | True | Base |
| 14 | No  |      | True |      | (none) |
| 15 | No  |      |      | True | (none) |
| 16 | No  |      |      |      | (none) |
 

Other comments:

I found case #1 vs. case #5 to be particularly interesting: both have an attribute on the derived class and query with inherit:true, but with AllowMultiple=true you get both the Derived and Base attributes, whereas with AllowMultiple=false the derived attribute hides the inherited one.

Of course, you can always manually walk the inheritance chain. This has the added perk of telling you at which level the attribute occurs:

for (var t = typeof(Derived); t != null; t = t.BaseType)
{
    foreach (var attr in t.GetCustomAttributes(inherit: false))
    {
        Console.WriteLine("type={0},attr={1}", t.Name, attr);
    }
}

 

Assuming the derived class has an attr, this will print:

type=Derived,attr=Derived
type=Base,attr=Base
type=Object,attr=System.Runtime.InteropServices.ComVisibleAttribute
type=Object,attr=System.Runtime.InteropServices.ClassInterfaceAttribute
type=Object,attr=__DynamicallyInvokableAttribute
type=Object,attr=System.SerializableAttribute

Note the attributes on System.Object. These have Inherited=false, so they don’t show up until we explicitly query System.Object itself.

$5000 in prizes at upcoming Hackathon


There’s a Hackathon at the Microsoft main campus on Dec 6th / 7th.  $5000 in prizes, Free registration, and food is included! This is a 24-hour crash session to put your skills to use and go build some awesome ideas.

It’s put on by LincolnLabs. I went to a similar Hackathon they ran in San Francisco and had a blast.

The theme is “Open Government”, which broadly includes greater visibility into government and greater accountability on government. If you’re in the area, register and come out!

Hackathon tools


In light of the upcoming Hackathon , it’s worth noting there’s a lot of great free Microsoft offerings to help with hackathons, including excellent integration with open source technologies (like PHP, Python, GIT). A lot of these are cloud based and so work with any platform (Mac, iPhone, Android). Here are some quick links to get you started:

Free C# / HTML editor. (Visual Studio)

You can download VS Express for free, which provides a complete C# editor, compiler, and debugger. It also includes XML and HTML editors.  In addition, you can find lots of free libraries via the Nuget package manager .

Free website hosting via Azure Web Sites (aka “Antares”).

Azure is Microsoft’s cloud offering, which includes support for hosting websites, cloud storage, compute, and much more. You can create a free Azure trial account.

You can host websites for free via Azure Web Sites, which includes support for ASP.Net, PHP, and more.  See a tutorial here.

Host your PHP Website in the cloud  (Azure with WebMatrix)

WebMatrix is a free editor with PHP, HTML, Node support and includes integrated source control and cloud publishing.  WebMatrix is like a streamlined version of VisualStudio for web development.

See the tutorial here.

And another tutorial using PHP and MySQL.

Host RESTful webservices in the cloud (WebApi on Azure)

You can quickly mock up a RESTful API using WebAPI and host live in the cloud.

See tutorial here.

Host your GIT / TFS source control (Codeplex)

CodePlex provides free GIT hosting, along with team management and bug tracking for your project. For example, the source for ASP.Net / MVC / WebAPI is hosted live on codeplex.

Python IDE  (Python Tools for VS)

Python is a highly productive dynamic language perfect for hackathons. Python Tools for VS provides excellent integration with VS for editing, REPL, debugging support, and testing. Check out the screen shots for an impressive overview.

Free document and collaboration (SkyDrive)

See http://skydrive.com  for hosting free groups with your peers that include email lists, document sharing, and online excel and word viewable through your browser.

 

Interesting real world data sets (Azure DataMarket)

Do amazing things by mashing up impressive data sets. Azure Data Market  provides lots of data sets, ranging from global energy consumption statistics to  real GDP per capita by state. You can also create and sell your own data sets here.

Browse here for free data sets.

Solutions for startups (Bizspark)

The Bizspark program provides free software and support for startups. See here.

 

Search (Bing)

Search via http://www.bing.com/


WebJobs SDK


We’ve released an alpha of “WebJobs SDK”, a simple framework that makes it crazy easy to write code that runs on Azure and binds against azure storage.  (The project was internally codenamed “SimpleBatch” and also known amongst a few as “Project Awesome”). Scott Hanselman has a great blog about SimpleBatch, and there’s also an excellent getting started tutorial.

The basic idea is that you can write normal C# functions and then add some bindings that connect their parameters to Azure storage (notably blobs, tables, and queues). We then have a dashboard that provides free diagnostics into your functions execution.

 

Simple code…

Consider an example where we want to resize an image and then stamp it with a water mark. We may write a Resize() and ApplyWaterMark() function like this:

public class ImageFuncs
    {
        // Given an image stored in blob, resize it and save the output back to a blob.
        // Use WebImage class from System.Web.Helpers. 
        public static void Resize(
            [BlobInput(@"images-input/{name}")] Stream inputStream,
            [BlobOutput(@"images-output/{name}")] Stream outputStream)
        {
            WebImage input = new WebImage(inputStream);

            var width = 80;
            var height = 80;

            WebImage output = input.Resize(width, height);
            output.Save(outputStream);
        }

        // Take the resulting image from Resize() and stamp a watermark onto it. 
        // The watermark is the filename minus the extension. 
        public static void ApplyWaterMark(
            [BlobInput(@"images-output/{name}")] Stream inputStream,
            string name,
            [BlobOutput(@"images-watermarks/{name}")] Stream outputStream)
        {
            WebImage image = new WebImage(inputStream);
            image.AddTextWatermark(name, fontSize: 20, fontColor: "red");
            image.Save(outputStream);
        }
    }

static class WebImageExtensions
{
    public static void Save(this WebImage image, Stream output)
    {
        var bytes = image.GetBytes();
        output.Write(bytes, 0, bytes.Length);
    }
}

First thing to note is that these functions are using regular FX types and so can be easily unit tested running in-memory or against the local file system. This can really be just a normal console application – there’s nothing about it that needs to be azure aware.

Bindings and Triggers

The magic above is the BlobInput / BlobOutput attributes. Those are pulled in via the Microsoft.WindowsAzure.Jobs nuget package.  These bind the Stream parameters directly to azure blobs. The string to these attributes is a blob path meaning “Container/BlobName”, and the {name} is like a Route parameter that uses basic pattern matching rules. So “images-input/{name}” matches any blob in container “images-input”, and the blob name is captured in the {name} route parameter.

As you see in the ApplyWaterMark() function, you can directly capture the route parameters via a parameter. Eg, the ‘string name’ parameter captures the {name} route parameter. This can be useful if you need programmatic access to part of the blob’s name.

The route parameters from the [BlobInput] are then passed to the [BlobOutput]. So Resize() would execute when a new blob “images-input/fruit.jpg” is detected, and it would produce an output blob “images-output/fruit.jpg”.

You may notice that the output from Resize() maps nicely to the input of ApplyWaterMark(). That provides a de facto way for one function to chain execution to another.

This example demonstrates bindings for blobs, but there are also bindings for queues and tables.

 

Hosting and executing

So what actually reads the bindings, listens on the triggers, and invokes the functions? That’s the JobHost object (which lives in Microsoft.WindowsAzure.Jobs.Host ). Here’s a canonical main function:

static void Main()
{
    JobHost h = new JobHost();
    h.RunAndBlock();
}

Short and sweet! With the default JobHost ctor, the azure connection strings are pulled from the app.config:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <connectionStrings>    
    <add name="AzureJobsRuntime" connectionString="DefaultEndpointsProtocol=https;AccountName=???;AccountKey=???"/>
    <add name="AzureJobsData" connectionString="DefaultEndpointsProtocol=https;AccountName=???;AccountKey=???"/>
  </connectionStrings>
  <startup>
    <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5" />
  </startup>
</configuration>
AzureJobsData is the connection string that the BlobInput/BlobOutput attributes will bind against.
AzureJobsRuntime is where the logging data goes. Both connection strings can point to the same account.

JobHost lives in Microsoft.WindowsAzure.Jobs.Host and it will:

  1. Get azure account strings from your app.config  (the strings can also be passed in directly to the ctor).
  2. Reflect over your code to find C# methods with the SimpleBatch attributes. (much like how WebAPI discovers controllers)
  3. Listen for new blobs that match the [BlobInput] pattern.
  4. when a blob is found, invoke the function
  5. automatically log the invocation so that you can view the results in a separate dashboard.

 

The Dashboard

As functions execute, you’ll see the logs in the SimpleBatch Dashboard. The dashboard runs as an Azure Websites Site Extension and getting to the dashboard is detailed in the tutorial above.

Here’s what the dashboard homepage looks like. You’ll notice it lists the function invocation history on the right.

 

[screenshot: the dashboard home page, with the function invocation history on the right]

 

We can see that Resize() and ApplyWaterMark() each ran twice. We can click on an instance to see details about a specific execution:

[screenshot: the details page for a single function execution]

 

Every function instance gets a unique GUID and a permalink showing its execution history. In this case, the dashboard is showing us:

  • basic information like when the function ran, how long, and any console output
  • any exceptions thrown.
  • an explanation as to why this function was run in the first place (in this case, a new blob input was detected)
  • parameter information, including how many bytes each parameter read, and even how long it spent in IO.
  • Any child functions triggered by this function. IE, Resize() will create a blob that causes ApplyWaterMark() to run.

There are additional navigation options, such as seeing which function instance wrote each of the input blobs.

Note that you get all of this diagnostic logging for free, without any additional instrumentation or configuration in your code. As Scott Hanselman said, “Minimal ceremony for maximum results”.

 

In summary

SimpleBatch makes it very easy to write code that runs on Azure, and it provides excellent diagnostics. It has two parts:

  1. Client-side dlls, available via the Microsoft.WindowsAzure.Jobs.Host nuget package.
  2. A dashboard for viewing function results. The dashboard runs as an Azure Websites Site Extension.

 

 

Here are some more links to other resources

Getting a dashboard for local development with the WebJobs SDK


This blog post describes how developers can do local development using the recently announced alpha release of the WebJobs SDK (aka SimpleBatch). The client-side code just pulls down the JobHost class from the Microsoft.WindowsAzure.Jobs.Host nuget package, so you can develop inside a console app and run it locally. In fact, since the SDK is just a nuget package, you can include it in any type of application, such as a console app, class library, ASP.Net webpage (useful for your background processing!), or even an Azure Worker Role. The ideal flow here is that you start with local development and then publish your code to run in Azure.

Although the client framework can be run anywhere, the SimpleBatch dashboard (eg, the cool website that gives you the awesome diagnostics) currently only runs as an Azure Websites Site Extension. However, all communication between your app and the corresponding dashboard is done entirely through an Azure storage account, which you specify via the “AzureJobsRuntime” connection string or the JobHost ctor.


That means there’s a decoupling between where your JobHost runs and where the dashboard runs. You can run your JobHost locally while developing it, and still use the same dashboard from your site extension.
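
For example, a local console app can log to the same storage account that the dashboard site extension reads from. The sketch below assumes the JobHost overload that takes the runtime (logging) connection string explicitly, as mentioned in step 1 below; the connection string values are placeholders.

// Sketch: a locally-running JobHost that logs to the storage account the dashboard is pointed at.
string dataConnectionString = "DefaultEndpointsProtocol=https;AccountName=???;AccountKey=???";
string runtimeConnectionString = "DefaultEndpointsProtocol=https;AccountName=???;AccountKey=???";

JobHost host = new JobHost(dataConnectionString, runtimeConnectionString);
host.RunAndBlock();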

Multiple JobHosts can share the same logging storage account, so you can have a single SimpleBatch dashboard service multiple programs. However, be careful about multiple instances of your jobhost running when you use [QueueInput] attributes since the queue message will only go to the one instance that dequeues the message. Normally, that’s exactly what you want. But it can backfire if you’re trying to do local development of a [QueueInput] function and you have a cloud instance running that’s stealing all your messages.

 

How to get a “SimpleBatch” Dashboard for a given storage account?

This is just an Alpha release, and I expect getting to the dashboard will become much easier in our Beta. But I wanted to share a tip to use in the meantime.

 

1) Get the storage connection string you are using for JobHost to log to. This could be from “AzureJobsRuntime” connection string in your app.config or via the “runtimeConnectionString” parameter explicitly passed into the JobHost ctor.

The connection string is a single string that will include both the account name and password, and it looks something like this:

DefaultEndpointsProtocol=https;AccountName=???;AccountKey=???

 

2) Create a new Azure Web Site (that lives in the same datacenter as your storage account).

3) Point the site to that storage account by setting the “AzureJobsRuntime” connection string in the site’s “Configure” tab.

[screenshot: the site’s Configure tab with the AzureJobsRuntime connection string set]

 

Then save the connection string.

 

4) Get the site’s publish credentials (username and password). Access to the SimpleBatch dashboard is protected using your site’s publish credentials. Get these credentials so you can log into the site.

You can get the credentials from the publish profile. Some steps from the tutorial:

 

  • Click the site’s Dashboard tab in the Azure Portal .
  • Under Quick glance, click Download the publish profile.
  • After you download the .publishsettings file, open it in Notepad or another text editor.
  • Find the first instance of userName= and userPWD=. These are the credentials you need. For example: <publishData><publishProfile profileName="webjobssite - Web Deploy" publishMethod="MSDeploy" . . . msdeploySite="webjobssite" userName="$webjobssite" userPWD="StjxBlJXnrfahHvbWs92MA6HKlvvLEZznr1EFA6kzSGKbY4v" . . .

Note that the publish file may have multiple username and passwords in it.

 

5) Go to the dashboard URL.  You can manually get this URL based on your site’s URL. If your site URL is:

     http://MYSITE.azurewebsites.net

Then the dashboard is a site extension that lives at:

     https://MYSITE.scm.azurewebsites.net/azurejobs

The transform here is:
    a. HTTP –> HTTPS
    b. The “.scm” in the URL
    c. The “/azurejobs”  suffix

This URL will also show up in the webjobs page. However, when doing local development, you won’t have a WebJob so it can be useful to know how to jump to the SimpleBatch dashboard directly. Save this URL for future usage.

Trigger, Bindings, and Route parameters in AzureJobs


We recently released an alpha of the WebJobs SDK (aka AzureJobs, internally codenamed “SimpleBatch”). In this blog entry, I wanted to explain how Triggers, Bindings, and Route Parameters work in AzureJobs.

A function can be “triggered” by some event such as a new blob, new queue message, or explicit invocation. JobHost (in the Microsoft.WindowsAzure.Jobs.Host nuget package) listens for the triggers and invokes the functions.

The trigger source also provides the “route parameters”, which are an extra name-value pair dictionary that helps with binding. This is very similar to WebAPI / MVC. The trigger event provides the route parameters, and then the parameters can be consumed by other bindings:

1. Via a [BlobOutput] parameter 
2. Via an explicit parameter capture.

 

Trigger on new blobs

Example usage:

This happens when the first attribute is [BlobInput] and the function gets triggered when a new blob is detected that matches the pattern.  

public static void ApplyWaterMark(
            [BlobInput(@"images-output/{name}")] Stream inputStream,
            string name,
            [BlobOutput(@"images-watermarks/{name}")] Stream outputStream)
        {
            WebImage image = new WebImage(inputStream);
            image.AddTextWatermark(name, fontSize: 20, fontColor: "red");
            image.Save(outputStream);
        }

When does it execute?

The triggering system will compare timestamps for the input blobs to timestamps from the output blobs and only invoke the function if the inputs are newer than the outputs. This simple rule makes the system very easy to explain and prevents it from endlessly re-executing the same function.

Route parameters:

In this case, the route parameter is {name} from BlobInput, and it flows to the BlobOutput binding. This pattern lends itself nicely for doing blob-2-blob transforms.

The route parameter is also captured via the “name” parameter. If the parameter type is not string, the binder will try to convert via invoking the TryParse method on the parameter type. This provides nice serialization semantics for simple types like int, guid, etc. The binder also looks for a TypeConverter, so you can extend binding to your own custom types.
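
As a sketch of that extensibility (this type is hypothetical, not part of the SDK), a route parameter could be captured as a custom type just by giving the type the TryParse pattern:

// Hypothetical custom type: because it exposes static bool TryParse(string, out T),
// the binder described above can convert a route parameter like {name} into it.
public class ImageId
{
    public string Value { get; private set; }

    public static bool TryParse(string s, out ImageId result)
    {
        result = new ImageId { Value = s };
        return !string.IsNullOrEmpty(s);
    }
}

A function could then declare “ImageId name” instead of “string name” to capture the {name} route parameter.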

In WebAPI, route parameters are provided by pattern matching against a URL. In AzureJobs, they’re provided by pattern matching against a BlobInput path (which is morally similar to a URL). This case was implemented first and so the name kind of stuck.

 

Trigger on new queue message

Example usage:

This happens when the first attribute is [QueueInput].

public static void HandleQueue(
            [QueueInput(queueName : "myqueue")] CustomObject obj,
            [BlobInput("input/{Text}.txt")] Stream input,
            int Number,
            [BlobOutput("results/{Text}.txt")] Stream output)
        {
        }
 
public class CustomObject
    {
        public string Text { get; set; }

        public int Number { get; set; }
    }
 

The function has both a QueueInput and BlobInput, but it triggers when the Queue message is detected and just uses the Blob input as a resource while executing.

When does it execute?

This function executes when a new queue message is found on the specified queue. The JobHost will keep the message invisible until the function returns (which is very handy for long running functions) and it will DeleteMessage for you when the function is done.

Route parameters:

In this case, the route parameters are the simple properties on the queue parameter type (so Text and Number). Note that since the queue parameter type (CustomObject) is a complex object, it gets deserialized using JSON.NET. Queue parameter types can also be string or byte[] (in which case they bind to CloudQueueMessage.AsString and AsBytes, respectively).

The usage of parameter binding here may mean your function body doesn’t even need to look at the contents of the queue message.

 

Trigger when explicitly called via JobHost.Call().

Example usage:

You can explicitly invoke a method via JobHost.Call().

JobHost h = new JobHost();
            var method = typeof(ImageFuncs).GetMethod("ApplyWaterMark");
            h.Call(method, new { name = "fruit.jpg" });

When does it execute?

I expect the list of possible triggers to grow over time, although JobHost.Call() does provide tremendous flexibility since you can have your own external listening mechanism that invokes the functions yourself. EG, you could simulate a Timer trigger by having your own timer callback which does JobHost.Call().
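As a rough sketch (the 5-minute interval and the ApplyWaterMark target are just placeholders from the earlier example), a home-grown timer trigger could look like this:

// Simulate a timer trigger: invoke ApplyWaterMark every 5 minutes via JobHost.Call().
JobHost host = new JobHost();
var method = typeof(ImageFuncs).GetMethod("ApplyWaterMark");

var timer = new System.Threading.Timer(
    _ => host.Call(method, new { name = "fruit.jpg" }),
    null,
    TimeSpan.Zero,
    TimeSpan.FromMinutes(5));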

You can use JobHost.Call() to invoke a method that would normally be triggered by BlobInput.

You can also suppress automatic triggering via a "[NoAutomaticTrigger]” attribute on the method. In that case, the function can only be invoked via JobHost.Call().

Route Parameters:

The Call() method takes an anonymous object that provides the route parameters. In this case, it assigned “name” as “fruit.jpg”. That single route parameter allows all 3 parameters of ApplyWaterMark to get bound.

 

Disclaimers

AzureJobs is still in alpha. So some of the rules may get tweaked to improve the experience.

Azure Storage Bindings Part 1 – Blobs

The Azure WebJobs SDK provides model binding between C# BCL types and Azure storage like Blobs, Tables, and Queues.

The SDK has a JobHost object which reflects over the functions in your assembly.  So your main looks like this:

static void Main()
        {
            string acs = "DefaultEndpointsProtocol=https;AccountName=???;AccountKey=???";
            JobHost host = new JobHost(acs); // From nuget: Microsoft.WindowsAzure.Jobs.Host
            host.RunAndBlock();            
        }

The JobHost will reflect over your methods looking for attributes in the Microsoft.WindowsAzure.Jobs namespace and use those attributes to set up triggers (BlobInput, QueueInput) and do bindings. RunAndBlock() will scan for the various triggers and then invoke your function when a trigger fires. Model binding refers to how the JobHost binds your function’s parameters (it’s very similar to MVC/WebAPI).

The benefits of model binding:

  1. Convenience. You can pick the type that’s most useful for you to consume and the WebJobs SDK will take care of the glue code. If you’re doing string operations on a blob, you can bind directly to TextReader/TextWriter, rather than writing 10 lines of ceremony to get a TextWriter from a CloudBlob.
  2. Flushing and Closing: The WebJobs SDK will automatically flush and close outstanding outputs.
  3. Unit testability. It’s far easier to unit test and mock BCL types like TextWriter than ICloudBlob.
  4. Diagnostics. Model binding cooperates with the dashboard to give you real-time diagnostics on your parameter usage. See screenshots below.

And if model binding is not sufficient for you, you can always bind to the Storage SDK types directly.

That said, here are the bindings that are currently supported in the Alpha release. 

Binding to BCL types: Stream, TextReader/Writer, String

You can use [BlobInput] and [BlobOutput] attributes to bind blobs to the BCL types Stream, TextReader and String.

See triggering rules for more details, but basically a function runs when a blob matching [BlobInput] is found that is newer than the blobs specified by [BlobOutput]. This means that it’s important for a [BlobInput] function to write some output (even if it’s just a dummy file) so that the JobHost knows that it’s run and doesn’t keep re-triggering it.
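For instance, a function that is mostly side effects could still write a small marker blob so the triggering system sees that the input has already been processed (a sketch; the container and paths are placeholders):

public static void ProcessAndMark(
    [BlobInput("container/in/{name}")] Stream input,
    [BlobOutput("container/processed/{name}.marker")] TextWriter marker)
{
    // ... do the real work with 'input' ...

    // Dummy output: lets the JobHost see this input has been handled,
    // so the function isn't re-triggered on the same blob.
    marker.Write(DateTime.UtcNow.ToString("o"));
}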

Here’s an example of a blob copy function using each of those types:

public static void CopyWithStream(
            [BlobInput("container/in/{name}")] Streaminput,
            [BlobOutput("container/out1/{name}")] Streamoutput
            )
        {
            Debug.Assert(input.CanRead && !input.CanWrite);
            Debug.Assert(!output.CanRead && output.CanWrite);

            input.CopyTo(output);
        }

        public static void CopyWithText(
            [BlobInput("container/in/{name}")] TextReaderinput,
            [BlobOutput("container/out2/{name}")] TextWriteroutput
            )
        {
            string content = input.ReadToEnd();
            output.Write(content);
        }

        public static void CopyWithString(
            [BlobInput("container/in/{name}")] stringinput,
            [BlobOutput("container/out3/{name}")] out stringoutput
            )
        {
            output = input;
        }

Some notes:

  1. It’s fine to have multiple functions read from the same input blob. In this case, all functions are reading from any blob that matches “in/{name}” in the container named “container”.
  2. The Streams / TextWriters are automatically flushed when the function returns. 

You can see some more examples for blob usage on the sample site.

Diagnostics!

When you look at the function invocation in the dashboard, you can see usage stats for each parameter. In this case, we see that CopyWithStream() was invoked on blob “in/a.txt”, and read 4 bytes from it, spending 0.076 seconds on IO, and wrote out 4 bytes to blob “out1/a.txt”.

image

Again, the monitoring here “just works” when using the SDK, you don’t need to include any extra logging packages or do any extra configuration work to enable it.

Binding to Blob Storage SDK types

You can also bind directly to CloudBlob (v1 storage SDK) or ICloudBlob, CloudPageBlob, CloudBlockBlob (v2+ storage SDK). These options are good when you need blob properties that aren’t exposed as a stream (such as etags, metadata, etc).

public static void UseStorageSdk(
           [BlobInput("container/in/{name}")] CloudBlobinput,
           [BlobOutput("container/out4/{name}")] CloudBloboutput
           )
        {
            // get non-stream properties 
            input.FetchAttributes();
            var keys = input.Metadata.AllKeys;            
            
            // do stuff...
        }

The storage types obviously give you full control, but they don’t cooperate with the dashboard and so won’t give you the same monitoring experience as using the BCL types.

You can also bind a parameter to CloudStorageAccount, for both the 1.7x and 2.x+ Storage SDKs.

public static void Func(CloudStorageAccount account, ...)
        {
            // Now use the azure SDK directly 
        }

The SDK is reflecting over the method parameters, so it knows if ‘account’ is 1.7x (Microsoft.WindowsAzure.CloudStorageAccount) or 2x+ (Microsoft.WindowsAzure.Storage.CloudStorageAccount) and can bind to either one. This means that existing applications using 1.7 can start to incorporate the WebJobs SDK.
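For example (a sketch; the container name is a placeholder), the bound account can be used with the storage SDK as usual:

[NoAutomaticTrigger]
public static void Func(CloudStorageAccount account)
{
    // Use the storage SDK directly from the bound account.
    var client = account.CreateCloudBlobClient();
    var container = client.GetContainerReference("container");
    container.CreateIfNotExists(); // v2.x; the 1.7 SDK uses CreateIfNotExist()
}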

Azure Storage Bindings Part 2 – Queues

I previously described how the Azure Webjobs SDK can bind to Blobs. This entry describes binding to Azure Queues.   (Binding to Service Bus Queues is not yet implemented)

You can see some more examples for queue usage on the sample site. Here are some supported queue bindings in the Alpha:

Queue Input for BCL types (String, Object, Byte[])

A function can have a [QueueInput]  attribute, which means that the function will get invoked when a queue message is available.

The queue name is either specified via the QueueInput constructor parameter or inferred from the parameter name (similar to MVC). So the following are equivalent:

public static void Queue1([QueueInput] string queuename)
        {
        }

        public static void Queue2([QueueInput("queuename")] string x)
        {
        }

In this case, the functions are triggered when a new message shows up in “queuename”, and the parameter is bound to the message contents (CloudQueueMessage.AsString).

The parameter type can be string or object (both of which bind as CloudQueueMessage.AsString) or byte[] (which binds as CloudQueueMessage.AsBytes).
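For example, the same queue message could be consumed as raw bytes instead of a string (a sketch):

        // Bound via CloudQueueMessage.AsString
        public static void QueueAsString([QueueInput("queuename")] string message)
        {
        }

        // Bound via CloudQueueMessage.AsBytes
        public static void QueueAsBytes([QueueInput("queuename")] byte[] message)
        {
        }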

As an added benefit, the SDK will ensure the queue input message is held until the function returns. It does this by calling UpdateMessage on a background thread while the function is still executing, and it calls DeleteMessage when the function returns.

Queue Input for user types

The parameter type can also be your own user-defined POCO type, in which case it is deserialized with JSON.NET.  For example:

public class Payload
    {
        public int Value { get; set; }
        public string Output { get; set; }
    }
    public static void Queue2([QueueInput("queuename")] Payload x)
    {
    }

That has the same execution semantics as this:

    public static void Queue2([QueueInput("queuename")] string json)
    {
        Payload x = JsonConvert.DeserializeObject<Payload>(json);
        // …
    }

Serialization errors are treated as runtime exceptions which show up in the dashboard.

Of course, a big difference is that using a POCO type means the QueueInput provides route parameters, so you can directly use the queue message properties in other parameter bindings, like so:

public static void Queue2(
            [QueueInput("queuename")] Payload x,
            int Value, 
            [BlobOutput("container/{Output}.txt")] Stream output)
        {
        }

With bindings like these, it’s possible you don’t even need to use x in the function body.

Queue Output

A function can enqueue messages via [QueueOutput]. It can enqueue a single message via an out parameter. If the value is not null on return, the message is serialized and queued, similar to the rules used for [QueueInput].

Here’s an example of queuing a message saying “payload” to a queue named testqueue.

public static void OutputQueue([QueueOutput] out string testqueue)
        {
            testqueue = "payload";
        }

In this case, the function doesn’t have any triggers (no [QueueInput] or [BlobInput]), but could still be invoked directly from the host:

        host.Call(typeof(Program).GetMethod("OutputQueue"));
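Conversely, leaving the out parameter null means no message is enqueued. A minimal sketch:

        public static void MaybeOutputQueue([QueueOutput] out string testqueue)
        {
            // Only enqueue on some condition; null means "don't send anything".
            bool shouldSend = DateTime.UtcNow.Hour < 12;
            testqueue = shouldSend ? "morning payload" : null;
        }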

It can enqueue multiple messages via an IEnumerable<T>.  This function queues 3 messages:

public static void OutputMultiple([QueueOutput] out IEnumerable<string> testqueue)
        {
            testqueue = new string[] {
                "one",
                "two",
                "three"
            };
        }  

You could of course bind to multiple output queues. For example, this takes an OrderRequest object as input, logs a history to the “history” queue, and may turn around and enqueue suborders for this order.

public static void ProcessOrders(
            [QueueInput("orders")] OrderRequest input,
            [QueueOutput("history")] out WorkItem output, // log a history
            [QueueOutput("orders")] out IEnumerable<OrderRequest> children
            )
        {
            
        }

Diagnostics: Navigating between Producers and Consumers

The Azure WebJobs SDK will track who queued a message so you can use the dashboard to see relationships between queue producers and consumers.

For example, consider these functions:

public static void Producer(
            [QueueOutput("testqueue")] out Payload payload,
            string output, 
            int value            
            )
        {
            payload = new Payload { Output = output, Value = value };
        }

        public static void Consumer([QueueInput("testqueue")] Payload payload)
        {
        }

And say we invoked Producer explicitly via:

        host.Call(typeof(Program).GetMethod("Producer"), new { output = "Hello", value = 15 });

So the Producer() queues a message which is then consumed by Consumer().

We can see Consumer’s execution in the dashboard:

image

And we can see that it was executed because “New queue input message on ‘testqueue’ from Producer()”. So we can click on the link to producer to jump to that function instance:

image

And here, we can see that it was executed because of our explicit call to JobHost.Call, and that a child function of Producer() is Consumer(). So you can navigate in both directions.

Note this only works when using non-BCL types. You’ll notice the queue payload has an extra “$AzureJobsParentId” field, which we can easily add to JSON objects.

Binding to CloudQueue SDK Types

You can also bind to the CloudQueue Storage SDK type directly. This provides a backdoor if you need full access to queue APIs that don’t naturally map to model binding.

        [NoAutomaticTrigger]
        public static void Queue1(CloudQueue testqueue)
        {
            var ts = TimeSpan.FromSeconds(400);
            testqueue.AddMessage(new CloudQueueMessage("test"), ts);            
        }

It’s still in Alpha, so there are some known issues we’re working through, such as:

  • Alpha 1 does not support binding to CloudQueueMessage directly.
  • Better bindings for outputs.
  • More customization around serialization for non-BCL types.
  • Better user control over the QueueInput listening.

Who wrote that blob?

One of my favorite features of the Azure WebJobs SDK is the “Who wrote this blob?” feature. This is a common debugging scenario: you see your output is incorrect (in this case, a blob) and you’re trying to find the culprit that wrote the bad output.

On the main dashboard, there’s a “Search Blobs” button, which lets you enter the blob name.

image

Hit “Search” and then it takes you to a permalink for the function instance page that wrote that blob.

image

Of course, once you’re at the function instance page, you can then see things like that function’s input parameters, console output, and trigger reason for why that function was executed. In this case, the input blob was “in/a.txt”, and you can hit the “lookup” button on that to see who wrote the input. So you’re effectively tracing back up the chain.

And once you find the culprit, you can re-publish the function (if it was a coding error) or re-upload the blob (if it was a bad initial input) and re-execute just the affected blobs.

It’s basically omniscient debugging for Azure with edit-and-continue.

How does it work?

When the WebJobs SDK invokes a function instance, it gives it a unique guid. That guid is used in the permalink to the function instance page. When a blob is written via [BlobOutput], the blob’s metadata is stamped with that guid. So the lookup only works for blobs written by functions running under the WebJobs SDK.

What about queues?

The WebJobs SDK has a similar lookup for queues. Queue messages written by [QueueOutput] also get stamped with the function instance guid, so you can look up which function instance queued a message.

Azure Storage Bindings Part 3 – Tables

I previously described how the Azure WebJobs SDK can bind to Blobs and Queues.  This entry describes binding to Tables.

You can use the [Table] attribute from the Microsoft.WindowsAzure.Jobs namespace in the Microsoft.WindowsAzure.Jobs.Host nuget package. Functions are not triggered on table changes. However, once a function is called for some other reason, it can bind to a table as a read/write resource for doing its task.

As background, here’s a good tutorial about azure tables and using the v2.x+ azure storage sdk for tables.

The WebJobs SDK currently supports binding a table to an IDictionary, where:

  1. The dictionary key is a Tuple<string,string> representing the partition key and row key.
  2. The dictionary value is a user POCO type whose properties map to the table properties to be read. Note that the type does not need to derive from TableServiceEntity or any other base class.
  3. Your POCO type’s properties can be strongly typed (not just string), including binding to enum properties or any type with a TryParse() method.
  4. The binding is read/write.

For example, here’s a declaration that binds the ‘dict’ parameter to an Azure Table. The table is treated as a homogeneous table where each row has properties Fruit, Duration, and Value.

public static void TableDict([Table("mytable")] IDictionary<Tuple<string, string>, OtherStuff> dict) {}
public class OtherStuff
        {
            public Fruit Fruit { get; set; }
            public TimeSpan Duration { get; set; }
            public string Value { get; set; }
        }

        public enum Fruit
        {
            Apple,
            Banana,
            Pear,
        }

You can also retrieve the PartitionKey, RowKey, or Timestamp properties by including them as properties on your POCO.
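For example (a sketch building on the OtherStuff type above; the exact type of the timestamp property may vary by SDK version):

public class OtherStuffWithKeys
{
    // System properties, surfaced by declaring them on the POCO.
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public DateTime Timestamp { get; set; }   // assumption: DateTime; may be DateTimeOffset

    // Regular table properties, as before.
    public Fruit Fruit { get; set; }
    public TimeSpan Duration { get; set; }
    public string Value { get; set; }
}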

Writing to a table

You can use the dictionary binding to write to a table via the index operator. Here’s an example of ingressing a file (read via some Parse<> function) into an azure table.

        [NoAutomaticTrigger]
        public static void Ingress(
            [BlobInput(@"table-uploads\key.csv")] Stream inputStream,
            [Table("convert")] IDictionary<Tuple<string, string>, object> table
            )
        {
            IEnumerable<Payload> rows = Parse<Payload>(inputStream);
            foreach (var row in rows)
            {
                var partRowKey = Tuple.Create("const", row.guidkey.ToString());
                table[partRowKey] = row; // azure table write
            }
        }

In this case, the IDictionary implementation follows azure table best practices for writing by buffering up the writes by partition key and flushing the batches for you.

Writes default to Upserts.

Reading a table entry

You can use the dictionary indexer or TryGetValue to look up a single entity by partition key and row key.

public static void TableDict([Table("mytable")] IDictionary<Tuple<string, string>, OtherStuff> dict)
    {
        // Use the IDictionary interface to access an azure table.
        var partRowKey = Tuple.Create("PartitionKeyValue", "RowKeyValue");
        OtherStuff val;
        bool found = dict.TryGetValue(partRowKey, out val);

        OtherStuff val2 = dict[partRowKey]; // lookup via indexer

        // another write example
        dict[partRowKey] = new OtherStuff { Value = "fall", Fruit = Fruit.Apple, Duration = TimeSpan.FromMinutes(5) };
    }

Enumerating table entries

You can use foreach() on the table to enumerate the entries. The dictionary<> binding will enumerate the entire table and doesn’t support enumerating a single partition.

public static void TableDict([Table("mytable")] IDictionary<Tuple<string, string>, OtherStuff> dict)
    {
        foreach(var kv in dict) { 
            OtherStuff val = kv.Value;
        }
    }

You can also use linq expressions over azure tables, since that just builds on foreach().
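For example, a simple LINQ query over the dictionary binding (a sketch reusing the OtherStuff type above; requires System.Linq):

public static void TableDictLinq([Table("mytable")] IDictionary<Tuple<string, string>, OtherStuff> dict)
    {
        // LINQ-to-objects over the enumeration; this still walks the whole table.
        var apples = from kv in dict
                     where kv.Value.Fruit == Fruit.Apple
                     select kv.Value;

        foreach (OtherStuff item in apples)
        {
            // ...
        }
    }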

Here’s an example of a basic RssAggregator that gets the blog roll from an Azure Table and then writes out a combined RSS feed via [BlobOutput].  The whole sample is available on GitHub, but the interesting code is:

// RSS reader.
        // Aggregates to: http://<mystorage>.blob.core.windows.net/blog/output.rss.xml
        // Get blog roll from a table.
        public static void AggregateRss(
            [Table("blogroll")] IDictionary<Tuple<string, string>, BlogRollEntry> blogroll,
            [BlobOutput(@"blog/output.rss.xml")] out SyndicationFeed output
            )
        {
            // get blog roll from an azure table
            var urls = (from kv in blogroll select kv.Value.url).ToArray();

            List<SyndicationItem> items = new List<SyndicationItem>();
            foreach (string url in urls)
            {
                var reader = new XmlTextReader(url);
                var feed = SyndicationFeed.Load(reader);

                items.AddRange(feed.Items.Take(5));
            }
            var sorted = items.OrderBy(item => item.PublishDate);

            output = new SyndicationFeed("Status", "Status from SimpleBatch", null, sorted);
        }

BlogRollEntry is just a POCO, with no mandatory base class.

// Format for blog roll in the azure table
        public class BlogRollEntry
        {
            public string url { get; set; }
        }

Here’s the contents of the azure table. So you can see how the POCO maps to the table properties of interest.

image

Removing from a table

You can use IDictionary.Remove() to remove from the table.

public static void TableDict([Table(TableNameDict)] IDictionary<Tuple<string, string>, OtherStuff> dict)
{
                
    var partRowKey = Tuple.Create("PartitionKeyValue", "RowKeyValue");
                
    // Clear
    dict.Remove(partRowKey);
}

You can use IDictionary.Clear()  to clear an entire table.

Summary

Here’s a summary of which IDictionary operations map to table operations.

Assume dict is a dictionary table mapping, and partRowKey is a tuple as used above.

Operation | Code snippet
Read single entity | value = dict[partRowKey]  or  dict.TryGetValue(partRowKey, out val)
Contains a key | bool found = dict.ContainsKey(partRowKey)
Write single entity | dict[partRowKey] = value  or  dict.Add(partRowKey, value)
Enumerate entire table | foreach(var kv in dict) { }
Remove a single entity | dict.Remove(partRowKey)
Clear all entities | dict.Clear()

Other notes

  1. This binding is obviously limited. You can always bind directly to CloudStorageAccount and use the SDK directly if you need more control.
  2. The dictionary adapter does not implement all members of IDictionary<>. For example, in the Alpha 1 release, CopyTo, Contains, Keys, Values, and others aren’t implemented.
  3. We’re looking at more Table bindings in the next update (such as binding directly to CloudTable).
  4. You can see some more examples of table usage on the samples site.

How does [BlobInput] work?

The Azure WebJobs SDK supports running functions when a new blob is added.  IE, you can write code like this:

public static void CopyWithStream(
            [BlobInput("container/in/{name}")] Stream input,
            [BlobOutput("container/out1/{name}")] Stream output
            )
        {
            Debug.Assert(input.CanRead && !input.CanWrite);
            Debug.Assert(!output.CanRead && output.CanWrite);

            input.CopyTo(output);
        }

See model binding to blobs for how we bind the blob to types like Stream.  In this entry, I wanted to explain how we handle the blob listening.  The executive summary is:

  1. The existing blobs in your container are processed immediately on startup.
  2. But once you’re in steady state, [BlobInput] detection (from external sources) can take up to 10 minutes. If you need fast responses, use [QueueInput].
  3. [BlobInput] can be triggered multiple times on the same blob. But the function will only run if the input is newer than the outputs.

More details…

Blob listening is tricky since the Azure Storage APIs don’t provide this directly. WebJobs SDK builds this on top of the existing storage APIs by:

1. Determining the set of containers to listen on by scanning  the [BlobInput] attributes in your program via reflection in the JobHost ctor. This is a  fixed list because while the blob names can have { } expressions, the container names must be constants.  IE, in the above case, the container is named “container”, and then we scan for any blobs in that container that match the name “in/{name}”.

2. When JobHost.RunAndBlock is first called, it will kick off a background scan of the containers, naively using CloudBlobContainer.ListBlobs.

    a. For small containers, this is quick and gives a nice instant feel.  
    b. For large containers, the scan can take a long time.

3. For steady state, it will scan the azure storage logs. This provides a highly efficient way of getting notifications for blobs across all containers without polling. Unfortunately, the storage logs are buffered and only updated every 10 minutes, which means that the steady state detection for new blobs can have a 5-10 minute lag. For fast response times at scale, our recommendation is to use Queues.

The scanning from #2 and #3 are done in parallel.

4. There is an optimization where any blob written via a [BlobOutput] (as opposed to being written by some external source) will optimistically check for any matching [BlobInputs], without relying on #2 or #3. This lets them chain very quickly. This means that a [QueueInput] can start a chain of blob outputs / inputs, and it can still be very efficient.


Hosting interactive code in the Cloud

Azure WebJobs SDK alpha 2 makes it very easy to host code in the cloud and run it interactively.  You can now invoke your SDK functions directly from the dashboard. Some great uses here:

  1. Provide admin diagnostic commands for your live site.
  2. Easily host code in azure for testing or benchmarking code within a datacenter.
  3. Sharing your functions so that other trusted folks can call them, without the hassle of writing an MVC front end. 
  4. This provides a great live-site debugging tool since you can replay erroneous executions without perturbing the rest of the system.  (Couple that with other SDK debugging features like Who wrote this blob?)

For example, suppose you have a function Writer(). In this case, we’ll just do something silly (take a string and write it out multiple times), but you can imagine doing something more interesting like providing diagnostic functions for your live site (eg, “GetLogs”, “ClearStaleData”, etc).

using Microsoft.WindowsAzure.Jobs; // From nuget: Microsoft.WindowsAzure.Jobs.Host
using System.IO;

namespace Live
{
    class Program
    {
        static void Main(string[] args)
        {
            var host = new JobHost();
            host.RunAndBlock();
        }

        // Given a string, write it out 'multiply' times. ("A",3) ==> "AAA".
        public static void Writer(string content, int multiply, [BlobOutput("test/output.txt")] TextWriter output)
        {
            for (int i = 0; i < multiply; i++)
            {
                output.Write(content);
            }
        }
    }
}

Once you run the program in azure and go to the dashboard, you’ll see the function show up in the function list (it’s “published” when the JobHost ctor runs):

image

You can click on that function and see a history of executions and their status:

image

You can click on the blue “run function” button to invoke the function directly from the dashboard! This lets you fill in the parameters. Notice that the parameters are parsed from strings, so ‘multiply’ is strongly-typed as an integer and we parse it by invoking int.TryParse.

image

And of course, the above page has a shareable password-protected permalink, (looks like: https://simplebatch.scm.azurewebsites.net/azurejobs/function/run?functionId=simplebatch3.Live.Live.Program.Writer ) so that you can share out the invocation ability with others.

You can hit run and the function will get invoked! This assumes that your webjob is running; the invoke works by queuing a message that the JobHost.RunAndBlock() call will listen to. (This means that your webjob needs to actually be running somewhere; if it’s not, the dashboard will warn you about that too.) Run will also take you to the “permalink” for the function instance execution, which is a shareable URL that provides information about the execution. You can see that the output parameter was written to the blob “test/output.txt”, that it wrote 9 bytes, and the hyperlink will take you to the blob’s contents. It also notes that the execution reason was “Ran from dashboard”.

image

You can also hit “replay function” on an existing instance to replay the function. This will take you back to the invoke page and a) pre-seed the parameters with values from the execution and b) record a “parent” link back to the permalink of the original execution so you can see what was replayed.

Redis Cache Service on Azure

We just previewed a Redis cache service on Azure.  A good writeup is also on ScottGu’s blog.

This is Redis hosted within Azure as a service. You can create a cache via the portal, and then access it from any standard Redis client.

Some highlights:

  1. Hosting Redis 2.8 on Azure VMs
  2. Accessible via redis clients from any language. My recommendation for C# is Marc Gravell’s StackExchange.Redis (see the sketch below).
  3. Caches expose an SSL endpoint and support the Auth command.
  4. The standard SKU provides a single endpoint that’s backed by a 2-node Master/Slave cluster to increase availability. The service includes automatic failover detection and forwards requests to the master, so you get the persistence without needing to worry about it.
  5. There’s a redis session state provider for using redis from your ASP.NET apps.

See the Getting Started Guide for how to jump right in.
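As a rough sketch, connecting from C# with StackExchange.Redis looks something like this (the cache name and access key are placeholders; 6380 is the SSL port):

using StackExchange.Redis;

class RedisExample
{
    static void Main()
    {
        // Placeholders: use your cache name and access key from the portal.
        var connection = ConnectionMultiplexer.Connect(
            "mycache.redis.cache.windows.net:6380,ssl=true,password=<access key>");

        IDatabase cache = connection.GetDatabase();
        cache.StringSet("greeting", "hello from azure redis");
        string value = cache.StringGet("greeting");
    }
}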

Azure Storage Naming Rules

I constantly get burned by naming rules for azure storage. Here’s a collection of the naming rules from MSDN. The Storage client libraries don’t help you with these rules and just give you a 400 if you get them wrong. Fortunately, WebJobs SDK will provide client-side validation and give you more friendly messages.

Here’s a summary of rules in table form:

Kind | Length | Casing | Valid chars
Storage Account | 3-24 | lowercase | alphanumeric
Blob Name | 1-1024 | case-sensitive | any URL char
Container Name | 3-63 | lowercase | alphanumeric and dash
Queue Name | 3-63 | lowercase | alphanumeric and dash
Table Name | 3-63 | case-insensitive | alphanumeric

You’ll notice blobs, tables, and queues all have different naming rules.

Here are the relevant excerpts from MSDN with more details.

Storage account names

Storage account names are scoped globally (across subscriptions).

Between 3 and 24 characters. Lowercase letters and numbers.

Blobs

From MSDN here.

Blob Names

A blob name can contain any combination of characters, but reserved URL characters must be properly escaped. A blob name must be at least one character long and cannot be more than 1,024 characters long. Blob names are case-sensitive.

Avoid blob names that end with a dot (.), a forward slash (/), or a sequence or combination of the two.

By convention, / is the virtual directory separator. Don’t use \ in a blob name. The client APIs may allow it, but it then fails to hash properly and the signatures will mismatch.

Blob Metadata Names

Metadata for a container or blob resource is stored as name-value pairs associated with the resource. Metadata names must adhere to the naming rules for C# identifiers.

Note that metadata names preserve the case with which they were created, but are case-insensitive when set or read. If two or more metadata headers with the same name are submitted for a resource, the Blob service returns status code 400 (Bad Request).

Container Names

A container name must be a valid DNS name, conforming to the following naming rules:

  1. Container names must start with a letter or number, and can contain only letters, numbers, and the dash (-) character.
  2. Every dash (-) character must be immediately preceded and followed by a letter or number; consecutive dashes are not permitted in container names.
  3. All letters in a container name must be lowercase.
  4. Container names must be from 3 through 63 characters long.

Queues

From MSDN:

Every queue within an account must have a unique name. The queue name must be a valid DNS name.

Queue names must conform to the following rules:

  1. A queue name must start with a letter or number, and can only contain letters, numbers, and the dash (-) character.
  2. The first and last letters in the queue name must be alphanumeric. The dash (-) character cannot be the first or last character. Consecutive dash characters are not permitted in the queue name.
  3. All letters in a queue name must be lowercase.
  4. A queue name must be from 3 through 63 characters long.

Tables

Name of the table

Table names must conform to these rules:

  • Table names must be unique within an account.
  • Table names may contain only alphanumeric characters.
  • Table names cannot begin with a numeric character.
  • Table names are case-insensitive.
  • Table names must be from 3 to 63 characters long.
  • Some table names are reserved, including "tables". Attempting to create a table with a reserved table name returns error code 404 (Bad Request).

These rules are also described by the regular expression "^[A-Za-z][A-Za-z0-9]{2,62}$".

Table names preserve the case with which they were created, but are case-insensitive when used.
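For example, here’s a quick client-side sanity check using that regular expression (a sketch; the service remains the authority, and this doesn’t cover reserved names like "tables"):

using System.Text.RegularExpressions;

static bool IsValidTableName(string name)
{
    // Letter first, alphanumeric only, 3-63 characters total.
    return Regex.IsMatch(name, "^[A-Za-z][A-Za-z0-9]{2,62}$");
}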

Valid property names

Property names are case-sensitive strings up to 255 characters in size. Property names should follow naming rules for C# identifiers. (The dash is no longer allowed)

Valid values for PartitionKey and RowKey

The following characters are not allowed in values for the PartitionKey and RowKey properties:

  • The forward slash (/) character
  • The backslash (\) character
  • The number sign (#) character
  • The question mark (?) character
  • Control characters from U+0000 to U+001F, including:
    • The horizontal tab (\t) character
    • The linefeed (\n) character
    • The carriage return (\r) character
  • Control characters from U+007F to U+009F

Webjobs SDK Beta is released

We just released the WebJobs SDK Beta! Some highlights:

  • ServiceBus support!
  • Better configuration options. You can pass in an ITypeLocator to specify which types are indexed, and an INameResolver to resolve %key% tokens in the attributes to values (see the sketch after this list).
  • Cleaner model for triggering (this is a breaking change … I need to go and update my previous blog entries)
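For example, here’s a minimal INameResolver sketch that resolves %key% tokens from app settings. The wiring shown in the comment (a JobHostConfiguration with a NameResolver property) is how later releases expose it and is an assumption here; check the beta docs for the exact configuration surface.

public class ConfigNameResolver : INameResolver
{
    // Resolve %key% tokens in attributes (e.g. [Queue("%myqueue%")]) from app settings.
    public string Resolve(string name)
    {
        return System.Configuration.ConfigurationManager.AppSettings[name];
    }
}

// Assumed wiring (matches later releases; may differ in the beta):
// var config = new JobHostConfiguration { NameResolver = new ConfigNameResolver() };
// var host = new JobHost(config);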

There were some breaking changes, including an attribute rename and a branding rename.

  • nuget package rename: Microsoft.WindowsAzure.Jobs.Host –> Microsoft.Azure.Jobs
  • nuget package rename: Microsoft.WindowsAzure.Jobs –> Microsoft.Azure.Jobs.Core
  • attribute change: Instead of [BlobInput]/[BlobOutput], we now have [BlobTrigger] and [Blob].
  • attribute change: Instead of [QueueInput]/[QueueOutput], we now have [QueueTrigger] and [Queue].

The attribute changes make it very clear exactly what’s triggering a function. Functions can only have 1 trigger.
