I will show a really simple example of data processing in all three languages, trying to form the C++ version to compete with both C# and Python.
I've tried to find a task that can't be done with a simple loop over a linear collection of data. The requirement of sorting the collection in the middle of the exercise make things a bit more tricky. So here it is:
Lets assume, we have a collection of employees. Each employee's data is stored in a class instance, while, to simplify this example, every instance contains these four elements:
- Forename (string)
- Surname (string)
- Birthday (string, YYYY-MM-DD)
- Children (int, count of children)
We assume too, the collection is verified, we do not encounter invalid formatted or empty values (None, null, nullptr). The encoding of the birthday as string was intended, so we do not have to handle different date type implementations and we need a string operation one each element during sort. The requirement is:
- Filter all employees that have at least one child.
- Sort the candidates by month/day of there birthday.
- Create a string collection formatted: 'Forename, Surname: MM-DD'.
The sorting should only be done on the already filtered candidates (In a real life scenario, this is the typically step, always process as few data as possible) !
The element access in the class in Python, as well in C# are done using properties (something I really miss in C++). The C++ class has inline methods in the form:
1 std::string Forename() const { return _forename; }
Using filter, sorted, map and lambda's:
1 result = filter(lambda e: e.Children > 0, employees)
2 result = sorted(result, key=lambda e: e.Birthday[5:])
3 result = map(lambda e: '%s, %s: %s' %
4 (e.Forename, e.Surname, e.Birthday[5:]), result)
1 result = [ e for e in employees if (e.Children > 0) ]
2 result.sort(key=lambda e: e.Birthday[5:])
3 result = [ '%s, %s: %s' %
4 (e.Forename, e.Surname, e.Birthday[5:])
5 for e in result ]
The next candidate is C#. Employee instances are stored in a generic: List<Employee> employees;. We use LINQ here, an innovation that has changed my thinking of software development when I first encountered it. To me, this the easiest to read version of all three languages:
1 var res =
2 from e in employees
3 where e.Children > 0
4 orderby e.Birthday.Substring(5)
5 select string.Format("{0}, {1}: {2}",
6 e.Forename, e.Surname, e.Birthday.Substring(5));
We've seen how easy this can be done in the previous languages. Now take a look how to do this in C++ using the recommended way by using STL algorithms. Employee instances are stored in std::vector<Employee>. I've chosen to prevent the use of std::vector<std::shared_ptr<Employee> >, as this would make things a bit more complicated than necessary, even if speed is C++'s primary domain and copy construction is a complement to this !
The first part are definitions of three functor classes, the first for filtering, the second to sort the remaining candidates and the third to format the output. After the functor definitions, the true processing takes its place and there's one important thing. Algorithm copy_if is not part of the standard, I just assume that the STL implementation in use has it implemented. Otherwise it is not that difficult to implement one. OK, here's the source of the C++ version:
1 struct FilterFunctor : unary_function<const Employee&, bool> {
2 bool operator()(const Employee &e) const {
3 return e.Children() > 0;
4 }
5 };
6
7 struct SortFunctor :
8 binary_function<const Employee &, const Employee &, bool> {
9 bool operator()(const Employee &lhs, const Employee &rhs) const {
10 return lhs.Birthday().substr(5) < rhs.Birthday().substr(5);
11 }
12 };
13
14 struct FormatFunctor : unary_function<string, const Employee &> {
15 string operator()(const Employee &e) {
16 ostringstream os;
17 os << e.Forename() << ", " << e.Surname() <<
18 ": " << e.Birthday().substr(5);
19 return os.str();
20 }
21 };
22
23 vector<Employee> filtered;
24 copy_if(employees.begin(), employees.end(),
25 back_inserter(filtered), FilterFunctor());
26 sort(filtered.begin(), filtered.end(), SortFunctor());
27 vector<string> result(filtered.size());
28 transform(filtered.begin(), filtered.end(),
29 result.begin(), FormatFunctor());
Each algorithm in the STL requires at least one pair of iterators as specifiers for the processing range. This is a nice and really flexible feature, as this opens a wide range of possibilities. But looking back to my usage of the STL, around 95% of my algorithm calls work between begin() and end() of a collection. Why is there no algorithm that expects the container instance as range specification ? It seems that i'm not the only person that asks this question. There's already a solution to this: Boost range algorithms. Besides many useful features, each STL algorithm has an equivalent in namespace boost, that reduces the algorithm call range specification to the container's instance name. A simple example, the call to:
1 copy(employees.begin(), employees.end(), employeeClone.begin());
1 copy(employees, employeeClone.begin());
By just a simple #include and using namespace directive !
- Split the context into different source code parts.
- Heavy overhead of the amount of source code required to type in.
The whole definition of functor FilterFunctor above is expressed in C# using the simple term e.Children > 0, really unfair for the C++ community, isn't it ?
With the introduction of lambda expression in C++0x, these problems are a thing of the past. They allow the C++ user to embed the whole functor body at the place where it is required, the functor parameter on the call-site. Using this and boost range, the compete C++ code above can be written as follows (boost range does not implement copy_if, so we have to use the STL version here):
With the introduction of lambda expression in C++0x, these problems are a thing of the past. They allow the C++ user to embed the whole functor body at the place where it is required, the functor parameter on the call-site. Using this and boost range, the compete C++ code above can be written as follows (boost range does not implement copy_if, so we have to use the STL version here):
1 vector<Employee> filtered;
2 copy_if(employees.begin(), employees.end(), back_inserter(filtered),
3 [](const Employee &e) { return e.Children() > 0; });
4
5 sort(filtered, [](const Employee &lhs, const Employee &rhs) {
6 return lhs.Birthday().substr(5) < rhs.Birthday().substr(5); });
7
8 vector<string> result(filtered.size());
9 transform(filtered, result.begin(), [](const Employee &e) {
10 return (format("%s, %s: %s") % e.Forename() %
11 e.Surname() % e.Birthday().substr(5)).str(); });
Hope this will useful to you,
hobBIT
Btw: I'm very interested to see implementations in other languages, especially Haskell and Ruby !
No comments:
Post a Comment