Mozilla is Evil

Firstly, many browsers are not your friends, so this is not a “Mozilla is worse than X” post.

So why bash Mozilla?

Google gets bashed, Microsoft gets bashed and so does Apple, but the alternative is no saint. Mozilla boasts about privacy but doesn’t enable it for most users; it complains about tracking and then teaches web developers how to do it. There have been complaints for around a decade (and others since, like 970092) that browser features invade user privacy.

But Mozilla are just following a standard?

Mozilla staff can often play a key role in changing the web, from work on drafting standards to work on demonstrating new ideas with new features that are yet to be fully standardised.
Web standards are not legal requirements and there is nothing to stop Mozilla either breaking from them to fix privacy and security or providing a default alternative release or feature flags that protect users.

Fixing the design that would break everyone?

So? Apple broke a lot when it stopped supporting Flash. Is Firefox incapable of leading beyond broken standards to protect users, when others have already set a precedent that it can be done? Firefox could even reuse the pattern it already uses for SSL certificate errors: if you get into trouble on a site, you can opt in to a less secure mode for that site.

So why does Mozilla have to lead?

Because they boast of caring about privacy.

Sites like https://advocacy.mozilla.org/en-US and https://www.mozilla.org/en-US/privacy/firefox/ boast of how they wish to defend privacy, but their flagship product fails most users.

Sorry, but whatever you do to cure the minority, if the majority are still suffering, then boasting about the minority is a falsehood. It’s like BP boasting about its solar energy project… great job, but they’re still mostly an oil company. Mozilla is still mostly a web browser business, and most of its users have their privacy breached because of the insecure design of its flagship product.

But they have private browsing mode and tracking protection?

  • Private browsing is designed primarily for local privacy from other users of the same machine; don’t confuse it with protection from remote tracking. As a side effect it mitigates tracking cookies, but not saving history, searches, cookies and temporary files is an expensive feature set to lose: people typically want these because they trust their local machine. It’s the remote machines they want to protect themselves from.
  • Which brings us to tracking protection, which, when enabled, blocks “many” trackers… Many? That’s not enough: on notable sites, including health services, I’ve found tracking still happens and referer URLs are still sent.
  • It isn’t turned on by default, so to be better protected, your first thought after installing Firefox has to be “I don’t trust Firefox to protect my privacy by default; I need to configure that in”. How many users think like that, and how many of those know what to do? (Please at least install something like Privacy Badger from a very trusted source.)
  • But… neither solves the problem of third-party JavaScript running in the same context as the site you are using. This is a fundamental failure in the design of the web, and one Mozilla acknowledges (https://developer.mozilla.org/en-US/Add-ons/WebExtensions/Security_best_practices): it advises developers (if they find that page) not to do it, but doesn’t warn its users when it is happening. Firefox could show a red flag, a “do you want to continue?” notice or something else to warn people that a site executes remote JS.

But it’s not their fault websites include tracking, it’s web developers who add this stuff?

But you’re blowing this out of proportion

No. When Snowden blew the whistle and shouted that we were all being watched, he didn’t recommend Firefox; he suggested Tor Browser, and that was five years ago. The fundamental design of the internet was failing society, and in the five years since, Mozilla hasn’t protected most of its users. It cares about them maybe as much as BP cares about clean energy.

I’ve been complaining to various companies and regulators about browsers leaking data for years. The UK regulator even blogged about one of my complaints (https://iconewsblog.org.uk/2015/09/16/does-your-website-have-a-leak/), concerning a major problem I found that affected several major sites and millions of users.

Since then I’ve started demonstrating some of the problems I’ve found (https://www.youtube.com/channel/UCt0RTPkU-38xn5rUxZsWTig/videos). Typically they boil down to: URLs leaking personal information in referer headers; tracking IDs shared in cookies, which allow personal information to be cross-referenced between sites to build up an identifiable tracking picture; and third-party JavaScript, executed in the same context as same-origin scripts, which can perform complete account takeover and per-user surveillance, with little if any ability for the website to audit it or realise it is happening if the attacker shows a little competence.

I’m not alone… browser based attacks are becoming more common and you only have to search Google News briefly to find things like:

Some aren’t even attacks that were intended to be malicious:

The businesses that use analytics, advertising and social media services are often leaking a lot of tracking data and handing over the keys to their castles. Their management, and often even their web developers, are so naive about how insecure the web is by default that they don’t realise users are at risk from what these third parties are allowed to do in the browser.

So why is Mozilla Evil, perhaps they’re just, not the best?

Remember they’re not alone, they have company in their sins, but I’m pointing them out because people fail to and because I feel they are two faced. They are likely a lesser evil than some, but still…

They boast about why you should use them, because they care about privacy.

They boast of features that don’t work properly, like tracking protection, that “mostly” works: what does mostly mean? Would you use a condom that was mostly watertight?

They don’t inform most users. You don’t know that when you visit this blog, your own computer has been used to send tracking data to various other companies… did you read my cookie policy? Do you know who’s got access to this page? Are you reading what I wrote or what the analytics company JavaScript replaced it with?

I’m no angel

I’m not going to tell you this website is secure or private. Maintaining a website requires an operational overhead that I feel I might get wrong, putting users at higher risk (it could get cryptojacked), so I’ve delegated it to wordpress.com instead. Maybe I should find something better, but the reason I’m not evil is that I’m not lying to you. I’m not pretending this site is something it isn’t, and I’m not advising you to use it in a way that would put more users’ privacy at risk. Can I do better? Yes, but then my comment about the risks you face when reading this blog wouldn’t be possible.


We need to talk about Agile


“Agile” is not a software methodology, it is an ideology… built on a manifesto that, in practice, is often corrupted to mean something beyond what it states. There are too many articles on what Agile does or doesn’t mean, but essentially it still demands design, process, documentation and tooling; it just suggests they should be enriched with greater attention to the functions that lead to results. It was created at a time when you could still buy software off the shelf with user manuals hundreds of pages long, and if it didn’t work, you couldn’t easily update it.

As an ideology it has a desired goal, which is to enable software development by promoting ways of working that help reach functional goals. It also tends to be a bad fit for heavily regulated or security-conscious environments.

It has very obvious missing goals, it does not address non-functional requirements like compliance, performance, reliability, consistency, accessibility, maintainability, backups, …, the list goes on.

Therefore, it is an ideology to achieve a function, regardless of how well that function works in an ever-changing and complex environment.

So what is missing, and what do we need to add to, or replace in, Agile? I’d like to introduce two very important ideologies that we should add to Agile.

The Right People: we need expertise, not just devs

Most devs can write great functional code from business ideas for most logic and presentation requirements, but they come unstuck when they need expertise in:

  • Encryption
  • Information Security architecture
  • ACID/transactions
  • Deadlock and concurrency
  • Legal requirements – audit, auth, retention, access control, …
  • Accessibility

An example: Encryption

I do not know a single software developer who can write encryption libraries, such as TLS 1.2 level encryption, and I include myself. I believe there are lots of mathematicians and computer scientists who could develop individual encryption functions, but I’m not sure many could combine them into a framework that is secure through every layer.

I have not met a single developer qualified to identify what encryption libraries are good or bad.

So why do we let software developers pick encryption libraries and configure their implementation?

AES 256 and RSA 4096 are surely all you need? Well, no, you’ll still need to understand at least the following to use them:

  • PRNGs
  • IVs
  • Sources of randomness
  • Blacklists
  • Key lengths
  • Key randomisation
  • Key management
  • Information leakage (especially the dangers of using compression, caches or any other indexed data)
  • Appropriateness of re-use
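To make a few of those bullets concrete, here is a minimal sketch using the JDK’s built-in `javax.crypto` API with AES-256 in GCM mode. It assumes a JDK where 256-bit AES is enabled (the default in current releases), and the class name is made up for illustration. It touches several items from the list: a CSPRNG (`SecureRandom`) as the source of randomness, an explicit key length, a fresh 96-bit IV for every message that is never re-used with the same key, and an authentication tag that detects tampering. It deliberately does nothing about key management, which is exactly the part that needs specialist help.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class AesGcmSketch {
    private static final int IV_BYTES = 12;   // 96-bit IV, the size recommended for GCM
    private static final int TAG_BITS = 128;  // length of the authentication tag
    private static final SecureRandom RANDOM = new SecureRandom(); // a CSPRNG, never java.util.Random

    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(256, RANDOM); // explicit key length and source of randomness
        return generator.generateKey();
    }

    // Output layout: IV || ciphertext || tag. A fresh random IV is generated for every message.
    static byte[] encrypt(SecretKey key, byte[] plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] sealed = cipher.doFinal(plaintext);
        byte[] out = Arrays.copyOf(iv, IV_BYTES + sealed.length);
        System.arraycopy(sealed, 0, out, IV_BYTES, sealed.length);
        return out;
    }

    static byte[] decrypt(SecretKey key, byte[] message) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, message, 0, IV_BYTES));
        // Throws AEADBadTagException if the IV, ciphertext or tag has been tampered with
        return cipher.doFinal(message, IV_BYTES, message.length - IV_BYTES);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecretKey key = newKey();
        byte[] message = encrypt(key, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(decrypt(key, message), StandardCharsets.UTF_8)); // hello
    }
}
```

Even this short sketch embeds decisions (mode, IV size, tag length, output layout) that someone qualified should review, which is rather the point.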

But our team is small and only average software developers and QA?

  • Contract expertise for design, review and testing
  • Delegate features to specialist teams (information security development)
  • Adopt a recognised standard (NIST, OWASP, WCAG, Mozilla’s recommended server-side encryption configurations, etc)
  • Adopt a recognised library/application: but does the library, proprietary or open source, guarantee that it meets a standard, to help you choose? You should probably avoid hashids (it obfuscates IDs; it does not encrypt them).
  • Adopt a recognised service: cloud services mean everything except your business logic can likely be bought as a service, so why not do that: then it is someone else’s responsibility… just have fun making sure the cloud is appropriate.
  • Alternatively, don’t do it – if a small building firm can’t build a skyscraper it will find something else to do – some things are not supposed to be done by small teams and startups. Is the business value really there, is it worth employing specialist help? If the business value doesn’t warrant the expertise, then it probably isn’t valuable enough to be worth doing.

Maintenance: it isn’t just bugs

Agile typically does cater for bug fixing, but maintenance is more than that.

  • Legal changes
  • Vulnerability management
  • Licences end, projects die

With GDPR arriving soon, hopefully everyone is reviewing all systems that hold, transport and … access PII. However, it’s not just GDPR that is changing and has been changing: especially in more regulated environments, PCI, MiFID, GCP, accessibility law, etc often have amendments too. Contracts with third parties typically demand levels of security that must be adhered to as well. Some of these changes are passive (you have to discover a legal change) and some are active (you are told of a contractual change), but both need to feed into the maintenance of what would otherwise be ignored code running in production.

Access is actually a really worrying problem. Many systems are set up with walls at the perimeter, but not inside. So the web frontends and web API gateways into the system typically get some lifecycle management to check for growing maintenance risks, but the other services can be just as dangerous.

XXE, remote code execution (reflection, SQL, etc) and even internal tooling are often just one step away from attackers, and that step might not have been designed with that threat in mind.

The last concern is licences: discovering that your licence for proprietary software has left you in a legal dilemma (do you shut down to respect the licence, or steal some more time, hopefully only to migrate or negotiate renewal?), or that the open source project you use has died (do you take a risk and continue with unsupported and increasingly likely vulnerable software, or refactor off it?). Doing the right thing means knowing about it.

This requires audit

Where is audit mentioned in the Agile manifesto?

Audit is a function that all business environments have, whether it is a legally qualified audit, like an accountancy audit or a regulatory compliance audit with a high process demand, or just a regular check, like whether the toilets are clean, with a tickbox to fill in. It happens everywhere… except, so often, in software development.

How do you check your toilets are clean in your website in production? Sounds a bit strange, but how many of us are testing for vulnerabilities, reviewing log cleanliness, etc. We might be doing the functional parts that we know will break the application: is the server running, does the database have disk space… but beyond that too many places have security problems or even embarrassingly suffer problems like their TLS certificates expiring… when that happens, it is not a failure of the dev/ops who set it up, it is a failure of the business to have a business control around it.
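A TLS certificate check is one of the easiest “toilet checks” to automate. A minimal sketch, assuming the JDK’s `HttpsURLConnection` can reach the host and that a 30-day warning window suits you; the class, method names and default host are hypothetical. In practice you would run it from a scheduler and wire the non-zero exit code into whatever raises your alarms.

```java
import java.io.IOException;
import java.net.URI;
import java.security.cert.Certificate;
import java.security.cert.X509Certificate;
import java.time.Duration;
import java.time.Instant;
import javax.net.ssl.HttpsURLConnection;

public class CertExpiryCheck {
    // Pure decision, kept separate from the network call so it is easy to test.
    static boolean expiresWithin(Instant notAfter, Instant now, Duration warning) {
        return !notAfter.isAfter(now.plus(warning));
    }

    // Connects to https://host/ and returns the expiry of the server's leaf certificate.
    static Instant leafCertificateExpiry(String host) throws IOException {
        HttpsURLConnection connection =
                (HttpsURLConnection) URI.create("https://" + host + "/").toURL().openConnection();
        try {
            connection.connect();
            Certificate[] chain = connection.getServerCertificates();
            return ((X509Certificate) chain[0]).getNotAfter().toInstant();
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "example.com"; // placeholder host
        Instant expiry = leafCertificateExpiry(host);
        if (expiresWithin(expiry, Instant.now(), Duration.ofDays(30))) {
            System.err.println("ALERT: certificate for " + host + " expires " + expiry);
            System.exit(1); // non-zero exit lets a scheduler or CI job raise the alarm
        }
        System.out.println("OK: certificate for " + host + " expires " + expiry);
    }
}
```

A certificate that has already expired will usually fail the TLS handshake itself, which surfaces as an exception, and that should raise the alarm too.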

So we might clean the toilets with a quarterly automated penetration test, alarms when disk usage is high and a Jira scheduled ticket (why doesn’t that exist?) for renewing the TLS certs, but what about the other layers of audit?

  • What about the full stock take? Double check everything is correct and proper annually?
  • The expired goods? Third party software validation of not just CVEs but that there still is support from the third party
  • The health and safety risk assessments? Still in their early days, privacy impact assessments are becoming part of software development, but many still don’t do them and are they maintained on a lifecycle basis or on a first release basis? Are the requirements drawn out from them validated?

As a lifecycle event, audit should be driven by business controls, which means the business knows what to control, and that requires documentation. That is fine in Agile: Agile only really demanded that you didn’t write thousand-page usage manuals like the ones you used to get with software in the 90s (okay, some were only 600 pages).

This is the key reason why Agile Software Development fails as a project management methodology alone: the business behaviours demanded are always changing and yet the project leaves features dead in Jira: with just bug fixing idling along until a new feature is demanded.


NPM is lying to you and Facebook misses copyright attribution

Update: This was originally titled “NPM is lying to you and Facebook is stealing copyright”. I’ve amended it out of respect to those who weren’t happy with that, but this error should reflect on Facebook’s audit processes (due diligence) for copyright attribution, which would hopefully have caught it. Regarding concerns about attribution to Mozilla in the issue (https://github.com/facebook/react/issues/8789), I think the comments misrepresent CC0/dedication to the public domain: it is not the same as copyright expiry, and it’s important that the rights holder (which I believe is still Mozilla) is tracked by Facebook even if not attributed in published bundles. If nobody tracked that Object.is came from Mozilla, then when the page goes, the first to copy the page can sue everyone.

Firstly, copyright is complicated and getting this right is difficult and I don’t believe that the npm website is trying to lie to you, but that some of the projects on there are (hopefully accidentally) doing so.

No billion dollar company has the right to get this wrong, and they should all be running regular audits, but even they might slip up. SCO vs open source and Google’s 9 lines were painful moments, so if they could lead by example it would be great. I do hope everyone agrees individual developers should be given a little room on accuracy in this domain, as we’re unlikely to be lawyers, but if you do spot this kind of thing… please please please let the affected parties know, respectfully, so they can resolve it sooner rather than later. It is one thing to slip up for a short period of time and another to leave it: the longer it goes without resolution, the more dependent projects might be affected too.

When you look at the licences in a library in npm, you think great it is Apache, BSD, MIT, etc and I can probably use it pretty freely.

When it’s LGPL, GPL, AGPL or EPL it gets more complicated, but may not be impossible… it might even be okay if you wish to adopt these.

Well, those licences aren’t complete in npm for many libraries. Partly because of wonderful technologies like webpack that bundle your code with your dependent code, but don’t, by default, facilitate creating a combined licence file in the process.

npm isn’t the only party getting this wrong, too many open source tools encourage you to label a project as one licence, when in truth it is more likely that your project’s direct code is one licence, but when packaged it is a multi-licence project.

To make matters more complicated, some source code repositories include third-party code directly in their source repository (perhaps because it isn’t available from the repository they chose for the project, like npm), and this results in the source code repository itself being mixed licence… how do I fit that into the GitHub licence option?

If you publish code that is a mix of others work, including in a bundle or even as just accompanying assets, please ensure that the licences are published too. At least we don’t have to make printable booklets to ship with physical products.
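As a starting point for that kind of audit, here is a minimal sketch that walks a dependency folder (assumed to be `node_modules`, but any directory works) and concatenates every file named like `LICENSE`, `LICENCE` or `COPYING` into one combined file. The class name and output file name are hypothetical. It will not find third-party code vendored without its own licence file, or licences that only appear as source headers, so treat it as a prompt for review rather than legal due diligence.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LicenceCollector {
    // True for LICENSE, LICENCE.md, COPYING.txt, license-MIT, ... (case-insensitive).
    static boolean looksLikeLicence(Path file) {
        // Locale.ROOT avoids locale-specific case mapping surprises (e.g. Turkish dotted i)
        String name = file.getFileName().toString().toUpperCase(Locale.ROOT);
        return name.startsWith("LICENSE") || name.startsWith("LICENCE") || name.startsWith("COPYING");
    }

    static List<Path> findLicenceFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .filter(LicenceCollector::looksLikeLicence)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Writes every licence found under root into one file, each headed by its relative path.
    static void writeCombined(Path root, Path output) throws IOException {
        StringBuilder combined = new StringBuilder();
        for (Path licence : findLicenceFiles(root)) {
            combined.append("==== ").append(root.relativize(licence)).append(" ====\n")
                    .append(new String(Files.readAllBytes(licence), StandardCharsets.UTF_8))
                    .append("\n\n");
        }
        Files.write(output, combined.toString().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "node_modules");
        writeCombined(root, Paths.get("THIRD-PARTY-LICENCES.txt"));
    }
}
```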

react

Facebook is a big multinational software company. They obviously know about copyright law in their legal teams.

Well, they’ve missed something… the current version of the React website uses this wonderful JavaScript file, which is full of copyright statements about Facebook, but none for third-party libraries.

Hmm… strange, their library has dependencies on object-assign (amongst others).

Let’s npm install it and see what’s in the dist folder. There’s a basic react.min.js file and there’s an add-ons one that’s also available online at the version I’m seeing locally: 15.4.2

Strange, again it only has Facebook copyright in, but no third parties.

Their add-ons page doesn’t exactly tell you about the embedded object-assign copyright licence which is MIT and requires that if you include object-assign in your own works you need to include their MIT licence with it so that users know that parts of the React software include object-assign.

Bad Facebook, not only breaching copyright, but, as developers often use them as a reference for how to build web pages, risking setting a bad example for how to manage copyright. Their legal team should be on top of this, ensuring a regular audit happens and helping to oversee it.

They have a similar issue with Draft.js

jsrsasign

I spotted jsrsasign doing this, but I’ve seen it before. Sorry to out jsrsasign; it looks like a great project… JavaScript encryption enables client-side private keys and object-level security, instead of relying on passwords protected only by network-level HTTPS (mutual auth is great for your enterprise’s servers, but isn’t catching on for the open web).

Make sure you understand encryption export law if you wish to use it, I won’t pretend I know enough to offer advice and ThoughtWorks have been good enough to offer some, but you should check with a legal expert.

It has a hidden ext folder, which you only find when trying to work out which open source licences you need to publish with your end product, because those licences aren’t referenced in npm. I think they can be, but unfortunately jsrsasign haven’t done so yet… hopefully they will soon.


Complex Primitives

I have a crazy idea: create a cross-platform language (no, not Java: something better). Primitives are supposed to be the simplest form of data in a programming language. So how hard can it be to work with them…

Typical representations

  • References (pointers)
  • Boolean
  • Integer numbers
  • Floating point numbers (binary and sometimes decimal)
  • Primitive structures (array, list)
  • Character(s)

Boolean is complex

In typical computing systems, everything is a 0 or a 1, except usually nothing is.

CPUs typically look at numbers as bit sets of length 8, 16, 32 or 64: not usually 1.

Most have exceptions somewhere, though: longer primitive sizes (128/256), special floating point widths (56, 80, …) or slightly weird 31-bit sizes (bad IBM).

The easiest way to manage a boolean is to choose 0 as either true or false (usually false) and anything else as the opposite. However, what size of bit set do you use? If you use the de facto int, then it might differ between compilers (32-bit vs 64-bit).

Luckily, all you need to know is the size of the bitset and the offset in memory.
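A minimal sketch of that idea in Java, using a `ByteBuffer` as a stand-in for raw memory (an assumption for illustration, since Java hides real addresses; the class name is hypothetical). Testing every byte for zero means the answer does not depend on byte order.

```java
import java.nio.ByteBuffer;

public class BitsetBoolean {
    // Reads a boolean stored in `width` bytes starting at `offset`:
    // all zero bits means false, any non-zero bit means true.
    static boolean readBoolean(ByteBuffer memory, int offset, int width) {
        for (int i = 0; i < width; i++) {
            if (memory.get(offset + i) != 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ByteBuffer memory = ByteBuffer.allocate(8);     // stand-in for 8 bytes of RAM, all zero
        memory.putInt(4, 0x100);                        // a 4-byte "int" boolean at offset 4
        System.out.println(readBoolean(memory, 0, 4));  // false
        System.out.println(readBoolean(memory, 4, 4));  // true
    }
}
```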

Integer is much more complex

So boolean is an offset in ram and a size of the bitset to use: all 0s then it’s false and anything else it is true.

Integers share the problem that you need to know the size of the bitset, but suffer a further problem: order and signing.

The signed number part is simple: the most significant bit is reserved to represent positive (0) or negative (1, with two’s complement), but order gets complex.
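A small Java sketch of two’s complement (Java’s `byte`, `short`, `int` and `long` are always signed two’s complement, so the bits can be shown directly; the class name is made up): negating is “invert every bit, then add one”, and the sign is just the top bit.

```java
public class TwosComplement {
    // Negation in two's complement: flip every bit, then add one.
    static int negate(int x) {
        return ~x + 1;
    }

    // The sign is the most significant bit: 1 means negative.
    static boolean isNegative(int x) {
        return (x >>> 31) == 1;
    }

    public static void main(String[] args) {
        System.out.println(negate(5));                         // -5
        System.out.println(isNegative(-1));                    // true
        System.out.println((byte) 0b1000_0000);                // -128: only the sign bit set in a byte
        System.out.println(Integer.toBinaryString(negate(5))); // the 32-bit pattern of -5
    }
}
```

One quirk worth knowing: `negate(Integer.MIN_VALUE)` returns `Integer.MIN_VALUE`, because its positive counterpart does not fit in 32 bits.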

Bits and Byte Ordering

So what is order? Well, that “most significant bit” is the problem. Endianness of bits and bytes comes into play (and the two are not always the same).

The order of bits varies between processors, and this problem usually affects you only at a very low level (drivers, hardware, etc). To make it more fun, most computers have more than one processor: your sound card, graphics card, network card, etc might all see bits a different way around, never mind the buses.

At the software level you usually find everything is the same (let me know if I’ve got this wrong), but here you suffer byte order differences, where different protocols (network, inter-process, file formats) can each represent things larger than a byte in a different order. This isn’t too difficult to solve (https://github.com/markalanrichards/bitcoin/blob/master/src/compat/byteswap.h); you just need to remember to use it everywhere your program interfaces with the world.
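The linked header is C; in Java the same job is usually done by telling a `ByteBuffer` which order to use at the boundary, or with `Integer.reverseBytes`. A minimal sketch (class name hypothetical): the same `int` written in big-endian (network) order and in little-endian order, and what happens if you read it back with the wrong one.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class ByteOrderDemo {
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    static int decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        int value = 0x01020304;
        byte[] little = encode(value, ByteOrder.LITTLE_ENDIAN);
        System.out.println(Arrays.toString(encode(value, ByteOrder.BIG_ENDIAN))); // [1, 2, 3, 4]
        System.out.println(Arrays.toString(little));                              // [4, 3, 2, 1]
        // Forgetting to convert at the boundary silently gives a different number
        System.out.println(Integer.toHexString(decode(little, ByteOrder.BIG_ENDIAN))); // 4030201
        // Integer.reverseBytes does the same job as a 32-bit byte swap
        System.out.println(Integer.toHexString(Integer.reverseBytes(value)));          // 4030201
        System.out.println("This machine is " + ByteOrder.nativeOrder());
    }
}
```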

Floating Points

Luckily, floating point numbers are strictly specified in IEEE 754. Well, I say luckily: the complexities of implementing it mean that not all languages actually adhere to it: https://en.wikipedia.org/wiki/Criticism_of_Java#Floating_point_arithmetic
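To see the “primitive” underneath a `double`, Java exposes its IEEE 754 binary64 bits through `Double.doubleToRawLongBits`: 1 sign bit, 11 exponent bits (biased by 1023) and 52 fraction bits. A minimal sketch with a made-up class name. As a side note, since Java 17 (JEP 306) all floating point arithmetic is strict again, which addresses the `strictfp` part of the criticism linked above.

```java
public class DoubleBits {
    static long bits(double d) {
        return Double.doubleToRawLongBits(d);
    }

    static int sign(double d) {
        return (int) (bits(d) >>> 63);               // top bit
    }

    static int biasedExponent(double d) {
        return (int) ((bits(d) >>> 52) & 0x7FF);     // next 11 bits
    }

    static long fraction(double d) {
        return bits(d) & 0x000F_FFFF_FFFF_FFFFL;     // low 52 bits
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(bits(1.0))); // 3ff0000000000000
        System.out.println(sign(-2.0));                  // 1
        System.out.println(biasedExponent(1.0) - 1023);  // 0, because 1.0 = 1.0 x 2^0
        System.out.println(0.1 + 0.2 == 0.3);            // false: 0.1 has no exact binary form
    }
}
```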

Characters

Characters and Strings are terrifying.

There are hundreds (maybe thousands) of character sets: in a large part because some base character sets (Latin1) have multiple versions for different languages. Not all are easy to work with (I remember something odd about Turkish EBCDIC and xml processing problems as symbols can be remapped). The simplest solution is to make everyone fit into a box and force UTF-8: then hope that nobody adds a BOM, let’s hope UTF-8 never gets deprecated.
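Two of those traps in a short Java sketch (the class and helper names are hypothetical): the same bytes decode differently under different character sets, and Java’s UTF-8 decoder does not strip a byte order mark, so it leaks into your strings as U+FEFF unless you remove it yourself.

```java
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    static final char BOM = '\uFEFF';

    // Java's UTF-8 decoder keeps a leading BOM as U+FEFF, so strip it explicitly.
    static String decodeUtf8StrippingBom(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.UTF_8);
        return !s.isEmpty() && s.charAt(0) == BOM ? s.substring(1) : s;
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8); // e-acute: two bytes, C3 A9
        System.out.println(utf8.length);                          // 2
        // The same bytes read as Latin-1 become two characters (U+00C3, U+00A9): mojibake
        System.out.println(new String(utf8, StandardCharsets.ISO_8859_1).length()); // 2
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, (byte) 'h', (byte) 'i'};
        System.out.println(new String(withBom, StandardCharsets.UTF_8).length());   // 3: the BOM survives
        System.out.println(decodeUtf8StrippingBom(withBom));                        // hi
    }
}
```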

References

So you have a reference to some data… how do you reference it, and what kinds of data might you have to reference:

The reference could be a nice compile time fixed size (like a pair of integers, I mean a pair of 64 bit integers).

It might be a variable size (String of characters) holding some JSON: so maybe a block of memory.

Or it might be a continuous stream (/dev/urandom)

Or it might be a channel of offset data (File on disk) with parts that might no longer be available later in the day or new parts that arrive whilst reading.

It’s easiest to manage the fixed size case (C style) and then re-use the fixed size blocks for streams of data, but sometimes you need more complex references like file handles.

So a VLQ (https://en.wikipedia.org/wiki/Variable-length_quantity) might do for the simple case, and then a VLQ that contains References to further VLQs might be usable for the rest of the use cases.
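For reference, a minimal sketch of the big-endian VLQ variant described on the linked page (the one MIDI uses): each byte carries 7 bits of the value, and the high bit is set on every byte except the last. It only handles non-negative values that fit in a `long`, and the class name is made up.

```java
import java.util.Arrays;

public class Vlq {
    // Encodes a non-negative value as big-endian groups of 7 bits;
    // every byte except the last has its high (continuation) bit set.
    static byte[] encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("only non-negative values are supported");
        }
        byte[] buffer = new byte[10];          // 63 bits need at most 9 groups of 7
        int pos = buffer.length;
        buffer[--pos] = (byte) (value & 0x7F); // last byte: continuation bit clear
        value >>>= 7;
        while (value != 0) {
            buffer[--pos] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        return Arrays.copyOfRange(buffer, pos, buffer.length);
    }

    // Decodes one VLQ from the start of the array, stopping at the first byte without the high bit.
    static long decode(byte[] bytes) {
        long value = 0;
        for (byte b : bytes) {
            value = (value << 7) | (b & 0x7F);
            if ((b & 0x80) == 0) {
                break;
            }
        }
        return value;
    }

    static String hex(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            out.append(String.format("%02X ", b & 0xFF));
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(hex(encode(0x7F)));          // 7F
        System.out.println(hex(encode(0x80)));          // 81 00
        System.out.println(hex(encode(0x3FFF)));        // FF 7F
        System.out.println(decode(encode(1_000_000L))); // 1000000
    }
}
```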

Arrays and Lists

I don’t think I really consider these to be primitive types.

Great, so I don’t need to write these in my language: I can borrow them? Well, maybe, except when it comes to mixing primitives with polymorphic types. Maybe I can still use them: I guess I can just put the type into the box as the first entry.
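One way to read “put the type into the box as the first entry” is to prefix every value with a one-byte type tag, so a single buffer can hold mixed primitives. A minimal sketch under that interpretation; the tag values and class name are hypothetical, and a real design would also need to agree on byte order (here `ByteBuffer`’s default, big-endian) and on how to grow the buffer.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class TaggedBox {
    // Hypothetical tag values for this sketch
    static final byte BOOLEAN = 1, INT = 2, DOUBLE = 3;

    static ByteBuffer pack(List<Object> values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.size() * (1 + Double.BYTES)); // worst case per entry
        for (Object value : values) {
            if (value instanceof Boolean) {
                buffer.put(BOOLEAN).put((byte) (((Boolean) value) ? 1 : 0));
            } else if (value instanceof Integer) {
                buffer.put(INT).putInt((Integer) value);
            } else if (value instanceof Double) {
                buffer.put(DOUBLE).putDouble((Double) value);
            } else {
                throw new IllegalArgumentException("unsupported type: " + value);
            }
        }
        buffer.flip(); // switch from writing to reading
        return buffer;
    }

    static List<Object> unpack(ByteBuffer buffer) {
        List<Object> values = new ArrayList<>();
        while (buffer.hasRemaining()) {
            byte tag = buffer.get(); // the type always comes first
            switch (tag) {
                case BOOLEAN: values.add(buffer.get() != 0); break;
                case INT: values.add(buffer.getInt()); break;
                case DOUBLE: values.add(buffer.getDouble()); break;
                default: throw new IllegalStateException("unknown tag " + tag);
            }
        }
        return values;
    }
}
```

For example, `unpack(pack(Arrays.asList(true, 42, 3.5)))` gives back `[true, 42, 3.5]`.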

It’s all in the bus

Eventually, much of the data you use ends up moving through the buses on your system, and they have different sizes. On top of that, you get fixed-size pages moving over buses, which you hope are an integer multiple of the bus size; the one everyone knows that isn’t is typically the MTU (which varies between broadband, Ethernet and modem systems).

So when you use these complex primitives you might not want to use just the language primitives, but to optimise for the bus/packet/page sizes involved. Should these be primitives? That might depend on your architecture, and for a cross-platform language I guess it should be a language-specific optimisation that brings in an outside primitive.

So how to solve this for writing a new language?

Copy Scala and Groovy: use the JVM to solve this for you, give you a consistent view of the world and force everyone to map to Java data structures until later… although I’m tempted to check out the CLR/Mono too.

Keeping in Time

Now it is 15:31 on the 11-09-2016 and I’m in London.

Writing and Reading dates

Always use the order of full year, month, then day of month.

  • Text ordering is now time ordered
    • 2016-06-06 is always after 2015-06-07, in text and in time
    • 06-07-2016 sorts after 05-06-2014 but before 07-06-2015 in text, even though it is later than both in time.
    • This is useful in table ordering, like when listing files.
  • Local differences don’t matter so much (Europe vs USA format)
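A quick Java sketch of the difference (class and method names are made up): sorting ISO 8601 strings as plain text puts them in time order, sorting day-first strings does not, and `java.time` can convert from one to the other.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class DateOrdering {
    static final DateTimeFormatter DAY_FIRST = DateTimeFormatter.ofPattern("dd-MM-yyyy");

    static List<String> sortedAsText(String... dates) {
        List<String> list = new ArrayList<>(Arrays.asList(dates));
        Collections.sort(list); // plain lexicographic String order
        return list;
    }

    // Rewrites a day-first date such as 06-07-2016 as ISO 8601 (2016-07-06).
    static String toIso(String dayFirst) {
        return LocalDate.parse(dayFirst, DAY_FIRST).toString();
    }

    public static void main(String[] args) {
        System.out.println(sortedAsText("2016-06-06", "2015-06-07", "2014-06-05"));
        // [2014-06-05, 2015-06-07, 2016-06-06]: text order matches time order
        System.out.println(sortedAsText("06-07-2016", "07-06-2015", "05-06-2014"));
        // [05-06-2014, 06-07-2016, 07-06-2015]: 2016 sorts before 2015
        System.out.println(toIso("06-07-2016")); // 2016-07-06
    }
}
```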

Offset Time

The UK has British Summer Time, so local time is 1 hour ahead of UTC, and the time is actually 2016-09-11T15:31:00+01:00 (in ISO 8601 format).

Zoned Time

If I asked you to call me at this time in 6 months, I might not appreciate a call at 2017-03-11T15:31:00+01:00, because by then I’m out of daylight saving time (that would be 14:31 my time), so instead it’s useful to capture the time zone.

Like this, 2017-03-11T15:31:00 [Europe/London] where Europe/London is an official designation of my time zone. Except this isn’t quite good enough…

What if I wanted a call at 1:30am on 29 October 2017?

2017-10-29T01:30:00 [Europe/London] happens twice (the clocks go back from 02:00 BST to 01:00 GMT), so to be as distinct as possible, 2017-10-29T01:30:00+01:00 [Europe/London] and 2017-10-29T01:30:00+00:00 [Europe/London] would be enough to know which is which.
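Java’s `java.time` keeps exactly this distinction. A minimal sketch (class name hypothetical): adding six months to the zoned time keeps the wall-clock time and changes the offset, and a local time inside the autumn overlap can be pinned to either offset. In Europe/London the clocks go back from 02:00 BST to 01:00 GMT, so the local times that occur twice are 01:00 to 01:59, which is why the sketch uses 01:30.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class ZonedTimes {
    static final ZoneId LONDON = ZoneId.of("Europe/London");

    public static void main(String[] args) {
        ZonedDateTime now = ZonedDateTime.of(2016, 9, 11, 15, 31, 0, 0, LONDON);
        System.out.println(now);               // 2016-09-11T15:31+01:00[Europe/London]
        // Same wall-clock time six months later, but the offset has changed
        System.out.println(now.plusMonths(6)); // 2017-03-11T15:31Z[Europe/London]

        // 01:30 on 2017-10-29 happens twice in London; pick each occurrence explicitly
        LocalDateTime ambiguous = LocalDateTime.of(2017, 10, 29, 1, 30);
        ZonedDateTime first = ZonedDateTime.of(ambiguous, LONDON).withEarlierOffsetAtOverlap();
        ZonedDateTime second = first.withLaterOffsetAtOverlap();
        System.out.println(first);                           // 2017-10-29T01:30+01:00[Europe/London]
        System.out.println(second);                          // 2017-10-29T01:30Z[Europe/London]
        System.out.println(Duration.between(first, second)); // PT1H
    }
}
```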

GPS Time is ordered?

Be wary of using GPS time. Although GPS time is an unadjusted count of seconds since a fixed point in time, will it always be so? Its use case is positioning, and if there were ever a need, I would guess adjusting time to fix positioning would be preferred to adjusting positioning to fix time.

Use TAI time when you need time order, or better, use incrementing IDs.

You want to know about the series of events, such as, in a server log or transaction log.

The problem is that UTC can go back in time: when a leap second is inserted, Unix time repeats a second, and as Unix time typically follows UTC, your logs will too. https://en.wikipedia.org/wiki/Unix_time

But it doesn’t have to be this way: TAI is ordered and is not adjusted to keep clocks in step with the Earth’s rotation, so there are no leap seconds… that doesn’t mean there aren’t adjustments, but they should be much smaller fractions of a second.

But even with TAI, that fraction of a second can still grow: you don’t poll NTP every fraction of a second, PC clocks drift out of sync, and on VMs sometimes more than expected.

So why not just use an incrementing counter? The order of log lines effectively does this, but assumes they are written in order and kept in order. So the question now is: when you aggregate your log files to central servers (Splunk, Logstash, etc), are they still in the order of events, or in the order of adjustable timestamps?

Summary

  • For everyday usage use a Zoned Offset
    • 1999-12-31T23:59:59-05:00 [America/New_York]
  • For scientific calculations use TAI, but how do you distinguish it from UTC?
    • 2016-09-11T14:31:36 (TAI for 2016-09-11T15:31:00+01:00, when TAI was 36 seconds ahead of UTC)
  • For strict ordering of events, don’t use time: but keep a reference for indexing (finding the log lines)
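For that last point, a minimal sketch of keeping order separately from time: each entry takes a number from an `AtomicLong`, so even if the clock steps backwards between two entries, sorting by sequence recovers the order in which they were written. The class is hypothetical, and the counter only orders entries within one process; across servers you would need something like a per-host prefix and a way to merge.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class SequencedLog {
    static final class Entry {
        final long sequence;     // defines the order of events
        final Instant timestamp; // for humans; the clock may be adjusted
        final String message;

        Entry(long sequence, Instant timestamp, String message) {
            this.sequence = sequence;
            this.timestamp = timestamp;
            this.message = message;
        }
    }

    private final AtomicLong counter = new AtomicLong();

    // Thread-safe: incrementAndGet never hands out the same number twice.
    Entry log(Instant clockReading, String message) {
        return new Entry(counter.incrementAndGet(), clockReading, message);
    }

    static List<String> messagesBy(List<Entry> entries, Comparator<Entry> order) {
        List<Entry> copy = new ArrayList<>(entries);
        copy.sort(order);
        List<String> messages = new ArrayList<>();
        for (Entry e : copy) {
            messages.add(e.message);
        }
        return messages;
    }

    public static void main(String[] args) {
        SequencedLog log = new SequencedLog();
        Instant t = Instant.parse("2016-12-31T23:59:59.900Z");
        List<Entry> entries = new ArrayList<>();
        entries.add(log.log(t, "request received"));
        entries.add(log.log(t.minusMillis(500), "response sent")); // the clock stepped back
        System.out.println(messagesBy(entries, Comparator.comparing((Entry e) -> e.timestamp)));
        // [response sent, request received]: wrong order
        System.out.println(messagesBy(entries, Comparator.comparingLong((Entry e) -> e.sequence)));
        // [request received, response sent]: the order they happened in
    }
}
```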

ICO has no powers over webcams

ICO published a letter to Webcam manufacturers… well you don’t have to pay much attention to it if you are one.

Dell decided to break HTTPS encryption on its laptops by installing a vulnerable root certificate.

If you run a business and store personal data, you must jump through heaps of hoops to ensure you are compliant with data protection law. But the manufacturers of the servers, network equipment and laptops you have to use have no such requirements, so they can be as insecure as they like, and you pick up the bill when the ICO chases down the breach.

Case Reference Number RFA0606701

I write in relation to your concerns about Dell’s new equipment security fault, about which my colleague has previously responded to you.

The DPA works by placing obligations on organisations that hold personal information. The DPA does not however place any obligations on the manufacturers of equipment that may be used for storing personal information.

The security requirement of the DPA (the seventh data protection principle) requires an organisation holding personal data to have adequate technical and organisational measures in place to protect the personal data (taking account of the nature of the information being held, the availability of technology, and the cost of implementing those measures).

As such, an organisation that has purchased Dell equipment subject to the fault for the storage of personal data may be contravening the DPA if they have failed to keep personal data secure as a result of their use of insecure equipment for the storage of personal data.

Dell is not contravening any requirements of the DPA by selling insecure equipment. The DPA does not, in any way, require suppliers of equipment to ensure their products are secure. The obligations arising from the DPA are for organisations using the equipment for the storage of personal data.

Because our powers are specific to the DPA there is therefore no punitive or other action we can take against Dell over its failure to sell secure computer equipment.

CompletableFuture: does it block?

What happens with this (full code below)?

@Test
public void obviouslySecondFirst() {
    allOf(
            supplyAsync(() -> first).thenAccept(addDelayed(concurrentLinkedQueue, delay)),
            supplyAsync(() -> second).thenAccept(concurrentLinkedQueue::add)
    ).join();
    assertThat(copyOf(concurrentLinkedQueue), equalTo(of(second, first)));
}

This will randomly swap between returning [“second”, “first”] and [“first”, “second”], and therefore randomly block second.


Repeat it…

@Test
public void obviouslySecondFirstWithWaitBeforeCall() {
    final CompletableFuture<String> suppliedFirst = supplyAsyncFirst();
    delay(delay);
    allOf(
            suppliedFirst.thenAccept(addDelayed(concurrentLinkedQueue, delay)),
            supplyAsyncSecond().thenAccept(concurrentLinkedQueue::add)
    ).join();
    assertThat(copyOf(concurrentLinkedQueue), equalTo(of(second, first)));
}

This is a bad design!
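Once the first future is guaranteed to be complete, the waiting version of the test fails every time, because the delay lands on the calling thread inside the thenApply(...) call itself. The sketch below measures that directly and compares it with thenApplyAsync (the async tests in the full code use thenAcceptAsync for the same reason). It is a hypothetical stand-alone example: AttachCost, SLOW and the 200 ms pause standing in for the post's 2-second delay are all assumptions, and the behaviour described is OpenJDK's.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

public class AttachCost {
    static final long SLOW_MILLIS = 200; // hypothetical stand-in for the post's delay

    // Hypothetical stand-in for addDelayed(...): sleep, then pass the value on.
    static final Function<String, String> SLOW = value -> {
        try {
            TimeUnit.MILLISECONDS.sleep(SLOW_MILLIS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return value;
    };

    // How long the caller is held up by thenApply on an already-completed future.
    public static long millisBlockedByThenApply() {
        CompletableFuture<String> done = CompletableFuture.completedFuture("first");
        long start = System.nanoTime();
        done.thenApply(SLOW); // SLOW runs right here, before thenApply returns
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // The same with thenApplyAsync: SLOW is handed to the default executor,
    // so the call returns without waiting for it.
    public static long millisBlockedByThenApplyAsync() {
        CompletableFuture<String> done = CompletableFuture.completedFuture("first");
        long start = System.nanoTime();
        done.thenApplyAsync(SLOW);
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        System.out.println("thenApply:      " + millisBlockedByThenApply() + " ms");
        System.out.println("thenApplyAsync: " + millisBlockedByThenApplyAsync() + " ms");
    }
}
```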

If the methods were split between two classes (AsyncCompletableFuture and SyncCompletableFuture), I might forgive this, because I could easily code review the differences, but they’re all thrown into the same class.
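A minimal sketch of the async half of that split, assuming a hypothetical AsyncOnlyFuture wrapper (not part of the JDK) around CompletableFuture. It only exposes stages that hand work to an executor, so a reviewer never has to ask which thread a stage will run on.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ForkJoinPool;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical wrapper: every stage is submitted to an executor, never run inline.
public final class AsyncOnlyFuture<T> {
    private final CompletableFuture<T> delegate;
    private final Executor executor;

    private AsyncOnlyFuture(CompletableFuture<T> delegate, Executor executor) {
        this.delegate = delegate;
        this.executor = executor;
    }

    public static <T> AsyncOnlyFuture<T> supply(Supplier<T> supplier, Executor executor) {
        return new AsyncOnlyFuture<>(CompletableFuture.supplyAsync(supplier, executor), executor);
    }

    public static <T> AsyncOnlyFuture<T> supply(Supplier<T> supplier) {
        return supply(supplier, ForkJoinPool.commonPool());
    }

    // Only an async variant exists, so the function always runs on the executor,
    // not on whichever thread happens to call map().
    public <U> AsyncOnlyFuture<U> map(Function<? super T, ? extends U> fn) {
        return new AsyncOnlyFuture<>(delegate.thenApplyAsync(fn, executor), executor);
    }

    public T join() {
        return delegate.join();
    }
}
```

A SyncCompletableFuture counterpart would carry the non-async methods, and a code review could then tell the two apart from the type alone.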

To make matters worse, some methods don’t explicitly have an async option.

So there’s a method exceptionally(), but no exceptionallyAsync(), will that block when you do supplyAsync(()->x).exceptionally(t->blockingLogging(t))?
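It can, under the same conditions as thenApply (Java 12 later added exceptionallyAsync to CompletionStage, but Java 8 has no such method). The sketch below uses hypothetical names and relies on OpenJDK behaviour: exceptionally runs its handler on the caller's thread when the future has already failed, while handleAsync, which does exist in Java 8, always hands the handler to the default async executor.

```java
import java.util.concurrent.CompletableFuture;

public class WhoHandlesTheFailure {

    // A future that has already failed (Java 8 has no failedFuture factory).
    static CompletableFuture<String> alreadyFailed() {
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new IllegalStateException("boom"));
        return failed;
    }

    // exceptionally() is a non-async stage: on an already-failed future,
    // OpenJDK runs the handler inside the call, on the caller's thread.
    public static String threadRunningExceptionally() {
        return alreadyFailed()
                .exceptionally(error -> Thread.currentThread().getName())
                .join();
    }

    // Java 8 workaround: handleAsync receives either the value or the error
    // and runs on the default async executor.
    public static String threadRunningHandleAsync() {
        return alreadyFailed()
                .handleAsync((value, error) ->
                        error != null ? Thread.currentThread().getName() : value)
                .join();
    }

    public static void main(String[] args) {
        System.out.println("caller:        " + Thread.currentThread().getName());
        System.out.println("exceptionally: " + threadRunningExceptionally());
        System.out.println("handleAsync:   " + threadRunningHandleAsync());
    }
}
```

So a blockingLogging(t) passed to exceptionally() would hold up the caller whenever the future has already failed.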

EDIT: 22/06/16:10am

Confusing chaining

    @Test
    public void secondsecondShouldBeFirstFirst() {
        allOf(
                supplyAsync(() -> first).thenApply(addDelayed(concurrentLinkedQueue, delay * 2)).thenApply(addDelayed(concurrentLinkedQueue, delay * 2)),
                supplyAsync(() -> second).thenApply(addDelayed(concurrentLinkedQueue, delay)).thenApply(addDelayed(concurrentLinkedQueue, delay * 2))
        ).join();
        assertThat(copyOf(concurrentLinkedQueue), equalTo(of(second, second, first, first)));
    }

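This will randomly block and randomly not: it sometimes returns [first, first, second, second] and sometimes [second, first, second, first]. When nothing blocks, the stages finish at roughly 2 s (second), 4 s (first), 6 s (second) and 8 s (first); when first has already completed, both of its 4-second stages run on the test thread before second is even submitted.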

The code…

import org.junit.Test;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Function;
import java.util.function.Supplier;
import static com.google.common.collect.ImmutableList.copyOf;
import static com.google.common.collect.ImmutableList.of;
import static java.util.concurrent.CompletableFuture.allOf;
import static java.util.concurrent.CompletableFuture.supplyAsync;
import static java.util.concurrent.TimeUnit.SECONDS;
import static org.hamcrest.CoreMatchers.equalTo;
import static org.junit.Assert.assertThat;
public class SupplyItAsyncMaybe {
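    // Sleeps for the given number of seconds; restores the interrupt flag if interrupted.
    private void delay(int seconds) {
        try {
            SECONDS.sleep(seconds);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }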
    final String first = "first", second = "second";
    final int delay = 2;
    final ConcurrentLinkedQueue<String> concurrentLinkedQueue = new ConcurrentLinkedQueue<>();
    // Returns a supplier that sleeps before producing its value.
    private Supplier<String> supplyFirstAfterDelay(int seconds, final String initialValue) {
        return () -> {
            delay(seconds);
            return initialValue;
        };
    }
    // Returns a function that sleeps, then appends its input to the queue and passes it on.
    private Function<String, String> addDelayed(final ConcurrentLinkedQueue<String> concurrentLinkedQueue, final int seconds) {
        return (e) -> {
            delay(seconds);
            concurrentLinkedQueue.add(e);
            return e;
        };
    }
    @Test
    public void secondShouldBeFirst() {
        allOf(
                supplyAsync(() -> first).thenApply(addDelayed(concurrentLinkedQueue, delay)),
                supplyAsync(() -> second).thenApply(concurrentLinkedQueue::add)
        ).join();
        assertThat(copyOf(concurrentLinkedQueue), equalTo(of(second, first)));
    }
    @Test
    public void secondsecondShouldBeFirstFirst() {
        allOf(
                supplyAsync(() -> first).thenApply(addDelayed(concurrentLinkedQueue, delay * 2)).thenApply(addDelayed(concurrentLinkedQueue, delay * 2)),
                supplyAsync(() -> second).thenApply(addDelayed(concurrentLinkedQueue, delay)).thenApply(addDelayed(concurrentLinkedQueue, delay * 2))
        ).join();
        assertThat(copyOf(concurrentLinkedQueue), equalTo(of(second, second, first, first)));
    }
    @Test
    public void secondsecondShouldBeFirstFirstAlways() {
        CompletableFuture<String> stringCompletableFuture = supplyAsync(() -> first);
        delay(delay);
        allOf(
                stringCompletableFuture.thenApply(addDelayed(concurrentLinkedQueue, delay * 2)).thenApply(addDelayed(concurrentLinkedQueue, delay * 2)),
                supplyAsync(() -> second).thenApply(addDelayed(concurrentLinkedQueue, delay)).thenApply(addDelayed(concurrentLinkedQueue, delay * 2))
        ).join();
        assertThat(copyOf(concurrentLinkedQueue), equalTo(of(second, second, first, first)));
    }
    @Test
    public void secondsecondShouldBeFirstFirstDelayedFutureSupplier() {
        allOf(
                supplyAsync(supplyFirstAfterDelay(delay, first)).thenApply(addDelayed(concurrentLinkedQueue, delay * 2)).thenApply(addDelayed(concurrentLinkedQueue, delay * 2)),
                supplyAsync(supplyFirstAfterDelay(delay, second)).thenApply(addDelayed(concurrentLinkedQueue, delay)).thenApply(addDelayed(concurrentLinkedQueue, delay * 2))
        ).join();
        assertThat(copyOf(concurrentLinkedQueue), equalTo(of(second, second, first, first)));
    }
    @Test
    public void secondIsNeverFirst() {
        final CompletableFuture<String> suppliedFirst = supplyAsync(() -> first);
        delay(delay);
        allOf(
                suppliedFirst.thenApply(addDelayed(concurrentLinkedQueue, delay)),
                supplyAsync(() -> second).thenAccept(concurrentLinkedQueue::add)
        ).join();
        assertThat(copyOf(concurrentLinkedQueue), equalTo(of(second, first)));
    }
    @Test
    public void secondIsNeverFirstWhenDelayIsLonger() {
        final CompletableFuture<String> suppliedFirst = supplyAsync(supplyFirstAfterDelay(delay, first));
        delay(delay * 2);
        allOf(
                suppliedFirst.thenApply(addDelayed(concurrentLinkedQueue, delay)),
                supplyAsync(() -> second).thenAccept(concurrentLinkedQueue::add)
        ).join();
        assertThat(copyOf(concurrentLinkedQueue), equalTo(of(second, first)));
    }
    @Test
    public void asyncSimple() {
        allOf(
                supplyAsync(() -> first).thenApply(addDelayed(concurrentLinkedQueue, delay)),
                supplyAsync(() -> second).thenAcceptAsync(concurrentLinkedQueue::add)
        ).join();
        assertThat(copyOf(concurrentLinkedQueue), equalTo(of(second, first)));
    }
    @Test
    public void asyncWithADelay() {
        final CompletableFuture<String> suppliedFirst = supplyAsync(() -> first);
        delay(delay * 2);
        allOf(
                suppliedFirst.thenApply(addDelayed(concurrentLinkedQueue, delay)),
                supplyAsync(() -> second).thenAcceptAsync(concurrentLinkedQueue::add)
        ).join();
        assertThat(copyOf(concurrentLinkedQueue), equalTo(of(second, first)));
    }
    @Test
    public void asyncWithMultipleDelays() {
        CompletableFuture<String> stringCompletableFuture = supplyAsync(supplyFirstAfterDelay(delay, first));
        delay(delay * 2);
        allOf(
                stringCompletableFuture.thenApply(addDelayed(concurrentLinkedQueue, delay)),
                supplyAsync(() -> second).thenApply(concurrentLinkedQueue::add)).join();
        assertThat(copyOf(concurrentLinkedQueue), equalTo(of(second, first)));
    }
}

Programming Languages are Broken

In the context of this post, immutability means that the surface of a feature (the part others depend on) stays the same, so that it can be reused reliably.

It’s not just left-pad

It’s left-pad; dependency, JAR or DLL hell; segfaults, conflict warnings, NoClassDefFoundErrors or other unexpected behaviour; HTTP errors and marshalling errors; and even unexpected timeouts, infinite loops and any other surprise.

What did SQL get right?

If you model an order system in SQL, you could contract a SQL guru to do it in 2003 and, years later, it’d likely still work. I’d be surprised if the average Node app can last months without some form of npm dependency problem.

Banks didn’t trust us

The Enterprise software industry was maybe making progress on this: the W3C (in the old days), JSRs, OMG and OASIS were producing immutable standards with backwards compatibility.

But outside the “Enterprise” umbrella, the rest shunned strict XHTML, ebXML, SOAP and CORBA IDL, and jumped into HTML5, REST, JSON and agile moving targets that both steer and depend on many open source software projects.

Most businesses don’t have a business model that changes very much, so why does the software that supposedly represents it change so often?

Microsoft did do something good (never let me say that again) with relatively immutable contracts in their API layers, and they weren’t the only ones, resulting in the famous acronym VRMF (Version, Release, Modification, Fix).

Enterprise computing was dominated by some degree of immutable contracts from Oracle, Microsoft, IBM, Sun, Intel, etc., and we still enjoy their efforts. Now, they weren’t doing all of this for fun… regulators liked interfaces.

So what is relatively immutable?

Most of these…

  • Instruction sets
  • Assembly
  • POSIX
  • Enterprise programming languages
  • Enterprise document formats
  • Network protocols
  • Enterprise database interfaces
  • Filesystem data structures
  • Games console libraries

They have something in common: they are used in regulated environments, such as major industries’ core businesses (banking systems, medical uses, etc.) or government; they run on embedded systems that are hard to update; or they are tied to hardware.

And the rest …

  • Application code: but to be fair, this may not have any consumers except human beings
  • Custom integration services and models developed internally in companies. From startup to multinational, their internal services are only as good as the care dedicated to the project. Sometimes you’re lucky to have a spec, and other times that specification isn’t as long-lasting as you’d hoped.
  • …and, I hate to say it, but it feels like most open source libraries and applications.

Open source projects seem to thrive on the freedom to break the users of their interfaces, and those that don’t often have strong relationships with Enterprise businesses… though not always; sometimes ties to Enterprise don’t help either.

Hacking safety in

Some build systems try (usually rather poorly) to enforce immutable versions on top of a programming language, and at runtime plugin systems can try to do the same. But neither is easy to work with. Both often require a lot of hand-holding to ensure that migration between immutable models, interfaces or services happens without disruption. If you’re lucky, they’ll warn you about problems, and that is great: Maven would be a lot worse without Enforcer, but even then the tools aren’t always there by default. They also highlight another strange behaviour: why would you ever allow multiple versions of a library to be imported at the same time? This isn’t unique to Java or Maven; in fact, they are probably a better pairing than most.
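The kind of check that Enforcer’s dependencyConvergence rule performs can be sketched in a few lines. This is a hypothetical illustration, not the Enforcer’s implementation: it assumes the resolved dependencies arrive as plain group:artifact:version strings and reports every artifact requested at more than one version.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class ConvergenceCheck {

    // Maps each "group:artifact" that appears with more than one version
    // to the (string-sorted) set of versions requested.
    public static Map<String, SortedSet<String>> conflicts(List<String> coordinates) {
        Map<String, SortedSet<String>> versions = new TreeMap<>();
        for (String coordinate : coordinates) {
            int split = coordinate.lastIndexOf(':');
            if (split <= 0 || split == coordinate.length() - 1) {
                throw new IllegalArgumentException("expected group:artifact:version, got " + coordinate);
            }
            String artifact = coordinate.substring(0, split);
            String version = coordinate.substring(split + 1);
            versions.computeIfAbsent(artifact, key -> new TreeSet<>()).add(version);
        }
        // Keep only the artifacts that failed to converge on a single version.
        versions.values().removeIf(requested -> requested.size() < 2);
        return versions;
    }

    public static void main(String[] args) {
        List<String> resolved = Arrays.asList(
                "com.google.guava:guava:19.0",
                "junit:junit:4.12",
                "com.google.guava:guava:11.0.2");
        Map<String, SortedSet<String>> found = conflicts(resolved);
        if (!found.isEmpty()) {
            System.out.println("Version conflicts: " + found);
        }
    }
}
```

With the illustrative coordinates in main, it prints `Version conflicts: {com.google.guava:guava=[11.0.2, 19.0]}`. Versions are compared as strings, so this only detects disagreement; it does not decide which version is newer.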

Why are programming languages to blame?

Although the languages themselves are relatively immutable (when did java.lang.String ever break backwards compatibility?), they encourage software development that isn’t. The first mistake is to use text files for programming languages and to depend on REST for build systems. Neither is immutable, and yet both usually underpin a language’s dependencies: the packaged library modules and the import/require statements rely on them. But if you imagine a language that only imported by torrent hashes, breaking compatibility would be much harder, maybe impossible. A hash-based database might work quite well, although the code might be unreadable:

import sha512[23123213...]
sha512[23123213...].apply(1)

But then you can map readability:

identify sha512[23123213...] as listbuilder
...
import listbuilder
listbuilder.apply(1)
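The hash-plus-alias idea can be sketched as a tiny content-addressed module store. Everything here is hypothetical (ContentAddressedRegistry, publish, identify, importModule), and module bodies are plain bytes rather than compiled code. A module’s only real name is the SHA-512 of its content, so a published hash can never point at different code, while the readable layer (identify … as listbuilder) is a separate, explicit mapping.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class ContentAddressedRegistry {
    private final Map<String, byte[]> modulesByHash = new HashMap<>();
    private final Map<String, String> aliases = new HashMap<>();

    // Lower-case hex SHA-512 of the content: the module's immutable name.
    public static String sha512Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-512").digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 not available", e);
        }
    }

    // The same content always gets the same name; different content never reuses it.
    public String publish(byte[] content) {
        String hash = sha512Hex(content);
        modulesByHash.putIfAbsent(hash, content.clone());
        return hash;
    }

    // "identify sha512[...] as listbuilder": a readable alias for a fixed hash.
    public void identify(String hash, String alias) {
        if (!modulesByHash.containsKey(hash)) {
            throw new IllegalArgumentException("unknown module " + hash);
        }
        aliases.put(alias, hash);
    }

    // "import listbuilder": resolve alias, then hash, then content, re-checking the hash.
    public byte[] importModule(String aliasOrHash) {
        String hash = aliases.getOrDefault(aliasOrHash, aliasOrHash);
        byte[] content = modulesByHash.get(hash);
        if (content == null) {
            throw new IllegalArgumentException("unknown module " + aliasOrHash);
        }
        if (!sha512Hex(content).equals(hash)) {
            throw new IllegalStateException("content no longer matches " + hash);
        }
        return content.clone();
    }
}
```

In this sketch identify can re-point an alias, which is exactly where mutability would creep back in; a stricter design might refuse to re-point an alias once it has been used.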

Another problem is that programming languages encourage mutable design patterns: abstract classes, implicits, annotation preprocessors and dynamic dependency injection.
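The inheritance part of this is the classic fragile base class problem. The sketch below uses the well-known counting-set example with hypothetical names: a subclass that overrides add and addAll silently depends on whether its parent’s addAll calls add, which is an implementation detail rather than part of the Set contract. A class that composes a Set and only uses its interface doesn’t have that dependency.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

public class CountingSets {

    // Inheritance: depends on HashSet internals. In OpenJDK, HashSet inherits
    // addAll from AbstractCollection, which calls add() for each element,
    // so every element is counted twice.
    static class InheritingCountingSet<E> extends HashSet<E> {
        int added = 0;

        @Override
        public boolean add(E e) {
            added++;
            return super.add(e);
        }

        @Override
        public boolean addAll(Collection<? extends E> c) {
            added += c.size();
            return super.addAll(c);
        }
    }

    // Composition: only uses the Set interface of the wrapped object.
    static class ComposedCountingSet<E> {
        private final Set<E> delegate = new HashSet<>();
        int added = 0;

        boolean add(E e) {
            added++;
            return delegate.add(e);
        }

        boolean addAll(Collection<? extends E> c) {
            added += c.size();
            return delegate.addAll(c); // calls HashSet.add, not ours
        }
    }

    public static int countViaInheritance(Collection<String> items) {
        InheritingCountingSet<String> set = new InheritingCountingSet<>();
        set.addAll(items);
        return set.added;
    }

    public static int countViaComposition(Collection<String> items) {
        ComposedCountingSet<String> set = new ComposedCountingSet<>();
        set.addAll(items);
        return set.added;
    }

    public static void main(String[] args) {
        Collection<String> items = Arrays.asList("a", "b", "c");
        System.out.println("inheritance: " + countViaInheritance(items));
        System.out.println("composition: " + countViaComposition(items));
    }
}
```

On OpenJDK this reports 6 for the inheriting set and 3 for the composed one: the subclass’s behaviour is set by the parent’s code, not by the interface it was written against.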

Things often get worse when you start using a language’s custom DSL for XML or JSON: are you creating an XSD first?

Sometimes they embed automatic serialization/deserialization to object formats, so code (which mutates) becomes the contract for middleware services, turning a language-level problem into one that affects libraries and service layers alike.

Fixing it

  • Let’s create immutable build dependencies and imports.
  • Ban version conflicts
  • Let’s drop JSON and REST
  • Code to interfaces, don’t interface from code
  • Ban inheritance of mutable features and implicits that can modify the runtime behaviour unexpectedly
  • Ban JavaScript
  • Simplify languages
  • Aim for code to last 10 years for business domain logic, or maybe just ask the business more often about whether they realise the risks in the choices the development team is making.