R is the language of choice for many statisticians and data scientists building predictive models because it has so many packages devoted to predictive modeling. However, because most R packages are built purely for data analysis and machine learning, software developers rarely use the language to build web, mobile, or really any other kind of application.
If your company has multiple groups of programmers using different languages, it can be difficult for them to work together. For example, data scientists deliver machine learning models in R, and developers who do not know R somehow have to use those models in your company’s applications. As with translation problems between people, we can solve this “language” problem in four ways:
1. Only one language: never speak with anyone who uses another language, so the problem never arises in the first place.
2. A translator: hire a translator
3. A common language: a non-native English speaker travels to a foreign country and starts speaking English.
4. A pre-communicated interface: you have colleagues from other countries. You never understand each other’s language, but when you give them documents in a format you agreed on beforehand, they know what the documents mean and do their job.
When you encounter translation issues, which method do you use? I personally like the second and third ones in real life because a translator can cover many situations and I happen to be able to speak the modern era’s common language (English). Let us take a deeper look at each method in the context of the “programming” language issue.
1. Only one language: limits your human resource pool
Surprisingly, I have met many organizations that insist everyone within them use a single programming language. It goes something like “we use Java to develop our applications, so we do not want to work with any data scientist who does not use Java.”
This approach has clear disadvantages. These organizations are essentially limiting their human resource pool. It is like saying “I do not want to travel to Japan because they do not speak English.”
2. A translator: the translation might be off
Human beings are smart and in many scenarios marginal errors in translation do not cause big problems.
In contrast, computers are dumb: they can only do what they are instructed to do. If a translation contains instructions that differ from the original, the computer cannot tell. And if it keeps operating on wrong instructions for some time, wrong results pile up, causing customer complaints, lost opportunities, monetary losses, and nightmares for the developers and data scientists who must figure out why the discrepancies happened.
An organization can hire dedicated professionals to translate manually. However, if there is a better way that does not require such hiring (which we will discuss in a bit), the organization is better off not spending money on it. In addition, instructions written in one programming language cannot be translated into another without any discrepancy, because of the nuances and characteristics of each particular language.
Furthermore, translating between programming languages can be impossible, and it is boring. Generating thousands of lines of code with an existing R package takes only a couple of clicks, but translating that code is a different story. Also, the translator’s job is to translate instructions, and instructions are not love stories full of adventures and dreams.
3. A common language: burdens the creators of programming libraries
There have been many attempts to create a “new” common language to solve the “translation” issue (did you catch the irony?). For example, there is a language called PMML (Predictive Model Markup Language) whose purpose is to express all kinds of machine learning models in one language. Though I wholeheartedly agree with its intent, I strongly disagree with how it was done: if it were widely adopted, every time I created a useful new machine learning algorithm, I would also have to provide a way to translate it into PMML, which is avoidable (how? we will discuss that in a bit). Not only that, what about all the existing machine learning packages that do not support translation into PMML? Should I stop using them just because they do not support PMML?
Also, remember that programming languages have different nuances, and translating one into another can be extremely tricky. Some languages favor readability over exactness. Some trust developers more than others. Some are object-oriented and some are more procedural. And so on.
4. A pre-communicated interface: the way to go
In human life, translation is necessary. What people say and write contains ideas, feelings, and philosophies, and these are worth sharing in other languages.
In my opinion, in computer science, translation should be avoided except in very special cases. Code is a set of instructions, and its accuracy matters most. Translating code introduces another layer in which errors can happen.
In this regard, being able to communicate without translating is critical when your organization has developers using many languages. How can groups of people who do not understand each other’s language communicate without translation?
One day, I went to a nearby Mexican restaurant for lunch. Probably to serve authentic Mexican food, the restaurant had Hispanic cooks who spoke Spanish. A server took my order in English and left the cooks a note with the names and quantities of the items, and without anything being translated or said aloud, the cooks understood what to do and started cooking. In that moment, I was very amused.
It happens every day, so it sounds trivial, but if you think about it more carefully, you will realize that it is quite astonishing: in that restaurant, with that note as the interface between cooks and servers, they could still serve the food even if the servers spoke French, German, and Italian and the cooks spoke Spanish, Portuguese, and Khmer. A mutually agreed protocol for interacting with the other party removes the need for translation.
Going back to our example of the oriental words, we see that we never really need to bother translating them. If you are doing business with a Chinese company, you and the company simply define how to interact with each other (for example, you send them the message "A: 100" and they ship 100 smartphones). You do not have to explain (i.e. translate) how you come up with your messages, how you will use the products once they are shipped, and so on: you just give them a message in the agreed format. The same goes for the Chinese company; they never have to explain how they make smartphones. Perhaps they use the word 上火 to figuratively describe smartphones that overheat internally during manufacturing, but that does not matter at all as long as both parties know how to generate and understand the messages and meet their requirements.
In fact, using a mutually agreed protocol is not my own brilliant new idea. It is called an API (Application Programming Interface), and APIs are used all the time. With an API, there is no need to understand the other party’s language (in fact, you do not even need to know which language the other party used). Furthermore, programmers in any language can communicate via the API. The concept of an API echoes how the pioneers of computer science solved design problems in which multiple modules, possibly written in different languages, need to talk to each other.
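To make the idea concrete, here is a minimal Python sketch of such an agreed message format. It assumes a hypothetical "ITEM: QUANTITY" text format modeled on the "A: 100" message above, with item code "A" assumed to mean smartphones; the names `Order`, `encode_order`, and `decode_order` are illustrative. The buyer only needs to know how to write the format and the supplier only needs to know how to read it; neither side sees how the other works internally.

```python
from dataclasses import dataclass

# Hypothetical agreed format: "<item code>: <quantity>", e.g. "A: 100".
# Item code "A" standing for smartphones is an assumption for illustration.
ITEM_CODES = {"A": "smartphone"}


@dataclass(frozen=True)
class Order:
    item_code: str
    quantity: int


def encode_order(order: Order) -> str:
    """Buyer side: turn an order into the agreed text format."""
    return f"{order.item_code}: {order.quantity}"


def decode_order(message: str) -> Order:
    """Supplier side: parse the agreed text format, rejecting anything else."""
    item_code, sep, quantity = message.partition(":")
    item_code = item_code.strip()
    quantity = quantity.strip()
    if not sep or item_code not in ITEM_CODES or not quantity.isdecimal():
        raise ValueError(f"message does not follow the agreed format: {message!r}")
    return Order(item_code, int(quantity))


if __name__ == "__main__":
    message = encode_order(Order("A", 100))
    print(message)  # A: 100
    order = decode_order(message)
    print(f"ship {order.quantity} x {ITEM_CODES[order.item_code]}")  # ship 100 x smartphone
```

Note that `decode_order` raises an error on anything outside the agreed format instead of guessing what the sender meant; that strictness is what lets both sides rely on the format without ever explaining their internals to each other.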
Design with Contracts
Even though I said APIs are used all the time, they are usually used between businesses, not between teams within an organization. That is because building an API requires a significant amount of overhead: you have to deploy your software, check that every request is valid, constantly monitor the infrastructure supporting your APIs, write documentation outlining the API’s protocols, and much more.
This is precisely why Knowru came into being. To start with, we’ve automated all of this overhead of building APIs for the top two languages data scientists use: R and Python. With Knowru, data scientists and developers alike can focus on what they are good at and what they are needed for. Ironically, it lets your coders collaborate better by allowing them to communicate less. The end game? No limitation on which language programmers can code in, no tedious translation work, fast and frequent delivery of high-value prediction models, and better cooperation among coders and a better culture, all of which will take your organization to the next level of agility.
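Checking that every request is valid is one concrete piece of that overhead. As a rough illustration, here is a minimal Python sketch that checks a JSON request body against a contract for a prediction API. The contract itself, its field names `age` and `income`, and the `validate_request` function are hypothetical and are not Knowru’s actual interface.

```python
import json
from typing import Any

# Hypothetical contract for a prediction endpoint: field name -> allowed types.
# The field names and types are illustrative assumptions, not a real schema.
REQUEST_CONTRACT: dict[str, tuple[type, ...]] = {
    "age": (int,),
    "income": (int, float),
}


def validate_request(body: str) -> tuple[dict[str, Any], list[str]]:
    """Parse a JSON request body and list every way it breaks the contract."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        return {}, [f"invalid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return {}, ["request body must be a JSON object"]

    errors = []
    for field, allowed in REQUEST_CONTRACT.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif isinstance(payload[field], bool) or not isinstance(payload[field], allowed):
            # bool is a subclass of int in Python, so reject it explicitly.
            errors.append(f"wrong type for field: {field}")
    # Report fields the contract does not mention, in a stable order.
    for field in sorted(payload.keys() - REQUEST_CONTRACT.keys()):
        errors.append(f"unexpected field: {field}")
    return payload, errors


if __name__ == "__main__":
    _, errors = validate_request('{"age": 42, "income": 55000.0}')
    print(errors)  # []
    _, errors = validate_request('{"age": "42"}')
    print(errors)  # ['wrong type for field: age', 'missing field: income']
```

The sketch collects every problem instead of stopping at the first one, so a developer calling the model in any language can fix a malformed request in a single round trip.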