1. An Apache Beam ML Model Deployment

    Fri 31 July 2020

    Data processing pipelines are useful for solving a wide range of problems. For example, an Extract, Transform, and Load (ETL) pipeline is a type of data processing pipeline that extracts data from one system and saves it to another system. Inside of an ETL pipeline, the data may be transformed and aggregated into more useful formats. ETL jobs are also useful for making the predictions of a machine learning model available to users or to other systems. The ETL for such an ML model deployment looks like this: extract the features used for prediction from a source system, send the features to the model for prediction, and save the predictions to a destination system. In this blog post we will show how to deploy a machine learning model inside of a data processing pipeline that runs on the Apache Beam framework.
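    As a taste of that pattern, here is a minimal sketch of an extract-predict-load pipeline on Apache Beam; the file names and the Model class are hypothetical stand-ins for a real source, sink, and trained model:

    ```python
    # A minimal sketch of the extract-predict-load pattern on Apache Beam.
    # The file names and the Model class are hypothetical stand-ins.
    import json

    import apache_beam as beam


    class Model:
        """Stand-in for a real trained model."""

        def predict(self, features):
            # hypothetical scoring logic, assumes numeric feature values
            return sum(features.values())


    class PredictDoFn(beam.DoFn):
        def setup(self):
            # Load the model once per worker instead of once per element.
            self.model = Model()

        def process(self, record):
            features = json.loads(record)
            features["prediction"] = self.model.predict(features)
            yield json.dumps(features)


    with beam.Pipeline() as pipeline:
        (pipeline
         | "Extract" >> beam.io.ReadFromText("features.jsonl")  # source system
         | "Predict" >> beam.ParDo(PredictDoFn())               # apply the model
         | "Load" >> beam.io.WriteToText("predictions"))        # destination system
    ```

    Loading the model in setup() rather than in process() means each worker loads it once instead of once per element.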

    read more
  2. A ZeroRPC ML Model Deployment

    Mon 04 May 2020

    There are many different ways for two software processes to communicate with each other. When deploying a machine learning model, it's often simpler to isolate the model code inside of its own process. Any code that needs predictions then communicates with the process that is running the model code. This approach is easier than embedding the model code in the process that needs the predictions because it saves us the trouble of recreating the model's algorithm in the programming language of that process. Remote procedure calls (RPC) are widely used to connect code that is executing in different processes. In the last few years, the rise in popularity of microservice architectures has also driven the popularity of RPC for integrating systems.
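    For a flavor of the approach, here is a minimal sketch of a model isolated in its own process behind a zerorpc endpoint; the Model class and the address are hypothetical stand-ins:

    ```python
    # A minimal sketch of serving a model from its own process with zerorpc.
    # The Model class and the endpoint address are hypothetical stand-ins.
    import zerorpc


    class Model:
        """Stand-in for a real trained model."""

        def predict(self, features):
            # hypothetical scoring logic, assumes numeric feature values
            return sum(features.values())


    class ModelServer:
        """zerorpc exposes this object's public methods as RPC endpoints."""

        def __init__(self):
            self.model = Model()

        def predict(self, features):
            # Arguments and return values are serialized with msgpack.
            return self.model.predict(features)


    server = zerorpc.Server(ModelServer())
    server.bind("tcp://0.0.0.0:4242")
    server.run()
    ```

    A client in a separate process would create a zerorpc.Client(), connect to the same address, and call client.predict(...) as if it were a local function.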

    read more
  3. A Websocket ML Model Deployment

    Sat 04 April 2020

    In the world of web applications, the ability to create responsive and interactive experiences is limited when we make normal request-response calls against a REST API. In the request-response programming paradigm, requests are always initiated by the client and fulfilled by the server, and continuously sending and receiving data is not supported. To fix this problem, the Websocket standard was created. Websockets allow a client and a server to exchange data over a bidirectional, full-duplex connection that stays open for a long period of time. This approach offers much higher efficiency in the communication between the server and client. Just like a normal HTTP connection, Websockets work on ports 80 and 443 and support proxies and load balancers. Websockets also allow the server to send data to the client without having first received a request from the client, which helps us to build more interactive applications.
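    As a sketch of the idea, the following keeps a connection open and answers every feature payload the client sends over the same socket; it uses the websockets library, the Model class is a hypothetical stand-in, and the handler signature assumes a recent version of the library:

    ```python
    # A minimal sketch of a Websocket prediction endpoint using the websockets
    # library; the Model class is a hypothetical stand-in.
    import asyncio
    import json

    import websockets


    class Model:
        """Stand-in for a real trained model."""

        def predict(self, features):
            # hypothetical scoring logic, assumes numeric feature values
            return sum(features.values())


    model = Model()


    async def handler(websocket):
        # The connection stays open: the client can keep sending feature
        # payloads and the server keeps answering over the same socket.
        async for message in websocket:
            features = json.loads(message)
            await websocket.send(json.dumps({"prediction": model.predict(features)}))


    async def main():
        async with websockets.serve(handler, "localhost", 8765):
            await asyncio.Future()  # serve until cancelled


    asyncio.run(main())
    ```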

    read more
  4. A MapReduce ML Model Deployment

    Sun 23 February 2020

    Because of the growing need to process large amounts of data across many computers, the Hadoop project was started in 2006. Hadoop is a set of software components that help to solve large scale data processing problems using clusters of computers. Hadoop supports mass data storage through the HDFS component and large scale data processing through the MapReduce component. Because of their usefulness, Hadoop clusters have become a central part of the infrastructure of many companies. In this blog post, we'll focus on the MapReduce component of Hadoop, since we will be deploying a machine learning model, which is a compute-intensive process. MapReduce is a programming framework for processing large amounts of distributed data. It handles errors and failures in the computation, and it is inherently parallel while abstracting that fact away, making the code look like single-process code.
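    To illustrate the programming model (not necessarily the exact code from the post), here is a minimal Hadoop Streaming mapper that scores each input record with a model; the Model class is a hypothetical stand-in:

    ```python
    # mapper.py: a minimal sketch of a scoring job for Hadoop Streaming; the
    # Model class is a hypothetical stand-in. Streaming feeds one record per
    # line on stdin and collects whatever the mapper prints to stdout.
    import json
    import sys


    class Model:
        """Stand-in for a real trained model."""

        def predict(self, features):
            # hypothetical scoring logic, assumes numeric feature values
            return sum(features.values())


    model = Model()

    for line in sys.stdin:
        features = json.loads(line)
        features["prediction"] = model.predict(features)
        print(json.dumps(features))
    ```

    Since each prediction is independent and no aggregation is needed, a job like this can run map-only, with the number of reduce tasks set to zero.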

    read more
  5. A gRPC Service ML Model Deployment

    Mon 20 January 2020

    With the rise of service oriented architectures and microservice architectures, the [gRPC](https://grpc.io/) system has become a popular choice for building services. gRPC is a fairly new system for doing inter-service communication through Remote Procedure Calls (RPC) that was started at Google in 2015. A remote procedure call is an abstraction that allows a developer to make a call to a function that runs in a separate process, but that looks like it executes locally. gRPC is a standard for defining the data exchanged in an RPC call and the API of the function through protocol buffers. gRPC also supports many other features, such as simple and streaming RPC invocations, authentication, and load balancing.
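    To make that concrete, here is a minimal sketch of a gRPC prediction service in Python; model_pb2 and model_pb2_grpc stand in for code that protoc would generate from a hypothetical model.proto that defines a ModelService with a Predict RPC:

    ```python
    # A minimal sketch of a gRPC prediction service. model_pb2 and
    # model_pb2_grpc stand in for modules that protoc would generate from a
    # hypothetical model.proto defining a ModelService with a Predict RPC.
    from concurrent import futures

    import grpc

    import model_pb2        # hypothetical generated messages
    import model_pb2_grpc   # hypothetical generated service stubs


    class ModelService(model_pb2_grpc.ModelServiceServicer):
        def Predict(self, request, context):
            # The request and response are protocol buffer messages, so the
            # data exchanged in the call is defined ahead of time in the
            # hypothetical model.proto file.
            prediction = float(sum(request.features))  # stand-in for a model call
            return model_pb2.PredictResponse(prediction=prediction)


    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    model_pb2_grpc.add_ModelServiceServicer_to_server(ModelService(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()
    ```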

    read more

