iOS GPGPU programming: floating-point computation on the GPU and reading back the results

I have written the following documents on Transform Feedback:

  • iOS GPGPU programming: floating-point computation on the GPU and reading back the results, which describes in detail the complete flow of using Transform Feedback on iOS.
  • NDK OpenGL ES 3 C/C++ executable (without JNI calls), which describes how to configure the NDK development environment for OpenGL ES 3 and above and build a C++ executable. Refer to it for general-purpose GPU computation on Android.
  • iOS GPGPU programming: solving the "Transform Feedback doesn't write" crash from the Apple Developer Forums, which fixes code that specified the wrong size for the mapped memory range and therefore caused a mapping error.
  • iOS GPGPU programming: image contrast adjustment with Transform Feedback, which details how to process an image using only the vertex shader, read back the result, and generate a UIImage.

Apart from the code related to the native window system, everything in this series can be ported to Android. The development environment is Xcode 7, the runtime environment is iOS 9, and the test devices are an iPhone 6 Plus and an iPad Air 2.

This document uses the OpenGL ES 3 API to implement a simple program that processes data on the GPU and reads back the result. It introduces Transform Feedback, a feature new in ES 3, as groundwork for later study of particle effects, image processing, and similar topics. For brevity, OpenGL is abbreviated as GL and OpenGL ES as ES. For a concrete application of Transform Feedback, see the Soft Kitty OpenGL ES 3 demo, shown below. The full demo video is on YouTube.

[Figure: Transform Feedback application example]

Note in particular that desktop OpenGL allows a program to contain only a vertex shader; it works properly without a fragment shader. ES 3, however, requires a program to contain both a vertex shader and a fragment shader, even if they do nothing; otherwise the link stage fails.

One advantage of Transform Feedback is that the vertex shader's output is written back to vertex buffer objects (VBOs), which avoids copying data between GPU and CPU memory and saves time. iOS uses a unified memory model, so GPU and CPU data both actually live in main memory. This is not necessarily the case on Android: on a Nexus 6P, for example, our team measured about 20 ms to map a GPU address into the CPU address space.

1. Algorithm

iOS and Android follow the same algorithm; the difference lies in how ES is bridged to the native window system (for example, UIKit).

  1. Configure the EGL context
  2. Provide the vertex shader and fragment shader used for computation
  3. Configure the program
  4. Configure the transform feedback output varying names before linking the program
  5. Link the program
  6. Configure the GPU input data buffer and upload the data to the GPU
  7. Configure the GPU output data buffer and bind it to transform feedback
  8. Disable rasterization and the subsequent stages of the rendering pipeline
  9. Enter transform feedback mode
  10. Issue the draw call
  11. End transform feedback mode
  12. Synchronize with the GPU
  13. Map the GPU memory
  14. Read the GPU results
  15. Unmap the memory

Steps 1 to 8 can be placed in the initialization function; the remaining steps form the rendering operation. In my testing, Transform Feedback on iOS must be used together with GLKViewController. Whether the context used during initialization is declared on the stack or on the heap does not affect the result. For more complex setups, such as integrating with GPUImage, you should keep the EAGLContext and set it as the current context on the thread that issues the GL calls.

2. Example implementation

1. Configure the EGL context. See my other document, iOS OpenGL ES 3 programming 1: "Hello world". For simplicity, create a project from the Game template here and write the subsequent code in the GLKViewController subclass. The GLKViewController subclass must be configured correctly; otherwise you may run into the problem described in section 3.1, where Transform Feedback has no effect because GLKit's GLKView is not used.

2. Provide the vertex shader and fragment shader used for computation. Because we only compute and display nothing on screen, the vertex shader is where the real work happens.

    #version 300 es

    layout(location = 0) in float inValue;
    out float outValue;

    void main() {
        outValue = sqrt(inValue);
    }

The fragment shader is an empty operation.

    #version 300 es

    void main() {}

We use the GPU to exploit its parallelism, so how many unified parallel compute units does the iPad Air 2 have? This cannot be queried through the OpenGL ES API; you need to consult the documentation for the chip. The following query code does not return a meaningful answer, as the example shows.

    #import <OpenGLES/ES1/glext.h>

    printf("%s\n", glGetString(GL_VERSION));
    GLint vertexUnits;
    glGetIntegerv(GL_MAX_VERTEX_UNITS_OES, &vertexUnits);
    printf("vertex units = %d\n", vertexUnits);

Execution results:

    OpenGL ES-CM 1.1 Apple A8X GPU - 77.14
    vertex units = 4

3. Configure the program. Be careful not to link the program yet.

    GLuint vertShader, fragShader;
    NSString *vertShaderPathname, *fragShaderPathname;

    // Create shader program.
    _program = glCreateProgram();

    // Create and compile vertex shader.
    vertShaderPathname = [[NSBundle mainBundle] pathForResource:@"Shader" ofType:@"vsh"];
    if (![self compileShader:&vertShader type:GL_VERTEX_SHADER file:vertShaderPathname]) {
        NSLog(@"Failed to compile vertex shader");
        return NO;
    }

    // Create and compile fragment shader.
    fragShaderPathname = [[NSBundle mainBundle] pathForResource:@"Shader" ofType:@"fsh"];
    if (![self compileShader:&fragShader type:GL_FRAGMENT_SHADER file:fragShaderPathname]) {
        NSLog(@"Failed to compile fragment shader");
        return NO;
    }

    // Attach vertex shader to program.
    glAttachShader(_program, vertShader);

    // Attach fragment shader to program.
    glAttachShader(_program, fragShader);

4. Configure the transform feedback output varying names before linking the program.

On closer inspection, the vertex shader used in this article differs slightly from one used for normal drawing: it does not output vertex coordinates for the fragment shader. Therefore, you need glTransformFeedbackVaryings to tell ES which output attributes to capture into the output buffer.

    const GLchar *varyings[] = {"outValue"};
    glTransformFeedbackVaryings(_program,
                                sizeof(varyings) / sizeof(varyings[0]),
                                varyings,
                                GL_INTERLEAVED_ATTRIBS);

glTransformFeedbackVaryings takes the number and the names of the variables to capture; the varyings array lists the names of the vertex shader outputs to be captured. The last parameter selects the buffer mode:

  • GL_INTERLEAVED_ATTRIBS writes the output attribute values interleaved into a single buffer. Reading interleaved data requires knowing the stride between consecutive records.
  • GL_SEPARATE_ATTRIBS writes each output attribute to its own target buffer, or to a different offset within one buffer.
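To make the interleaved layout concrete, here is a minimal CPU-side sketch in C of how captured records could be addressed after mapping. It assumes a hypothetical shader with two captured outputs, the outValue float from this article plus an invented vec3 named outPosition; the struct and function names are illustrative only. With the single float output actually used in this article, the stride is simply sizeof(float).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capture layout: one float followed by one vec3 per vertex,
   as GL_INTERLEAVED_ATTRIBS would write them for the varyings
   {"outValue", "outPosition"}. The second varying is an assumption. */
typedef struct {
    float outValue;        /* component 0 */
    float outPosition[3];  /* components 1..3 */
} CapturedVertex;

/* Byte offset of a vertex's record inside the interleaved buffer:
   the stride is the size of one full record. */
size_t captured_vertex_offset(size_t vertexIndex) {
    return vertexIndex * sizeof(CapturedVertex);
}

/* Read the outValue of vertex i from a mapped interleaved buffer. */
float read_out_value(const void *mapped, size_t vertexCount, size_t i) {
    assert(mapped != NULL && i < vertexCount);
    const CapturedVertex *records = (const CapturedVertex *)mapped;
    return records[i].outValue;
}
```

In GL_SEPARATE_ATTRIBS mode the same two outputs would instead land in two tightly packed arrays, one of floats and one of vec3 values, so no stride arithmetic of this kind would be needed.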

5. Link the program. After linking with glLinkProgram, check the link status to find link errors. In debug builds you can also call glValidateProgram to verify whether the program can execute in the current ES state, which catches runtime errors. In both cases, print the info log when the check reports a problem. glValidateProgram is relatively expensive and is usually not called in release builds. An example follows.

    // 1. Link and check the link status.
    glLinkProgram(_program);

    GLint linkStatus = GL_FALSE;
    GLint logLength = 0;  // declared here so the validation step can reuse it
    glGetProgramiv(_program, GL_LINK_STATUS, &linkStatus);
    if (linkStatus == GL_FALSE) {
        glGetProgramiv(_program, GL_INFO_LOG_LENGTH, &logLength);
        if (logLength > 0) {
            GLchar *logBuffer = calloc(1, logLength);
            glGetProgramInfoLog(_program, logLength, NULL, logBuffer);
            printf("%s", logBuffer);
            free(logBuffer);
        }
    }

    // 2. Verify whether the program can execute in the current state.
    glValidateProgram(_program);
    glGetProgramiv(_program, GL_INFO_LOG_LENGTH, &logLength);
    if (logLength > 0) {
        GLchar *log = (GLchar *)malloc(logLength);
        glGetProgramInfoLog(_program, logLength, &logLength, log);
        NSLog(@"Program validate log:\n%s", log);
        free(log);
    }

    GLint status = 0;
    glGetProgramiv(_program, GL_VALIDATE_STATUS, &status);
    if (status == 0) {
        return NO;
    }

    return YES;

6. Configure the GPU input data buffer. Note that if you pass client-side data directly, you cannot map the upload buffer into main memory to inspect the uploaded data. With a vertex buffer object (VBO) this works normally.

Before using a VBO you may configure a VAO; whether you do so or not does not affect the result.

    GLuint VAO;
    glGenVertexArrays(1, &VAO);
    glBindVertexArray(VAO);

Upload mode A: pass the data to the GPU directly from a client-side array, here yourData.

    GLfloat yourData[] = {2, 3, 4, 5, 6};
    glEnableVertexAttribArray(0);  // index from layout (location = 0)
    glVertexAttribPointer(0, 1, GL_FLOAT, GL_FALSE, 0, yourData);

With dynamically allocated memory, normal triangle rendering works fine, but here it does not produce a correct result.

    GLfloat *data;
    data = malloc(sizeof(GLfloat) * 5);
    for (int i = 0; i < 5; ++i) {
        data[i] = i + 2;
    }
    glEnableVertexAttribArray(0);  // index from layout (location = 0)
    glVertexAttribPointer(0, 1, GL_FLOAT, GL_FALSE, 0, data);
    // glDrawArrays(GL_POINTS, 0, 5) specifies the number of elements to draw,
    // which supplies the length of the dynamically allocated memory.
    // Even so, Transform Feedback produces no valid result.

Upload mode B: use a vertex buffer object (VBO).

    GLfloat data[] = {2, 3, 4, 5, 6};
    glGenBuffers(1, &_vertexBuffer);
    glBindBuffer(GL_ARRAY_BUFFER, _vertexBuffer);
    glBufferData(GL_ARRAY_BUFFER, sizeof(data), data, GL_STATIC_DRAW);
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 1, GL_FLOAT, GL_FALSE, 0, NULL);

Although most desktop OpenGL drivers can analyze how glBufferData memory is used and choose an appropriate management strategy, an Apple OpenGL ES driver engineer advised in a WWDC talk that we pass the usage hint that matches how the data is actually used. In this case the data is computed on only once, so GL_STATIC_DRAW is passed.

The index passed to glEnableVertexAttribArray(0) is the attribute location. If it is not specified in the shader, it can be obtained with GLint inputAttribIndex = glGetAttribLocation(program, "inValue").

7. Configure the GPU output data buffer and bind it to transform feedback

    glGenBuffers(1, &_gpuOutputBuffer);
    glBindBuffer(GL_ARRAY_BUFFER, _gpuOutputBuffer);
    glBufferData(GL_ARRAY_BUFFER, sizeof(data), NULL, GL_STREAM_DRAW);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, _gpuOutputBuffer);

When glBufferData is called for the output buffer, no data source is specified (NULL is passed). The call only allocates the memory that will hold the GPU results; its size, in bytes, depends on the output. This usage is the same as for a Pixel Buffer Object. For the performance impact of Pixel Buffer Objects, see my other document, OpenGL ES PBO (Pixel Buffer Object) performance measurement. glBindBufferBase performs the actual binding of the Transform Feedback output to the data buffer.

  • Parameter GL_TRANSFORM_FEEDBACK_BUFFER: specifies that the buffer is bound as a Transform Feedback buffer.
  • Parameter 0: the index of the transform feedback binding point. Indices start at 0, and in GL_INTERLEAVED_ATTRIBS mode all captured outputs are written to the buffer bound at index 0, so the output in this example's vertex shader needs no explicit layout qualifier.
  • Parameter _gpuOutputBuffer: the VBO to bind, that is, the location to which the GPU writes the result data.
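Because the size passed to glBufferData is a byte count that depends on what the shader outputs, it can help to compute it in one place and reuse the same value later when mapping the buffer. The following C sketch shows one way to do that; the helper name is hypothetical and it assumes every captured component is a 4-byte float (GLfloat).

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of a transform feedback output buffer that captures
   componentsPerVertex floats for each of vertexCount vertices.
   Hypothetical helper; assumes all captured components are floats. */
size_t feedback_buffer_bytes(size_t vertexCount, size_t componentsPerVertex) {
    assert(vertexCount > 0 && componentsPerVertex > 0);
    return vertexCount * componentsPerVertex * sizeof(float);
}
```

For this article's five input values and single float output, feedback_buffer_bytes(5, 1) equals sizeof(data), which is the size used in the glBufferData call above.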

8. Disable rasterization and the subsequent stages of the rendering pipeline

Rasterization, fragment shading, depth testing, and the other subsequent pipeline stages are redundant here, so disabling them saves resources. All of the unified compute units then execute the vertex shader code, asynchronously, for the commands in this application's OpenGL ES command queue. It is worth mentioning that modern GPUs no longer distinguish between vertex processing units and fragment processing units. Both are unified processing units, meaning that the same unit can run vertex shader code and then fragment shader code.

    glUseProgram(_program);
    glEnable(GL_RASTERIZER_DISCARD);

glUseProgram(_program) can be called immediately after linking, followed by preparing the data buffers for the GPU, or the data can be uploaded just before the draw call. The order does not affect the result.

9. Enter transform feedback mode with glBeginTransformFeedback(GL_POINTS). Although the primitive mode is points, you never actually see these points. Also, although each input is a single GLfloat while a point in space needs three coordinate components, we still treat each value as a point. In general, choose the primitive mode that suits your requirements and keep it consistent with the mode passed to glDrawArrays.

10. Issue the draw call, glDrawArrays(GL_POINTS, 0, 5). The primitive mode is the same as the one used to enter transform feedback mode.

11. End transform feedback mode with glEndTransformFeedback().

12. Synchronize with the GPU

Because the GPU executes asynchronously, you must make sure that the previously issued ES commands have completed before mapping memory. There are three ways to synchronize:

  1. glFlush() flushes the ES command queue and guarantees that the queued commands will be executed by the GPU in finite time, but it does not wait for them to complete.
  2. glFinish() blocks the current thread until all GPU commands have completed.
  3. glWaitSync() synchronizes using a sync object. It is somewhat more cumbersome to program and will be covered in a later document rather than here.

For simplicity, glFinish() is used here.

13. Map GPU memory to prepare for reading the GPU results

Now the GPU's Transform Feedback buffer must be mapped into the CPU address space. On desktop GL, reading the data back is very convenient:

    GLfloat feedback[5];
    glGetBufferSubData(GL_TRANSFORM_FEEDBACK_BUFFER, 0, sizeof(feedback), feedback);

ES has no glGetBufferSubData, so the operation is more roundabout. In ES, you can use glMapBufferRange to map GPU memory.

    float *gpuMemoryBuffer = glMapBufferRange(GL_ARRAY_BUFFER, 0, sizeof(data), GL_MAP_READ_BIT);

14. Read the GPU results

    if (!gpuMemoryBuffer) {
        printf("gpuMemoryBuffer == NULL\n");
    } else {
        for (int i = 0; i < 5; ++i) {
            printf("gpuMemoryBuffer[%d] = %f\t", i, gpuMemoryBuffer[i]);
        }
        printf("\n");
    }

With the data source GLfloat data[] = {2, 3, 4, 5, 6}, for example, the vertex shader takes the square root of each value, and the following values are printed:

    gpuMemoryBuffer[0] = 1.414214    gpuMemoryBuffer[1] = 1.732051    gpuMemoryBuffer[2] = 2.000000    gpuMemoryBuffer[3] = 2.236068    gpuMemoryBuffer[4] = 2.449490
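Rather than comparing the printed values by eye, the read-back results can be checked on the CPU. The sketch below is one possible check in plain C, not part of the original program: it squares each output and compares it with the corresponding input within a relative tolerance, which avoids calling a CPU square root. The function name and the tolerance parameter are assumptions, and the inputs are assumed to be non-negative.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every output[i] is non-negative and output[i] squared
   matches input[i] within relTolerance * input[i]; returns 0 otherwise.
   Hypothetical helper; inputs are assumed to be non-negative. */
int outputs_are_square_roots(const float *input, const float *output,
                             size_t count, float relTolerance) {
    assert(input != NULL && output != NULL);
    for (size_t i = 0; i < count; ++i) {
        float diff = output[i] * output[i] - input[i];
        if (diff < 0.0f) {
            diff = -diff;
        }
        if (output[i] < 0.0f || diff > relTolerance * input[i]) {
            return 0;
        }
    }
    return 1;
}
```

Called as outputs_are_square_roots(data, gpuMemoryBuffer, 5, 1e-5f) before unmapping, it would accept the values printed above and reject a buffer that reads back as all zeros.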

15. Unmap the buffer with glUnmapBuffer(GL_ARRAY_BUFFER).

Step 7 bound the GPU output buffer to the GL_ARRAY_BUFFER target with glBindBuffer(GL_ARRAY_BUFFER, _gpuOutputBuffer), so the subsequent glBufferData, glMapBufferRange, and glUnmapBuffer calls use the same target. For Transform Feedback, however, GL_TRANSFORM_FEEDBACK_BUFFER can also be used to read the data, as long as the buffer currently bound to that target is the same one.

3. Common problems

3.1 Transform Feedback has no effect when GLKit's GLKView is not used

    self.context = [[EAGLContext alloc] initWithAPI:kEAGLRenderingAPIOpenGLES3];
    if (!self.context) {
        NSLog(@"This application requires OpenGL ES 3");
        abort();
    }

    // GLKView *view = (GLKView *)self.view;
    // view.context = self.context;
    // view.drawableDepthFormat = GLKViewDrawableDepthFormat24;

    [EAGLContext setCurrentContext:self.context];

If you only compute, you might assume that the GLKView setup lines can be commented out, since nothing is displayed. However, testing shows that Transform Feedback then cannot read back any results, even though the program, the shaders, and everything else work.

3.2 The Transform Feedback result buffer contains only 0

The pointer returned by glMapBufferRange is not NULL, but every value read from it is 0.0f. You can map the GPU input buffer with glMapBufferRange to check whether the data was uploaded correctly. The following code only prints the uploaded data and does not modify it.

    // Map for reading, since the data is only printed.
    float *gpuInputMemoryBuffer = glMapBufferRange(GL_ARRAY_BUFFER, 0, sizeof(data), GL_MAP_READ_BIT);
    if (!gpuInputMemoryBuffer) {
        printf("gpuInputMemoryBuffer == NULL\n");
    } else {
        for (int i = 0; i < 5; ++i) {
            printf("gpuInputMemoryBuffer[%d] = %f\t", i, gpuInputMemoryBuffer[i]);
        }
        printf("\n");
        glUnmapBuffer(GL_ARRAY_BUFFER);
    }
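When narrowing down this problem, it can be useful to count the zeros in both the mapped input buffer and the mapped output buffer instead of reading long printouts. The following C helper is a small sketch for that purpose; its name is hypothetical and it only inspects memory that has already been mapped.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how many of the count floats in a mapped buffer are exactly zero.
   Hypothetical diagnostic helper; it never writes to the buffer. */
size_t count_zero_values(const float *mapped, size_t count) {
    assert(mapped != NULL);
    size_t zeros = 0;
    for (size_t i = 0; i < count; ++i) {
        if (mapped[i] == 0.0f) {
            ++zeros;
        }
    }
    return zeros;
}
```

If the input buffer reports no zeros while the output buffer reports a count equal to its length, the upload succeeded and the fault is more likely in the capture setup, for example the varyings, the output buffer binding, or the context configuration described in section 3.1.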

Data passed directly from client memory cannot be printed this way. Sample code for passing data directly:

    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 1, GL_FLOAT, GL_FALSE, 0, gpuInputDataArray);

Another problematic case is specifying the data source through both glBufferData and glVertexAttribPointer, for example:

    GLuint VBO;
    glGenBuffers(1, &VBO);
    glBindBuffer(GL_ARRAY_BUFFER, VBO);
    glBufferData(GL_ARRAY_BUFFER, sizeof(data), data, GL_STREAM_DRAW);
    // The last parameter of glVertexAttribPointer specifies the data source again.
    glVertexAttribPointer((GLuint)inValuePos, 1, GL_FLOAT, GL_FALSE, 0, data);

This is not recommended.

Simple mathematical computation does not show off the advantages of the GPU. It is usually in image processing and similar workloads that the GPU is noticeably faster than the CPU.

Next document: iOS GPGPU programming: image contrast adjustment with Transform Feedback.

Recommended reading

  • Exploring GPGPU on iOS
  • Transform feedback
  • TransformFeedback Java implementation
  • Noise-Based Particles, Part II
  • Particle System using Transform Feedback
  • Boids
  • glMapBufferRange returning all zeros in Android
  • glMapBufferRange() returns all zeros in Android OpenGLES 3 using TrasnformFeedback