Skip to content

Conversation

@aangerma
Copy link
Contributor

@aangerma aangerma commented Jul 11, 2018

This pull-request improves latency and CPU utilization of align processing block.
The solution combines SSE4 optimization for Intel architecture with changes to the algorithm to reduce overhead.

Following-up on: #1189, #1105

OS Microsoft Windows 10 Enterprise

Processor Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz, 2301 Mhz, 2 Core(s), 4 Logical Processor(s)

Align depth to color

    Depth 640x480 Depth 1280x720
Color 640x480 NAIVE  (microseconds) 11217.4 26023.6
SSE    (microseconds) 4346.96 4360.56
Speedup   x 2.58 x 5.96
Color 1280x720 NAIVE  (microseconds) 14868.2 44311.9
SSE     (microseconds) 10497.1 8327.42
Speedup   x 1.41 x 5.32

Align color to depth

    Color 640x480 Color 1280x720
Depth 640x480 NAIVE    (microseconds) 14904.4 14642.8
SSE       (microseconds) 3454.34 3863.21
Speedup   x 4.31 x 3.79
Depth 1280x720 NAIVE  (microseconds) 24630.5 22478.8
SSE     (microseconds) 3218.72 3292.07
Speedup   x 7.65 x 6.82

Ubuntu 16.04

Intel® Core™ i7-6700 CPU @ 3.40GHz × 8

Align depth to color

    Depth 640x480 Depth 1280x720
Color 640x480 NAIVE (microseconds) 13006.9 19729.7
SSE   (microseconds) 1941.03 1120.02
Speedup   x 6.70 x 17.61
Color 1280x720 NAIVE (microseconds) 8325.66 21869.3
SSE    (microseconds) 4256.76 3838.1
Speedup   x 1.95 x 5.69

Align color to depth

    Color 640x480 Color 1280x720
Depth 640x480 NAIVE (microseconds) 14049.8 10048.8
SSE    (microseconds) 1475.04 1518.32
Speedup   x 9.52 x 6.61
Depth 1280x720 NAIVE (microseconds) 22304.3 25391.8
SSE    (microseconds) 1430.76 1526.98
Speedup   x 15.58 x 16.62

@aangerma aangerma force-pushed the development branch 3 times, most recently from 85bd4a4 to 4fe7f05 Compare July 12, 2018 14:29
@dorodnic dorodnic closed this Jul 12, 2018
@dorodnic dorodnic reopened this Jul 12, 2018
@dorodnic dorodnic closed this Jul 15, 2018
@dorodnic dorodnic reopened this Jul 15, 2018
Copy link
Contributor

@ev-mp ev-mp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good

byte* other_aligned_to_depth = const_cast<byte*>(aligned_frame.frame->get_frame_data());
memset(other_aligned_to_depth, 0, depth_intrinsics.height * depth_intrinsics.width * aligned_bytes_per_pixel);
#ifdef __SSSE3__
auto uid = other_frame->get_stream()->get_unique_id();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it used ?

auto data = (int16_t*)depth_frame->get_frame_data();

#ifdef __SSSE3__
auto uid = other_frame->get_stream()->get_unique_id();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here

}
}

} No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation + add empty line at the end of file

{
}

void image_transform::pre_compute_x_y_map(float offset)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add comments what the pre-compute comprise of

bottom_right = _pixel_bottom_right_int;
}

switch (bpp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The switch case seem to be redundant

template<>
inline void image_transform::distorte_x_y<RS2_DISTORTION_MODIFIED_BROWN_CONRADY>(const __m128& x, const __m128& y, __m128* distorted_x, __m128* distorted_y, const rs2_intrinsics& to)
{
__m128 c[5];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add comments for the hard-coded values - [5] == distortion coefficients,
for maintainability

const rs2_intrinsics& to,
const rs2_extrinsics& from_to_other)
{
//mask for shuffle
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add pseudo-code to describe the flow for maintainability

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants