News on China's scientific and technological development.

N00813

Junior Member
Registered Member
Neural-network optimisation for the SW26010 chip; expansion of capabilities for the TaihuLight.


China Tunes Neural Networks for Custom Supercomputer Chip
July 11, 2017

Supercomputing centers around the world are preparing their next-generation architectural approaches for the insertion of AI into scientific workflows. For some, this means retooling around an existing architecture to make it capable of double duty for both HPC and AI.

Teams in China working on the top-performing supercomputer in the world, the Sunway TaihuLight machine with its custom processor, have shown that their optimizations for the SW26010 architecture on deep learning models have yielded a 1.91x to 9.75x speedup over a GPU-accelerated model using the Nvidia Tesla K40m in a test convolutional neural network run with over 100 parameter configurations.

Efforts on this system show that high performance deep learning is possible at scale on a CPU-only architecture. The Sunway TaihuLight machine is based on the 260-core Sunway SW26010, which has been detailed elsewhere from both a chip and systems perspective. The convolutional neural network work was bundled together as swDNN, a library for accelerating deep learning on the TaihuLight supercomputer.

According to Dr. Haohuan Fu, one of the leads behind the swDNN framework for the Sunway architecture (and associate director at the National Supercomputing Center in Wuxi, where TaihuLight is located), the processor has a number of unique features that could potentially help the training process of deep neural networks. These include “the on-chip fusion of both management cores and computing core clusters, the support of a user-controlled fast buffer for the 256 computing cores, hardware-supported scheme for register communication across different cores, as well as the unified memory space shared by the four core groups, each with 65 cores.”


Despite some of the features that make the SW26010 a good fit for neural networks, there were some limitations teams had to work around. The most prominent was memory bandwidth, which is a problem on all processors and accelerators tackling neural network training in particular. “The DDR3 memory interface provides a peak bandwidth of 36GB/s for each compute group (64 of the compute elements) for a total bandwidth of 144 GB/s per processor. The Nvidia K80 GPU, with a similar double-precision performance of 2.91 teraflops, provides aggregate memory bandwidth of 480 GB/s… Therefore, while CNNs are considered a compute-intensive kernel, care had to be taken with the memory access scheme to alleviate the memory bandwidth constraints.” Further, since the processor does not have a shared buffer for the frequent data communications that CNNs need, the team had to rely on a fine-grained data sharing scheme based on row and column communication buses in the CPU mesh.
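To put the quoted bandwidth figures side by side, here is a minimal sketch that computes bytes of memory traffic available per floating-point operation. The bandwidth numbers and the K80 peak come from the quote above; the 3.0 teraflops peak for the SW26010 is the rounded "three teraflops" figure from the Sunway Micro article later in this thread, so the result is only approximate. All names in the code are made up for illustration.

```python
# Figures quoted in the article. The 3.0 TFLOPS SW26010 peak is the rounded
# figure quoted elsewhere in this thread (an approximation, not a spec value).
SW26010_GBS_PER_CORE_GROUP = 36.0
SW26010_CORE_GROUPS = 4
SW26010_PEAK_TFLOPS = 3.0
K80_GBS = 480.0
K80_PEAK_TFLOPS = 2.91


def total_bandwidth_gbs(per_group_gbs, groups):
    """Aggregate memory bandwidth across all core groups, in GB/s."""
    return per_group_gbs * groups


def bytes_per_flop(bandwidth_gbs, peak_tflops):
    """Peak memory bytes available per peak floating-point operation."""
    return (bandwidth_gbs * 1e9) / (peak_tflops * 1e12)


sw_bandwidth = total_bandwidth_gbs(SW26010_GBS_PER_CORE_GROUP, SW26010_CORE_GROUPS)
sw_ratio = bytes_per_flop(sw_bandwidth, SW26010_PEAK_TFLOPS)
k80_ratio = bytes_per_flop(K80_GBS, K80_PEAK_TFLOPS)

print(f"SW26010: {sw_bandwidth:.0f} GB/s, {sw_ratio:.3f} bytes/flop")
print(f"K80:     {K80_GBS:.0f} GB/s, {k80_ratio:.3f} bytes/flop")
print(f"K80 has about {k80_ratio / sw_ratio:.1f}x more bandwidth per flop")
```

Under these assumptions the GPU has more than three times the memory bandwidth per flop, which is why data reuse matters so much on the Sunway chip.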

“The optimized swDNN framework, at current stage, can provide a double-precision performance of over 1.6 teraflops for the convolution kernels, achieving over 50% of the theoretical peak. The significant performance improvements achieved from a careful utilization of the SW26010's architectural features and a systematic optimization process demonstrate that these unique features and corresponding optimization schemes are potential candidates to be included in future DNN architectures as well as DNN-specific compilation tools.”

According to Fu, “By performing a systematic optimization that explores major factors of deep learning, including the organization of convolution loops, blocking techniques, register data communication schemes, as well as reordering strategies for the two pipelines of instructions, the SW26010 processor on the Sunway TaihuLight supercomputer has managed to achieve a double-precision performance of over 1.6 teraflops for the convolution kernel, achieving 54% of the theoretical peak.”
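Two of the factors Fu lists, the organization of convolution loops and blocking, can be illustrated with a generic loop-tiling sketch. This is not swDNN's actual scheme, which the article does not spell out; the function names, block sizes and data layout are assumptions chosen for readability. The blocked version reorders the loops so that each weight value is loaded once and reused across a tile of outputs, which is the general idea behind cutting memory traffic.

```python
def conv2d_naive(inp, weights):
    """Direct convolution: inp[c][y][x], weights[k][c][r][s] -> out[k][y][x].

    Valid padding, stride 1. Every output element re-reads its weights.
    """
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    K, R, S = len(weights), len(weights[0][0]), len(weights[0][0][0])
    OH, OW = H - R + 1, W - S + 1
    out = [[[0.0] * OW for _ in range(OH)] for _ in range(K)]
    for k in range(K):
        for y in range(OH):
            for x in range(OW):
                acc = 0.0
                for c in range(C):
                    for r in range(R):
                        for s in range(S):
                            acc += inp[c][y + r][x + s] * weights[k][c][r][s]
                out[k][y][x] = acc
    return out


def conv2d_blocked(inp, weights, block_k=2, block_x=4):
    """Same result, but tiled over output channels (k) and output columns (x).

    Inside a tile, one weight value is fetched once and reused for every
    output row and column in the tile (hypothetical block sizes).
    """
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    K, R, S = len(weights), len(weights[0][0]), len(weights[0][0][0])
    OH, OW = H - R + 1, W - S + 1
    out = [[[0.0] * OW for _ in range(OH)] for _ in range(K)]
    for k0 in range(0, K, block_k):
        for x0 in range(0, OW, block_x):
            for c in range(C):
                for r in range(R):
                    for s in range(S):
                        for k in range(k0, min(k0 + block_k, K)):
                            w = weights[k][c][r][s]  # loaded once per tile
                            for y in range(OH):
                                row_in = inp[c][y + r]
                                row_out = out[k][y]
                                for x in range(x0, min(x0 + block_x, OW)):
                                    row_out[x] += row_in[x + s] * w
    return out


# Small example: 2 input channels of 4x5, 3 filters of 2x2.
example_inp = [[[float((c + 1) * (y * 5 + x)) for x in range(5)]
                for y in range(4)] for c in range(2)]
example_w = [[[[float(k + c + r + s) for s in range(2)]
               for r in range(2)] for c in range(2)] for k in range(3)]
same = conv2d_naive(example_inp, example_w) == conv2d_blocked(example_inp, example_w)
print("blocked matches naive:", same)
```

Both functions accumulate each output in the same order, so the results agree exactly; only the order in which memory is touched changes.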

To further get around the memory bandwidth limitations, the team created a three-pronged approach to memory for its manycore architecture. Depending on what is required, the CPE (compute elements) mesh can access the data items either directly from global memory or from the three-level memory hierarchy (register, local data memory and larger, slower memory).

Part of the long-term plan for the Sunway TaihuLight supercomputer is to continue work on scaling traditional HPC applications to exascale, but also to continue neural network efforts in a companion direction. Fu says that TaihuLight teams are continuing the development of swDNN and are also collaborating with Face++ for facial recognition applications on the supercomputer, in addition to work with Sogou for voice and speech recognition. Most interesting (and vague) was the mention of a potential custom chip for deep learning, although he was non-committal.

The team has created a customized register communication scheme that targets maximizing data reuse in the convolution kernels, which reduces the memory bandwidth requirements by almost an order of magnitude, they report in a paper (IEEE subscription required). “A careful design of the most suitable pipelining of instructions was also built that reduces the idling time of the computation units by maximizing the overlap of memory operation and computation instructions, thus maximizing the overall training performance on the SW26010.”

Double precision performance results for different convolution kernels compared with the Nvidia Tesla K40 using the cuDNNv5 libraries.

To be fair, the Tesla K40 is not much of a comparison point to newer architectures, including Nvidia’s Pascal GPUs. Nonetheless, the Sunway architecture could show comparable performance with GPUs for convolutional neural networks—paving the way for more discussion about the centrality of GPUs in current deep learning systems if CPUs can be rerouted to do similar work for a lower price point.

The emphasis on double-precision floating point is also of interest, since the trend in training and certainly inference is to push precision lower while balancing accuracy requirements. Also left unanswered is how convolutional neural network training might scale across the many nodes available. In short, is the test size indicative of the scalability limits before the communication bottleneck becomes too severe to make this efficient? However, armed with these software libraries and the need to keep pushing deep learning into the HPC stack, it is not absurd to think Sunway might build their own custom deep learning chip, especially if the need arises elsewhere in China, which we suspect it will.

 

N00813

Junior Member
Registered Member

First Object Teleported from Earth to Orbit
Researchers in China have teleported a photon from the ground to a satellite orbiting more than 500 kilometers above.
Last year, a Long March 2D rocket took off from the Jiuquan Satellite Launch Centre in the Gobi Desert carrying a satellite called Micius, named after an ancient Chinese philosopher who died in 391 B.C. The rocket placed Micius in a Sun-synchronous orbit so that it passes over the same point on Earth at the same time each day.

Micius is a highly sensitive photon receiver that can detect the quantum states of single photons fired from the ground. That’s important because it should allow scientists to test the technological building blocks for various quantum feats such as entanglement, cryptography, and teleportation.

Today, the Micius team announced the results of its first experiments. The team created the first satellite-to-ground quantum network, in the process smashing the record for the longest distance over which entanglement has been measured. And they’ve used this quantum network to teleport the first object from the ground to orbit.

Teleportation has become a standard operation in quantum optics labs around the world. The technique relies on the strange phenomenon of entanglement. This occurs when two quantum objects, such as photons, form at the same instant and point in space and so share the same existence. In technical terms, they are described by the same wave function.

The curious thing about entanglement is that this shared existence continues even when the photons are separated by vast distances. So a measurement on one immediately influences the state of the other, regardless of the distance between them.

Back in the 1990s, scientists realized they could use this link to transmit quantum information from one point in the universe to another. The idea is to “download” all the information associated with one photon in one place and transmit it over an entangled link to another photon in another place.

This second photon then takes on the identity of the first. To all intents and purposes, it becomes the first photon. That’s the nature of teleportation and it has been performed many times in labs on Earth.
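For readers who want to see the mechanics, here is a small state-vector sketch of the textbook single-qubit teleportation protocol. It is not a model of the Micius experiment; the function name and the example state are made up. One detail the description above glosses over is that the sender must also pass two ordinary classical bits (the measurement results) to the receiver, who uses them to pick a correction. The sketch walks through all four possible measurement results rather than sampling one at random.

```python
import math


def teleport(msg):
    """Teleport the qubit msg = (a, b), meaning a|0> + b|1>.

    Qubit 0 holds the message, qubits 1 and 2 are the entangled pair
    (qubit 1 stays with the sender, qubit 2 is with the receiver).
    Returns {(m0, m1): (probability, receiver_state_after_correction)}.
    """
    s = 1 / math.sqrt(2)
    # Joint state: message qubit times the pair (|00> + |11>) / sqrt(2).
    state = {}
    for q0 in (0, 1):
        for q1 in (0, 1):
            for q2 in (0, 1):
                state[(q0, q1, q2)] = msg[q0] * (s if q1 == q2 else 0)

    # Sender applies CNOT (control qubit 0, target qubit 1).
    state = {(q0, q1 ^ q0, q2): amp for (q0, q1, q2), amp in state.items()}

    # Sender applies a Hadamard gate to qubit 0.
    after_h = {key: 0 for key in state}
    for (q0, q1, q2), amp in state.items():
        after_h[(0, q1, q2)] += s * amp
        after_h[(1, q1, q2)] += s * amp * (-1 if q0 else 1)
    state = after_h

    results = {}
    for m0 in (0, 1):          # sender's measurement result on qubit 0
        for m1 in (0, 1):      # sender's measurement result on qubit 1
            bob = [state[(m0, m1, 0)], state[(m0, m1, 1)]]
            prob = abs(bob[0]) ** 2 + abs(bob[1]) ** 2
            norm = math.sqrt(prob)
            bob = [amp / norm for amp in bob]
            if m1:             # classical bit m1 set: apply X (bit flip)
                bob = [bob[1], bob[0]]
            if m0:             # classical bit m0 set: apply Z (phase flip)
                bob = [bob[0], -bob[1]]
            results[(m0, m1)] = (prob, bob)
    return results


message = (0.6, 0.8j)  # an arbitrary normalized example state
for outcome, (prob, bob) in teleport(message).items():
    print(outcome, round(prob, 3), bob)
```

Whatever the sender measures, the receiver's qubit ends up in the original state once the two-bit correction is applied, while the sender's copy is destroyed by the measurement.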

Teleportation is a building block for a wide range of technologies. “Long-distance teleportation has been recognized as a fundamental element in protocols such as large-scale quantum networks and distributed quantum computation,” says the Chinese team.

In theory, there should be no maximum distance over which this can be done. But entanglement is a fragile thing because photons interact with matter in the atmosphere or inside optical fibers, causing the entanglement to be lost.

As a result, the distance over which scientists have measured entanglement or performed teleportation is severely limited. “Previous teleportation experiments between distant locations were limited to a distance on the order of 100 kilometers, due to photon loss in optical fibers or terrestrial free-space channels,” says the team.

But Micius changes all that because it orbits at an altitude of 500 kilometers, and for most of this distance, any photons making the journey travel through a vacuum. To minimize the amount of atmosphere in the way, the Chinese team set up its ground station in Ngari in Tibet at an altitude of over 4,000 meters. So the distance from the ground to the satellite varies from 1,400 kilometers when it is near the horizon to 500 kilometers when it is overhead.
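The change in distance as the satellite moves across the sky follows from simple geometry. The sketch below assumes a spherical Earth with a mean radius of 6,371 km and a circular orbit at 500 km; the function name and defaults are illustrative, and the small effect of the station's own 4 km altitude is left as an optional parameter.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed, spherical Earth)


def slant_range_km(elevation_deg, altitude_km=500.0, station_alt_km=0.0):
    """Straight-line distance from a ground station to a satellite.

    elevation_deg is the satellite's angle above the local horizon.
    """
    r_station = EARTH_RADIUS_KM + station_alt_km
    r_sat = EARTH_RADIUS_KM + altitude_km
    e = math.radians(elevation_deg)
    return (math.sqrt(r_sat ** 2 - (r_station * math.cos(e)) ** 2)
            - r_station * math.sin(e))


for elevation in (90, 45, 15):
    print(f"{elevation:>2} deg above horizon: {slant_range_km(elevation):7.1f} km")
```

In this simplified model the overhead pass gives 500 km, and a range of roughly 1,400 km corresponds to the satellite being about 15 degrees above the horizon.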

To perform the experiment, the Chinese team created entangled pairs of photons on the ground at a rate of about 4,000 per second. They then beamed one of these photons to the satellite, which passed overhead every day at midnight. They kept the other photon on the ground.

Finally, they measured the photons on the ground and in orbit to confirm that entanglement was taking place, and that they were able to teleport photons in this way. Over 32 days, they sent millions of photons and found positive results in 911 cases. “We report the first quantum teleportation of independent single-photon qubits from a ground observatory to a low Earth orbit satellite—through an up-link channel— with a distance up to 1400 km,” says the Chinese team.

This is the first time that any object has been teleported from Earth to orbit, and it smashes the record for the longest distance for entanglement.

That’s impressive work that sets the stage for much more ambitious goals in the future. “This work establishes the first ground-to-satellite up-link for faithful and ultra-long-distance quantum teleportation, an essential step toward global-scale quantum internet,” says the team.

It also shows China’s obvious dominance and lead in a field that, until recently, was led by Europe and the U.S.—Micius would surely have been impressed. But an important question now is how the West will respond.

Ref: Ground-to-satellite quantum teleportation
 

N00813

Junior Member
Registered Member
Bit late, interesting if true:


Makers of TaihuLight Supercomputer Offer Commercial Version
June 23, 2017 18:33 CEST


One of the more unusual pieces of news at this year’s ISC High Performance conference was the announcement by the National Supercomputing Center in Wuxi that it will be offering a cut-down version of the Sunway TaihuLight supercomputer for more mainstream HPC users.

TaihuLight is the reigning champ on the TOP500 list, delivering a whopping 93 petaflops on the Linpack benchmark. Besides being the number one system, its other big claim to fame is that it is constructed almost entirely from Chinese-made componentry. In particular, the system is powered by the 260-core ShenWei processor, known as the SW26010. Each of TaihuLight’s 40,960 ShenWei chips delivers three teraflops of peak performance.
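The figures in this paragraph can be combined into a quick back-of-the-envelope check. The sketch below uses only the article's rounded numbers (40,960 chips, three teraflops each, 93 petaflops on Linpack), so the results are approximate; the function and variable names are made up for illustration.

```python
CHIPS_IN_TAIHULIGHT = 40_960
PEAK_TFLOPS_PER_CHIP = 3.0      # rounded figure from the article
LINPACK_PFLOPS = 93.0
CHIPS_PER_MICRO_NODE = 2        # dual-socket Sunway Micro node


def aggregate_peak_pflops(chips, tflops_per_chip):
    """Total peak performance in petaflops (1 petaflop = 1,000 teraflops)."""
    return chips * tflops_per_chip / 1000.0


taihulight_peak = aggregate_peak_pflops(CHIPS_IN_TAIHULIGHT, PEAK_TFLOPS_PER_CHIP)
linpack_efficiency = LINPACK_PFLOPS / taihulight_peak
micro_node_tflops = CHIPS_PER_MICRO_NODE * PEAK_TFLOPS_PER_CHIP

print(f"TaihuLight peak: about {taihulight_peak:.1f} petaflops")
print(f"Linpack efficiency: about {linpack_efficiency:.0%}")
print(f"Sunway Micro node peak: {micro_node_tflops:.0f} teraflops")
```

With these rounded inputs the machine's peak comes out near 123 petaflops, so the Linpack run reaches roughly three quarters of peak.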

The commercial version they announced at ISC is called the Sunway Micro and is based on a dual-socket SW26010 server node. The system is aimed at a broad spectrum of industrial and research applications including “deep learning, oil & gas exploration, climate modeling, etc.”



Image: Sunway Micro board. Source: National Supercomputing Center in Wuxi



The two-processor design means each node delivers a very respectable six peak teraflops. Unlike the TaihuLight supercomputer, whose single-socket nodes were outfitted with a scant 32 GB of memory, the Sunway Micro can be equipped with 64 GB to 256 GB. That gives Micro buyers the option to have a lot more local memory to feed these high-flying ShenWei chips. Each node is also equipped with 12 GB of local storage of undefined type and origin.

Some of the folks at the Wuxi booth during the ISC exhibition revealed that the Micro nodes can be clustered together via a network based on InfiniBand technology, which apparently is similar, but not identical, to the TaihuLight network implementation. Given that these servers will be used in relatively small clusters, they didn’t have to develop a network for supercomputer-level scalability.

One of the most unusual aspects of the Sunway Micro is that it is being sold by the National Supercomputing Center in Wuxi. That might seem like an odd thing for a supercomputing center to do, given its public mission. But since the center supplies the system software and developer toolset for these ShenWei-based machines, they basically act as system integrators for the commercial offering. As with the TaihuLight, the Micro was developed by the National Research Center of Parallel Computer Engineering & Technology (NRCPC).

Software support includes C/C++ and Fortran compilers for the ShenWei, as well as supporting runtime libraries. For parallel software development, Wuxi includes MPI, OpenACC and Athread implementations targeted to the ShenWei platform. An integrated development environment, with a debugger and performance monitor, is also included.

Besides selling the standard version of the Micro, the Wuxi center will also provide customized solutions. Pricing for the system was not made public.
 

Hendrik_2000

Lieutenant General
Chinese city to launch ‘unhackable’ quantum network

Tests on system in Jinan in Shandong province complete and service for nearly 200 users to begin next month, state-run media report

PUBLISHED : Monday, 10 July, 2017, 7:02am


China’s first citywide commercial communications system using “unhackable” quantum technology is expected to be up and running next month, mainland media reported on Sunday.

Tests on the system in Jinan in the eastern province of Shandong had been completed and the network would start operations next month to provide extremely secure communication for nearly 200 users, state-run China Central Television reported.

Zhou Fei, assistant to the director at the Jinan Institute of Quantum Technology, said the first users would be in the government, military, finance and electricity sectors.

“This is a milestone for quantum communication in China and the world,” CCTV quoted Zhou as saying.


The quantum network uses particles of light to encrypt information. If a third party tries to intercept the information, the particles change characteristics, making it impossible to steal the information without alerting the network. It is understood to be impossible for any computer to decipher a message encrypted by a quantum key.
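The article does not say which protocol the Jinan network runs. As an illustration of how interception becomes detectable, here is a toy simulation of BB84, a standard quantum key distribution scheme, with an eavesdropper who measures each photon and resends it. This is a classical stand-in for the quantum behaviour under that assumption, and every name in it is made up.

```python
import random


def bb84_error_rate(n_photons, eavesdrop, seed=0):
    """Fraction of mismatched bits in the sifted key of a toy BB84 run.

    Bases are encoded as 0 or 1. Measuring a photon in the wrong basis
    gives a random bit, which is what exposes an eavesdropper.
    """
    rng = random.Random(seed)
    kept = 0
    errors = 0
    for _ in range(n_photons):
        bit = rng.randint(0, 1)
        alice_basis = rng.randint(0, 1)
        photon_bit, photon_basis = bit, alice_basis

        if eavesdrop:
            eve_basis = rng.randint(0, 1)
            if eve_basis != photon_basis:
                photon_bit = rng.randint(0, 1)   # wrong basis: random result
            photon_basis = eve_basis             # Eve resends in her own basis

        bob_basis = rng.randint(0, 1)
        if bob_basis != photon_basis:
            bob_bit = rng.randint(0, 1)          # wrong basis: random result
        else:
            bob_bit = photon_bit

        # Sifting: keep only rounds where sender and receiver bases agree.
        if bob_basis == alice_basis:
            kept += 1
            errors += bob_bit != bit
    return errors / kept if kept else 0.0


print("error rate, quiet line:   ", bb84_error_rate(4000, eavesdrop=False))
print("error rate, intercepted:  ", bb84_error_rate(4000, eavesdrop=True))
```

On a quiet line the sifted bits agree perfectly, while an intercept-and-resend attack pushes the error rate toward one in four. The two users can compare a sample of their bits and discard the key if the error rate is too high.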

China built its first large-scale quantum communication network in Hefei, Anhui province, in 2012, according to People’s Daily. Work finished last year on the world’s longest land-based quantum link between Beijing and Shanghai, while a number of other big cities including Wuhan, are also building their own quantum networks.

Though also touted as commercially viable, these systems were at least in part sharing existing optical fibre lines with traditional telecommunications networks. The “hybrid” structure might compromise security in some cases.

But the Jinan network was an “exclusive” system dedicated to quantum communications, CCTV reported. The information exchange between two users was protected by more than 4,000 qubits per second to achieve “absolute secrecy”.


The network had more than 50 rounds of tests at terminals in Jinan government agencies and various Communist Party offices. The users were spread across several hundred square kilometres, and the test results were “satisfactory”, the report said.

China last month announced that its quantum satellite had successfully distributed a pair of entangled photons to two stations on land.

 

antiterror13

Brigadier
Great approach, selling it to the public to raise enough funds to develop more advanced systems.

It has 2 nodes, so it will have 12 peak teraflops. Just for comparison, in June 2004 the top supercomputer, the Earth Simulator, had 40 peak teraflops, and in 2001 the top supercomputer was ASCI White with 12 peak teraflops (the same performance as the Sunway Micro) :):):)
 

Equation

Lieutenant General
Who needs Baywatch life guard when you got this?;)

China launches self-driving patrol boats to rescue people from drowning

Yi Shu Ng, Jul 18, 2017
Robotic speedboats now ply the waters of a large lake in China, which has seen multiple deaths from drowning over the years.

The autonomous watercraft is equipped with GPS, cameras, and acoustic and infrared sensors.

It's aiding lifeguards at the Tian'e Lake in Hefei, eastern China, which receives tens of thousands of people each day during peak periods, according to park officials.


The boat is designed to detect "moving targets" in a lake that reportedly saw 15 people drown in 2016. In total, 66 people have drowned since the park along the lake was inaugurated in 2004.

It's part of a system of around 20 optical and infrared sensors built along the shore and a radio transmitter, which divides the 172-acre (70 hectare) lake into safe and dangerous sectors.

The boat will alert swimmers who stray into danger zones to stay away, and transmit the swimmer's location to the lake's management. Swimmers who can grab hold of the boat could be brought to safety.

"If someone struggles in the lake, the patrol boat can use sonar and other underwater detectors to track the location of the swimmer and call for help," said Wang Xu, a branch director of Hefei police.

Aside from the boat, lifeguards at Tian'e Lake are also equipped with three drones, which can send life preservers, food and medical supplies to people in distress.

No one has drowned since the boat's trial first kicked off in November last year, according to Zhang Bao, deputy general manager at Anhui CAS-Huacheng Intelligent Technology, a company which helped develop the autonomous boat.

The boat is expected to ease the burden of some 33 lifeguards stationed at the lake, say local papers.

The autonomous lifeguard is the latest development in self-driving watercraft that has come out of China.

The country is currently building a […] for autonomous boats in the Pearl River Delta, and has developed unmanned […] and […]. But so far, these developments have been made for the military.

Outside of China, the UK Royal Navy is also testing autonomous patrol boats.

And in Ireland, officials […] autonomous drones for search-and-rescue operations along its Atlantic coast, while autonomous lifebuoys are being used to […] in the Mediterranean.

"In the future, most of the lifeguards will be replaced by the robot," said Zhang. "The boat...can be used to patrol rivers, reservoirs, lakes and seas -- not only for security purposes but also for environmental surveillance and data collection."
